Python Exif Utilities
Someone might find these Exif-related Python utilities handy. I wanted a way to embed comments in jpg files and a way to extract those comments, so that I could display a jpg and its caption on a web page without having to store the caption in a separate file or a database. It would all be in one neat package. The Exif specification for digital camera files describes quite well the facilities for storing "metadata" in the image. You can add info about the "artist", a copyright and any kind of freeform "image description" you'd like, if you have the tools. Sounds nice. I wanted something in Python, at least for extracting that metadata, since my websites are all written in Python and I had in mind a dynamic photo display page that would pull the comment info out on the fly. So I set about writing something myself.
My first thought, of course, was to find something already written. There are a couple of Python Exif modules out there, most notably exifdump.py, which appears to be the earliest, and exif.py, an update of it by another author. These proved helpful, especially in understanding how to parse jpg files and deal with binary data in Python, but did not do what I needed. They simply dump out the Exif info for a jpg; I wanted to modify Exif info as well. So I started writing my own. The Exif spec is pretty clear, and the file layout is understandable, but I soon found that the bit twiddling got hairier the further I went. When inserting new items you have to shift offsets all over the place, and it gets confusing. As I was getting a bit frustrated with trying to figure out how to insert my own Exif tags, I stumbled upon another jpg comment facility. This is the "COM marker segment", one of the several "marker segment" types in the jpg spec. This seemed simple enough--just write the COM marker bytes (0xFF 0xFE), the two-byte length of the comment section, then the comment text, and you're done. No messy offsets to adjust.
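To show how little is involved in the COM approach, here is a minimal sketch in Python 3. It is not code from the utilities on this page; the function name insert_com_segment is made up for illustration, and the choices to place the comment after any leading APPn segments and to encode it as UTF-8 are my assumptions (as far as I know the jpg spec does not fix an encoding for comments).

```python
import struct

SOI = b"\xff\xd8"  # Start Of Image marker
COM = b"\xff\xfe"  # Comment marker


def insert_com_segment(jpeg_bytes, comment):
    """Return a copy of jpeg_bytes with a COM segment added near the start.

    Hypothetical helper for illustration only.
    """
    if jpeg_bytes[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    payload = comment.encode("utf-8")  # encoding is an assumption
    # The 2-byte length counts itself plus the payload, but not the marker.
    if len(payload) + 2 > 0xFFFF:
        raise ValueError("comment too long for a single COM segment")

    # Skip leading APPn segments (0xFFE0..0xFFEF) so that any JFIF or
    # Exif header stays directly after SOI.
    pos = 2
    while (pos + 4 <= len(jpeg_bytes)
           and jpeg_bytes[pos] == 0xFF
           and 0xE0 <= jpeg_bytes[pos + 1] <= 0xEF):
        (seg_len,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        pos += 2 + seg_len

    segment = COM + struct.pack(">H", len(payload) + 2) + payload
    return jpeg_bytes[:pos] + segment + jpeg_bytes[pos:]
```

Nothing else in the file has to change, which is the whole appeal: there are no offsets elsewhere that point past the inserted bytes.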
The only problem with COM comments is that they can appear anywhere in the file (I think), so you have to read the entire file to search for the COM marker (I think). This makes adding a COM segment easy, but finding it more costly. With the Exif Image Description tag, you only have to read as far as the end of the APP1 section, which always appears at the beginning of the file and seems to be only around 3k in the files from my digital camera. All Exif info has to reside in this APP1 section. So if your jpg is large, you can get to the Image Description tag by reading only a small chunk at the beginning of the file instead of the entire file.
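As a rough illustration of reading only the front of the file, the sketch below (Python 3) walks the marker segments from the start and stops as soon as it has the Exif APP1 payload. The name read_exif_app1 is hypothetical and this is not the code in minimal_exif_reader.py. It assumes every segment before the image data carries a two-byte length and ignores rarer cases such as fill bytes between segments.

```python
import struct


def read_exif_app1(stream):
    """Return the Exif APP1 payload from a binary stream, or None if absent.

    Hypothetical helper for illustration only. Reads no further than the
    end of the APP1 segment.
    """
    if stream.read(2) != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    while True:
        header = stream.read(4)  # marker (2 bytes) + length (2 bytes)
        if len(header) < 4 or header[0] != 0xFF:
            return None
        marker = header[1]
        if marker == 0xDA:  # SOS: compressed image data follows, give up
            return None
        (length,) = struct.unpack(">H", header[2:4])
        if length < 2:  # malformed length field
            return None
        body = stream.read(length - 2)  # length includes its own 2 bytes
        if marker == 0xE1 and body.startswith(b"Exif\x00\x00"):
            return body
```

Usage would be something like opening the jpg with open(path, "rb") and passing the file object in; for a camera file the loop normally ends after the first segment or two.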
Which brings me to these two utilities. I finally decided that instead of shoe-horning an Image Description tag into the existing Exif section created by my digital camera, I would just delete the entire Exif section and create a new one with only the tags I wanted: the Image Description and the Copyright, plus the existing tag with the date and time the picture was taken. This has the side benefit of reducing the image file size by eliminating the other Exif tags and the embedded thumbnail, none of which is needed for a jpg on this website. So that's what these little utilities do for you. And they are in use on the pages of this site where a "large" photo is viewed (by clicking on a thumbnail where available). Any caption, date and copyright for the resulting page is pulled from the jpg Exif data. Here's an example.
From reading the spec, it appears that there is a minimum set of tags that are mandatory in the Exif section, so the files produced by adding only the specific tags I want to keep are probably not officially valid, but they work in all the image programs I've tried.
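For a feel of what a stripped-down Exif section looks like, here is a sketch in Python 3 that builds an APP1 segment whose first image file directory (IFD0) holds only ImageDescription (tag 0x010E) and Copyright (tag 0x8298). It is an illustration under assumptions, not the code in minimal_exif_writer.py: the function name build_minimal_app1 is made up, it uses big-endian byte order, it accepts ASCII text only, and it leaves out the date/time tag, which lives in a separate Exif sub-IFD. Like the files described above, the result omits the tags the spec calls mandatory.

```python
import struct

IMAGE_DESCRIPTION = 0x010E  # IFD0 tag numbers
COPYRIGHT = 0x8298
ASCII = 2                   # TIFF field type: NUL-terminated text


def build_minimal_app1(description, copyright_text):
    """Build an APP1 segment with just two ASCII tags in IFD0.

    Hypothetical helper for illustration only.
    """
    # IFD entries must be sorted by tag number.
    tags = sorted([(IMAGE_DESCRIPTION, description),
                   (COPYRIGHT, copyright_text)])
    # TIFF header (8) + entry count (2) + 12 per entry + next-IFD offset (4)
    data_offset = 8 + 2 + 12 * len(tags) + 4
    entries = b""
    data = b""
    for tag, text in tags:
        value = text.encode("ascii") + b"\x00"
        if len(value) <= 4:
            field = value.ljust(4, b"\x00")  # short values are stored inline
        else:
            # Longer values go in the data area; the entry holds the offset,
            # measured from the start of the TIFF header.
            field = struct.pack(">I", data_offset + len(data))
            data += value
            if len(data) % 2:
                data += b"\x00"  # keep following offsets on even bytes
        entries += struct.pack(">HHI", tag, ASCII, len(value)) + field
    tiff = (b"MM" + struct.pack(">HI", 42, 8)       # byte order, magic, IFD0 offset
            + struct.pack(">H", len(tags)) + entries
            + struct.pack(">I", 0)                  # no further IFDs
            + data)
    payload = b"Exif\x00\x00" + tiff
    return b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload
```

Because the section is built from scratch, every offset is computed once while writing, which avoids the offset shifting that made editing an existing Exif section so fiddly.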
The Python Modules
Caveats: I am no authority on jpgs or Exif or the like. But these seem to do what I want. Also, they are probably pretty rough around the edges. Let me know if you find problems, find them useful or have any improvements.
- minimal_exif_writer.py - For removing existing Exif info from a jpg, and adding an image description and/or copyright tag. Will also keep the existing date/time picture taken tag. See file for documentation. Note: this doesn't seem to work on Unix, at least on Solaris. I tried it and got "SystemError: mmap: resizing not available--no mremap()". I don't need it on Unix, so I'm not investigating further.
- minimal_exif_reader.py - Reads ImageDescription, Copyright and DateTimeOriginal tags from jpg. Seems to work on Windows and Linux (works at my web host, which uses some flavor of Linux).
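For readers curious about what pulling a caption out of the Exif data involves, here is a rough Python 3 sketch that decodes the ASCII tags in IFD0 of an Exif APP1 payload (the bytes starting with "Exif" followed by two NULs). It is not the code in minimal_exif_reader.py; the function name read_ifd0_ascii_tags is hypothetical, and it only looks at IFD0, so it would find ImageDescription (0x010E) and Copyright (0x8298) but not DateTimeOriginal, which sits in the Exif sub-IFD.

```python
import struct


def read_ifd0_ascii_tags(app1_payload):
    """Return {tag_number: text} for the ASCII entries in IFD0.

    Hypothetical helper for illustration only.
    """
    if not app1_payload.startswith(b"Exif\x00\x00"):
        raise ValueError("not an Exif APP1 payload")
    tiff = app1_payload[6:]  # all offsets are relative to the TIFF header
    # "II" means little-endian (Intel), "MM" big-endian (Motorola).
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if order is None:
        raise ValueError("bad TIFF byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")

    (count,) = struct.unpack(order + "H", tiff[ifd_offset:ifd_offset + 2])
    result = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, n = struct.unpack(order + "HHI", tiff[start:start + 8])
        if ftype != 2:  # 2 = ASCII; skip everything else
            continue
        if n <= 4:
            raw = tiff[start + 8:start + 8 + n]  # value stored inline
        else:
            (offset,) = struct.unpack(order + "I", tiff[start + 8:start + 12])
            raw = tiff[offset:offset + n]
        result[tag] = raw.rstrip(b"\x00").decode("ascii", "replace")
    return result
```

A web page handler could then look up 0x010E in the returned dictionary for the caption and 0x8298 for the copyright line.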