Decoding RW2 Raw Image Files

After owning a Panasonic Lumix digital camera for a while, and recently deciding to shoot in raw rather than JPEG, I was fairly surprised to find that Panasonic opted to store their images in a non-standard and (as best I can tell) undocumented format. Obviously this makes the images harder to work with than I would like. I’d never really considered this before, but it appears that there are large numbers of raw formats using similar conventions, and a whole world of format converters, plugins and other tools to go with them.

The problem is that these files don’t ‘just work’ in lots of apps, even when those apps have good image format support generally, so I feel I need to convert them into some other format that I’ll find easier to work with. In particular I’d like to be able to load the images into Photoshop CS2, as this version is available for free download, and I’d like to see thumbnails in Windows Explorer. Somewhat surprisingly the thumbnails do show up in Windows Explorer (on Windows 10), but the files don’t open in Photoshop, so some research is needed.

So what is an RW2 file? Well, it turns out that RW2 files are basically TIFF files with sensor data in place of the image data, plus various custom tags. Some of those tags are important for pulling the raw image data out, and some are not.

LibTIFF

My first approach to decoding them was to lean on libtiff for much of the work. This is an open source TIFF parser. At first this seemed promising. I quickly learned that RW2 uses a custom file signature. The first 4 bytes of a TIFF file are almost always the same, but RW2 replaces them with something different. libtiff lets us hook in our own callbacks for file I/O, which made it pretty easy to swap out those bytes before the library saw them, and that gets us into the parser. From there I quickly determined that libtiff wasn’t going to cut it though. After faking the header it complained that the image height tag was missing, and after some analysis of the file I concluded that it really was missing, as was lots of other important data the library appeared to depend on. At this point I decided that the format was too ‘non standard’ for libtiff to be of much use, and gave up on it.
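The header swap described above can be sketched in a few lines. This is Python rather than the libtiff C callbacks, and it only illustrates the idea. The helper name open_as_tiff is made up for this example, and the RW2 signature value of 0x55 is an assumption that isn’t stated above, so check it against your own files.

```python
import io

TIFF_MAGIC = b"\x2a\x00"  # 42 in little-endian order, as in a standard TIFF
RW2_MAGIC = b"\x55\x00"   # assumed RW2 signature value; verify against a real file


def open_as_tiff(raw_bytes: bytes) -> io.BytesIO:
    """Return a file-like object whose first 4 bytes look like a plain TIFF.

    Hypothetical helper: it patches the signature in memory and leaves
    everything else, including the offset to the first tag list, untouched.
    """
    if raw_bytes[:2] != b"II":
        raise ValueError("expected the little-endian byte-order mark 'II'")
    if raw_bytes[2:4] == RW2_MAGIC:
        raw_bytes = raw_bytes[:2] + TIFF_MAGIC + raw_bytes[4:]
    return io.BytesIO(raw_bytes)
```

A TIFF reader handed this stream would get past the signature check, which is as far as the trick goes. The missing tags are still missing.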

Building my own RW2 parser

Building a parser for the TIFF style format is fairly easy. A TIFF file starts with a small 8 byte header containing a byte-order mark, a signature and the offset to a list of tags. One of the tags then points to a further list of tags, and some of the tags point to blocks of data. This format is documented in various places and is easy to parse, so getting to the point where we have all the tags is straightforward. Making sense of them and finding the image data is much harder though.
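To make the layout concrete, here is a minimal sketch of reading the header and one tag list using the standard TIFF structure. It assumes the whole file is already in memory, the function names are my own, and it does nothing to interpret the tags or follow their offsets.

```python
import struct


def read_header(data: bytes):
    """Parse the 8-byte header: byte order, signature, offset of the first tag list."""
    endian = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, first_ifd = struct.unpack_from(endian + "HI", data, 2)
    return endian, magic, first_ifd


def read_ifd(data: bytes, offset: int, endian: str):
    """Return (entries, next_offset) for the tag list starting at `offset`."""
    (count,) = struct.unpack_from(endian + "H", data, offset)
    entries = []
    for i in range(count):
        # Each entry is 12 bytes: tag id, field type, value count, then either
        # the value itself (if it fits in 4 bytes) or an offset to the data.
        tag, ftype, n, value = struct.unpack_from(
            endian + "HHII", data, offset + 2 + 12 * i
        )
        entries.append({"tag": tag, "type": ftype, "count": n, "value_or_offset": value})
    # The list ends with the offset of the next tag list, or 0 if there is none.
    (next_offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return entries, next_offset
```

The last field is read as a plain 32 bit integer here. That is fine for offsets and for small values in little-endian files, but shorter values in big-endian files would need extra handling.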

As I found when trying to read these files with libtiff, the image dimensions were missing. This turns out to be because Panasonic store the sensor dimensions using custom tags. This makes sense when you realize that instead of storing image data, the file quite literally stores the values delivered by the sensor as the photo is taken. The sensor data is compressed using what appears to be a fairly standard raw compression scheme, and there are a few examples of how to decompress it if you look at similar projects on GitHub or elsewhere. Once decompressed, it turns out the sensor isn’t an array of RGB sensors, so we don’t get RGB pixel values, but rather a mosaic of separate R, G and B sensors. Typically the colour values we get back are laid out in a repeating 2×2 grid containing one R, one B and two G values per cell. From this data we can reconstruct an RGB image. The pair of G values provides most of the detail, and combining them with the lower frequency R and B values gives us the colours too. This is all sounding more complex than I’d like, so before I put a lot of effort into parsing the format myself, I’m hoping to find that someone has already done the work for me.
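To illustrate the 2×2 layout, here is a deliberately crude sketch that collapses each cell into a single RGB pixel by averaging the two G values. It halves the resolution and is far simpler than the interpolation real converters use. The R, G, G, B ordering is an assumption, since the real order depends on the camera.

```python
def demosaic_half(mosaic, pattern=("R", "G", "G", "B")):
    """Collapse each 2x2 cell of sensor values into one (r, g, b) pixel.

    `mosaic` is a list of equal-length rows with even dimensions.
    `pattern` names the colours at (0,0), (0,1), (1,0), (1,1) of each cell;
    RGGB is assumed here and must contain R, G and B at least once each.
    """
    out = []
    for y in range(0, len(mosaic) - 1, 2):
        row = []
        for x in range(0, len(mosaic[y]) - 1, 2):
            cell = (
                mosaic[y][x], mosaic[y][x + 1],
                mosaic[y + 1][x], mosaic[y + 1][x + 1],
            )
            sums = {"R": 0, "G": 0, "B": 0}
            counts = {"R": 0, "G": 0, "B": 0}
            for colour, value in zip(pattern, cell):
                sums[colour] += value
                counts[colour] += 1
            # Average per colour, so the two G samples become one G value.
            row.append(tuple(sums[c] / counts[c] for c in "RGB"))
        out.append(row)
    return out
```

For example, the single cell [[10, 20], [40, 30]] becomes one pixel with R = 10, G = 30 (the mean of 20 and 40) and B = 30.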

dcraw

After searching online for a bit I found dcraw. This seems to be a fairly well maintained tool, in widespread use, that converts raw images in various formats to either PPM or TIFF files, both of which can hold 16 bit colour data. The tool is available for download as a Windows executable from a few places. Here are some examples of how it works. The first command converts my RW2 file to a TIFF, and the other two convert it to a PPM file with various options applied.

dcraw -T IMAGE.RW2
dcraw -6 IMAGE.RW2
dcraw -6 -4 IMAGE.RW2

The first command produces a TIFF file which both Windows Explorer and Photoshop CS2 understand. The last two commands produce PPM files, which Photoshop CS2 at least is happy to consume. -6 tells the tool to write 16 bit rather than 8 bit data, and -4 tells it to write linear 16 bit data. I think I want both of these settings. Unfortunately -4 only seems to work when outputting PPM files, or if not, it does something different when writing TIFF files. If I add this option to the command line used to produce the TIFF file the result doesn’t appear to change, or at least that’s how it seems.

Something else I’m noticing, now that I have images I can load in Photoshop, is that even with the higher precision there are large sections of the image that have hit the upper limit of the range. The image in question shows some trees with a bright sky behind them, and the RGB values for the sky in Photoshop show as 32767 or 32768 across all channels. (Photoshop reports 16 bit channels on a 0 to 32768 scale, so these values are at the top of its range.) I’d hoped that working in raw would reveal a wider range, rather than just more precision, but I’m now wondering if that’s not the case. Am I hitting the limits of the sensor, or has the software applied some sort of normalization to bring the sensor data into a 0-1 range? If it’s the sensor, then I’m wondering if there is some value in intentionally under exposing my photos so that this doesn’t happen. I think that would give me maximum freedom to adjust the photos later, because if a region of the photo hits the limit of the range and ‘whites out’ then the true data has in effect been lost at the point the image was captured.
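One way to put a number on this is to count how many samples in the converted file sit at the maximum value. The sketch below does that for a binary PPM of the kind dcraw writes. It assumes the header has no comment lines, and the function name is my own. Note that it only measures the output file, so it can’t by itself say whether the sensor or the conversion did the clipping.

```python
import struct


def clipped_fraction(ppm: bytes) -> float:
    """Fraction of samples equal to the file's maximum value in a binary PPM (P6)."""
    # The header is "P6", width, height and maxval separated by whitespace,
    # followed by exactly one whitespace byte and then the samples.
    # Assumes there are no '#' comment lines in the header.
    tokens, pos = [], 0
    while len(tokens) < 4:
        while ppm[pos:pos + 1].isspace():
            pos += 1
        start = pos
        while pos < len(ppm) and not ppm[pos:pos + 1].isspace():
            pos += 1
        tokens.append(ppm[start:pos])
    pos += 1  # skip the single whitespace byte after maxval

    if tokens[0] != b"P6":
        raise ValueError("not a binary PPM file")
    width, height, maxval = (int(t) for t in tokens[1:])
    n = width * height * 3  # three samples (R, G, B) per pixel

    if maxval > 255:
        # 16 bit PPM samples are stored most significant byte first.
        samples = struct.unpack_from(f">{n}H", ppm, pos)
    else:
        samples = ppm[pos:pos + n]
    return sum(1 for s in samples if s == maxval) / n
```

Running this over the PPM produced with -6 and again over the one produced with -6 -4 would at least show whether the two outputs saturate by the same amount.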

libraw and other solutions

At this point I still don’t have as full an understanding of the data being produced as I would like. Nor do I have what I’d consider a complete solution, should I wish to batch convert the images to another format with minimal loss of data and maximum flexibility.
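For the batch conversion part at least, a small wrapper around dcraw would probably do. This is a minimal sketch using only the -T and -6 options shown earlier. It assumes dcraw is on the PATH, and the function names are made up for the example.

```python
import subprocess
from pathlib import Path


def dcraw_command(raw_path, sixteen_bit=True):
    """Build the dcraw argument list that converts `raw_path` to a TIFF."""
    args = ["dcraw", "-T"]
    if sixteen_bit:
        args.append("-6")
    return args + [str(raw_path)]


def convert_folder(folder):
    """Run dcraw on every .RW2 file in `folder`; assumes dcraw is on the PATH."""
    for raw_path in sorted(Path(folder).glob("*.RW2")):
        # check=True stops the batch if dcraw reports an error for a file.
        subprocess.run(dcraw_command(raw_path), check=True)
```

Keeping the command building separate from the loop makes it easy to try other option combinations without touching the part that walks the folder.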

I was also really hoping to get the data into a format that supports a wider colour range, assuming the raw file contains data beyond the range I’m seeing in the TIFF file. A format like EXR would be ideal. Where TIFF only appears to give us high precision, I’m fairly sure EXR gives us range too, assuming the source data had the range to start with.

There are libraries like libraw for example that might allow me to extract the data myself with a little work.

I’ll settle for TIFFs, but I’m wondering if I need to go back to writing a tool of my own.
