05/12/2010 @09:16:02 ^10:31:11

serialisation of floating point data

Yesterday the topic came up of how to store floating point variables in a file or send them over the network so they can be reloaded later, possibly by another machine, possibly with a different architecture, and be exactly the same as when they were stored. This is a general problem, called serialisation.

Consider for example the problem of one human brain, containing a complex image or abstract concept, having to transfer that image or concept into another human brain. This is a serialisation problem, one at which I am very bad... better known as "talking to people".

For integral data types it's easy. You just write the thing as a stream of bytes. The only thing you really have to do is make sure you're reading the bytes in the right order (endianness). This is a solved problem; go read about network byte order and host byte order, for instance.
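As a minimal sketch of the integer case, assuming a 32-bit unsigned value and big-endian ("network") order on the wire: the helper names put_u32_be and get_u32_be are made up for illustration. Shifting the bytes out one at a time gives the same result whatever the host byte order is.

```c
#include <stdint.h>

/* Hypothetical helper: store v most significant byte first. */
void put_u32_be(uint8_t out[4], uint32_t v)
{
  out[0] = (uint8_t)(v >> 24);
  out[1] = (uint8_t)(v >> 16);
  out[2] = (uint8_t)(v >> 8);
  out[3] = (uint8_t)v;
}

/* Hypothetical helper: rebuild the value from four big-endian bytes. */
uint32_t get_u32_be(const uint8_t in[4])
{
  return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
         ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```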

For floats and doubles it's a bit more complicated though. How do you write out a double and then read it back later, knowing you have the same thing? Here are the outlines of three approaches.

Type punning

Stick the double in a union along with an array of bytes of the same size as a double.

union { double d; uint8_t c[sizeof(double)]; } u;

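Then you write your double into u.d and loop over the bytes, writing each one out at a time, just like writing out an int32_t or whatever. The same byte order caveat applies as for integers, and it only works if both machines use the same floating point representation (IEEE 754 on most hardware these days).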
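Here is one way the punning approach might look with the byte order pinned down. It is a sketch under assumptions: a double is 64 bits wide, both ends use the same format (IEEE 754 binary64, say), and the host stores doubles in the same byte order as 64-bit integers. The union pairs the double with a uint64_t rather than a byte array so that the bytes can be shifted out in a fixed order; put_double_be and get_double_be are hypothetical names.

```c
#include <stdint.h>

/* Assumption: sizeof(double) == 8 and both machines share the format. */
union dbits { double d; uint64_t bits; };

/* Hypothetical helper: write the bit pattern of d, most significant byte first. */
void put_double_be(uint8_t out[8], double d)
{
  union dbits u;
  int i;
  u.d = d;                                       /* reinterpret, no conversion */
  for (i = 0; i < 8; i++)
    out[i] = (uint8_t)(u.bits >> (56 - 8 * i));
}

/* Hypothetical helper: rebuild the double from eight big-endian bytes. */
double get_double_be(const uint8_t in[8])
{
  union dbits u;
  int i;
  u.bits = 0;
  for (i = 0; i < 8; i++)
    u.bits = (u.bits << 8) | in[i];
  return u.d;
}
```

Since the bit pattern is copied untouched, this also carries infinities and NaNs across, as long as the format assumption holds.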

Write the double as a string

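Convert the double into an array of characters with sprintf. To read it back, use strtod. C99 has a printf format specifier "%a" that seems made for this: it writes the value in hexadecimal floating point notation, so nothing is lost to decimal rounding, and C99's strtod reads that notation back.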
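A minimal sketch of the string approach, assuming a C99 library on both ends. The helper names double_to_text and text_to_double are invented here, and snprintf stands in for sprintf so the buffer can't overflow. The exact text "%a" produces can differ between C libraries, so I wouldn't rely on comparing the strings themselves, only on the value that comes back.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format d as hexadecimal floating point text.
   Returns what snprintf returns (the length needed, or negative on error). */
int double_to_text(char *buf, size_t size, double d)
{
  return snprintf(buf, size, "%a", d);
}

/* Hypothetical helper: parse the text back into a double. */
double text_to_double(const char *s)
{
  return strtod(s, NULL);
}
```

Both functions use the current locale's decimal point character, so writer and reader should agree on the locale (the default "C" locale is the safe choice).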

frexp/ldexp

This is an approach I didn't think of and had to try out myself.

These are two floating point functions (declared in math.h, link with -lm). The first decomposes a double into an integer exponent and a normalized fractional mantissa whose magnitude lies in the interval [½,1). The second puts it back again. Now what?
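To make that concrete before answering, here is a tiny sketch of the pair in action. The function split_and_rebuild is a made-up name; it only calls frexp and ldexp back to back. For 8.0, frexp gives a mantissa of 0.5 and an exponent of 4, since 8 = 0.5 × 2^4.

```c
#include <math.h>

/* Hypothetical helper: split d into mantissa and exponent, then rebuild it.
   After the call, d == *mantissa * 2^(*exponent) for any finite d. */
double split_and_rebuild(double d, double *mantissa, int *exponent)
{
  *mantissa = frexp(d, exponent);
  return ldexp(*mantissa, *exponent);
}
```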

My naive approach was to multiply the fractional part by INT64_MAX to convert it into an int64_t. Now you have two integers (of sizes 4 and 8 bytes respectively, so you're using up 4 extra bytes per double stored) and these can be serialised using your existing functions for integral data types.

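#include <math.h>    /* frexp, ldexp */
#include <stdint.h>  /* int32_t, int64_t, INT64_MAX */

struct fpack {
  int32_t exp;
  int64_t frac;
};

void serialize(struct fpack *pack, double d)
{
  int e;  /* frexp wants an int *, which need not be the same type as int32_t */
  double mantissa = frexp(d, &e);
  pack->exp = e;
  /* with IEEE 754 doubles, INT64_MAX rounds to 2^63 when converted to double,
     so this scales the mantissa by a power of two and loses nothing */
  pack->frac = (int64_t)(INT64_MAX * mantissa);
}

double unserialize(const struct fpack *pack)
{
  double mantissa = (double)(pack->frac) / INT64_MAX;
  return ldexp(mantissa, pack->exp);
}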

I've tested this a bit: get a double, convert it to a struct and back, compare it to the original. It seems to work for almost everything I threw at it (several thousand random strings of digits with a decimal point somewhere), with the exception of infinities, "Not-A-Number" cases and the like.
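One possible way to cover those exceptions, sketched under my own assumptions rather than tested at length: add a tag that says what kind of value was stored, and only go through frexp for finite values. The names fkind, fpack2, serialize2 and unserialize2 are invented for this sketch, and the scale factor is written out as 2^63. It still loses the sign of negative zero and any NaN payload.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical tag describing what was stored. */
enum fkind { F_FINITE, F_POS_INF, F_NEG_INF, F_NAN };

struct fpack2 {
  uint8_t kind;   /* one of enum fkind */
  int32_t exp;
  int64_t frac;
};

void serialize2(struct fpack2 *pack, double d)
{
  int e = 0;
  pack->exp = 0;
  pack->frac = 0;
  if (isnan(d)) {
    pack->kind = F_NAN;
  } else if (isinf(d)) {
    pack->kind = d > 0 ? F_POS_INF : F_NEG_INF;
  } else {
    pack->kind = F_FINITE;
    /* scale the mantissa by 2^63; exact for a 53-bit mantissa */
    pack->frac = (int64_t)(frexp(d, &e) * 9223372036854775808.0);
    pack->exp = e;
  }
}

double unserialize2(const struct fpack2 *pack)
{
  switch (pack->kind) {
  case F_NAN:     return NAN;
  case F_POS_INF: return INFINITY;
  case F_NEG_INF: return -INFINITY;
  default:
    return ldexp((double)pack->frac / 9223372036854775808.0, pack->exp);
  }
}
```

The tag costs one more byte per value on top of the twelve already used, which is the price of not reserving magic values in exp or frac.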

I suspect printf("%a") is doing something like this, only a more clever variant, and that ultimately it begins with something very much like frexp.

Conclusion

Strings are probably best, but I like that trick with frexp/ldexp. It just needs more testing and attention to edge cases.

And of course if you want strings but your platform doesn't have printf("%a") you could always pilfer a free implementation that's compatible with your licence.