### Sunday, December 10, 2006

## double to float conversion

When we need to convert a floating-point number to an integer in C, we have two convenient functions at our disposal: floor and ceil. For some reason, however, there are no natural counterparts to these functions for double -> float conversion.
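To see why a plain cast is not enough: it picks the nearest float, which may lie on either side of the original double. Here is a minimal sketch of that, assuming the default round-to-nearest mode. The names cast_direction and cast_direction_examples are hypothetical helpers, not part of the code in this post.

```c
#include <assert.h>

/* Hypothetical helper: reports which way a plain double -> float cast
   moved x. Returns +1 if the float is above x, -1 if it is below,
   0 if x was exactly representable (or is NaN). */
int cast_direction(double x)
{
    float f = (float)x;        /* rounds according to the current rounding mode */
    double back = (double)f;   /* float -> double is exact */
    return (back > x) - (back < x);
}

/* Sanity checks, assuming the default round-to-nearest mode. */
void cast_direction_examples(void)
{
    assert(cast_direction(0.5) == 0);    /* exactly representable as a float */
    assert(cast_direction(0.1) == +1);   /* nearest float lies above 0.1 */
    assert(cast_direction(0.7) == -1);   /* nearest float lies below 0.7 */
}
```

So the cast alone gives neither a floor nor a ceiling: the side it lands on depends on the value.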

I wrote a simple implementation of these utilities using the standard C library functions frexp and ldexp:

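    #include <math.h>

    /* Round a double to a neighbouring float: down if isfloor is 1, up if it is 0.
       Assumes x is finite and within the normal range of float. */
    float fdround ( double x, int isfloor )
    {
        const int Nf = 23; /* stored mantissa bits, IEEE 754: http://en.wikipedia.org/wiki/IEEE_754 */
        double m;
        int e, sx;

        /* sx is the sign of x; zero (and NaN, which fails both comparisons) returns 0 */
        if (0 == (sx = ((x < 0.0) ? (-1) : ((x > 0.0) ? 1 : 0))))
            return 0.0;
        /* |x| = m * 2^e with 0.5 <= m < 1 */
        m = frexp ( x * sx, &e );
        /* scale m to a 24-bit integer, round the magnitude, restore sign and exponent */
        return (float)(ldexp ( (double)sx * ((isfloor ^ (sx == -1)) ? floor : ceil) (m * (1 << (Nf + 1))), e - Nf - 1 ));
    }

    #define fceil(x)  (fdround ( (x), 0 ))
    #define ffloor(x) (fdround ( (x), 1 ))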

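This implementation depends, of course, on knowing the number of mantissa bits. A 32-bit IEEE 754 float stores 23 of them explicitly, and together with the implicit leading bit the significand has 24 bits, which is why the code scales the mantissa by 1 << (Nf + 1). It also assumes that x lies within the normal range of float: values that would overflow or fall into the subnormal range are not handled.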

Here is a simple utility to test whether the above function generates correct numbers:

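    #include <assert.h>

    void test(double x)
    {
        float f = ffloor(x), c = fceil(x);
        /* points between f and c, computed in double and rounded to float on assignment */
        float a = 0.4 * f + 0.6 * c, b = 0.6 * f + 0.4 * c, y = (f + c)/2;

        assert ( f <= x && c >= x );   /* f and c bracket x */
        assert ( a == c && b == f );   /* a point nearer c rounds to c, one nearer f rounds to f */
        assert ( y == f || y == c );   /* the midpoint rounds to one of the two */
    }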

Enjoy!

Labels: C