[FFmpeg-devel] [PATCH] Fast half-float to float conversion

Reimar Döffinger Reimar.Doeffinger
Wed Jul 1 08:38:47 CEST 2009


On Wed, Jul 01, 2009 at 07:42:45AM +0200, Jimmy Christensen wrote:
> On 2009-06-30 15:50, Reimar Döffinger wrote:
> > On Tue, Jun 30, 2009 at 03:09:33PM +0200, Jimmy Christensen wrote:
> >>> Lastly it seems to me that this code would belong into
> >>> libavutil/intfloat_readwrite.*
> >>
> >> You are probably right.
> >
> > Personally I'd suggest doing a stupid implementation for
> > libavutil/intfloat_readwrite.*, something like this (certainly
> > not a beauty, possibly buggy, does not support denormals, though since
> > they are rare it probably wouldn't be a big issue to add handling for
> > them in if (nosign < 0x0400)).
> > float av_int2halflt(int16_t v){
> >      uint16_t nosign = v+v;
> >      if (nosign >= 0xfc00) {
> >          if (nosign == 0xfc00) return v>>15 ? -1.0/0.0 : 1.0/0.0;
> >          else return 0.0/0.0;
> >      }
> >      if (nosign < 0x0400) return 0; // denormal or 0
> >      return ldexp((v&0x3ff) + (1<<11)) * (v>>15|1), (v>>10&0x1f)-26);
> > }
> >
> >
> > And consider an optimized version only after it is actually used and
> > thus it is clear what kind of optimizations are necessary/most useful.
> > E.g. the table based approach probably is hard to SIMD and thus might be
> > a bad idea if you want to convert a lot of data.
> 
> I tried testing with the code that you posted but it doesn't seem to 
> work? The image looks quite clamped.

Well, I already said it is possibly buggy. E.g. the (1<<11) should be
(1<<10).
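
For reference, here is a minimal sketch of what a corrected plain
conversion could look like. The name half2float_sketch is hypothetical
and not an existing libavutil function. Besides the (1<<10) fix it
departs from the quoted code in a few places: it tests the exponent
field directly instead of comparing the doubled value against
thresholds, it scales by 2^(exp-25), and it handles denormals as
mant * 2^-24. Those choices follow the IEEE 754 half layout (1 sign
bit, 5 exponent bits with bias 15, 10 mantissa bits) and are
assumptions of this sketch, not something confirmed in this thread.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper, not part of libavutil. */
static float half2float_sketch(uint16_t h)
{
    int sign = (h & 0x8000) ? -1 : 1;
    int exp  = (h >> 10) & 0x1f;   /* 5-bit biased exponent */
    int mant = h & 0x3ff;          /* 10-bit mantissa */

    if (exp == 0x1f)               /* exponent all ones: Inf or NaN */
        return mant ? NAN : sign * INFINITY;
    if (exp == 0)                  /* zero or denormal: mant * 2^-24 */
        return sign * ldexpf((float)mant, -24);
    /* normal: (1024 + mant) * 2^(exp - 15 - 10) */
    return sign * ldexpf((float)(mant + (1 << 10)), exp - 25);
}
```

With this, 0x3c00 converts to 1.0 and 0x7bff to 65504, the largest
finite half value.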

> Also according to the original author the table conversion does 
> implement denormals, zero, infinite and NaN.

Yes, I know now. Implementing denormals for this code isn't hard either.
But optimizing when the code that is to be optimized is not even
available yet makes very little sense to me.
If you can get away with ignoring the INF/NaN and denormal cases, a
SIMD version can probably convert at least 4 floats in as many cycles as
the table code needs for 1, not even counting possible memory latencies.
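
To illustrate why dropping the special cases helps, here is a rough
sketch of a branch-free conversion written as a plain C loop. The
function name is hypothetical. It is only correct for normal half
values (exponent field 1..30); zero, denormals, Inf and NaN come out
wrong by design. Each element needs just a mask, two shifts, an add and
an or, which are the kind of operations that map onto SIMD integer
instructions. Whether a given compiler actually vectorizes this loop,
and how many cycles it takes, is not measured here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper. Valid only for normal half inputs; zero,
 * denormals, Inf and NaN are deliberately not handled. */
static void half2float_normals_sketch(float *dst, const uint16_t *src,
                                      size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t h    = src[i];
        uint32_t bits = ((h & 0x8000u) << 16)      /* sign to bit 31 */
                      | (((h & 0x7fffu) << 13)     /* align exp+mantissa */
                         + ((127u - 15u) << 23));  /* rebias exponent */
        memcpy(&dst[i], &bits, sizeof(bits));      /* reinterpret as float */
    }
}
```

The shift by 13 moves the 10-bit mantissa to the top of the 23-bit
float mantissa, and adding (127 - 15) << 23 changes the exponent bias
from 15 to 127.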


