[FFmpeg-devel] [PATCH] SIMD-optimized float_to_int32_fmul_scalar()

Justin Ruggles justin.ruggles
Fri Jan 7 21:38:05 CET 2011


On 01/07/2011 01:52 PM, Justin Ruggles wrote:

> On 01/07/2011 01:31 PM, Michael Niedermayer wrote:
>> also some of these can be unrolled to gain a bit more speed
> 
> 
> unrolling didn't give me any benefit in testing, but that was just on
> Athlon.  I'll do more tests and try it on Atom as well.


Dang. Well, apparently I didn't test very thoroughly before.

AMD Athlon
loop2 3DNow: 51221
loop4 3DNow: 49101
loop8 3DNow: 43870
loop4   SSE: 50267
loop8   SSE: 51038
loop4  SSE2: 53008
loop8  SSE2: 50139

Intel Atom
loop4   SSE: 149126
loop8   SSE: 107183
loop4  SSE2: 148860
loop8  SSE2: 104592
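The loopN labels refer to how many values each loop iteration processes. The sketch below is plain C, not SIMD, and only illustrates the loop structure. It assumes a (dst, src, mul, len) signature and a len that is a multiple of 8. The function names and the rounding helper are hypothetical. The versions in the patch use SIMD instructions; for SSE, "loop8" would correspond to two 4-float vectors per iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round to nearest, ties away from zero.
 * Used here to avoid a libm dependency; lrintf() and cvtps2dq follow
 * the current rounding mode (nearest-even by default), so exact ties
 * such as 2.5 may round differently in the real code. */
static int32_t round_to_int32(float x)
{
    return (int32_t)(x < 0.0f ? x - 0.5f : x + 0.5f);
}

/* Reference version: one value per iteration. */
static void fmul_scalar_to_int32_c(int32_t *dst, const float *src,
                                   float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = round_to_int32(src[i] * mul);
}

/* "loop8"-style version: 8 values per iteration, so the counter update
 * and branch are paid once per 8 values instead of once per value.
 * Assumes len is a multiple of 8. */
static void fmul_scalar_to_int32_unroll8(int32_t *dst, const float *src,
                                         float mul, int len)
{
    assert(len % 8 == 0);
    for (int i = 0; i < len; i += 8) {
        dst[i + 0] = round_to_int32(src[i + 0] * mul);
        dst[i + 1] = round_to_int32(src[i + 1] * mul);
        dst[i + 2] = round_to_int32(src[i + 2] * mul);
        dst[i + 3] = round_to_int32(src[i + 3] * mul);
        dst[i + 4] = round_to_int32(src[i + 4] * mul);
        dst[i + 5] = round_to_int32(src[i + 5] * mul);
        dst[i + 6] = round_to_int32(src[i + 6] * mul);
        dst[i + 7] = round_to_int32(src[i + 7] * mul);
    }
}
```

Both functions produce identical output. The unrolled one also exposes more independent multiplies and conversions per iteration. That may be part of why the in-order Atom gains so much more from loop8 than the Athlon does, though the numbers above alone don't prove the cause.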

Based on this data, it seems my best option would be to loop over 8
values in all versions and set the function pointers like so:

if(mm_flags & AV_CPU_FLAG_SSE){
    c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_sse;
}
if(mm_flags & AV_CPU_FLAG_SSE2){
    c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_sse2;
}
if((mm_flags & AV_CPU_FLAG_3DNOW) && !(avctx->flags & CODEC_FLAG_BITEXACT)){
    // faster than SSE2 on Athlon, but not bit-exact, so skipped with CODEC_FLAG_BITEXACT
    c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_3dnow;
}

-Justin
