[FFmpeg-devel] [PATCH] SIMD-optimized float_to_int32_fmul_scalar()

Jason Garrett-Glaser jason
Fri Jan 7 21:38:35 CET 2011


On Fri, Jan 7, 2011 at 3:38 PM, Justin Ruggles <justin.ruggles at gmail.com> wrote:
> On 01/07/2011 01:52 PM, Justin Ruggles wrote:
>
>> On 01/07/2011 01:31 PM, Michael Niedermayer wrote:
>>> also some of these can be unrolled to gain a bit more speed
>>
>>
>> unrolling didn't give me any benefit in testing, but that was just on
>> Athlon.  I'll do more tests and try it on Atom as well.
>
>
> dang. well, I didn't test very thoroughly before apparently.
>
> AMD Athlon
> loop2 3DNow: 51221
> loop4 3DNow: 49101
> loop8 3DNow: 43870
> loop4   SSE: 50267
> loop8   SSE: 51038
> loop4  SSE2: 53008
> loop8  SSE2: 50139
>
> Intel Atom
> loop4   SSE: 149126
> loop8   SSE: 107183
> loop4  SSE2: 148860
> loop8  SSE2: 104592
>
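The loop2/loop4/loop8 labels presumably give the number of floats converted per loop iteration (one 3DNow! register holds 2 floats, one SSE register holds 4). A minimal plain-C sketch of an 8-wide unrolled loop follows. The function name, the `int32_t *dst, const float *src, float mul, int len` signature and the rounding helper are assumptions for illustration, not FFmpeg's code. Also note that SSE2's cvtps2dq, the usual instruction for this conversion, rounds to nearest-even under the default MXCSR mode, whereas the hypothetical helper below rounds halves away from zero, so results can differ on products that land exactly on .5.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rounding helper: round to nearest, halves away from zero.
 * Avoids libm; does NOT match cvtps2dq's round-to-nearest-even on .5 ties. */
static int32_t round_to_int32(float x)
{
    return (int32_t)(x + (x >= 0.0f ? 0.5f : -0.5f));
}

/* Hypothetical plain-C reference: dst[i] = round(src[i] * mul).
 * The body handles 8 values per iteration, like the "loop8" variants;
 * len is assumed to be a multiple of 8, so there is no tail loop. */
static void float_to_int32_fmul_scalar_ref(int32_t *dst, const float *src,
                                           float mul, int len)
{
    assert(len % 8 == 0);
    for (int i = 0; i < len; i += 8) {
        dst[i + 0] = round_to_int32(src[i + 0] * mul);
        dst[i + 1] = round_to_int32(src[i + 1] * mul);
        dst[i + 2] = round_to_int32(src[i + 2] * mul);
        dst[i + 3] = round_to_int32(src[i + 3] * mul);
        dst[i + 4] = round_to_int32(src[i + 4] * mul);
        dst[i + 5] = round_to_int32(src[i + 5] * mul);
        dst[i + 6] = round_to_int32(src[i + 6] * mul);
        dst[i + 7] = round_to_int32(src[i + 7] * mul);
    }
}
```

Unrolling spreads the loop counter update and branch over eight conversions and gives the CPU more independent work per iteration. Assuming lower numbers are faster, that fits the Atom results above, where loop8 comes in roughly 28-30% below loop4, plausibly because Atom is an in-order core.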
> Based on this data it seems my best option would be to loop over 8
> values for all versions and set function pointers like so:
>
> if(mm_flags & AV_CPU_FLAG_SSE){
>     c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_sse;
> }
> if(mm_flags & AV_CPU_FLAG_SSE2){
>     c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_sse2;
> }
> if((mm_flags & AV_CPU_FLAG_3DNOW) && !(avctx->flags & CODEC_FLAG_BITEXACT)){
>     // faster than sse2
>     c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_3dnow;
> }
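Because the three `if` blocks in the proposal are independent, their order encodes priority: each later match overwrites the pointer, so 3DNow! wins whenever it is available and bit-exact mode is off, and SSE2 wins otherwise. A minimal sketch of that selection logic, returning an enum instead of a function pointer; the flag values and all names here are hypothetical stand-ins, not FFmpeg's real `AV_CPU_FLAG_*` constants.

```c
/* Hypothetical stand-ins for AV_CPU_FLAG_*; values are illustrative only. */
enum { CPU_3DNOW = 1 << 0, CPU_SSE = 1 << 1, CPU_SSE2 = 1 << 2 };

enum fmul_impl { IMPL_C, IMPL_SSE, IMPL_SSE2, IMPL_3DNOW };

/* Same structure as the proposed init code: independent ifs, so the last
 * matching branch wins. bitexact is treated as a boolean (nonzero = set). */
static enum fmul_impl pick_fmul_scalar(int mm_flags, int bitexact)
{
    enum fmul_impl impl = IMPL_C; /* plain C fallback */
    if (mm_flags & CPU_SSE)
        impl = IMPL_SSE;
    if (mm_flags & CPU_SSE2)
        impl = IMPL_SSE2;
    if ((mm_flags & CPU_3DNOW) && !bitexact)
        impl = IMPL_3DNOW; /* skipped in bit-exact mode */
    return impl;
}
```

For example, a CPU reporting SSE, SSE2 and 3DNow! gets the 3DNow! version unless bit-exact output is requested, in which case it falls back to SSE2.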

Have you forgotten about the existence of the Phenom, a much more
commonly used CPU than the Athlon 64?

Jason


