[FFmpeg-devel] [PATCH] SIMD-optimized float_to_int32_fmul_scalar()

Loren Merritt lorenm
Fri Jan 7 19:49:55 CET 2011


On Fri, 7 Jan 2011, Justin Ruggles wrote:

> This patch implements float_to_int32_fmul_scalar() for 3dnow, sse, and
> sse2 and uses it in the AC3 encoder.

>@@ -2303,6 +2303,65 @@ static void int32_to_float_fmul_scalar_sse2(float *dst, const int *src, float mu
>     );
> }
>
>+static void float_to_int32_fmul_scalar_3dnow(int32_t *dst, const float *src, float mul, int len)
>+{
>+    /* note: pf2id conversion uses truncation, not round-to-nearest */
>+    x86_reg i = (len-4)*4;
>+    __asm__ volatile(
>+        "movq          %3,   %%mm1      \n\t"

movd
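To spell out the difference, assuming %3 is the 32-bit float operand mul (the operand list is not quoted here): movd loads 32 bits and zero-extends, while movq loads 64 bits and so also picks up whatever follows the float in memory. A portable C sketch, with hypothetical helper names standing in for the two loads:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a movd load: read 32 bits, zero-extend to 64. */
static uint64_t load32_zero_extended(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical stand-in for a movq load: read 64 bits, i.e. the float
 * plus the 4 bytes that happen to follow it in memory. */
static uint64_t load64(const void *p)
{
    uint64_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

With two adjacent floats in memory, load32_zero_extended() returns only the bit pattern of the first, while load64() returns both.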

>@@ -2910,6 +2971,8 @@ void dsputil_init_mmx(DSPContext* c, AVCodecContext *avctx)
>             c->vector_fmul_add = vector_fmul_add_3dnow; // faster than sse
>         if(mm_flags & AV_CPU_FLAG_SSE2){
>             c->int32_to_float_fmul_scalar = int32_to_float_fmul_scalar_sse2;
>+            if (!(mm_flags & AV_CPU_FLAG_SSE2SLOW))
>+                c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_sse2;

AV_CPU_FLAG_SSE2SLOW is an alternative to AV_CPU_FLAG_SSE2. They won't 
both be set at once. It means "pentium-m's SSE2 is so slow that by default 
we pretend it doesn't exist, and only make an exception if specifically 
tested".
If you intended it to detect athlon64, then you picked the wrong flag, and 
there isn't a right one yet.
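
A minimal sketch of why the inner check in that hunk has no effect. The flag values and detect_flags() are hypothetical and only model the "never both set" behaviour described above; the real constants are in libavutil/cpu.h.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
enum {
    FLAG_SSE2     = 1 << 0,
    FLAG_SSE2SLOW = 1 << 1,
};

/* Hypothetical detection: an SSE2-capable CPU reports exactly one of the
 * two flags, never both. */
static int detect_flags(int has_sse2, int sse2_is_slow)
{
    int flags = 0;
    if (has_sse2)
        flags |= sse2_is_slow ? FLAG_SSE2SLOW : FLAG_SSE2;
    assert(!((flags & FLAG_SSE2) && (flags & FLAG_SSE2SLOW)));
    return flags;
}

/* Same shape as the quoted hunk: returns 1 if the sse2 version would be
 * selected. */
static int selects_sse2_version(int flags)
{
    if (flags & FLAG_SSE2) {
        /* Always true inside this branch, since SSE2SLOW is never set
         * together with SSE2. */
        if (!(flags & FLAG_SSE2SLOW))
            return 1;
    }
    return 0;
}
```

Under this model a slow-SSE2 CPU never enters the outer branch at all, so the result is the same with or without the inner test.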

--Loren Merritt


