[FFmpeg-devel] [PATCH] SSE-optimized vector_clipf()

Michael Niedermayer michaelni
Sat Aug 8 13:26:25 CEST 2009


On Sat, Aug 08, 2009 at 09:04:14AM +0200, Vitor Sessak wrote:
> Michael Niedermayer wrote:
>> On Thu, Aug 06, 2009 at 02:55:30AM +0200, Vitor Sessak wrote:
>>> Vitor Sessak wrote:
>>>> $subj, 10% speedup for twinvq decoding (but should be useful also for 
>>>> AMR and wmapro).
>>> err, I mean, attached.
>>>
>>> -Vitor
>>>  dsputil.c         |   15 +++++++++++++++
>>>  dsputil.h         |    3 ++-
>>>  x86/dsputil_mmx.c |   34 ++++++++++++++++++++++++++++++++++
>>>  3 files changed, 51 insertions(+), 1 deletion(-)
>>> 8a95f5f2f3d267089056d6a571b2e6cc37d1569e  dsp_vector_clipf.diff
>>> Index: libavcodec/dsputil.c
>>> ===================================================================
>>> --- libavcodec/dsputil.c	(revision 19598)
>>> +++ libavcodec/dsputil.c	(working copy)
>>> @@ -4093,6 +4093,20 @@
>>>          dst[i] = src[i] * mul;
>>>  }
>>> +
>>> +void vector_clipf_c(float *dst, float min, float max, int len) {
>>> +    int i;
>>> +    for (i=0; i < len; i+=8) {
>>> +        dst[i    ] = av_clipf(dst[i    ], min, max);
>>> +        dst[i + 1] = av_clipf(dst[i + 1], min, max);
>>> +        dst[i + 2] = av_clipf(dst[i + 2], min, max);
>>> +        dst[i + 3] = av_clipf(dst[i + 3], min, max);
>>> +        dst[i + 4] = av_clipf(dst[i + 4], min, max);
>>> +        dst[i + 5] = av_clipf(dst[i + 5], min, max);
>>> +        dst[i + 6] = av_clipf(dst[i + 6], min, max);
>>> +        dst[i + 7] = av_clipf(dst[i + 7], min, max);
>>> +    }
>>> +}
>> this one could be tried by using integer math instead of floats
>> (assuming IEEE floats of course)
>
> How could this possibly be faster? It would just clip the sign, then the
> exponent, then the mantissa. It seems like much more work to me, unless
> I'm missing something.

We aren't comparing integers by first checking the first bit, then separately
the next 8, and then again separately the last 23. Why should we here?
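
To illustrate the point (a rough sketch only, not code from the patch): for
IEEE-754 single precision, the whole 32-bit pattern can be compared as one
integer once the sign is accounted for, so sign, exponent and mantissa never
need to be looked at separately. The helper names float_bits(), order_key()
and clipf_int() are made up for this example, and NaN inputs are assumed not
to occur.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of an IEEE-754 single as an unsigned integer. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Map a bit pattern to a key whose unsigned order matches the float order
 * (NaNs excluded): negative values are bit-inverted, non-negative values
 * get the sign bit set so they sort above all negatives. */
static uint32_t order_key(uint32_t bits)
{
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

/* Hypothetical helper: clip x to [min, max] using integer compares only. */
static float clipf_int(float x, float min, float max)
{
    uint32_t kx = order_key(float_bits(x));

    assert(min <= max);
    if (kx < order_key(float_bits(min)))
        return min;
    if (kx > order_key(float_bits(max)))
        return max;
    return x;
}
```

In a vector loop the keys for min and max would be computed once outside the
loop, leaving one key transform and two integer compares per element. Whether
that beats the float version would have to be benchmarked.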


>
>>>  static av_always_inline int float_to_int16_one(const float *src){
>>>      int_fast32_t tmp = *(const int32_t*)src;
>>>      if(tmp & 0xf0000){
>>> @@ -4669,6 +4683,7 @@
>>>      c->vector_fmul_add_add = ff_vector_fmul_add_add_c;
>>>      c->vector_fmul_window = ff_vector_fmul_window_c;
>>>      c->int32_to_float_fmul_scalar = int32_to_float_fmul_scalar_c;
>>> +    c->vector_clipf = vector_clipf_c;
>>>      c->float_to_int16 = ff_float_to_int16_c;
>>>      c->float_to_int16_interleave = ff_float_to_int16_interleave_c;
>>>      c->add_int16 = add_int16_c;
>>> Index: libavcodec/dsputil.h
>>> ===================================================================
>>> --- libavcodec/dsputil.h	(revision 19598)
>>> +++ libavcodec/dsputil.h	(working copy)
>>> @@ -396,7 +396,8 @@
>>>      void (*vector_fmul_window)(float *dst, const float *src0, const 
>>> float *src1, const float *win, float add_bias, int len);
>>>      /* assume len is a multiple of 8, and arrays are 16-byte aligned */
>>>      void (*int32_to_float_fmul_scalar)(float *dst, const int *src, float 
>>> mul, int len);
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> -
>>> +    /* assume len is a multiple of 16, and dst is 16-byte aligned */
>>> +    void (*vector_clipf)(float *dst, float min, float max, int len);
>> align requirements are generally written like:
>> void (*get_pixels)(DCTELEM *block/*align 16*/, const uint8_t 
>> *pixels/*align 8*/, int line_size);
>
> Changed, but in this case dsputil.h is inconsistent (see above)

the above must have slipped through the reviews ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Into a blind darkness they enter who follow after the Ignorance,
they as if into a greater darkness enter who devote themselves
to the Knowledge alone. -- Isha Upanishad