[FFmpeg-devel] [PATCH] iirfilter: Use local variables for state in loop for FILTER_O2().

Justin Ruggles justin.ruggles
Sun Jan 30 20:26:09 CET 2011


On 01/30/2011 02:05 PM, Justin Ruggles wrote:

> On 01/30/2011 01:51 PM, Måns Rullgård wrote:
> 
>> Justin Ruggles <justin.ruggles at gmail.com> writes:
>>
>>> 4% faster 2nd order ff_iir_filter_flt().
>>> ---
>>>  libavcodec/iirfilter.c |   12 +++++++-----
>>>  1 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>> I spent most of yesterday and this morning trying to make an asm
>>> version of the float biquad filter, but nothing I came up with was
>>> faster than what gcc did with the C version.  I did, however, manage
>>> to speed up the C version by about 4% by adding local variables
>>> inside the inner loop for the two filter states.
>>>
>>> diff --git a/libavcodec/iirfilter.c b/libavcodec/iirfilter.c
>>> index bc63c39..dd593dd 100644
>>> --- a/libavcodec/iirfilter.c
>>> +++ b/libavcodec/iirfilter.c
>>> @@ -261,11 +261,13 @@ av_cold struct FFIIRFilterState* ff_iir_filter_init_state(int order)
>>>      const type *src0 = src;                                             \
>>>      type       *dst0 = dst;                                             \
>>>      for (i = 0; i < size; i++) {                                        \
>>> -        float in = *src0   * c->gain  +                                 \
>>> -                   s->x[0] * c->cy[0] +                                 \
>>> -                   s->x[1] * c->cy[1];                                  \
>>> -        CONV_##fmt(*dst0, s->x[0] + in + s->x[1] * c->cx[1])            \
>>> -        s->x[0] = s->x[1];                                              \
>>> +        float s0 = s->x[0];                                             \
>>> +        float s1 = s->x[1];                                             \
>>> +        float in = *src0 * c->gain  +                                   \
>>> +                   s0    * c->cy[0] +                                   \
>>> +                   s1    * c->cy[1];                                    \
>>> +        CONV_##fmt(*dst0, in + s0 + s1 * c->cx[1])                      \
>>> +        s->x[0] = s1;                                                   \
>>>          s->x[1] = in;                                                   \
>>>          src0 += sstep;                                                  \
>>>          dst0 += dstep;                                                  \
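A minimal C sketch of what the patched `FILTER_O2()` loop computes for float input and output, written as a plain function instead of a macro. `BiquadCoeffs`, `BiquadState` and `biquad_locals_per_iter()` are hypothetical names, not FFmpeg's; their fields mirror `c->gain`, `c->cy[]`, `c->cx[]` and `s->x[]` from the diff. As in the diff's output line, the sketch assumes the first and last feed-forward coefficients are 1, so only `cx[1]` is read. One plausible reason the locals help (an inference, not something stated in the thread): for float output `dst` is also a `float *`, so in the original loop the compiler has to assume the store to `*dst0` may have overwritten `s->x[1]` and reload it for `s->x[0] = s->x[1]`.

```c
/* Hypothetical stand-ins for the coefficient and state structs that the
 * FILTER_O2() macro reads through c and s; not FFmpeg's real types. */
typedef struct {
    float gain;
    float cy[2];   /* feedback coefficients */
    float cx[3];   /* feed-forward coefficients; only cx[1] is read,
                    * cx[0] and cx[2] are assumed to be 1 */
} BiquadCoeffs;

typedef struct {
    float x[2];    /* the two delay-line states */
} BiquadState;

/* Patched form: copy both states into locals at the top of each
 * iteration, compute with the locals, then write the state back. */
void biquad_locals_per_iter(const BiquadCoeffs *c, BiquadState *s, int size,
                            const float *src, int sstep,
                            float *dst, int dstep)
{
    for (int i = 0; i < size; i++) {
        float s0 = s->x[0];
        float s1 = s->x[1];
        float in = *src * c->gain + s0 * c->cy[0] + s1 * c->cy[1];
        *dst = in + s0 + s1 * c->cx[1];   /* no reload of s->x after this store */
        s->x[0] = s1;                     /* shift the delay line */
        s->x[1] = in;
        src += sstep;
        dst += dstep;
    }
}
```

With the arbitrary values `gain = 0.5`, `cy = {-0.25, 0.5}`, `cx[1] = 2` and a unit impulse `{1, 0, 0, 0}` as input, the output is `{0.5, 1.25, 1, 0.1875}`.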
>>
>> Why do you load/store the struct values in the loop?  Wouldn't it
>> be better to load the x[] values to locals before the loop and write
>> them back after?  You might try doing the same with c[xy] as well.
> 
> 
> Yes, that was my first thought, and I tried it with the state and with
> the coefs outside the loop.  Both were slower.


Looking at the disassembly of iirfilter.o, it seems that gcc does this
anyway; it probably just does it more efficiently on its own than when I
try to force it.

-Justin
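For comparison, a minimal sketch of the alternative Måns suggests: load the state and coefficients into locals once before the loop and store the state back afterwards. Justin reports that this form measured slower in his tests and that gcc appears to do the equivalent on its own. `BiquadCoeffs`, `BiquadState` and `biquad_hoisted()` are hypothetical stand-ins, not FFmpeg's types or functions, and the sketch again assumes the first and last feed-forward coefficients are 1.

```c
/* Hypothetical stand-ins for the coefficient and state structs that the
 * FILTER_O2() macro reads through c and s; not FFmpeg's real types. */
typedef struct {
    float gain;
    float cy[2];   /* feedback coefficients */
    float cx[3];   /* feed-forward coefficients; only cx[1] is read,
                    * cx[0] and cx[2] are assumed to be 1 */
} BiquadCoeffs;

typedef struct {
    float x[2];    /* the two delay-line states */
} BiquadState;

/* Hoisted form: read the state and coefficients into locals once,
 * run the whole loop on locals, then store the state back at the end. */
void biquad_hoisted(const BiquadCoeffs *c, BiquadState *s, int size,
                    const float *src, int sstep, float *dst, int dstep)
{
    const float gain = c->gain;
    const float cy0  = c->cy[0];
    const float cy1  = c->cy[1];
    const float cx1  = c->cx[1];
    float s0 = s->x[0];
    float s1 = s->x[1];

    for (int i = 0; i < size; i++) {
        float in = *src * gain + s0 * cy0 + s1 * cy1;
        *dst = in + s0 + s1 * cx1;
        s0 = s1;          /* shift the delay line in registers */
        s1 = in;
        src += sstep;
        dst += dstep;
    }

    s->x[0] = s0;         /* write the state back once */
    s->x[1] = s1;
}
```

The arithmetic and its order are unchanged from the loop in the patch; only where the state lives between iterations differs, so any speed difference comes from the code the compiler generates, not from the filter math.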


