[FFmpeg-devel] [PATCH] iirfilter: Use local variables for state in loop for FILTER_O2().

Måns Rullgård mans
Sun Jan 30 20:26:20 CET 2011


Justin Ruggles <justin.ruggles at gmail.com> writes:

> On 01/30/2011 01:51 PM, Måns Rullgård wrote:
>
>> Justin Ruggles <justin.ruggles at gmail.com> writes:
>> 
>>> 4% faster 2nd order ff_iir_filter_flt().
>>> ---
>>>  libavcodec/iirfilter.c |   12 +++++++-----
>>>  1 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>> I spent most of yesterday and this morning trying to make an asm
>>> version of the float biquad filter, but nothing I came up with was
>>> faster than what gcc did with the C version.  I did, however, manage
>>> to speed up the C version by about 4% by adding local variables
>>> inside the inner loop for the 2 states.
>>>
>>> diff --git a/libavcodec/iirfilter.c b/libavcodec/iirfilter.c
>>> index bc63c39..dd593dd 100644
>>> --- a/libavcodec/iirfilter.c
>>> +++ b/libavcodec/iirfilter.c
>>> @@ -261,11 +261,13 @@ av_cold struct FFIIRFilterState* ff_iir_filter_init_state(int order)
>>>      const type *src0 = src;                                             \
>>>      type       *dst0 = dst;                                             \
>>>      for (i = 0; i < size; i++) {                                        \
>>> -        float in = *src0   * c->gain  +                                 \
>>> -                   s->x[0] * c->cy[0] +                                 \
>>> -                   s->x[1] * c->cy[1];                                  \
>>> -        CONV_##fmt(*dst0, s->x[0] + in + s->x[1] * c->cx[1])            \
>>> -        s->x[0] = s->x[1];                                              \
>>> +        float s0 = s->x[0];                                             \
>>> +        float s1 = s->x[1];                                             \
>>> +        float in = *src0 * c->gain  +                                   \
>>> +                   s0    * c->cy[0] +                                   \
>>> +                   s1    * c->cy[1];                                    \
>>> +        CONV_##fmt(*dst0, in + s0 + s1 * c->cx[1])                      \
>>> +        s->x[0] = s1;                                                   \
>>>          s->x[1] = in;                                                   \
>>>          src0 += sstep;                                                  \
>>>          dst0 += dstep;                                                  \
>> 
>> Why do you load/store the struct values in the loop?  Wouldn't it
>> be better to load the x[] values to locals before the loop and write
>> them back after?  You might try doing the same with c[xy] as well.
>
> Yes, that was my first thought, and I tried with state and with coefs
> outside the loop.  They were slower.

Weird.  On what CPU and what gcc version was this?  Some mention of
this observation in the commit message might be useful.
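
For reference, the two variants under discussion can be sketched as
below.  This is only an illustration, not the code from iirfilter.c:
the struct and function names (BiquadCoeffs, BiquadState,
biquad_inloop, biquad_hoisted) are hypothetical stand-ins, and the
sketch assumes plain float input and output with a step of 1.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the coefficient and state structs. */
typedef struct { float gain; float cx[2]; float cy[2]; } BiquadCoeffs;
typedef struct { float x[2]; } BiquadState;

/* Variant from the patch: the state is loaded from and stored to the
 * struct on every iteration, via locals declared inside the loop. */
void biquad_inloop(const BiquadCoeffs *c, BiquadState *s,
                   const float *src, float *dst, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        float s0 = s->x[0];
        float s1 = s->x[1];
        float in = src[i] * c->gain + s0 * c->cy[0] + s1 * c->cy[1];
        dst[i]  = in + s0 + s1 * c->cx[1];
        s->x[0] = s1;
        s->x[1] = in;
    }
}

/* Variant suggested in the review: the state is loaded once before
 * the loop, kept in locals, and written back once afterwards. */
void biquad_hoisted(const BiquadCoeffs *c, BiquadState *s,
                    const float *src, float *dst, size_t size)
{
    float s0 = s->x[0];
    float s1 = s->x[1];
    for (size_t i = 0; i < size; i++) {
        float in = src[i] * c->gain + s0 * c->cy[0] + s1 * c->cy[1];
        dst[i] = in + s0 + s1 * c->cx[1];
        s0 = s1;   /* shift the state along */
        s1 = in;
    }
    s->x[0] = s0;
    s->x[1] = s1;
}
```

Both functions compute the same recurrence, so they should produce
the same output and final state.  Which one is faster depends on the
compiler and CPU; as reported above, the hoisted form measured slower
in Justin's tests.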

-- 
Måns Rullgård
mans at mansr.com


