[FFmpeg-devel] [PATCH] iirfilter: Use local variables for state in loop for FILTER_O2().

Justin Ruggles justin.ruggles
Sun Jan 30 22:04:36 CET 2011


On 01/30/2011 02:26 PM, Justin Ruggles wrote:

> On 01/30/2011 02:05 PM, Justin Ruggles wrote:
> 
>> On 01/30/2011 01:51 PM, M?ns Rullg?rd wrote:
>>
>>> Justin Ruggles <justin.ruggles at gmail.com> writes:
>>>
>>>> 4% faster 2nd order ff_iir_filter_flt().
>>>> ---
>>>>  libavcodec/iirfilter.c |   12 +++++++-----
>>>>  1 files changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>> I tried most of yesterday and this morning trying to make an asm
>>>> version of the float biquad filter, but nothing I came up with was
>>>> faster than what gcc did with the C version.  I did, however manage
>>>> to speed up the C version by about 4% by adding local variables
>>>> inside the inner loop for the 2 states.
>>>>
>>>> diff --git a/libavcodec/iirfilter.c b/libavcodec/iirfilter.c
>>>> index bc63c39..dd593dd 100644
>>>> --- a/libavcodec/iirfilter.c
>>>> +++ b/libavcodec/iirfilter.c
>>>> @@ -261,11 +261,13 @@ av_cold struct FFIIRFilterState* ff_iir_filter_init_state(int order)
>>>>      const type *src0 = src;                                             \
>>>>      type       *dst0 = dst;                                             \
>>>>      for (i = 0; i < size; i++) {                                        \
>>>> -        float in = *src0   * c->gain  +                                 \
>>>> -                   s->x[0] * c->cy[0] +                                 \
>>>> -                   s->x[1] * c->cy[1];                                  \
>>>> -        CONV_##fmt(*dst0, s->x[0] + in + s->x[1] * c->cx[1])            \
>>>> -        s->x[0] = s->x[1];                                              \
>>>> +        float s0 = s->x[0];                                             \
>>>> +        float s1 = s->x[1];                                             \
>>>> +        float in = *src0 * c->gain  +                                   \
>>>> +                   s0    * c->cy[0] +                                   \
>>>> +                   s1    * c->cy[1];                                    \
>>>> +        CONV_##fmt(*dst0, in + s0 + s1 * c->cx[1])                      \
>>>> +        s->x[0] = s1;                                                   \
>>>>          s->x[1] = in;                                                   \
>>>>          src0 += sstep;                                                  \
>>>>          dst0 += dstep;                                                  \
>>>
>>> Why do you do load/store the struct values in the loop?  Wouldn't it
>>> be better to load the x[] values to locals before the loop and write
>>> them back after?  You might try doing the same with c[xy] as well.
>>
>>
>> Yes, that was my first thought, and I tried with state and with coefs
>> outside the loop.  They were slower.
> 
> 
> Looking at the disassembly of iirfilter.o, it seems that gcc does this
> anyway, but it probably does it more efficiently by itself than with me
> trying to force it.


Ok, ignore this patch. I'll send a new one shortly. I took a closer look
at the gcc output and figured out how to move the 2 state variables
outside the loop in a way that makes gcc not mess it up.

current:                      217731 dezicycles
this patch:                   209010
new patch I haven't sent yet: 189568

I'm going to play around more with the asm to see if I can tweak anymore
speed out of it.  This function is a big portion of CPU time for AC3
transient detection, so I want to get it as fast as I can before adding
that to the AC3 encoder.

-Justin



More information about the ffmpeg-devel mailing list