[FFmpeg-devel] [PATCH] ALS: Solve Issue 1657

Wed Jan 6 01:21:10 CET 2010

Am 05.01.10 23:38, schrieb Michael Niedermayer:
> On Tue, Jan 05, 2010 at 02:58:56PM +0100, Thilo Borgmann wrote:
>> Am 05.01.10 12:11, schrieb Michael Niedermayer:
>>> On Tue, Jan 05, 2010 at 03:59:41AM +0100, Thilo Borgmann wrote:
>>>> Am 05.01.10 02:43, schrieb Michael Niedermayer:
>>>>> On Tue, Jan 05, 2010 at 02:23:09AM +0100, Thilo Borgmann wrote:
>>>>>> Am 05.01.10 01:43, schrieb Michael Niedermayer:
>>>>>>> On Tue, Jan 05, 2010 at 12:34:53AM +0100, Thilo Borgmann wrote:
>>>>>>>> Am 05.01.10 00:30, schrieb Thilo Borgmann:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> issue 1657 seems to be caused by negative indices used in [].
>>>>>>>>> See: http://roundup.ffmpeg.org/roundup/ffmpeg/issue1657
>>>>>>>>>
>>>>>>>>> Using *() resolves this issue.
>>>>>>>>>
>>>>>>>>> Tested with gcc 4.0 on MacOS 10.6. There were other versions/compilers
>>>>>>>>> mentioned in roundup, maybe these could be tested by someone (you)?
>>>>>>>>>
>>>>>>>>> I'm sorry, my svn still seems to be broken and produces unusable patches
>>>>>>>>> (%ld...). Nevertheless I can apply them if the workaround is ok.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Some artifacts left in als_data.h. Ignore the old patch, updated patch
>>>>>>>> attached.
>>>>>>>>
>>>>>>>> -Thilo
>>>>>>>
>>>>>>>>  alsdec.c |    4 ++--
>>>>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>> 987821d84540420efa6f2e67be17094074e638f8  als_issue1657.rev1.patch
>>>>>>>> Index: libavcodec/alsdec.c
>>>>>>>> ===================================================================
>>>>>>>> --- libavcodec/alsdec.c	(Revision 21025)
>>>>>>>> +++ libavcodec/alsdec.c	(Arbeitskopie)
>>>>>>>> @@ -%ld,%ld +%ld,%ld @@
>>>>>>>>              y = 1 << 19;
>>>>>>>>  
>>>>>>>>              for (sb = 0; sb < smp; sb++)
>>>>>>>> -                y += MUL64(lpc_cof[sb],raw_samples[smp - (sb + 1)]);
>>>>>>>> +                y += MUL64(lpc_cof[sb], *(raw_samples + smp - (sb + 1)));
>>>>>>>
>>>>>>> patch ok
>>>>>>
>>>>>> Applied.
>>>>>>
>>>>>>>
>>>>>>> independant of this, it could be that if lpc_cof was reversed
>>>>>>>
>>>>>>> for (sb = 0; sb < smp; sb++)
>>>>>>>     y += MUL64(lpc_cof[sb], raw_samples[sb]);
>>>>>>
>>>>>> In the second case the index has to be negative which is not possible
>>>>>> with this approach.
>>>>>
>>>>> i dont see why
>>>>> also it should be 
>>>>> raw_samples[sb]
>>>>> and
>>>>> raw_samples++
>>>>
>>>> Hmm. If you could please elaborate a bit more, I will pick this up
>>>> tomorrow and will see if I can make it better.
>>>
>>> one way:
>>>
>>> -    for (; smp < bd->block_length; smp++) {
>>> -        y = 1 << 19;
>>> -
>>> -        for (sb = 0; sb < opt_order; sb++)
>>> -            y += MUL64(bd->lpc_cof[sb],raw_samples[smp - (sb + 1)]);
>>> -
>>> -        raw_samples[smp] -= y >> 20;
>>> +    for (raw_samples+=smp; raw_samples < raw_samples_end; raw_samples++) {
>>> +        y = 1 << 19;
>>> +
>>> +        for (sb= -opt_order; sb<0; sb++)
>>> +            y += MUL64(lpc_cof[sb],raw_samples[sb]);
>>> +
>>> +        raw_samples[0] -= y >> 20;
>>>     }
>>
>> Ok I see, slightly different patch attached that has a ~20% speed gain
>> in the second loop alone. Also issue 1657 is avoided by not using smp in
>> substraction (and yes, tested with 64-bit & gcc 4.0).
>>
> 
>> To have lpc_cof[sb] as well as raw_samples[sb] without math in the
>> indices, lpc_cof has to be reverted and its pointer shifted to one next
>> to the end (I think...) - this would make it too obfuscated IMO.
> 
> could you try? it would be interresting to see how much faster it is

Seems not to be faster. Also, reversed order of lpc_cof would shift this
math from these loops into others and reversion has also to be done
somewhere.

17446 dezicycles in reversed array, 2042 runs, 6 skips
18731 dezicycles in reversed array, 2038 runs, 10 skips
17446 dezicycles in reversed array, 2042 runs, 6 skips

17651 dezicycles in simple math, 2040 runs, 8 skips
18497 dezicycles in simple math, 2039 runs, 9 skips
17696 dezicycles in simple math, 2038 runs, 10 skips

Code snippet:

>    // reverse linear prediction coefficients for efficiency
>    for (k = 0; k < opt_order; k++)
>        lpc_cof_reversed[k] = lpc_cof[opt_order - (k + 1)];
>
>START_TIMER;
>    // reconstruct raw samples
>    raw_samples = bd->raw_samples + smp;
>    lpc_cof     = lpc_cof_reversed + opt_order;
>
>    for (; smp < bd->block_length; smp++) {
>        y = 1 << 19;
>
>// simple:
>//        for (sb = 0; sb < opt_order; sb++)
>//            y += MUL64(bd->lpc_cof[sb], raw_samples[-(sb + 1)]);
>// reversed:
>        for (sb = -opt_order; sb < 0; sb++)
>            y += MUL64(lpc_cof[sb], raw_samples[sb]);
>
>        *raw_samples++ -= y >> 20;
>    }
>
>    raw_samples = bd->raw_samples;
>//STOP_TIMER("simple math");
>STOP_TIMER("reversed array");

And there is an artifact dereferencing left in the simple math way
(bd->lpc_cof) which makes it 'even slower'...

-Thilo