[FFmpeg-devel] [PATCH 2/3] Indeo 5 decoder: common DSP functions

Maxim max_pole
Sun Jan 10 21:58:37 CET 2010

Michael Niedermayer wrote:
> On Sun, Jan 10, 2010 at 01:22:17PM +0200, Kostya wrote:
>> On Sat, Jan 09, 2010 at 05:43:40PM +0200, Kostya wrote:
>>> On Sat, Jan 09, 2010 at 03:47:39PM +0100, Michael Niedermayer wrote:
>>>> On Sat, Jan 09, 2010 at 04:40:30PM +0200, Kostya wrote:
>>>>> On Fri, Jan 08, 2010 at 11:41:23PM +0100, Michael Niedermayer wrote:
>>>>>> On Sun, Jan 03, 2010 at 12:56:36PM +0200, Kostya wrote:
>>>>>> [...]
>>>>>>> void ff_ivi_recompose53(const IVIPlaneDesc *plane, uint8_t *dst,
>>>>> [function body skipped]
>>>>>> is this mess faster than some more readable variant?
>>>>> Here's more readable variant by me, checked to be bitexact but it's
>>>>> significantly slower (> 10%), I'd rather leave old one.
>>>> I also prefer speed, what about an implementation using lifting?
>>> I'll try to implement it.
>> Hmm, after some experiments I'd rather leave original version.
>> Even grouping variables together in array gives significant performance
>> drop. And pure lifting transform is not applicable here either because
>> band data is grouped and it will take at least two passes (hor/vert)
>> with conditions for missing bands and requires an additional temp
>> buffer.
> So you can improve snow 5/3 performance by using this code?
> My point is that i dont really care which code but iam slightly alergic to
> code duplication and i dont see why this should be faster here while slower
> in snow than lifting ...
> So please elaborate if you think snow and this have a different optimal
> implementation

While developing this code I did some performance research on this
filter. I found two important points where performance can be
improved:

- Doing the vertical and horizontal filtering separately requires an
additional temp buffer, which doesn't use the cache memory effectively,
especially in the case of large images. Therefore one-pass filtering
was preferable. Moreover, all previously calculated values must be
reused whenever possible...

- Due to data decimation in the encoder, an upsampling step (inserting a
zero value between each pair of coefficients) is needed in the decoder.
This leads to a large amount of redundant calculations, because half of
them operate on zeros. This can be optimized by using two separate
filters for odd/even pixels. So calculating four pixels at once (two
vertical + two horizontal ones) using separate filters works as fast as
the lifting technique, because those filters can be simplified a lot.
Lifting works well for the encoder step IMHO, but it brings no
performance improvement in the decoder due to the above-mentioned
downsampled data...
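
To illustrate the odd/even point, here is a minimal 1-D sketch. It is not
the code from the patch: the function names are made up, and the 3-tap
kernel (1, 2, 1) / 2 with zero padding at the edges is only an assumption
chosen to keep the example short. The first function inserts zeros and runs
the full kernel; the second uses one filter per output phase and never
touches a zero.

```c
#include <assert.h>
#include <stddef.h>

/* Value of the zero-inserted signal at position k (hypothetical helper).
 * Positions outside the signal and all odd positions are zero. */
static int upsampled_at(const int *src, size_t n, long k)
{
    if (k < 0 || k >= (long)(2 * n) || (k & 1))
        return 0;
    return src[k / 2];
}

/* Reference: zero insertion followed by the full kernel (1, 2, 1) / 2.
 * Three multiply-adds per output, half of them on zeros. */
void synth_zero_insert(const int *src, size_t n, int *dst)
{
    assert(n > 0);
    for (long k = 0; k < (long)(2 * n); k++)
        dst[k] = (upsampled_at(src, n, k - 1)
                  + 2 * upsampled_at(src, n, k)
                  + upsampled_at(src, n, k + 1)) / 2;
}

/* Polyphase: one filter per output phase, no zeros involved. */
void synth_polyphase(const int *src, size_t n, int *dst)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        int next = (i + 1 < n) ? src[i + 1] : 0; /* zero padding at the edge */
        dst[2 * i]     = src[i];                 /* even phase: centre tap only */
        dst[2 * i + 1] = (src[i] + next) / 2;    /* odd phase: the two outer taps */
    }
}
```

Both functions write 2*n outputs and give identical results; the polyphase
one just skips the work that would have been done on inserted zeros.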

I agree that those code duplications and the obfuscation caused by
using temp values look ugly. Maybe the polyphase filters can be
separated out and the whole thing can be better documented?
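
A rough idea of what "separated out" could look like, for a single band
only. This is a sketch under assumptions, not the actual recompose code:
all names are hypothetical, the kernel is the same illustrative
(1, 2, 1) / 2 applied in both directions, missing neighbours are treated as
zero, and the other bands and the real Indeo coefficients are left out.
Each polyphase component is a small inline helper, and four output pixels
are produced per input coefficient in one pass without a temp buffer.

```c
#include <assert.h>
#include <stddef.h>

/* One small helper per polyphase component (hypothetical names). */
static inline int phase_even(int a)                      { return a; }
static inline int phase_odd(int a, int b)                { return (a + b) / 2; }
static inline int phase_diag(int a, int b, int c, int d) { return (a + b + c + d) / 4; }

/* Band coefficient at (x, y), zero outside the band. */
static inline int band_at(const int *band, size_t w, size_t h,
                          size_t x, size_t y)
{
    return (x < w && y < h) ? band[y * w + x] : 0;
}

/* One pass over a w x h band, writing a (2*w) x (2*h) output. */
void upsample2x2_one_pass(const int *band, size_t w, size_t h, int *dst)
{
    size_t dst_stride = 2 * w;

    assert(w > 0 && h > 0);
    for (size_t y = 0; y < h; y++) {
        int *row0 = dst + (2 * y) * dst_stride;
        int *row1 = row0 + dst_stride;

        for (size_t x = 0; x < w; x++) {
            /* 2x2 neighbourhood, loaded once and reused by all four outputs */
            int a = band_at(band, w, h, x,     y);
            int b = band_at(band, w, h, x + 1, y);
            int c = band_at(band, w, h, x,     y + 1);
            int d = band_at(band, w, h, x + 1, y + 1);

            row0[2 * x]     = phase_even(a);          /* even row, even column */
            row0[2 * x + 1] = phase_odd(a, b);        /* even row, odd column  */
            row1[2 * x]     = phase_odd(a, c);        /* odd row,  even column */
            row1[2 * x + 1] = phase_diag(a, b, c, d); /* odd row,  odd column  */
        }
    }
}
```

Whether the compiler inlines such helpers well enough to keep the speed of
the hand-unrolled version would have to be measured.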
