[FFmpeg-devel] [PATCH] SSE optimization for DCA decoder

Fri Aug 29 12:55:37 CEST 2008

2008/8/29 Robert Swain <robert.swain at gmail.com>:
> 2008/8/29 Alexander E. Patrakov <patrakov at gmail.com>:
>> David Conrad wrote:
>>> Attached gives me about a 45% faster overall DCA decode on my penryn.
>>> Name suggestions for the function welcome.
>>
>> I think that we should completely rewrite this QMF instead (at the very
>> least, in order to get rid of the name "subband_fir_noidea") and express it
>> in terms of convolution (or scalar product) combined with a 32-point DCT
>> (which is already available in the optimized form). When writing my DCA
>> encoder, I had to treat this QMF as a linear black box instead of
>> understanding the existing code. The reason is that the code is almost a
>> verbatim copy from the spec, and the spec has obviously been obfuscated in
>> order to make writing the encoder harder. Thus, I want to share some
>> algebraic properties of the transform, in order to inspire you to write a
>> better C implementation of it (based on this understanding).
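
To make the "scalar product combined with a small transform" idea concrete,
here is a minimal sketch in Python. It is not the DCA QMF from the spec: it
assumes a generic cosine-modulated bank, and the names direct_bank and
folded_bank, the 512-tap length, and the free phase parameter are all
hypothetical. It only shows that a long modulated sum can be split into a
window-and-fold step followed by a 2*M-point cosine modulation, which is the
part a fast DCT could then replace.

```python
import math
import random

M = 32       # number of subbands
TAPS = 512   # assumed prototype filter length; must be a multiple of 2*M


def direct_bank(h, x, phase):
    """Evaluate each subband as one long modulated scalar product."""
    return [
        sum(h[n] * x[n] * math.cos(math.pi / M * (k + 0.5) * (n + phase))
            for n in range(TAPS))
        for k in range(M)
    ]


def folded_bank(h, x, phase):
    """Same outputs: window, fold down to 2*M values, then modulate once."""
    # Step 1: window the input and fold it. The cosine changes sign every
    # 2*M samples, because (k + 0.5) * 2*pi*m adds an odd multiple of pi
    # whenever m is odd, so alternate blocks are subtracted.
    u = [0.0] * (2 * M)
    for n in range(TAPS):
        block, j = divmod(n, 2 * M)
        sign = -1.0 if block % 2 else 1.0
        u[j] += sign * h[n] * x[n]
    # Step 2: a short cosine modulation. This is written as a plain matrix
    # product here; it is the step a fast DCT would take over.
    return [
        sum(u[j] * math.cos(math.pi / M * (k + 0.5) * (j + phase))
            for j in range(2 * M))
        for k in range(M)
    ]


if __name__ == "__main__":
    random.seed(0)
    h = [random.uniform(-1, 1) for _ in range(TAPS)]
    x = [random.uniform(-1, 1) for _ in range(TAPS)]
    a = direct_bank(h, x, 0.5)
    b = folded_bank(h, x, 0.5)
    print(max(abs(p - q) for p, q in zip(a, b)))  # rounding-level difference
```

The folded form needs TAPS multiplies for the window plus one small transform
per block, instead of M * TAPS multiplies for the direct sum.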
>
> [suggestions]
>
> [code]
>
> I will need analysis (32-subband; real-valued time domain to
> complex-valued and hence oversampled output) and synthesis (normal -
> 64-subband, downsampled - 32-subband; complex-valued input to
> real-valued time domain) QMFs for SBR so this may be useful to me too.

There is also a low-power version specified that uses critically
sampled, real-valued QMF banks. I'm not sure whether to go with these
real-valued banks or the complex-valued ones. Since the low-power
version is meant to save computation, I expect some quality loss from
using only real-valued filtering; otherwise, why not always use it?
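
To illustrate the difference in sampling, here is a rough sketch in Python.
It does not use the modulation from the SBR spec: it assumes a generic bank
in which the real-valued and complex-valued versions share the same phase,
and complex_analysis, real_analysis and reals_per_slot are hypothetical
names. Under that assumption the real-valued output is just the real part of
the complex-valued one, and the complex bank produces twice as many real
numbers per block of M input samples.

```python
import cmath
import math

M = 32  # subbands; each output slot corresponds to M new real input samples


def complex_analysis(u, phase):
    """Complex-modulated bank: M complex values per slot."""
    return [
        sum(u[n] * cmath.exp(1j * math.pi / M * (k + 0.5) * (n + phase))
            for n in range(2 * M))
        for k in range(M)
    ]


def real_analysis(u, phase):
    """Cosine-modulated bank: M real values per slot."""
    return [
        sum(u[n] * math.cos(math.pi / M * (k + 0.5) * (n + phase))
            for n in range(2 * M))
        for k in range(M)
    ]


def reals_per_slot(values):
    """Count real numbers produced, with a complex value counting as two."""
    return sum(2 if isinstance(v, complex) else 1 for v in values)


if __name__ == "__main__":
    # u stands for an already windowed and folded block of 2*M samples.
    u = [math.sin(0.37 * n) + 0.1 * n for n in range(2 * M)]
    c = complex_analysis(u, 0.5)
    r = real_analysis(u, 0.5)
    print(reals_per_slot(c), reals_per_slot(r))
    print(max(abs(cv.real - rv) for cv, rv in zip(c, r)))
```

In this simplified model the real-valued bank drops the imaginary half of
the complex output. Whether and how much that costs in quality is not
something the sketch shows.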

Rob