[Ffmpeg-cvslog] r5898 - in trunk/libavcodec: dsputil.c dsputil.h i386/dsputil_mmx.c vorbis.c vorbis.h

Tue Aug 8 08:48:58 CEST 2006

Loren Merritt wrote:
> On Thu, 3 Aug 2006, Benjamin Larsson wrote:
>
>> Loren Merritt wrote:
>>> On Thu, 3 Aug 2006, Benjamin Larsson wrote:
>>>
>>>> If you want to optimize more you could look at the mdct pre and 
>>>> post twiddle steps in mdct.c. Currently they are scalar operations. 
>>>> Optimizing this would also give a gain to wma and aac.
>>>
>> I forgot that ac3 also would gain from this.
>
> If there were an ffac3 and ffaac, that is.
Let's hope there will be.
>
>
>>> hmm, those are annoying because the data aren't contiguous.
>>
>> I don't understand. Can you elaborate?
>
> In some of the arrays, the data for iteration k is next to the data 
> for iteration k+1, and in other arrays it's next to n-k. This is fine 
> for 3dnow, which just loads one complex number into one mmreg. But for 
> sse, I would have to unroll the loop an extra time (doing iterations 
> k, k+1, n-k, n-k-1 all at once) in order to load the data efficiently.
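To make the access pattern concrete, here is a minimal scalar sketch of a pre-twiddle loop. It is an illustration only, not the code in mdct.c: the names Complex and pre_twiddle are hypothetical, and the bit-reversal permutation a real MDCT applies to the output index is left out. The twiddle tables advance with k, while the input is read from both ends at once.

```c
#include <stddef.h>

/* Hypothetical complex type, for illustration only. */
typedef struct { float re, im; } Complex;

/* Scalar pre-twiddle sketch: produces n4 complex values from 2*n4 reals.
 * tcos[k] and tsin[k] are contiguous in k, but the input is read from
 * the front (index 2k) and from the back (index n2-1-2k) together. */
static void pre_twiddle(Complex *z, const float *in,
                        const float *tcos, const float *tsin, size_t n4)
{
    size_t n2 = 2 * n4;
    for (size_t k = 0; k < n4; k++) {
        float a = in[n2 - 1 - 2 * k]; /* walks backwards through the input */
        float b = in[2 * k];          /* walks forwards through the input  */
        /* complex multiply (a + i*b) * (tcos[k] + i*tsin[k]) */
        z[k].re = a * tcos[k] - b * tsin[k];
        z[k].im = a * tsin[k] + b * tcos[k];
    }
}
```

A 4-wide SSE version would want four consecutive floats per load, so the backwards stream is what forces the extra unrolling (or a shuffle) described above.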
OK, I understand. How about the window overlap? Would there be any 
significant gain from writing SIMD code for that operation? The window 
could be mirrored to get rid of the n-k indexing.
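A rough sketch of the mirroring idea, assuming a generic overlap-add of the form out[k] = prev[k]*win[n-1-k] + cur[k]*win[k]. This is not the actual vorbis.c code, and the names mirror_window and window_overlap are hypothetical. The reversed window is built once at init time, so the inner loop walks every array forwards.

```c
#include <stddef.h>

/* Build a reversed copy of the window once, at init time. */
static void mirror_window(float *win_rev, const float *win, size_t n)
{
    for (size_t k = 0; k < n; k++)
        win_rev[k] = win[n - 1 - k];
}

/* Overlap-add with all four input arrays indexed by k only.
 * prev is the tail of the previous block (fading out via win_rev),
 * cur is the head of the current block (fading in via win). */
static void window_overlap(float *out, const float *prev, const float *cur,
                           const float *win, const float *win_rev, size_t n)
{
    for (size_t k = 0; k < n; k++)
        out[k] = prev[k] * win_rev[k] + cur[k] * win[k];
}
```

With every stream contiguous in k, a SIMD version could load four floats from each array per iteration without any shuffles, at the cost of storing the window twice.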
>
> --Loren Merritt
Best regards,
Benjamin Larsson