[FFmpeg-devel] [PATCH] SSE dct32()

Vitor Sessak vitor1001
Wed Jun 23 23:27:39 CEST 2010


On 06/20/2010 07:51 PM, Michael Niedermayer wrote:
> On Sun, Jun 20, 2010 at 01:12:48PM +0100, Måns Rullgård wrote:
>> Vitor Sessak<vitor1001 at gmail.com>  writes:
>>
>>> On 06/20/2010 01:33 PM, Måns Rullgård wrote:
>>>> Vitor Sessak<vitor1001 at gmail.com>   writes:
>>>>
>>>>> On 06/20/2010 12:15 PM, Måns Rullgård wrote:
>>>>>> Vitor Sessak<vitor1001 at gmail.com>    writes:
>>>>>>
>>>>>>>>> I don't remember seeing a big difference _for the dct32 code_ between in ==
>>>>>>>>> out and in != out.
>>>>>>>>
>>>>>>>> Now I am confused; I thought the 3% you quoted was about in == out
>>>>>>>> vs in != out?
>>>>>>>
>>>>>>> No, the 3% slowdown was when converting our general code (using FFT)
>>>>>>> to have in != out.
>>>>>>
>>>>>> And that was due to missed optimisations caused by gcc not knowing
>>>>>> that those pointers don't alias each other.  Marking them restrict is
>>>>>> not good either, since we actually want to pass the same value
>>>>>> sometimes.
>>>>>
>>>>> That and one extra used register.
>>>>
>>>> So what do we do?  I see the following options:
>>>>
>>>> 1. Change mp3 decoder to work with inplace transform.
>>>
>>> Looks hard with no speed loss
>>
>> Just hard or impossible?
>
> hard, not impossible
> just consider that dct32() trashes its input array
>
> Either way, the in != out thing is not a big issue if it's not slower.
> What is a big issue is that high-level optimizations have to be done
> before asm optimisations.
>
> Is our dct32() code optimal? If I didn't miscount, mp3lib does 4 butterflies
> fewer, but I could have miscounted. Also, our dct32() should be benchmarked
> against dct32() code from other mp3 decoders to make sure our high-level
> code is OK before one starts writing asm for it.

Our C dct32() is faster than the C version in the latest mp3lib svn. A
patch to test this is attached (dct32_test.diff). Don't expect me to test
every dct32() implementation on the web...

Anyway, how does this affect the patch to move dct32() to shared
code? New version attached (dct32_common.diff)...
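
To make the aliasing point from the quoted discussion concrete, here is a
minimal sketch. It is not the dct32() code from either patch; the function
name butterfly_stage and the size N are hypothetical, chosen only for
illustration. It shows one butterfly stage that takes separate in and out
pointers but must still work when the caller passes the same array for both,
which is why the pointers cannot simply be marked restrict.

```c
#include <stddef.h>

#define N 8  /* hypothetical size, for illustration only */

/*
 * One butterfly stage:
 *   out[i]       = in[i] + in[N-1-i]
 *   out[N-1-i]   = in[i] - in[N-1-i]
 *
 * in and out are allowed to point to the same array, so they are NOT
 * declared restrict. Calling a restrict-qualified version with
 * in == out would be undefined behaviour, because the same elements
 * would be written through one pointer and read through the other.
 *
 * Both inputs of a butterfly are loaded into locals before either
 * output is stored, so the result is the same for in == out and
 * in != out.
 */
static void butterfly_stage(float *out, const float *in)
{
    for (size_t i = 0; i < N / 2; i++) {
        float a = in[i];
        float b = in[N - 1 - i];
        out[i]         = a + b;
        out[N - 1 - i] = a - b;
    }
}
```

Calling butterfly_stage(dst, src) and butterfly_stage(buf, buf) on the same
input values gives identical results. Because the compiler has to assume
that a store to out[] may change in[], it cannot freely reorder loads and
stores across iterations, which is the kind of missed optimisation
mentioned above.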

-Vitor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dct32_test.diff
Type: text/x-patch
Size: 28362 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100623/c02a1e69/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dct32_common.diff
Type: text/x-patch
Size: 17065 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100623/c02a1e69/attachment-0001.bin>


