[FFmpeg-devel] [PATCH] SSE dct32()

Måns Rullgård mans at mansr.com
Sun Jun 20 14:12:48 CEST 2010

Vitor Sessak <vitor1001 at gmail.com> writes:

> On 06/20/2010 01:33 PM, Måns Rullgård wrote:
>> Vitor Sessak <vitor1001 at gmail.com> writes:
>>> On 06/20/2010 12:15 PM, Måns Rullgård wrote:
>>>> Vitor Sessak <vitor1001 at gmail.com> writes:
>>>>>>> I don't remember seeing a big difference _for the dct32 code_ between in ==
>>>>>>> out and in != out.
>>>>>> Now I am confused. I thought the 3% you quoted was about in == out
>>>>>> vs in != out?
>>>>> No, the 3% slowdown was when converting our general code (using FFT)
>>>>> to have in != out.
>>>> And that was due to missed optimisations caused by gcc not knowing
>>>> that those pointers don't alias each other.  Marking them restrict is
>>>> not good either, since we actually want to pass the same value
>>>> sometimes.
>>> That and one extra used register.
>> So what do we do?  I see the following options:
>> 1. Change mp3 decoder to work with inplace transform.
> Looks hard to do without a speed loss

Just hard or impossible?

>> 2. Copy the block before doing inplace transform.
> Speed loss

Yes, of course.  I was merely listing every option, good or bad.

>> 3. Apply magic to remove slowdown from splitting in/out.
>> Did I miss anything?
> Yes:
> 4. Have a special function pointer only for the 32-point DCT accepting
> in != out as in my patch in this thread (dct32_new.diff). Note that
> for the function for 32-point DCT (and only for it) in != out does not
> give a noticeable speed loss.

I'm sure you also see the slight ugliness in this.  If it's the only
sane solution, so be it, but I'd prefer something nicer.
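
To make the trade-off concrete, here is a minimal C sketch of the options
listed above. Everything in it is hypothetical: the names (toy_inplace,
toy_split, ToyDCTContext, ...) are made up for illustration and are not
FFmpeg's API, and the "transform" is a single sum/difference pass standing
in for a real DCT. It only shows the calling conventions: in-place, split
in/out without restrict (in == out is allowed, but the compiler must assume
that stores to out may change in), split in/out with restrict (in == out is
forbidden), copy-then-in-place (option 2), and a dedicated function pointer
for the 32-point case (option 4).

```c
#include <assert.h>
#include <string.h>

#define TOY_N 32

/* In-place: a single pointer, so there is no aliasing question at all. */
static void toy_inplace(float *data)
{
    for (int i = 0; i < TOY_N / 2; i++) {
        float a = data[i], b = data[TOY_N - 1 - i];
        data[i]             = a + b;
        data[TOY_N - 1 - i] = a - b;
    }
}

/* Split in/out, no restrict: callers may pass in == out.  Both inputs of
 * a pair are read before either output is written, so that case works,
 * but the compiler cannot assume the buffers are distinct. */
static void toy_split(float *out, const float *in)
{
    for (int i = 0; i < TOY_N / 2; i++) {
        float a = in[i], b = in[TOY_N - 1 - i];
        out[i]             = a + b;
        out[TOY_N - 1 - i] = a - b;
    }
}

/* Split in/out with restrict: the caller promises the buffers do not
 * overlap, which frees the compiler to reorder loads and stores, but
 * passing in == out is no longer allowed. */
static void toy_split_restrict(float *restrict out, const float *restrict in)
{
    assert(out != in); /* caller contract: distinct buffers */
    for (int i = 0; i < TOY_N / 2; i++) {
        out[i]             = in[i] + in[TOY_N - 1 - i];
        out[TOY_N - 1 - i] = in[i] - in[TOY_N - 1 - i];
    }
}

/* Option 2: copy the block, then run the in-place transform.
 * The copy is the extra cost. */
static void toy_copy_then_inplace(float *out, const float *in)
{
    if (out != in)
        memmove(out, in, TOY_N * sizeof(*out));
    toy_inplace(out);
}

/* Option 4: keep the general in-place entry point and add one special
 * pointer for the 32-point case that accepts separate in/out buffers. */
typedef struct ToyDCTContext {
    void (*calc)(float *data);                  /* general, in-place   */
    void (*dct32)(float *out, const float *in); /* 32-point, in != out */
} ToyDCTContext;

static void toy_dct_init(ToyDCTContext *s)
{
    s->calc  = toy_inplace;
    s->dct32 = toy_split;
}
```

With this layout a caller that needs separate buffers would call
s.dct32(out, in), while everything else keeps using s.calc(data).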

Måns Rullgård
mans at mansr.com
