[FFmpeg-devel] [PATCH] Speed up dct32() in mpegaudiodec and make it avoid trashing its input

Vitor Sessak vitor1001
Mon Jun 7 08:33:36 CEST 2010


On 06/07/2010 02:31 AM, Michael Niedermayer wrote:
> On Sun, Jun 06, 2010 at 04:16:27PM +0200, Vitor Sessak wrote:
>> $subj. This should make the function suitable to be moved to the common DCT
>> framework after my patch in the thread "[PATCH] SSE dct32()".
>>
>> Benchmarks:
>>
>> Fixed point, patched:
>> 4554 dezicycles in dct32, 128 runs, 0 skips
>> 4880 dezicycles in dct32, 256 runs, 0 skips
>> 5078 dezicycles in dct32, 512 runs, 0 skips
>> 4443 dezicycles in dct32, 1024 runs, 0 skips
>> 4112 dezicycles in dct32, 2048 runs, 0 skips
>> 4122 dezicycles in dct32, 4095 runs, 1 skips
>> 4054 dezicycles in dct32, 8190 runs, 2 skips
>> 4008 dezicycles in dct32, 16379 runs, 5 skips
>> 3968 dezicycles in dct32, 32759 runs, 9 skips
>> 3911 dezicycles in dct32, 65516 runs, 20 skips
>> 3868 dezicycles in dct32, 131042 runs, 30 skips
>> 3844 dezicycles in dct32, 262075 runs, 69 skips
>> 3860 dezicycles in dct32, 524151 runs, 137 skips
>> 3881 dezicycles in dct32, 1048328 runs, 248 skips
>> 3852 dezicycles in dct32, 2096579 runs, 573 skips
>> 3838 dezicycles in dct32, 4193100 runs, 1204 skips
>> 3831 dezicycles in dct32, 8386205 runs, 2403 skips
>
> seeing the whole output is not interesting; seeing the last score
> of several runs is interesting

ok.

Fixed point, patched:
3847 dezicycles in dct32, 8386234 runs, 2374 skips
3822 dezicycles in dct32, 8386575 runs, 2033 skips
3846 dezicycles in dct32, 8386386 runs, 2222 skips

Floating point, patched:
3384 dezicycles in dct32_float, 8386658 runs, 1950 skips
3494 dezicycles in dct32_float, 8386603 runs, 2005 skips
3451 dezicycles in dct32_float, 8386525 runs, 2083 skips

Fixed point, original:
4488 dezicycles in dct32, 8385764 runs, 2844 skips
4473 dezicycles in dct32, 8386027 runs, 2581 skips
4485 dezicycles in dct32, 8386185 runs, 2423 skips

Floating point, original:
3781 dezicycles in dct32_float, 8386360 runs, 2248 skips
3766 dezicycles in dct32_float, 8386079 runs, 2529 skips
3798 dezicycles in dct32_float, 8385870 runs, 2738 skips
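
For reference, the gain can be worked out from the final scores above. The sketch below is only a convenience calculation, not part of the patch: the helper names mean() and percent_saved() are made up for this example, and it assumes "dezicycles" means tenths of a CPU cycle. Averaging the three runs of each configuration gives a saving of roughly 14% for fixed point and roughly 9% for floating point.

```c
#include <stddef.h>

/* Final scores copied from the benchmark runs above
 * (assumed unit: dezicycles, i.e. tenths of a CPU cycle). */
static const int fixed_patched[]  = { 3847, 3822, 3846 };
static const int fixed_orig[]     = { 4488, 4473, 4485 };
static const int float_patched[]  = { 3384, 3494, 3451 };
static const int float_orig[]     = { 3781, 3766, 3798 };

/* Hypothetical helper: arithmetic mean of n timer readings. */
static double mean(const int *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Hypothetical helper: percentage of time saved going from
 * `before` to `after`. */
static double percent_saved(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

Calling percent_saved(mean(fixed_orig, 3), mean(fixed_patched, 3)) gives the fixed-point figure; the floating-point one is obtained the same way from float_orig and float_patched.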

>> -#define ADD(a, b) tab[a] += tab[b]
>> +#define ADD(a, b) val##a += val##b
>>
>> +
>> +#define SWAPSUM(a,b,c)\
>> +{\
>> +    FFSWAP(INTFLOAT, val##a, val##b);\
>> +    ADD(a, c);                     \
>> +}
>
> swapping variables is always a redundant operation in code lacking
> backward branches.

It's true, but I was expecting the compiler to optimize it out. The code 
was written this way to match my SSE version, in which the same macro 
did FFSWAP(float, out[a], out[b]). But it is better not to trust the 
compiler, so a new version is attached.
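
To illustrate why the swap is redundant in straight-line code, here is a minimal sketch; it is not the attached patch. The names with_swap(), without_swap() and the local SWAP macro are made up for this example (SWAP stands in for libavutil's FFSWAP), and INTFLOAT is simply typedef'd to float. Both functions produce the same output; the second drops the swap and lets the following code refer to the two variables under exchanged names.

```c
/* Assumption for this sketch: INTFLOAT is float. In the decoder it is
 * either an integer or a float type depending on the build. */
typedef float INTFLOAT;

/* Local stand-in for FFSWAP so the example is self-contained. */
#define SWAP(type, a, b) do { type tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Token pasting turns ADD(0, 2) into "val0 += val2". */
#define ADD(a, b) val##a += val##b

/* Variant 1: swap two locals, then add, as the quoted SWAPSUM did. */
static void with_swap(const INTFLOAT in[3], INTFLOAT out[3])
{
    INTFLOAT val0 = in[0], val1 = in[1], val2 = in[2];

    SWAP(INTFLOAT, val0, val1);
    ADD(0, 2);

    out[0] = val0;
    out[1] = val1;
    out[2] = val2;
}

/* Variant 2: no swap. Since there is no backward branch, every later
 * use of val0/val1 is known at compile time and can be renamed. */
static void without_swap(const INTFLOAT in[3], INTFLOAT out[3])
{
    INTFLOAT val0 = in[0], val1 = in[1], val2 = in[2];

    ADD(1, 2);        /* val1 plays the role val0 had after the swap */

    out[0] = val1;    /* names exchanged relative to variant 1 */
    out[1] = val0;
    out[2] = val2;
}
```

With a loop the renaming would not work in general, because the same statement would see different variables on each iteration; that is why the remark is limited to code lacking backward branches.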

-Vitor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mp3_dct32_2.diff
Type: text/x-patch
Size: 5387 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100607/6179298a/attachment.bin>


