[FFmpeg-devel] [PATCH] x86/dsputilenc: optimize sum_abs_dctelem functions

James Almer jamrial at gmail.com
Sun May 25 01:18:01 CEST 2014


On 24/05/14 7:45 PM, Michael Niedermayer wrote:
> On Sat, May 24, 2014 at 07:39:22PM -0300, James Almer wrote:
>> Use a single register as accumulator, and make the SUM_ABS_DCTELEM
>> macro more readable
>>
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>>  libavcodec/x86/dsputilenc.asm | 44 ++++++++++++++++++-------------------------
>>  1 file changed, 18 insertions(+), 26 deletions(-)
> 
> what effect does this have on speed ?

SSE2
Before: 300 decicycles in dctelem, 1048574 runs, 2 skips
After:  298 decicycles in dctelem, 1048574 runs, 2 skips

SSSE3
Before: 289 decicycles in dctelem, 1048574 runs, 2 skips
After:  293 decicycles in dctelem, 1048573 runs, 3 skips

This was encoding a 1-minute-long 1920x1080 video using the snow encoder.
I originally tested this on an SSE2-only machine, so I didn't see the hit
on the SSSE3 version. Sorry about that.
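
To put the timer figures above in relative terms, here is a minimal sketch.
The helper name pct_change is made up for illustration and is not part of
FFmpeg; it only does arithmetic on the decicycle counts quoted above.

```c
/* Hypothetical helper, not part of FFmpeg: relative change in percent
 * going from `before` to `after`. A negative result means fewer
 * decicycles, i.e. faster. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the numbers above, pct_change(300, 298) is roughly -0.67 (SSE2) and
pct_change(289, 293) is roughly +1.38 (SSSE3).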

I'll send a patch that only adds the macro changes but leaves the assembly 
intact.
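
For context, a scalar sketch of what a sum_abs_dctelem function computes,
assuming it sums the absolute values of the 64 coefficients in one 8x8
block of int16_t. The name sum_abs_dctelem_ref is hypothetical, and this is
not the assembly touched by the patch; the SIMD versions may differ in
details such as saturation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference, not the code from the patch:
 * sums |coefficient| over one 8x8 block (64 int16_t values). */
static int sum_abs_dctelem_ref(const int16_t *block)
{
    int sum = 0;
    for (int i = 0; i < 64; i++)
        sum += abs(block[i]);   /* int16_t is promoted to int before abs() */
    return sum;
}
```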

