[FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions

James Darnley jdarnley at obe.tv
Tue Nov 29 18:14:35 EET 2016


On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
> 2016-11-29 12:52 GMT+01:00 James Darnley <jdarnley at obe.tv>:
>> sse2:
>> complex: 4.13x faster (1514 vs. 367 cycles)
>> simple:  4.38x faster (1836 vs. 419 cycles)
>>
>> avx:
>> complex: 1.07x faster (260 vs. 244 cycles)
>> simple:  1.03x faster (284 vs. 274 cycles)
> 
> What are you comparing?

I stuck a timer around the call to the h264dsp function in
libavcodec/h264_mb_template.c.  Using STOP_TIMER(__func__) let me get a
different message for each function created.  The two functions my code
was called from were hl_decode_mb_simple_16 and hl_decode_mb_complex.
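
As a rough illustration of that pattern (not the actual patch or the real
macros): the sketch below uses hypothetical DEMO_START_TIMER and
DEMO_STOP_TIMER stand-ins for START_TIMER/STOP_TIMER from libavutil/timer.h.
It is based on clock() rather than a cycle counter. demo_idct_add8,
DEFINE_DECODE_MB and demo_last_label_is are likewise made up for the example.
It only shows why passing __func__ gives one message per generated function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Label passed to the most recent DEMO_STOP_TIMER (demo bookkeeping only). */
static const char *last_label;

/* Hypothetical stand-ins for FFmpeg's START_TIMER / STOP_TIMER(id).
 * They measure clock() ticks, not CPU cycles, and do no averaging. */
#define DEMO_START_TIMER clock_t demo_t0 = clock();
#define DEMO_STOP_TIMER(id) do {                                  \
        clock_t demo_dt = clock() - demo_t0;                      \
        last_label = (id);                                        \
        fprintf(stderr, "%ld ticks in %s\n", (long)demo_dt, (id)); \
    } while (0)

/* Placeholder for the DSP routine being measured (assumption). */
static void demo_idct_add8(int *block)
{
    assert(block != NULL);
    for (int i = 0; i < 16; i++)
        block[i] += 1;
}

/* One body instantiated under two names, standing in for the template
 * that produces hl_decode_mb_simple_16 and hl_decode_mb_complex. */
#define DEFINE_DECODE_MB(name)                                    \
    static void name(int *block)                                  \
    {                                                             \
        DEMO_START_TIMER                                          \
        demo_idct_add8(block);                                    \
        DEMO_STOP_TIMER(__func__); /* expands to this function's name */ \
    }

DEFINE_DECODE_MB(hl_decode_mb_simple_16)
DEFINE_DECODE_MB(hl_decode_mb_complex)

/* Returns 1 if the most recent timer message carried the given label. */
static int demo_last_label_is(const char *name)
{
    return last_label && strcmp(last_label, name) == 0;
}
```

Calling hl_decode_mb_simple_16(block) and then hl_decode_mb_complex(block) on
a 16-element int array prints one line per call on stderr, each tagged with
the name of the function that contained the timer.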

The video being decoded was one of the FATE samples, concatenated several
times.

The AVX numbers are measured against the SSE2 version.
