[FFmpeg-devel] [PATCH 1/6] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

Fri Dec 2 01:56:16 EET 2016

On 2016-12-02 00:31, Carl Eugen Hoyos wrote:
> 2016-12-01 17:57 GMT+01:00 James Darnley <jdarnley at obe.tv>:
>> Yorkfield:
>>  - mmx2: 2.44x faster (278 vs. 114 cycles)
>>  - sse2: 3.35x faster (278 vs.  83 cycles)
>>
>> Skylake:
>>  - mmx2: 1.69x faster (169 vs. 100 cycles)
>>  - sse2: 2.34x faster (169 vs.  72 cycles)
> 
> Is it expected (or possible) that the speed impact is so
> different for different Intel hardware?

Yes.  Intel's Core-branded processors (the generation after Yorkfield)
introduced a much better micro-architecture, which makes the scalar C
code run quite a bit faster.  The SIMD code, on the other hand, was
already so quick that it didn't gain much.

(At least I think I remember this being the story.)
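
To put rough numbers on that, here is a minimal sketch that compares each
implementation against itself across the two CPUs, using only the cycle
counts quoted above.  It assumes the cycle counts from the two machines
are directly comparable, and the helper names are made up for
illustration.

```python
# Cycle counts copied from the benchmark figures quoted in this thread.
cycles = {
    "Yorkfield": {"c": 278, "mmx2": 114, "sse2": 83},
    "Skylake":   {"c": 169, "mmx2": 100, "sse2": 72},
}

def gain(old, new):
    """How many times faster `new` is than `old` (hypothetical helper)."""
    return old / new

def generation_gains(table, old_cpu="Yorkfield", new_cpu="Skylake"):
    """Per-implementation speedup when moving from old_cpu to new_cpu."""
    return {
        impl: gain(table[old_cpu][impl], table[new_cpu][impl])
        for impl in table[old_cpu]
    }

for impl, g in generation_gains(cycles).items():
    print(f"{impl}: {g:.2f}x faster on Skylake than on Yorkfield")
```

By these figures the C code gains roughly 1.6x between the two CPUs
while the mmx2 and sse2 versions gain only about 1.15x, so the ratio of
SIMD to C shrinks even though nothing got slower.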

>>  - avx:  2.32x faster (169 vs.  73 cycles)
> 
> Don't you agree that if this is true (I don't know if it is)
> the patch should not be applied as is?

I do agree and I wouldn't (deliberately) apply anything that made the
decoder slower, or not as fast as it could be.