[FFmpeg-devel] swscale/swscale_unscaled : add X86_64 (SSE2, AVX) for uyvyto422

James Almer jamrial at gmail.com
Tue Apr 3 03:10:07 EEST 2018

On 4/2/2018 8:33 PM, Carl Eugen Hoyos wrote:
> 2018-04-02 23:26 GMT+02:00, Martin Vignali <martin.vignali at gmail.com>:
>> Around 20% faster  (on a "benchmark cmd", who test pix_fmt conversion)
>> (4.2s with the patch, 5.2s without)
>> Pass fate test for me.
>> Checkasm result :
>> uyvytoyuv422_c: 14146.6
>> uyvytoyuv422_mmx: 13696.4
>> uyvytoyuv422_mmxext: 19395.9
> Something looks wrong here...
> Carl Eugen

On a Haswell using GCC I get

uyvytoyuv422_c: 44884.2
uyvytoyuv422_mmx: 15284.5
uyvytoyuv422_mmxext: 28656.5
uyvytoyuv422_sse2: 10921.8
uyvytoyuv422_avx: 10606.5

Martin is using a Clang version that is for some reason ignoring our
attempts at disabling tree vectorization, so his C function is
auto-vectorized with SIMD by the compiler, hence the good result.
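
To illustrate why the C result can look so good, here is a minimal sketch
of the kind of scalar loop involved. It is not the swscale code: the
function name and signature are hypothetical, and width is assumed to be
an even pixel count. It only shows the UYVY byte order (U0 Y0 V0 Y1)
being split into planar Y, U and V, which is a simple enough pattern that
an optimizing compiler may turn it into SIMD when vectorization is not
actually disabled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: deinterleave one line of
 * packed UYVY (U0 Y0 V0 Y1 ...) into planar Y, U and V.
 * Assumption: width is in pixels and is even. */
static void uyvy_line_to_planar(const uint8_t *src, uint8_t *ydst,
                                uint8_t *udst, uint8_t *vdst, size_t width)
{
    /* Each iteration consumes 4 source bytes, i.e. 2 pixels. */
    for (size_t i = 0; i < width / 2; i++) {
        udst[i]         = src[4 * i + 0];
        ydst[2 * i + 0] = src[4 * i + 1];
        vdst[i]         = src[4 * i + 2];
        ydst[2 * i + 1] = src[4 * i + 3];
    }
}
```

Whether a given compiler vectorizes this depends on its version and
flags, so comparing the generated assembly is the reliable way to confirm
what happened in Martin's build.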

The mmxext version being slower than the mmx one does however seem to be
an existing issue in the tree, which we should probably deal with, unless
of course the test is wrong.
