[FFmpeg-devel] swscale/swscale_unscaled : add X86_64 (SSE2, AVX) for uyvyto422
jamrial at gmail.com
Tue Apr 3 03:10:07 EEST 2018
On 4/2/2018 8:33 PM, Carl Eugen Hoyos wrote:
> 2018-04-02 23:26 GMT+02:00, Martin Vignali <martin.vignali at gmail.com>:
>> Around 20% faster (on a benchmark command that tests pix_fmt conversion)
>> (4.2 s with the patch, 5.2 s without).
>> Passes FATE for me.
>> Checkasm results:
>> uyvytoyuv422_c: 14146.6
>> uyvytoyuv422_mmx: 13696.4
>> uyvytoyuv422_mmxext: 19395.9
> Something looks wrong here...
> Carl Eugen
On a Haswell CPU using GCC, I get:
Martin is using a Clang version that, for some reason, ignores our
attempts to disable tree vectorization, so the compiler auto-vectorizes
his C function with SIMD instructions; hence the good C result.
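For context, the routine being benchmarked splits packed UYVY (four bytes U0 Y0 V0 Y1 per pair of pixels) into separate Y, U and V planes. The sketch below is a hypothetical scalar version: `uyvy_to_yuv422p` and its parameter names are illustrative, not swscale's actual function or API, and it assumes an even width. A plain byte-shuffling loop like this is the kind of code an auto-vectorizing compiler can turn into SIMD, which is why the C baseline can look unusually fast when vectorization is not actually disabled.

```c
#include <stdint.h>

/* Hypothetical scalar UYVY -> planar YUV 4:2:2 converter, for illustration only.
 * Each 4-byte UYVY group holds two pixels: U0 Y0 V0 Y1.
 * Assumes width is even; strides are in bytes. */
void uyvy_to_yuv422p(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
                     const uint8_t *src, int width, int height,
                     int lum_stride, int chrom_stride, int src_stride)
{
    for (int y = 0; y < height; y++) {
        /* One iteration per pixel pair: 1 U, 2 Y, 1 V sample. */
        for (int x = 0; x < width / 2; x++) {
            udst[x]         = src[4 * x + 0];
            ydst[2 * x]     = src[4 * x + 1];
            vdst[x]         = src[4 * x + 2];
            ydst[2 * x + 1] = src[4 * x + 3];
        }
        /* Advance every plane to its next row. */
        ydst += lum_stride;
        udst += chrom_stride;
        vdst += chrom_stride;
        src  += src_stride;
    }
}
```

Called on one row of four pixels, `{10, 20, 30, 21, 11, 22, 31, 23}`, it yields Y = 20 21 22 23, U = 10 11 and V = 30 31.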
The mmxext version being slower than the mmx one, however, seems to be
an existing issue in the tree that we should probably deal with. Unless,
of course, the test is wrong.