[FFmpeg-devel] Some AVX functions for 8-bit H.264 IDCT
James Darnley
jdarnley at obe.tv
Fri Mar 17 15:18:35 EET 2017
A first draft of a patch set adding AVX functions for the 8-bit H.264 IDCT.
Unfortunately, they provide only a small speedup: 8-bit data usually isn't large
enough to take advantage of wider registers. I admit I may have missed places
where only MMX code exists but 16-byte registers would be useful, or I may
simply not have reached them yet.
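To put rough numbers on the register-width point, here is a small sketch of the
size arithmetic. It assumes the 8-bit code path stores coefficients as int16_t;
the helper names and the MMX_BYTES/XMM_BYTES constants are made up for
illustration and are not taken from the patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Register sizes in bytes (illustrative constants, not FFmpeg names). */
enum { MMX_BYTES = 8, XMM_BYTES = 16 };

/* Bytes in one row of coefficients for a square block of the given width,
 * assuming 16-bit coefficients as used for 8-bit video. */
static size_t coef_row_bytes(size_t block_width)
{
    return block_width * sizeof(int16_t);
}

/* Bytes in the whole coefficient block. */
static size_t coef_block_bytes(size_t block_width)
{
    return block_width * block_width * sizeof(int16_t);
}

/* Number of registers of reg_bytes needed to hold the whole block. */
static size_t regs_for_block(size_t block_width, size_t reg_bytes)
{
    return (coef_block_bytes(block_width) + reg_bytes - 1) / reg_bytes;
}
```

Under that assumption a 4x4 row is 8 bytes, which already fills an MMX register,
and the whole 4x4 block fits in two 16-byte registers, so there is not much data
left over for wider registers to work on. An 8x8 row is 16 bytes, exactly one
16-byte register.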
Regarding these patches: I still need to check that they work on 32-bit builds
and on Windows (both 32-bit and 64-bit). 64-bit Linux was fine. I also need to
write a proper subject line for most of them.
Finally, h264_idct_add16intra does not work yet, and I won't push it unless I
can get it working. I have still included it here for completeness, for future
reference, and for fresh eyes.
Initial timing data, Skylake-U:
h264_idct_add
avx: 1.20x faster (658±0.8 vs. 547±0.2 decicycles) compared with mmxext
h264_idct_dc_add
avx: 1.04x faster (521±1.7 vs. 501±1.1 decicycles) compared with mmxext
h264_idct8_add
avx: 1.01x faster (1069±1.9 vs. 1060±0.7 decicycles) compared with sse2
h264_idct8_dc_add
avx: 1.12x faster (638±12.7 vs. 568±4.3 decicycles) compared with mmxext
h264_idct_add16
avx: 1.01x faster (2150±46.1 vs. 2118±29.0 decicycles) compared with sse2
h264_idct8_add4
avx: 1.00x faster (2884±63.9 vs. 2880±21.1 decicycles) compared with sse2
h264_idct_add16intra
avx: 1.02x faster (1580±4.8 vs. 1555±3.9 decicycles) compared with sse2
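
For reading the numbers above, a small sketch of how the "N.NNx faster" figures
relate to the decicycle counts, plus a crude check of whether a difference is
larger than the reported ± values. The function names are hypothetical, and
treating the ± values as a simple error margin that can be summed is an
assumption; what exactly they measure is not stated here.

```c
#include <assert.h>

/* Speedup of a candidate over a reference implementation, given the
 * decicycle counts of each (lower is faster). */
static double speedup(double ref_decicycles, double cand_decicycles)
{
    assert(cand_decicycles > 0.0);
    return ref_decicycles / cand_decicycles;
}

/* Crude noise check (assumption: the ± values can be summed as margins).
 * Returns 1 when the difference between the two timings is no larger than
 * the combined margins, i.e. the speedup may not be real. */
static int within_noise(double ref, double ref_err,
                        double cand, double cand_err)
{
    double diff = ref > cand ? ref - cand : cand - ref;
    return diff <= ref_err + cand_err;
}
```

For example, speedup(658, 547) is about 1.20 for h264_idct_add, and the
111-decicycle difference is far above the margins. For h264_idct8_add4,
within_noise(2884, 63.9, 2880, 21.1) is true, so by this crude measure that
result is indistinguishable from no change; the same holds for h264_idct_add16.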