[FFmpeg-devel] Some AVX functions for 8-bit H.264 IDCT

James Darnley jdarnley at obe.tv
Fri Mar 17 15:18:35 EET 2017


A first draft of a patch set adding AVX functions for the 8-bit H.264 IDCT.
Unfortunately, they provide only a small speedup: 8-bit data usually isn't large
enough to take advantage of wider registers.  That said, I may have missed
places where only MMX code exists but 16-byte registers would be useful, or I
just haven't reached them yet.

Regarding these patches: I still need to check that they work on 32-bit x86
and on Windows (both 32-bit and 64-bit).  64-bit Linux was fine.  I also need
to write proper subject lines for most of them.

Finally, h264_idct_add16intra does not work, so of course I won't push it
unless I can get it working.  I have still included it here for completeness,
for future reference, and for fresh eyes.

Initial timing data on Skylake-U (each pair of figures is baseline vs. AVX):

h264_idct_add
    avx: 1.20x faster (658±0.8 vs. 547±0.2 decicycles) compared with mmxext
h264_idct_dc_add
    avx: 1.04x faster (521±1.7 vs. 501±1.1 decicycles) compared with mmxext
h264_idct8_add
    avx: 1.01x faster (1069±1.9 vs. 1060±0.7 decicycles) compared with sse2
h264_idct8_dc_add
    avx: 1.12x faster (638±12.7 vs. 568±4.3 decicycles) compared with mmxext
h264_idct_add16
    avx: 1.01x faster (2150±46.1 vs. 2118±29.0 decicycles) compared with sse2
h264_idct8_add4
    avx: 1.00x faster (2884±63.9 vs. 2880±21.1 decicycles) compared with sse2
h264_idct_add16intra
    avx: 1.02x faster (1580±4.8 vs. 1555±3.9 decicycles) compared with sse2
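The speedups above are the baseline cost divided by the AVX cost, e.g. 658 / 547 ≈ 1.20 for h264_idct_add; a decicycle is a tenth of a cycle, so 547 decicycles is roughly 55 cycles per call. The sketch below recomputes the ratios from the rounded means (so the last digit can differ from the reported figure, as with 2150 / 2118 ≈ 1.015 for h264_idct_add16) and adds a crude noise check. That check is an assumption on my part: it treats the ± value as some measure of spread without knowing exactly which one, and calls a gain clear only if it exceeds both spreads added together. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* One row of the timing list: mean cost and "±" spread of the baseline
 * (mmxext or sse2) and AVX versions, all in decicycles. */
struct bench_row {
    const char *name;
    double base, base_pm;
    double avx, avx_pm;
};

static const struct bench_row rows[] = {
    { "h264_idct_add",         658.0,  0.8,  547.0,  0.2 },
    { "h264_idct_dc_add",      521.0,  1.7,  501.0,  1.1 },
    { "h264_idct8_add",       1069.0,  1.9, 1060.0,  0.7 },
    { "h264_idct8_dc_add",     638.0, 12.7,  568.0,  4.3 },
    { "h264_idct_add16",      2150.0, 46.1, 2118.0, 29.0 },
    { "h264_idct8_add4",      2884.0, 63.9, 2880.0, 21.1 },
    { "h264_idct_add16intra", 1580.0,  4.8, 1555.0,  3.9 },
};

/* Speedup = baseline cost / AVX cost. */
static double speedup(const struct bench_row *r)
{
    return r->base / r->avx;
}

/* Crude noise check (not a statistical test): the gain counts as clear
 * only if it is larger than the two spreads added together. */
static int gap_exceeds_spread(const struct bench_row *r)
{
    return (r->base - r->avx) > (r->base_pm + r->avx_pm);
}

/* Print each row with its ratio and the mean cost in whole cycles. */
void print_rows(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        const struct bench_row *r = &rows[i];
        printf("%-22s %.2fx  %6.1f -> %6.1f cycles  %s\n",
               r->name, speedup(r), r->base / 10.0, r->avx / 10.0,
               gap_exceeds_spread(r) ? "clear" : "within noise");
    }
}
```

Under this check, h264_idct_add16 and h264_idct8_add4 do not clearly beat their SSE2 versions, while the gaps for the other five functions exceed the combined spread.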


