[FFmpeg-devel] [PATCH 0/6] sse2/avx functions for 8-bit simple idct

Mon Jun 12 16:36:03 EEST 2017

I think I have reached the final state for these patches.  There has been little
change to the 1st, 3rd, 4th, and 5th.

The 2nd adds an option to explicitly control what the macro does after the IDCT.
For 8-bit this allows a small optimisation: the data does not have to be stored
back to the source block.
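
For reference, a minimal C sketch of what the "put" and "add" actions amount to
once the IDCT has run.  This is not taken from the patches: the names
clip_uint8, put_block and add_block are hypothetical, and the 8x8 row-major
int16_t block layout is an assumption.  It only illustrates why the result can
go straight to the destination pixels without being written back to the block.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a value to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* "put" action (hypothetical name): overwrite the destination pixels
 * with the clamped IDCT output.  The block itself is only read. */
static void put_block(uint8_t *dest, ptrdiff_t stride, const int16_t *block)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dest[y * stride + x] = clip_uint8(block[y * 8 + x]);
}

/* "add" action (hypothetical name): add the IDCT output to the existing
 * destination pixels and clamp the sum.  The block is only read. */
static void add_block(uint8_t *dest, ptrdiff_t stride, const int16_t *block)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dest[y * stride + x] =
                clip_uint8(dest[y * stride + x] + block[y * 8 + x]);
}
```

In both cases the only store is to dest, which is the saving described above.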

The 6th lets the IDCT use slightly different coefficients so that its output
exactly matches the MMX original.  This is rather messy but I think it is
slightly better than trying to alter the code macro.  A word diff
(git diff --word-diff) looks much cleaner than the line diff git uses by
default.

If people would kindly give their opinion on the 2nd and 6th patches in
particular I would greatly appreciate it.

Performance gain decoding an MPEG2 HD sample, compared with the old MMX code:
 - Yorkfield: 210 to 224 fps (about 7% faster)
 - Haswell:   387 to 426 fps (about 10% faster)

Would anyone like me to get some timer figures for the functions themselves?

James Darnley (6):
  avcodec/x86: cleanup simple_idct10
  avcodec/x86: modify simple_idct10 macros to add an action parameter
  avcodec/x86: add x86-64 8-bit simple_idct function
  avcodec/x86: add x86-64 8-bit simple_idct put function
  avcodec/x86: add x86-64 8-bit simple_idct add function
  avcodec/x86: allow 8-bit simple_idct to use slightly different
    coefficients

 libavcodec/tests/x86/dct.c                |   2 +
 libavcodec/x86/idctdsp_init.c             |  23 +++++
 libavcodec/x86/proresdsp.asm              |  22 ++---
 libavcodec/x86/simple_idct.h              |   9 ++
 libavcodec/x86/simple_idct10.asm          | 139 ++++++++++++++++++++++++++----
 libavcodec/x86/simple_idct10_template.asm | 136 ++++++++++++++++-------------
 6 files changed, 244 insertions(+), 87 deletions(-)

-- 
2.13.0