[FFmpeg-devel] [PATCH] vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function.

Henrik Gramner henrik at gramner.com
Tue Oct 6 19:31:20 CEST 2015


On Tue, Oct 6, 2015 at 3:43 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> ---
>  libavcodec/x86/Makefile                     |   1 +
>  libavcodec/x86/vp9dsp_init.c                |   4 +-
>  libavcodec/x86/vp9dsp_init.h                |  15 ++--
>  libavcodec/x86/vp9dsp_init_16bpp_template.c |  14 +++-
>  libavcodec/x86/vp9itxfm.asm                 |  16 +----
>  libavcodec/x86/vp9itxfm_16bpp.asm           | 108 ++++++++++++++++++++++++++++
>  libavcodec/x86/vp9itxfm_template.asm        |  37 ++++++++++
>  7 files changed, 173 insertions(+), 22 deletions(-)
>  create mode 100644 libavcodec/x86/vp9itxfm_16bpp.asm
>  create mode 100644 libavcodec/x86/vp9itxfm_template.asm

Did you look into using SSE2 instead? That would eliminate
instructions in some parts but might make other parts more complex.
Note that some MMX instructions have only half the throughput of their
SSE/AVX equivalents on Skylake (and most likely on future Intel
µarchs as well).
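
One possible way to picture the register-width side of this trade-off is the rough sketch below. It is not taken from the patch. It assumes the 10/12bpp path keeps its 4x4 coefficients as 32-bit values and its pixels as 16-bit values, and regs_needed() is a hypothetical helper introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of SIMD registers needed to hold `count`
 * elements of `elem_bytes` bytes each, given registers of `reg_bytes`
 * bytes. The result is rounded up to a whole register. */
size_t regs_needed(size_t count, size_t elem_bytes, size_t reg_bytes)
{
    assert(reg_bytes > 0);
    size_t total = count * elem_bytes;
    return (total + reg_bytes - 1) / reg_bytes;
}

/* Assumed sizes: MMX registers are 8 bytes wide, SSE2 (XMM) registers
 * are 16 bytes wide. */
enum { MMX_BYTES = 8, XMM_BYTES = 16 };
```

Under those assumptions, regs_needed(16, 4, MMX_BYTES) gives 8 registers for a whole 4x4 block of 32-bit coefficients, while regs_needed(16, 4, XMM_BYTES) gives 4, so each butterfly step would cover the block in half as many register operations. On the other hand, one row of four 16-bit pixels is exactly 8 bytes, which fills an MMX register but only half of an XMM register, so the load/add/store of the output rows may need extra packing with SSE2.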

