[FFmpeg-devel] [PATCH] vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function.

Ronald S. Bultje rsbultje at gmail.com
Mon Oct 12 16:28:31 CEST 2015


Hi,

On Tue, Oct 6, 2015 at 1:31 PM, Henrik Gramner <henrik at gramner.com> wrote:

> On Tue, Oct 6, 2015 at 3:43 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> > ---
> >  libavcodec/x86/Makefile                     |   1 +
> >  libavcodec/x86/vp9dsp_init.c                |   4 +-
> >  libavcodec/x86/vp9dsp_init.h                |  15 ++--
> >  libavcodec/x86/vp9dsp_init_16bpp_template.c |  14 +++-
> >  libavcodec/x86/vp9itxfm.asm                 |  16 +----
> >  libavcodec/x86/vp9itxfm_16bpp.asm           | 108 ++++++++++++++++++++++++++++
> >  libavcodec/x86/vp9itxfm_template.asm        |  37 ++++++++++
> >  7 files changed, 173 insertions(+), 22 deletions(-)
> >  create mode 100644 libavcodec/x86/vp9itxfm_16bpp.asm
> >  create mode 100644 libavcodec/x86/vp9itxfm_template.asm
>
> Did you look into using SSE2 instead? That would eliminate
> instructions in some parts but might make other parts more complex.
> Note that some MMX instructions have only half the throughput of
> equivalent SSE/AVX ones in Skylake (and most likely future Intel
> µarchs as well).


Hm, ok, so I need to look into that, since that will affect the 8bpp
assembly also. I'll probably want to create sse2 versions of all these
functions. I'll do that separately if that's OK.

Ronald

