[FFmpeg-devel] [PATCH] [WIP] dnxhdenc: get_pixels_8x4_sym_10bit_sse2

Ronald S. Bultje rsbultje at gmail.com
Wed Apr 9 14:48:02 CEST 2014


Hi,

On Tue, Apr 8, 2014 at 11:51 PM, Timothy Gu <timothygu99 at gmail.com> wrote:

> On Tue, Apr 8, 2014 at 8:42 PM, Timothy Gu <timothygu99 at gmail.com> wrote:
> > Before:
> > 3383 decicycles in dnxhd_10bit_get_pixels_8x4_sym, 130910 runs, 162 skips
> > After:
> > 750 decicycles in ff_get_pixels_8x4_sym_10bit_sse2, 130999 runs, 73 skips
> >
> > Overall performance impact negligible.
> >
> > Signed-off-by: Timothy Gu <timothygu99 at gmail.com>
> > ---
> >  libavcodec/x86/dnxhdenc.asm    | 41 +++++++++++++++++++++++++++++------------
> >  libavcodec/x86/dnxhdenc_init.c |  4 ++++
> >  2 files changed, 33 insertions(+), 12 deletions(-)
> >
> > diff --git a/libavcodec/x86/dnxhdenc.asm b/libavcodec/x86/dnxhdenc.asm
> > index 9dd6d51..d42530b 100644
> > --- a/libavcodec/x86/dnxhdenc.asm
> > +++ b/libavcodec/x86/dnxhdenc.asm
> > @@ -26,18 +26,30 @@ section .text
> >
> >  ; void get_pixels_8x4_sym_sse2(int16_t *block, const uint8_t *pixels,
> >  ;                              ptrdiff_t line_size)
> > -INIT_XMM sse2
> > -cglobal get_pixels_8x4_sym, 3,3,5, block, pixels, linesize
> > -    pxor      m4,       m4
> > -    movq      m0,       [pixelsq]
> > -    add       pixelsq,  linesizeq
> > -    movq      m1,       [pixelsq]
> > -    movq      m2,       [pixelsq+linesizeq]
> > -    movq      m3,       [pixelsq+linesizeq*2]
> > -    punpcklbw m0,       m4
> > -    punpcklbw m1,       m4
> > -    punpcklbw m2,       m4
> > -    punpcklbw m3,       m4
> > +
> > +%macro GET_PIXELS 1
> > +%if %1 == 8
> > +cglobal get_pixels_8x4_sym,       3,3,5, block, pixels, linesize
> > +%elif %1 == 16
> > +cglobal get_pixels_8x4_sym_10bit, 3,3,4, block, pixels, linesize
> > +%endif
> > +    %if %1 == mmsize/2
> > +        pxor        m4, m4
> > +        %define LOAD movh
> > +    %elif %1 == mmsize && %1 == 16
> > +        %define LOAD movu
> > +    %endif
> > +    LOAD            m0, [pixelsq]
> > +    add        pixelsq, linesizeq
> > +    LOAD            m1, [pixelsq]
> > +    LOAD            m2, [pixelsq+linesizeq]
> > +    LOAD            m3, [pixelsq+linesizeq*2]
>
> I probably messed up the loading and linesize here. Can someone give
> me a pointer on how to fix it?


You mean it doesn't work? What's the C code?

Ronald
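
For reference, a scalar version of what the quoted 10-bit routine is
presumably meant to compute could look like the sketch below. It is not
copied from the FFmpeg tree: the name get_pixels_8x4_sym_10bit_ref is
hypothetical, and the mirroring of rows 0..3 into rows 7..4 is an assumption
based on the "sym" in the function name. The point relevant to the loading
question is that line_size is a byte count while each 10-bit sample occupies
16 bits, so one row of 8 samples is exactly 16 bytes (one full XMM register)
and the row addresses are pixels + y * line_size with no extra scaling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference, assuming "sym" means the four source rows
 * are stored to block rows 0..3 and mirrored into block rows 4..7. */
static void get_pixels_8x4_sym_10bit_ref(int16_t *block,
                                         const uint8_t *pixels,
                                         ptrdiff_t line_size)
{
    int y;

    /* Rows 0..3: copy 8 samples of 16 bits each (16 bytes per row).
     * line_size is in bytes, so it is applied to the byte pointer. */
    for (y = 0; y < 4; y++)
        memcpy(block + 8 * y, pixels + y * line_size, 8 * sizeof(*block));

    /* Rows 4..7: mirror image, i.e. row 4 = row 3, ..., row 7 = row 0. */
    for (y = 0; y < 4; y++)
        memcpy(block + 8 * (4 + y), block + 8 * (3 - y), 8 * sizeof(*block));
}
```

Comparing the output of a SIMD version against a reference of this kind on a
buffer whose stride is larger than 16 bytes should show quickly whether the
loads or the linesize arithmetic are off.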

