[FFmpeg-devel] [PATCH] dnxhd get_pixels_4x8_sym sse

Michael Niedermayer michaelni
Thu Dec 11 23:15:05 CET 2008


On Wed, Dec 10, 2008 at 06:22:15PM -0800, Baptiste Coudurier wrote:
> Michael Niedermayer wrote:
> > On Wed, Dec 10, 2008 at 05:39:05PM -0800, Baptiste Coudurier wrote:
> >> Hi,
> >>
> >> $subject, I don't polute dsputil more since this is used only by dnxhd
> >> encoder AFAIK.
> > [...]
> > 
> >> @@ -158,8 +159,13 @@
> >>  
> >>      dsputil_init(&ctx->m.dsp, avctx);
> >>      ff_dct_common_init(&ctx->m);
> >> +#ifdef HAVE_MMX
> >> +    ff_dnxhd_init_mmx(ctx);
> >> +#endif
> >>      if (!ctx->m.dct_quantize)
> >>          ctx->m.dct_quantize = dct_quantize_c;
> >> +    if (!ctx->get_pixels_4x8_sym)
> >> +        ctx->get_pixels_4x8_sym = dnxhd_get_pixels_4x8;
> >>  
> >>      ctx->m.mb_height = (avctx->height + 15) / 16;
> >>      ctx->m.mb_width  = (avctx->width  + 15) / 16;
> > 
> > am i missing something or could the if() be avoided by setting the pointer
> > before dsputil_init/ff_dnxhd_init_mmx ?
> 
> Hum, right, interesting, it's just usually this way everywhere else.
> Changed.
> 

> > also isnt it 8x4 instead of 4x8 (that being a seperate fix unrelated to this
> > patch of course) i just noticed it ...
> 
> Well it's 4 rows of 8 pixels, I guess it depends on how you see it.

of course
i just think the width x height variant is more common in libavcodec and
various ITU/MPEG specs than height x width

but besides the bikeshed of which to take, it should be consistant throughout
libav*

[...]

> Updated patch attached.

[...]
> Index: libavcodec/dnxhdenc.c
> ===================================================================
> --- libavcodec/dnxhdenc.c	(revision 16051)
> +++ libavcodec/dnxhdenc.c	(working copy)

> @@ -30,6 +30,7 @@
>  #include "dnxhdenc.h"
>  
>  int dct_quantize_c(MpegEncContext *s, DCTELEM *block, int n, int qscale, int *overflow);
> +static void dnxhd_get_pixels_4x8(DCTELEM *restrict block, const uint8_t *pixels, int line_size);
>  
>  #define LAMBDA_FRAC_BITS 10
>  

maybe functions should be reordered to avoid this? (that of course is a
seperate issue ...)


[...]
> +static void get_pixels_4x8_sym_sse2(DCTELEM *block, const uint8_t *pixels, int line_size)
> +{
> +    __asm__ volatile(
> +        "pxor %%xmm7,      %%xmm7       \n\t"
> +        "movq (%0),        %%xmm0       \n\t"
> +        "movq (%0, %2),    %%xmm1       \n\t"
> +        "movq (%0, %2,2),  %%xmm2       \n\t"
> +        "movq (%0, %3),    %%xmm3       \n\t"
> +        "punpcklbw %%xmm7, %%xmm0       \n\t"
> +        "punpcklbw %%xmm7, %%xmm1       \n\t"
> +        "punpcklbw %%xmm7, %%xmm2       \n\t"
> +        "punpcklbw %%xmm7, %%xmm3       \n\t"
> +        "movdqa %%xmm0,      (%1)       \n\t"
> +        "movdqa %%xmm1,    16(%1)       \n\t"
> +        "movdqa %%xmm2,    32(%1)       \n\t"
> +        "movdqa %%xmm3,    48(%1)       \n\t"
> +        "movdqa %%xmm3 ,   64(%1)       \n\t"
> +        "movdqa %%xmm2 ,   80(%1)       \n\t"
> +        "movdqa %%xmm1 ,   96(%1)       \n\t"
> +        "movdqa %%xmm0,   112(%1)       \n\t"
> +        :
> +        : "r" (pixels), "r" (block), "r" ((x86_reg)line_size), "r" ((x86_reg)line_size*3)

maybe 
"movq (%0),        %%xmm0       \n\t"
"add  %2,          %0           \n\t"
"movq (%0),        %%xmm1       \n\t"
"movq (%0, %2),    %%xmm2       \n\t"
"movq (%0, %2,2),  %%xmm3       \n\t"

is faster (it doesnt need line_size*3), then, maybe not
this would require pixels to be "+r" ...

whichever is faster the patch (with the faster one) is ok

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If a bugfix only changes things apparently unrelated to the bug with no
further explanation, that is a good sign that the bugfix is wrong.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081211/5404fd7f/attachment.pgp>



More information about the ffmpeg-devel mailing list