[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions

Michael Niedermayer michaelni
Thu Nov 15 03:43:20 CET 2007


On Tue, Nov 13, 2007 at 11:17:41PM +0100, Christophe GISQUET wrote:
> Michael Niedermayer wrote:
> > the code which is overall (whole decoder) fastest,
> > and for cases where 2 are indistinguishable, the simpler one
> 
> Sorry for the delay in replying, but it was somewhat worth it: testing on
> a P4 showed that at least one optimization was in fact degrading
> performance (special case in vc1_put_shift2_mmx when stride == offset).
> 
> Therefore, final (as far as I see) patch attached.
> 
> Summary:
> MMX version for VC-1 subpel motion compensation functions. 30% faster
> decoding.
> 
[...]
> +/**
> + * Data is already unpacked, so some operations can directly be made from
> + * memory.
> + */
> +static void vc1_put_hor_16b_shift2_mmx(uint8_t *dst, long int stride,
> +                                       const int16_t *src, int rnd)
> +{
> +    int h = 8;
> +    src -= 1;
> +
> +    asm volatile(
> +        LOAD_ROUNDER_MMX("%4")
> +        "1:                                \n\t"
> +        "movq      2*0+0(%1), %%mm1        \n\t"
> +        "movq      2*0+8(%1), %%mm2        \n\t"
> +        "movq      2*1+0(%1), %%mm3        \n\t"
> +        "movq      2*1+8(%1), %%mm4        \n\t"
> +        "paddsw    2*3+0(%1), %%mm1        \n\t"
> +        "paddsw    2*3+8(%1), %%mm2        \n\t"
> +        "paddsw    2*2+0(%1), %%mm3        \n\t"
> +        "paddsw    2*2+8(%1), %%mm4        \n\t"
> +        "psubsw    %%mm3, %%mm1            \n\t"
> +        "psubsw    %%mm4, %%mm2            \n\t"
> +        /* Multiplying by 9 here overflows */
> +        "psllw     $3, %%mm3               \n\t"
> +        "psllw     $3, %%mm4               \n\t"
> +        "psubsw    %%mm1, %%mm3            \n\t"
> +        "psubsw    %%mm2, %%mm4            \n\t"

what overflows here?
also, please replace all p*sw with p*w; if saturation ever happens, your code
is buggy
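For illustration, here is a scalar C sketch (not part of the patch) of what the quoted asm computes per output pixel. With a, b, c, d being the four 16-bit inputs at offsets 0..3 after `src -= 1`, it forms (a+d) - (b+c), then 8*(b+c) minus that, which equals 9*(b+c) - (a+d), i.e. the VC-1 (-1, 9, 9, -1) half-pel taps. The helpers emulate paddsw/psubsw (saturating) and paddw/psubw/psllw (wrapping) on int16_t; all helper names are hypothetical. Because wrapping arithmetic is exact modulo 2^16, the p*w sequence returns the exact result whenever the final value fits in int16_t, even if an intermediate (such as the psllw step) overflows, while the saturating sequence can clamp. The input range fed to this function depends on the first filter pass, which the quote does not show, so the ranges below are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate an exact 32-bit value to int16_t, as paddsw/psubsw do. */
static int16_t sat16(int32_t v)
{
    return v > INT16_MAX ? INT16_MAX : v < INT16_MIN ? INT16_MIN : (int16_t)v;
}

static int16_t add_s(int16_t x, int16_t y) { return sat16((int32_t)x + y); } /* paddsw */
static int16_t sub_s(int16_t x, int16_t y) { return sat16((int32_t)x - y); } /* psubsw */

/* Wrapping (modulo 2^16) ops, as paddw/psubw/psllw do. The final
 * uint16_t -> int16_t conversion is implementation-defined in C,
 * but wraps on GCC, Clang and MSVC. */
static int16_t add_w(int16_t x, int16_t y) { return (int16_t)(uint16_t)((uint16_t)x + (uint16_t)y); }
static int16_t sub_w(int16_t x, int16_t y) { return (int16_t)(uint16_t)((uint16_t)x - (uint16_t)y); }
static int16_t shl3_w(int16_t x)           { return (int16_t)(uint16_t)((uint16_t)x << 3); }

/* Exact reference: -a + 9b + 9c - d, computed in 32 bits. */
static int32_t taps_ref(int16_t a, int16_t b, int16_t c, int16_t d)
{
    return 9 * ((int32_t)b + c) - ((int32_t)a + d);
}

/* The quoted instruction sequence as posted (psllw itself always wraps). */
static int16_t taps_sat(int16_t a, int16_t b, int16_t c, int16_t d)
{
    int16_t t1 = add_s(a, d);  /* paddsw: a + d               */
    int16_t t3 = add_s(b, c);  /* paddsw: b + c               */
    t1 = sub_s(t1, t3);        /* psubsw: (a+d) - (b+c)       */
    t3 = shl3_w(t3);           /* psllw $3: 8*(b+c)           */
    return sub_s(t3, t1);      /* psubsw: 9*(b+c) - (a+d)     */
}

/* Same sequence with p*w: every step is exact modulo 2^16. */
static int16_t taps_wrap(int16_t a, int16_t b, int16_t c, int16_t d)
{
    int16_t t1 = add_w(a, d);
    int16_t t3 = add_w(b, c);
    t1 = sub_w(t1, t3);
    t3 = shl3_w(t3);
    return sub_w(t3, t1);
}

/* With wrapping ops the result is correct for all inputs in [lo, hi]
 * exactly when the extreme outputs of 9*(b+c) - (a+d) fit in int16_t. */
static int final_fits_int16(int32_t lo, int32_t hi)
{
    assert(lo <= hi);
    int32_t max = 18 * hi - 2 * lo;  /* b = c = hi, a = d = lo */
    int32_t min = 18 * lo - 2 * hi;  /* b = c = lo, a = d = hi */
    return max <= INT16_MAX && min >= INT16_MIN;
}
```

With a = d = 2600 and b = c = 2100, 8*(b+c) = 33600 wraps in the psllw step, yet `taps_wrap` still returns the exact 32600, while `taps_sat` clamps the last psubsw to -32768. `final_fits_int16(-1638, 1638)` holds, but `final_fits_int16(-1639, 1639)` does not.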


[...]
> +            return;
> +        }
> +        else { /* No horizontal filter, output 8 lines to dst */
> +            vc1_put_shift_8bits[vmode](dst, src, stride, 1-rnd, stride);
> +            return;
> +        }

the return can be factored out of the if/else
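A sketch of the suggested restructuring, under assumptions: the quoted lines are read as the tail of an outer branch on vmode whose inner if/else (on a flag called hmode here) ends both arms in return. The rest of the function is elided in the quote, so `put_hv`, `put_v_only`, `put_h_only`, `mc_dispatch` and `hmode` are hypothetical names, not the patch's.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real MC paths; each writes a marker
 * byte so the control flow can be checked. */
static void put_hv(uint8_t *dst)     { dst[0] = 3; } /* vertical + horizontal */
static void put_v_only(uint8_t *dst) { dst[0] = 2; } /* vertical only */
static void put_h_only(uint8_t *dst) { dst[0] = 1; } /* horizontal only */

static void mc_dispatch(uint8_t *dst, int vmode, int hmode)
{
    if (vmode) {
        if (hmode) {
            put_hv(dst);
        } else { /* No horizontal filter, output 8 lines to dst */
            put_v_only(dst);
        }
        return; /* one return after the if/else instead of one per branch */
    }
    put_h_only(dst); /* only reached when vmode == 0 */
}
```

Behavior is unchanged; the duplicated `return` simply moves below the inner if/else.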

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I count him braver who overcomes his desires than him who conquers his
enemies for the hardest victory is over self. -- Aristotle


