[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms

Michael Niedermayer michaelni
Tue Jan 22 20:24:13 CET 2008


On Sun, Jan 20, 2008 at 12:37:21PM +0100, Christophe GISQUET wrote:
> Hi,
> 
> Michael Niedermayer wrote:
> > i think, the following is safe
> > 
> >         t1 = src[0] + src[2];
> >         t2 = src[0] - src[2];
> >         t1= 8*t1 + (t1>>1);
> >         t2= 8*t2 + (t2>>1);
> > 
> >         t3 = 11 * src[1] + 5 * src[3];
> >         t4 = 11 * src[3] - 5 * src[1];
> > 
> >         dst[0] = (t1 + t3 + 2) >> 2;
> >         dst[1] = (t2 - t4 + 2) >> 2;
> >         dst[2] = (t2 + t4 + 2) >> 2;
> >         dst[3] = (t1 - t3 + 2) >> 2;
> [...]
> 
> Ok I've implemented that. I also tried to decompose t3 and t4 as:
> t3 = 5(2s1+s3) + s1
> t4 = 5(2s3-s1) + s3
> (trading one constant load from memory and 2 multiplies for 2 shifts
> and 2 additions)
> 
> But this is slower, and in fact I can load the multiply constants in
> registers (by loading the bias from memory instead), further increasing
> the speed difference.
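
As a quick sanity check of the decomposition quoted above, the following C sketch compares the shift-and-add form against the plain multiply form of t3 and t4. The function names and the tested input range are assumptions made for illustration; they do not come from the patch.

```c
#include <assert.h>

/* Multiply form of the odd part, as in the quoted C reference. */
static int t3_mul(int s1, int s3) { return 11 * s1 + 5 * s3; }
static int t4_mul(int s1, int s3) { return 11 * s3 - 5 * s1; }

/* Decomposed form: the doubling would be a shift (or an add) in SIMD
 * code, leaving a single multiply by 5 per output. */
static int t3_dec(int s1, int s3) { return 5 * (2 * s1 + s3) + s1; }
static int t4_dec(int s1, int s3) { return 5 * (2 * s3 - s1) + s3; }

/* Count the inputs in [-lim, lim] x [-lim, lim] where the forms disagree.
 * The range is a hypothetical choice that only keeps the loop short. */
unsigned count_decomposition_mismatches(int lim)
{
    unsigned bad = 0;
    for (int s1 = -lim; s1 <= lim; s1++)
        for (int s3 = -lim; s3 <= lim; s3++)
            if (t3_mul(s1, s3) != t3_dec(s1, s3) ||
                t4_mul(s1, s3) != t4_dec(s1, s3))
                bad++;
    return bad;
}
```

The two forms are algebraically identical, so the count is zero for any range that does not overflow int. Which one is faster is a separate question that only the timings can answer.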
> 
> 1D2 ~ 1080 dezicycles
> 1D3 ~ 1120
> 
> Anyway, that's mostly for reference, as it was shown the 4x4 dct is not
> relevant speedwise and the code for transposing the zz scantables is not
> provided.
> 
> Best regards,
> -- 
> Christophe GISQUET

> Index: libavcodec/i386/vc1dsp_mmx.c
> ===================================================================
> --- libavcodec/i386/vc1dsp_mmx.c	(révision 11559)
> +++ libavcodec/i386/vc1dsp_mmx.c	(copie de travail)
> @@ -467,6 +467,121 @@
>  DECLARE_FUNCTION(3, 2)
>  DECLARE_FUNCTION(3, 3)
>  
> +/* out: d0=R1 d1=R0 d2=R3 d3=R2 */
> +#define IDCT4_1D2(R0, R1, R2, R3, TMP0, TMP1, ADD, SHIFT)       \
> +    SUMSUB_BA(R2, R0)  /* R2=s0+s2 R0=s0-s2 */                  \

> +    "movq       "#R0", "#TMP0" \n\t"                            \
> +    "movq       "#R2", "#TMP1" \n\t"                            \
> +    "psllw      $3, "#R0" \n\t"                                 \
> +    "psllw      $3, "#R2" \n\t"                                 \
> +    "paddw      "MANGLE(ADD)", "#TMP0" \n\t"                    \
> +    "paddw      "MANGLE(ADD)", "#TMP1" \n\t"                    \

maybe the following is faster:

movq MANGLE(ADD), TMP0
movq MANGLE(ADD), TMP1
paddw R0, TMP0
paddw R2, TMP1
psllw $3, R0
psllw $3, R2
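
The reordering changes only where the bias is loaded, not what is computed. A minimal C model of one 16-bit lane, assuming paddw and psllw wrap modulo 2^16, illustrates this for a single register/temporary pair. The struct and function names are hypothetical, and the actual value of ADD is not given in this mail, so it is left as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of a register and its temporary. */
typedef struct { uint16_t r, tmp; } lane_pair;

/* Order used in the patch: copy, shift, then add the bias from memory. */
static lane_pair patch_order(uint16_t r, uint16_t add)
{
    lane_pair p;
    p.tmp = r;                        /* movq  R, TMP   */
    p.r   = (uint16_t)(r << 3);       /* psllw $3, R    */
    p.tmp = (uint16_t)(p.tmp + add);  /* paddw ADD, TMP */
    return p;
}

/* Order suggested in the reply: load the bias, add, then shift. */
static lane_pair reply_order(uint16_t r, uint16_t add)
{
    lane_pair p;
    p.tmp = add;                      /* movq  ADD, TMP */
    p.tmp = (uint16_t)(p.tmp + r);    /* paddw R, TMP   */
    p.r   = (uint16_t)(r << 3);       /* psllw $3, R    */
    return p;
}

/* Count the 16-bit lane values for which the two orders disagree. */
unsigned count_order_mismatches(uint16_t add)
{
    unsigned bad = 0;
    for (uint32_t v = 0; v <= 0xFFFF; v++) {
        lane_pair a = patch_order((uint16_t)v, add);
        lane_pair b = reply_order((uint16_t)v, add);
        if (a.r != b.r || a.tmp != b.tmp)
            bad++;
    }
    return bad;
}
```

Both orders leave the same values in the register and the temporary for every lane value, so any speed difference would come from instruction scheduling and memory-operand handling, which this model does not capture.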



> +    "psraw      $1, "#TMP0" \n\t"                               \
> +    "psraw      $1, "#TMP1" \n\t"                               \
> +    "paddw      "#TMP0", "#R0" \n\t"                            \
> +    "paddw      "#TMP1", "#R2" \n\t"                            \

> +    "movq       "#R1", "#TMP0" \n\t"                            \
> +    "movq       "#R3", "#TMP1" \n\t"                            \
> +    "pmullw     %%mm7, "#R1" \n\t"                              \
> +    "pmullw     %%mm7, "#R3" \n\t"                              \
> +    "pmullw     %%mm6, "#TMP0" \n\t"                            \
> +    "pmullw     %%mm6, "#TMP1" \n\t"                            \
> +    "psubw      "#TMP0", "#R3" \n\t"                            \
> +    "paddw      "#TMP1", "#R1" \n\t"                            \

t= 5(A+B)
X= t+ 6A
Y= t-16B

movq   R1, TMP0
paddw  R3, R1
psllw  $4, R3
pmullw mm7(=6), TMP0
pmullw mm6(=5), R1
paddw  R1, TMP0
psubw  R3, R1
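
To make the factorization concrete, here is a hedged C sketch of one 16-bit lane of the sequence above, with A taken as the initial R1 and B as the initial R3. It is compared against t3 and t4 from the C reference quoted at the top of the thread. The function names are hypothetical. Note that X equals t3 while Y = 5A - 11B equals -t4 under that reference, so the add/subtract step that follows would presumably need its operands adjusted; that reading is an assumption, not something stated in the mail.

```c
#include <assert.h>
#include <stdint.h>

/* Odd part of the C reference: A = src[1], B = src[3]. */
static int ref_t3(int a, int b) { return 11 * a + 5 * b; }
static int ref_t4(int a, int b) { return 11 * b - 5 * a; }

/* One 16-bit lane of the suggested sequence; every step wraps mod 2^16. */
void factored_lane(uint16_t a, uint16_t b, uint16_t *x, uint16_t *y)
{
    uint16_t tmp = a;                   /* movq   R1, TMP0       */
    uint16_t r1  = (uint16_t)(a + b);   /* paddw  R3, R1         */
    uint16_t r3  = (uint16_t)(b << 4);  /* psllw  $4, R3         */
    tmp = (uint16_t)(tmp * 6u);         /* pmullw mm7(=6), TMP0  */
    r1  = (uint16_t)(r1 * 5u);          /* pmullw mm6(=5), R1    */
    tmp = (uint16_t)(tmp + r1);         /* paddw  R1, TMP0  -> X */
    r1  = (uint16_t)(r1 - r3);          /* psubw  R3, R1    -> Y */
    *x = tmp;
    *y = r1;
}

/* Count inputs in [-lim, lim] where X != t3 or Y != -t4 (mod 2^16). */
unsigned count_factored_mismatches(int lim)
{
    unsigned bad = 0;
    for (int a = -lim; a <= lim; a++)
        for (int b = -lim; b <= lim; b++) {
            uint16_t x, y;
            factored_lane((uint16_t)a, (uint16_t)b, &x, &y);
            if (x != (uint16_t)ref_t3(a, b) ||
                y != (uint16_t)(-ref_t4(a, b)))
                bad++;
        }
    return bad;
}
```

The sequence uses two pmullw instead of four, at the cost of one shift and one extra add. The identity holds modulo 2^16, so wraparound in the intermediate sums does not affect the low 16 bits of the result.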


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras