[FFmpeg-devel] [PATCH] SSE2 Xvid idct

Michael Niedermayer michaelni
Mon Apr 14 04:26:19 CEST 2008


On Sun, Apr 13, 2008 at 10:10:21PM -0400, Alexander Strange wrote:
> On Sun, Apr 13, 2008 at 5:39 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > [..]
> >  >>>>
> >  >>>> #ifdef ARCH_X86_64
> >  >>>> # define XMMS   "%%xmm12"
> >  >>>> #else
> >  >>>> # define XMMS   "%%xmm2"
> >  >>>> #endif
> >  >>>> s/%%xmm2/XMMS/
> >  >>>>
> >  >>>> #ifndef ARCH_X86_64
> >  >>>> "movdqa   %%xmm2, "spill"         \n\t" \
> >  >>>> #endif
> >  >>>> ...
> >  >>>> #ifndef ARCH_X86_64
> >  >>>> "movdqa  "spill", %%xmm2          \n\t" \
> >  >>>> #endif
> >  >>>>
> >  >>>> or a
> >  >>>> MOV_ONLY_ON32" %%xmm2, ...
> >  >>>>
> >  >>>>
> >  >>>> And I think something similar can be done with ROW*
> >  >>>
> >  >>> Done. The row part is already optimal on 64 since pshufhw handles it.
> >  >>
> >  >> I meant the
> >  >>>    "movdqa   "ROW2", %%xmm4          \n\t" \
> >  >>>    "movdqa   "ROW6", %%xmm6          \n\t" \
> >  >> [...]
> >  >>>    "movdqa   "ROW0", %%xmm4          \n\t" \
> >  >>>    "movdqa   "ROW4", %%xmm6          \n\t" \
> >  >>
> >  >> they are unneeded on 64.
> >  >
> >  > Oh, that. Done:
> >
> >
> >  [...]
> >  > ///IDCT pass on columns, assuming rows 4-6 are zero.
> >                                            ^
> >  typo
> 
> Fixed.
> 
> >  [...]
> >  >     iLLM_HEAD
> >  >     ASMALIGN(4)
> >  >     JNZ("%%ecx", "2f")
> >  >     JNZ("%%eax", "3f")
> >  >     JNZ("%%edx", "4f")
> >  >     JNZ("%%ebx", "5f")
> >  >     iLLM_PASS_SPARSE("%0")
> >  >     "jmp 6f                                                      \n\t"
> >  >     "2:                                                          \n\t"
> >  >     iMTX_MULT("4*16(%0)", MANGLE(iTab1), "#", PUT_EVEN(ROW4))
> >  >     "3:                                                          \n\t"
> >  >     iMTX_MULT("5*16(%0)", MANGLE(iTab4), ROUND(walkenIdctRounders+4*16), PUT_ODD(ROW5))
> >  >     JZ("%%edx", "1f")
> >  >     "4:                                                          \n\t"
> >  >     iMTX_MULT("6*16(%0)", MANGLE(iTab3), ROUND(walkenIdctRounders+5*16), PUT_EVEN(ROW6))
> >  >     JZ("%%ebx", "1f")
> >  >     "5:                                                          \n\t"
> >  >     iMTX_MULT("7*16(%0)", MANGLE(iTab2), ROUND(walkenIdctRounders+5*16), PUT_ODD(ROW7))
> >  >     iLLM_HEAD
> >
> >  iLLM_HEAD is executed twice here
> 
> That's intentional, it turned out to be the best way to handle it on
> 32-bit. (call it a speculative prefetch)
> But we can get rid of it for x86-64, so I did.
> 
> >  >     iLLM_PASS("%0")
> >  >     "6:                                                          \n\t"
> >  >     : "+r"(block)
> >  >     :
> >  >     : "%eax", "%ecx", "%edx", "%ebx", "memory");
> >
> >  ebx + gcc + PIC -> problems
> >
> >  Also the changes to existing code are missing this time ...
> 
> changed to esi
> The others hadn't changed and I didn't want to repost them every time...

looks ok
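
For anyone reading along, a minimal sketch of the clobber issue, not the code from the patch: with gcc and PIC on 32-bit x86, ebx holds the GOT pointer, so naming it in the clobber list causes trouble. The sketch uses esi as scratch instead. The function name copy_through_esi is hypothetical and only there for illustration; it assumes GCC-style inline asm and falls back to plain C elsewhere.

```c
/* Hypothetical illustration: use %esi as the scratch register and name
 * it in the clobber list, rather than %ebx, which gcc reserves for the
 * GOT pointer when building 32-bit PIC code. */
static int copy_through_esi(int v)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int out;
    __asm__ volatile(
        "movl %1, %%esi \n\t"   /* scratch use of esi */
        "movl %%esi, %0 \n\t"
        : "=r"(out)
        : "r"(v)
        : "%esi");              /* esi, not ebx */
    return out;
#else
    /* Non-x86 or non-GCC fallback so the sketch still compiles. */
    return v;
#endif
}
```

The compiler will not allocate an operand to a clobbered register, so esi is safe to overwrite inside the asm block.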

[...]
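
A rough sketch of the MOV_ONLY_ON32 idea quoted above, assuming the asm text is assembled from string literals as in the patch. DEMO_64, SPILL and demo_fragment are hypothetical names used only here (DEMO_64 stands in for the ARCH_X86_64 check), and the spill slot is made up:

```c
#include <string.h>

/* DEMO_64 is a hypothetical stand-in for the ARCH_X86_64 test. */
#if defined(__x86_64__) || defined(_M_X64)
#  define DEMO_64 1
#else
#  define DEMO_64 0
#endif

#if DEMO_64
#  define XMMS "%%xmm12"                /* spare register on 64-bit */
#  define MOV_ONLY_ON32(args)           /* expands to nothing */
#else
#  define XMMS "%%xmm2"                 /* must borrow xmm2 on 32-bit */
#  define MOV_ONLY_ON32(args) "movdqa " args " \n\t"
#endif

#define SPILL "16(%0)"                  /* hypothetical spill slot */

/* The asm text: the save and restore of xmm2 only exist on 32-bit. */
static const char demo_fragment[] =
    MOV_ONLY_ON32("%%xmm2, " SPILL)
    "movdqa  (%0), " XMMS " \n\t"
    MOV_ONLY_ON32(SPILL ", %%xmm2");

/* Returns 1 if the fragment contains the spill slot, 0 otherwise. */
static int demo_uses_spill(void)
{
    return strstr(demo_fragment, SPILL) != NULL;
}
```

On 64-bit the two MOV_ONLY_ON32 lines vanish and only the load into xmm12 remains, which avoids scattering #ifndef blocks through the asm string.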
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I have never wished to cater to the crowd; for what I know they do not
approve, and what they approve I do not know. -- Epicurus