[FFmpeg-devel] [PATCH] SSE2 Xvid idct

Mon Apr 14 04:10:21 CEST 2008

On Sun, Apr 13, 2008 at 5:39 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> [..]
>  >>>>
>  >>>> #ifdef ARCH_X86_64
>  >>>> # define XMMS   "%%xmm12"
>  >>>> #else
>  >>>> # define XMMS   "%%xmm2"
>  >>>> #endif
>  >>>> s/%%xmm2/XMMS/
>  >>>>
>  >>>> #ifndef ARCH_X86_64
>  >>>> "movdqa   %%xmm2, "spill"         \n\t" \
>  >>>> #endif
>  >>>> ...
>  >>>> #ifndef ARCH_X86_64
>  >>>> "movdqa  "spill", %%xmm2          \n\t" \
>  >>>> #endif
>  >>>>
>  >>>> or a
>  >>>> MOV_ONLY_ON32" %%xmm2, ...
>  >>>>
>  >>>>
>  >>>> And I think something similar can be done with ROW*
>  >>>
>  >>> Done. The row part is already optimal on 64 since pshufhw handles it.
>  >>
>  >> I meant the
>  >>>    "movdqa   "ROW2", %%xmm4          \n\t" \
>  >>>    "movdqa   "ROW6", %%xmm6          \n\t" \
>  >> [...]
>  >>>    "movdqa   "ROW0", %%xmm4          \n\t" \
>  >>>    "movdqa   "ROW4", %%xmm6          \n\t" \
>  >>
>  >> they are unneeded on 64.
>  >
>  > Oh, that. Done:
>
>
>  [...]
>  > ///IDCT pass on columns, assuming rows 4-6 are zero.
>                                            ^
>  typo

Fixed.

>  [...]
>  >     iLLM_HEAD
>  >     ASMALIGN(4)
>  >     JNZ("%%ecx", "2f")
>  >     JNZ("%%eax", "3f")
>  >     JNZ("%%edx", "4f")
>  >     JNZ("%%ebx", "5f")
>  >     iLLM_PASS_SPARSE("%0")
>  >     "jmp 6f                                                      \n\t"
>  >     "2:                                                          \n\t"
>  >     iMTX_MULT("4*16(%0)", MANGLE(iTab1), "#", PUT_EVEN(ROW4))
>  >     "3:                                                          \n\t"
>  >     iMTX_MULT("5*16(%0)", MANGLE(iTab4), ROUND(walkenIdctRounders+4*16), PUT_ODD(ROW5))
>  >     JZ("%%edx", "1f")
>  >     "4:                                                          \n\t"
>  >     iMTX_MULT("6*16(%0)", MANGLE(iTab3), ROUND(walkenIdctRounders+5*16), PUT_EVEN(ROW6))
>  >     JZ("%%ebx", "1f")
>  >     "5:                                                          \n\t"
>  >     iMTX_MULT("7*16(%0)", MANGLE(iTab2), ROUND(walkenIdctRounders+5*16), PUT_ODD(ROW7))
>  >     iLLM_HEAD
>
>  iLLM_HEAD is executed twice here

That's intentional; it turned out to be the best way to handle it on
32-bit (call it a speculative prefetch).
But we can get rid of it on x86-64, so I did.
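
One way to keep the second head on 32-bit only is a macro that expands to
the real string there and to an empty string on x86-64, in the same spirit
as the MOV_ONLY_ON32 idea quoted above. This is only a sketch: the DEMO_*
strings and HEAD_ONLY_ON32 are hypothetical placeholders, not names from
the patch, and the check on __x86_64__ is an assumption added so the
snippet builds outside the FFmpeg tree.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical placeholders standing in for the real asm macro bodies. */
#define DEMO_HEAD  "head \n\t"
#define DEMO_ROW7  "row7 \n\t"
#define DEMO_PASS  "pass \n\t"

/* Expands to the head on 32-bit and to nothing on x86-64. */
#if defined(__x86_64__) || defined(ARCH_X86_64)
# define HEAD_ONLY_ON32 ""
#else
# define HEAD_ONLY_ON32 DEMO_HEAD
#endif

/* Adjacent string literals are concatenated at compile time,
 * which is how the asm template is assembled from macros. */
static const char demo_tail[] = DEMO_ROW7 HEAD_ONLY_ON32 DEMO_PASS;

/* Returns 1 if the second head survived in the template, else 0. */
static int demo_tail_has_head(void)
{
    return strstr(demo_tail, "head") != NULL;
}

/* Usage: the head is gone on x86-64 and kept everywhere else. */
static void demo_check(void)
{
#if defined(__x86_64__) || defined(ARCH_X86_64)
    assert(!demo_tail_has_head());
#else
    assert(demo_tail_has_head());
#endif
}
```

The point of doing it in the preprocessor is that the 64-bit build pays
nothing for the 32-bit workaround: the string is simply absent from the
template.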

>  >     iLLM_PASS("%0")
>  >     "6:                                                          \n\t"
>  >     : "+r"(block)
>  >     :
>  >     : "%eax", "%ecx", "%edx", "%ebx", "memory");
>
>  ebx + gcc + PIC -> problems
>
>  Also the changes to existing code are missing this time ...

Changed to esi.
The other patches hadn't changed, and I didn't want to repost them every time...
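
For reference, the issue with ebx is that 32-bit PIC code uses it as the
GOT pointer, and gcc refuses an asm statement that lists it as clobbered.
Picking another scratch register avoids that. A minimal sketch of the
pattern, unrelated to the actual idct code: add_one_via_esi is a
hypothetical toy function, and the plain C branch is only there so the
snippet builds on non-x86 targets.

```c
#include <assert.h>

/* Adds 1 to x, using esi (not ebx) as the scratch register. */
static int add_one_via_esi(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int out;
    __asm__ volatile(
        "movl %1, %%esi \n\t"   /* copy the input into the scratch register */
        "addl $1, %%esi \n\t"   /* do the work there */
        "movl %%esi, %0 \n\t"   /* copy the result out */
        : "=r"(out)
        : "r"(x)
        : "%esi", "cc");        /* tell the compiler what we trashed */
    return out;
#else
    return x + 1;               /* portable fallback for other targets */
#endif
}

/* Usage: behaves like x + 1 regardless of which branch was compiled. */
static void esi_check(void)
{
    assert(add_one_via_esi(41) == 42);
}
```

Because esi is listed as a clobber, the compiler will not place either
operand in it, and the same source builds with and without PIC.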
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sse2-permute.diff
Type: application/octet-stream
Size: 1340 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080413/c78e1651/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sse2-xvid-idct.diff
Type: application/octet-stream
Size: 1826 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080413/c78e1651/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: idct_sse2_xvid.c
Type: application/octet-stream
Size: 15375 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080413/c78e1651/attachment-0002.obj>