[FFmpeg-devel] [PATCH] SSE2 Xvid idct

Alexander Strange astrange
Sun Apr 13 23:25:26 CEST 2008


On Apr 13, 2008, at 6:26 AM, Michael Niedermayer wrote:
> On Sun, Apr 13, 2008 at 05:35:01AM -0400, Alexander Strange wrote:
>>
>> On Apr 12, 2008, at 8:15 AM, Michael Niedermayer wrote:
> [...]
>>>>   "psubsw   %%xmm6, %%xmm5          \n\t" \
>>>>   "movdqa   "ROW0", %%xmm4          \n\t" \
>>>>   "movdqa   "ROW4", %%xmm6          \n\t" \
>>>>   "movdqa   %%xmm2, "spill"         \n\t" \
>>>>   "movdqa   %%xmm4, %%xmm2          \n\t" \
>>>>   "psubsw   %%xmm6, %%xmm4          \n\t" \
>>>>   "paddsw   %%xmm2, %%xmm6          \n\t" \
>>>>   "movdqa   %%xmm6, %%xmm2          \n\t" \
>>>>   "psubsw   %%xmm7, %%xmm6          \n\t" \
>>>>   "paddsw   %%xmm2, %%xmm7          \n\t" \
>>>>   "movdqa   %%xmm4, %%xmm2          \n\t" \
>>>>   "psubsw   %%xmm5, %%xmm4          \n\t" \
>>>>   "paddsw   %%xmm2, %%xmm5          \n\t" \
>>>>   "movdqa   %%xmm5, %%xmm2          \n\t" \
>>>>   "psubsw   %%xmm0, %%xmm5          \n\t" \
>>>>   "paddsw   %%xmm2, %%xmm0          \n\t" \
>>>>   "movdqa   %%xmm4, %%xmm2          \n\t" \
>>>>   "psubsw   %%xmm3, %%xmm4          \n\t" \
>>>>   "paddsw   %%xmm2, %%xmm3          \n\t" \
>>>>   "movdqa  "spill", %%xmm2          \n\t" \
>>>
>>> #ifdef ARCH_X86_64
>>> # define XMMS   "%%xmm12"
>>> #else
>>> # define XMMS   "%%xmm2"
>>> #endif
>>> s/%%xmm2/XMMS/
>>>
>>> #ifndef ARCH_X86_64
>>> "movdqa   %%xmm2, "spill"         \n\t" \
>>> #endif
>>> ...
>>> #ifndef ARCH_X86_64
>>> "movdqa  "spill", %%xmm2          \n\t" \
>>> #endif
>>>
>>> or a
>>> MOV_ONLY_ON32" %%xmm2, ...
>>>
>>>
>>> And I think something similar can be done with ROW*
>>
>> Done. The row part is already optimal on 64 since pshufhw handles it.
>
> I meant the
>>    "movdqa   "ROW2", %%xmm4          \n\t" \
>>    "movdqa   "ROW6", %%xmm6          \n\t" \
> [...]
>>    "movdqa   "ROW0", %%xmm4          \n\t" \
>>    "movdqa   "ROW4", %%xmm6          \n\t" \
>
> they are unneeded on 64.

Oh, that. Done:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: idct_sse2_xvid.c
Type: application/octet-stream
Size: 15252 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080413/e76d35a9/attachment.obj>
-------------- next part --------------
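
For readers without the attachment, the conditional-spill idea from the
review can be sketched roughly as below. This is only an illustration, not
the code in idct_sse2_xvid.c: SAVE_XMM2, RESTORE_XMM2, SPILL_SLOT,
BUTTERFLY_STEP and the two functions are hypothetical names, and the slot
"(%0)" is a placeholder memory operand. Only ARCH_X86_64, XMMS and the
register choices (xmm12 on 64-bit, xmm2 otherwise) come from the thread.
The instruction text is built as a plain string so it can be inspected
without executing any SSE2 code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical placeholder for the memory operand used as the spill slot. */
#define SPILL_SLOT "(%0)"

/* ARCH_X86_64 is the configure-time macro quoted in the thread. When it is
 * not defined, this sketch takes the 32-bit path. */
#ifdef ARCH_X86_64
# define XMMS               "%%xmm12"  /* extra register, so no spill */
# define SAVE_XMM2(slot)    ""         /* expands to nothing on 64-bit */
# define RESTORE_XMM2(slot) ""
#else
# define XMMS               "%%xmm2"   /* only 8 XMM registers: reuse xmm2 */
# define SAVE_XMM2(slot)    "movdqa   %%xmm2, " slot "  \n\t"
# define RESTORE_XMM2(slot) "movdqa   " slot ", %%xmm2  \n\t"
#endif

/* One add/subtract butterfly using XMMS as the temporary. Adjacent string
 * literals are concatenated, so the empty 64-bit helpers vanish. */
#define BUTTERFLY_STEP(slot)            \
    SAVE_XMM2(slot)                     \
    "movdqa   %%xmm4, " XMMS "  \n\t"   \
    "psubsw   %%xmm6, %%xmm4    \n\t"   \
    "paddsw   " XMMS ", %%xmm6  \n\t"   \
    RESTORE_XMM2(slot)

/* Returns the instruction text that would be handed to the assembler. */
const char *butterfly_step_text(void)
{
    const char *text = BUTTERFLY_STEP(SPILL_SLOT);
    assert(strstr(text, XMMS) != NULL); /* the temporary is always used */
    return text;
}

/* Returns 1 if the expanded text touches the spill slot, 0 otherwise. */
int butterfly_step_spills(void)
{
    return strstr(butterfly_step_text(), SPILL_SLOT) != NULL;
}
```

Built without ARCH_X86_64, the text contains the save and restore moves
around the butterfly; built with it, both disappear and the temporary
becomes xmm12, which is the saving the review asks for.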




