[Ffmpeg-devel] gcc4 support & MMX fixups (from Debian)

Paweł Sikora pluto
Tue Jan 31 21:17:58 CET 2006


Hi all,

I have an implementation of transpose4x4 in C which uses gcc's vector
extensions. It puts less pressure on the register allocator and lets
the compiler schedule the code optimally.

Instantiating the attached patch, e.g. as foo(dst, src, 4, 4),
gives a nice piece of code:

[ x86-64 example ]

foo:    movd        4(%rsi), %mm0
        movd        (%rsi), %mm1
        movd        8(%rsi), %mm2
        movd        12(%rsi), %mm3
        punpcklbw   %mm0, %mm1
        punpcklbw   %mm3, %mm2
        movq        %mm1, %mm0
        punpckhwd   %mm2, %mm1
        punpcklwd   %mm2, %mm0
        movd        %mm1, 8(%rdi)
        punpckhdq   %mm1, %mm1
        movd        %mm0, (%rdi)
        punpckhdq   %mm0, %mm0
        movd        %mm1, 12(%rdi)
        movd        %mm0, 4(%rdi)
        ret

Actually, gcc-4.1 has a good optimizer and produces good asm. Hardcoding
the asm doesn't give any significant performance boost; it only
degrades code scheduling.
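The attached patch was scrubbed from this archive, so the following is not Pawel's code. It is a minimal sketch of the same idea under stated assumptions: a 4x4 byte transpose written with GCC's generic vector extensions, where each shuffle mirrors one punpcklbw / punpcklwd / punpckhwd step in the listing above and instruction selection is left to the compiler. The names `transpose4x4_vec`, `load_row` and `SHUF8` are hypothetical. It assumes a compiler with `__builtin_shuffle` (GCC 4.7 or later) or clang's `__builtin_shufflevector`, so it would not build with the gcc-4.1 discussed here.

```c
#include <stdint.h>
#include <string.h>

/* 8-byte generic vector: the same width as one MMX register. */
typedef uint8_t v8u8 __attribute__((vector_size(8)));

/* Two-input byte shuffle; indices 0..7 pick from a, 8..15 pick from b.
 * Assumption: GCC >= 4.7 (__builtin_shuffle) or clang (__builtin_shufflevector). */
#if defined(__clang__)
#define SHUF8(a, b, i0, i1, i2, i3, i4, i5, i6, i7) \
    __builtin_shufflevector((a), (b), i0, i1, i2, i3, i4, i5, i6, i7)
#else
#define SHUF8(a, b, i0, i1, i2, i3, i4, i5, i6, i7) \
    __builtin_shuffle((a), (b), (v8u8){i0, i1, i2, i3, i4, i5, i6, i7})
#endif

/* Hypothetical helper: load one 4-byte row into the low half, zeroing the
 * high half (the effect of movd). */
static inline v8u8 load_row(const uint8_t *p)
{
    v8u8 v = {0};
    memcpy(&v, p, 4);
    return v;
}

/* Hypothetical name; argument order follows foo(dst, src, 4, 4) above. */
static inline void transpose4x4_vec(uint8_t *dst, const uint8_t *src,
                                    int dst_stride, int src_stride)
{
    v8u8 r0 = load_row(src);                  /* a0 a1 a2 a3 */
    v8u8 r1 = load_row(src + src_stride);     /* b0 b1 b2 b3 */
    v8u8 r2 = load_row(src + 2 * src_stride); /* c0 c1 c2 c3 */
    v8u8 r3 = load_row(src + 3 * src_stride); /* d0 d1 d2 d3 */

    /* Like punpcklbw: interleave bytes -> a0 b0 a1 b1 a2 b2 a3 b3 */
    v8u8 ab = SHUF8(r0, r1, 0, 8, 1, 9, 2, 10, 3, 11);
    v8u8 cd = SHUF8(r2, r3, 0, 8, 1, 9, 2, 10, 3, 11);

    /* Like punpcklwd / punpckhwd: interleave 16-bit pairs, giving
     * c01 = a0 b0 c0 d0 a1 b1 c1 d1 and c23 = a2 b2 c2 d2 a3 b3 c3 d3 */
    v8u8 c01 = SHUF8(ab, cd, 0, 1, 8, 9, 2, 3, 10, 11);
    v8u8 c23 = SHUF8(ab, cd, 4, 5, 12, 13, 6, 7, 14, 15);

    /* Each 4-byte half is one output row (the movd stores). */
    memcpy(dst,                  (const uint8_t *)&c01,     4);
    memcpy(dst + dst_stride,     (const uint8_t *)&c01 + 4, 4);
    memcpy(dst + 2 * dst_stride, (const uint8_t *)&c23,     4);
    memcpy(dst + 3 * dst_stride, (const uint8_t *)&c23 + 4, 4);
}
```

Called as transpose4x4_vec(dst, src, 4, 4) this corresponds to the foo(dst, src, 4, 4) case; whether a given compiler emits the same punpck sequence depends on its version and target flags.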

BR,
Pawel.

-- 
to_be || !to_be == 1, to_be | ~to_be == -1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ffmpeg-gcc4.patch
Type: text/x-diff
Size: 1774 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20060131/7a802117/attachment.patch>