[FFmpeg-devel] [PATCH] move H264 mmx2 (bi)weight functions to yasm

Ronald S. Bultje rsbultje
Wed Sep 1 20:01:42 CEST 2010


Hi,

On Wed, Sep 1, 2010 at 11:56 AM, Aurelien Jacobs <aurel at gnuage.org> wrote:
> On Wed, Sep 01, 2010 at 09:58:51AM -0400, Ronald S. Bultje wrote:
>> --- ffmpeg-svn.orig/libavcodec/x86/Makefile   2010-09-01 09:21:59.000000000 -0400
>> +++ ffmpeg-svn/libavcodec/x86/Makefile        2010-09-01 09:22:30.000000000 -0400
>> @@ -10,7 +10,7 @@
>>
>>  MMX-OBJS-$(CONFIG_H264DSP)             += x86/h264dsp_mmx.o
>>  YASM-OBJS-$(CONFIG_H264DSP)            += x86/h264_deblock_sse2.o       \
>> -                                          x86/h264_weight_sse2.o        \
>> +                                          x86/h264_weight.o        \
>
> Alignment...

Fixed.

Jason complained that I hadn't merged the mmx2 and sse2 versions (which
is why yasm is so awesome), and also suggested I write sse2/ssse3
versions for the non-square (8x16/16x8/8x4) block sizes. The attached
patch tries to do all that (Jason pretty bluntly said he'd refuse a
patch that doesn't do it all at once).
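
For anyone reviewing without the asm in their head: below is a rough
scalar C sketch of what a unidirectional weight function computes for an
arbitrary w x h block, which is why one implementation can serve the
square and non-square sizes alike. It is an illustration only, not the
code in the patch; weight_block_ref and clip_uint8 are hypothetical
names, the biweight case is not covered, and the rounding follows the
H.264 explicit weighted prediction formula as I understand it.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit sample. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar reference: weight a w x h block of 8-bit samples
 * in place. Assumes >> on a negative int is an arithmetic shift, which
 * holds on the usual x86 compilers but is implementation-defined in C. */
static void weight_block_ref(uint8_t *block, int stride, int w, int h,
                             int log2_denom, int weight, int offset)
{
    /* Pre-scale the offset and fold in the rounding term, so the inner
     * loop is a single multiply, add and shift per sample. */
    offset *= 1 << log2_denom;
    if (log2_denom)
        offset += 1 << (log2_denom - 1);

    for (int y = 0; y < h; y++, block += stride)
        for (int x = 0; x < w; x++)
            block[x] = clip_uint8((block[x] * weight + offset) >> log2_denom);
}
```

Only w and h change between 16x16, 16x8, 8x16, 8x8 and 8x4, so a
reference like this is handy for comparing against the SIMD output.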

Anyone fancy profiling this? I only ran make fate-h264 on x86-64 and
x86-32, with SSSE3 disabled and with both SSSE3 and SSE2 disabled, to
ensure the output is correct. It's probably faster, but I haven't
measured it.

Right now, with all my patches applied, some H264 loopfilter and idct
code is still inline asm (as is qpel, which comes in through the
dsputil_mmx.c include); the rest of the H264 asm is all yasm. Let's try
to get rid of the last pieces!

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: yasmify-h264-biweight.patch
Type: application/octet-stream
Size: 17182 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100901/8211149e/attachment.obj>


