[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm

Ronald S. Bultje rsbultje
Mon Sep 27 18:15:16 CEST 2010


Hi,

On Fri, Sep 24, 2010 at 9:40 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Fri, Sep 24, 2010 at 07:33:11PM -0400, Ronald S. Bultje wrote:
>> Yeah, I over-enthusiastically screwed up here, sorry. First patch should still be ok,
>> I'll ask on gcc-list how to write a constant without the $. Without that, it'll be hard to
>> get the last 10 cycles off, I'm afraid...
>
> try %a0 and %c0 with "i" it produces a constant without $
> %n0 will produce a negated one
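
A minimal sketch of the %c and %n operand modifiers suggested above, for readers who have not used them. ROW_STRIDE and both function names are made up for illustration and do not come from the patches. The asm path assumes GCC-style extended asm on x86; elsewhere the functions fall back to plain C. The "memory" clobber is just a coarse way of telling the compiler that memory is read.

```c
#include <stdint.h>

#define ROW_STRIDE 8  /* hypothetical: elements per row, known at compile time */

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_GNU_ASM 1
#else
#define HAVE_X86_GNU_ASM 0
#endif

/* Load base[ROW_STRIDE]. %c2 prints the "i" operand as a bare number,
 * so the assembler sees a displacement such as 32(%reg), not $32(%reg). */
static int32_t load_row_below(const int32_t *base)
{
#if HAVE_X86_GNU_ASM
    int32_t out;
    __asm__ ("movl %c2(%1), %0"
             : "=r"(out)
             : "r"(base), "i"(ROW_STRIDE * 4)
             : "memory");
    return out;
#else
    return base[ROW_STRIDE];
#endif
}

/* Load base[-ROW_STRIDE]. %n2 prints the same constant negated, again
 * without the '$' prefix, giving a displacement such as -32(%reg). */
static int32_t load_row_above(const int32_t *base)
{
#if HAVE_X86_GNU_ASM
    int32_t out;
    __asm__ ("movl %n2(%1), %0"
             : "=r"(out)
             : "r"(base), "i"(ROW_STRIDE * 4)
             : "memory");
    return out;
#else
    return base[-ROW_STRIDE];
#endif
}
```

With a plain %2 the constant would be printed as $32, which is valid as an immediate but not as a displacement in front of a parenthesized base register.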

827 dezicycles in lf-strength, 4194155 runs, 149 skips

\o/ (on x86-64 above, so this is in fact 3 cycles faster than what I
got using yasm).

Patches attached, passes make fate-h264 on x86-64 and x86-32. Needs
testing on icc and clang.

fix-lfstrength-inline-asm.patch
inlines the dir loop in h264_loop_filter_strength_mmx2() - same as
what I sent earlier. 60-70% of the speed increase comes from here.

fix-lfstrength-unrollloop.patch
unrolls the bidir loop inside h264_loop_filter_strength_mmx2() (the
only part which changes d_idx) - this is required to make d_idx a
constant offset rather than calculating it in-code

fix-lfstrength-removevars.patch
removes d_idx, makes all offsets constant - preparation for the below patches
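
To illustrate why unrolling comes first, here is a plain-C sketch. It assumes d_idx is a per-direction neighbour offset (-STRIDE for one direction, -1 for the other); that is an assumption made for the example, and STRIDE, the function names and the comparison itself are hypothetical rather than taken from the patches.

```c
#include <stdint.h>

#define STRIDE 8  /* hypothetical row stride of the cache being scanned */

/* Looped form: d_idx is a run-time variable, so every access has to
 * compute ref + i + d_idx from a register. */
static int count_diff_looped(const int8_t *ref, int n)
{
    int total = 0;
    for (int dir = 1; dir >= 0; dir--) {
        const int d_idx = dir ? -STRIDE : -1;
        for (int i = 0; i < n; i++)
            total += ref[i] != ref[i + d_idx];
    }
    return total;
}

/* One pass with a literal offset; D_IDX must be a compile-time constant. */
#define COUNT_DIFF_PASS(ref, n, D_IDX, total)              \
    do {                                                   \
        for (int i = 0; i < (n); i++)                      \
            (total) += (ref)[i] != (ref)[i + (D_IDX)];     \
    } while (0)

/* Unrolled form: the direction loop is written out twice, so each copy
 * sees its offset as a constant that can be folded into the address. */
static int count_diff_unrolled(const int8_t *ref, int n)
{
    int total = 0;
    COUNT_DIFF_PASS(ref, n, -STRIDE, total);
    COUNT_DIFF_PASS(ref, n, -1, total);
    return total;
}
```

Both functions return the same count; only the unrolled one lets the offset become part of the instruction encoding, which is what the later patches rely on.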

fix-lfstrength-removemask.patch
removes mask_dir - minor speed increase, not really related to anything else

fix-lfstrength-remove-edgevar.patch
removes the edge and b_idx variable duplication, and rewrites all
expressions that used them as direct constant offsets in the asm, i.e.
off(addr,idx,size) in AT&T syntax, or [addr+idx*size+off] in yasm
syntax. This removes all leas from the code, which accounts for the
remainder of the speed increase.
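
A rough sketch of the addressing form described above, assuming GCC-style extended asm on x86 with a plain-C fallback elsewhere. EDGE_OFF and the function names are hypothetical and only stand in for the constant offsets used in the patch.

```c
#include <stdint.h>

#define EDGE_OFF 12  /* hypothetical constant byte offset (3 int32 elements) */

/* Reference: what the folded address computes, written in plain C. */
static int32_t load_reference(const int32_t *base, intptr_t idx)
{
    return *(const int32_t *)((const char *)base + idx * 4 + EDGE_OFF);
}

/* Folded form: base, index, scale and constant offset all live in one
 * memory operand, so no separate lea is needed to build the address. */
static int32_t load_folded(const int32_t *base, intptr_t idx)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int32_t out;
    __asm__ ("movl %c3(%1,%2,4), %0"   /* off(addr,idx,size) */
             : "=r"(out)
             : "r"(base), "r"(idx), "i"(EDGE_OFF)
             : "memory");              /* coarse: the asm reads memory */
    return out;
#else
    return load_reference(base, idx);
#endif
}
```

The idx argument is an intptr_t so that the index register has pointer width on both x86-32 and x86-64.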

I might not have marked all memory-clobbers correctly ("r" everywhere
seems to work), so review would be good here.
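
For reviewers, a small sketch of the concern: when a pointer is passed with "r", the compiler only knows the asm uses the register value, not that it touches the memory behind it. The two functions below are hypothetical and not from the patches; they show two ways of declaring a store, assuming GCC-style extended asm on x86 with a plain-C fallback elsewhere.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_GNU_ASM 1
#else
#define HAVE_X86_GNU_ASM 0
#endif

/* Pointer passed in a register: the "memory" clobber is what tells the
 * compiler that *dst may have changed after the asm statement. */
static void store_via_reg(int32_t *dst, int32_t v)
{
#if HAVE_X86_GNU_ASM
    __asm__ volatile ("movl %1, (%0)"
                      :
                      : "r"(dst), "r"(v)
                      : "memory");
#else
    *dst = v;
#endif
}

/* Destination declared as an "=m" output: the compiler knows exactly
 * which object is written, so no blanket clobber is needed. */
static void store_via_mem(int32_t *dst, int32_t v)
{
#if HAVE_X86_GNU_ASM
    __asm__ ("movl %1, %0"
             : "=m"(*dst)
             : "r"(v));
#else
    *dst = v;
#endif
}
```

Code that uses "r" without either form can still appear to work, because whether the compiler caches or reorders the affected loads and stores depends on the surrounding code and optimization level.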

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-lfstrength-unrollloop.patch
Type: application/octet-stream
Size: 2268 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100927/aecf634a/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-lfstrength-removevars.patch
Type: application/octet-stream
Size: 7198 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100927/aecf634a/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-lfstrength-removemask.patch
Type: application/octet-stream
Size: 2032 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100927/aecf634a/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-lfstrength-remove-edgevar.patch
Type: application/octet-stream
Size: 8037 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100927/aecf634a/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-lfstrength-inline-asm.patch
Type: application/octet-stream
Size: 2946 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100927/aecf634a/attachment-0004.obj>


