[FFmpeg-devel] [PATCH] h264pred16x16 plane sse2/ssse3 optimizations

Thu Sep 30 03:17:06 CEST 2010

On Wed, Sep 29, 2010 at 08:56:13PM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> On Wed, Sep 29, 2010 at 8:51 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Tue, Sep 28, 2010 at 10:31:51PM -0400, Ronald S. Bultje wrote:
> >> +    lea          r4, [r0+r2*8-1]
> >> +    lea          r3, [r0+r2*4-1]
> >> +    add          r4, r2
> >> +
> >> +%ifdef ARCH_X86_64
> >> +%define e_reg r11
> >> +%else
> >> +%define e_reg r0
> >> +%endif
> >> +
> >
> > I see a lot of r0-1; maybe r0 could be decreased by 1 somewhere?
> 
> Yes, this is actually both smaller/simpler and also faster. Changed.
> 
> >> +    movzx     e_reg, byte [r3+r1    ]
> >> +    movzx        r5, byte [r4+r2*2  ]
> >> +    sub          r5, e_reg
> >> +    shl          r5, 2
> >> +
> >> +    movzx     e_reg, byte [r3       ]
> >> +    movzx        r6, byte [r4+r2    ]
> >> +    sub          r6, e_reg
> >> +    lea          r5, [r5+r6*4]
> >> +    sub          r5, r6
> >> +
> >> +    movzx     e_reg, byte [r3+r2    ]
> >> +    movzx        r6, byte [r4       ]
> >> +    sub          r6, e_reg
> >> +    lea          r5, [r5+r6*2]
> >> +
> >> +    movzx     e_reg, byte [r3+r2*2  ]
> >> +    movzx        r6, byte [r4+r1    ]
> >> +    sub          r6, e_reg
> >> +    add          r5, r6
> >
> > This and the shl 2 case look like they could be merged, like
> > add+shl->lea.
> 
> Also changed.
> 
> >> +    lea          r3, [r4+r2*4  ]
> >> +
> >> +    movzx     e_reg, byte [r0+r1  -1]
> >> +    movzx        r6, byte [r3+r2*2  ]
> >> +    sub          r6, e_reg
> >> +    lea          r5, [r5+r6*8]
> >> +
> >> +    movzx     e_reg, byte [r0     -1]
> >> +    movzx        r6, byte [r3+r2    ]
> >> +    sub          r6, e_reg
> >> +    lea          r5, [r5+r6*8]
> >> +    sub          r5, r6
> >
> > The *7 with lea + sub can maybe be changed to an add into the *8 case and a
> > subtract (replacing lea by add).
> >
> >> +    movzx     e_reg, byte [r0+r2  -1]
> >> +    movzx        r6, byte [r3       ]
> >> +    sub          r6, e_reg
> >> +    lea          r5, [r5+r6*4]
> >> +    lea          r5, [r5+r6*2]
> >
> > This could add into the *4 and *2 cases to replace the 2 leas by 2 adds,
> > or lea *2 into the *3 case, reducing the 2 leas to 1.
> > Similar tricks may be possible elsewhere.
> 
> I didn't quite get these two, what exactly would you like me to try?

a+=8*c
a+=8*b
a-=b

to

c+=b
a+=8*c
a-=b
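
As a sanity check of the rewrite above, here is a minimal C sketch (not part of the patch) showing that both sequences compute a + 8*c + 7*b. The function names are made up for illustration; a stands for the accumulator (r5 in the quoted code), and b and c for two byte differences, which is why the check assumes a range of -255..255.

```c
#include <assert.h>

/* Original form: three updates of the accumulator. */
static int mul7_8_orig(int a, int b, int c)
{
    a += 8 * c;
    a += 8 * b;
    a -= b;
    return a;               /* a + 8*c + 7*b */
}

/* Merged form: fold b into c first, so only one *8 step is needed. */
static int mul7_8_merged(int a, int b, int c)
{
    c += b;
    a += 8 * c;
    a -= b;
    return a;               /* a + 8*(c+b) - b = a + 8*c + 7*b */
}

/* Compare both forms over all byte differences; returns 1 if they agree. */
static int mul7_8_check(void)
{
    for (int b = -255; b <= 255; b++)
        for (int c = -255; c <= 255; c++)
            if (mul7_8_orig(0, b, c) != mul7_8_merged(0, b, c))
                return 0;
    return 1;
}
```

The merged form trades one scaled lea for a plain add, at the cost of modifying c, so c has to live in a register that is free to clobber.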

----
a+=2*b
a+=b
a+=2*c
a+=4*c

to

b+=2*c
a+=2*b
a+=b
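
The second rewrite can be checked the same way. This is again only an illustrative C sketch with hypothetical function names: both sequences should give a + 3*b + 6*c, with b and c assumed to be byte differences in -255..255.

```c
#include <assert.h>

/* Original form: *3 on b and *6 on c, four updates in total. */
static int mul3_6_orig(int a, int b, int c)
{
    a += 2 * b;
    a += b;
    a += 2 * c;
    a += 4 * c;
    return a;               /* a + 3*b + 6*c */
}

/* Merged form: fold 2*c into b, then reuse the *3 sequence once. */
static int mul3_6_merged(int a, int b, int c)
{
    b += 2 * c;
    a += 2 * b;
    a += b;
    return a;               /* a + 3*(b + 2*c) = a + 3*b + 6*c */
}

/* Compare both forms over all byte differences; returns 1 if they agree. */
static int mul3_6_check(void)
{
    for (int b = -255; b <= 255; b++)
        for (int c = -255; c <= 255; c++)
            if (mul3_6_orig(0, b, c) != mul3_6_merged(0, b, c))
                return 0;
    return 1;
}
```

Each line of the merged form maps onto a single lea or add, so the four updates shrink to three.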


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates