[FFmpeg-devel] [PATCH] SSE2 and SSSE3 versions of h264 biweight prediction code (biweight_h264_pixels_tab)
Fri Jul 30 09:35:37 CEST 2010
On Thu, Jul 29, 2010 at 11:15:58AM -0700, Eli Friedman wrote:
> On Thu, Jul 29, 2010 at 9:23 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> > Hi,
> > On Thu, Jul 29, 2010 at 12:32 AM, Eli Friedman <eli.friedman at gmail.com> wrote:
> >> Patch attached. Loosely based off of the MMX2 version. Around 1%
> >> faster overall on a test file on my Mobile Core i5.
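For readers following the thread, the operation being vectorized is H.264 explicit weighted bi-prediction. The C sketch below is a scalar reference written from the standard formula, not FFmpeg's own C code. The argument order (dst, src, stride, log2_denom, weightd, weights, offset) is an assumption chosen to match the seven arguments declared by the cglobal lines in the patch; w and h are added so one function covers every block size, and offset is assumed to be the sum of the two per-reference offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit sample. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar sketch of weighted bi-prediction for a w x h block.
 * dst holds the first prediction on entry and the result on exit;
 * src holds the second prediction. offset is assumed to be o0 + o1. */
static void biweight_ref(uint8_t *dst, const uint8_t *src, int stride,
                         int w, int h, int log2_denom,
                         int weightd, int weights, int offset)
{
    assert(log2_denom >= 0 && log2_denom <= 7);
    /* One constant holding both 2^log2_denom and
     * ((o0 + o1 + 1) >> 1) << (log2_denom + 1). Multiplication is used
     * instead of << because offset may be negative. */
    int rnd = ((offset + 1) | 1) * (1 << log2_denom);
    for (int y = 0; y < h; y++, dst += stride, src += stride)
        for (int x = 0; x < w; x++)
            /* >> on a negative sum is arithmetic on mainstream compilers
             * (implementation-defined in C); the clip absorbs the result. */
            dst[x] = clip_uint8((dst[x] * weightd + src[x] * weights + rnd)
                                >> (log2_denom + 1));
}
```

With log2_denom = 5, weights 32/32 and offset 0 this reduces to a rounded average of the two predictions, so a pixel pair of 0 and 255 yields 128.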
> > [..]
> >> +cglobal h264_biweight_8x8_ssse3, 7, 7, 8
> >> +    BIWEIGHT_SSSE3_SETUP
> >> +    mov        r3, 4
> >> +
> >> +.nextrow
> >> +    BIWEIGHT_SSSE3_OP r2
> >> +    movh       [r0], m0
> >> +    movhps     [r0+r2], m0
> >> +    lea        r0, [r0+r2*2]
> >> +    lea        r1, [r1+r2*2]
> >> +    dec        r3
> >> +    jnz .nextrow
> >> +    REP_RET
> > You have several unused r%d regs here; maybe you want to use lea r4,
> > [r2*2] and then add r0/r1, r4 instead of lea. That should result
> > in slightly smaller code. Same for h264_biweight_8x8_sse2.
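A rough C analogue of the loop shape under discussion: the 8x8 function handles two rows per pass (movh stores the first row, movhps the second), runs four passes, and advances both pointers by twice the stride. Computing that doubled stride once, as suggested with lea r4, [r2*2], turns each per-iteration lea into a plain add. row_op is a hypothetical stand-in for the SIMD body (a rounded average here) so the sketch stays self-contained; it illustrates control flow only and says nothing about code size, which only the assembled output can show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-row body standing in for BIWEIGHT_SSSE3_OP:
 * any per-pixel operation shows the loop shape; a rounded average is used. */
static void row_op(uint8_t *dst, const uint8_t *src, int n)
{
    for (int x = 0; x < n; x++)
        dst[x] = (uint8_t)((dst[x] + src[x] + 1) >> 1);
}

/* 8x8 block, two rows per pass, mirroring the asm loop structure. */
static void block8x8_two_rows(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    assert(stride >= 8);
    const ptrdiff_t stride2 = stride * 2;      /* lea r4, [r2*2], done once */
    for (int n = 4; n != 0; n--) {             /* mov r3, 4 ... dec r3 / jnz */
        row_op(dst, src, 8);                   /* first row (movh store) */
        row_op(dst + stride, src + stride, 8); /* second row (movhps store) */
        dst += stride2;                        /* add r0, r4 */
        src += stride2;                        /* add r1, r4 */
    }
}
```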
> Will do.
> >> +%macro BIWEIGHT_SSSE3_OP 1
> >> +    movh       m0, [r0]
> >> +    movh       m1, [r1]
> >> +    movh       m2, [r0+%1]
> >> +    movh       m3, [r1+%1]
> >> +    punpcklbw  m0, m1
> >> +    punpcklbw  m2, m3
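The excerpt stops after the two punpcklbw instructions. A common way to finish an SSSE3 biweight is pmaddubsw against a register of interleaved (weightd, weights) byte pairs, so each 16-bit lane becomes dst[i] * weightd + src[i] * weights in a single instruction. Whether BIWEIGHT_SSSE3_SETUP builds exactly that register is an assumption, since it is not quoted. The sketch emulates both instructions in scalar C; punpcklbw_emu, pmaddubsw_emu and weighted_sums8 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* punpcklbw on the low 8 bytes: a0 b0 a1 b1 ... a7 b7 */
static void punpcklbw_emu(uint8_t out[16], const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* Signed 16-bit saturation, as pmaddubsw applies to each pair sum. */
static int16_t sat16(int v)
{
    return (int16_t)(v < INT16_MIN ? INT16_MIN : v > INT16_MAX ? INT16_MAX : v);
}

/* pmaddubsw: unsigned bytes of u times signed bytes of s,
 * adjacent products summed into one saturated int16 lane. */
static void pmaddubsw_emu(int16_t out[8], const uint8_t u[16], const int8_t s[16])
{
    for (int i = 0; i < 8; i++)
        out[i] = sat16(u[2 * i] * s[2 * i] + u[2 * i + 1] * s[2 * i + 1]);
}

/* Hypothetical 8-pixel step built from the two emulated instructions. */
static void weighted_sums8(int16_t out[8], const uint8_t dst[8],
                           const uint8_t src[8], int weightd, int weights)
{
    int8_t w[16];
    uint8_t il[16];
    assert(weightd >= -128 && weightd <= 127 && weights >= -128 && weights <= 127);
    for (int i = 0; i < 8; i++) {
        w[2 * i]     = (int8_t)weightd;   /* pairs with the dst byte */
        w[2 * i + 1] = (int8_t)weights;   /* pairs with the src byte */
    }
    punpcklbw_emu(il, dst, src);
    pmaddubsw_emu(out, il, w);
}
```

Because pmaddubsw saturates, this only matches the exact sums when they stay in int16 range. H.264 limits each weight to [-128, 127] and, for bi-prediction, their sum to roughly [-128, 128], which bounds every pair sum to a magnitude of 255 * 128 = 32640.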
> > If you don't use m1/m3 afterwards, you can IIRC just punpcklbw m0,
> > [r0+%1], and the same for the line below.
> I don't have appropriate alignment for the 8x8 case, but I suppose I
> can do it in the 16x16 case.
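The alignment concern follows from how legacy-encoded SSE instructions treat memory operands: punpcklbw xmm, m128 requires a 16-byte-aligned address even though only the low 8 bytes are used, whereas movh (movq) and movhps read 8 bytes with no alignment requirement. An 8x8 block can start 8 bytes into a 16-byte-aligned row, so its rows cannot safely be used as direct operands. The helpers below are a hypothetical way to express that check in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True if p may be used as a 128-bit memory operand of a legacy SSE op. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Hypothetical helper: is every row of a block starting at p aligned? */
static int rows_aligned16(const uint8_t *p, ptrdiff_t stride, int rows)
{
    assert(rows >= 0);
    for (int y = 0; y < rows; y++)
        if (!is_aligned16(p + y * stride))
            return 0;
    return 1;
}
```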
> I'll wait for other comments before updating.
Don't wait for mine if Ronald and Jason have already reviewed this.
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The misfortune of the wise is better than the prosperity of the fool.