[FFmpeg-devel] [PATCH] SSE2 and SSSE3 versions of h264 biweight prediction code (biweight_h264_pixels_tab)

Eli Friedman eli.friedman
Thu Jul 29 20:15:58 CEST 2010


On Thu, Jul 29, 2010 at 9:23 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Thu, Jul 29, 2010 at 12:32 AM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> Patch attached. Loosely based off of the MMX2 version. Around 1%
>> faster overall on a test file on my Mobile Core i5.
> [..]
>> +cglobal h264_biweight_8x8_ssse3, 7, 7, 8
>> +    BIWEIGHT_SSSE3_SETUP
>> +    mov        r3, 4
>> +
>> +.nextrow
>> +    BIWEIGHT_SSSE3_OP r2
>> +    movh       [r0], m0
>> +    movhps     [r0+r2], m0
>> +    lea        r0, [r0+r2*2]
>> +    lea        r1, [r1+r2*2]
>> +    dec        r3
>> + ? ?jnz .nextrow
>> + ? ?REP_RET
>
> You have several unused r%d regs here, maybe you want to use lea r4,
> [r2*2] and then use add r0/r1, r4 instead of lea, that should result
> in slightly smaller code. Same for h264_biweight_8x8_sse2.

Will do.
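
For reference, a rough, untested sketch of how the 8x8 SSSE3 loop might look with that change. It assumes r4 is no longer needed once BIWEIGHT_SSSE3_SETUP has run; the setup macro is not quoted above, so that is an assumption rather than something checked against the patch.

```asm
cglobal h264_biweight_8x8_ssse3, 7, 7, 8
    BIWEIGHT_SSSE3_SETUP
    mov        r3, 4
    lea        r4, [r2*2]       ; two-row stride, computed once (assumes r4 is free after SETUP)

.nextrow
    BIWEIGHT_SSSE3_OP r2
    movh       [r0], m0
    movhps     [r0+r2], m0
    add        r0, r4           ; replaces lea r0, [r0+r2*2]
    add        r1, r4           ; replaces lea r1, [r1+r2*2]
    dec        r3               ; dec sets ZF after the adds, so jnz still tests the row counter
    jnz .nextrow
    REP_RET
```

The saving comes from the shorter encoding of a register-register add compared with a lea that needs a SIB byte.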

>> +%macro BIWEIGHT_SSSE3_OP 1
>> +    movh       m0, [r0]
>> +    movh       m1, [r1]
>> +    movh       m2, [r0+%1]
>> +    movh       m3, [r1+%1]
>> +    punpcklbw  m0, m1
>> +    punpcklbw  m2, m3
>
> If you don't use m1/m3 afterwards, you can IIRC just punpcklbw m0,
> [r0+%1] and same for the line below.

I don't have appropriate alignment for the 8x8 case, but I suppose I
can do it in the 16x16 case.
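
To spell out the constraint: with an xmm destination, the memory form of punpcklbw reads a full 16 bytes and faults unless the address is 16-byte aligned, while rows of an 8-pixel-wide block are only guaranteed 8-byte alignment. A hypothetical sketch of the two cases follows. The 16x16 half is illustrative only and assumes both [r0] and [r1] are 16-byte aligned there; it is not taken from the patch.

```asm
    ; 8x8: rows may be only 8-byte aligned, so keep the explicit 8-byte loads
    movh       m0, [r0]
    movh       m1, [r1]
    punpcklbw  m0, m1

    ; 16x16 (assuming 16-byte-aligned rows): fold the second load into the unpack
    mova       m0, [r0]
    mova       m2, m0
    punpcklbw  m0, [r1]         ; low 8 bytes of each row, interleaved
    punpckhbw  m2, [r1]         ; high 8 bytes of each row, interleaved
```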

I'll wait for other comments before updating.

-Eli


