[FFmpeg-devel] [PATCH] MMX2/SSSE3 VC1 loop filter

Mon Jul 5 23:19:58 CEST 2010

On Jul 5, 2010, at 5:02 PM, Jason Garrett-Glaser wrote:

> On Mon, Jul 5, 2010 at 1:30 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> Hi,
>> 
>> On Mon, Jul 5, 2010 at 1:44 AM, David Conrad <lessen42 at gmail.com> wrote:
>>> Updated to patch cleanly, compile, and added mmx/sse2 versions
>> [..]
>>> +SECTION_RODATA
>>> +pw_4: times 8 dw 4
>>> +pw_5: times 8 dw 5
>> 
>> cextern pw_4, pw_5 (i.e. use the ones in dsputil_mmx.c) maybe?
>> 
>>> +; low, high (src), zero
>>> +%macro UNPACK2 4
>>> +    mova      m%2, m%3
>>> +    punpckh%1 m%3, m%4
>>> +    punpckl%1 m%2, m%4
>>> +%endmacro
>> 
>> duplicate of SBUTTERFLY in x86util.asm, maybe?
>> 
>>> +%macro STORE_4_WORDS_MMX 6
>>> +    movd   %6, %5
>>> +%if mmsize==16
>>> +    psrldq %5, 4
>>> +%else
>>> +    psrlq  %5, 32
>>> +%endif
>>> +    mov    %1, %6w
>>> +    shr    %6, 16
>>> +    mov    %2, %6w
>>> +    movd   %6, %5
>>> +    mov    %3, %6w
>>> +    shr    %6, 16
>>> +    mov    %4, %6w
>>> +%endmacro
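
For anyone following along without the asm in their head, the MMX case of the macro above can be modelled in plain C roughly as follows. This is only an illustrative sketch: the function name store_4_words_scalar and the d0..d3 pointer parameters are made up for the example and are not part of the patch, and the real macro interleaves the vector shift with the first store rather than doing it in this order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of STORE_4_WORDS_MMX (mmsize == 8 case).
 * reg stands in for the 64-bit MMX register (%5); d0..d3 stand in for
 * the four memory destinations (%1..%4), one 16-bit word per row. */
static void store_4_words_scalar(uint8_t *d0, uint8_t *d1,
                                 uint8_t *d2, uint8_t *d3,
                                 uint64_t reg)
{
    uint32_t gpr;
    uint16_t w;

    gpr = (uint32_t)reg;                 /* movd  %6, %5  (low dword)  */
    w = (uint16_t)gpr;
    memcpy(d0, &w, sizeof(w));           /* mov   %1, %6w              */
    gpr >>= 16;                          /* shr   %6, 16               */
    w = (uint16_t)gpr;
    memcpy(d1, &w, sizeof(w));           /* mov   %2, %6w              */

    reg >>= 32;                          /* psrlq %5, 32               */
    gpr = (uint32_t)reg;                 /* movd  %6, %5  (high dword) */
    w = (uint16_t)gpr;
    memcpy(d2, &w, sizeof(w));           /* mov   %3, %6w              */
    gpr >>= 16;                          /* shr   %6, 16               */
    w = (uint16_t)gpr;
    memcpy(d3, &w, sizeof(w));           /* mov   %4, %6w              */
}
```

Each word costs a register move or shift plus a 16-bit store, which is the overhead the dword-store question below is getting at.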
>> 
>> For VP8 H loopfilter, I save the neighbouring two rows (p1/q1) and
>> write the four out as dwords using movd at once from the mm register,
>> have you tried that (I'm not asking you to rewrite it if you didn't),
>> and if so, is it faster?
>> 
>> (I suppose this isn't very practical because of the SSE4 version below...)
>> 
>>> +%macro STORE_4_WORDS_SSE4 6
>>> +    pextrw %1, %5, %6+0
>>> +    pextrw %2, %5, %6+1
>>> +    pextrw %3, %5, %6+2
>>> +    pextrw %4, %5, %6+3
>>> +%endmacro
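
As a rough C model of what the pextrw variant above does: each instruction picks one 16-bit lane of the register by index and writes it straight to memory, with no trip through a general-purpose register. The names store_4_words_extract, lanes and base are hypothetical and only there for illustration; lanes stands in for the eight words of an xmm register and base for the macro's sixth argument.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of STORE_4_WORDS_SSE4.
 * lanes[] stands in for the 8 words of the xmm register (%5),
 * base for the starting word index (%6), d0..d3 for %1..%4. */
static void store_4_words_extract(uint8_t *d0, uint8_t *d1,
                                  uint8_t *d2, uint8_t *d3,
                                  const uint16_t lanes[8], int base)
{
    memcpy(d0, &lanes[base + 0], sizeof(uint16_t)); /* pextrw %1, %5, %6+0 */
    memcpy(d1, &lanes[base + 1], sizeof(uint16_t)); /* pextrw %2, %5, %6+1 */
    memcpy(d2, &lanes[base + 2], sizeof(uint16_t)); /* pextrw %3, %5, %6+2 */
    memcpy(d3, &lanes[base + 3], sizeof(uint16_t)); /* pextrw %4, %5, %6+3 */
}
```

Calling it once with base 0 and once with base 4 would cover all eight words of a 128-bit register.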
>> [..]
> 
> I don't recall pextrw being SSE4...

The form with a memory destination is