[FFmpeg-devel] [PATCH 02/10] x86: dcadsp: implement SSE lfe_dir

Sun Apr 6 18:35:42 CEST 2014

On 06/04/14 1:27 PM, Hendrik Leppkes wrote:
> On Sun, Apr 6, 2014 at 5:34 PM, Timothy Gu <timothygu99 at gmail.com> wrote:
>> On Apr 6, 2014 8:24 AM, "Hendrik Leppkes" <h.leppkes at gmail.com> wrote:
>>>
>>> On Fri, Feb 14, 2014 at 5:00 PM, Christophe Gisquet
>>> <christophe.gisquet at gmail.com> wrote:
>>>> Results for Arrandale/Windows:
>>>> 32: 1670 -> 316
>>>> 64:  728 -> 298
>>>> ---
>>>>  libavcodec/x86/dcadsp.asm    | 87
>> ++++++++++++++++++++++++++++++++++++++++++++
>>>>  libavcodec/x86/dcadsp_init.c |  4 ++
>>>>  2 files changed, 91 insertions(+)
>>>>
>>>> diff --git a/libavcodec/x86/dcadsp.asm b/libavcodec/x86/dcadsp.asm
>>>> index a0995c9..f4149d2 100644
>>>> --- a/libavcodec/x86/dcadsp.asm
>>>> +++ b/libavcodec/x86/dcadsp.asm
>>>> @@ -88,3 +88,90 @@ INT8X8_FMUL_INT32
>>>>
>>>>  INIT_XMM sse4
>>>>  INT8X8_FMUL_INT32
>>>> +
>>>> +; %1=v0/v1  %2=in1  %3=in2
>>>> +%macro FIR_LOOP 2-3
>>>> +.loop%1:
>>>> +%define va          m1
>>>> +%define vb          m2
>>>> +%if %1
>>>> +%define OFFSET      0
>>>> +%else
>>>> +%define OFFSET      NUM_COEF*count
>>>> +%endif
>>>> +; for v0, incrementint and for v1, decrementing
>>>> +    mova        va, [cf0q + OFFSET]
>>>> +    mova        vb, [cf0q + OFFSET + 4*NUM_COEF]
>>>> +%if %0 == 3
>>>> +    mova        m4, [cf0q + OFFSET + mmsize]
>>>> +    mova        m0, [cf0q + OFFSET + 4*NUM_COEF + mmsize]
>>>> +%endif
>>>> +    mulps       va, %2
>>>> +    mulps       vb, %2
>>>> +%if %0 == 3
>>>> +    mulps       m4, %3
>>>> +    mulps       m0, %3
>>>> +    addps       va, m4
>>>> +    addps       vb, m0
>>>> +%endif
>>>> +    ; va = va1 va2 va3 va4
>>>> +    ; vb = vb1 vb2 vb3 vb4
>>>> +%if %1
>>>> +    SWAP        va, vb
>>>> +%endif
>>>> +    mova        m4, va
>>>> +    unpcklps    va, vb ; va3 vb3 va4 vb4
>>>> +    unpckhps    m4, vb ; va1 vb1 va2 vb2
>>>> +    addps       m4, va ; va1+3 vb1+3 va2+4 vb2+4
>>>> +    movhlps     vb, m4 ; va1+3  vb1+3
>>>> +    addps       vb, m4 ; va0..4 vb0..4
>>>> +    movh    [outq + count], vb
>>>
>>> I got a complaint about a crash on a SSE-only system, and the disasm I
>>> got from the user was pointing at this exact line:
>>>
>>> ......
>>> 6A2F283F  movaps      xmm4,xmm1
>>> 6A2F2842  unpcklps    xmm1,xmm2
>>> 6A2F2845  unpckhps    xmm4,xmm2
>>> 6A2F2848  addps       xmm4,xmm1
>>> 6A2F284B  movhlps     xmm2,xmm4
>>> 6A2F284E  addps       xmm2,xmm4
>>> 6A2F2851  movq        mmword ptr [eax+ecx],xmm2  <<<
>>>
>>> The "movh" generates a movq, which according to my quick research
>>> seems to be SSE2-only, and causes an illegal instruction on the users
>>> CPU.
>>
>> That should probably be movlps. See
>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=b5161908e06b4497bf663510fb495ba97a6fd2b5
>> .
>>
> 
> I sent a patch based on that, thanks for remembering the earlier case. :)

Ideally, this should be fixed on x86inc so movh expands to movq or movlps 
depending on the requested instruction set, so this doesn't happen again.

And afaik, movq using XMM registers is what's not available on SSE. 
The problem is not that the dest operand is a memory address, but that 
integer instructions didn't get the 128bits treatment until SSE2.