[FFmpeg-devel] [PATCH] Add x86-optimized versions of lshift_tab().

Måns Rullgård mans
Sat Feb 12 21:30:46 CET 2011


"Ronald S. Bultje" <rsbultje at gmail.com> writes:

> Hi,
>
> On Sat, Feb 12, 2011 at 2:31 PM, Justin Ruggles
> <justin.ruggles at gmail.com> wrote:
>> New function name AC3DSPContext.ac3_lshift_int16().
>> ---
>>  libavcodec/ac3dsp.c         |   11 +++++++++++
>>  libavcodec/ac3dsp.h         |   11 +++++++++++
>>  libavcodec/ac3enc_fixed.c   |   19 +------------------
>>  libavcodec/x86/ac3dsp.asm   |   35 +++++++++++++++++++++++++++++++++++
>>  libavcodec/x86/ac3dsp_mmx.c |    7 +++++++
>>  5 files changed, 65 insertions(+), 18 deletions(-)
> [..]
>> +    /**
>> +     * Left-shift each value in an array of int16_t by a specified amount.
>> +     * @param src    input array
>> +     *               constraints: align 16
>> +     * @param len    number of values in the array
>> +     *               constraints: multiple of 32 greater than 0
>> +     * @param shift  left shift amount
>> +     *               constraints: range [0,15]
>> +     */
>> +    void (*ac3_lshift_int16)(int16_t *src, int len, unsigned int shift);
>
> See below on this.
>
>> +cglobal ac3_lshift_int16_%1, 3,3,5, src, offset, shift
>> +    cmp    shiftd, 0
>> +    je .end
>> +    shl   offsetq, 1
>> +    sub   offsetq, mmsize*4
>> +    movd       m0, shiftd
>> +.loop:
>> +    mova       m1, [srcq+offsetq         ]
>> +    mova       m2, [srcq+offsetq+mmsize  ]
>> +    mova       m3, [srcq+offsetq+mmsize*2]
>> +    mova       m4, [srcq+offsetq+mmsize*3]
>> +    psllw      m1, m0
>> +    psllw      m2, m0
>> +    psllw      m3, m0
>> +    psllw      m4, m0
>> +    mova  [srcq+offsetq         ], m1
>> +    mova  [srcq+offsetq+mmsize  ], m2
>> +    mova  [srcq+offsetq+mmsize*2], m3
>> +    mova  [srcq+offsetq+mmsize*3], m4
>> +    sub   offsetq, mmsize*4
>> +    jge .loop
>> +.end:
>> +    RET
>> +%endmacro
>> +
>> +INIT_MMX
>> +AC3_LSHIFT_INT16 mmx
>> +INIT_XMM
>> +AC3_LSHIFT_INT16 sse2
>
> Doesn't this do 64 per loop iteration for sse2? If so, doesn't that
> conflict with the function definition and/or overflow?

64 bytes, 32 int16_t elements.
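
To spell out the arithmetic: each iteration touches four registers of
mmsize bytes. With SSE2 (mmsize = 16) that is 64 bytes, or 32 int16_t
values; with MMX (mmsize = 8) it is 32 bytes, or 16 values. Both divide
a len that is a multiple of 32. A rough C sketch of the same count,
plus a scalar reference for the shift, follows. The names
elems_per_iteration() and lshift_int16_ref() are made up for
illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of int16_t elements handled per loop
 * iteration when four registers of reg_bytes (mmsize) bytes each are
 * loaded, shifted, and stored. */
static size_t elems_per_iteration(size_t reg_bytes)
{
    return 4 * reg_bytes / sizeof(int16_t);
}

/* Hypothetical scalar reference for the documented behaviour:
 * left-shift every int16_t in src by shift bits, in place.
 * The asserts mirror the constraints stated in the doxygen comment. */
static void lshift_int16_ref(int16_t *src, int len, unsigned int shift)
{
    assert(len > 0 && len % 32 == 0);
    assert(shift <= 15);

    for (int i = 0; i < len; i++) {
        /* Shift as unsigned and truncate to 16 bits, so bits shifted
         * out the top are discarded as psllw does. */
        src[i] = (int16_t)(uint16_t)((uint16_t)src[i] << shift);
    }
}
```

elems_per_iteration(16) gives 32 and elems_per_iteration(8) gives 16,
so neither version reads or writes past an array that meets the
documented len constraint.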

-- 
Måns Rullgård
mans at mansr.com