[FFmpeg-devel] [PATCH] Add x86-optimized versions of lshift_tab().

Ronald S. Bultje rsbultje
Sat Feb 12 21:20:25 CET 2011


Hi,

On Sat, Feb 12, 2011 at 2:31 PM, Justin Ruggles
<justin.ruggles at gmail.com> wrote:
> New function name AC3DSPContext.ac3_lshift_int16().
> ---
>  libavcodec/ac3dsp.c         |   11 +++++++++++
>  libavcodec/ac3dsp.h         |   11 +++++++++++
>  libavcodec/ac3enc_fixed.c   |   19 +------------------
>  libavcodec/x86/ac3dsp.asm   |   35 +++++++++++++++++++++++++++++++++++
>  libavcodec/x86/ac3dsp_mmx.c |    7 +++++++
>  5 files changed, 65 insertions(+), 18 deletions(-)
[..]
> +    /**
> +     * Left-shift each value in an array of int16_t by a specified amount.
> +     * @param src    input array
> +     *               constraints: align 16
> +     * @param len    number of values in the array
> +     *               constraints: multiple of 32 greater than 0
> +     * @param shift  left shift amount
> +     *               constraints: range [0,15]
> +     */
> +    void (*ac3_lshift_int16)(int16_t *src, int len, unsigned int shift);

See below on this.
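
For reference, the contract documented above can be sketched as a scalar C loop. This is only an illustration and not the code from the patch: the name ac3_lshift_int16_ref is hypothetical, and the wrap-to-16-bits behaviour is an assumption chosen to match what psllw does on each word.

```c
#include <stdint.h>

/* Hypothetical scalar reference for the documented contract; not the
 * C fallback from the patch. Assumes len > 0 and shift in [0,15]. */
static void ac3_lshift_int16_ref(int16_t *src, int len, unsigned int shift)
{
    int i;
    for (i = 0; i < len; i++) {
        /* Shift the unsigned bit pattern so negative inputs do not hit
         * undefined behaviour, then truncate to 16 bits, which mirrors
         * the per-word wraparound of psllw. The final conversion to
         * int16_t assumes the usual two's-complement wrap. */
        uint16_t v = (uint16_t)src[i];
        src[i] = (int16_t)(uint16_t)(v << shift);
    }
}
```

A SIMD version could be checked against such a loop over buffers whose length is a multiple of 32.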

> +cglobal ac3_lshift_int16_%1, 3,3,5, src, offset, shift
> +    cmp    shiftd, 0
> +    je .end

test shiftd, shiftd should give smaller binary code than cmp shiftd, 0,
since it needs no immediate byte. Then use "jz" instead of "je": the two
are the same opcode, but jz better describes what the branch does here.

> +    shl   offsetq, 1
> +    sub   offsetq, mmsize*4
> +    movd       m0, shiftd
> +.loop:
> +    mova       m1, [srcq+offsetq         ]
> +    mova       m2, [srcq+offsetq+mmsize  ]
> +    mova       m3, [srcq+offsetq+mmsize*2]
> +    mova       m4, [srcq+offsetq+mmsize*3]
> +    psllw      m1, m0
> +    psllw      m2, m0
> +    psllw      m3, m0
> +    psllw      m4, m0
> +    mova  [srcq+offsetq         ], m1
> +    mova  [srcq+offsetq+mmsize  ], m2
> +    mova  [srcq+offsetq+mmsize*2], m3
> +    mova  [srcq+offsetq+mmsize*3], m4
> +    sub   offsetq, mmsize*4
> +    jge .loop
> +.end:
> +    RET
> +%endmacro
> +
> +INIT_MMX
> +AC3_LSHIFT_INT16 mmx
> +INIT_XMM
> +AC3_LSHIFT_INT16 sse2

Doesn't this do 64 per loop iteration for sse2? If so, doesn't that
conflict with the function definition and/or overflow?
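
The arithmetic behind this question can be checked with a small C model of the loop counter. It is a sketch under stated assumptions: mmsize is taken to be 8 bytes for MMX and 16 bytes for SSE2, the function name model_lshift_loop is hypothetical, and only the offset bookkeeping of the quoted asm is modelled, not the shifts themselves. If the 64 is counted in bytes, it corresponds to 32 int16_t values per SSE2 iteration.

```c
#include <stdint.h>

/* Hypothetical model of the quoted loop's offset arithmetic.
 * len is the number of int16_t values, mmsize the register width in
 * bytes (assumed 8 for MMX, 16 for SSE2). Returns the iteration count
 * and stores the lowest and one-past-highest byte offsets touched. */
static int model_lshift_loop(int len, int mmsize, long *lo, long *hi)
{
    long step   = (long)mmsize * 4;  /* four registers per iteration */
    long offset = (long)len * 2;     /* shl offsetq, 1 (values to bytes) */
    int  iters  = 0;

    offset -= step;                  /* sub offsetq, mmsize*4 */
    *hi = offset + step;             /* first iteration covers the array tail */
    do {
        *lo = offset;                /* this iteration covers [offset, offset+step) */
        iters++;
        offset -= step;              /* sub offsetq, mmsize*4 */
    } while (offset >= 0);           /* jge .loop */
    return iters;
}
```

Under these assumptions, len = 32 gives one SSE2 iteration covering bytes [0, 64) and two MMX iterations covering the same range, so a length that is a multiple of 32 never takes the offset below the start of src.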

Ronald


