[FFmpeg-devel] [PATCH 02/10] diracdsp: add dequantization SIMD

James Almer jamrial at gmail.com
Mon Jun 27 23:38:06 CEST 2016


On 6/27/2016 8:53 AM, Rostislav Pehlivanov wrote:
> I've attached another patch which should work fine now.
> I did this after the put_signed_rect so it does require the first patch,
> but if this patch is okay I'll amend and tidy things before I push.
> For some reason changing dstq to be stored at r4 or r3 broke it and I've no
> idea why. Neither is used after loading m2 and m3. Should work on x86_32
> now, but I'm wondering why I can't save that register.

[...]

> diff --git a/libavcodec/x86/diracdsp.asm b/libavcodec/x86/diracdsp.asm
> index c5cc530..4bc8b2d 100644
> --- a/libavcodec/x86/diracdsp.asm
> +++ b/libavcodec/x86/diracdsp.asm
> @@ -266,9 +266,45 @@ HPEL_FILTER sse2
>  ADD_OBMC 32, sse2
>  ADD_OBMC 16, sse2
>  
> -%if ARCH_X86_64 == 1
>  INIT_XMM sse4
>  
> +; void dequant_subband_32(uint8_t *src, uint8_t *dst, ptrdiff_t stride, const int qf, const int qs, int tot_v, int tot_h)
> +cglobal dequant_subband_32, 7, 8, 4, src, dst, stride, qf, qs, tot_v, tot_h

x86_32 has 8 GPRs, but you can only use 7 of them, as the last one is
reserved to hold the stack pointer.

> +
> +    movd   m2, qfd
> +    movd   m3, qsd
> +    SPLATD m2
> +    SPLATD m3
> +    mov    r4, tot_hq
> +    mov    r7, dstq
> +
> +    .loop_v:
> +    mov    tot_hq, r4
> +    mov    dstq,   r7
> +
> +    .loop_h:
> +    movu   m0, [srcq]
> +
> +    pabsd  m1, m0
> +    pmulld m1, m2
> +    paddd  m1, m3
> +    psrld  m1,  2
> +    psignd m1, m0
> +
> +    movu   [dstq], m1
> +
> +    add    srcq, mmsize
> +    add    dstq, mmsize
> +    sub    tot_hd, 4
> +    jg     .loop_h
> +
> +    add    r7, strideq
> +    dec    tot_vd
> +    jg     .loop_v
> +
> +    RET

I'm not sure why you say using r3 instead of r7 here didn't work for
you. I just tried it (after applying all patches up to 6/10) and fate
at least still passes, on both x86_64 and x86_32.
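
For reference, the inner loop above amounts to roughly the following scalar
C. This is only a sketch to make the per-coefficient arithmetic explicit, not
the C code from the tree: the name dequant_subband_32_ref is made up here,
and it assumes 32-bit coefficients, src packed contiguously, stride given in
bytes, and tot_h a multiple of 4 (the asm always handles 4 coefficients per
iteration).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar equivalent of the quoted SSE4 loop. */
static void dequant_subband_32_ref(const uint8_t *src, uint8_t *dst,
                                   ptrdiff_t stride, int qf, int qs,
                                   int tot_v, int tot_h)
{
    for (int y = 0; y < tot_v; y++) {
        for (int x = 0; x < tot_h; x++) {
            int32_t c, r;
            uint32_t mag;

            memcpy(&c, src, sizeof(c));
            src += sizeof(c);              /* src is never reset per row */

            /* pabsd: magnitude of the coefficient */
            mag = c < 0 ? 0u - (uint32_t)c : (uint32_t)c;
            /* pmulld, paddd, psrld: (|c| * qf + qs) >> 2, logical shift */
            mag = (mag * (uint32_t)qf + (uint32_t)qs) >> 2;
            /* psignd: restore the sign, and force 0 when c == 0 */
            if (c < 0)
                r = (int32_t)(0u - mag);
            else if (c > 0)
                r = (int32_t)mag;
            else
                r = 0;

            memcpy(dst + x * sizeof(r), &r, sizeof(r));
        }
        dst += stride;                     /* dst advances one row at a time */
    }
}
```

Note that psignd zeroes the lane when the source coefficient is 0, so qs is
not added to zero coefficients.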


