[FFmpeg-devel] [PATCH] avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter

James Almer jamrial at gmail.com
Fri Jan 15 03:55:44 CET 2016


On 1/14/2016 11:05 PM, James Darnley wrote:
> 2.6 times faster
> ---
> I have one question now.  Should I make the function name match the assembly
> existing deblock/loop filter functions?  I took the current name from the C (as
> I was originally trying to use a gather instruction but that didn't offer any
> benefit).
> ---
>  libavcodec/x86/h264_deblock.asm | 40 ++++++++++++++++++++++++++++++++++++++++
>  libavcodec/x86/h264dsp_init.c   |  4 ++++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
> index 5151f3c..20f0814 100644
> --- a/libavcodec/x86/h264_deblock.asm
> +++ b/libavcodec/x86/h264_deblock.asm
> @@ -864,7 +864,47 @@ ff_chroma_inter_body_mmxext:
>      DEBLOCK_P0_Q0
>      ret
>  
> +cglobal h264_h_loop_filter_chroma422_8, 5, 7, 8, mmsize + ARCH_X86_64*2*mmsize

This will not work with x86_32 compilers that don't have an aligned stack (like
msvc), because r6 is needed to store the stack pointer.

> +    %if ARCH_X86_64
> +        %define buf0 [rsp+16]
> +        %define buf1 [rsp+8]
> +    %else
> +        %define buf0 r0m
> +        %define buf1 r2m
> +    %endif
> +
> +    movd m6, [r4]

Since r4 is free after this point, you can use it instead of r6 to easily solve
the above.

