[FFmpeg-devel] [PATCH] avcodec/v210: add avx2 version of the line encoder
Henrik Gramner
henrik at gramner.com
Thu Jan 14 20:21:24 CET 2016
On Wed, Jan 13, 2016 at 4:55 PM, James Darnley <james.darnley at gmail.com> wrote:
> diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm
> index 859e2d9..a8f3d3c 100644
> --- a/libavcodec/x86/v210enc.asm
> +++ b/libavcodec/x86/v210enc.asm
> -cextern pb_FE
> -%define v210_enc_max_8 pb_FE
> +;cextern pb_FE
> +local_pb_FE: times 32 db 0xfe
> +%define v210_enc_max_8 local_pb_FE
You could change ff_pb_FE to be 32-byte instead of duplicating it.
> +%if cpuflag(avx2)
> + movu xm1, [yq+widthq*2]
> + vinserti128 m1, m1, [yq+widthq*2+12], 1
> +%else
> movu m1, [yq+2*widthq]
> +%endif
xmN can be used unconditionally, which gets rid of the %else. E.g.
    movu         xm1, [yq+widthq*2]
%if cpuflag(avx2)
    vinserti128   m1, m1, [yq+widthq*2+12], 1
%endif
> +%if cpuflag(avx2)
> + movq xm3, [uq+widthq]
> + movhps xm3, [vq+widthq]
> + movq xm7, [uq+widthq+6]
> + movhps xm7, [vq+widthq+6]
> + vinserti128 m3, m3, xm7, 1
> +%else
> movq m3, [uq+widthq]
> movhps m3, [vq+widthq]
> +%endif
Ditto. Also use xm2 instead of xm7 since it's unused at this point and
it avoids having to use an extra vector register in the AVX2 version.
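Roughly like this, as an untested sketch. It assumes that m2 really is free at this point, and it keeps the +6 offsets from the patch as-is:

```nasm
    ; low 128-bit lane: identical loads for the SSE and AVX2 versions
    movq         xm3, [uq+widthq]
    movhps       xm3, [vq+widthq]
%if cpuflag(avx2)
    ; high lane: chroma 6 bytes further on, staged in xm2 instead of xm7
    movq         xm2, [uq+widthq+6]
    movhps       xm2, [vq+widthq+6]
    vinserti128   m3, m3, xm2, 1
%endif
```

With INIT_XMM the xmN names are just the regular registers, so the non-AVX2 build assembles to the same two loads as before.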
> +%if cpuflag(avx2)
> + movu [dstq], xm0
> + movu [dstq+16], xm1
> + vextracti128 [dstq+32], m0, 1
> + vextracti128 [dstq+48], m1, 1
> +%else
> movu [dstq], m0
> movu [dstq+mmsize], m1
> +%endif
Ditto.
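I.e. something along these lines (untested sketch; the first two stores match the existing non-AVX2 code because mmsize is 16 there, so [dstq+mmsize] and [dstq+16] are the same address):

```nasm
    ; low lanes: common to the SSE and AVX2 versions
    movu         [dstq],    xm0
    movu         [dstq+16], xm1
%if cpuflag(avx2)
    ; high lanes: only present in the 256-bit version
    vextracti128 [dstq+32], m0, 1
    vextracti128 [dstq+48], m1, 1
%endif
```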
Otherwise LGTM.