[FFmpeg-devel] [PATCH 1/7] x86: hevc_mc: add AVX2 optimizations

James Almer jamrial at gmail.com
Fri Feb 6 01:15:28 CET 2015


On 05/02/15 4:20 PM, Christophe Gisquet wrote:
> From: plepere <pierre-edouard.lepere at insa-rennes.fr>

The author name should probably be changed to Pierre Edouard Lepere.

> +%if cpuflag(avx2) && (%0 == 3)
> +
> +    vextracti128 xm10, m0, 1
> +    vinserti128 m10, m1, xm10, 0
> +    vinserti128 m0, m0, xm1, 1
> +    mova m1, m10
> +
> +    vextracti128 xm10, m2, 1
> +    vinserti128 m10, m3, xm10, 0
> +    vinserti128 m2, m2, xm3, 1
> +    mova m3, m10
> +
> +
> +    vextracti128 xm10, m4, 1
> +    vinserti128 m10, m5, xm10, 0
> +    vinserti128 m4, m4, xm5, 1
> +    mova m5, m10
> +
> +    vextracti128 xm10, m6, 1
> +    vinserti128 m10, m7, xm10, 0
> +    vinserti128 m6, m6, xm7, 1
> +    mova m7, m10
> +%endif

I didn't check, but I think each of these four blocks can be simplified using vperm2i128.
That can be done in a separate patch anyway.
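Each of the four quoted blocks performs the same 128-bit lane shuffle on a register pair. Afterwards the first register holds both original low lanes and the second holds both original high lanes. The C sketch below is a portable model for checking that claim, not code from the patch. The `ymm` struct and the helper functions are hypothetical, written only to mirror the instruction semantics. It compares the patch's vextracti128/vinserti128/mova sequence against two vperm2i128 shuffles with immediates 0x20 and 0x31.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 256-bit ymm register: lane[0] = low 128 bits, lane[1] = high. */
typedef struct { uint8_t lane[2][16]; } ymm;

/* vinserti128 dst, src1, xmm, imm: src1 with lane (imm & 1) replaced by xmm. */
static ymm vinserti128(ymm src1, const uint8_t xmm[16], int imm)
{
    memcpy(src1.lane[imm & 1], xmm, 16);
    return src1;
}

/* vperm2i128 dst, src1, src2, imm: imm[1:0] picks the low lane, imm[5:4] the high lane
 * (0 = src1.lo, 1 = src1.hi, 2 = src2.lo, 3 = src2.hi); imm bit 3 / bit 7 zeroes it. */
static ymm vperm2i128(ymm src1, ymm src2, int imm)
{
    const uint8_t *sel[4] = { src1.lane[0], src1.lane[1], src2.lane[0], src2.lane[1] };
    ymm dst;
    for (int i = 0; i < 2; i++) {
        int ctl = (imm >> (4 * i)) & 0xF;
        if (ctl & 0x8)
            memset(dst.lane[i], 0, 16);
        else
            memcpy(dst.lane[i], sel[ctl & 3], 16);
    }
    return dst;
}

/* The four-instruction sequence from the patch, for one register pair (a, b). */
static void lanes_patch(ymm *a, ymm *b)
{
    uint8_t xm10[16];
    memcpy(xm10, a->lane[1], 16);          /* vextracti128 xm10, a, 1     */
    ymm m10 = vinserti128(*b, xm10, 0);    /* vinserti128 m10, b, xm10, 0 */
    *a = vinserti128(*a, b->lane[0], 1);   /* vinserti128 a, a, xb, 1     */
    *b = m10;                              /* mova b, m10                 */
}

/* The same shuffle expressed with two vperm2i128 operations. */
static void lanes_vperm(ymm *a, ymm *b)
{
    ymm lo = vperm2i128(*a, *b, 0x20);     /* [a.lo, b.lo] */
    ymm hi = vperm2i128(*a, *b, 0x31);     /* [a.hi, b.hi] */
    *a = lo;
    *b = hi;
}

/* Returns 1 if both versions produce identical registers for distinct input bytes. */
static int lanes_equivalent(void)
{
    ymm a1, b1;
    for (int i = 0; i < 32; i++) {
        a1.lane[i / 16][i % 16] = (uint8_t)i;
        b1.lane[i / 16][i % 16] = (uint8_t)(100 + i);
    }
    ymm a2 = a1, b2 = b1;
    lanes_patch(&a1, &b1);
    lanes_vperm(&a2, &b2);
    return memcmp(&a1, &a2, sizeof a1) == 0 && memcmp(&b1, &b2, sizeof b1) == 0;
}
```

If the model holds, each block could shrink from four instructions to three: one vperm2i128 into the scratch register, the existing vinserti128 for the low-lane half, and the final mova. Whether that is measurably faster would still need benchmarking.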

> @@ -619,6 +761,89 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
>              c->idct_dc[3] = ff_hevc_idct32x32_dc_8_avx2;
>              if (ARCH_X86_64) {
>                  SAO_BAND_INIT(8, avx2);
> +                c->put_hevc_epel[7][0][0] = ff_hevc_put_hevc_pel_pixels32_8_avx2;
> +                c->put_hevc_epel[8][0][0] = ff_hevc_put_hevc_pel_pixels48_8_avx2;
> +                c->put_hevc_epel[9][0][0] = ff_hevc_put_hevc_pel_pixels64_8_avx2;
[...]

It would be nice if all this were compressed into a couple of macros, as was done for SSE4. But that's 
cosmetic and not a blocker.
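A token-pasting helper along the lines of the SSE4 init code could build each function name from the block width, bit depth and instruction set. The sketch below is self-contained and hypothetical. `MockDSPContext`, the simplified `put_pel_fn` signature, the stub bodies and the `PEL_LINK` macro are illustrative stand-ins, not the real HEVCDSPContext or FFmpeg's existing macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified signature; the real put_hevc_* prototypes differ. */
typedef void (*put_pel_fn)(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height);

/* Hypothetical stand-in for HEVCDSPContext holding only the table used here. */
typedef struct {
    put_pel_fn put_hevc_epel[10][2][2];
} MockDSPContext;

/* Stub bodies standing in for the assembly functions. */
#define STUB(name) \
    static void name(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height) \
    { (void)dst; (void)src; (void)srcstride; (void)height; }

STUB(ff_hevc_put_hevc_pel_pixels32_8_avx2)
STUB(ff_hevc_put_hevc_pel_pixels48_8_avx2)
STUB(ff_hevc_put_hevc_pel_pixels64_8_avx2)

/* Hypothetical helper: pastes width, bit depth and ISA into the function name,
 * e.g. (32, 8, avx2) -> ff_hevc_put_hevc_pel_pixels32_8_avx2. */
#define PEL_LINK(c, idx, width, bitd, opt) \
    (c)->put_hevc_epel[idx][0][0] = ff_hevc_put_hevc_pel_pixels##width##_##bitd##_##opt

static void init_pel_avx2(MockDSPContext *c)
{
    PEL_LINK(c, 7, 32, 8, avx2);
    PEL_LINK(c, 8, 48, 8, avx2);
    PEL_LINK(c, 9, 64, 8, avx2);
}
```

A second macro could then wrap a whole row of such assignments (all filter positions for one width), so that each ISA's init block reduces to a handful of lines.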

>              }
>  
>              c->transform_add[2] = ff_hevc_transform_add16_10_avx2;
> 

Should be OK if it passes FATE and compiles with yasm <= 1.1.0, which cannot assemble AVX2. There are C wrappers, 
and those usually need stricter HAVE_AVX2_EXTERNAL checks: dead code elimination only happens after 
preprocessing, so an unguarded wrapper can still reference an asm symbol that was never built.
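A minimal sketch of the guard pattern, assuming configure writes HAVE_AVX2_EXTERNAL as 0 or 1 into config.h. It is hard-coded to 0 here to model an old yasm. All function names (`add8x8_c`, `add8x8_avx2`, `hypothetical_add8x4_avx2`, `select_add8x8`) are hypothetical. The point is that the wrapper's whole definition, not just its selection in the init function, sits inside the preprocessor guard. A runtime CPU-flag check alone would leave the wrapper body, and therefore its reference to the missing asm symbol, in the object file.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the config.h value; 0 models an assembler that cannot build AVX2. */
#define HAVE_AVX2_EXTERNAL 0

typedef void (*add_fn)(uint8_t *dst, const int16_t *coeffs, ptrdiff_t stride);

/* Plain C fallback: add an 8x8 residual block to the prediction, clipping to 8 bits. */
static void add8x8_c(uint8_t *dst, const int16_t *coeffs, ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int v = dst[x] + coeffs[x];
            dst[x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
        dst    += stride;
        coeffs += 8;
    }
}

#if HAVE_AVX2_EXTERNAL
/* Hypothetical asm function (8 wide, 4 rows): declared and referenced only when built. */
void hypothetical_add8x4_avx2(uint8_t *dst, const int16_t *coeffs, ptrdiff_t stride);

/* C wrapper: its body names the asm symbol, so the definition itself must be guarded. */
static void add8x8_avx2(uint8_t *dst, const int16_t *coeffs, ptrdiff_t stride)
{
    hypothetical_add8x4_avx2(dst, coeffs, stride);
    hypothetical_add8x4_avx2(dst + 4 * stride, coeffs + 32, stride);
}
#endif

/* Init-time selection: the AVX2 path exists only if the assembler could build it. */
static add_fn select_add8x8(int cpu_has_avx2)
{
    add_fn fn = add8x8_c;
#if HAVE_AVX2_EXTERNAL
    if (cpu_has_avx2)
        fn = add8x8_avx2;
#else
    (void)cpu_has_avx2;   /* AVX2 code not assembled: always use the C version */
#endif
    return fn;
}
```

With the guard set to 0, this translation unit never mentions the asm symbol, so it links even though the AVX2 object was never produced.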

