[FFmpeg-devel] [PATCH] x86/hevc_res_add: add ff_hevc_transform_add{8, 16, 32}_8_avx

Wed Aug 20 17:36:10 CEST 2014

On 20/08/14 4:29 AM, Christophe Gisquet wrote:
> Hi,
> 
> 2014-08-20 4:55 GMT+02:00 James Almer <jamrial at gmail.com>:
>> ~15% faster than sse2
> [...]
>> @@ -509,7 +509,11 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
>>              if (ARCH_X86_64) {
>>                  c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_avx;
>>                  c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_avx;
>> +
>> +                c->transform_add[2]    = ff_hevc_transform_add16_8_avx;
>> +                c->transform_add[3]    = ff_hevc_transform_add32_8_avx;
> 
> Does avx => ARCH_X86_64 (didn't know) ? Otherwise the reg count seems
> fine, meaning the condition is unneeded.

No, AVX does not imply x86_64. These functions currently use 12 xmm regs, and
32-bit x86 only has 8 (xmm0-xmm7), so they are x86_64 only for now.
I'll send a patch later to get them down to 8 or so.
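
To spell out the register-count constraint, here is a rough sketch in C. It is
not the actual FFmpeg init code: can_use_simd_func() and the two macros are
hypothetical names made up for illustration. The only hard facts it relies on
are that 32-bit x86 exposes 8 xmm registers and x86_64 exposes 16.

```c
/* Hypothetical illustration, not FFmpeg code. */
#define XMM_REGS_X86_32  8   /* xmm0-xmm7  */
#define XMM_REGS_X86_64 16   /* xmm0-xmm15 */

/*
 * Returns 1 if a SIMD function that needs `regs_needed` xmm registers
 * can be selected on the given target, 0 otherwise.
 * has_avx and is_x86_64 stand in for the CPU-flag and arch checks.
 */
static int can_use_simd_func(int has_avx, int is_x86_64, int regs_needed)
{
    int regs_available = is_x86_64 ? XMM_REGS_X86_64 : XMM_REGS_X86_32;

    if (!has_avx)
        return 0;
    return regs_needed <= regs_available;
}
```

With 12 registers, can_use_simd_func(1, 0, 12) is 0 on 32-bit and
can_use_simd_func(1, 1, 12) is 1 on x86_64. Once a function is down to 8
registers, the arch check stops mattering.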

> 
>>              }
>> +            c->transform_add[1]    = ff_hevc_transform_add8_8_avx;
> 
> I'm not entirely sure, but this is instantiated through INIT_YMM avx2,
> and I wouldn't expect performance improvement past the 3-op-form?
> 
> So couldn't this one be instantiated to use xmm regs? (mmx may be a
> burden eg need for emms and need to rewrite it).

Aren't you thinking of the 10-bit functions? All three AVX functions I'm adding here
are 8-bit and use xmm regs. There are currently no 8-bit AVX2 functions.