[FFmpeg-devel] [PATCH] x86: kill fpel_mmx.c

Tue May 20 23:29:26 CEST 2014

Hi,

2014-05-20 20:58 GMT+02:00 Michael Niedermayer <michaelni at gmx.at>:
>> -#if HAVE_MMX_INLINE
>> +#if HAVE_YASM
>
> why not HAVE_MMX_EXTERNAL ?

Because I don't know what it means. Does it mean yasm supports MMX?

>>  static void hpeldsp_init_mmxext(HpelDSPContext *c, int flags, int cpu_flags)
>>  {
>> +#if HAVE_YASM
>> +    c->avg_pixels_tab[0][0] = avg_pixels16_mmxext;
>> +    c->avg_pixels_tab[1][0] = ff_avg_pixels8_mmxext;
>> +#endif
>>  #if HAVE_MMXEXT_EXTERNAL
>>      c->put_pixels_tab[0][1] = ff_put_pixels16_x2_mmxext;
>>      c->put_pixels_tab[0][2] = put_pixels16_y2_mmxext;
>>
>> -    c->avg_pixels_tab[0][0] = avg_pixels16_mmxext;
>>      c->avg_pixels_tab[0][1] = avg_pixels16_x2_mmxext;
>>      c->avg_pixels_tab[0][2] = avg_pixels16_y2_mmxext;
>>
>>      c->put_pixels_tab[1][1] = ff_put_pixels8_x2_mmxext;
>>      c->put_pixels_tab[1][2] = ff_put_pixels8_y2_mmxext;
>>
>> -    c->avg_pixels_tab[1][0] = ff_avg_pixels8_mmxext;
>>      c->avg_pixels_tab[1][1] = ff_avg_pixels8_x2_mmxext;
>>      c->avg_pixels_tab[1][2] = ff_avg_pixels8_y2_mmxext;
>
> why this change ?

Unless I'm missing something, I'm moving C inline asm to yasm, hence
the HAVE_YASM guard. Apart from that I'm just moving lines around,
although the intermediate define of avg_pixels16_mmxext is useless.
But then that probably just means I don't know what
HAVE_MMXEXT_EXTERNAL means.
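
To make the question concrete, here is a minimal sketch of what the hunk
does: the two full-pel avg entries ([0][0] and [1][0]) are installed under
one guard and the half-pel entries under another. Everything prefixed with
demo_/DEMO_ is hypothetical and only mimics the shape of the real code; in
particular the DEMO_HAVE_* macros are stand-ins, not the macros generated
by configure, and the plain C helper replaces the actual asm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for configure results; NOT the real FFmpeg macros. */
#ifndef DEMO_HAVE_YASM
#define DEMO_HAVE_YASM 1
#endif
#ifndef DEMO_HAVE_MMXEXT_EXTERNAL
#define DEMO_HAVE_MMXEXT_EXTERNAL 1
#endif

/* Same shape as the table entries in the diff; the names are made up. */
typedef void (*demo_op_pixels_func)(uint8_t *block, const uint8_t *pixels,
                                    ptrdiff_t line_size, int h);

typedef struct DemoHpelContext {
    demo_op_pixels_func avg_pixels_tab[2][4];
} DemoHpelContext;

/* Plain C stand-in for the asm: rounded average of dst and src. */
static void demo_avg_pixels(uint8_t *block, const uint8_t *pixels,
                            ptrdiff_t line_size, int h, int w)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            block[x] = (uint8_t)((block[x] + pixels[x] + 1) >> 1);
        block  += line_size;
        pixels += line_size;
    }
}

static void demo_avg_pixels16(uint8_t *b, const uint8_t *p, ptrdiff_t ls, int h)
{
    demo_avg_pixels(b, p, ls, h, 16);
}

static void demo_avg_pixels8(uint8_t *b, const uint8_t *p, ptrdiff_t ls, int h)
{
    demo_avg_pixels(b, p, ls, h, 8);
}

static void demo_hpeldsp_init(DemoHpelContext *c)
{
#if DEMO_HAVE_YASM
    /* Full-pel entries: guarded by the assembler check only in this sketch. */
    c->avg_pixels_tab[0][0] = demo_avg_pixels16;
    c->avg_pixels_tab[1][0] = demo_avg_pixels8;
#endif
#if DEMO_HAVE_MMXEXT_EXTERNAL
    /* The x2/y2 entries from the diff would be installed here. */
#endif
    (void)c; /* silences "unused" if both guards are off */
}
```

With both guards on, calling demo_hpeldsp_init() on a zeroed context fills
only the [0][0] and [1][0] slots; building with -DDEMO_HAVE_YASM=0 leaves
them NULL, which is the case a caller would have to cover with a C fallback.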

> also did you benchmark the change this patch makes ?

Here's the benchmark on win32 (the closest I can get to the intended systems).

before:
554 decicycles in put8, 8388241 runs, 367 skips
707 decicycles in put16, 4193813 runs, 491 skips
855 decicycles in avg8, 524259 runs, 29 skips
4499 decicycles in avg16, 131038 runs, 34 skips

after:
492 decicycles in put8, 8388226 runs, 382 skips
709 decicycles in put16, 4193875 runs, 429 skips
844 decicycles in avg8, 524251 runs, 37 skips
4306 decicycles in avg16, 131042 runs, 30 skips

Almost negligible, except for put8 (about 11% fewer decicycles) and
avg16 (about 4% fewer).
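
For anyone who wants the relative numbers, a small sketch that computes
them from the figures quoted above. The struct and function names are made
up for illustration; only the decicycle counts come from the benchmark.

```c
#include <stdio.h>

/* One benchmark row: decicycles before and after the patch. */
typedef struct DemoBenchRow {
    const char *name;
    double before;
    double after;
} DemoBenchRow;

/* Figures copied from the benchmark output in this mail. */
static const DemoBenchRow demo_rows[] = {
    { "put8",   554.0,  492.0 },
    { "put16",  707.0,  709.0 },
    { "avg8",   855.0,  844.0 },
    { "avg16", 4499.0, 4306.0 },
};

/* Relative change in percent; negative means fewer decicycles after. */
static double relative_change_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Prints one line per function with the signed relative change. */
static void demo_print_changes(void)
{
    size_t n = sizeof(demo_rows) / sizeof(demo_rows[0]);
    for (size_t i = 0; i < n; i++)
        printf("%-6s %+.2f%%\n", demo_rows[i].name,
               relative_change_percent(demo_rows[i].before,
                                       demo_rows[i].after));
}
```

Calling demo_print_changes() gives roughly -11.19% for put8, +0.28% for
put16, -1.29% for avg8 and -4.29% for avg16.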

(I simply wrapped the call to the actual asm between START/STOP_TIMER)
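
As a rough illustration of what such a measurement does, here is a sketch
loosely modelled on my understanding of the START_TIMER/STOP_TIMER macros:
accumulate tick counts per call, skip outliers, and report the average
times ten. All demo_ names are hypothetical and the outlier thresholds are
assumptions, not a copy of the real macros. The sketch also takes the tick
count as a parameter instead of reading a cycle counter itself.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct DemoTimer {
    uint64_t sum;   /* ticks accumulated over accepted runs */
    int runs;       /* accepted measurements */
    int skips;      /* measurements rejected as outliers */
} DemoTimer;

/*
 * Record one measurement. A sample is skipped when it is both large in
 * absolute terms and far above the running average (assumed thresholds),
 * e.g. because the thread was preempted during the call.
 */
static void demo_timer_record(DemoTimer *t, uint64_t ticks)
{
    if (t->runs < 2 || ticks < 8 * t->sum / t->runs || ticks < 2000) {
        t->sum += ticks;
        t->runs++;
    } else {
        t->skips++;
    }
}

/* Average ticks per accepted run, scaled by 10 ("decicycles"). */
static uint64_t demo_timer_decicycles(const DemoTimer *t)
{
    return t->runs ? t->sum * 10 / t->runs : 0;
}

/* Same line layout as the benchmark output quoted in this mail. */
static void demo_timer_report(const DemoTimer *t, const char *id)
{
    printf("%llu decicycles in %s, %d runs, %d skips\n",
           (unsigned long long)demo_timer_decicycles(t), id,
           t->runs, t->skips);
}
```

In use, one would read a tick counter right before and right after the asm
call and pass the difference to demo_timer_record(). In the numbers above,
runs plus skips is always a power of two (e.g. 8388241 + 367 = 2^23), which
would be consistent with reporting only at power-of-two call counts; the
sketch leaves that part out.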

> the original code was quite fine-tuned. IIRC someone back then
> tested all kinds of instruction orders and complex addressing vs
> increasing the pointers, and what we had was the best found in that
> testing ...
> that was of course with the CPUs of that time, but still an MMX-only
> or MMX2-only box for which some of this gets used could be similar
>
> So we should make sure we don't worsen this ...

Agreed, though the above benchmark is almost meaningless in that regard:
memory/cache speed and bandwidth, execution ports, and instruction
throughput/cycle counts have changed so much since the CPUs this code
would actually run on.

>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Asymptotically faster algorithms should always be preferred if you have
> asymptotical amounts of data
>

-- 
Christophe