[FFmpeg-devel] [PATCH] x86/swr: add ff_float_to_int32_a_avx2
jamrial at gmail.com
Thu Nov 6 23:04:49 CET 2014
On 06/11/14 6:35 PM, Christophe Gisquet wrote:
> 2014-11-06 21:48 GMT+01:00 James Almer <jamrial at gmail.com>:
>> 13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
>> 8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips
> A couple of naïve questions (I haven't checked):
> Does it increase the alignment requirement?
> If yes, should it be notified somewhere (API bump, comment in the
> relevant header, ...)?
No, the function checks for alignment and jumps to a branch that uses movdqu if needed.
ff_int32_to_float_a_avx also uses ymm regs and this same macro.
Nonetheless, instructions using the VEX encoding don't need any kind of alignment for their memory operands.
We could modify or duplicate these macros so the AVX versions don't do unnecessary things like
movu m0, [mem]
mulps m0, m1
when "mulps m0, m1, [mem]" would work just as well regardless of alignment.
The only instructions that still need alignment with the VEX encoding are of course the explicitly
aligned moves like movdqa and movaps (and the non-temporal ones).
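
To illustrate the alignment check described above, here is a rough C sketch of the dispatch. It is not the swresample code: the names is_aligned, convert_sample and float_to_int32_sketch are made up, the 32-byte boundary is assumed from the ymm register width, and the scaling by 2^31 with saturation is an assumption about what the conversion does. Both branches run the same scalar loop here; in the asm they would differ only in the load/store instructions used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is a multiple of align (a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical scalar stand-in for one converted sample.
 * Assumes scaling by 2^31 with saturation and finite input.
 * The cast truncates, whereas the SIMD conversion would round. */
static int32_t convert_sample(float f)
{
    float v = f * 2147483648.0f;

    if (v >= 2147483648.0f)
        return INT32_MAX;
    if (v <= -2147483648.0f)
        return INT32_MIN;
    return (int32_t)v;
}

/* Returns 1 if the aligned branch was taken, 0 for the unaligned one. */
static int float_to_int32_sketch(int32_t *dst, const float *src, size_t len)
{
    int aligned = is_aligned(dst, 32) && is_aligned(src, 32);
    size_t i;

    if (aligned) {
        /* The asm would use aligned loads and stores here. */
        for (i = 0; i < len; i++)
            dst[i] = convert_sample(src[i]);
    } else {
        /* The asm would fall back to unaligned loads and stores here. */
        for (i = 0; i < len; i++)
            dst[i] = convert_sample(src[i]);
    }
    return aligned;
}
```

Since the caller never has to guarantee anything about the buffers, a check like this is why the alignment requirement of the public function stays the same.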
>> x86inc.asm doesn't seem to handle cmpps or its aliases properly when using avx.
> I remember fixing its declaration (missing one parameter) while
> working on aac (maybe one year ago). It's unrelated, maybe?
If you use "cmpps m0, m1, 5" it will work for non-VEX coding, but error out otherwise,
since x86inc.asm turns that into "vcmpps m0, m1, 5" instead of "vcmpps m0, m0, m1, 5".
With aliases like cmpnltps it doesn't even add the "v" prefix.
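
For reference, predicate 5 is NLT ("not less than"), which is what the cmpnltps alias stands for. A scalar C model of a single lane, as a sketch only (cmpnlt_lane is a made-up name, and the real instruction does this for every lane of the register):

```c
#include <stdint.h>

/* Scalar model of one lane of cmpps with predicate 5 (NLT).
 * The lane becomes all ones when a is not less than b, which
 * includes the case where either input is NaN, and zero otherwise. */
static uint32_t cmpnlt_lane(float a, float b)
{
    return !(a < b) ? 0xFFFFFFFFu : 0u;
}
```

In the two-operand form m0 is both the first source and the destination, so the expanded three-operand form has to repeat it: dst = m0, src1 = m0, src2 = m1.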
> Otherwise looks obvious.