[FFmpeg-devel] [PATCH 08/11] avcodec/v210enc: add AVX-512 10-bit line pack function
jdarnley at obe.tv
Fri Nov 10 23:13:53 EET 2017
On 2017-11-10 14:32, James Darnley wrote:
> I mentioned previously that using ZMM registers will cause the CPU to
> reduce its frequency.
> Gramner said on IRC that a user should spend at least 20-30% of overall
> runtime in AVX-512/ZMM code for it to be a net gain in speed.
> From ffmpeg-devel IRC on 2017-10-26
>> [18:49:26 CEST] <Gramner> J_Darnley: be aware that using zmm registers induces significant frequency drops which reduces performance of everything else, so if you want to use 512-bit vectors you better go all in on it to make up for it. you probably want to spend at least 20-30% of overall runtime in avx-512 code
>> [18:50:00 CEST] <Gramner> the alternative is to stay in 256-bit mode and just leverage new instructions and opmasks
> This means any cycles you might save by using longer registers, fewer
> instructions, better instructions, whatever, will be lost because the
> frequency drops, meaning everything takes longer to execute overall.
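The break-even argument quoted above can be put into rough numbers. What follows is only a back-of-the-envelope sketch, not a measurement: the function names, the 2x per-clock speedup and the 0.85 clock ratio are all hypothetical, and the model assumes the reduced clock applies to the whole run and that runtime scales inversely with core frequency.

```python
def relative_runtime(p, speedup, freq_ratio):
    """Runtime relative to the build without ZMM code (below 1.0 is a net gain).

    p          -- share of the original runtime spent in the vectorised function
    speedup    -- per-clock speedup of that function with ZMM (hypothetical)
    freq_ratio -- reduced clock divided by normal clock (hypothetical)
    """
    # Cycle count: the untouched (1 - p) share is unchanged, the rest shrinks.
    cycles = (1.0 - p) + p / speedup
    # Assumption: the lower clock applies to everything, so every cycle is slower.
    return cycles / freq_ratio


def break_even_fraction(speedup, freq_ratio):
    """Share of runtime that must be in ZMM code just to break even."""
    # Solve ((1 - p) + p / speedup) / freq_ratio == 1 for p.
    return (1.0 - freq_ratio) / (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    # Hypothetical inputs: the function runs 2x faster, the clock drops by 15%.
    print(break_even_fraction(2.0, 0.85))
    # A function that is only 2.5% of runtime cannot pay for a 10% clock drop.
    print(relative_runtime(0.025, 2.0, 0.90))
```

Under these made-up inputs the break-even share comes out in the same range as the 20-30% figure from IRC, and a function that accounts for only a few percent of runtime leaves the program slower overall no matter how well it is vectorised.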
Some details about this can be found in one of Intel's documents, the
"Intel® 64 and IA-32 Architectures Optimization Reference Manual"
(order number 248966-038), specifically section 15.26, "SKYLAKE SERVER
POWER MANAGEMENT".
Earlier on the ffmpeg-devel IRC channel I posted a link to Cloudflare's
blog in which they discuss the effects of running just a few (my words)
In the worst cases, on some of the new processors, the frequency drop can
be 1 GHz. In Cloudflare's case, spending just about 2.5% of time in a
cryptography function using AVX-512 caused a 10% drop in their
overall performance (requests served per second).
After seeing this and the discussion on IRC I won't commit any of the
function patches. The functions are not very impressive and are likely
to make everything else slower.
The IRC log should appear at the link below.