[FFmpeg-trac] #7706(avcodec:open): 20-30% perf drop in FFmpeg (H264) transcode performance with VAAPI

FFmpeg trac at avcodec.org
Fri Nov 8 14:14:04 EET 2019


#7706: 20-30% perf drop in FFmpeg (H264) transcode performance with VAAPI
-------------------------------------+-------------------------------------
             Reporter:  eero-t       |                    Owner:
                 Type:  defect       |                   Status:  open
             Priority:  important    |                Component:  avcodec
              Version:  git-master   |               Resolution:
             Keywords:  vaapi        |               Blocked By:
  regression                         |
             Blocking:               |  Reproduced by developer:  1
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------

Comment (by eero-t):

 I ran several variants of 6 transcode operations and a few other media
 tests:
 * In 8-bit (max FullHD) AVC transcode tests, perf improves by up to 20%
 when running a single transcode operation
 * In the 10-bit 4K HEVC transcode test [1], the perf increase was 3-4%
 * When running multiple transcode operations in parallel, there was no
 perf change (all changes were within daily variance)
 * There were no performance regressions
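
 As an illustration of what "within daily variance" means above, here is
 a minimal Python sketch. It is not the reporter's actual tooling: the
 function names, the FPS samples and the 2-sigma threshold (`k`) are all
 assumptions made for the example.

```python
from statistics import mean, stdev

def relative_change(baseline_fps, patched_fps):
    """Percent change of the patched mean relative to the baseline mean."""
    base = mean(baseline_fps)
    return (mean(patched_fps) - base) / base * 100.0

def within_daily_variance(baseline_fps, patched_fps, k=2.0):
    """True if the change is no larger than k standard deviations of the
    baseline runs (k=2.0 is an assumed threshold, not from the ticket)."""
    base = mean(baseline_fps)
    noise_pct = k * stdev(baseline_fps) / base * 100.0
    return abs(relative_change(baseline_fps, patched_fps)) <= noise_pct

# Hypothetical daily baseline runs (frames per second).
baseline = [100, 102, 98, 101, 99]
print(within_daily_variance(baseline, [101, 100, 102]))  # small change
print(within_daily_variance(baseline, [120, 121, 119]))  # large change
```

 A change is only reported as an improvement or a regression when it falls
 outside the noise band computed from the baseline runs.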

 Even with the patch, there's still a very clear gap to the original
 January performance.  Because that perf drop concerned only single
 transcode operations (parallel ones were not impacted), it's possible that
 some part of the gap is due to P-state power management (I'm not fixing
 CPU & GPU speeds in my tests, on purpose).

 I was testing this on a KBL i7 GT3e NUC with 28W TDP.  Some observations
 on power usage:
 * In the tests that improve most, the patch increases GPU power usage
 without increasing CPU power usage, i.e. FFmpeg is better able to feed
 work to the GPU
 * When many instances of the same test are run in parallel, things are TDP
 limited. Either there's no change in power usage, or the patch causes
 slightly higher CPU usage, which results in the GPU using less power.  No
 idea how the latter case is able to maintain the same speed; maybe P-state
 is better able to save GPU power with the interaction patterns caused by
 the patch?
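
 The TDP-limited case above can be pictured with a deliberately simplified
 model in which CPU and GPU share one package power budget. This is only a
 sketch: the function name and the wattage figures other than the 28W TDP
 are hypothetical, and real power management is more involved than a
 simple subtraction.

```python
def gpu_power_budget(tdp_watts, cpu_watts, other_watts=0.0):
    """Power left for the GPU when the package is TDP limited.

    Simplified assumption: CPU, GPU and everything else in the package
    draw from a single shared budget equal to the TDP.
    """
    return max(0.0, tdp_watts - cpu_watts - other_watts)

TDP = 28.0  # from the comment: 28W TDP NUC

# Hypothetical CPU power draw without and with the patch.
print(gpu_power_budget(TDP, cpu_watts=10.0))
print(gpu_power_budget(TDP, cpu_watts=11.0))
```

 Under this model, any extra CPU power draw directly reduces what is
 available to the GPU, which matches the direction of the observation.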


 [1] Note: I'm seeing a marginal but reproducible quality drop (0.1% SSIM,
 2-3% PSNR) in this test-case: https://trac.ffmpeg.org/ticket/8328

 I assume that's related to frame timings, like with QSV, and not a
 change in encoded frame contents.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/7706#comment:11>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

