[FFmpeg-trac] #8694(avcodec:new): FFV1 decoding needs a huge number of threads for optimal performance

FFmpeg trac at avcodec.org
Sat May 30 00:23:18 EEST 2020


#8694: FFV1 decoding needs a huge number of threads for optimal performance
-------------------------------------+-------------------------------------
             Reporter:  zorr         |                     Type:  enhancement
               Status:  new          |                 Priority:  normal
            Component:  avcodec      |                  Version:  git-master
             Blocking:               |               Blocked By:
Analyzed by developer:  0            |  Reproduced by developer:  0
             Keywords:  ffv1, decoding, performance
-------------------------------------+-------------------------------------
 I noticed that when decoding FFV1 (especially version 1) you can get much
 higher performance by raising the number of decoding threads far above the
 default (and recommended) value. On my system (Ryzen 3900X) the default is
 16 threads. With an 8-bit FFV1 v1 SD (720x576) source, 16 threads give
 '''163 fps''' but 384 threads give '''1181 fps''' (a 7.2x speed-up). Other
 lossless codecs (huffyuv, magicyuv, utvideo) don't behave this way: the best
 performance is achieved with 48 threads, but 24 threads are almost as good,
 which makes sense because that's the number of logical cores on the test
 machine.

 I ran the tests with the null muxer ('''-f null''', "null encoder" in the
 tables below) and without audio. The test source is over 30 minutes long
 (44058 frames). I measured the wall-clock time, calculated the fps from it
 and took the best of three runs. The test command was (only the
 '''-threads''' value was varied):

 {{{
 ffmpeg -threads 384 -i src.avi -an -f null -
 }}}
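 The measurement procedure above (best of three wall-clock runs per thread
 count, fps = frames / seconds) can be sketched as a small script. This is an
 illustrative sketch, not the exact script behind the numbers in this ticket;
 it assumes a POSIX shell with `local`, GNU `date` with `%N` (nanosecond)
 support, the file name src.avi and the 44058-frame count. The function names
 are hypothetical.

```shell
# Sketch of the benchmark loop (assumptions: GNU date with %N, input src.avi,
# 44058 frames). -threads is placed before -i, so it sets the decoder threads;
# the null muxer discards the output, so mostly decoding is measured.
FRAMES=${FRAMES:-44058}
SRC=${SRC:-src.avi}

# Run one decode with the given thread count and print its wall time in ms.
run_ms() {
    local threads=$1 start end
    start=$(date +%s%N)
    ffmpeg -nostdin -loglevel error -threads "$threads" -i "$SRC" -an -f null - || return 1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Print the best (smallest) wall time of three runs.
best_of_three() {
    local threads=$1 best="" t i
    for i in 1 2 3; do
        t=$(run_ms "$threads") || return 1
        if [ -z "$best" ] || [ "$t" -lt "$best" ]; then best=$t; fi
    done
    echo "$best"
}

# fps = frames / (ms / 1000), rounded to the nearest integer.
fps() {
    [ "$2" -gt 0 ] || { echo "n/a"; return 1; }
    awk -v f="$1" -v ms="$2" 'BEGIN { printf "%d\n", f / (ms / 1000) + 0.5 }'
}

# Print a table in the same layout as the ones in this ticket.
bench() {
    local threads ms
    printf 'threads\ttime (ms)\tfps\n'
    for threads in "$@"; do
        ms=$(best_of_three "$threads") || return 1
        printf '%s\t%s\t%s\n' "$threads" "$ms" "$(fps "$FRAMES" "$ms")"
    done
}
```

 For example, `bench 16 24 48 96 128 192 256 384 512 768` runs all the thread
 counts from the first table; `fps 44058 269650` reproduces its 163 fps row.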

 More detailed results below:

 {{{
 ffv1 v1, null encoder
 threads         time (ms)       fps
 16              269650          163
 24              216301          204
 48              130619          337
 96              72483           608
 128             57245           770
 192             46769           942
 256             38337           1149
 384             37304           1181
 512             37352           1180
 768             37458           1176
 }}}

 I also ran a more real-world scenario: converting the source to huffyuv.
 In this case the best performance was achieved with 512 threads, but 256 is
 almost as good. Detailed results below:

 {{{
 ffv1 v1 -> huffyuv
 threads         time (ms)       fps
 16              279524          158
 24              224079          197
 48              133244          331
 96              75631           583
 128             60817           724
 192             49113           897
 256             41690           1057
 384             41644           1058
 512             41628           1058
 768             41722           1056
 }}}
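 For a conversion it matters where '''-threads''' is placed: before '''-i'''
 it is an input option and sets the decoder's thread count, after '''-i''' it
 applies to the output encoder. A hypothetical helper that keeps the two
 apart is sketched below; the function name and output file are assumptions,
 and whether the huffyuv encoder benefits from its own thread count is not
 something measured here.

```shell
# Hypothetical helper: decode with one thread count, encode huffyuv with another.
# -threads before -i -> decoder option; -threads after -i -> encoder option.
convert_to_huffyuv() {
    local dec_threads=$1 enc_threads=$2 src=$3 dst=$4
    ffmpeg -nostdin -threads "$dec_threads" -i "$src" \
           -an -c:v huffyuv -threads "$enc_threads" "$dst"
}
```

 Usage: `convert_to_huffyuv 256 16 src.avi out.avi`.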

 FFV1 v3 doesn't need quite as many threads: the optimum was 128 threads
 (and even 96 is almost as good).

 {{{
 ffv1 v3 null encoder
 threads         time (ms)       fps
 16              91734           480
 24              72105           611
 48              50835           867
 64              40670           1083
 80              39819           1106
 96              37766           1167
 128             37621           1171
 192             37661           1170
 }}}

 And here are the results for utvideo, magicyuv and huffyuv.

 {{{
 utvideo, null encoder
 threads         time (ms)       fps
 6               19033           2315
 8               14329           3075
 12              9785            4503
 16              7703            5720
 24              5463            8065
 48              5436            8105
 96              5497            8015

 magicyuv, null encoder
 threads         time (ms)       fps
 6               30525           1443
 8               22947           1920
 12              15902           2771
 16              12687           3473
 24              8956            4919
 48              8923            4938
 96              8944            4926

 huffyuv, null encoder
 threads         time (ms)       fps
 6               22630           1947
 8               17048           2584
 12              12210           3608
 16              10034           4391
 24              7214            6107
 48              7189            6129
 96              7263            6066
 }}}
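 To make "almost as good" concrete, a small awk-based helper can read one of
 the tables above and report the best thread count, its fps, and the smallest
 thread count whose fps is within a given percentage of the best. This is an
 illustrative sketch; the function name and the 1% default tolerance are my
 own choices.

```shell
# Read a "threads time(ms) fps" table on stdin (non-data lines are skipped) and
# print: best thread count, best fps, smallest thread count within TOL percent.
plateau() {
    awk -v tol="${1:-1}" '
        BEGIN { best = 0 }
        NF == 3 && $1 ~ /^[0-9]+$/ {
            n++; t[n] = $1; f[n] = $3
            if ($3 + 0 > best) { best = $3 + 0; bt = $1 }
        }
        END {
            # rows are in ascending thread order, so the first hit is the smallest
            for (i = 1; i <= n; i++)
                if (f[i] + 0 >= best * (1 - tol / 100)) { knee = t[i]; break }
            printf "%s %s %s\n", bt, best, knee
        }'
}
```

 Fed the utvideo table, `plateau 1` reports `48 8105 24` (24 threads are
 within 1% of the best); fed the ffv1 v1 table, it reports `384 1181 384`.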

 These benchmarks were run with git build 20200525-6268034 (May 25, 2020
 10:44). I have also tested versions 4.2.2 and 3.4.2, and the performance is
 very similar in all of them. User '''furq''' on the #ffmpeg IRC channel also
 confirmed that on his Ryzen 2600 (6 cores, 12 logical cores) the best
 performance was achieved with 128 threads.

 I made a couple of charts to better visualize the scaling behaviour of the
 codecs, see here: https://i.postimg.cc/VNTxgWdw/ffv1-performance.png.

 Whenever more than 16 threads are requested, ffmpeg displays the warning
 ''"Using a thread count greater than 16 is not recommended."'' When I
 asked about this on the #ffmpeg IRC channel, users '''furq''' and
 '''Compn''' found that the warning is probably related to H.264 slice
 threading, which seems to be buggy with more than 16 threads:
 https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/pthread_internal.h#L24-L26.
 The code that prints the warning is here:
 https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/pthread.c#L64-L67.
 I have also confirmed that decoding FFV1 with 512 threads produces no errors
 in the resulting video; the hashes are identical.
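 The same 16-thread limit also explains the default on my machine. As far as
 I can tell from libavcodec's threading setup at this version, an unset
 thread count becomes the number of logical CPUs plus one, capped at
 MAX_AUTO_THREADS (16), and a single CPU gets one thread. The shell function
 below is only an approximation of that rule under this assumption (the
 slice-threading path, for example, may also take the frame height into
 account); the function name is hypothetical.

```shell
# Approximation of libavcodec's automatic thread count (hypothetical helper):
# logical CPUs + 1, capped at MAX_AUTO_THREADS; a single CPU gets 1 thread.
MAX_AUTO_THREADS=16

auto_threads() {
    local cpus=${1:-$(nproc)}   # defaults to this machine's logical CPU count
    if [ "$cpus" -le 1 ]; then
        echo 1
    elif [ $(( cpus + 1 )) -gt "$MAX_AUTO_THREADS" ]; then
        echo "$MAX_AUTO_THREADS"
    else
        echo $(( cpus + 1 ))
    fi
}
```

 With 24 logical CPUs (Ryzen 3900X) `auto_threads 24` gives 16, matching the
 default reported above; with 12 (Ryzen 2600) it gives 13.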

 So I think one way to improve things would be to tailor the warning message
 to the codec in use, or perhaps even to adjust the default number of
 threads based on the codec and the number of available cores. Users are
 probably not aware that a 7-fold speed-up is possible just by adjusting the
 number of threads.
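 Until something like that exists in ffmpeg itself, a user-side wrapper could
 approximate a codec-aware default. The sketch below is hypothetical: the
 function names are made up, and the 16x-logical-cores multiplier for ffv1 is
 only a rule of thumb extrapolated from the single 24-core measurement above
 (384 threads). Note that ffprobe reports "ffv1" for every FFV1 version, so
 this cannot tell v1 (optimum 384) from v3 (optimum 128).

```shell
# Hypothetical rule of thumb: ffv1 -> 16 threads per logical core (from the
# 384-thread optimum on 24 logical cores); other codecs -> one per core.
pick_threads() {
    local codec=$1 cores=$2
    case "$codec" in
        ffv1) echo $(( cores * 16 )) ;;
        *)    echo "$cores" ;;
    esac
}

# Probe the first video stream's codec, then decode with the chosen count.
decode_auto() {
    local src=$1 codec
    codec=$(ffprobe -v error -select_streams v:0 \
                    -show_entries stream=codec_name -of csv=p=0 "$src") || return 1
    ffmpeg -nostdin -threads "$(pick_threads "$codec" "$(nproc)")" \
           -i "$src" -an -f null -
}
```

 Usage: `decode_auto src.avi`.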

 And I think it's worth investigating why FFV1 needs so many threads in the
 first place. Perhaps this is by design, but it could also be a symptom of a
 hidden design flaw or a simple coding error.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/8694>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

