[FFmpeg-trac] #8694(avcodec:new): FFV1 decoding needs a huge number of threads for optimal performance
FFmpeg
trac at avcodec.org
Sat May 30 00:23:18 EEST 2020
#8694: FFV1 decoding needs a huge number of threads for optimal performance
-------------------------------------+-------------------------------------
Reporter: zorr | Type: enhancement
Status: new | Priority: normal
Component: avcodec | Version: git-master
Keywords: ffv1, decoding, performance | Blocked By:
Blocking: | Reproduced by developer: 0
Analyzed by developer: 0 |
-------------------------------------+-------------------------------------
I noticed that when decoding FFV1 (especially version 1) you can get much
higher performance by increasing the number of decoding threads far beyond
the default (and recommended) number. On my system (Ryzen 3900X) the
default is 16 threads. With an 8-bit ffv1 v1 SD (720x576) source, 16
threads gives '''163 fps''' but 384 threads gives '''1181 fps''' (a 7.2x
speed-up). Other lossless codecs (huffyuv, magicyuv, utvideo) don't behave
this way: their best performance is achieved with 48 threads, but 24
threads is almost as good, which makes sense because that's the number of
logical cores on the test machine.
I ran the tests using the null encoder and without audio. The test source
is over 30 minutes long (44058 frames). I measured the wall clock time,
took the best of three runs, and calculated the fps from that. The test
command was (varying only the '''-threads''' parameter):
{{{
ffmpeg -threads 384 -i src.avi -an -f null -
}}}
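A minimal sketch of how the best-of-three measurement could be scripted, assuming Python 3 and an `ffmpeg` binary on the PATH. The function names (`build_command`, `best_time_ms`, `benchmark`) are hypothetical and are not the script that was actually used; only the ffmpeg command line itself is taken from this report.

```python
import subprocess
import time


def build_command(threads, source="src.avi"):
    """Return the ffmpeg command line from the report for a given thread count."""
    return ["ffmpeg", "-threads", str(threads), "-i", source,
            "-an", "-f", "null", "-"]


def best_time_ms(threads, runs=3, run=subprocess.run, clock=time.perf_counter):
    """Run the decode `runs` times and return the best wall clock time in ms.

    `run` and `clock` are injectable so the timing logic can be checked
    without actually launching ffmpeg.
    """
    timings = []
    for _ in range(runs):
        start = clock()
        run(build_command(threads), check=True,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append((clock() - start) * 1000.0)
    return min(timings)


def benchmark(thread_counts, runs=3):
    """Return a list of (threads, best time in ms) rows."""
    return [(threads, best_time_ms(threads, runs=runs))
            for threads in thread_counts]
```

Calling `benchmark([16, 24, 48, 96, 128, 192, 256, 384, 512, 768])` would produce rows in the same shape as the tables in this report, assuming `src.avi` is in the current directory.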
More detailed results below:
{{{
ffv1 v1, null encoder
threads time (ms) fps
16 269650 163
24 216301 204
48 130619 337
96 72483 608
128 57245 770
192 46769 942
256 38337 1149
384 37304 1181
512 37352 1180
768 37458 1176
}}}
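The fps column follows directly from the frame count and the measured time, and the speed-up is the ratio of the 16-thread time to each other time. A short sketch of that arithmetic, using only the numbers from the table above (the variable and function names are illustrative):

```python
FRAME_COUNT = 44058  # total frames in the test source, as stated above

# (threads, best wall clock time in ms) for ffv1 v1 with the null encoder
FFV1_V1_NULL = [
    (16, 269650), (24, 216301), (48, 130619), (96, 72483), (128, 57245),
    (192, 46769), (256, 38337), (384, 37304), (512, 37352), (768, 37458),
]


def fps(time_ms, frames=FRAME_COUNT):
    """Convert a wall clock time in milliseconds to frames per second."""
    return frames * 1000.0 / time_ms


def speedups(rows):
    """Return (threads, speed-up relative to the first row) pairs."""
    baseline_ms = rows[0][1]
    return [(threads, baseline_ms / time_ms) for threads, time_ms in rows]


for (threads, time_ms), (_, factor) in zip(FFV1_V1_NULL, speedups(FFV1_V1_NULL)):
    print(f"{threads:>7} {time_ms:>9} {fps(time_ms):>6.0f} {factor:>5.1f}x")
```

For example, 44058 frames in 37304 ms is about 1181 fps, and 269650 / 37304 is about 7.2, which is the speed-up quoted at the top of this report.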
I also ran a more real-world scenario of converting the source to huffyuv.
In this case the best performance was achieved with 512 threads, but 256 is
almost as good. Detailed results below.
{{{
ffv1 v1 -> huffyuv
threads time (ms) fps
16 279524 158
24 224079 197
48 133244 331
96 75631 583
128 60817 724
192 49113 897
256 41690 1057
384 41644 1058
512 41628 1058
768 41722 1056
}}}
FFV1 v3 doesn't need quite as many threads: the optimum was 128 threads
(and even 96 is almost as good).
{{{
ffv1 v3 null encoder
threads time (ms) fps
16 91734 480
24 72105 611
48 50835 867
64 40670 1083
80 39819 1106
96 37766 1167
128 37621 1171
192 37661 1170
}}}
And here are the results for utvideo, magicyuv and huffyuv.
{{{
utvideo, null encoder
threads time (ms) fps
6 19033 2315
8 14329 3075
12 9785 4503
16 7703 5720
24 5463 8065
48 5436 8105
96 5497 8015

magicyuv, null encoder
threads time (ms) fps
6 30525 1443
8 22947 1920
12 15902 2771
16 12687 3473
24 8956 4919
48 8923 4938
96 8944 4926

huffyuv, null encoder
threads time (ms) fps
6 22630 1947
8 17048 2584
12 12210 3608
16 10034 4391
24 7214 6107
48 7189 6129
96 7263 6066
}}}
These benchmarks were run with the git build 20200525-6268034 (May 25,
2020 10:44). I have also tested versions 4.2.2 and 3.4.2. The performance
is very similar in all of them. User '''furq''' on the #ffmpeg channel
also confirmed that on his Ryzen 2600 (6 cores, 12 logical cores) the best
performance was with 128 threads.
I made a couple of charts to better visualize the scaling behaviour of the
codecs, see here: https://i.postimg.cc/VNTxgWdw/ffv1-performance.png.
Whenever more than 16 threads are requested, ffmpeg displays the warning
''"Using a thread count greater than 16 is not recommended."'' When I
asked about this on the #ffmpeg IRC channel, users '''furq''' and
'''Compn''' found that the warning is probably related to H.264 slice
threading, which seems to be buggy with more than 16 threads:
https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/pthread_internal.h#L24-L26.
The actual warning message code is here:
https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/pthread.c#L64-L67.
I have also confirmed that there are no errors in the resulting video even
when using 512 threads to decode ffv1: the hashes are equal.
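One way such a hash comparison could be scripted is sketched below. It assumes ffmpeg's `md5` muxer (`-f md5 -`), which prints a single `MD5=...` line for the decoded output; this report does not say which hashing method was actually used, so the choice of muxer and the helper names (`md5_command`, `decoded_md5`, `same_output`) are assumptions made for illustration.

```python
import subprocess


def md5_command(threads, source="src.avi"):
    """Build an ffmpeg command that hashes the decoded video (assumed method)."""
    return ["ffmpeg", "-v", "error", "-threads", str(threads),
            "-i", source, "-an", "-f", "md5", "-"]


def decoded_md5(threads, source="src.avi", run=subprocess.run):
    """Decode with the given thread count and return the 'MD5=...' line."""
    result = run(md5_command(threads, source),
                 check=True, capture_output=True, text=True)
    return result.stdout.strip()


def same_output(thread_counts, source="src.avi", run=subprocess.run):
    """Return True if every thread count yields the same decoded hash."""
    hashes = {decoded_md5(threads, source, run) for threads in thread_counts}
    return len(hashes) == 1
```

With this sketch, `same_output([16, 512])` returning `True` would correspond to the result described above.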
So I think one way to improve things would be to customize the warning
message based on the codec in use, and perhaps even to adjust the default
number of threads based on the codec and the number of available cores.
Users are probably not aware that a 7-fold speed-up is possible just by
adjusting the number of threads.
And I think it's worth taking a look at why ffv1 needs so many threads in
the first place. Perhaps it is by design but it could also be a symptom of
a hidden design flaw or a simple coding error.
--
Ticket URL: <https://trac.ffmpeg.org/ticket/8694>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker