[FFmpeg-devel] [PATCH v5] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

Wang, Shaofei shaofei.wang at intel.com
Wed Feb 20 12:17:41 EET 2019


> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf Of
> Mark Thompson
> Sent: Saturday, February 16, 2019 8:12 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v5] Improved the performance of 1
> decode + N filter graphs and adaptive bitrate.
> On 15/02/2019 21:54, Shaofei Wang wrote:
> > It enabled multiple filter graph concurrency, which bring above about
> > 4%~20% improvement in some 1:N scenarios by CPU or GPU acceleration
> >
> > Below are some test cases and comparison as reference.
> > (Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
> > (Software: Intel iHD driver - 16.9.00100, CentOS 7)
> >
> > For 1:N transcode by GPU acceleration with vaapi:
> > ./ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi \
> >     -hwaccel_output_format vaapi \
> >     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> >     -vf "scale_vaapi=1280:720" -c:v h264_vaapi -f null /dev/null \
> >     -vf "scale_vaapi=720:480" -c:v h264_vaapi -f null /dev/null
> >
> >     test results:
> >                 2 encoders 5 encoders 10 encoders
> >     Improved       6.1%    6.9%       5.5%
> >
> > For 1:N transcode by GPU acceleration with QSV:
> > ./ffmpeg -hwaccel qsv -c:v h264_qsv \
> >     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> >     -vf "scale_qsv=1280:720:format=nv12" -c:v h264_qsv -f null /dev/null
> \
> >     -vf "scale_qsv=720:480:format=nv12" -c:v h264_qsv -f null
> > /dev/null
> >
> >     test results:
> >                 2 encoders  5 encoders 10 encoders
> >     Improved       6%       4%         15%
> >
> > For Intel GPU acceleration case, 1 decode to N scaling, by QSV:
> > ./ffmpeg -hwaccel qsv -c:v h264_qsv \
> >     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> >     -vf "scale_qsv=1280:720:format=nv12,hwdownload" -pix_fmt nv12 -f
> null /dev/null \
> >     -vf "scale_qsv=720:480:format=nv12,hwdownload" -pix_fmt nv12 -f
> > null /dev/null
> >
> >     test results:
> >                 2 scale  5 scale   10 scale
> >     Improved       12%     21%        21%
> >
> > For CPU only 1 decode to N scaling:
> > ./ffmpeg -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> >     -vf "scale=1280:720" -pix_fmt nv12 -f null /dev/null \
> >     -vf "scale=720:480" -pix_fmt nv12 -f null /dev/null
> >
> >     test results:
> >                 2 scale  5 scale   10 scale
> >     Improved       25%    107%       148%
> >
> > Signed-off-by: Wang, Shaofei <shaofei.wang at intel.com>
> > Reviewed-by: Zhao, Jun <jun.zhao at intel.com>
> > ---
> >  fftools/ffmpeg.c        | 121
> ++++++++++++++++++++++++++++++++++++++++++++----
> >  fftools/ffmpeg.h        |  14 ++++++
> >  fftools/ffmpeg_filter.c |   1 +
> >  3 files changed, 128 insertions(+), 8 deletions(-)
> 
> On a bit more review, I don't think this patch works at all.
> 
It has been tested and verified by a lot of cases. More fate cases need to be covered now.

> The existing code is all written to be run serially.  This simplistic approach to
> parallelising it falls down because many of those functions use variables
> written in what were previously other functions called at different times but
> have now become other threads, introducing undefined behaviour due to
> data races.
>
Actually, this is not a patch to parallel every thing in the ffmpeg. It just thread the input filter
of the filter graph(tend for simple filter graph), which is a simple way to improve N filter graph
performance and also without introduce huge modification. So that there is still a lot of serial function call, differences
are that each filter graph need to init its output stream instead of init all together and each
filter graph will reap filters for its filter chain.

> To consider a single example (not the only one), the function
> check_init_output_file() does not work at all after this change.  The test for
> OutputStream initialisation (so that you run exactly once after all of the
> output streams are ready) races with other threads setting those variables.
> Since that's undefined behaviour you may get lucky sometimes and have the
> output file initialisation run exactly once, but in general it will fail in unknown
> ways.
> 

The check_init_output_file() should be responsible for the output file related with
specified output stream which managed by each thread chain, that means even it
called by different thread, the data setting are different. Let me double check.

> If you want to resubmit this patch, you will need to refactor a lot of the other
> code in ffmpeg.c to rule out these undefined cases.
> 
OK. This patch would only effect on SIMPLE filter graph.

> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


More information about the ffmpeg-devel mailing list