[FFmpeg-devel] [PATCH] avcodec: only warn about hwaccel with frame threads
Hendrik Leppkes
h.leppkes at gmail.com
Mon Jan 25 20:01:02 CET 2016
On Mon, Jan 25, 2016 at 7:50 PM, Michael Niedermayer
<michael at niedermayer.cc> wrote:
> On Mon, Jan 25, 2016 at 04:39:49PM +0100, Hendrik Leppkes wrote:
>> On Mon, Jan 25, 2016 at 1:28 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> > Hi,
>> >
>> > On Mon, Jan 25, 2016 at 4:01 AM, wm4 <nfxjfg at googlemail.com> wrote:
>> >
>> >> On Sun, 24 Jan 2016 20:03:01 -0500
>> >> "Ronald S. Bultje" <rsbultje at gmail.com> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > On Sun, Jan 24, 2016 at 6:49 PM, Hendrik Leppkes <h.leppkes at gmail.com>
>> >> > wrote:
>> >>
>> >>
>> >> > > Unfortunately that doesn't alleviate the other issues, like the
>> >> > > complexity needed in the decoders during frame threading, or the extra
>> >> > > resources needed (extra image surfaces for every thread).
>> >> > >
>> >> >
>> >> > So, the extra code is just in the decoders, which already need it anyway
>> >> > (because they implement frame-mt), right? Or do hwaccels need extra code
>> >> > also?
>> >> >
>> >> > The extra resources aren't a big deal IMO. Memory use isn't typically a
>> >> > big issue, we're adding a few kb extra for contexts but practically all
>> >> > memory is in framebuffers regardless.
>> >>
>> >> It can be a big deal for hardware decoding, because hw surfaces
>> >> might be a more constrained resource than system RAM. Also, you often
>> >> have to preallocate _all_ surfaces you're going to use, so you'll have
>> >> to add the exact number of additionally needed surfaces to the
>> >> preallocation.
>> >
>> >
>> > If only one thread is active, the rest never has to be inited and thus
>> > contains no surfaces (or framebuffers, or anything), right? If not, that
>> > should be a trivial win.
>> >
>>
>> If you can implement it like this, i.e. only make one single thread do
>> the work, that would also avoid a bunch of the complexity with copying
>> contexts around and avoiding multiple init calls of the hwaccel.
>> On top of that, it would avoid the extra resource requirements and the
>> delay otherwise inherent to frame threading, since no extra frames are
>> "cached" inside the other worker threads.
>
> Is there no hwaccel that works (or can work) with MT?
> I am bringing that up here before code that might be needed for such a
> case is unconditionally removed.
>
As I explained in an earlier post, hwaccels don't multi-thread: they
execute asynchronously on a worker thread, but never more than one at
a time.
The only reason someone might potentially see any speedup from using
hwaccel+MT is the sheer lack of optimizations in
ffmpeg_<dxva2/vdpau>.c. Some very basic pipelining would give the same
speedup, instead of forcing the hardware to sync every frame
immediately, as it does right now.
That's really all the MT case does today: it adds a "delay" that
allows the hardware to work more in parallel internally, but you don't
need MT to do that. You can just buffer 2-4 output frames before
trying to process them and achieve the exact same speedup.
This behavior was confirmed by an NVIDIA engineer some years ago: the
hardware has several "stages", and for optimal performance you should
keep multiple frames inside the hardware. The decode APIs don't allow
this, so the GPU already returns you a frame while it's still being
decoded in a later stage, and once you try to access it, the GPU has
to "sync" the frame and wait until it's done. If you just buffer it for
a bit (say a 2 frame ring buffer), this bottleneck goes away and all
is fast.
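
A minimal sketch of the 2 frame ring buffer idea, for illustration only.
It is not code from ffmpeg_dxva2.c or ffmpeg_vdpau.c: HwFrame, DelayRing,
delay_ring_push and delay_ring_flush are hypothetical names, and
DELAY_FRAMES is an assumed depth. The point is only that a frame handed
back by the decoder is held for a couple of frames before anything reads
it, so the hardware is not forced to sync right away.

```c
#include <assert.h>
#include <stddef.h>

#define DELAY_FRAMES 2  /* assumed depth; the text above suggests 2-4 */

/* Hypothetical stand-in for a decoded hardware surface handle. */
typedef struct HwFrame {
    int id;
} HwFrame;

/* Fixed-size ring that holds back the most recent DELAY_FRAMES frames. */
typedef struct DelayRing {
    HwFrame *slot[DELAY_FRAMES];
    size_t   head;   /* index of the oldest held frame */
    size_t   count;  /* number of frames currently held */
} DelayRing;

static void delay_ring_init(DelayRing *r)
{
    r->head  = 0;
    r->count = 0;
    for (size_t i = 0; i < DELAY_FRAMES; i++)
        r->slot[i] = NULL;
}

/*
 * Queue a frame the decoder just returned. While the ring is still
 * filling, nothing is released (NULL). Once it is full, the oldest
 * frame is released for readback and the new one takes its slot.
 */
static HwFrame *delay_ring_push(DelayRing *r, HwFrame *f)
{
    HwFrame *out = NULL;

    assert(f != NULL);
    if (r->count == DELAY_FRAMES) {
        out = r->slot[r->head];
        r->slot[r->head] = f;
        r->head = (r->head + 1) % DELAY_FRAMES;
    } else {
        r->slot[(r->head + r->count) % DELAY_FRAMES] = f;
        r->count++;
    }
    return out;
}

/* At end of stream, release held frames oldest first; NULL when empty. */
static HwFrame *delay_ring_flush(DelayRing *r)
{
    HwFrame *out;

    if (r->count == 0)
        return NULL;
    out = r->slot[r->head];
    r->slot[r->head] = NULL;
    r->head = (r->head + 1) % DELAY_FRAMES;
    r->count--;
    return out;
}
```

A caller would push every frame it gets from the decoder and only read
back (download, convert, encode) the frame that push returns, then drain
the ring with delay_ring_flush at end of stream. The cost is
DELAY_FRAMES extra surfaces held at any time, which has to be counted in
the surface preallocation.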
- Hendrik