[FFmpeg-devel] [PATCH] libavcodec/qsv.c: Re-design session control and internal allocation

Gwenole Beauchesne gb.devel at gmail.com
Mon Oct 26 11:22:38 CET 2015


Hi,

2015-10-23 16:52 GMT+02:00 wm4 <nfxjfg at googlemail.com>:
> On Fri, 23 Oct 2015 16:12:07 +0200
> Gwenole Beauchesne <gb.devel at gmail.com> wrote:
>
>> Hi,
>>
>> 2015-10-23 15:41 GMT+02:00 wm4 <nfxjfg at googlemail.com>:
>> > On Fri, 23 Oct 2015 14:54:56 +0200
>> > Gwenole Beauchesne <gb.devel at gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> 2015-10-07 18:40 GMT+02:00 wm4 <nfxjfg at googlemail.com>:
>> >> > On Wed, 7 Oct 2015 19:20:56 +0300
>> >> > Ivan Uskov <ivan.uskov at nablet.com> wrote:
>> >> >
>> >> >> Hello Hendrik,
>> >> >>
>> >> >> Wednesday, October 7, 2015, 5:58:25 PM, you wrote:
>> >> >>
>> >> >> HL> On Wed, Oct 7, 2015 at 4:41 PM, Ivan Uskov <ivan.uskov at nablet.com> wrote:
>> >> >>
>> >> >> HL> Global static variables are not acceptable, sorry.
>> >> >> HL> You'll have to find another way to solve your problem, but global
>> >> >> HL> state is not the way to go.
>> >> >> Unfortunately I do not have any ideas how to provide a single, common
>> >> >> memory block for separate modules in another way. Memory-mapping
>> >> >> files looks like a less portable and much more bulky solution than one
>> >> >> global variable. I do not see a way to use an appropriate field of
>> >> >> AVCodecContext to share global data.
>> >> >> Has anybody any ideas?
>> >> >> Am I missing something? There really is a need for a global common
>> >> >> structure shared between the decoder, vpp and encoder.
>> >> >>
>> >> >
>> >> > There's no automagic way to get this done.
>> >> >
>> >> > Hardware accelerators like vaapi and vdpau need the same thing. These
>> >> > have special APIs to set an API context on the decoder. Some APIs use
>> >> > hwaccel_context, vdpau uses av_vdpau_bind_context().
>> >> >
>> >> > The hwaccel_context pointer is untyped (just a void*), and what it means
>> >> > is implicit to the hwaccel or the decoder that is used. It could point
>> >> > to some sort of qsv state, which will have to be explicitly allocated
>> >> > and set by the API user (which might be ffmpeg.c).
>> >> >
>> >> > For filters there is no such thing yet. New API would have to be
>> >> > created. For filters in particular, we will have to make sure that the
>> >> > API isn't overly qsv-specific, and that it is extendable to other APIs
>> >> > (like for example vaapi or vdpau).
>> >>
>> >> I have been looking into a slightly different way: the common object
>> >> being transported is an AVFrame. So, my initial approach is to create
>> >> an AV_FRAME_DATA_HWACCEL metadata. Lookups could be optimized by
>> >> keeping around an AVFrameInternal struct that resides in the same
>> >> allocation unit as the AVFrame. But, this is a detail.
>> >>
>> >> From there, there are at least two usage models, when it comes to filters too:
>> >>
>> >> 1. Make the AVHWAccelFrame struct hold multiple hwaccel-specific info,
>> >> with a master one, and slave ones for various different APIs.
>> >> Advantage: a single AVFrame can be used and impersonated whenever
>> >> necessary. e.g. a VA surface master could be exported/re-used with an
>> >> mfxSurface, a dma_buf (for OpenCL), or userptr buffer. Drawback:
>> >> terribly tedious to manage.
>> >>
>> >> 2. Simpler approach: the AVHWAccelFrame holds a unique struct to
>> >> hwaccel specific data. Should we need to export that for use with
>> >> another API, it's not too complicated to av_frame_ref() + add new
>> >> hwaccel-specific metadata.
>> >
>> > It could be something like an API identifier (an enum or so) + API
>> > specific pointer to a struct.
>>
>> That's exactly that:
>>
>> /**
>>  * Hardware Accelerator identifier.
>>  *
>>  * @note
>>  * A hardware accelerator can be device-less. This means that only the
>>  * underlying hardware resource, e.g. a Linux dma_buf handle, is being
>>  * transported in the AVFrame. That hardware resource could be mapped
>>  * through standard OS-dependent calls, e.g. mmap() on Linux.
>>  */
>> enum AVHWAccelId {
>>     AV_HWACCEL_ID_NONE = -1,
>>     AV_HWACCEL_ID_VAAPI,
>>     AV_HWACCEL_ID_NB,           ///< Not part of ABI
>> };
>>
>> Name to be improved if people have better suggestions, as this really
>> is to be seen as a HW resource, not necessarily attached to a particular
>> HW device. i.e. this could be a dma_buf handle from a V4L2 buffer or
>> VA surface.
>
> OK. (Minor nit: if ID_NONE is valid and means HW API without context,
> maybe it should be 0, not -1. Also, if it was meant this way, maybe
> these should still have their own ID for other purposes.)

In my current model, ID_NONE is not meant to be valid because the
hwaccel side data shall only exist for hwaccel purposes. Besides,
having ID_NONE set to -1 is consistent with other lavu enums, and it is
convenient to have ID_NB directly express the exact number of
hwaccels.
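
To illustrate, here is a minimal sketch of what the side-data payload
could look like. The struct and its field names are purely hypothetical
at this point; only the AVHWAccelId enum matches what is quoted above:

/* Hypothetical layout of the AV_FRAME_DATA_HWACCEL payload.
 * Field names are illustrative only; nothing here is final API. */
typedef struct AVHWAccelFrame {
    enum AVHWAccelId hwaccel_id; ///< which API the private data belongs to
    void            *priv;       ///< API-specific data, e.g. VA surface info
} AVHWAccelFrame;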

>> I am reworking the patch series as I changed my mind again: current
>> map strategy was overly complex (and required to be). There were at
>> least 4 map flags: AV_FRAME_MAP_{READ,WRITE,READ_FIRST,USWC_MEMORY}. I
>> am now preferring a unique av_hwaccel_frame_get_pixels() defined as
>> follow:
>>
>> /**
>>  * Returns AVFrame pixels into linear memory
>>  *
>>  * This function takes a snapshot of the underlying HW surface and
>>  * exposes it to SW backed memory. This may involve a copy from GPU
>>  * memory to CPU memory.
>>  *
>>  * @note
>>  * There is no effort underway to commit the modified pixels back to
>>  * GPU memory when the \ref dst AVFrame is released.
>>  *
>>  * @param[in] src       the source frame to read
>>  * @param[inout] dst    the target frame to update, or create if NULL
>>  * @param[in] flags     an optional combination of AV_FRAME_FLAG_xxx flags
>>  * @return 0 on success, an AVERROR code on failure.
>>  */
>> int
>> av_hwaccel_frame_get_pixels(AVFrame *src, AVFrame **dst, unsigned flags);
>>
>> i.e. the cost of allocating and copying AVFrame metadata should be
>> less than the actual work needed behind the scenes. So, it looks like a
>> better interim solution for starters.
>
> So this is for read-access only, right? If it can map data, there
> also needs to be an unmap function, and the user would have to know
> about when to use it.

Well, put can be implemented by reversing src/dst in this function. :)
Actually, this could be av_hwaccel_frame_copy(), but the benefit of
having get_pixels() is that it leaves the allocation business to lavu
and the user only has to bother with _unref().
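
For illustration, calling code could then look roughly like the
following (error handling trimmed; av_hwaccel_frame_get_pixels() is the
proposed function above, not something that exists yet):

AVFrame *sw_frame = NULL; /* NULL: let lavu allocate the destination */
int ret = av_hwaccel_frame_get_pixels(hw_frame, &sw_frame, 0);
if (ret < 0)
    return ret;
/* ... read sw_frame->data[] / linesize[] as plain system memory ... */
av_frame_free(&sw_frame); /* a plain unref/free, no unmap call needed */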

> I don't know if something for write-access is required. It'd be
> potentially useful to API users, but they could also be bothered to
> implement their own code for this.
>
> I'm not sure how this plays into other uses, such as e.g. mapping a
> surface in OpenGL or OpenCL. But it's probably ok.

I don't think I want to implement OGL PBO mappings in lavu. :)
But allowing users to set up hwaccel metadata to allow for this is
probably desirable. Currently, I'd rather limit public APIs as much as
possible and expose more only as additional use-cases come up.

>> >
>> >> For VA-API specific purposes, I then have:
>> >> - AVVADisplay, which is refcounted, and that can handle automatic
>> >> initialize/terminate calls ;
>> >> - AVVAFrameBuffer that holds an { AVVADisplay, VASurfaceID, and
>> >> possibly VAImageID } tuple (for mappings in non-USWC memory), and that
>> >> populates AVFrame.buf[0].
>> >
>> > Sounds good.
>> >
>> >> I am undecided yet on whether I'd create an AV_PIX_FMT_HWACCEL format
>> >> and allow hwaccel specific AVFrame to absorb an existing AVFrame, as a
>> >> transitional method from existing vaapi hwaccel to "new" vaapi
>> >> hwaccel. In that new generic hwaccel format model, buf[0] et al. would
>> >> be clearly defined, and data[i] would not be meant to be touched by the
>> >> user application. For now, I am not strongly leaning towards that option.
>> >
>> > I'd still have different pixfmts for each hwaccel, even if they behave
>> > the same. (The main reason being that hw accel init via get_format()
>> > requires it.)
>> >
>> > IMHO, one AVFrame plane pointer should point to your suggested
>> > AVHWAccelFrame. This would make more sense with how AVFrame
>> > refcounting works in the general case.
>> >
>> > AVFrame specifically demands that AVFrame.buf[] covers all memory
>> > pointed to by AVFrame.data. This doesn't make much sense with the
>> > current API, where AVFrame.data[3] is an API specific surface ID
>> > (usually an integer casted to a pointer), and the AVBufferRef doesn't
>> > really make any sense.
>>
>> Yes, that's also what I wanted to get rid of, at least for VA-API with
>> lavc managed objects.
>>
>> > With the new suggestion, the AVBufferRef would cover the memory used by
>> > AVHWAccelFrame. While not particularly useful, this is consistent, and
>> > also frees API users from worrying about what the AVBufferRef data/size
>> > fields should be set to.
>> >
>> > As for compatibility with old code: maybe AVFrame.data[1] could
>> > point to something like this. But I think it's better to make a clean
>> > break.
>>
>> When is the next major bump scheduled to happen BTW? :)
>
> At least a year, though we're just about still in the phase where we can
> change the ABI freely. API can be deprecated and even disabled (i.e.
> the API is still there, but consists of stubs only).
>
> I'm not sure how others think about completely changing this.
> Personally I'd be fine with it.
>
>> For compatibility, that's also the idea behind another generic
>> AV_PIX_FMT_HWACCEL that would enforce data[i] to be clear of any
>> user-supplied pointers, and buf[i] shall be filled in by appropriate
>> accessors, or while creating the side-data, e.g.
>> av_vaapi_frame_create_side_data(). i.e. when lavc swallows up an
>> AVFrame with new-style hwaccel, this is where the AV_PIX_FMT_VAAPI
>> would be replaced with AV_PIX_FMT_HWACCEL. Replace "swallows up" with
>> e.g. av_vaapi_frame_convert_in_place() if you prefer. Otherwise, IMHO,
>> the old-style fields should live untouched, hence the need to keep the
>> hwaccel side-data around.
>
> Isn't the problem more about output?
>
> Again, there's the problem with the current hwaccel API selecting the
> hwaccel with get_format(), just using the hwaccel-specific pixfmt.

I also envision a need for an AVCodecContext.hwaccel_id field + possibly
a .get_hwaccel() callback, just so as to depart from that pixfmt tie.
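
Roughly, the callback could look like this (a purely speculative sketch
mirroring the existing get_format() callback; none of these names are
settled):

/* Speculative sketch: hwaccel selection decoupled from the pixfmt. */
enum AVHWAccelId (*get_hwaccel)(struct AVCodecContext *avctx,
                                const enum AVHWAccelId *hwaccel_ids);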

> Also, AVFrame.buf[] should cover the memory referenced by
> AVFrame.data[]. It's merely a refcount helper for AVFrame.data[], and
> should not do additional things.
>
> I think using AVFrame side data for this would be a bit awkward.
> Possibly it _could_ be used to store things like VADisplay if we don't
> find a better way, but I think having a AVHWAccelFrame would be better.

Side data is quite simple to use and easily ref-counted. I didn't
want to touch the AVFrame fields. Though, it's of course possible to
extend it with either public or internal fields.
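
As a rough sketch of what I mean (AV_FRAME_DATA_HWACCEL and
AVHWAccelFrame are the proposed additions; av_frame_new_side_data() and
av_frame_get_side_data() already exist in lavu):

AVFrameSideData *sd = av_frame_new_side_data(frame, AV_FRAME_DATA_HWACCEL,
                                             sizeof(AVHWAccelFrame));
if (!sd)
    return AVERROR(ENOMEM);
((AVHWAccelFrame *)sd->data)->hwaccel_id = AV_HWACCEL_ID_VAAPI;

/* Consumers then look it up like any other side data: */
sd = av_frame_get_side_data(frame, AV_FRAME_DATA_HWACCEL);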

>> >
>> >> PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
>> >> crop information into there. Since this is only useful to hwaccel, no
>> >> need to populate AVFrame with additional fields IMHO.
>> >
>> > IMHO, crop information should be generally available, even for software
>> > surfaces. What we currently do are terrible hacks: align the X/Y
>> > coordinates to chroma boundaries and adjust the pointer (meaning
>> > everyone has to deal with possibly unaligned pointers, and non-mod-2
>> > crops don't work correctly), and this also means you can have a
>> > non-mod-2 width/height, which doesn't really make sense for chroma.
>>
>> I don't really see why this would be needed for SW. AVFrame.buf[] will
>> hold the buffers as in "the original allocation". Then AVFrame.data[]
>> shall be filled in to fit the actual cropped/decoded region. Isn't it?
>
> Yes, AVFrame.buf[] does this, but you still don't know e.g. the
> original width, or even the pointer to a plane's original (0, 0) pixel
> if AVFrame.buf[0] covers all planes.
>
> I think doing cropping as metadata would also simplify code a lot. For
> example, nobody has to worry about non-mod-2 yuv420 anymore, and how it
> should be handled. It's less tricky, more correct, more efficient.

OK. A crop side-data is desired then. I was somehow convinced that it
wouldn't matter for SW. Though, it would actually be really neat if
the user didn't have to care about anything and could just use .data[]
appropriately. So, probably have internal_data[] fields for those SIMD
alignment needs?
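
Something along these lines, perhaps (field names purely illustrative,
counted in luma samples from the coded picture edges):

/* Hypothetical crop side data: the coded picture stays aligned, and the
 * visible region is described as offsets from each edge. */
typedef struct AVFrameCrop {
    int top;
    int bottom;
    int left;
    int right;
} AVFrameCrop;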

Thanks,
-- 
Gwenole Beauchesne
Intel Corporation SAS / 2 rue de Paris, 92196 Meudon Cedex, France
Registration Number (RCS): Nanterre B 302 456 199

