[FFmpeg-devel] Added HW H.264 and HEVC encoding for AMD GPUs based on AMF SDK

Wed Nov 15 01:47:13 EET 2017

> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
> Of Mark Thompson
> Sent: November 14, 2017 6:11 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] Added HW H.264 and HEVC encoding for AMD
> GPUs based on AMF SDK
> 
> On 14/11/17 22:10, Mironov, Mikhail wrote:
> >> On 14/11/17 17:14, Mironov, Mikhail wrote:
> >>>>>>>>> +    res = ctx->factory->pVtbl->CreateContext(ctx->factory,
> >>>>>>>>> + &ctx-
> >>>>>>> context);
> >>>>>>>>> +    AMF_RETURN_IF_FALSE(ctx, res == AMF_OK,
> >>>> AVERROR_UNKNOWN,
> >>>>>>>> "CreateContext() failed with error %d\n", res);
> >>>>>>>>> +    // try to reuse existing DX device
> >>>>>>>>> +    if (avctx->hw_frames_ctx) {
> >>>>>>>>> +        AVHWFramesContext *device_ctx =
> >>>>>>>>> + (AVHWFramesContext*)avctx-
> >>>>>>>>> hw_frames_ctx->data;
> >>>>>>>>> +        if (device_ctx->device_ctx->type ==
> >>>>>> AV_HWDEVICE_TYPE_D3D11VA){
> >>>>>>>>> +            if (amf_av_to_amf_format(device_ctx->sw_format)
> >>>>>>>>> + ==
> >>>>>>>>> + AMF_SURFACE_UNKNOWN) {
> >>>>>>>>
> >>>>>>>> This test is inverted.
> >>>>>>>>
> >>>>>>>> Have you actually tested this path?  Even with that test fixed,
> >>>>>>>> I'm unable to pass the following initialisation test with an
> >>>>>>>> AMD
> >>>>>>>> D3D11
> >>>> device.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes, the condition should be reverted. To test I had to add
> >>>>>>> "-hwaccel d3d11va -hwaccel_output_format d3d11" to the
> command
> >>>> line.
> >>>>>>
> >>>>>> Yeah.  I get:
> >>>>>>
> >>>>>> $ ./ffmpeg_g -y -hwaccel d3d11va -hwaccel_device 0 -
> >>>>>> hwaccel_output_format d3d11 -i ~/bbb_1080_264.mp4 -an -c:v
> >> h264_amf
> >>>>>> out.mp4 ...
> >>>>>> [AVHWDeviceContext @ 000000000270e120] Created on device
> >> 1002:665f
> >>>>>> (AMD Radeon (TM) R7 360 Series).
> >>>>>> ...
> >>>>>> [h264_amf @ 00000000004dcd80] amf_shared: avctx-
> >hw_frames_ctx
> >>>> has
> >>>>>> non-AMD device, switching to default
> >>>>>>
> >>>>>> It's then comedically slow in this state (about 2fps), but works
> >>>>>> fine when the decode is in software.
> >>>>>
> >>>>> Is it possible that you also have iGPU not disabled and it is used
> >>>>> for
> >>>> decoding as adapter 0?
> >>>>
> >>>> There is an integrated GPU, but it's currently completely disabled.
> >>>> (I made
> >>>> <https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-
> >>>> November/219795.html> to check that the device was definitely
> >>>> right.)
> >>>>
> >>>>> Can you provide a log from dxdiag.exe?
> >>>>
> >>>> <http://ixia.jkqxz.net/~mrt/DxDiag.txt>
> >>>>
> >>>>> If AMF created its own DX device then the submission logic and speed
> >>>>> are the same as from the SW decoder.
> >>>>> It would be interesting to see a short GPUVIEW log.
> >>>>
> >>>> My Windows knowledge is insufficient to get that immediately, but
> >>>> if you think it's useful I can look into it?
> >>>
> >>> I think I know what is going on. You are on Win7. In Win7 the D3D11VA
> >>> API is not available from MSFT.
> >>> AMF will fall back to DX9-based encoding submission, and this is why the
> >>> message was produced.
> >>> The AMF performance should be the same on DX9, but I don’t know how
> >>> decoding is done without D3D11VA support.
> >>> GPUVIEW is not really needed if my assumptions are correct.
> >>
> >> Ah, that would make sense.  Maybe detect it and fail earlier with a
> >> helpful message - the current "not an AMD device" is wrong in this case.
> >>
> >> Decode via D3D11 does work for me on Windows 7 with both AMD and
> >> Intel; I don't know anything about how, though.  (I don't really care
> >> about Windows 7 - this was just a set of parts mashed together into a
> >> working machine for testing, the Windows 7 install is inherited from
> >> elsewhere.)
> >
> > I ran this on Win7. What I see is that decoding does go via D3D11VA;
> > the support comes with the Platform Update. But the AMF encoder works on
> > Win7 via D3D9 only. That explains the performance hit: with D3D11, the HW
> > accelerator copies each output frame via a staging texture.
> > If I use DXVA2 for decoding it is faster, because the staging texture is
> > not needed.
> > I am thinking of connecting dxva2 acceleration with the AMF encoder, but
> > probably in the next phase.
> > I've added more precise logging.
> >
> >>
> >>>>>>>>> +    { "filler_data",    "Filler Data Enable",
> >> OFFSET(filler_data),
> >>>>>>>> AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
> >>>>>>>>> +    { "vbaq",           "Enable VBAQ",
> >>>> OFFSET(enable_vbaq),
> >>>>>>>> AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
> >>>>>>>>> +    { "frame_skipping", "Rate Control Based Frame Skip",
> >>>>>>>> OFFSET(skip_frame),         AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE
> },
> >>>>>>>>> +
> >>>>>>>>> +    /// QP Values
> >>>>>>>>> +    { "qp_i",           "Quantization Parameter for I-Frame",
> >>>> OFFSET(qp_i),
> >>>>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
> >>>>>>>>> +    { "qp_p",           "Quantization Parameter for P-Frame",
> >>>>>> OFFSET(qp_p),
> >>>>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
> >>>>>>>>> +    { "qp_b",           "Quantization Parameter for B-Frame",
> >>>>>> OFFSET(qp_b),
> >>>>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
> >>>>>>>>> +
> >>>>>>>>> +    /// Pre-Pass, Pre-Analysis, Two-Pass
> >>>>>>>>> +    { "preanalysis",    "Pre-Analysis Mode",
> >>>>>> OFFSET(preanalysis),
> >>>>>>>> AV_OPT_TYPE_BOOL,{ .i64 = 0 }, 0, 1, VE, NULL },
> >>>>>>>>> +
> >>>>>>>>> +    /// Maximum Access Unit Size
> >>>>>>>>> +    { "max_au_size",    "Maximum Access Unit Size for rate control
> >> (in
> >>>>>> bits)",
> >>>>>>>> OFFSET(max_au_size),        AV_OPT_TYPE_INT, { .i64 = 0 }, 0,
> >> INT_MAX,
> >>>> VE
> >>>>>> },
> >>>>>>>>
> >>>>>>>> Can you explain more about what this option does?  I don't seem
> >>>>>>>> to be able to get it to do anything - e.g. setting -max_au_size
> >>>>>>>> 80000 with 30fps CBR 1M (which should be easily achievable)
> >>>>>>>> still makes packets of more than 80000
> >>>>>>>> bits.)
> >>>>>>>>
> >>>>>>>
> >>>>>>> It means maximum frame size in bits, and it should be used
> >>>>>>> together with enforce_hrd enabled.  I tested, it works after the
> >>>>>>> related fix for
> >>>>>> enforce_hrd.
> >>>>>>> I added  dependency handling.
> >>>>>>
> >>>>>> $ ./ffmpeg_g -y -nostats -i ~/bbb_1080_264.mp4 -an -c:v h264_amf
> >>>>>> -bsf:v trace_headers -frames:v 1000 -enforce_hrd 1 -b:v 1M
> >>>>>> -maxrate 1M - max_au_size 80000 out.mp4 2>&1 | grep 'Packet: [0-
> 9]\{5\}'
> >>>>>> [AVBSFContext @ 00000000029d7f40] Packet: 11426 bytes, key
> frame,
> >>>>>> pts 128000, dts 128000.
> >>>>>> [AVBSFContext @ 00000000029d7f40] Packet: 17623 bytes, key
> frame,
> >>>>>> pts 192000, dts 192000.
> >>>>>> [AVBSFContext @ 00000000029d7f40] Packet: 23358 bytes, pts
> >>>>>> 249856, dts 249856.
> >>>>>>
> >>>>>> (That is, packets bigger than the supposed 80000-bit maximum.)
> >>>> Expected?
> >>>>>
> >>>>> No, this is not expected. I tried the exact command line and did
> >>>>> not get packets larger than 80000 bits. Sorry to ask, but did you
> >>>>> apply the change in amfenc.h?
> >>>>
> >>>> I used the most recent patch on the list,
> >>>> <https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-
> >>>> November/219757.html>.  (Required a bit of fixup to apply, as
> >>>> Michael already noted.)
> >>>
> >>> Yes, I will submit the update today but I cannot repro large packets.
> >>> Can you just check if you get the change:
> >>>
> >>> - typedef     amf_uint16          amf_bool;
> >>> + typedef     amf_uint8          amf_bool;
> >>
> >> Yes, I have that change.
> >>
> >> Could it be a difference in support for the particular card I am
> >> using (Bonaire / GCN 2, so several generations old now), or will that
> >> be the same across all of them?
> >>
> >
> > I got a different clip and reproduced the issue. We discussed this with
> > our main "rate control" guy.
> > Basically, this parameter cannot guarantee the frame size in a complex
> > scene when it is combined with a relatively low bit rate and a
> > relatively low max AU size.
> > To confirm this, it would be great if you could share your output stream
> > (or input stream) so we can verify that this is the case.
> 
> Input:
> <http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1
> 080p_60fps_normal.mp4>
> Output: <http://ixia.jkqxz.net/~mrt/amf_max_au_size.mp4>
> 
> Looking at the transition on frame 976, the output quality is pretty bad, but
> not really bad enough to merit the failure - the macroblock QPs are only
> 37/38, and go higher on following frames.

Yes, I see this, but the AMF default max QP is 46 for transcoding mode.
Frame #976 is a scene change, and the encoder did reach that limit there:
the Vega analyzer reports max QP = 46, min QP = 37 for this frame.
We can change the defaults in the ffmpeg codec if desired.
Default settings in the encoder:
Transcoding: min = 18, max = 46
All other modes: min = 22, max = 48

> 
> - Mark

Thanks,
Mikhail