[FFmpeg-devel] Added HW H.264 and HEVC encoding for AMD GPUs based on AMF SDK

Mironov, Mikhail Mikhail.Mironov at amd.com
Wed Nov 15 00:10:43 EET 2017


> On 14/11/17 17:14, Mironov, Mikhail wrote:
> >>>>>>> +    res = ctx->factory->pVtbl->CreateContext(ctx->factory, &ctx->context);
> >>>>>>> +    AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "CreateContext() failed with error %d\n", res);
> >>>>>>> +    // try to reuse existing DX device
> >>>>>>> +    if (avctx->hw_frames_ctx) {
> >>>>>>> +        AVHWFramesContext *device_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
> >>>>>>> +        if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA){
> >>>>>>> +            if (amf_av_to_amf_format(device_ctx->sw_format) == AMF_SURFACE_UNKNOWN) {
> >>>>>>
> >>>>>> This test is inverted.
> >>>>>>
> >>>>>> Have you actually tested this path?  Even with that test fixed,
> >>>>>> I'm unable to pass the following initialisation test with an AMD
> >>>>>> D3D11
> >> device.
> >>>>>>
> >>>>>
> >>>>> Yes, the condition should be inverted. To test, I had to add
> >>>>> "-hwaccel d3d11va -hwaccel_output_format d3d11" to the command
> >>>>> line.
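> >>>>>
> >>>>> For reference, a minimal sketch of the corrected check (just the
> >>>>> negated comparison, not the final patch):
> >>>>>
> >>>>>     if (amf_av_to_amf_format(device_ctx->sw_format) != AMF_SURFACE_UNKNOWN) {
> >>>>>         // format is supported -> reuse the existing D3D11 device
> >>>>>     } else {
> >>>>>         // unsupported sw_format -> fall back to AMF's own device
> >>>>>     }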
> >>>>
> >>>> Yeah.  I get:
> >>>>
> >>>> $ ./ffmpeg_g -y -hwaccel d3d11va -hwaccel_device 0 -hwaccel_output_format d3d11 -i ~/bbb_1080_264.mp4 -an -c:v h264_amf out.mp4
> >>>> ...
> >>>> [AVHWDeviceContext @ 000000000270e120] Created on device 1002:665f (AMD Radeon (TM) R7 360 Series).
> >>>> ...
> >>>> [h264_amf @ 00000000004dcd80] amf_shared: avctx->hw_frames_ctx has non-AMD device, switching to default
> >>>>
> >>>> It's then comedically slow in this state (about 2fps), but works
> >>>> fine when the decode is in software.
> >>>
> >>> Is it possible that you also have an iGPU that is not disabled, and
> >>> that it is being used for decoding as adapter 0?
> >>
> >> There is an integrated GPU, but it's currently completely disabled.
> >> (I made
> >> <https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-
> >> November/219795.html> to check that the device was definitely right.)
> >>
> >>> Can you provide a log from dxdiag.exe?
> >>
> >> <http://ixia.jkqxz.net/~mrt/DxDiag.txt>
> >>
> >>> If AMF created its own DX device, then the submission logic and speed
> >>> are the same as from a SW decoder.
> >>> It would be interesting to see a short GPUVIEW log.
> >>
> >> My Windows knowledge is insufficient to get that immediately, but if
> >> you think it's useful I can look into it?
> >
> > I think I know what is going on.  You are on Win7, where the D3D11VA API
> > is not available from MSFT.  AMF will fall back to DX9-based encoding
> > submission, and this is why the message was produced.
> > The AMF performance should be the same on DX9, but I don't know how
> > decoding is done without D3D11VA support.
> > GPUVIEW is not really needed if my assumptions are correct.
> 
> Ah, that would make sense.  Maybe detect it and fail earlier with a helpful
> message - the current "not an AMD device" is wrong in this case.
> 
> Decode via D3D11 does work for me on Windows 7 with both AMD and Intel;
> I don't know anything about how, though.  (I don't really care about
> Windows 7 - this was just a set of parts mashed together into a working
> machine for testing, the Windows 7 install is inherited from elsewhere.)

I ran this on Win7.  What I see is that decoding does go via D3D11VA; the
support comes with the Platform Update.  But the AMF encoder works on Win7
via D3D9 only.  That explains the performance hit: with D3D11, to copy the
video output the HW accelerator has to copy each frame via a staging texture.
If I use DXVA2 for decoding it is faster, because the staging texture is not
needed.  I am thinking of connecting DXVA2 acceleration with the AMF encoder,
but probably in the next phase.
I've added more precise logging.
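Roughly along these lines (a sketch only; the dx9_fallback flag is
hypothetical and stands for however the code detects that AMF could not get
a D3D11 device):

    // hypothetical flag, set when AMF falls back to D3D9 submission
    if (ctx->dx9_fallback) {
        av_log(avctx, AV_LOG_WARNING,
               "amf_shared: D3D11 device is not available for the encoder; "
               "AMF falls back to D3D9 submission, which adds a copy "
               "through a staging texture.\n");
    }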

> 
> >>>>>>> +    { "filler_data",    "Filler Data Enable",
> OFFSET(filler_data),
> >>>>>> AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
> >>>>>>> +    { "vbaq",           "Enable VBAQ",
> >> OFFSET(enable_vbaq),
> >>>>>> AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
> >>>>>>> +    { "frame_skipping", "Rate Control Based Frame Skip",
> >>>>>> OFFSET(skip_frame),         AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
> >>>>>>> +
> >>>>>>> +    /// QP Values
> >>>>>>> +    { "qp_i",           "Quantization Parameter for I-Frame",
> >> OFFSET(qp_i),
> >>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
> >>>>>>> +    { "qp_p",           "Quantization Parameter for P-Frame",
> >>>> OFFSET(qp_p),
> >>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
> >>>>>>> +    { "qp_b",           "Quantization Parameter for B-Frame",
> >>>> OFFSET(qp_b),
> >>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
> >>>>>>> +
> >>>>>>> +    /// Pre-Pass, Pre-Analysis, Two-Pass
> >>>>>>> +    { "preanalysis",    "Pre-Analysis Mode",
> >>>> OFFSET(preanalysis),
> >>>>>> AV_OPT_TYPE_BOOL,{ .i64 = 0 }, 0, 1, VE, NULL },
> >>>>>>> +
> >>>>>>> +    /// Maximum Access Unit Size
> >>>>>>> +    { "max_au_size",    "Maximum Access Unit Size for rate control
> (in
> >>>> bits)",
> >>>>>> OFFSET(max_au_size),        AV_OPT_TYPE_INT, { .i64 = 0 }, 0,
> INT_MAX,
> >> VE
> >>>> },
> >>>>>>
> >>>>>> Can you explain more about what this option does?  I don't seem
> >>>>>> to be able to get it to do anything - e.g. setting -max_au_size
> >>>>>> 80000 with 30fps CBR 1M (which should be easily achievable) still
> >>>>>> makes packets of more than 80000 bits.
> >>>>>>
> >>>>>
> >>>>> It means the maximum frame size in bits, and it should be used
> >>>>> together with enforce_hrd enabled.  I tested it; it works after the
> >>>>> related fix for enforce_hrd.
> >>>>> I added dependency handling.
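> >>>>>
> >>>>> Roughly like this (a sketch only; the enforce_hrd field name is
> >>>>> assumed to match the option, the actual patch may differ):
> >>>>>
> >>>>>     if (ctx->max_au_size != 0 && !ctx->enforce_hrd) {
> >>>>>         av_log(avctx, AV_LOG_WARNING,
> >>>>>                "max_au_size has no effect without enforce_hrd; "
> >>>>>                "enabling enforce_hrd.\n");
> >>>>>         ctx->enforce_hrd = 1;
> >>>>>     }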
> >>>>
> >>>> $ ./ffmpeg_g -y -nostats -i ~/bbb_1080_264.mp4 -an -c:v h264_amf -bsf:v trace_headers -frames:v 1000 -enforce_hrd 1 -b:v 1M -maxrate 1M -max_au_size 80000 out.mp4 2>&1 | grep 'Packet: [0-9]\{5\}'
> >>>> [AVBSFContext @ 00000000029d7f40] Packet: 11426 bytes, key frame, pts 128000, dts 128000.
> >>>> [AVBSFContext @ 00000000029d7f40] Packet: 17623 bytes, key frame, pts 192000, dts 192000.
> >>>> [AVBSFContext @ 00000000029d7f40] Packet: 23358 bytes, pts 249856, dts 249856.
> >>>>
> >>>> (That is, packets bigger than the supposed 80000-bit maximum.)  Expected?
> >>>
> >>> No, this is not expected.  I tried the exact command line and did not
> >>> get packets larger than 80000 bits.  Sorry to ask, but did you apply
> >>> the change in amfenc.h?
> >>
> >> I used the most recent patch on the list,
> >> <https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-
> >> November/219757.html>.  (Required a bit of fixup to apply, as Michael
> >> already noted.)
> >
> > Yes, I will submit the update today but I cannot repro large packets.
> > Can you just check if you get the change:
> >
> > - typedef     amf_uint16          amf_bool;
> > + typedef     amf_uint8          amf_bool;
> 
> Yes, I have that change.
> 
> Could it be a difference in support for the particular card I am using (Bonaire
> / GCN 2, so several generations old now), or will that be the same across all
> of them?
> 

I got a different clip and reproduced the issue.  We discussed this with our
main "rate control" guy.  Basically, this parameter cannot guarantee the
frame size in complex scenes when it is combined with a relatively low
bitrate and a relatively low max AU size.
To confirm this, it would be great if you could share your output stream
(or your input stream) so we can verify that this is the case.
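For scale, using Mark's settings from above: at 1 Mbit/s CBR and 30 fps the
average frame budget is 1000000 / 30 ≈ 33333 bits, so an 80000-bit AU cap is
only about 2.4x the average; a complex frame can demand more than that, and
the rate control then has to trade the cap against a drastic quality drop.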


> Thanks,
> 
> - Mark

Thanks,
Mikhail

