[FFmpeg-user] Using an AMD Radeon RX5xx with ffmpeg

Lukas Obermann obermann.lukas at gmail.com
Wed Jul 25 10:18:57 EEST 2018


Thanks for that hint! I want to give it a try, but where can I get OpenMAX IL from?
I cannot find an Ubuntu package, nor sources that I could compile it from.
Most of the search results are from XDA Developers, about Android.

> On 25.07.2018, at 06:16, Dennis Mungai <dmngaie at gmail.com> wrote:
> 
> From what I can gather, AMD's driver implementation for VAAPI (gallium?
> through mesa) is a work in progress and, compared to i915 (Intel's), is
> quite behind.
> 
> On your system, are you able to build FFmpeg to utilize OMX IL? AMD has
> support for it via the VCE block. See this for an example on enabling it:
> https://github.com/legotheboss/YouTube-files/wiki/(RPi)-Compile-FFmpeg-with-the-OpenMAX-H.264-GPU-acceleration
> 
> The guide was written for the RPi, but what we're interested in is OpenMAX
> Bellagio and the configuration switches that enable OpenMAX IL encoders.
> 
> 
> 
> On 24 July 2018 at 12:01, Lukas Obermann <obermann.lukas at gmail.com> wrote:
> 
>> Hello Dennis,
>> 
>> thank you for your help! Much appreciated.
>> 
>> Using your command I get a 1.9x speed. So a slight improvement, but not
>> much.
>> I pasted the debug output here, maybe you can see something useful?
>> https://pastebin.com/W0KKjZbN
>> 
>> ad 1. Yes, there is the onboard Intel device plus six of those RX570
>> cards, which in the end should all transcode in parallel.
>> 
>> lukas@transcoder:~$ vainfo --display drm --device /dev/dri/card1
>> libva info: VA-API version 1.1.0
>> libva info: va_getDriverName() returns 0
>> libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
>> libva info: Found init function __vaDriverInit_1_1
>> libva info: va_openDriver() returns 0
>> vainfo: VA-API version: 1.1 (libva 2.1.0)
>> vainfo: Driver version: mesa gallium vaapi
>> vainfo: Supported profile and entrypoints
>>      VAProfileMPEG2Simple            : VAEntrypointVLD
>>      VAProfileMPEG2Main              : VAEntrypointVLD
>>      VAProfileVC1Simple              : VAEntrypointVLD
>>      VAProfileVC1Main                : VAEntrypointVLD
>>      VAProfileVC1Advanced            : VAEntrypointVLD
>>      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
>>      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
>>      VAProfileH264Main               : VAEntrypointVLD
>>      VAProfileH264Main               : VAEntrypointEncSlice
>>      VAProfileH264High               : VAEntrypointVLD
>>      VAProfileH264High               : VAEntrypointEncSlice
>>      VAProfileHEVCMain               : VAEntrypointVLD
>>      VAProfileHEVCMain10             : VAEntrypointVLD
>>      VAProfileJPEGBaseline           : VAEntrypointVLD
>>      VAProfileNone                   : VAEntrypointVideoProc
>> 
>> lukas@transcoder:~$ ls -la /dev/dri/
>> total 0
>> drwxr-xr-x   3 root root       340 Jul 23 15:47 .
>> drwxr-xr-x  19 root root      5020 Jul 23 15:47 ..
>> drwxr-xr-x   2 root root       320 Jul 23 15:47 by-path
>> crw-rw----+  1 root video 226,   0 Jul 23 15:47 card0
>> crw-rw----+  1 root video 226,   1 Jul 23 15:47 card1
>> crw-rw----+  1 root video 226,   2 Jul 23 15:47 card2
>> crw-rw----+  1 root video 226,   3 Jul 23 15:47 card3
>> crw-rw----+  1 root video 226,   4 Jul 23 15:47 card4
>> crw-rw----+  1 root video 226,   5 Jul 23 15:47 card5
>> crw-rw----+  1 root video 226,   6 Jul 23 15:47 card6
>> crw-rw----+  1 root video 226, 128 Jul 23 15:47 renderD128
>> crw-rw----+  1 root video 226, 129 Jul 23 15:47 renderD129
>> crw-rw----+  1 root video 226, 130 Jul 23 15:47 renderD130
>> crw-rw----+  1 root video 226, 131 Jul 23 15:47 renderD131
>> crw-rw----+  1 root video 226, 132 Jul 23 15:47 renderD132
>> crw-rw----+  1 root video 226, 133 Jul 23 15:47 renderD133
>> crw-rw----+  1 root video 226, 134 Jul 23 15:47 renderD134
>> 
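
For the goal of running all six RX570 cards in parallel, here is a minimal dry-run sketch. It only prints one command per render node instead of executing anything. It assumes that renderD129 to renderD134 are the AMD cards and renderD128 is the onboard Intel device, which the listing above suggests but does not prove (checking each node with vainfo would confirm it). The function name build_cmd and the input/output file names are hypothetical; the ffmpeg options are copied from the command suggested earlier in this thread.

```shell
# Dry run: print one VAAPI transcode command per assumed AMD render node.
# Assumption: renderD129..renderD134 are the six RX570 cards.

build_cmd() {
    # $1 = render node number, $2 = input file, $3 = output file
    printf '%s\n' "ffmpeg -init_hw_device vaapi=amd:/dev/dri/renderD$1 -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device amd -filter_hw_device amd -i $2 -vf 'format=nv12|vaapi,hwupload' -y -c:v h264_vaapi -qp:v 21 -profile:v main $3"
}

for node in 129 130 131 132 133 134; do
    # Hypothetical file names, one input/output pair per card.
    build_cmd "$node" "input_$node.avi" "output_$node.avi"
done
```

To actually run the jobs side by side, each printed command could be started in the background with & and collected with wait. Whether six simultaneous sessions scale linearly is something only a test on the real machine can show.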
>> 
>> ad 2. OK, understood. Is there a benefit of doing it that way?
>> 
>> ad 3. I have now done two tests with only the decoder running, and they
>> confuse me even more.
>> 
>> Running the following command:
>> ffmpeg -init_hw_device vaapi=amd:/dev/dri/renderD129 -hwaccel vaapi
>> -hwaccel_output_format vaapi -hwaccel_device amd -filter_hw_device amd -i
>> fs_experiental_method.avi  -f null -
>> 
>> Results in ~ 12x speed
>> frame= 5824 fps=340 q=-0.0 Lsize=N/A time=00:03:14.47 bitrate=N/A
>> speed=11.4x
>> 
>> But using the CPU (a dual-core Pentium from last year):
>> ffmpeg -i fs_experiental_method.avi  -f null -
>> 
>> Results in ~ 14x speed
>> frame=10570 fps=408 q=-0.0 Lsize=N/A time=00:05:52.90 bitrate=N/A
>> speed=13.6x
>> 
>> Of course the vaapi one uses only about 10% of the CPU, while the CPU one
>> uses 100%.
>> 
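
As a sanity check on the two speed figures, the speed value is roughly the reported fps divided by the source frame rate (30 fps for this file). A small sketch, where speed_factor is a hypothetical helper name and the inputs are the fps values from the two status lines above:

```shell
# speed_factor FPS SOURCE_RATE
# Prints the processing speed relative to real time, to one decimal place.
speed_factor() {
    awk -v fps="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", fps / rate }'
}

speed_factor 340 30   # VAAPI decode run, prints 11.3
speed_factor 408 30   # CPU decode run, prints 13.6
```

The small difference from ffmpeg's own 11.4x is presumably because the displayed fps is a rounded average. Note also that the two runs were stopped at different points (5824 versus 10570 frames), so the averages do not cover exactly the same part of the file.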
>> The graph looks like this for vaapi:
>> 
>> [graph_1_in_0_1 @ 0x557f5d1cd5c0] Setting 'time_base' to value '1/32000'
>> [graph_1_in_0_1 @ 0x557f5d1cd5c0] Setting 'sample_rate' to value '32000'
>> [graph_1_in_0_1 @ 0x557f5d1cd5c0] Setting 'sample_fmt' to value 'fltp'
>> [graph_1_in_0_1 @ 0x557f5d1cd5c0] Setting 'channel_layout' to value '0x4'
>> [graph_1_in_0_1 @ 0x557f5d1cd5c0] tb:1/32000 samplefmt:fltp
>> samplerate:32000 chlayout:0x4
>> [format_out_0_1 @ 0x557f5d29fe40] Setting 'sample_fmts' to value 's16'
>> [format_out_0_1 @ 0x557f5d29fe40] auto-inserting filter 'auto_resampler_0'
>> between the filter 'Parsed_anull_0' and the filter 'format_out_0_1'
>> [AVFilterGraph @ 0x557f5d1ceec0] query_formats: 4 queried, 6 merged, 3
>> already done, 0 delayed
>> [auto_resampler_0 @ 0x557f5d2be600] [SWR @ 0x557f5d1d8200] Using fltp
>> internally between filters
>> [auto_resampler_0 @ 0x557f5d2be600] ch:1 chl:mono fmt:fltp r:32000Hz ->
>> ch:1 chl:mono fmt:s16 r:32000Hz
>> [graph 0 input from stream 0:0 @ 0x557f5d2c1e40] Setting 'video_size' to
>> value '1920x1080'
>> [graph 0 input from stream 0:0 @ 0x557f5d2c1e40] Setting 'pix_fmt' to
>> value '46'
>> [graph 0 input from stream 0:0 @ 0x557f5d2c1e40] Setting 'time_base' to
>> value '1/30'
>> [graph 0 input from stream 0:0 @ 0x557f5d2c1e40] Setting 'pixel_aspect' to
>> value '0/1'
>> [graph 0 input from stream 0:0 @ 0x557f5d2c1e40] Setting 'sws_param' to
>> value 'flags=2'
>> [graph 0 input from stream 0:0 @ 0x557f5d2c1e40] Setting 'frame_rate' to
>> value '30/1'
>> [graph 0 input from stream 0:0 @ 0x557f5d2c1e40] w:1920 h:1080
>> pixfmt:vaapi_vld tb:1/30 fr:30/1 sar:0/1 sws_param:flags=2
>> [AVFilterGraph @ 0x557f5d17e480] query_formats: 3 queried, 2 merged, 0
>> already done, 0 delayed
>> 
>> and like this for the cpu:
>> 
>> [graph_1_in_0_1 @ 0x55986d1cab40] Setting 'time_base' to value '1/32000'
>> [graph_1_in_0_1 @ 0x55986d1cab40] Setting 'sample_rate' to value '32000'
>> [graph_1_in_0_1 @ 0x55986d1cab40] Setting 'sample_fmt' to value 'fltp'
>> [graph_1_in_0_1 @ 0x55986d1cab40] Setting 'channel_layout' to value '0x4'
>> [graph_1_in_0_1 @ 0x55986d1cab40] tb:1/32000 samplefmt:fltp
>> samplerate:32000 chlayout:0x4
>> [format_out_0_1 @ 0x55986d19c740] Setting 'sample_fmts' to value 's16'
>> [format_out_0_1 @ 0x55986d19c740] auto-inserting filter 'auto_resampler_0'
>> between the filter 'Parsed_anull_0' and the filter 'format_out_0_1'
>> [AVFilterGraph @ 0x55986d090fc0] query_formats: 4 queried, 6 merged, 3
>> already done, 0 delayed
>> [auto_resampler_0 @ 0x55986d1ab680] [SWR @ 0x55986d0cd740] Using fltp
>> internally between filters
>> [auto_resampler_0 @ 0x55986d1ab680] ch:1 chl:mono fmt:fltp r:32000Hz ->
>> ch:1 chl:mono fmt:s16 r:32000Hz
>> [graph 0 input from stream 0:0 @ 0x55986d15f2c0] Setting 'video_size' to
>> value '1920x1080'
>> [graph 0 input from stream 0:0 @ 0x55986d15f2c0] Setting 'pix_fmt' to
>> value '0'
>> [graph 0 input from stream 0:0 @ 0x55986d15f2c0] Setting 'time_base' to
>> value '1/30'
>> [graph 0 input from stream 0:0 @ 0x55986d15f2c0] Setting 'pixel_aspect' to
>> value '0/1'
>> [graph 0 input from stream 0:0 @ 0x55986d15f2c0] Setting 'sws_param' to
>> value 'flags=2'
>> [graph 0 input from stream 0:0 @ 0x55986d15f2c0] Setting 'frame_rate' to
>> value '30/1'
>> [graph 0 input from stream 0:0 @ 0x55986d15f2c0] w:1920 h:1080
>> pixfmt:yuv420p tb:1/30 fr:30/1 sar:0/1 sws_param:flags=2
>> [AVFilterGraph @ 0x55986d196c40] query_formats: 3 queried, 2 merged, 0
>> already done, 0 delayed
>> 
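
The only difference between the two graphs above is the pixel format of the video input: vaapi_vld for the hardware run and yuv420p for the CPU run. A minimal sketch of checking this automatically in a saved debug log follows. The function name decode_path is hypothetical, and the matched string is the one printed by the n4.0.2 build in this thread; other versions may name the format differently.

```shell
# decode_path: read ffmpeg debug log text on stdin and report which decode
# path the video filter graph input indicates.
decode_path() {
    if grep -q 'pixfmt:vaapi_vld'; then
        echo "hardware (VAAPI surfaces)"
    else
        echo "software (system-memory frames)"
    fi
}

# Sample lines taken from the two graphs quoted above.
printf '%s\n' 'pixfmt:vaapi_vld tb:1/30 fr:30/1 sar:0/1 sws_param:flags=2' | decode_path
printf '%s\n' 'pixfmt:yuv420p tb:1/30 fr:30/1 sar:0/1 sws_param:flags=2' | decode_path
```

In practice the debug output would be written to a file first and then fed in with something like decode_path < ffmpeg_debug.log (a hypothetical file name), since grep -q stops reading at the first match.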
>> 
>> I find it very strange that CPU decoding is faster than GPU decoding. Or
>> is there a bottleneck somewhere? I am a bit lost right now, I have to say.
>> 
>> 
>> 
>>> On 23.07.2018, at 22:51, Dennis Mungai <dmngaie at gmail.com> wrote:
>>> 
>>> Hello there,
>>> 
>>> Here's something you can try:
>>> 
>>> ffmpeg -init_hw_device vaapi=amd:/dev/dri/renderD129 -hwaccel vaapi
>>> -hwaccel_output_format vaapi -hwaccel_device amd -filter_hw_device amd -i
>>> fs_experiental_method.avi -vf 'format=nv12|vaapi,hwupload' -y -c:v
>>> h264_vaapi -qp:v 21 -sei +identifier+timing+recovery_point -profile:v main
>>> -level 4 output.avi
>>> 
>>> Assumptions made:
>>> 
>>> 1. You have another GPU on the system. The DRI device you highlighted
>>> (/dev/dri/card1) is implied to be the second render node, because the
>>> first ordinal device would have been /dev/dri/card0, mapped to
>>> /dev/dri/renderD128.
>>> 
>>> Confirm this by providing the output of:
>>> 
>>> (a). vainfo
>>> (b). ls -al /dev/dri/
>>> 
>>> 2. We explicitly initialize the hardware device (/dev/dri/renderD129),
>>> name it 'amd', and pass it to the decoder, the encoder and the video
>>> filtergraph.
>>> 
>>> 3. Observe the video filter graph. Here's what it does: The decoder will
>>> output either vaapi surfaces (if the hwaccel is usable) or software frames
>>> (if it isn't). In the first case, it matches the vaapi format and hwupload
>>> does nothing (it passes through hardware frames unchanged). In the second
>>> case, it matches the nv12 format and converts whatever the input is to
>>> that, then uploads.
>>> 
>>> This is done for safety reasons: Either way, the encoder will run. However,
>>> depending on the path chosen (upload to memory vs native VAAPI hwdec), your
>>> performance may vary.
>>> 
>>> References used:
>>> 
>>> 1. The VAAPI entry on FFmpeg wiki:
>>> https://trac.ffmpeg.org/wiki/Hardware/VAAPI
>>> 
>>> 2. The VAAPI encoders entry in the docs:
>>> http://www.ffmpeg.org/ffmpeg-codecs.html#VAAPI-encoders
>>> 
>>> On 23 July 2018 at 22:30, Lukas Obermann <obermann.lukas at gmail.com> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I want to use an RX570 for transcoding with ffmpeg. I have been looking
>>>> into this for some time now and testing various things.
>>>> I use Ubuntu 18.04 and I have it running with VAAPI, but the performance
>>>> is not good in my opinion. For a 1080p file I only get about 1.8x speed.
>>>> I was expecting something around 6x to 8x.
>>>> Is VAAPI the right way to go here? I see that AMF is not yet ready for
>>>> Linux and VDPAU only supports decoding, not encoding.
>>>> 
>>>> Following is the command:
>>>> ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/card1 -hwaccel_output_format
>>>> vaapi -i fs_experiental_method.avi -y -c:v h264_vaapi -profile:v main
>>>> output.avi
>>>> 
>>>> ffmpeg version n4.0.2
>>>> mesa 18
>>>> amdgpu-pro-18.20-606296
>>>> libva: VA-API version 1.1.0
>>>> 
>>>> And here below the non-debug output of the command, to show the formats.
>>>> I would appreciate any help on this.
>>>> 
>>>> Thanks!
>>>> Lukas
>>>> 
>>>> 
>>>> ffmpeg version n4.0.2-2 Copyright (c) 2000-2018 the FFmpeg developers
>>>> built with gcc 7 (Ubuntu 7.3.0-16ubuntu3)
>>>> configuration: --prefix=/usr --extra-version=2 --toolchain=hardened
>>>> --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu
>>>> --extra-cflags=-I/usr/local/include --extra-ldflags=-L/usr/local/lib
>>>> --enable-gpl --disable-stripping --enable-avresample --enable-avisynth
>>>> --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray
>>>> --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite
>>>> --enable-libfontconfig --enable-libfreetype --enable-libfribidi
>>>> --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa
>>>> --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse
>>>> --enable-librubberband --enable-librsvg --enable-libshine
>>>> --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh
>>>> --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx
>>>> --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2
>>>> --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx
>>>> --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394
>>>> --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r
>>>> --enable-libx264 --enable-shared --enable-vaapi --enable-vdpau
>>>> libavutil      56. 14.100 / 56. 14.100
>>>> libavcodec     58. 18.100 / 58. 18.100
>>>> libavformat    58. 12.100 / 58. 12.100
>>>> libavdevice    58.  3.100 / 58.  3.100
>>>> libavfilter     7. 16.100 /  7. 16.100
>>>> libavresample   4.  0.  0 /  4.  0.  0
>>>> libswscale      5.  1.100 /  5.  1.100
>>>> libswresample   3.  1.100 /  3.  1.100
>>>> libpostproc    55.  1.100 / 55.  1.100
>>>> Input #0, avi, from 'fs_experiental_method.avi':
>>>> Metadata:
>>>>   encoder         : Lavf57.83.100
>>>> Duration: 00:33:38.10, start: 0.000000, bitrate: 8133 kb/s
>>>>   Stream #0:0: Video: h264 (Constrained Baseline) (H264 / 0x34363248),
>>>> yuv420p(progressive), 1920x1080, 8057 kb/s, 30 fps, 30 tbr, 30 tbn, 60 tbc
>>>>   Stream #0:1: Audio: mp3 (U[0][0][0] / 0x0055), 32000 Hz, mono, fltp,
>>>> 64 kb/s
>>>> Stream mapping:
>>>> Stream #0:0 -> #0:0 (h264 (native) -> h264 (h264_vaapi))
>>>> Stream #0:1 -> #0:1 (mp3 (mp3float) -> mp3 (libmp3lame))
>>>> Press [q] to stop, [?] for help
>>>> [h264_vaapi @ 0x55fcc47055c0] B frames are not supported (0x1) by the
>>>> underlying driver.
>>>> [h264_vaapi @ 0x55fcc47055c0] Warning: some packed headers are not
>>>> supported (want 0xd, got 0).
>>>> Output #0, avi, to 'output.avi':
>>>> Metadata:
>>>>   ISFT            : Lavf58.12.100
>>>>   Stream #0:0: Video: h264 (h264_vaapi) (Main) (H264 / 0x34363248),
>>>> vaapi_vld, 1920x1080, q=0-31, 30 fps, 30 tbn, 30 tbc
>>>>   Metadata:
>>>>     encoder         : Lavc58.18.100 h264_vaapi
>>>>   Stream #0:1: Audio: mp3 (libmp3lame) (U[0][0][0] / 0x0055), 32000 Hz,
>>>> mono, fltp
>>>>   Metadata:
>>>>     encoder         : Lavc58.18.100 libmp3lame
>>>> frame=  202 fps= 52 q=-0.0 Lsize=    4309kB time=00:00:06.80
>>>> bitrate=5187.5kbits/s speed=1.74x
>>>> video:4249kB audio:40kB subtitle:0kB other streams:0kB global headers:0kB
>>>> muxing overhead: 0.444606%
>>>> _______________________________________________
>>>> ffmpeg-user mailing list
>>>> ffmpeg-user at ffmpeg.org
>>>> http://ffmpeg.org/mailman/listinfo/ffmpeg-user
>>>> 
>>>> To unsubscribe, visit link above, or email
>>>> ffmpeg-user-request at ffmpeg.org with subject "unsubscribe".