[FFmpeg-user] HW Acceleration 101? 2-Up Streaming from RTSP-->ffmpeg-->YouTube

Steven Kan steven at kan.org
Mon Jan 18 20:18:42 EET 2021

I’ve been using ffmpeg as a “relay station” for a few years to pull RTSP streams from several IP cameras and push them to YouTube, such as my 24/7 BeeCam:

https://www.youtube.com/channel/UCE0jx2Z6qbc5Co8x8Kyisag/live

Command is of the form:

./ffmpeg -re -thread_queue_size 512 -rtsp_transport tcp -i "rtsp://anonymous:password@" -vcodec copy -acodec copy -t 01:00:00 -f flv "rtmp://a.rtmp.youtube.com/live2/<my-youtube-streaming-key>"

Because I’m deliberately just relaying the packets and _not_ doing any transcoding, the CPU utilization is remarkably low, and independent of camera resolution. I can run 3 instances of ffmpeg on Raspbian/Raspberry Pi 3B+ and each uses only about 10% of the CPU, despite each pushing a 5 MP camera stream.

This works very well; no problems here!

But now I want to do a “2-Up” live stream of two different cameras, side-by-side. Here’s an archive from last night (waiting for a mating pair of Barn Owls to move in):

https://www.youtube.com/watch?v=GDN2MjPwn0Q&feature=youtu.be

The cameras are each outputting 1920 x 1080 @ 25 fps.

Now that I’m actually encoding, I need a lot more CPU/GPU. I’m running this in Win10 Pro/64 on an HP Microserver with an AMD Opteron X3418 Quad-Core, and the CPU runs at about 65-80% while the integrated GPU runs at about 55%.

C:\Program Files\ffmpeg\bin> .\ffmpeg.exe -re -thread_queue_size 1024 -i rtsp://anonymous:password@ -i rtsp://anonymous:password@ -vcodec h264_amf -acodec copy -t 01:47:02 -filter_complex "nullsrc=size=3840x1080 [base]; [0:v] setpts=PTS-STARTPTS, scale=1920x1080 [upperleft]; [1:v] setpts=PTS-STARTPTS, scale=1920x1080 [upperright]; [base][upperleft] overlay=shortest=1 [tmp1]; [tmp1][upperright] overlay=shortest=1:x=1920" -f flv "rtmp://a.rtmp.youtube.com/live2/my-youtube-streaming-key"

ffmpeg version 2020-12-09-git-7777e5119a-essentials_build-www.gyan.dev Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 10.2.0 (Rev5, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      56. 62.100 / 56. 62.100
  libavcodec     58.115.102 / 58.115.102
  libavformat    58. 65.100 / 58. 65.100
  libavdevice    58. 11.103 / 58. 11.103
  libavfilter     7. 92.100 /  7. 92.100
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
Input #0, rtsp, from 'rtsp://anonymous:password@':
    title           : Media Server
  Duration: N/A, start: 0.080000, bitrate: N/A
    Stream #0:0: Video: h264 (High), yuvj420p(pc, bt709, progressive), 1920x1080, 25 fps, 25 tbr, 90k tbn, 180k tbc
Input #1, rtsp, from 'rtsp://anonymous:password@':
    title           : Media Server
  Duration: N/A, start: 0.100000, bitrate: N/A
    Stream #1:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, 100 tbr, 90k tbn, 180k tbc
    Stream #1:1: Audio: aac (LC), 8000 Hz, mono, fltp
Stream mapping:
  Stream #0:0 (h264) -> setpts
  Stream #1:0 (h264) -> setpts
  overlay -> Stream #0:0 (h264_amf)
  Stream #1:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[rtsp @ 00000199e58e14c0] Thread message queue blocking; consider raising the thread_queue_size option (current value: 8)
[rtsp @ 00000199e586d9c0] max delay reached. need to consume packet
[rtsp @ 00000199e586d9c0] RTP: missed 157 packets
[swscaler @ 00000199e5cc5b80] deprecated pixel format used, make sure you did set range correctly
Output #0, flv, to 'rtmp://a.rtmp.youtube.com/live2/my-youtube-streaming-key':
    title           : Media Server
    encoder         : Lavf58.65.100
    Stream #0:0: Video: h264 (h264_amf) ([7][0][0][0] / 0x0007), yuv420p(progressive), 3840x1080 [SAR 1:1 DAR 32:9], q=-1--1, 2000 kb/s, 25 fps, 1k tbn, 25 tbc (default)
      encoder         : Lavc58.115.102 h264_amf
    Stream #0:1: Audio: aac (LC) ([10][0][0][0] / 0x000A), 8000 Hz, mono, fltp
[rtsp @ 00000199e58e14c0] max delay reached. need to consume packet
[rtsp @ 00000199e58e14c0] RTP: missed 191 packets
[flv @ 00000199e5df4040] Failed to update header with correct duration.
[flv @ 00000199e5df4040] Failed to update header with correct filesize.
frame=   42 fps= 17 q=-0.0 Lsize=     732kB time=00:00:01.64 bitrate=3655.5kbits/s speed=0.647x

I get a few of those errors, but they don’t seem to be critical. The speed eventually stabilizes at right around 1x:

frame= 2483 fps= 25 q=-0.0 size=   25048kB time=00:01:39.20 bitrate=2068.5kbits/s speed=0.997x
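
The geometry in the filter graph above follows directly from the camera resolution: the canvas is two tiles wide, and the second overlay is shifted right by one tile width. A minimal sketch of that arithmetic, assuming a POSIX shell; the variable names (TILE_W, TILE_H, TILES, CANVAS_W, CANVAS_H, FILTER) are illustrative and not part of ffmpeg. It only prints the filter string and does not run ffmpeg:

```shell
#!/bin/sh
# Tile geometry taken from the post: two 1920x1080 cameras, side by side.
TILE_W=1920
TILE_H=1080
TILES=2

# The canvas is TILES tiles wide and one tile high.
CANVAS_W=$((TILE_W * TILES))
CANVAS_H=$TILE_H

# Rebuild the same graph as the command above: a blank base, then one
# overlay per camera, the second one shifted right by one tile width.
FILTER="nullsrc=size=${CANVAS_W}x${CANVAS_H} [base]; \
[0:v] setpts=PTS-STARTPTS, scale=${TILE_W}x${TILE_H} [upperleft]; \
[1:v] setpts=PTS-STARTPTS, scale=${TILE_W}x${TILE_H} [upperright]; \
[base][upperleft] overlay=shortest=1 [tmp1]; \
[tmp1][upperright] overlay=shortest=1:x=${TILE_W}"

echo "canvas: ${CANVAS_W}x${CANVAS_H}"
echo "$FILTER"
```

The printed string could be passed to -filter_complex unchanged; changing TILE_W and TILE_H would keep the canvas size and the overlay offset consistent with each other.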

I know I’m using HW acceleration at some level, because the GPU is at 55%, whereas if I run it with libx264 instead of h264_amf, the CPU goes to ~90% and the GPU stays near 0%.

But I don’t have enough horsepower to run two instances like this (e.g., two streams, each of which encodes streams from 2 cameras). If I try, the CPU goes to 100%, the GPU goes to about 75%, and the second stream’s speed only reaches about 0.5x, while the first stream’s speed drops to ~0.9x.

My goal is to run two instances of ffmpeg like this, 24/7, each at 3840 x 1080, 25 fps, without heating up my entire house, dragging down the North American power grid, or breaking the bank. Is this possible?
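
To put a rough number on that goal, the raw pixel rate that has to be scaled, composited, and encoded can be worked out from the output size and frame rate. This is only a back-of-the-envelope sketch, assuming a POSIX shell; per_instance is a hypothetical helper, and pixel rate is a crude proxy for load rather than a measurement:

```shell
#!/bin/sh
# Output canvas from the post: 3840x1080.
W=3840
H=1080

# Pixels per second for one ffmpeg instance at the given frame rate.
per_instance() { echo $(( W * H * $1 )); }

P25=$(per_instance 25)
P15=$(per_instance 15)

echo "one instance  @ 25 fps: $P25 pixels/s"          # 103680000
echo "two instances @ 25 fps: $((P25 * 2)) pixels/s"  # 207360000
echo "one instance  @ 15 fps: $P15 pixels/s"          # 62208000
```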

So my questions are:

1. Does ffmpeg use h264_amf for decoding, encoding, or both?
2. If I drop the camera outputs from 25 fps to 15 fps, ffmpeg still reports an output of 25 fps.
   a. Is this because the two cameras are asynchronous?
   b. It doesn’t seem to affect the CPU/GPU loading much, if at all, which I don’t understand.
3. Am I using the best HW acceleration available on my hardware?
   a. The GPU only goes to ~75% when I’m attempting to run 2 instances, even though the CPU is pegged at 100%.
   b. Is this where I’d want to use -hwaccel and/or VAAPI and/or ????
   c. I’ve read through both:
      https://trac.ffmpeg.org/wiki/HWAccelIntro
      https://trac.ffmpeg.org/wiki/Hardware/VAAPI
      and I can’t figure out how to invoke these. In particular, the VAAPI documentation reads like a Unix/Linux feature, given the device declaration like "-init_hw_device vaapi=foo:/dev/dri/renderD128"
4. Is h.265 more or less CPU/GPU intensive in this type of application?
   a. Bit rate isn’t very important to me.
5. At similar price points, do Intel CPUs with integrated GPUs provide better hardware encode/decode?
6. Would a discrete GPU, even an inexpensive one, provide any benefit?
7. I don’t really understand the following options and whether they are helping or hurting my cause!
   a. -rtsp_transport tcp
   b. -thread_queue_size 1024
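
On those last two options: both are input options, so each applies only to the -i that follows it. In the 2-up command, -thread_queue_size 1024 appears once, before the first -i, and the log line reporting "current value: 8" is consistent with the other input still using the default. A minimal sketch, assuming a POSIX shell, of repeating the options for every camera; CAM1, CAM2, and INPUT_ARGS are illustrative names, and the URLs are left elided exactly as in the post. It prints the input half of the command rather than running it:

```shell
#!/bin/sh
# Camera URLs as elided in the post; the host part is intentionally missing.
CAM1="rtsp://anonymous:password@"
CAM2="rtsp://anonymous:password@"

# Input options apply only to the next -i, so repeat them for each camera.
set --
for url in "$CAM1" "$CAM2"; do
    set -- "$@" -rtsp_transport tcp -thread_queue_size 1024 -i "$url"
done

# Keep a flat copy for display; "$@" is what would be handed to ffmpeg.
INPUT_ARGS="$*"
echo "ffmpeg $INPUT_ARGS"
```

Whether 1024 is the right queue size for these cameras is not something the sketch can answer; it only shows where the option has to sit to reach both inputs.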

Thanks! It’s all for the owls and bees!
