[FFmpeg-user] M1/Apple Silicon Max Res for -hwaccel videotoolbox?

Steven Kan steven at kan.org
Fri Feb 18 22:04:11 EET 2022


I am assembling RTSP feeds from two cameras into one YouTube stream with ffmpeg on my M1 Mac Mini:

https://www.youtube.com/channel/UCIVY11504PcY2sy2qpRhiMg/live

If my cameras are set to output 1920x1080 each, then this works, with CPU utilization of about 25%:

./ffmpeg -thread_queue_size 2048 -hwaccel videotoolbox -i 'rtsp://anonymous:password1@192.168.1.13:554' -hwaccel videotoolbox -i 'rtsp://anonymous:password1@192.168.1.45:554' -vcodec h264_videotoolbox -b:v 5000k -acodec copy -t 02:00:00 -filter_complex "hstack=inputs=2,fps=20" -f flv "rtmp://a.rtmp.youtube.com/live2/<my-youtube-streaming-key>"
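
In case it matters, videotoolbox does show up when I list this build's available hardware accelerators, e.g.:

./ffmpeg -hwaccels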

I am simultaneously recording the original raw streams on a PC on my LAN via Blue Iris. With my current setup, my local copies of the raw video are only 1920x1080, even though the cameras are capable of 2592x1944.

YT accepts a maximum horizontal resolution of 3840, so I tried setting the cameras to 2592x1944 and scaling each stream back down via:

./ffmpeg -thread_queue_size 2048 -hwaccel videotoolbox -i 'rtsp://anonymous:password1@192.168.1.13:554' -hwaccel videotoolbox -i 'rtsp://anonymous:password1@192.168.1.45:554' -vcodec h264_videotoolbox -b:v 5000k -acodec copy -t 02:00:00 -filter_complex "[0:v]scale=1920:-1[left];[1:v]scale=1920:-1[right];[left][right]hstack" -f flv "rtmp://a.rtmp.youtube.com/live2/<my-youtube-streaming-key>"

but that results in a stream of errors (full console dump at the bottom):

[h264 @ 0x12100d200] hardware accelerator failed to decode picture
[h264 @ 0x12100d800] hardware accelerator failed to decode picture
[h264 @ 0x12100de00] hardware accelerator failed to decode picture
[rtsp @ 0x12000ca00] max delay reached. need to consume packet
[rtsp @ 0x12000ca00] RTP: missed 141 packets
[rtsp @ 0x12000ca00] max delay reached. need to consume packet
[rtsp @ 0x12000ca00] RTP: missed 19 packets
[rtsp @ 0x12000ca00] max delay reached. need to consume packet
[rtsp @ 0x12000ca00] RTP: missed 289 packets
[rtsp @ 0x12000ca00] max delay reached. need to consume packet
[rtsp @ 0x12000ca00] RTP: missed 531 packets
[h264 @ 0x121026200] hardware accelerator failed to decode picture
[h264 @ 0x121026800] hardware accelerator failed to decode picture
[h264 @ 0x121026e00] hardware accelerator failed to decode picture

and, of course, the YT stream doesn’t work. 

If I remove -hwaccel videotoolbox, decoding falls back to software and the stream works, but CPU utilization on my Mac Mini goes from ~25% to ~75%.
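
For reference, that software-decode fallback is essentially the same command with only the two -hwaccel videotoolbox input options removed, keeping h264_videotoolbox for the encode, i.e. roughly:

./ffmpeg -thread_queue_size 2048 -i 'rtsp://anonymous:password1@192.168.1.13:554' -i 'rtsp://anonymous:password1@192.168.1.45:554' -vcodec h264_videotoolbox -b:v 5000k -acodec copy -t 02:00:00 -filter_complex "[0:v]scale=1920:-1[left];[1:v]scale=1920:-1[right];[left][right]hstack" -f flv "rtmp://a.rtmp.youtube.com/live2/<my-youtube-streaming-key>"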

What I don’t understand is this: if ffmpeg scales each 2592x1944 stream down to 1920x1440 before the hstack, how is that different from combining two native 1920x1080 streams via hstack, other than the additional vertical pixels? Does the scaling actually happen after the hstack? Is the limitation in the vertical (Y) direction? Am I doing this wrong? Or is this a question for Apple?
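
One test that should isolate this (sketched here, not yet run): hardware-decode a single 2592x1944 camera with no scaling and no hstack at all, and just discard the output, e.g.:

./ffmpeg -hwaccel videotoolbox -i 'rtsp://anonymous:password1@192.168.1.13:554' -t 00:00:10 -f null -

If that alone produces "hardware accelerator failed to decode picture", the limit would be in the VideoToolbox decode of the 2592x1944 input itself, before any filtering happens.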

Full dump:

./ffmpeg -thread_queue_size 2048 -hwaccel videotoolbox -i 'rtsp://anonymous:password1@192.168.1.13:554' -hwaccel videotoolbox -i 'rtsp://anonymous:password1@192.168.1.45:554' -vcodec h264_videotoolbox -b:v 5000k -acodec copy -t 02:00:00 -filter_complex "[0:v]scale=1920:-1[left];[1:v]scale=1920:-1[right];[left][right]hstack"  -f flv "rtmp://a.rtmp.youtube.com/live2/<my-youtube-streaming-key>"
ffmpeg version 4.4 Copyright (c) 2000-2021 the FFmpeg developers
  built with Apple clang version 12.0.0 (clang-1200.0.32.27)
  configuration: --prefix=/Volumes/tempdisk/sw --extra-cflags=-fno-stack-check --arch=arm64 --cc=/usr/bin/clang --enable-gpl --enable-videotoolbox --enable-libopenjpeg --enable-libopus --enable-libmp3lame --enable-libx264 --enable-libx265 --enable-libvpx --enable-libwebp --enable-libass --enable-libfreetype --enable-libtheora --enable-libvorbis --enable-libsnappy --enable-libaom --enable-libvidstab --enable-libzimg --enable-libsvtav1 --enable-version3 --pkg-config-flags=--static --disable-ffplay --enable-postproc --enable-nonfree --enable-neon --enable-runtime-cpudetect --disable-indev=qtkit --disable-indev=x11grab_xcb
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, rtsp, from 'rtsp://anonymous:password1@192.168.1.13:554':
  Metadata:
    title           : Media Server
  Duration: N/A, start: 0.100000, bitrate: N/A
  Stream #0:0: Video: h264 (Main), yuv420p(progressive), 2592x1944, 20 fps, 20 tbr, 90k tbn, 180k tbc
Input #1, rtsp, from 'rtsp://anonymous:password1@192.168.1.45:554':
  Metadata:
    title           : Media Server
  Duration: N/A, start: 0.128000, bitrate: N/A
  Stream #1:0: Video: h264 (Main), yuv420p(progressive), 2592x1944, 20 fps, 20 tbr, 90k tbn, 180k tbc
  Stream #1:1: Audio: aac (LC), 8000 Hz, mono, fltp
Stream mapping:
  Stream #0:0 (h264) -> scale
  Stream #1:0 (h264) -> scale
  hstack -> Stream #0:0 (h264_videotoolbox)
  Stream #1:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[h264 @ 0x12100d200] hardware accelerator failed to decode picture
[h264 @ 0x12100d800] hardware accelerator failed to decode picture
[h264 @ 0x12100de00] hardware accelerator failed to decode picture
[rtsp @ 0x12000ca00] max delay reached. need to consume packet
[rtsp @ 0x12000ca00] RTP: missed 141 packets
[rtsp @ 0x12000ca00] max delay reached. need to consume packet
[rtsp @ 0x12000ca00] RTP: missed 19 packets
[rtsp @ 0x12000ca00] max delay reached. need to consume packet
[rtsp @ 0x12000ca00] RTP: missed 289 packets
[rtsp @ 0x12000ca00] max delay reached. need to consume packet
[rtsp @ 0x12000ca00] RTP: missed 531 packets
[h264 @ 0x121026200] hardware accelerator failed to decode picture
[h264 @ 0x121026800] hardware accelerator failed to decode picture
[h264 @ 0x121026e00] hardware accelerator failed to decode picture
