[FFmpeg-devel] [PATCH 00/25] V4L2 support for DRM_PRIME

Tue Sep 3 04:02:05 EEST 2019

From: Aman Gupta <aman at tmm1.net>

This patchset enables a zero-copy, hardware-accelerated
decode+scale+encode pipeline on the RPI4 via V4L2. It also
includes various patches previously submitted to ffmpeg-devel
and patches from LibreELEC's ffmpeg fork.

The V4L2 decoders can either output software pixel formats,
or be used with `-hwaccel drm` to return AV_PIX_FMT_DRM_PRIME
frames. These drm_prime frames can be fed into both the new
vf_scale_v4l2m2m and the v4l2m2m encoders to achieve zero-copy
video pipelines which use very little CPU.

For example, when decoding h264 using the v4l2 hardware,
the decoder can either return a software nv12/yuv420p frame, or a
drm_prime frame (depending on whether the hwaccel has been activated):

  ./ffmpeg -c:v h264_v4l2m2m -i sample.mpg -map 0:v -f null -y /dev/null
  [h264_v4l2m2m_decoder @ 0x256d390] requesting formats: output=H264/none capture=NV12/nv12

  ./ffmpeg -hwaccel drm -hwaccel_output_format drm_prime -c:v h264_v4l2m2m -i sample.mpg -map 0:v -f null -y /dev/null
  [h264_v4l2m2m_decoder @ 0xefb400] requesting formats: output=H264/none capture=NV12/drm_prime
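
To tell the two modes apart in a script, the "requesting formats"
lines can be parsed. The following is only a rough sketch: it assumes
the log format shown above, and that the name after the slash in
`capture=` is the FFmpeg pixel format handed back to the caller.
The names `parse_formats` and `returns_drm_prime` are made up for
illustration and are not part of this patchset.

```python
import re

# Matches lines such as:
#   [h264_v4l2m2m_decoder @ 0xefb400] requesting formats: output=H264/none capture=NV12/drm_prime
# Assumption: each queue is printed as <V4L2 format>/<FFmpeg pixel format>.
LOG_RE = re.compile(
    r"\[(?P<component>\w+) @ 0x[0-9a-f]+\] requesting formats: "
    r"output=(?P<out_v4l2>\w+)/(?P<out_av>\w+) "
    r"capture=(?P<cap_v4l2>\w+)/(?P<cap_av>\w+)"
)


def parse_formats(line):
    """Return the named fields of a 'requesting formats' line, or None."""
    match = LOG_RE.search(line)
    return match.groupdict() if match else None


def returns_drm_prime(line):
    """True if the capture side of this component reports drm_prime."""
    info = parse_formats(line)
    return info is not None and info["cap_av"] == "drm_prime"


sw_line = ("[h264_v4l2m2m_decoder @ 0x256d390] requesting formats: "
           "output=H264/none capture=NV12/nv12")
hw_line = ("[h264_v4l2m2m_decoder @ 0xefb400] requesting formats: "
           "output=H264/none capture=NV12/drm_prime")

print(returns_drm_prime(sw_line))  # software nv12 frames
print(returns_drm_prime(hw_line))  # drm_prime frames
```

In V4L2 M2M terms the "output" queue carries data into the device and
the "capture" queue carries the result, so for a decoder the capture
side is the one that decides what kind of frame comes back.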

When feeding the decoded frames into the hardware encoder,
using drm_prime to avoid copying frame data back and forth cuts
CPU usage from 75% to 7% at a similar frame rate:

  ./ffmpeg -c:v h264_v4l2m2m -i sample.mpg -map 0:v -c:v h264_v4l2m2m -b:v 3000k -f null -y /dev/null
  [h264_v4l2m2m_decoder @ 0x2653800] requesting formats: output=H264/none capture=NV12/nv12
  [h264_v4l2m2m_encoder @ 0x2654410] requesting formats: output=NV12/nv12 capture=H264/none
  FPS=80 CPU=75%

  ./ffmpeg -hwaccel drm -hwaccel_output_format drm_prime -c:v h264_v4l2m2m -i sample.mpg -map 0:v -c:v h264_v4l2m2m -b:v 3000k -f null -y /dev/null
  [h264_v4l2m2m_decoder @ 0x19203f0] requesting formats: output=H264/none capture=NV12/drm_prime
  [h264_v4l2m2m_encoder @ 0x191f780] requesting formats: output=NV12/drm_prime capture=H264/none
  FPS=76 CPU=7%
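
Because the two runs do not reach exactly the same frame rate, it can
help to normalize the figures. The sketch below divides CPU usage by
FPS; that is a crude metric chosen here for illustration, not one
used by the patchset, and it takes the numbers above at face value.

```python
def cpu_per_fps(cpu_percent, fps):
    """CPU percentage spent per frame-per-second of throughput."""
    return cpu_percent / fps


# Figures taken from the two transcode runs above.
copy_cost = cpu_per_fps(cpu_percent=75, fps=80)       # nv12, frames copied
zero_copy_cost = cpu_per_fps(cpu_percent=7, fps=76)   # drm_prime, zero-copy

ratio = copy_cost / zero_copy_cost
print(f"copying path costs about {ratio:.1f}x more CPU per frame")
```

By this measure the copying path needs roughly ten times as much CPU
per frame as the drm_prime path.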

Finally, this patchset also adds a v4l2m2m scaler which takes
advantage of the Broadcom ISP (Image Signal Processor) available
on the RPI and exposed via v4l2. Again, using drm_prime references
between the scaler and encoder reduces CPU usage drastically
(from 60% to 10% in the runs below):

  ./ffmpeg -c:v h264_v4l2m2m -i sample.mpg -map 0:v -vf scale_v4l2m2m=-2:480 -c:v h264_v4l2m2m -b:v 3000k -f null -y /dev/null
  [h264_v4l2m2m_decoder @ 0x1ac73b0] requesting formats: output=H264/none capture=NV12/nv12
  [scale_v4l2m2m @ 0x1b802d0] requesting formats: output=NV12/nv12 capture=NV12/nv12
  [h264_v4l2m2m_encoder @ 0x1ac7ce0] requesting formats: output=NV12/nv12 capture=H264/none
  FPS=84 CPU=60%

  ./ffmpeg -hwaccel drm -init_hw_device drm=v4l2drm -filter_hw_device v4l2drm -hwaccel_output_format drm_prime -c:v h264_v4l2m2m -i sample.mpg -map 0:v -vf scale_v4l2m2m=-2:480 -c:v h264_v4l2m2m -b:v 3000k -f null -y /dev/null
  [h264_v4l2m2m_decoder @ 0x22f74f0] requesting formats: output=H264/none capture=NV12/drm_prime
  [scale_v4l2m2m @ 0x239b440] requesting formats: output=NV12/drm_prime capture=NV12/drm_prime
  [h264_v4l2m2m_encoder @ 0x22f7080] requesting formats: output=NV12/drm_prime capture=H264/none
  FPS=73 CPU=10%
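
For anyone who wants to reproduce these comparisons from a script, the
command lines above can be generated from one helper. This is a
minimal sketch: `build_transcode_args` is a hypothetical name, and it
only uses the options that appear in the commands in this mail.

```python
def build_transcode_args(src, zero_copy, scale=None, bitrate="3000k"):
    """Build the ffmpeg argument list for the test runs shown above.

    zero_copy: request drm_prime frames instead of software frames.
    scale:     optional scale_v4l2m2m argument such as "-2:480".
    """
    args = ["./ffmpeg"]
    if zero_copy:
        args += ["-hwaccel", "drm"]
        if scale is not None:
            # The filter needs a hw device, as in the scaler example.
            args += ["-init_hw_device", "drm=v4l2drm",
                     "-filter_hw_device", "v4l2drm"]
        args += ["-hwaccel_output_format", "drm_prime"]
    args += ["-c:v", "h264_v4l2m2m", "-i", src, "-map", "0:v"]
    if scale is not None:
        args += ["-vf", f"scale_v4l2m2m={scale}"]
    args += ["-c:v", "h264_v4l2m2m", "-b:v", bitrate,
             "-f", "null", "-y", "/dev/null"]
    return args


print(" ".join(build_transcode_args("sample.mpg", zero_copy=True,
                                    scale="-2:480")))
```

The list form can be passed directly to `subprocess.run` on a board
where this patchset is applied.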

I've tested this extensively on the RPI3+ and RPI4. I am also
awaiting the arrival of some Amlogic hardware to verify the
patchset works there as expected.

I'm fairly confident in most of this patchset; however, the
HWContext integration could use a few more eyes. I'm still
not certain whether it makes sense to reuse `-hwaccel drm` here
(by allowing null/dummy drm device creation via the last commit),
or if a new `-hwaccel v4l2m2m` would be better. Unfortunately,
AFAIK there is no standard way to map frames into v4l2 hardware,
so the only way to end up with a drm_prime frame is by feeding
pixel data into the decoder or scaler first.

----

Aman Gupta (19):
  avcodec/v4l2_context: ensure v4l2_dequeue does not hang in poll() when
    no buffers are pending
  avcodec/v4l2_m2m: disable info logging during device probe
  avcodec/v4l2_m2m: fix av_pix_fmt changing when multiple /dev/video*
    devices are probed
  avcodec/v4l2_m2m_enc: add support for -force_key_frames
  avcodec/v4l2_buffers: teach ff_v4l2_buffer_avframe_to_buf about
    contiguous planar formats
  avcodec/v4l2_m2m: decouple v4l2_m2m helpers from AVCodecContext
  avcodec/v4l2_m2m_enc: fix indentation and add M2MENC_CLASS macro
  avcodec/v4l2_m2m_dec: set pkt_dts on decoded frames to NOPTS
  avcodec/v4l2_buffers: split out AVFrame generation into helper method
  avcodec/v4l2_buffers: split out V4L2Buffer generation into helper
    method
  avcodec/v4l2_buffers: read height/width from the proper context
  avcodec/v4l2_m2m: add support for AV_PIX_FMT_DRM_PRIME
  avcodec/v4l2_m2m_dec: add support for AV_PIX_FMT_DRM_PRIME
  avcodec/v4l2_m2m_enc: add support for AV_PIX_FMT_DRM_PRIME
  avcodec/v4l2_context: expose timeout for dequeue_frame
  avfilter/vf_scale_v4l2m2m: add V4L2 M2M scaler
  avcodec/v4l2m2m: clean up buffer options and pick sane defaults
  avcodec/v4l2_buffers: use correct timebase for encoder/decoder/filter
  avcodec/v4l2_buffers: extract v4l2_timebase constant

Dave Stevenson (1):
  avcodec/v4l2_buffers: Add handling for NV21 and YUV420P

Jonas Karlman (1):
  hwcontext_drm: do not require drm device

Lukas Rusak (2):
  avcodec/v4l2_m2m_dec: fix indentation and add M2MDEC_CLASS macro
  avcodec/v4l2_buffers: split out v4l2_buf_increase_ref helper

Maxime Jourdan (2):
  avcodec/v4l2_m2m_dec: fix dropped packets while decoding
  avcodec/v4l2_context: set frame SAR using VIDIOC_CROPCAP

 configure                      |   2 +
 libavcodec/v4l2_buffers.c      | 378 ++++++++++++++++++++++++++++-----
 libavcodec/v4l2_buffers.h      |   4 +
 libavcodec/v4l2_context.c      | 120 +++++++++--
 libavcodec/v4l2_context.h      |  13 +-
 libavcodec/v4l2_m2m.c          |  81 +++----
 libavcodec/v4l2_m2m.h          |  32 ++-
 libavcodec/v4l2_m2m_dec.c      | 168 +++++++++++----
 libavcodec/v4l2_m2m_enc.c      |  82 ++++---
 libavfilter/Makefile           |   1 +
 libavfilter/allfilters.c       |   1 +
 libavfilter/vf_scale_v4l2m2m.c | 339 +++++++++++++++++++++++++++++
 libavutil/hwcontext_drm.c      |   5 +
 13 files changed, 1036 insertions(+), 190 deletions(-)
 create mode 100644 libavfilter/vf_scale_v4l2m2m.c

-- 
2.20.1