[FFmpeg-devel] [PATCH 00/14] OpenCL infrastructure, filters

Mark Thompson sw at jkqxz.net
Sun Sep 10 23:53:24 EEST 2017

This series adds OpenCL infrastructure to support hwcontext use, with mapping between OpenCL and some other hardware APIs for GPU-only transformations.  (It has no interaction whatsoever with the existing code in libavutil, and adds no new external API beyond the enum values.)

It includes two filters: one runs an arbitrary simple pixel shader on the input, the other implements the overlay filter in OpenCL to run on GPU-side images.

(Several earlier iterations of this went to libav.)


- Mark

Some examples:

Linux, i965 VAAPI + Beignet:

./ffmpeg_g -y -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -an -vf 'scale_vaapi=1280:720,hwmap=derive_device=opencl,program_opencl=test.cl:rotate_image,hwmap=derive_device=vaapi:reverse=1,format=vaapi' -c:v h264_vaapi -b 5M out.mp4

./ffmpeg_g -y -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device opencl=ocl@va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format vaapi -i in.mp4 -f image2 -r 1 -i overlays/%d.png -an -filter_hw_device ocl -filter_complex '[1:v]format=yuva420p,hwupload[x2]; [0:v]scale_vaapi=1280:720:yuv420p,hwmap[x1]; [x1][x2]overlay_opencl=0:0,program_opencl=test.cl:rotate_image,hwmap=derive_device=vaapi:reverse=1,format=vaapi,scale_vaapi=1280:720:nv12' -c:v h264_vaapi -b 5M out.mp4

(With a directory of RGBA PNGs overlays/%d.png to overlay.)

Windows, DXVA2 + OpenCL driver blob (any hardware):

./ffmpeg_g -y -hwaccel dxva2 -hwaccel_output_format dxva2_vld -i in.mp4 -an -vf 'hwmap=derive_device=opencl,program_opencl=test.cl:rotate_image,hwdownload,format=nv12' -c:v libx264 -b 5M out.mp4

(For Intel, OpenCL + libmfx on Windows doesn't work for me, seemingly because of bad interactions between drivers; it might work for others.  D3D11 matches DXVA2, except that the D3D11_RESOURCE_MISC_SHARED flag must be set on any texture which is mapped, and that doesn't currently happen with the default initialisation in ffmpeg.)

Linux, Intel Media VAAPI + Intel Media OpenCL blob + libmfx:

./ffmpeg_g -y -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -an -vf 'scale_vaapi=1280:720,hwmap=derive_device=opencl,program_opencl=test.cl:rotate_image,hwmap=derive_device=qsv:reverse=1,format=qsv' -c:v h264_qsv -b 5M out.mp4

Linux, Rockchip MPP + ARM Mali (needs the rkmpp patch):

./ffmpeg_g -y -c:v h264_rkmpp -i in.mp4 -an -vf 'hwmap=derive_device=opencl,program_opencl=source=test.cl:kernel=rotate_image,hwdownload,format=nv12' -c:v libx264 -b 5M out.mp4

Test pixel shader (used as test.cl above):

__kernel void rotate_image(__write_only image2d_t dst,
                           __read_only  image2d_t src,
                           unsigned int index)
{
  const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
                             CLK_FILTER_LINEAR);

  /* The frame index drives the rotation: 0.01 radians per frame. */
  float angle = (float)index / 100;

  float2 dst_dim = convert_float2(get_image_dim(dst));
  float2 src_dim = convert_float2(get_image_dim(src));

  float2 dst_cen = dst_dim / 2;
  float2 src_cen = src_dim / 2;

  int2   dst_loc = (int2)(get_global_id(0), get_global_id(1));

  /* Rotate the output position about the centre, then rescale it to
     the input dimensions. */
  float2 dst_pos = convert_float2(dst_loc) - dst_cen;
  float2 src_pos = {
    cos(angle) * dst_pos.x - sin(angle) * dst_pos.y,
    sin(angle) * dst_pos.x + cos(angle) * dst_pos.y
  };
  src_pos = src_pos * src_dim / dst_dim;

  float2 src_loc = src_pos + src_cen;

  /* Anything which falls outside the input is written as 0.5. */
  if (src_loc.x < 0         || src_loc.y < 0 ||
      src_loc.x > src_dim.x || src_loc.y > src_dim.y)
    write_imagef(dst, dst_loc, 0.5);
  else
    write_imagef(dst, dst_loc, read_imagef(src, sampler, src_loc));
}
