[FFmpeg-devel] [PATCH 00/14] OpenCL infrastructure, filters
Mark Thompson
sw at jkqxz.net
Sun Sep 10 23:53:24 EEST 2017
This series adds OpenCL infrastructure to support hwcontext use, with mapping between OpenCL and some other hardware APIs for GPU-only transformations. (It has no interaction whatsoever with the existing OpenCL code in libavutil, and adds no new external API beyond the enum values.)
It includes two filters: one (program_opencl) runs an arbitrary simple pixel shader on the input; the other (overlay_opencl) implements the overlay filter in OpenCL to run on GPU-side images.
(Several earlier iterations of this went to libav.)
Thanks,
- Mark
Some examples:
Linux, i965 VAAPI + Beignet:
./ffmpeg_g -y -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -an -vf 'scale_vaapi=1280:720,hwmap=derive_device=opencl,program_opencl=test.cl:rotate_image,hwmap=derive_device=vaapi:reverse=1,format=vaapi' -c:v h264_vaapi -b 5M out.mp4
./ffmpeg_g -y -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device opencl=ocl@va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format vaapi -i in.mp4 -f image2 -r 1 -i overlays/%d.png -an -filter_hw_device ocl -filter_complex '[1:v]format=yuva420p,hwupload[x2]; [0:v]scale_vaapi=1280:720:yuv420p,hwmap[x1]; [x1][x2]overlay_opencl=0:0,program_opencl=test.cl:rotate_image,hwmap=derive_device=vaapi:reverse=1,format=vaapi,scale_vaapi=1280:720:nv12' -c:v h264_vaapi -b 5M out.mp4
(With a directory of RGBA PNGs overlays/%d.png to overlay.)
Windows, DXVA2 + OpenCL driver blob (any hardware):
./ffmpeg_g -y -hwaccel dxva2 -hwaccel_output_format dxva2_vld -i in.mp4 -an -vf 'hwmap=derive_device=opencl,program_opencl=test.cl:rotate_image,hwdownload,format=nv12' -c:v libx264 -b 5M out.mp4
(For Intel, OpenCL + libmfx on Windows doesn't work for me, apparently because of bad interactions between the drivers; it might work for others. D3D11 behaves the same as DXVA2, except that the D3D11_RESOURCE_MISC_SHARED flag must be set on any textures which are mapped, and the default initialisation in ffmpeg doesn't currently do that.)
Linux, Intel Media VAAPI + Intel Media OpenCL blob + libmfx:
./ffmpeg_g -y -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -an -vf 'scale_vaapi=1280:720,hwmap=derive_device=opencl,program_opencl=test.cl:rotate_image,hwmap=derive_device=qsv:reverse=1,format=qsv' -c:v h264_qsv -b 5M out.mp4
Linux, Rockchip MPP + ARM Mali (needs the rkmpp patch):
./ffmpeg_g -y -c:v h264_rkmpp -i in.mp4 -an -vf 'hwmap=derive_device=opencl,program_opencl=source=test.cl:kernel=rotate_image,hwdownload,format=nv12' -c:v libx264 -b 5M out.mp4
Test pixel shader (used as test.cl above):
__kernel void rotate_image(__write_only image2d_t dst,
                           __read_only  image2d_t src,
                           unsigned int index)
{
    /* Unnormalised (pixel) coordinates with linear interpolation. */
    const sampler_t sampler = (CLK_NORMALIZED_COORDS_FALSE |
                               CLK_FILTER_LINEAR);

    /* The angle in radians grows by 0.01 per unit of index. */
    float angle = (float)index / 100;

    float2 dst_dim = convert_float2(get_image_dim(dst));
    float2 src_dim = convert_float2(get_image_dim(src));

    float2 dst_cen = dst_dim / 2;
    float2 src_cen = src_dim / 2;

    /* One work-item per destination pixel. */
    int2   dst_loc = (int2)(get_global_id(0), get_global_id(1));

    /* Rotate the destination position about the destination centre. */
    float2 dst_pos = convert_float2(dst_loc) - dst_cen;
    float2 src_pos = {
        cos(angle) * dst_pos.x - sin(angle) * dst_pos.y,
        sin(angle) * dst_pos.x + cos(angle) * dst_pos.y
    };
    /* Rescale into source dimensions and move back to the source centre. */
    src_pos = src_pos * src_dim / dst_dim;

    float2 src_loc = src_pos + src_cen;

    /* Positions outside the source get a constant 0.5 in every channel. */
    if (src_loc.x < 0         || src_loc.y < 0 ||
        src_loc.x > src_dim.x || src_loc.y > src_dim.y)
        write_imagef(dst, dst_loc, 0.5);
    else
        write_imagef(dst, dst_loc, read_imagef(src, sampler, src_loc));
}
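
For checking the coordinate math of rotate_image without a GPU, the mapping can be mirrored on the CPU. The following is only a sketch in plain C: the names vec2, rotate_src_loc and rotate_is_outside are made up for this illustration and are not part of the series. It takes the cosine and sine of the angle as arguments (the kernel derives the angle as index / 100) so that exact quarter turns are easy to test.

```c
/* Hypothetical helper type: a position in source-image pixels. */
typedef struct { float x, y; } vec2;

/*
 * CPU mirror of the coordinate math in rotate_image: rotate the destination
 * position about the destination centre, rescale it into source dimensions,
 * then offset by the source centre.  cos_a and sin_a are the cosine and
 * sine of the rotation angle.
 */
static vec2 rotate_src_loc(int dst_x, int dst_y,
                           float dst_w, float dst_h,
                           float src_w, float src_h,
                           float cos_a, float sin_a)
{
    float px = (float)dst_x - dst_w / 2;
    float py = (float)dst_y - dst_h / 2;
    vec2 loc;

    loc.x = (cos_a * px - sin_a * py) * src_w / dst_w + src_w / 2;
    loc.y = (sin_a * px + cos_a * py) * src_h / dst_h + src_h / 2;
    return loc;
}

/* Nonzero where the kernel writes the constant 0.5 instead of sampling. */
static int rotate_is_outside(vec2 loc, float src_w, float src_h)
{
    return loc.x < 0 || loc.y < 0 || loc.x > src_w || loc.y > src_h;
}
```

With cos_a = 1 and sin_a = 0 (index 0 in the kernel) and equal source and destination sizes, every destination pixel maps to the same position in the source; the centre pixel maps to the source centre for any angle.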