[FFmpeg-devel] [PATCH] lavfi: add opencl tonemap filter.

Mark Thompson sw at jkqxz.net
Tue May 22 15:41:23 EEST 2018


On 22/05/18 09:48, Song, Ruiling wrote:
>> -----Original Message-----
>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf Of
>> Mark Thompson
>> Sent: Tuesday, May 22, 2018 8:19 AM
>> To: ffmpeg-devel at ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH] lavfi: add opencl tonemap filter.
>>
>> On 21/05/18 07:50, Ruiling Song wrote:
>>> This filter does HDR(HDR10/HLG) to SDR conversion with tone-mapping.
>>>
>>> An example command to use this filter with vaapi codecs:
>>> ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device \
>>> opencl=ocl at va -hwaccel vaapi -hwaccel_device va -hwaccel_output_format \
>>> vaapi -i INPUT -filter_hw_device ocl -filter_complex \
>>> '[0:v]hwmap,tonemap_opencl=t=bt2020:tonemap=linear:format=p010[x1]; \
>>> [x1]hwmap=derive_device=vaapi:reverse=1' -c:v hevc_vaapi -profile 2 OUTPUT
>>>
>>> Signed-off-by: Ruiling Song <ruiling.song at intel.com>
>>> ---
>>
>> ...
>>
>>
>> On Mali:
>>
>> $ ./ffmpeg_g -v 55 -y -i ~/test/The\ World\ in\ HDR.mkv -init_hw_device opencl
>> -filter_hw_device opencl0 -an -vf
>> 'format=p010,hwupload,tonemap_opencl=t=bt2020:tonemap=linear:format=p010,hwdownload,format=p010' -c:v libx264 out.mp4
>> ...
>> [tonemap_opencl @ 0x8201d7c0] Filter input: opencl, 3840x2160 (0).
>> [Parsed_tonemap_opencl_2 @ 0x8201d760] Failed to enqueue kernel: -5.
> The error seems to map to the OpenCL error CL_OUT_OF_RESOURCES. I don't have any idea yet.
> Maybe there is some limitation in the driver that we are not querying?
> 
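
One thing that might be worth checking (just a sketch of the kind of query I
mean - `device` here stands for whatever cl_device_id the filter ends up
running on, so the names are only illustrative): whether the driver's
per-kernel work-group size limit is actually large enough for the 16x16 local
size the kernel uses.

    size_t max_wg_size;
    cl_int cle = clGetKernelWorkGroupInfo(ctx->kernel, device,
                                          CL_KERNEL_WORK_GROUP_SIZE,
                                          sizeof(max_wg_size), &max_wg_size, NULL);
    if (cle == CL_SUCCESS && max_wg_size < 16 * 16)
        av_log(avctx, AV_LOG_WARNING,
               "Kernel work-group size limit %zu is below the 16x16 local size.\n",
               max_wg_size);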
>>
>> That's an RK3288 with a Mali T760, clinfo: <https://0x0.st/se5r.txt>, full log:
>> <https://0x0.st/se5s.log>.
>>
>> (The Rockchip hardware decoder can do H.265 Main 10, but the output format
>> isn't P010 so it's easier to use VP9 here.)
> Not p010? Then which format? Planar?

It's two-plane like P010, but with the samples packed together to minimise memory use - I think it uses all the bits, giving you four luma samples (4 x 10 = 40 bits), or two two-component chroma samples, in each five bytes (5 x 8 = 40 bits).  This form probably isn't usable directly by anything generic like OpenCL without more magic, though Rockchip's KMS and related processing code can handle it.
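
Purely to illustrate the packing arithmetic (only a sketch, assuming a
contiguous little-endian bit layout - I don't know the exact bit order
Rockchip actually uses):

    #include <stdint.h>

    /* Unpack four 10-bit samples from five bytes (4 x 10 = 5 x 8 = 40 bits). */
    static void unpack_4x10(const uint8_t in[5], uint16_t out[4])
    {
        uint64_t bits = (uint64_t)in[0]       |
                        (uint64_t)in[1] <<  8 |
                        (uint64_t)in[2] << 16 |
                        (uint64_t)in[3] << 24 |
                        (uint64_t)in[4] << 32;
        for (int i = 0; i < 4; i++)
            out[i] = (bits >> (10 * i)) & 0x3FF;
    }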

> And I don't quite understand here. What is the relationship of the format to VP9?

Oh, sorry - that's coming from the mostly-unrelated point that VP9 has much better software decode support in libavcodec.  Irrelevant, really - H.265 will also work.


>>> ...
>>> +
>>> +// detect peak/average signal of a frame, the algorithm was ported from:
>>> +// libplacebo (https://github.com/haasn/libplacebo)
>>> +struct detection_result
>>> +detect_peak_avg(global uint *util_buf, __local uint *sum_wg,
>>> +            float signal, float peak) {
>>> +    global uint *avg_buf = util_buf;
>>> +    global uint *peak_buf = avg_buf + DETECTION_FRAMES + 1;
>>> +    global uint *counter_wg_p = peak_buf + DETECTION_FRAMES + 1;
>>> +    global uint *max_total_p = counter_wg_p + 1;
>>> +    global uint *avg_total_p = max_total_p + 1;
>>> +    global uint *frame_idx_p = avg_total_p + 1;
>>> +    global uint *scene_frame_num_p = frame_idx_p + 1;
>>> +
>>> +    uint frame_idx = *frame_idx_p;
>>> +    uint scene_frame_num = *scene_frame_num_p;
>>> +
>>> +    size_t lidx = get_local_id(0);
>>> +    size_t lidy = get_local_id(1);
>>> +    size_t lsizex = get_local_size(0);
>>> +    size_t lsizey = get_local_size(1);
>>> +    uint num_wg = get_num_groups(0) * get_num_groups(1);
>>> +    size_t group_idx = get_group_id(0);
>>> +    size_t group_idy = get_group_id(1);
>>> +    struct detection_result r = {peak, sdr_avg};
>>> +    *sum_wg = 0;
>>
>> This is technically a data race - maybe set it in only the first workitem?
> When writing the same value to it, this may be fine; we should still get the correct result.
> But I agree it is better to have only the first work-item do the initialization.

C/C++ make it undefined behaviour, so even when it's benign like this (writing the same value) I would prefer to avoid it.
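
Something like this would do (a minimal sketch, reusing the lidx/lidy
variables and the barrier that are already in the kernel):

    // Only the first work-item initialises the local sum; everyone else
    // reads it only after the barrier.
    if (lidx == 0 && lidy == 0)
        *sum_wg = 0;
    barrier(CLK_LOCAL_MEM_FENCE);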

>>> +    barrier(CLK_LOCAL_MEM_FENCE);
>>> +
>>> +    // update workgroup sum
>>> +    atomic_add(sum_wg, (uint)(signal * REFERENCE_WHITE));
>>
>> I think the numbers you're adding together here sum to at most something like
>> 16 * 16 * 100 * 1023?  Can you make sure this can't overflow and add a
>> comment on that.
> Niklas also pointed this out. It is 16 * 16 * 10000 at most, so there is no overflow here.
> 
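
For the record, a rough worked bound (assuming REFERENCE_WHITE is 100 and a 16x16 work-group): signal * REFERENCE_WHITE is at most about 100 * 100 = 10000 per work-item, so the work-group sum is at most 16 * 16 * 10000 = 2,560,000, far below UINT_MAX (about 4.29e9).  That bound is exactly the sort of thing worth stating in the comment.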
>>
>>> +    barrier(CLK_LOCAL_MEM_FENCE);
>>> +
>>> +    // update frame peak/avg using work-group-average.
>>> +    if (lidx == 0 && lidy == 0) {
>>> +        uint avg_wg = *sum_wg / (lsizex * lsizey);
>>> +        atomic_max(&peak_buf[frame_idx], avg_wg);
>>> +        atomic_add(&avg_buf[frame_idx], avg_wg);
>>
>> Similarly this one?  (width/16 * height/16 * 100 * 1023, I think, which might
>> overflow for 8K?)
>>
>>> +    }
>>> +
>>> +    if (scene_frame_num > 0) {
>>> +        float peak = (float)*max_total_p / (REFERENCE_WHITE * scene_frame_num);
>>> +        float avg = (float)*avg_total_p / (REFERENCE_WHITE * scene_frame_num);
>>> +        r.peak = max(1.0f, peak);
>>> +        r.average = max(0.25f, avg);
>>
>> fmax()?  (max() is an integer function, not sure what it does to 0.25f.)
> min()/max() also accept floating-point values. You can refer to chapter "6.12.4 Common Functions" in the OpenCL 1.2 spec.

Huh, you're right; that's fine then.  (Still confusing - fmax() is defined in 6.12.2, then max() for integers in 6.12.3, then separately max() for floats in 6.12.4.)

>>> ...
>>> +static int tonemap_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
>>> +{
>>> +    AVFilterContext    *avctx = inlink->dst;
>>> +    AVFilterLink     *outlink = avctx->outputs[0];
>>> +    TonemapOpenCLContext *ctx = avctx->priv;
>>> +    AVFrame *output = NULL;
>>> +    cl_int cle;
>>> +    int err;
>>> +    double peak = ctx->peak;
>>> +
>>> +    AVHWFramesContext *input_frames_ctx =
>>> +        (AVHWFramesContext*)input->hw_frames_ctx->data;
>>> +
>>> +    av_log(ctx, AV_LOG_DEBUG, "Filter input: %s, %ux%u (%"PRId64").\n",
>>> +           av_get_pix_fmt_name(input->format),
>>> +           input->width, input->height, input->pts);
>>> +
>>> +    if (!input->hw_frames_ctx)
>>> +        return AVERROR(EINVAL);
>>> +
>>> +    output = ff_get_video_buffer(outlink, outlink->w, outlink->h);
>>> +    if (!output) {
>>> +        err = AVERROR(ENOMEM);
>>> +        goto fail;
>>> +    }
>>> +
>>> +    err = av_frame_copy_props(output, input);
>>> +    if (err < 0)
>>> +        goto fail;

av_frame_copy_props() copies the side-data which will include the mastering/light-level information, but that's no longer valid after tonemapping?
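
For instance, something along these lines after the copy might be enough (just a sketch - whether these are exactly the right side-data types to strip is open for discussion):

    // The mastering-display and content-light metadata describe the HDR
    // input, so they are no longer meaningful for the tonemapped output.
    av_frame_remove_side_data(output, AV_FRAME_DATA_MASTERING_DISPLAY_METADATA);
    av_frame_remove_side_data(output, AV_FRAME_DATA_CONTENT_LIGHT_LEVEL);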

>>> +
>>> +    if (!peak)
>>> +        peak = determine_signal_peak(input);
>>> +
>>> +    if (ctx->trc != -1)
>>> +        output->color_trc = ctx->trc;
>>> +    if (ctx->primaries != -1)
>>> +        output->color_primaries = ctx->primaries;
>>> +    if (ctx->colorspace != -1)
>>> +        output->colorspace = ctx->colorspace;
>>> +    if (ctx->range != -1)
>>> +        output->color_range = ctx->range;
>>> +
>>> +    ctx->trc_in = input->color_trc;
>>> +    ctx->trc_out = output->color_trc;
>>> +    ctx->colorspace_in = input->colorspace;
>>> +    ctx->colorspace_out = output->colorspace;
>>> +    ctx->primaries_in = input->color_primaries;
>>> +    ctx->primaries_out = output->color_primaries;
>>> +    ctx->range_in = input->color_range;
>>> +    ctx->range_out = output->color_range;
>>> +
>>> +    if (!ctx->initialised) {
>>> +        err = tonemap_opencl_init(avctx);
>>> +        if (err < 0)
>>> +            goto fail;
>>> +    }
>>> +
>>> +    switch(input_frames_ctx->sw_format) {
>>> +    case AV_PIX_FMT_P010:
>>> +        err = launch_kernel(avctx, ctx->kernel, output, input, peak);
>>> +        if (err < 0) goto fail;
>>> +        break;
>>> +    default:
>>> +        av_log(ctx, AV_LOG_ERROR, "unsupported format in tonemap_opencl.\n");
>>> +        err = AVERROR(ENOSYS);
>>> +        goto fail;
>>> +    }
>>> +
>>> +    cle = clFinish(ctx->command_queue);
>>> +    if (cle != CL_SUCCESS) {
>>> +        av_log(avctx, AV_LOG_ERROR, "Failed to finish command queue: %d.\n",
>>> +               cle);
>>> +        err = AVERROR(EIO);
>>> +        goto fail;
>>> +    }
>>
>> It might be nice to add some debug output here showing what
>> transformation was actually applied and maybe some of the persistent
>> parameters from util_buf (they would be easier to verify as sensible).
> I am not quite sure about this. What kind of message is preferred? Any specific ideas?

I was thinking of the colour parameters and the peak/average values which actually got used for that frame.  It would be much easier to tell externally what is going on with that information, which would help with checking results.  (To be clear, I'm only thinking of this for debugging.)
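
Something along these lines after the kernel finishes would be enough (just a sketch of the sort of thing I mean, using the fields the filter context already sets):

    av_log(avctx, AV_LOG_DEBUG,
           "Tonemapping transfer %s -> %s, colorspace %s -> %s, peak %g.\n",
           av_color_transfer_name(ctx->trc_in),
           av_color_transfer_name(ctx->trc_out),
           av_color_space_name(ctx->colorspace_in),
           av_color_space_name(ctx->colorspace_out),
           peak);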

Thanks,

- Mark

