[FFmpeg-devel] [PATCH] libavfilter:add opencl warpper and opencl code for deshake

highgod0401 highgod0401 at gmail.com
Tue Feb 19 05:57:26 CET 2013


Hi, 


I am sorry that I was busy on writing the opencl scale code and share buffer code and did not check my email. I read the comments today.
I will modify the patch as the comments,but I have some questions as followed:


Don't you need to check for the presence of opencl
headers/libs/whatever?



I will add the check code, but I am not familiar with the configure file, should I reference the dxva2 part?


I'm not sure this is correct. Ideally the init should be done in the
module, so you don't need to init it in application code. This could
be moved to the filter code.


filter will use it and opendecode/openencode will use it too.So both codec and filter use the opencl init code that I am not sure where can I add the code to ffmpeg.


> +const char *kernel_deshake = KERNEL(

is this correct syntax?

Also public symbols should be prefixed by av_ or avfilter_ or ff_
(depending on the external visibility).


I will add "ff_", 


> +
> +inline unsigned char pixel(global const unsigned char *src,float x, float y,int w, int h,int stride, unsigned char def)
> +{
> +    return ((x) < 0 || (y) < 0) ? (def) : (((x) >= (w) || (y) >= (h)) ? (def) :src[(int)x + (int)y * (stride)]);

Additional parens around x, y, def (since this is not a macro they are
not relly required).

> +}

uint8_t is preferred over unsigned char. 


I will remove it,and use uint8_t instead.


> +inline const float av_clipf(float a, float amin, float amax)
> +{
> +    if      (a < amin) return amin;
> +    else if (a > amax) return amax;
> +    else               return a;
> +}

what's this good for (we already have an av_clipf() function)?


Yes,FFmpeg have the av_clipf() function.but the OpenCL kernel code will compile on GPU at the ffmpeg runtime,and you can see the "const char *kernel_deshake:", the openclwrapper code will pass the string to GPU to compile, so we need to rewrite the function for GPU to comiple.


 +kernel void avfilter_transform(global  unsigned char *src,
> +                               global  unsigned char *dst,
> +                               global          float *matrix,
> +                               global          float *matrix2,

> +                                                 int interpolate,
> +                                                 int fillmethod,
> +                                                 int src_stride_lu,

> +                                                 int dst_stride_lu,
> +                                                 int src_stride_ch,
> +                                                 int dst_stride_ch,
> +                                                 int height,
> +                                                 int width,
> +                                                 int ch,
> +                                                 int cw)

How does this relate to avfilter_transform() defined in
libavfilter/avfilter.h?


I reference the "avfilter_transform" function in transform.c, I must write it as a string and pass it to GPU. all the function in deshake_kernel.h is in one string "const char *kernel_deshake"


> +    OCLCHECK( clSetKernelArg, env->kernel, 12, sizeof(cl_int), (void*)&ch);
> +    OCLCHECK( clSetKernelArg, env->kernel, 13, sizeof(cl_int), (void*)&cw);

You could use a macro to ease readability/refactorization.


OCLCHECK is already a macro, it will run the function (the first parameter) and check the return value.Do you mean I should write a mocro for clSetKernelArg to encapsulate the first three parameters?

Please split the patch in two, one for libavfilter, and one for
libavutil, that should simplify review.

the opencl filter is dependent on the opencl wrapper to run and it uses the api of opencl wrapper.So, should I submit the opencl wrapper patch first and then submit the deshake opencl filter patch until the opencl wrapper patch accept?

> +#include "openclwrapper.h"
> +#include "avstring.h"
> +#include "log.h"

> +#include "libavfilter/deshake_kernel.h"

you are not supposed to include a libavfilter header from libavutil
(which is autocontained and must not depend on other FFmpeg
libraries, not even for compilation).

The wrapper will use the kernel string to compile on GPU, the deshake_kernel.h is just the kernel string.Should I move the file to libavutil?

> +#if defined(__APPLE__)
> +#include <OpenCL/cl.h>
> +#else
> +#include <CL/cl.h>
> +#endif
> +

This should be fixed at the configure level, feel free to ask if you
don't know why.

Sorry, I don't know what this mean.

> +#if defined(_MSC_VER)
> +#define strcasecmp strcmpi
> +#endif

we have a portable variant of strcasecmp, check in avstring.h.

I will check it.




highgod0401

From: Stefano Sabatini
Date: 2013-02-17 20:52
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH] libavfilter:add opencl warpper and opencl code for deshake
On date Saturday 2013-02-16 11:24:55 +0800, Wei Gao encoded:
> I check the OpenCL Specification Version: 1.2, kernel functions and
> variables should have the "kernel" or "global" prefix. and the attachment
> is the fixed patch.
> 
> Thanks.

Sorry for the slow reply.

> 2013/2/7 highgod0401 <highgod0401 at gmail.com>
> 
> > **
> >  not many functions start with __ in ffmpeg code.
> > is this ok?
> >
> > I have remove the __global and __kernel
> >
> > ------------------------------
> > highgod0401
> >
> >  *From:* compn <tempn at twmi.rr.com>
> > *Date:* 2013-02-07 11:31
> > *To:* ffmpeg-devel <ffmpeg-devel at ffmpeg.org>
> > *Subject:* Re: [FFmpeg-devel] [PATCH] libavfilter:add opencl warpper and
> > opencl code for deshake
> >  On Wed, 6 Feb 2013 20:00:03 +0800, highgod0401 wrote:
> > >From 0d5e7ae7127f0037270d2cba7ee13efff8c72635 Mon Sep 17 00:00:00 2001
> > >From: highgod0401 <highgod0401 at gmail.com>
> > >Date: Wed, 6 Feb 2013 19:50:01 +0800
> > >Subject: [PATCH] add opencl warpper and opencl code for deshake
> >
> > >+__kernel void avfilter_transform(__global unsigned char *src,
> > >+                                 __global  unsigned char *dst,
> > >+                                 __global  float         *matrix,
> > >+                                 __global  float         *matrix2,
> > >+                                           int           interpolate,
> > >+                                           int           fillmethod,
> > >+                                           int           src_stride_lu,
> >
> > not many functions start with __ in ffmpeg code.
> > is this ok?
> >
> > -compn
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel at ffmpeg.org
> > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >

> From 1fc69eea5e927ad828b88fade1b01d9f11560203 Mon Sep 17 00:00:00 2001
> From: highgod0401 <highgod0401 at gmail.com>
> Date: Sat, 16 Feb 2013 11:18:40 +0800
> Subject: [PATCH] add opencl warpper and opencl code for deshake

wrapper

> 
> ---
>  configure                      |   4 +
>  ffmpeg.c                       |  12 +
>  libavfilter/Makefile           |   2 +
>  libavfilter/allfilters.c       |   1 +
>  libavfilter/deshake_kernel.h   | 182 +++++++++
>  libavfilter/transform_opencl.c | 153 ++++++++
>  libavfilter/transform_opencl.h |  38 ++
>  libavfilter/vf_deshake.c       | 209 ++++++++++-
>  libavutil/Makefile             |   4 +
>  libavutil/openclwrapper.c      | 812 +++++++++++++++++++++++++++++++++++++++++
>  libavutil/openclwrapper.h      | 202 ++++++++++
>  11 files changed, 1618 insertions(+), 1 deletion(-)
>  create mode 100644 libavfilter/deshake_kernel.h
>  create mode 100644 libavfilter/transform_opencl.c
>  create mode 100644 libavfilter/transform_opencl.h
>  create mode 100644 libavutil/openclwrapper.c
>  create mode 100644 libavutil/openclwrapper.h
> 
> diff --git a/configure b/configure
> index b61359c..b6d12b6 100755
> --- a/configure
> +++ b/configure
> @@ -140,6 +140,7 @@ Component options:
>    --disable-rdft           disable RDFT code
>    --disable-fft            disable FFT code
>    --enable-dxva2           enable DXVA2 code
> +  --enable-opencl          enable OpenCL code
>    --enable-vaapi           enable VAAPI code [autodetect]
>    --enable-vda             enable VDA code   [autodetect]
>    --enable-vdpau           enable VDPAU code [autodetect]
> @@ -1196,6 +1197,7 @@ CONFIG_LIST="
>      network
>      nonfree
>      openal
> +    opencl
>      openssl
>      pic
>      rdft

Don't you need to check for the presence of opencl
headers/libs/whatever?

> @@ -1990,6 +1992,7 @@ cropdetect_filter_deps="gpl"
>  decimate_filter_deps="gpl avcodec"
>  delogo_filter_deps="gpl"
>  deshake_filter_deps="avcodec"
> +deshake_opencl_filter_deps="opencl deshake_filter"
>  drawtext_filter_deps="libfreetype"
>  ebur128_filter_deps="gpl"
>  flite_filter_deps="libflite"
> @@ -4295,6 +4298,7 @@ echo "libx264 enabled           ${libx264-no}"
>  echo "libxavs enabled           ${libxavs-no}"
>  echo "libxvid enabled           ${libxvid-no}"
>  echo "openal enabled            ${openal-no}"
> +echo "opencl enabled            ${opencl-no}"
>  echo "openssl enabled           ${openssl-no}"
>  echo "zlib enabled              ${zlib-no}"
>  echo "bzlib enabled             ${bzlib-no}"
> diff --git a/ffmpeg.c b/ffmpeg.c
> index bbd21b6..38afbe8 100644
> --- a/ffmpeg.c
> +++ b/ffmpeg.c
> @@ -97,6 +97,10 @@
>  #include <pthread.h>
>  #endif
>  
> +#if CONFIG_OPENCL
> +#include "libavutil/openclwrapper.h"
> +#endif
> +
>  #include <time.h>
>  
>  #include "ffmpeg.h"
> @@ -3307,10 +3311,18 @@ int main(int argc, char **argv)
>  //         exit(1);
>  //     }
>  
> +#if CONFIG_OPENCL
> + if (av_init_opencl_run_env(0,NULL,"-I.",NULL)) {
> + av_log(NULL,AV_LOG_ERROR,"Init OpenCL Failed\n");
> + }
> +#endif

I'm not sure this is correct. Ideally the init should be done in the
module, so you don't need to init it in application code. This could
be moved to the filter code.

>      current_time = ti = getutime();
>      if (transcode() < 0)
>          exit(1);
>      ti = getutime() - ti;
> +#if CONFIG_OPENCL
> +    av_release_opencl_run_env();
> +#endif
>      if (do_benchmark) {
>          int maxrss = getmaxrss() / 1024;
>          printf("bench: utime=%0.3fs maxrss=%ikB\n", ti / 1000000.0, maxrss);
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index 938b183..69b8816 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -9,6 +9,7 @@ FFLIBS-$(CONFIG_ASYNCTS_FILTER)              += avresample
>  FFLIBS-$(CONFIG_ATEMPO_FILTER)               += avcodec
>  FFLIBS-$(CONFIG_DECIMATE_FILTER)             += avcodec
>  FFLIBS-$(CONFIG_DESHAKE_FILTER)              += avcodec
> +FFLIBS-$(CONFIG_DESHAKE_OPENCL_FILTER)       += avcodec
>  FFLIBS-$(CONFIG_MOVIE_FILTER)                += avformat avcodec
>  FFLIBS-$(CONFIG_MP_FILTER)                   += avcodec
>  FFLIBS-$(CONFIG_PAN_FILTER)                  += swresample
> @@ -108,6 +109,7 @@ OBJS-$(CONFIG_CROPDETECT_FILTER)             += vf_cropdetect.o
>  OBJS-$(CONFIG_DECIMATE_FILTER)               += vf_decimate.o
>  OBJS-$(CONFIG_DELOGO_FILTER)                 += vf_delogo.o
>  OBJS-$(CONFIG_DESHAKE_FILTER)                += vf_deshake.o
> +OBJS-$(CONFIG_DESHAKE_OPENCL_FILTER)         += vf_deshake.o transform_opencl.o
>  OBJS-$(CONFIG_DRAWBOX_FILTER)                += vf_drawbox.o
>  OBJS-$(CONFIG_DRAWTEXT_FILTER)               += vf_drawtext.o
>  OBJS-$(CONFIG_EDGEDETECT_FILTER)             += vf_edgedetect.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 47158f9..1745611 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -102,6 +102,7 @@ void avfilter_register_all(void)
>      REGISTER_FILTER(DECIMATE,       decimate,       vf);
>      REGISTER_FILTER(DELOGO,         delogo,         vf);
>      REGISTER_FILTER(DESHAKE,        deshake,        vf);
> +    REGISTER_FILTER(DESHAKE_OPENCL, deshake_opencl, vf);
>      REGISTER_FILTER(DRAWBOX,        drawbox,        vf);
>      REGISTER_FILTER(DRAWTEXT,       drawtext,       vf);
>      REGISTER_FILTER(EDGEDETECT,     edgedetect,     vf);
> diff --git a/libavfilter/deshake_kernel.h b/libavfilter/deshake_kernel.h
> new file mode 100644
> index 0000000..ffdd12c
> --- /dev/null
> +++ b/libavfilter/deshake_kernel.h
> @@ -0,0 +1,182 @@
> +#include "libavutil/openclwrapper.h"
> +
> +const char *kernel_deshake = KERNEL(

is this correct syntax?

Also public symbols should be prefixed by av_ or avfilter_ or ff_
(depending on the external visibility).

> +
> +inline unsigned char pixel(global const unsigned char *src,float x, float y,int w, int h,int stride, unsigned char def)
> +{
> +    return ((x) < 0 || (y) < 0) ? (def) : (((x) >= (w) || (y) >= (h)) ? (def) :src[(int)x + (int)y * (stride)]);

Additional parens around x, y, def (since this is not a macro they are
not relly required).

> +}

uint8_t is preferred over unsigned char.


> +unsigned char interpolate_nearest(float x, float y, global const unsigned char *src,
> +                        int width, int height, int stride, unsigned char def)
> +{
> +    return pixel(src, (int)(x + 0.5), (int)(y + 0.5), width, height, stride, def);
> +}
> +
> +unsigned char interpolate_bilinear(float x, float y, global const unsigned char *src,
> +                        int width, int height, int stride, unsigned char def)
> +{
> +    int x_c, x_f, y_c, y_f;
> +    int v1, v2, v3, v4;
> +
> +    if (x < -1 || x > width || y < -1 || y > height) {
> +        return def;
> +    } else {
> +        x_f = (int)x;
> +        x_c = x_f + 1;
> +
> +        y_f = (int)y;
> +        y_c = y_f + 1;
> +
> +        v1 = pixel(src, x_c, y_c, width, height, stride, def);
> +        v2 = pixel(src, x_c, y_f, width, height, stride, def);
> +        v3 = pixel(src, x_f, y_c, width, height, stride, def);
> +        v4 = pixel(src, x_f, y_f, width, height, stride, def);
> +
> +        return (v1*(x - x_f)*(y - y_f) + v2*((x - x_f)*(y_c - y)) +
> +                v3*(x_c - x)*(y - y_f) + v4*((x_c - x)*(y_c - y)));
> +    }
> +}
> +
> +unsigned char interpolate_biquadratic(float x, float y, global const unsigned char *src,
> +                        int width, int height, int stride, unsigned char def)
> +{
> +    int     x_c, x_f, y_c, y_f;
> +    unsigned char v1,  v2,  v3,  v4;
> +    float   f1,  f2,  f3,  f4;
> +
> +    if (x < - 1 || x > width || y < -1 || y > height)
> +        return def;
> +    else {
> +        x_f = (int)x;
> +        x_c = x_f + 1;
> +        y_f = (int)y;
> +        y_c = y_f + 1;
> +
> +        v1 = pixel(src, x_c, y_c, width, height, stride, def);
> +        v2 = pixel(src, x_c, y_f, width, height, stride, def);
> +        v3 = pixel(src, x_f, y_c, width, height, stride, def);
> +        v4 = pixel(src, x_f, y_f, width, height, stride, def);
> +
> +        f1 = 1 - sqrt((x_c - x) * (y_c - y));
> +        f2 = 1 - sqrt((x_c - x) * (y - y_f));
> +        f3 = 1 - sqrt((x - x_f) * (y_c - y));
> +        f4 = 1 - sqrt((x - x_f) * (y - y_f));
> +        return (v1 * f1 + v2 * f2 + v3 * f3 + v4 * f4) / (f1 + f2 + f3 + f4);
> +    }
> +}
> +
> +

> +inline const float av_clipf(float a, float amin, float amax)
> +{
> +    if      (a < amin) return amin;
> +    else if (a > amax) return amax;
> +    else               return a;
> +}

what's this good for (we already have an av_clipf() function)?

> +
> +
> +kernel void avfilter_transform(global  unsigned char *src,
> +                               global  unsigned char *dst,
> +                               global          float *matrix,
> +                               global          float *matrix2,
> +                                                 int interpolate,
> +                                                 int fillmethod,
> +                                                 int src_stride_lu,
> +                                                 int dst_stride_lu,
> +                                                 int src_stride_ch,
> +                                                 int dst_stride_ch,
> +                                                 int height,
> +                                                 int width,
> +                                                 int ch,
> +                                                 int cw)

How does this relate to avfilter_transform() defined in
libavfilter/avfilter.h?

> +{
> +     int global_id = get_global_id(0);
> +
> +     global unsigned char *dstY = dst;
> +     global unsigned char *dstU = dstY + height * dst_stride_lu;
> +     global unsigned char *dstV = dstU + ch * dst_stride_ch;
> +
> +     global unsigned char *srcY = src;
> +     global unsigned char *srcU = srcY + height * src_stride_lu;
> +     global unsigned char *srcV = srcU + ch * src_stride_ch;
> +
> +     global unsigned char *tempdst;
> +     global unsigned char *tempsrc;
> +
> +     int x;
> +     int y;
> +     float x_s;
> +     float y_s;
> +     int tempsrc_stride;
> +     int tempdst_stride;
> +     int temp_height;
> +     int temp_width;
> +     int curpos;
> +     unsigned char def;
> +     if (global_id < width*height) {
> +        y = global_id/width;
> +        x = global_id%width;
> +        x_s = x * matrix[0] + y * matrix[1] + matrix[2];
> +        y_s = x * matrix[3] + y * matrix[4] + matrix[5];
> +        tempdst = dstY;
> +        tempsrc = srcY;
> +        tempsrc_stride = src_stride_lu;
> +        tempdst_stride = dst_stride_lu;
> +        temp_height = height;
> +        temp_width = width;
> +     }
> +     else if ((global_id >= width*height)&&(global_id < width*height + ch*cw)) {
> +        y = (global_id - width*height)/cw;
> +        x = (global_id - width*height)%cw;
> +        x_s = x * matrix2[0] + y * matrix2[1] + matrix2[2];
> +        y_s = x * matrix2[3] + y * matrix2[4] + matrix2[5];
> +        tempdst = dstU;
> +        tempsrc = srcU;
> +        tempsrc_stride = src_stride_ch;
> +        tempdst_stride = dst_stride_ch;
> +        temp_height = height;
> +        temp_width = width;
> +        temp_height = ch;
> +        temp_width = cw;
> +     }
> +     else {
> +        y = (global_id - width*height - ch*cw)/cw;
> +        x = (global_id - width*height - ch*cw)%cw;
> +        x_s = x * matrix2[0] + y * matrix2[1] + matrix2[2];
> +        y_s = x * matrix2[3] + y * matrix2[4] + matrix2[5];
> +        tempdst = dstV;
> +        tempsrc = srcV;
> +        tempsrc_stride = src_stride_ch;
> +        tempdst_stride = dst_stride_ch;
> +        temp_height = ch;
> +        temp_width = cw;
> +     }
> +     curpos = y * tempdst_stride + x;
> +     switch (fillmethod) {
> +        case 1:
> +            def = tempsrc[y*tempsrc_stride+x];
> +            break;
> +        case 2:
> +            y_s = av_clipf(y_s, 0, temp_height - 1);
> +            x_s = av_clipf(x_s, 0, temp_width - 1);
> +            def = tempsrc[(int)y_s * tempsrc_stride + (int)x_s];
> +            break;
> +        case 3:
> +            y_s = (y_s < 0) ? -y_s : (y_s >= temp_height) ? (temp_height + temp_height - y_s) : y_s;
> +            x_s = (x_s < 0) ? -x_s : (x_s >= temp_width) ? (temp_width + temp_width - x_s) : x_s;
> +            def = tempsrc[(int)y_s * tempsrc_stride + (int)x_s];
> +            break;
> +         }
> +    switch (interpolate) {
> +        case 0:
> +            tempdst[curpos] = interpolate_nearest(x_s, y_s, tempsrc, temp_width, temp_height, tempsrc_stride, def);
> +            break;
> +        case 1:
> +            tempdst[curpos] = interpolate_bilinear(x_s, y_s, tempsrc, temp_width, temp_height, tempsrc_stride, def);
> +            break;
> +        case 2:
> +            tempdst[curpos] = interpolate_biquadratic(x_s, y_s, tempsrc, temp_width, temp_height, tempsrc_stride, def);
> +            break;
> +        }
> +}
> +
> +);
> diff --git a/libavfilter/transform_opencl.c b/libavfilter/transform_opencl.c
> new file mode 100644
> index 0000000..0b57710
> --- /dev/null
> +++ b/libavfilter/transform_opencl.c
> @@ -0,0 +1,153 @@
> +/*
> + * Copyright (C) 2013 Wei Gao <weigao at multicorewareinc.com>
> + *
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * transform input video
> + */
> +
> +#include "libavutil/common.h"
> +#include "libavutil/avassert.h"
> +#include "libavutil/openclwrapper.h"
> +#include "transform.h"
> +#include "transform_opencl.h"
> +
> +
> +
> +static int ff_filter_transform_func(void **userdata, KernelEnv *kenv)
> +{
> +    cl_mem src = (cl_mem)userdata[0];
> +    cl_mem dst = (cl_mem)userdata[1];
> +    int src_stride_lu = (int)userdata[2];
> +    int dst_stride_lu = (int)userdata[3];
> +    int src_stride_ch = (int)userdata[4];
> +    int dst_stride_ch = (int)userdata[5];
> +    int width      = (int)userdata[6];
> +    int height     = (int)userdata[7];
> +    int cw          = (int)userdata[8];
> +    int ch     = (int)userdata[9];
> +    float *matrix  = (float *)userdata[10];
> +    float *matrix2  = (float *)userdata[11];
> +    int interpolate = (int)userdata[12];
> +    int fillmethod  = (int)userdata[13];
> +    cl_mem matrix_buf = (cl_mem)userdata[14];
> +    cl_mem matrix_buf2  = (cl_mem)userdata[15];
> +    KernelEnv *env  = (KernelEnv *)userdata[16];
> +    cl_uint status;
> +    void *mapped;
> +    const size_t global_work_size = width * height + 2 * ch * cw;
> +    int m_size   = 6;
> +
> +    mapped = clEnqueueMapBuffer(kenv->command_queue, matrix_buf, CL_TRUE, CL_MAP_WRITE, 0, m_size*sizeof(cl_float),0,NULL, NULL, NULL);
> +    memcpy(mapped,matrix,m_size*sizeof(cl_float));
> +    clEnqueueUnmapMemObject(kenv->command_queue, matrix_buf, mapped, 0, NULL, NULL);
> +
> +    mapped = clEnqueueMapBuffer(kenv->command_queue, matrix_buf2, CL_TRUE, CL_MAP_WRITE, 0, m_size*sizeof(cl_float),0,NULL, NULL, NULL);
> +    memcpy(mapped,matrix2,m_size*sizeof(cl_float));
> +    clEnqueueUnmapMemObject(kenv->command_queue, matrix_buf2, mapped, 0, NULL, NULL);
> +
> +    if (NULL == env->kernel) {
> +        status =  av_create_kernel("avfilter_transform", kenv);
> +        if (status) {
> +            av_log(NULL,AV_LOG_ERROR,"clCreateKernel Error %s\n","avfilter_transform");
> +            return 0;
> +        }
> +        env->command_queue = kenv->command_queue;
> +        env->context = kenv->context;
> +        env->kernel = kenv->kernel;
> +        strcpy(env->kernel_name,kenv->kernel_name);
> +        env->program = kenv->program;
> +    }
> +

> +    OCLCHECK( clSetKernelArg, env->kernel, 0, sizeof(cl_mem),  (void*)&src);
> +    OCLCHECK( clSetKernelArg, env->kernel, 1, sizeof(cl_mem),  (void*)&dst);
> +    OCLCHECK( clSetKernelArg, env->kernel, 2, sizeof(cl_mem),  (void*)&matrix_buf);
> +    OCLCHECK( clSetKernelArg, env->kernel, 3, sizeof(cl_mem),  (void*)&matrix_buf2);
> +    OCLCHECK( clSetKernelArg, env->kernel, 4, sizeof(cl_int),  (void*)&interpolate);
> +    OCLCHECK( clSetKernelArg, env->kernel, 5, sizeof(cl_int),  (void*)&fillmethod);
> +    OCLCHECK( clSetKernelArg, env->kernel, 6, sizeof(cl_int),  (void*)&src_stride_lu);
> +    OCLCHECK( clSetKernelArg, env->kernel, 7, sizeof(cl_int),  (void*)&dst_stride_lu);
> +    OCLCHECK( clSetKernelArg, env->kernel, 8, sizeof(cl_int),  (void*)&src_stride_ch);
> +    OCLCHECK( clSetKernelArg, env->kernel, 9, sizeof(cl_int),  (void*)&dst_stride_ch);
> +    OCLCHECK( clSetKernelArg, env->kernel, 10, sizeof(cl_int), (void*)&height);
> +    OCLCHECK( clSetKernelArg, env->kernel, 11, sizeof(cl_int), (void*)&width);
> +    OCLCHECK( clSetKernelArg, env->kernel, 12, sizeof(cl_int), (void*)&ch);
> +    OCLCHECK( clSetKernelArg, env->kernel, 13, sizeof(cl_int), (void*)&cw);

You could use a macro to ease readability/refactorization.

> +
> +    OCLCHECK( clEnqueueNDRangeKernel, env->command_queue, env->kernel, 1, NULL,
> +              &global_work_size, NULL, 0, NULL, NULL);
> +    clFinish(kenv->command_queue);//add for time test
> +    return 1;
> +}
> +
> +
> +void avfilter_transform_cl( void *src,  void *dst,
> +                        int src_stride_lu, int dst_stride_lu,
> +                        int src_stride_ch, int dst_stride_ch,
> +                        int width, int height, int cw, int ch,
> +                        const float *matrix, const float *matrix2,
> +                        const void *matrix_cl, const void *matrix2_cl,
> +                        enum InterpolateMethod interpolate,
> +                        enum FillMethod fill,KernelEnv *env)
> +
> +{
> +        int interpolate_t = interpolate;
> +        int fillmethod    = fill;
> +        void *userdata[17];
> +
> +        userdata[0] = (void *)src;
> +        userdata[1] = (void *)dst;
> +        userdata[2] = (void *)src_stride_lu;
> +        userdata[3] = (void *)dst_stride_lu;
> +        userdata[4] = (void *)src_stride_ch;
> +        userdata[5] = (void *)dst_stride_ch;
> +        userdata[6] = (void *)width;
> +        userdata[7] = (void *)height;
> +        userdata[8] = (void *)cw;
> +        userdata[9] = (void *)ch;
> +        userdata[10] = (void *)matrix;
> +        userdata[11] = (void *)matrix2;
> +        userdata[12] = (void *)interpolate_t;
> +        userdata[13] = (void *)fillmethod;
> +        userdata[14] = (void *)matrix_cl;
> +        userdata[15] = (void *)matrix2_cl;
> +        userdata[16] = (void *)env;
> +
> +
> +
> +        if(!av_run_kernel("avfilter_transform", userdata))
> +        {
> +            av_log( NULL,AV_LOG_ERROR,"run kernel[%s] faild\n", "avfilter_transform" );
> +            return;
> +        }
> +}
> +
> +int ff_init_transform(void)
> +{
> +    int st = av_register_kernel_wrapper( "avfilter_transform", ff_filter_transform_func);
> +    if (!st) {
> +        av_log(NULL,AV_LOG_ERROR, "register kernel[%s] faild\n", "avfilter_transform" );
> +        return -1;
> +    }
> +    return 0;
> +}
> +
> +
> diff --git a/libavfilter/transform_opencl.h b/libavfilter/transform_opencl.h
> new file mode 100644
> index 0000000..c792c55
> --- /dev/null
> +++ b/libavfilter/transform_opencl.h
> @@ -0,0 +1,38 @@
> +/*
> + * Copyright (C) 2013 Wei Gao <weigao at multicorewareinc.com>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef AVFILTER_TRANSFORM_OPENCL_H
> +#define AVFILTER_TRANSFORM_OPENCL_H
> +
> +#include <stdint.h>
> +
> +void avfilter_transform_cl( void *src,  void *dst,
> +                        int src_stride_lu, int dst_stride_lu,
> +                        int src_stride_ch, int dst_stride_ch,
> +                        int width, int height, int cw, int ch,
> +                        const float *matrix, const float *matrix2,
> +                        const void *matrix_cl, const void *matrix2_cl,
> +                        enum InterpolateMethod interpolate,
> +                        enum FillMethod fill, KernelEnv *env);
> +int ff_init_transform(void);
> +
> +
> +
> +#endif /* AVFILTER_TRANSFORM_H */
> diff --git a/libavfilter/vf_deshake.c b/libavfilter/vf_deshake.c
> index c03919c..f8f1f93 100644
> --- a/libavfilter/vf_deshake.c
> +++ b/libavfilter/vf_deshake.c
> @@ -1,6 +1,7 @@
>  /*
>   * Copyright (C) 2010 Georg Martius <georg.martius at web.de>
>   * Copyright (C) 2010 Daniel G. Taylor <dan at programmer-art.org>
> + * Modified by Wei Gao <weigao at multicorewareinc.com>
>   *
>   * This file is part of FFmpeg.
>   *
> @@ -59,6 +60,10 @@
>  #include "libavcodec/dsputil.h"
>  
>  #include "transform.h"
> +#if CONFIG_DESHAKE_OPENCL_FILTER
> +#include "libavutil/openclwrapper.h"
> +#include "transform_opencl.h"
> +#endif
>  
>  #define CHROMA_WIDTH(link)  -((-link->w) >> av_pix_fmt_desc_get(link->format)->log2_chroma_w)
>  #define CHROMA_HEIGHT(link) -((-link->h) >> av_pix_fmt_desc_get(link->format)->log2_chroma_h)
> @@ -84,7 +89,15 @@ typedef struct {
>      double angle;         ///< Angle of rotation
>      double zoom;          ///< Zoom percentage
>  } Transform;
> -
> +#if CONFIG_DESHAKE_OPENCL_FILTER
> +typedef struct {
> +    void *cl_inbuf;
> +    void *cl_outbuf;
> +    void *matrix_buf;
> +    void *matrix_buf2;
> +    KernelEnv kernelev;
> +}Deshake_opencl_ev;
> +#endif
>  typedef struct {
>      AVClass av_class;
>      AVFilterBufferRef *ref;    ///< Previous frame
> @@ -104,6 +117,9 @@ typedef struct {
>      int ch;
>      int cx;
>      int cy;
> +#if CONFIG_DESHAKE_OPENCL_FILTER
> +    Deshake_opencl_ev opencl_ev;
> +#endif
>  } DeshakeContext;
>  
>  static int cmp(const double *a, const double *b)
> @@ -536,6 +552,161 @@ static int filter_frame(AVFilterLink *link, AVFilterBufferRef *in)
>  
>      return ff_filter_frame(outlink, out);
>  }
> +#if CONFIG_DESHAKE_OPENCL_FILTER
> +static av_cold int init_opencl(AVFilterContext *ctx, const char *args)
> +{
> +    DeshakeContext *deshake = ctx->priv;
> +    init(ctx,args);
> +    memset(&(deshake->opencl_ev),0,sizeof(Deshake_opencl_ev));
> +    deshake->opencl_ev.cl_inbuf= NULL;
> +    deshake->opencl_ev.cl_outbuf = NULL;
> +    av_create_buffer(&(deshake->opencl_ev.matrix_buf), CL_MEM_READ_ONLY,6*sizeof(cl_float));
> +    av_create_buffer(&(deshake->opencl_ev.matrix_buf2),CL_MEM_READ_ONLY,6*sizeof(cl_float));
> +    if (ff_init_transform())
> +        return -1;
> +    return 0;
> +}
> +
> +static av_cold void uninit_opencl(AVFilterContext *ctx)
> +{
> +    DeshakeContext *deshake = ctx->priv;
> +    if (deshake->opencl_ev.cl_inbuf) {
> +        av_release_buffer(deshake->opencl_ev.cl_inbuf);
> +    }
> +    if (deshake->opencl_ev.cl_outbuf) {
> +        av_release_buffer(deshake->opencl_ev.cl_outbuf);
> +    }
> +    if (deshake->opencl_ev.matrix_buf) {
> +        av_release_buffer(deshake->opencl_ev.matrix_buf);
> +    }
> +    if (deshake->opencl_ev.matrix_buf2) {
> +        av_release_buffer(deshake->opencl_ev.matrix_buf2);
> +    }
> +    uninit(ctx);
> +}
> +
> +static int filter_frame_opencl(AVFilterLink *link, AVFilterBufferRef *in)
> +{
> +    DeshakeContext *deshake = link->dst->priv;
> +    AVFilterLink *outlink = link->dst->outputs[0];
> +    AVFilterBufferRef *out;
> +    Transform t = {{0},0}, orig = {{0},0};
> +    float alpha = 2.0 / deshake->refcount;
> +    char tmp[256];
> +    float matrixY[9];
> +    float matrixUV[9];
> +
> +    out = ff_get_video_buffer(outlink, AV_PERM_WRITE, outlink->w, outlink->h);
> +    if (!out) {
> +        avfilter_unref_bufferp(&in);
> +        return AVERROR(ENOMEM);
> +    }
> +    avfilter_copy_buffer_ref_props(out, in);
> +
> +    if (deshake->cx < 0 || deshake->cy < 0 || deshake->cw < 0 || deshake->ch < 0) {
> +        // Find the most likely global motion for the current frame
> +        find_motion(deshake, (deshake->ref == NULL) ? in->data[0] : deshake->ref->data[0], in->data[0], link->w, link->h, in->linesize[0], &t);
> +    } else {
> +        uint8_t *src1 = (deshake->ref == NULL) ? in->data[0] : deshake->ref->data[0];
> +        uint8_t *src2 = in->data[0];
> +
> +        deshake->cx = FFMIN(deshake->cx, link->w);
> +        deshake->cy = FFMIN(deshake->cy, link->h);
> +
> +        if ((unsigned)deshake->cx + (unsigned)deshake->cw > link->w) deshake->cw = link->w - deshake->cx;
> +        if ((unsigned)deshake->cy + (unsigned)deshake->ch > link->h) deshake->ch = link->h - deshake->cy;
> +
> +        // Quadword align right margin
> +        deshake->cw &= ~15;
> +
> +        src1 += deshake->cy * in->linesize[0] + deshake->cx;
> +        src2 += deshake->cy * in->linesize[0] + deshake->cx;
> +
> +        find_motion(deshake, src1, src2, deshake->cw, deshake->ch, in->linesize[0], &t);
> +    }
> +
> +
> +    // Copy transform so we can output it later to compare to the smoothed value
> +    orig.vector.x = t.vector.x;
> +    orig.vector.y = t.vector.y;
> +    orig.angle = t.angle;
> +    orig.zoom = t.zoom;
> +
> +    // Generate a one-sided moving exponential average
> +    deshake->avg.vector.x = alpha * t.vector.x + (1.0 - alpha) * deshake->avg.vector.x;
> +    deshake->avg.vector.y = alpha * t.vector.y + (1.0 - alpha) * deshake->avg.vector.y;
> +    deshake->avg.angle = alpha * t.angle + (1.0 - alpha) * deshake->avg.angle;
> +    deshake->avg.zoom = alpha * t.zoom + (1.0 - alpha) * deshake->avg.zoom;
> +
> +    // Remove the average from the current motion to detect the motion that
> +    // is not on purpose, just as jitter from bumping the camera
> +    t.vector.x -= deshake->avg.vector.x;
> +    t.vector.y -= deshake->avg.vector.y;
> +    t.angle -= deshake->avg.angle;
> +    t.zoom -= deshake->avg.zoom;
> +
> +    // Invert the motion to undo it
> +    t.vector.x *= -1;
> +    t.vector.y *= -1;
> +    t.angle *= -1;
> +
> +    // Write statistics to file
> +    if (deshake->fp) {
> +        snprintf(tmp, 256, "%f, %f, %f, %f, %f, %f, %f, %f, %f, %f, %f, %f\n", orig.vector.x, deshake->avg.vector.x, t.vector.x, orig.vector.y, deshake->avg.vector.y, t.vector.y, orig.angle, deshake->avg.angle, t.angle, orig.zoom, deshake->avg.zoom, t.zoom);
> +        fwrite(tmp, sizeof(char), strlen(tmp), deshake->fp);
> +    }
> +
> +    // Turn relative current frame motion into absolute by adding it to the
> +    // last absolute motion
> +    t.vector.x += deshake->last.vector.x;
> +    t.vector.y += deshake->last.vector.y;
> +    t.angle += deshake->last.angle;
> +    t.zoom += deshake->last.zoom;
> +
> +    // Shrink motion by 10% to keep things centered in the camera frame
> +    t.vector.x *= 0.9;
> +    t.vector.y *= 0.9;
> +    t.angle *= 0.9;
> +
> +    // Store the last absolute motion information
> +    deshake->last.vector.x = t.vector.x;
> +    deshake->last.vector.y = t.vector.y;
> +    deshake->last.angle = t.angle;
> +    deshake->last.zoom = t.zoom;
> +
> +    if (NULL == deshake->opencl_ev.cl_inbuf)
> +        if (av_create_buffer(&(deshake->opencl_ev.cl_inbuf),CL_MEM_READ_ONLY,(in->linesize[0] * in->video->h) + (in->linesize[1] * (in->video->h>>1)) + (in->linesize[2] * (in->video->h>>1)))) {
> +            av_log(NULL,AV_LOG_ERROR,"create opencl buffer\n");
> +            return -1;
> +        }
> +    if (NULL == deshake->opencl_ev.cl_outbuf)
> +        if (av_create_buffer(&(deshake->opencl_ev.cl_outbuf),CL_MEM_WRITE_ONLY,(out->linesize[0] * out->video->h) + (out->linesize[1] * (out->video->h>>1)) + (out->linesize[2] * (out->video->h>>1)))) {
> +            av_log(NULL,AV_LOG_ERROR,"create opencl buffer\n");
> +            return -1;
> +        }
> +    av_write_opencl_buffer(deshake->opencl_ev.cl_inbuf,in->data[0],in->data[1],in->data[2],in->linesize[0],in->linesize[1],in->linesize[2],link->h,0);
> +    avfilter_get_matrix(t.vector.x, t.vector.y, t.angle, 1.0 + t.zoom / 100.0, matrixY);
> +    avfilter_get_matrix(t.vector.x / (link->w / CHROMA_WIDTH(link)), t.vector.y / (link->h / CHROMA_HEIGHT(link)), t.angle, 1.0 + t.zoom / 100.0, matrixUV);
> +    avfilter_transform_cl(deshake->opencl_ev.cl_inbuf,deshake->opencl_ev.cl_outbuf,
> +                    in->linesize[0], out->linesize[0],
> +                    in->linesize[1], out->linesize[1],
> +                    link->w, link->h, CHROMA_WIDTH(link), CHROMA_HEIGHT(link),
> +                    matrixY, matrixUV,
> +                    deshake->opencl_ev.matrix_buf, deshake->opencl_ev.matrix_buf2,
> +                    INTERPOLATE_BILINEAR, deshake->edge,&(deshake->opencl_ev.kernelev));
> +    av_read_opencl_frame_buffer(deshake->opencl_ev.cl_outbuf,out->data[0],out->data[1],out->data[2],out->linesize[0],out->linesize[1],out->linesize[2],link->h);
> +
> +    // Cleanup the old reference frame
> +    avfilter_unref_buffer(deshake->ref);
> +
> +    // Store the current frame as the reference frame for calculating the
> +    // motion of the next frame
> +    deshake->ref = in;
> +
> +    return ff_filter_frame(outlink, out);
> +}
> +#endif
> +
>  
>  static const AVFilterPad deshake_inputs[] = {
>      {
> @@ -566,3 +737,39 @@ AVFilter avfilter_vf_deshake = {
>      .inputs        = deshake_inputs,
>      .outputs       = deshake_outputs,
>  };
> +
> +
> +
> +
> +#if CONFIG_DESHAKE_OPENCL_FILTER
> +
> +static const AVFilterPad deshake_opencl_inputs[] = {
> +    {
> +        .name         = "default",
> +        .type         = AVMEDIA_TYPE_VIDEO,
> +        .filter_frame = filter_frame_opencl,
> +        .config_props = config_props,
> +        .min_perms    = AV_PERM_READ | AV_PERM_PRESERVE,
> +    },
> +    { NULL }
> +};
> +
> +static const AVFilterPad deshake_opencl_outputs[] = {
> +    {
> +        .name = "default",
> +        .type = AVMEDIA_TYPE_VIDEO,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter avfilter_vf_deshake_opencl = {
> +    .name          = "deshake_opencl",

> +    .description   = NULL_IF_CONFIG_SMALL("Stabilize shaky video OpenCL."),

... using OpenCL.

> +    .priv_size     = sizeof(DeshakeContext),
> +    .init          = init_opencl,
> +    .uninit        = uninit_opencl,
> +    .query_formats = query_formats,
> +    .inputs        = deshake_opencl_inputs,
> +    .outputs       = deshake_opencl_outputs,
> +};
> +#endif

> diff --git a/libavutil/Makefile b/libavutil/Makefile
> index 544c33f..307d2f8 100644
> --- a/libavutil/Makefile
> +++ b/libavutil/Makefile

Please split the patch in two, one for libavfilter, and one for
libavutil, that should simplify review.

> @@ -50,6 +50,8 @@ HEADERS = adler32.h                                                     \
>  
>  HEADERS-$(CONFIG_LZO)                   += lzo.h
>  
> +HEADERS-$(CONFIG_OPENCL)                += openclwrapper.h
> +
>  ARCH_HEADERS = bswap.h                                                  \
>                 intmath.h                                                \
>                 intreadwrite.h                                           \
> @@ -104,6 +106,8 @@ OBJS-$(CONFIG_LZO)                      += lzo.o
>  
>  OBJS += $(COMPAT_OBJS:%=../compat/%)
>  
> +OBJS-$(CONFIG_OPENCL)                   += openclwrapper.o
> +
>  SKIPHEADERS          = old_pix_fmts.h
>  
>  TESTPROGS = adler32                                                     \
> diff --git a/libavutil/openclwrapper.c b/libavutil/openclwrapper.c
> new file mode 100644
> index 0000000..f11aed6
> --- /dev/null
> +++ b/libavutil/openclwrapper.c
> @@ -0,0 +1,812 @@
> +/*
> + * Copyright (C) 2012 Peng Gao <peng at multicorewareinc.com>
> + * Copyright (C) 2012 Li   Cao <li at multicorewareinc.com>
> + * Copyright (C) 2012 Wei  Gao <weigao at multicorewareinc.com>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +

> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>

this should not be required

> +#include <windows.h>

That looks definitively non portable, should be avoided.

> +#include "openclwrapper.h"
> +#include "avstring.h"
> +#include "log.h"

> +#include "libavfilter/deshake_kernel.h"

you are not supposed to include a libavfilter header from libavutil
(which is autocontained and must not depend on other FFmpeg
libraries, not even for compilation).

> +
> +
> +

> +#if defined(__APPLE__)
> +#include <OpenCL/cl.h>
> +#else
> +#include <CL/cl.h>
> +#endif
> +

This should be fixed at the configure level, feel free to ask if you
don't know why.

> +#if defined(_MSC_VER)
> +#define strcasecmp strcmpi
> +#endif

we have a portable variant of strcasecmp, check in avstring.h.

> +
> +#define MAX_KERNEL_STRING_LEN   64
> +#define MAX_CLFILE_NUM 50
> +#define MAX_CLKERNEL_NUM 200
> +#define MAX_CLFILE_PATH 255
> +#define MAX_KERNEL_NUM  50
> +#define MAX_KERNEL_NAME_LEN 64
> +
> +#ifndef INVALID_HANDLE_VALUE
> +#define INVALID_HANDLE_VALUE NULL
> +#endif
> +
> +#define THREAD_PRIORITY_TIME_CRITICAL 15
> +
> +typedef struct _GPUEnv {

The "_" prefix is reserved, so should be avoided in
application/library code.

> +    //share vb in all modules in hb library
> +    cl_platform_id platform;

> +    cl_device_type dType;

style: camelCase for variable names should be avoided, device_type
should be good enough (and more explicative).

> +    cl_context context;
> +    cl_device_id * devices;
> +    cl_device_id  dev;
> +    cl_command_queue command_queue;
> +    cl_kernel kernels[MAX_CLFILE_NUM];
> +    cl_program programs[MAX_CLFILE_NUM]; //one program object maps one kernel source file
> +    char  kernelSrcFile[MAX_CLFILE_NUM][256];   //the max len of kernel file name is 256
> +    int file_count; // only one kernel file
> +
> +    char kernel_names[MAX_CLKERNEL_NUM][MAX_KERNEL_STRING_LEN+1];
> +    cl_kernel_function kernel_functions[MAX_CLKERNEL_NUM];
> +    int kernel_count;
> +    int isUserCreated; // 1: created , 0:no create and needed to create by opencl wrapper
> +    unsigned char *temp_buffer;
> +}GPUEnv;
> +
> +typedef struct {
> +    char kernelName[MAX_KERNEL_NAME_LEN+1];
> +    char * kernelStr;
> +}av_kernel_node;

Is this meant to be public? If yes you should move it to a public
header, otherwise avoid the "av_" prefix.

> +
> +static GPUEnv gpu_env = {0};;

double ";"

> +static int isInited = 0;
> +static av_kernel_node gKernels[MAX_KERNEL_NUM] = {{"", NULL}};
> +
> +#define ADD_KERNEL_CFG( idx, s, p ){\

> +        strcpy( gKernels[idx].kernelName, s );\
> +        gKernels[idx].kernelStr = p;\
> +        strcpy( gpu_env.kernel_names[idx], s );\

strcpy is potentially unsafe, and should be avoided in favor of
av_str* variants in avstring.h.

[...]
> +static int generat_bin_from_kernel_source(cl_program program, const char * cl_file_name)
> +{
> +    int i = 0;
> +    cl_int status;
> +    size_t *binarySizes, numDevices;
> +    cl_device_id *devices;
> +    char **binaries;
> +    char *str = NULL;
> +
> +    status = clGetProgramInfo(program,
> +                              CL_PROGRAM_NUM_DEVICES,
> +                              sizeof(numDevices),
> +                              &numDevices,
> +                              NULL);
> +    if (status != CL_SUCCESS) {
> +        av_log(NULL,AV_LOG_ERROR,"generat_bin_from_kernel_source:clGetProgramInfo error,status = %d\n",status);
> +        return -1;
> +    }
> +    devices = (cl_device_id*)av_malloc(sizeof(cl_device_id) * numDevices);
> +    if( devices == NULL )
> +        return -1;
> +    /* grab the handles to all of the devices in the program. */
> +    status = clGetProgramInfo(program,
> +                              CL_PROGRAM_DEVICES,
> +                              sizeof(cl_device_id) * numDevices,
> +                              devices,
> +                              NULL);
> +
> +    /* figure out the sizes of each of the binaries. */
> +    binarySizes = (size_t*)av_malloc(sizeof(size_t) * numDevices);
> +
> +    status = clGetProgramInfo(program,
> +                              CL_PROGRAM_BINARY_SIZES,
> +                              sizeof(size_t) * numDevices,
> +                              binarySizes, NULL);

> +    if (status != CL_SUCCESS) {
> +        av_log(NULL,AV_LOG_ERROR,"generat_bin_from_kernel_source:clGetProgramInfo error,status = %d\n",status);
> +        return -1;
> +    }

About logging: av_log(NULL, ...) should be avoided in the lib, you should
provide a logging context to the library.

Also it is good practice to return an AVERROR code, this could be for
example AVERROR_EXTERNAL (libavutil/error.h) or AVERROR(errno) if the
function sets errno.

> +    /* copy over all of the generated binaries. */
> +    binaries = (char**)av_malloc(sizeof(char *) * numDevices);

> +    if(binaries == NULL)

Nit++: if_(!binaries)

is more customary

> +        return -1;

AVERROR(ENOMEM);

> +
> +    for (i = 0; i < numDevices; i++) {
> +        if (binarySizes[i] != 0) {

> +            binaries[i] = (char*)av_malloc(sizeof(char) * binarySizes[i]);
> +            if(binaries[i] == NULL)
> +                return -1;

ditto

> +        } else
> +            binaries[i] = NULL;

you can memset this to 0 before looping and avoid the branching logic.

> +    }
> +
> +    status = clGetProgramInfo(program,
> +                              CL_PROGRAM_BINARIES,
> +                              sizeof(char *) * numDevices,
> +                              binaries,
> +                              NULL);
> +    if (status != CL_SUCCESS) {
> +        av_log(NULL,AV_LOG_ERROR,"generat_bin_from_kernel_source:clGetProgramInfo error,status = %d\n",status);
> +        return -1;
> +    }
> +    /* dump out each binary into its own separate file. */
> +    for (i = 0; i < numDevices; i++) {
> +        char fileName[256] = {0};
> +        char cl_name[128] = {0};
> +        if (binarySizes[i] != 0) {
> +            char deviceName[1024];
> +            status = clGetDeviceInfo(devices[i],
> +                                     CL_DEVICE_NAME,
> +                                     sizeof(deviceName),
> +                                     deviceName,
> +                                     NULL);
> +            if (status != CL_SUCCESS) {
> +                av_log(NULL,AV_LOG_ERROR,"generat_bin_from_kernel_source:clGetDeviceInfo error,status = %d\n",status);
> +                return -1;
> +            }
> +            str = (char*)strstr(cl_file_name, (char*)".cl");
> +            memcpy(cl_name, cl_file_name, str - cl_file_name);
> +            cl_name[str - cl_file_name] = '\0';
> +            snprintf(fileName,sizeof(fileName), "./%s-%s.bin", cl_name, deviceName);
> +
> +            if (!write_binary_to_file( fileName, binaries[i], binarySizes[i])) {
> +                av_log(NULL,AV_LOG_ERROR,"generat_bin_from_kernel_source:clGetDeviceInfo write_binary_to_file\n");
> +                return -1;
> +            }
> +        }
> +    }
> +

> +    // Release all resouces and memory
> +    for (i = 0;i < numDevices;i++ ) {

> +        if (binaries[i] != NULL) {
> +            av_free(binaries[i]);
> +            binaries[i] = NULL;

This is equivalent to:
av_freep(&binaries[i]);

> +        }
> +    }
> +

> +    if (binaries != NULL) {
> +        av_free(binaries);
> +        binaries = NULL;
> +    }
> +
> +    if (binarySizes != NULL) {
> +        av_free(binarySizes);
> +        binarySizes = NULL;
> +    }
> +
> +    if (devices != NULL) {
> +        av_free(devices);
> +        devices = NULL;

av_freep

[...]
> diff --git a/libavutil/openclwrapper.h b/libavutil/openclwrapper.h
> new file mode 100644
> index 0000000..23b8cfb
> --- /dev/null
> +++ b/libavutil/openclwrapper.h
> @@ -0,0 +1,202 @@
> +/*
> + * Copyright (C) 2012 Peng Gao <peng at multicorewareinc.com>
> + * Copyright (C) 2012 Li   Cao <li at multicorewareinc.com>
> + * Copyright (C) 2012 Wei  Gao <weigao at multicorewareinc.com>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "config.h"
> +
> +#ifndef __OPENCL_WRAPPER_H
> +#define __OPENCL_WRAPPER_H

LIBAVUTIL_OPENCLWRAPPER_H

> +
> +#include <CL/cl.h>
> +#include <CL/cl_ext.h>
> +
> +#define CL_QUEUE_THREAD_HANDLE_AMD 0x403E
> +#define CL_MAP_WRITE_INVALIDATE_REGION (1 << 2)

public prefixes should contain AV_ at the beginning.

> +
> +#define OCLCHECK( method, ... )\
> +    status = method( __VA_ARGS__ ); if( status != CL_SUCCESS ) {\
> +        av_log(NULL,AV_LOG_ERROR, " error %s %d\n", # method, status );  return status; }
> +
> +#define CL_FREE( buf )\
> +{\
> +    if( buf )\
> +    {\
> +        if( is_clflag ) { clReleaseMemObject( buf ); } else {av_free( buf ); }\
> +        buf = NULL;\
> +    }\
> +}
> +
> +#define CREATEBUF( out, flags, size )\
> +    out = clCreateBuffer( kenv->context, (flags), (size), NULL, &status );\
> +    if( status != CL_SUCCESS ) { av_log(NULL,AV_LOG_ERROR, "clCreateBuffer error '%d'\n", status ); return -1; }
> +
> +#define KERNEL( ... )# __VA_ARGS__
> +

> +typedef struct _KernelEnv
> +{
> +    cl_context context;
> +    cl_command_queue command_queue;
> +    cl_program program;
> +    cl_kernel kernel;
> +    char kernel_name[150];
> +}KernelEnv;

Ditto about "_" and public prefixes, here and below.

> +
> +typedef struct _OpenCLEnv
> +{
> +    cl_platform_id platform;
> +    cl_context   context;
> +    cl_device_id devices;
> +    cl_command_queue command_queue;
> +}OpenCLEnv;
> +
> +typedef struct _ExtOpenCLInfo
> +{
> +    cl_platform_id platform;
> +    cl_device_type dType;
> +    cl_context context;
> +    cl_device_id * devices;
> +    cl_device_id  dev;
> +    cl_command_queue command_queue;
> +    char *platformName;
> +}ExtOpenCLInfo;
> +
> +
> +/**
> + * user defined, this is function wrapper which is used to set the input parameters.

> + * luanch kernel and copy data from GPU to CPU or CPU to GPU.

typo

> + */
> +
> +typedef int (*cl_kernel_function)(void **userdata, KernelEnv *kenv);

> +/**
> + * registe a wapper for running the kernel specified by the kernel name.

Register a wrapper

> + *
> + */
> +
> +int av_register_kernel_wrapper(const char *kernel_name, cl_kernel_function function);

> +/**
> + *run kernel , user call this function to luanch kernel.

Run ... launch

> + *kernel_name: this kernel name is used to find the kernel in opencl runtime environment.
> + *userdata: this userdata is the all parameters for running the kernel specified by kernel name
> + */

also please use usual doxygen @param syntax.

> +int av_run_kernel(const char *kernel_name, void **userdata);
> +int av_init_opencl_run_env(int argc, char **argv, const char *build_option,void *extOpenCLInfo);
> +int av_release_opencl_run_env(void);
> +int av_opencl_stats(void);
> +int av_init_opencl_attr(OpenCLEnv * env);
> +int av_create_kernel(const char * kernelname, KernelEnv * env);
> +int av_release_kernel(KernelEnv * env);
> +int av_get_kernel_env(KernelEnv *env);
> +int av_create_buffer(void **cl_Buf, int flags, int size);
> +int av_read_opencl_buffer(void *cl_inBuf, unsigned char *outbuf, int size);
> +int av_write_opencl_buffer(void *cl_inBuf, unsigned char *Ybuf, unsigned char *Ubuf, unsigned char *Vbuf, int linesize0, int linesize1, int linesize2, int height, int offset);
> +cl_device_id av_get_device_id(void);
> +cl_context av_get_context(void);
> +cl_command_queue av_get_command_queue(void);
> +void av_release_buffer(void *cl_Buf);
> +int av_read_opencl_frame_buffer(void *cl_inBuf, unsigned char *Ybuf, unsigned char *Ubuf, unsigned char *Vbuf, int linesize0, int linesize1, int linesize2, int height);

I suggest a common av_cl_ or av_opencl_ prefix for all the opencl
functions, this is the usual convention adopted for public
interfaces.

[...]
-- 
FFmpeg = Fast and Forgiving MultiPurpose Eager Gem
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel at ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


More information about the ffmpeg-devel mailing list