[FFmpeg-devel] [PATCH 1/2] lavfi/transpose: support slice threading

Fri Aug 16 02:25:13 CEST 2013

On Thu, Aug 15, 2013 at 11:07:55PM +0000, Paul B Mahol wrote:
> On 8/15/13, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Wed, Aug 14, 2013 at 09:39:32PM +0000, Paul B Mahol wrote:
> >> Signed-off-by: Paul B Mahol <onemda at gmail.com>
> >> ---
> >>  libavfilter/vf_transpose.c | 72 ++++++++++++++++++++++++++++++----------------
> >>  1 file changed, 47 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/libavfilter/vf_transpose.c b/libavfilter/vf_transpose.c
> >> index 3ee9c6d..82f68e5 100644
> >> --- a/libavfilter/vf_transpose.c
> >> +++ b/libavfilter/vf_transpose.c
> >> @@ -133,31 +133,19 @@ static AVFrame *get_video_buffer(AVFilterLink *inlink, int w, int h)
> >>          ff_default_get_video_buffer(inlink, w, h);
> >>  }
> >>
> >> -static int filter_frame(AVFilterLink *inlink, AVFrame *in)
> >> +typedef struct ThreadData {
> >> +    AVFrame *in, *out;
> >> +} ThreadData;
> >> +
> >> +static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr,
> >> +                        int nb_jobs)
> >>  {
> >> -    TransContext *trans = inlink->dst->priv;
> >> -    AVFilterLink *outlink = inlink->dst->outputs[0];
> >> -    AVFrame *out;
> >> +    TransContext *trans = ctx->priv;
> >> +    ThreadData *td = arg;
> >> +    AVFrame *out = td->out;
> >> +    AVFrame *in = td->in;
> >>      int plane;
> >>
> >> -    if (trans->passthrough)
> >> -        return ff_filter_frame(outlink, in);
> >> -
> >> -    out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> >> -    if (!out) {
> >> -        av_frame_free(&in);
> >> -        return AVERROR(ENOMEM);
> >> -    }
> >> -
> >> -    out->pts = in->pts;
> >> -
> >> -    if (in->sample_aspect_ratio.num == 0) {
> >> -        out->sample_aspect_ratio = in->sample_aspect_ratio;
> >> -    } else {
> >> -        out->sample_aspect_ratio.num = in->sample_aspect_ratio.den;
> >> -        out->sample_aspect_ratio.den = in->sample_aspect_ratio.num;
> >> -    }
> >> -
> >>      for (plane = 0; out->data[plane]; plane++) {
> >>          int hsub = plane == 1 || plane == 2 ? trans->hsub : 0;
> >>          int vsub = plane == 1 || plane == 2 ? trans->vsub : 0;
> >> @@ -165,12 +153,14 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
> >>          int inh  = in->height  >> vsub;
> >>          int outw = FF_CEIL_RSHIFT(out->width,  hsub);
> >>          int outh = FF_CEIL_RSHIFT(out->height, vsub);
> >> +        int start = (outh *  jobnr   ) / nb_jobs;
> >> +        int end   = (outh * (jobnr+1)) / nb_jobs;
> >
> > squares should be faster than long thin rectangles
> > (this should be also true for the single thread case)
> 
> Sorry, this does not make any sense to me.
> If you have an idea how to do it better, then either go and do it or
> say exactly what should be done and why.

Consider a 1024x1024 image. If you transpose it line-wise, either the
input or the output is accessed along a column. Each of those accessed
bytes causes a whole cache line to be read (64 bytes, for example), so
after processing 1024 pixels, 1024 + 64*1024 bytes would be in the
cache. The L1 data cache of most CPUs is probably smaller than that, so
you might end up with 50-100% L1 cache misses.

Transposing a 32x32 or maybe 64x64 byte block, OTOH, should fit nicely
in the L1 cache.
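
To make the block idea concrete, here is a rough sketch of what a
blocked transpose could look like next to the line-wise one. It is not
the code from vf_transpose.c: it assumes a single 8-bit plane and a
plain transpose with no flip, and the names transpose_naive and
transpose_blocked as well as the block parameter are made up for
illustration. A block size of 32 or 64 is only the guess from above and
would need benchmarking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Line-wise transpose of a w x h plane into an h x w plane.
 * Reads run along a source row, writes stride down a destination
 * column, so each write can touch a different cache line. */
static void transpose_naive(uint8_t *dst, ptrdiff_t dst_stride,
                            const uint8_t *src, ptrdiff_t src_stride,
                            int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[x * dst_stride + y] = src[y * src_stride + x];
}

/* Same result, but processed in block x block tiles so that the
 * source and destination lines of one tile can stay in L1 while
 * the tile is being handled. Edge tiles are clipped to the plane. */
static void transpose_blocked(uint8_t *dst, ptrdiff_t dst_stride,
                              const uint8_t *src, ptrdiff_t src_stride,
                              int w, int h, int block)
{
    assert(block > 0);

    for (int by = 0; by < h; by += block) {
        int yend = by + block < h ? by + block : h;

        for (int bx = 0; bx < w; bx += block) {
            int xend = bx + block < w ? bx + block : w;

            for (int y = by; y < yend; y++)
                for (int x = bx; x < xend; x++)
                    dst[x * dst_stride + y] = src[y * src_stride + x];
        }
    }
}
```

With 32x32 tiles of 8-bit samples, one tile touches roughly 32 source
lines and 32 destination lines, which is a few KB of cache lines at
most instead of the 64*1024 bytes from the line-wise case.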


[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus