[FFmpeg-devel] [PATCH 1/2] lavfi/transpose: support slice threading
Michael Niedermayer
michaelni at gmx.at
Fri Aug 16 02:25:13 CEST 2013
On Thu, Aug 15, 2013 at 11:07:55PM +0000, Paul B Mahol wrote:
> On 8/15/13, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Wed, Aug 14, 2013 at 09:39:32PM +0000, Paul B Mahol wrote:
> >> Signed-off-by: Paul B Mahol <onemda at gmail.com>
> >> ---
> >> libavfilter/vf_transpose.c | 72
> >> ++++++++++++++++++++++++++++++----------------
> >> 1 file changed, 47 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/libavfilter/vf_transpose.c b/libavfilter/vf_transpose.c
> >> index 3ee9c6d..82f68e5 100644
> >> --- a/libavfilter/vf_transpose.c
> >> +++ b/libavfilter/vf_transpose.c
> >> @@ -133,31 +133,19 @@ static AVFrame *get_video_buffer(AVFilterLink
> >> *inlink, int w, int h)
> >> ff_default_get_video_buffer(inlink, w, h);
> >> }
> >>
> >> -static int filter_frame(AVFilterLink *inlink, AVFrame *in)
> >> +typedef struct ThreadData {
> >> + AVFrame *in, *out;
> >> +} ThreadData;
> >> +
> >> +static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr,
> >> + int nb_jobs)
> >> {
> >> - TransContext *trans = inlink->dst->priv;
> >> - AVFilterLink *outlink = inlink->dst->outputs[0];
> >> - AVFrame *out;
> >> + TransContext *trans = ctx->priv;
> >> + ThreadData *td = arg;
> >> + AVFrame *out = td->out;
> >> + AVFrame *in = td->in;
> >> int plane;
> >>
> >> - if (trans->passthrough)
> >> - return ff_filter_frame(outlink, in);
> >> -
> >> - out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> >> - if (!out) {
> >> - av_frame_free(&in);
> >> - return AVERROR(ENOMEM);
> >> - }
> >> -
> >> - out->pts = in->pts;
> >> -
> >> - if (in->sample_aspect_ratio.num == 0) {
> >> - out->sample_aspect_ratio = in->sample_aspect_ratio;
> >> - } else {
> >> - out->sample_aspect_ratio.num = in->sample_aspect_ratio.den;
> >> - out->sample_aspect_ratio.den = in->sample_aspect_ratio.num;
> >> - }
> >> -
> >> for (plane = 0; out->data[plane]; plane++) {
> >> int hsub = plane == 1 || plane == 2 ? trans->hsub : 0;
> >> int vsub = plane == 1 || plane == 2 ? trans->vsub : 0;
> >> @@ -165,12 +153,14 @@ static int filter_frame(AVFilterLink *inlink,
> >> AVFrame *in)
> >> int inh = in->height >> vsub;
> >> int outw = FF_CEIL_RSHIFT(out->width, hsub);
> >> int outh = FF_CEIL_RSHIFT(out->height, vsub);
> >> + int start = (outh * jobnr ) / nb_jobs;
> >> + int end = (outh * (jobnr+1)) / nb_jobs;
> >
> > squares should be faster than long thin rectangles
> > (this should be also true for the single thread case)
>
> Sorry, this does not make any sense to me.
> If you have an idea how to do it better, then either go and do it or say
> exactly what should be done and why.
Consider a 1024x1024 image (1 byte per pixel). If you transpose it line
by line, either the input or the output is accessed along a column.
Each byte accessed that way causes a whole cache line to be read
(64 bytes, for example), so after processing 1024 pixels,
1024 + 64*1024 bytes would have to be in the cache. The L1 data cache of
most CPUs is probably smaller than that, so you might end up with 50-100%
L1 cache misses.
Transposing a 32x32 or maybe 64x64 byte block, OTOH, should fit nicely
in the L1 cache.
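
To make the suggestion concrete, here is a minimal sketch of a block-wise
transpose for a single 8-bit plane. It is not the code from the patch:
the function name transpose_blocked, the TILE constant and the plain
transpose (no flipping) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tile edge in pixels; 32 or 64 are the sizes suggested above. */
#define TILE 32

/*
 * Transpose a w x h 8-bit plane (src) into an h x w plane (dst),
 * one TILE x TILE block at a time, so that the source rows and the
 * destination rows touched by the inner loops stay within a small
 * working set. Linesizes are given in bytes.
 */
static void transpose_blocked(uint8_t *dst, ptrdiff_t dst_linesize,
                              const uint8_t *src, ptrdiff_t src_linesize,
                              int w, int h)
{
    assert(w >= 0 && h >= 0);

    for (int by = 0; by < h; by += TILE) {
        /* Clamp the tile at the bottom edge of the source plane. */
        int ymax = by + TILE < h ? by + TILE : h;

        for (int bx = 0; bx < w; bx += TILE) {
            /* Clamp the tile at the right edge of the source plane. */
            int xmax = bx + TILE < w ? bx + TILE : w;

            /* Source pixel (x, y) lands at destination pixel (y, x). */
            for (int y = by; y < ymax; y++)
                for (int x = bx; x < xmax; x++)
                    dst[x * dst_linesize + y] = src[y * src_linesize + x];
        }
    }
}
```

With TILE = 32, each block touches at most 32 source lines and 32
destination lines, i.e. roughly 64 cache lines of 64 bytes, instead of
one cache line per pixel along a full column. For slice threading, the
outer loops could presumably be restricted to the tile range assigned to
each job, but that is not shown here.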
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus