[FFmpeg-devel] [PATCH 2/2] Add hflip filter.

Ronald S. Bultje rsbultje
Thu Aug 12 18:49:25 CEST 2010


Hi,

On Thu, Aug 12, 2010 at 12:39 PM, Stefano Sabatini
<stefano.sabatini-lala at poste.it> wrote:
> On date Wednesday 2010-08-04 14:23:49 +0200, Michael Niedermayer encoded:
>> On Sat, Jul 31, 2010 at 02:07:29AM +0200, Stefano Sabatini wrote:
> [...]
>> > +static void draw_slice(AVFilterLink *inlink, int y, int h, int slice_dir)
>> > +{
>> > +    FlipContext *flip = inlink->dst->priv;
>> > +    AVFilterPicRef *inpic  = inlink->cur_pic;
>> > +    AVFilterPicRef *outpic = inlink->dst->outputs[0]->outpic;
>> > +    uint8_t *inrow, *outrow;
>> > +    int i, j, plane, step, hsub, vsub;
>> > +
>> > +    for (plane = 0; plane < 4 && inpic->data[plane]; plane++) {
>> > +        step = flip->max_step[plane];
>> > +        hsub = (plane == 1 || plane == 2) ? flip->hsub : 0;
>> > +        vsub = (plane == 1 || plane == 2) ? flip->vsub : 0;
>> > +
>> > +        outrow = outpic->data[plane] + (y>>vsub) * outpic->linesize[plane];
>> > +        inrow  = inpic ->data[plane] + (y>>vsub) * inpic ->linesize[plane] + ((inlink->w >> hsub) - 1) * step;
>> > +        for (i = 0; i < h>>vsub; i++) {
>> > +            for (j = 0; j < (inlink->w >> hsub); j++)
>> > +                memcpy(outrow + j*step, inrow - j*step, step);
>>
>> variable length memcpy on a per pixel base is slow
>
> Updated.
>
> I didn't manage to understand how bswap/dsputils may be used, I don't
> know if that would improve it.

You could create a VideoFilterDSPContext (or a
HFlipVideoFilterDSPContext), add an hflip function to it, and then any
one of us could optimize it. E.g. for RGBA32, where step is probably
4, we would read 8 or 16 bytes at once, reverse the pixel order within
them using e.g. pshufw or something (and do the same for the opposite
pixels at the other end of the row), and then write them out again ->
you just did 2x 2/4 pixels at once. By using multiple registers and
making sure there's enough padding (which I think is always the case),
this'd get even faster, also because for at least the left read/write
we can use aligned r/w, which is faster.
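
A rough plain-C sketch of that idea, for illustration only. The names
HFlipDSPContext, hflip_dsp_init, hflip_rgba32_c and hflip_generic_c are
hypothetical and not existing FFmpeg API. It assumes separate source
and destination rows, and the 64-bit rotate merely stands in for what a
pshufw-style shuffle would do in a SIMD version:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical DSP context: one function pointer that mirrors a row of
 * `w` pixels, `step` bytes each, from src into dst. */
typedef struct HFlipDSPContext {
    void (*hflip)(uint8_t *dst, const uint8_t *src, int w, int step);
} HFlipDSPContext;

/* Generic fallback: one copy of `step` bytes per pixel. */
static void hflip_generic_c(uint8_t *dst, const uint8_t *src, int w, int step)
{
    int j;
    for (j = 0; j < w; j++)
        memcpy(dst + j * step, src + (w - 1 - j) * step, step);
}

/* step == 4 (e.g. RGBA32): handle two pixels (8 bytes) per iteration.
 * Rotating the 64-bit word by 32 bits swaps the two pixels in memory,
 * independent of endianness. `step` is unused here. */
static void hflip_rgba32_c(uint8_t *dst, const uint8_t *src, int w, int step)
{
    int j = 0;
    (void)step;
    for (; j + 2 <= w; j += 2) {
        uint64_t v;
        memcpy(&v, src + (w - 2 - j) * 4, 8); /* fixed-size load  */
        v = (v << 32) | (v >> 32);            /* swap the 2 pixels */
        memcpy(dst + j * 4, &v, 8);           /* fixed-size store */
    }
    if (j < w) /* odd width: the last output pixel is the first input pixel */
        memcpy(dst + j * 4, src, 4);
}

/* Pick an implementation once, based on the pixel step. An arch-specific
 * init could later override c->hflip with an optimized version. */
static void hflip_dsp_init(HFlipDSPContext *c, int step)
{
    assert(c != NULL);
    c->hflip = (step == 4) ? hflip_rgba32_c : hflip_generic_c;
}
```

draw_slice would then call hflip_dsp_init once per plane (or at config
time) and c.hflip(outrow, row_start, w >> hsub, step) once per row,
where row_start points at the first pixel of the input row rather than
the last.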

Not sure if that's what Michael meant, but I guess it's sort of in the
right direction.

Ronald


