[FFmpeg-devel] [PATCH] vf_overlay: add support to RGBA packed input and output

Stefano Sabatini stefasab at gmail.com
Sat Oct 29 00:56:15 CEST 2011


On date Thursday 2011-10-27 01:01:40 +0200, Michael Niedermayer encoded:
> On Thu, Oct 27, 2011 at 12:25:43AM +0200, Stefano Sabatini wrote:
> > From 72b3c79a550961b3e215e5f1e6d42da3c362751e Mon Sep 17 00:00:00 2001
> > From: Stefano Sabatini <stefasab at gmail.com>
> > Date: Mon, 24 Oct 2011 20:00:21 +0200
> > Subject: [PATCH] vf_overlay: add support to RGB packed input and output
> > 
> > Also add support to alpha pre-multiplication in the RGBA path.
> > 
> > Based on the work of Mark Himsley <mark at mdsh.com>.
> > 
> > See thread:
> > Subject: [FFmpeg-devel] libavfilter: extending overlay filter
> > Date: Sun, 13 Mar 2011 14:18:42 +0000
> > ---
> >  doc/filters.texi         |   15 +++++-
> >  libavfilter/vf_overlay.c |  134 +++++++++++++++++++++++++++++++++++++++------
> >  2 files changed, 129 insertions(+), 20 deletions(-)
[...]
> >          for (i = 0; i < height; i++) {
> >              uint8_t *d = dp, *s = sp;
> >              for (j = 0; j < width; j++) {
> 
> > -                d[r] = (d[r] * (0xff - s[3]) + s[0] * s[3] + 128) >> 8;
> > -                d[1] = (d[1] * (0xff - s[3]) + s[1] * s[3] + 128) >> 8;
> > -                d[b] = (d[b] * (0xff - s[3]) + s[2] * s[3] + 128) >> 8;
> > -                d += 3;
> > -                s += 4;
> > +                // compute the blend multiplication of overlay over the main
> > +                alpha = s[over->overlay_rgba_map[A]];
> > +                // if the main channel has an alpha channel, alpha has to be calculated
> > +                // to create an un-premultiplied (straight) alpha value
> > +                if (over->main_has_alpha) {
> > +                    // apply the general equation:
> > +                    // alpha = alpha_overlay / ((alpha_main + alpha_overlay) - alpha_main * alpha_overlay)
> > +                    //
> > +                    // if alpha_main = 0 => alpha = 1
> > +                    // if alpha_main = 1 => alpha = alpha_overlay
> > +                    switch (alpha) {
> > +                        case 0:
> > +                        case 0xff:
> > +                            break;
> > +                        default:
> > +                            // the un-premultiplied calculation is:
> > +                            // (255 * 255 * overlay_alpha) / ( 255 * (overlay_alpha + main_alpha) - (overlay_alpha * main_alpha) )
> > +                            alpha =
> > +                            // the next line is a faster version of:  255 * 255 * alpha
> > +                                ( (alpha << 16) - (alpha << 9) + alpha )
> > +                                / (
> > +                            // the next line is a faster version of: 255 * (alpha + d[over->main_rgba_map[A]])
> > +                                    ((alpha + d[over->main_rgba_map[A]]) << 8 ) - (alpha + d[over->main_rgba_map[A]])
> > +                                    - d[over->main_rgba_map[A]] * alpha
> > +                                );
> > +                    }
> > +                }
> > +                switch (alpha) {
> > +                    case 0:
> > +                        break;
> > +                    case 0xff:
> > +                        d[over->main_rgba_map[R]] = s[over->overlay_rgba_map[R]];
> > +                        d[over->main_rgba_map[G]] = s[over->overlay_rgba_map[G]];
> > +                        d[over->main_rgba_map[B]] = s[over->overlay_rgba_map[B]];
> > +                        break;
> > +                    default:
> > +                        d[over->main_rgba_map[R]] = (d[over->main_rgba_map[R]] * (255 - alpha) + s[over->overlay_rgba_map[R]] * alpha) / 255;
> > +                        d[over->main_rgba_map[G]] = (d[over->main_rgba_map[G]] * (255 - alpha) + s[over->overlay_rgba_map[G]] * alpha) / 255;
> > +                        d[over->main_rgba_map[B]] = (d[over->main_rgba_map[B]] * (255 - alpha) + s[over->overlay_rgba_map[B]] * alpha) / 255;
> > +                }
> > +                if (over->main_has_alpha) {
> > +                    switch (alpha) {
> > +                    case 0:
> > +                        break;
> > +                    case 0xff:
> > +                        d[over->main_rgba_map[A]] = s[over->overlay_rgba_map[A]];
> > +                        break;
> > +                    default:
> > +                        d[over->main_rgba_map[A]] = (
> > +                            (d[over->main_rgba_map[A]] << 8) + (0x100 - d[over->main_rgba_map[A]]) * s[over->overlay_rgba_map[A]]
> > +                        ) >> 8;
> > +                    }
> > +                }
> 
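
For reference, the alpha arithmetic in the hunk above can be sketched in
isolation. This is a minimal standalone sketch, not the code in
vf_overlay.c: the helper names straight_alpha, blend_channel and out_alpha
are hypothetical, and all values are assumed to be 8-bit (0..255). Note
that with a main alpha of 0 the division yields 255, so the overlay colour
simply replaces the fully transparent main pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: blend factor for an overlay pixel with alpha `oa`
 * over a main pixel with alpha `ma`, following the formula in the patch:
 * (255 * 255 * oa) / (255 * (oa + ma) - oa * ma) */
static uint8_t straight_alpha(uint8_t oa, uint8_t ma)
{
    uint32_t num, den;

    if (oa == 0 || oa == 0xff)
        return oa;                  /* same shortcut as the switch in the patch */

    /* 255 * 255 * oa, written with shifts: 65536 - 512 + 1 = 65025 */
    num = ((uint32_t)oa << 16) - ((uint32_t)oa << 9) + oa;
    /* 255 * (oa + ma) - oa * ma, also with a shift */
    den = (((uint32_t)oa + ma) << 8) - ((uint32_t)oa + ma) - (uint32_t)ma * oa;
    assert(den > 0);                /* holds for 0 < oa < 255: den >= 255 * oa */

    return (uint8_t)(num / den);    /* never exceeds 255 */
}

/* Hypothetical helper: blend one colour channel, as in the default case. */
static uint8_t blend_channel(uint8_t d, uint8_t s, uint8_t alpha)
{
    return (uint8_t)((d * (255 - alpha) + s * alpha) / 255);
}

/* Hypothetical helper: resulting alpha of the main pixel, as in the patch. */
static uint8_t out_alpha(uint8_t ma, uint8_t oa)
{
    return (uint8_t)(((ma << 8) + (0x100 - ma) * oa) >> 8);
}
```

For example, straight_alpha(128, 128) evaluates to 170 and
out_alpha(128, 128) to 192.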

> please benchmark this with START/STOP_TIMER against the previous code

The RGB path was disabled before this patch, so I split the present patch
and ran some tests.

* Test with no alpha in the main input

before alpha premultiplication
1287135 dezicycles in first, 2 runs, 0 skips
1335442 dezicycles in first, 4 runs, 0 skips
1245555 dezicycles in first, 8 runs, 0 skips
1162359 dezicycles in first, 16 runs, 0 skips
1144390 dezicycles in first, 32 runs, 0 skips
1134602 dezicycles in first, 64 runs, 0 skips
1133281 dezicycles in first, 128 runs, 0 skips
1114852 dezicycles in first, 256 runs, 0 skips
1108999 dezicycles in first, 512 runs, 0 skips
1101536 dezicycles in first, 1024 runs, 0 skips
1096821 dezicycles in first, 2048 runs, 0 skips
1090508 dezicycles in first, 4096 runs, 0 skips
1085896 dezicycles in first, 8192 runs, 0 skips
1084802 dezicycles in first, 16384 runs, 0 skips
1083604 dezicycles in first, 32768 runs, 0 skips

after alpha premultiplication
1224390 dezicycles in second, 2 runs, 0 skips
1202235 dezicycles in second, 4 runs, 0 skips
1191453 dezicycles in second, 8 runs, 0 skips
1183031 dezicycles in second, 16 runs, 0 skips
1230087 dezicycles in second, 32 runs, 0 skips
1227492 dezicycles in second, 64 runs, 0 skips
1230488 dezicycles in second, 128 runs, 0 skips
1215128 dezicycles in second, 256 runs, 0 skips
1207364 dezicycles in second, 512 runs, 0 skips
1199813 dezicycles in second, 1024 runs, 0 skips
1195857 dezicycles in second, 2048 runs, 0 skips
1193954 dezicycles in second, 4096 runs, 0 skips
1194128 dezicycles in second, 8192 runs, 0 skips
1187481 dezicycles in second, 16384 runs, 0 skips
1181874 dezicycles in second, 32768 runs, 0 skips

* Test with alpha in the main input:
28684935 dezicycles in first, 2 runs, 0 skips
28553902 dezicycles in first, 4 runs, 0 skips
28776015 dezicycles in first, 8 runs, 0 skips
29073680 dezicycles in first, 16 runs, 0 skips
28816918 dezicycles in first, 32 runs, 0 skips
28908704 dezicycles in first, 64 runs, 0 skips
28745401 dezicycles in first, 128 runs, 0 skips
28614980 dezicycles in first, 256 runs, 0 skips
28609710 dezicycles in first, 512 runs, 0 skips
28537037 dezicycles in first, 1024 runs, 0 skips
28517850 dezicycles in first, 2048 runs, 0 skips
28466515 dezicycles in first, 4096 runs, 0 skips
28438388 dezicycles in first, 8192 runs, 0 skips
28440383 dezicycles in first, 16384 runs, 0 skips
28426314 dezicycles in first, 32768 runs, 0 skips

33347880 dezicycles in second, 2 runs, 0 skips
33131272 dezicycles in second, 4 runs, 0 skips
38018970 dezicycles in second, 8 runs, 0 skips
48715928 dezicycles in second, 16 runs, 0 skips
44290285 dezicycles in second, 32 runs, 0 skips
43696766 dezicycles in second, 64 runs, 0 skips
38599173 dezicycles in second, 128 runs, 0 skips
36112571 dezicycles in second, 256 runs, 0 skips
34737837 dezicycles in second, 512 runs, 0 skips
34066213 dezicycles in second, 1024 runs, 0 skips
33640178 dezicycles in second, 2048 runs, 0 skips
33368757 dezicycles in second, 4096 runs, 0 skips
33233522 dezicycles in second, 8192 runs, 0 skips
33132908 dezicycles in second, 16384 runs, 0 skips
33062949 dezicycles in second, 32768 runs, 0 skips

The results are as expected: alpha pre-multiplication is significantly
slower, but it may also be what the user wants, so I could make it
optional (and preserve the original alpha? enabled by default?).
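
To put a number on "significantly slower", the 32768-run figures above can
be compared directly. A small sketch, where slowdown_percent is a
hypothetical helper and not something from libavutil:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: relative slowdown of `after` versus `before`,
 * in percent. Both arguments use the same unit (here: dezicycles). */
static double slowdown_percent(uint64_t before, uint64_t after)
{
    assert(before > 0);
    return 100.0 * ((double)after - (double)before) / (double)before;
}
```

With the numbers above this gives roughly 9% (1083604 vs 1181874
dezicycles) without alpha in the main input, and roughly 16% (28426314 vs
33062949 dezicycles) with alpha in the main input.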
-- 
FFmpeg = Fabulous Fancy Magnificent Practical Ecstatic Gadget
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-vf_overlay-use-opt.h-API-for-setting-options.patch
Type: text/x-diff
Size: 3031 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111029/405ba59a/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-vf_overlay-enable-RGB-path.patch
Type: text/x-diff
Size: 9686 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111029/405ba59a/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-vf_overlay-add-support-to-alpha-pre-multiplication-i.patch
Type: text/x-diff
Size: 3682 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20111029/405ba59a/attachment-0002.bin>

