[FFmpeg-devel] [PATCH] vf_overlay: add support to RGBA packed input and output

Sat Oct 29 18:33:52 CEST 2011

On Sat, Oct 29, 2011 at 05:50:37PM +0200, Michael Niedermayer wrote:
> On Sat, Oct 29, 2011 at 04:33:59PM +0100, Mark Himsley wrote:
> > On 29/10/2011 03:10, Michael Niedermayer wrote:
> > >On Sat, Oct 29, 2011 at 12:56:15AM +0200, Stefano Sabatini wrote:
> > >>On date Thursday 2011-10-27 01:01:40 +0200, Michael Niedermayer encoded:
> > 
> > [...]
> > 
> > >>the original code looked like this:
> > >>>>  -                d[r] = (d[r] * (0xff - s[3]) + s[0] * s[3] + 128)>>  8;
> > >>>>  -                d[1] = (d[1] * (0xff - s[3]) + s[1] * s[3] + 128)>>  8;
> > >>>>  -                d[b] = (d[b] * (0xff - s[3]) + s[2] * s[3] + 128)>>  8;
> > >>when i saw what you replaced it by i was ... scared ;)
> > >>
> > >>if and switch are added in the innermost loop
> > >>constants are replaced by variables
> > >>variables are replaced by reading out of arrays from structures
> > >>a division is added
> > >>
> > >>all this makes the code significantly slower
> > 
> > That is not correct.
> > 
> > Please correct me if I am wrong, but the code you quoted cannot
> > currently be executed, because the overlay filter only outputs
> > PIX_FMT_YUV420P, and the section you quoted can only be executed if
> > the destination filter has negotiated PIX_FMT_BGR24 ||
> > PIX_FMT_RGB24.
> > 
> > Further, I believe I added significant speed increases compared to
> > the previous (unused) implementation.
> > 
> > An example of a speed improvement is the switch statement. Whereas
> > the previous implementation always multiplied every pixel, in my
> > implementation no multiplication happens if the key channel is zero
> > or 255. For many real-world use cases, such as keying a bug over a
> > video, this is a large benefit, speeding up such use cases by 15%
> > or more.
> 
> If you have large areas of 0 or 255, it will be faster to detect them
> in larger blocks, for example by checking aligned 32- or 64-bit words
> for all 255 or all 0.
> This also makes it more friendly to SIMD optimization, which alone
> can make the code 4+ times faster.
> Also making sure width/height of the overlay is minimal should
> help.
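
The block-wise check suggested above could be sketched as follows. This is
only an illustration under stated assumptions: the source is packed RGBA with
alpha at byte offset 3 of each pixel (as in the quoted s[3]), one 64-bit word
covers two pixels, and the names alpha_class and classify_alpha_pair are
hypothetical, not taken from vf_overlay. memcpy is used for the load so the
sketch does not depend on pointer alignment or byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
enum alpha_class { ALPHA_MIXED, ALPHA_ALL_ZERO, ALPHA_ALL_OPAQUE };

/* Classify the alpha of two packed RGBA pixels (8 bytes) with one
 * 64-bit comparison instead of two per-pixel tests. */
static enum alpha_class classify_alpha_pair(const uint8_t *src)
{
    /* Byte pattern selecting the two alpha bytes; building the mask
     * from bytes keeps the test independent of endianness. */
    static const uint8_t alpha_bytes[8] = { 0, 0, 0, 0xff, 0, 0, 0, 0xff };
    uint64_t word, mask;

    assert(src != NULL);
    memcpy(&word, src, sizeof(word));
    memcpy(&mask, alpha_bytes, sizeof(mask));

    word &= mask;                 /* keep only the alpha bytes */
    if (word == 0)
        return ALPHA_ALL_ZERO;    /* both pixels fully transparent */
    if (word == mask)
        return ALPHA_ALL_OPAQUE;  /* both pixels fully opaque */
    return ALPHA_MIXED;           /* fall back to per-pixel blending */
}
```

A caller would skip the pair on ALPHA_ALL_ZERO, copy it on ALPHA_ALL_OPAQUE,
and run the per-pixel blend otherwise. The same test widens naturally to a
SIMD register.
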
> 
> 
> > 
> > Of course, if further optimisations can be applied that's great, but
> > since the RGB workflow is not used currently I hope you can accept
> > additional functionality even if it is not 100% optimised.
> 
> There's a misunderstanding here somewhere. I am very happy to accept the
> new functionality; I am unhappy about the included optimization, because
> if I want to optimize this further I first have to reverse engineer
> and undo this optimization.

If there were simply an unoptimized variant under #if 0 as a reference
for SIMD optimization (for example),
this would resolve my concern.
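
To make the idea concrete, here is a minimal sketch of a plain reference row
blend kept next to a fast-path variant. It assumes a packed RGBA source and a
packed RGB24 destination, and it reuses the arithmetic of the original quoted
lines. The function names are hypothetical, and the fast path is only an
illustration of skipping the multiplies at alpha 0 and 255, not the code from
the patch. In the tree the reference could sit under #if 0; here it is
compiled so the two can be compared.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the original lines: (d*(255-a) + s*a + 128) >> 8. */
static uint8_t blend_channel_ref(uint8_t d, uint8_t s, uint8_t a)
{
    return (uint8_t)((d * (0xff - a) + s * a + 128) >> 8);
}

/* Unoptimized reference: every pixel goes through the multiplies. */
static void blend_row_ref(uint8_t *d, const uint8_t *s, int w)
{
    assert(d && s && w >= 0);
    for (int x = 0; x < w; x++, d += 3, s += 4) {
        d[0] = blend_channel_ref(d[0], s[0], s[3]);
        d[1] = blend_channel_ref(d[1], s[1], s[3]);
        d[2] = blend_channel_ref(d[2], s[2], s[3]);
    }
}

/* Hypothetical fast path: no multiplies when alpha is 0 or 255. */
static void blend_row_fast(uint8_t *d, const uint8_t *s, int w)
{
    assert(d && s && w >= 0);
    for (int x = 0; x < w; x++, d += 3, s += 4) {
        switch (s[3]) {
        case 0:                  /* fully transparent: keep destination */
            break;
        case 0xff:               /* fully opaque: copy source */
            d[0] = s[0];
            d[1] = s[1];
            d[2] = s[2];
            break;
        default:                 /* partial alpha: same math as reference */
            d[0] = blend_channel_ref(d[0], s[0], s[3]);
            d[1] = blend_channel_ref(d[1], s[1], s[3]);
            d[2] = blend_channel_ref(d[2], s[2], s[3]);
        }
    }
}
```

Note that the two are not bit-identical: because of the >> 8, the reference
can come out one lower than the exact value at alpha 0 or 255 (for example,
255 blended at alpha 255 gives 254), while the fast path is exact there. Any
comparison against the reference has to allow for that difference of 1.
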

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Let us carefully observe those good qualities wherein our enemies excel us
and endeavor to excel them, by avoiding what is faulty, and imitating what
is excellent in them. -- Plutarch