[FFmpeg-devel] what is h264_idct_add8()?

Michael Niedermayer michaelni
Mon Sep 13 23:49:07 CEST 2010


On Mon, Sep 13, 2010 at 11:26:22PM +0200, Michael Niedermayer wrote:
> On Sun, Sep 12, 2010 at 08:24:45PM -0400, Ronald S. Bultje wrote:
> > Hi,
> > 
> > On Sun, Sep 12, 2010 at 8:26 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > On Fri, Sep 10, 2010 at 09:48:53PM -0400, Ronald S. Bultje wrote:
> > >> On Mon, Sep 6, 2010 at 4:32 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > >> > On Mon, Sep 06, 2010 at 12:33:13PM -0400, Ronald S. Bultje wrote:
> > >> > [...]
> > >> >> Michael, do you still have the patch that enables using idct_add8()
> > >> >> for chroma (probably in h264.c) so I can test the performance of
> > >> >> the yasmified idct_add8 against the current code that doesn't use
> > >> >> idct_add8()?
> > >> >
> > >> > I tried a bit of find and grep, but it seems I am not looking in the right
> > >> > place or not searching for the right thing.
> > >>
> > >> So what do you suggest we do?
> > >> a) remove the idct_add8() functions from H264DSPContext
> > >> b) leave as-is (because I can't test that my yasm conversion is correct)
> > >> c) convert it to yasm along with the rest, hope that it is correct
> > >> without testing (?)
> > >> d) something else?
> > >>
> > >> (A) is easiest, but (C) may have some benefit if I decide to test the
> > >> performance benefit in the future with the yasmified version. (B)
> > >> means duplication of code and thus sounds like a bad plan...
> > >
> > > I am against (a); I don't care about the rest. Mans' suggestion is possible
> > > too but seems like a lot of work.
> > 
> > I appear to have wasted too much time on this already, so let's get this
> > over with. I only did a single measurement because the difference is quite
> > strong (the reason is obviously MMX vs SSE2, along with what you did
> > earlier to not have to call a vfunc 8 times).
> > 
> > Current SVN:
> > 1838 dezicycles in chroma idct add8, 262111 runs, 33 skips
> > 
> > Using add8 (see attached patch):
> > 1745 dezicycles in chroma idct add8, 262124 runs, 20 skips
> > 
> > add8, SSE2:
> > 1264 dezicycles in chroma idct add8, 262106 runs, 38 skips
> > 
> > My recommendation: we should apply this (along with the rest of my
> > yasmification).
> > 
> > The rest of the yasmification patch is attached and will have to be
> > applied with it. I can in all honesty (I measured them all, bleh) say
> > that no single function is slower in yasm at this point, although that
> > took a good hack in h264_idct_add16_sse2() (somehow the unroll of the
> > loop plus inlining of scan8[] makes it a good 20% faster - right now
> > it's 10 cycles faster than the gcc one, but the not-unrolled one was
> > 20-25% slower than gcc (which unrolls it too)).
> > 
> > Many (+/- half of the) functions are a few (5-30) cycles faster in
> > yasm; the other half are approximately equal in speed. The speedups are
> > generally in functions where gcc screws up loop conditionals (e.g. for
> > (x=0;x<16;x++) { if (a || b) { .. } }, which it performs horribly at by
> > creating something like if (!a1) goto end1; { yes1: .. } if (!a2) goto
> > end2; { yes2: .. } [.. and so on until 16 ..] end1: if (b1) goto yes1;
> > if (b2) goto yes2; [.. and so on ..]). It's quite hilarious.
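
For reference, the loop shape described above can be written out as a small C
sketch. All names here (process_block, add16_sketch, the nnz and coeffs
arrays) are hypothetical stand-ins chosen for illustration and are not the
actual FFmpeg functions. The sketch only shows the 16-iteration loop whose
body is guarded by an (a || b) test; it makes no claim about what any given
compiler emits for it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-block transform: it just adds the sum
 * of the block's 16 coefficients to one destination value. */
static void process_block(int *dst, const int16_t *block)
{
    int sum = 0;
    for (int k = 0; k < 16; k++)
        sum += block[k];
    *dst += sum;
}

/* Loop of the shape discussed in the mail: 16 iterations, body guarded
 * by (a || b). Here "a" is a per-block non-zero count and "b" is the
 * block's first coefficient (both assumptions for this sketch).
 * Returns how many blocks were processed. */
static int add16_sketch(int dst[16], const int16_t coeffs[16 * 16],
                        const uint8_t nnz[16])
{
    int processed = 0;

    assert(dst && coeffs && nnz);
    for (int x = 0; x < 16; x++) {
        if (nnz[x] || coeffs[x * 16]) {
            process_block(&dst[x], coeffs + x * 16);
            processed++;
        }
    }
    return processed;
}
```

Compiling a function like this with gcc -O2 -S and reading the generated
assembly is one way to check how a particular compiler version lays out the
two conditions.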
> > 
> > Ronald
> 
> >  h264.c |    8 ++++++++
> >  1 file changed, 8 insertions(+)
> > b89da7914f847f12bbd9c9ca547deedafe4f6326  h264_use_add8.patch
> 
> if it's faster (also check with time ./ffmpeg) and someone has looked over the code

Also, instead of or in addition to time ./ffmpeg, START_TIMER over the
whole MB decode code could be used. The idea is to test not only the IDCT
on its own but also to make sure that the code as a whole doesn't get slower.
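
A minimal sketch of that idea, timing both the inner routine and the whole
per-macroblock path, might look like the following. It does not use FFmpeg's
START_TIMER/STOP_TIMER macros; it assumes only the standard C clock()
function, and every name in it (RegionTimer, fake_idct, decode_mb_sketch and
so on) is hypothetical. clock() is far coarser than a cycle counter, so this
only illustrates where the two measurement regions would sit.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical accumulator for one timed region. */
typedef struct RegionTimer {
    clock_t total;       /* summed clock() ticks spent in the region */
    unsigned long runs;  /* number of times the region was measured  */
} RegionTimer;

static clock_t region_begin(void)
{
    return clock();
}

static void region_end(RegionTimer *t, clock_t start)
{
    assert(t);
    t->total += clock() - start;
    t->runs++;
}

static RegionTimer idct_timer, mb_timer;
static volatile unsigned sink; /* keeps the dummy loops from being removed */

/* Stand-in for the chroma IDCT under test. */
static void fake_idct(int work)
{
    for (int i = 0; i < work; i++)
        sink += (unsigned)i;
}

/* Stand-in for the rest of the macroblock decode. */
static void fake_other_work(int work)
{
    for (int i = 0; i < work; i++)
        sink ^= (unsigned)i;
}

/* The inner region covers only the IDCT; the outer region covers the
 * whole macroblock path, so a faster IDCT that slows down its
 * surroundings would still show up in mb_timer. */
static void decode_mb_sketch(int work)
{
    clock_t mb_start = region_begin();

    fake_other_work(work);

    clock_t idct_start = region_begin();
    fake_idct(work);
    region_end(&idct_timer, idct_start);

    fake_other_work(work);
    region_end(&mb_timer, mb_start);
}
```

Comparing mb_timer.total before and after a change, rather than only
idct_timer.total, is the check suggested above.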

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many that live deserve death. And some that die deserve life. Can you give
it to them? Then do not be too eager to deal out death in judgement. For
even the very wise cannot see all ends. -- Gandalf