[FFmpeg-devel] [PATCH] Move MLP's dot product to DSPContext
Tue Apr 21 05:31:14 CEST 2009
On Mon, Apr 20, 2009 at 11:01:10PM -0300, Ramiro Polla wrote:
> On Mon, Apr 20, 2009 at 9:40 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Mon, Apr 20, 2009 at 02:29:09AM -0300, Ramiro Polla wrote:
> >> On Mon, Apr 20, 2009 at 12:14 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> > On Sun, Apr 19, 2009 at 10:10:05PM -0300, Ramiro Polla wrote:
> >> >> The attached file moves MLP's dot product to DSPContext. The filter
> >> >> order is a maximum of 8, and in the rematrix stage it's a maximum of
> >> >> 5+2 channels for MLP and 7+0 channels for TrueHD, so it all fits in 8
> >> >> (hopefully) optimized functions.
> >> >
> >> > the functions are too small, the call overhead is too much
> >> > 1-8 multiplications and 1-8 additions are not enough ...
> >> I thought that would happen too, but strangely there was a speedup.
> > you wrote the whole function in asm() and that was slower?
> Attached are three asm variants: sse2, sse4, and altivec.
1. i meant non-SIMD asm :)
If one wanted to do this in SIMD, it should do several channels
at once, or FIR & IIR at once, or several blocks at once; then
SIMD should be faster, but as is it's not SIMD friendly
2. i mean the whole outer function, not the one with 1-8 arithmetic ops
this one, as said, is too small; the call overhead will kill it when
you try the same code (that is asm, not gcc deoptiranomized C)
ahh and note, gcc and 64-bit operations -> very poor code, naive asm
will be much faster, at least it was that way in the past ...
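
To make point 2 concrete, here is a minimal C sketch of what "the whole outer function" could look like, with the 1-8 tap dot product kept inside the per-sample loop rather than called through a function pointer. It is a simplified linear predictor, not the actual MLP filter; the name fir_filter_block, its parameters, and the state layout are hypothetical. The accumulator is 64 bits wide, reading the "64 operations" remark as 64-bit arithmetic, which is an assumption.

```c
#include <stdint.h>

/* Hypothetical, simplified predictor (NOT the real MLP filter code).
 * state[] holds the previous `order` output samples, newest first,
 * and is updated in place. order is expected to be 0..8, the maximum
 * filter order mentioned in the thread. */
void fir_filter_block(int32_t *samples, int blocksize,
                      const int32_t *coeff, int order,
                      int32_t *state, int shift)
{
    for (int i = 0; i < blocksize; i++) {
        /* The dot product lives inside the outer loop, so there is no
         * call per sample. The cast yields a 32x32->64 multiply. */
        int64_t accum = 0;
        for (int j = 0; j < order; j++)
            accum += (int64_t)state[j] * coeff[j];

        /* Right-shifting a negative value is implementation-defined in C;
         * common compilers do an arithmetic shift. */
        int32_t out = samples[i] + (int32_t)(accum >> shift);

        /* Slide the history window and store the newest output. */
        for (int j = order - 1; j > 0; j--)
            state[j] = state[j - 1];
        if (order > 0)
            state[0] = out;

        samples[i] = out;
    }
}
```

With order 1, coefficient 1 and shift 0 this degenerates into a running sum, which makes it easy to check by hand. An asm version of this whole loop is what would be worth comparing against the compiler output, not an asm version of the inner dot product alone.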
> "emms \n\t"
NEVER do emms in an inner loop
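
A rough illustration of where the emms belongs, using the MMX intrinsics from <mmintrin.h> instead of inline asm. The function add_blocks and its layout are made up for this example and have nothing to do with the patch; on targets without MMX the sketch falls back to plain C. _mm_empty() is the intrinsic that emits emms.

```c
#include <stdint.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: dst[k] += src[k] over nblocks blocks, each
 * containing npairs pairs of 32-bit samples. */
void add_blocks(int32_t *dst, const int32_t *src, int npairs, int nblocks)
{
    for (int b = 0; b < nblocks; b++) {
        for (int i = 0; i < npairs; i++) {
            int32_t *d = dst + 2 * (b * npairs + i);
            const int32_t *s = src + 2 * (b * npairs + i);
#if defined(__MMX__)
            __m64 x = _mm_set_pi32(d[1], d[0]);  /* d[0] in the low half */
            __m64 y = _mm_set_pi32(s[1], s[0]);
            __m64 r = _mm_add_pi32(x, y);        /* two 32-bit adds */
            d[0] = _mm_cvtsi64_si32(r);
            d[1] = _mm_cvtsi64_si32(_mm_srli_si64(r, 32));
            /* no emms here: this is the inner loop */
#else
            d[0] += s[0];
            d[1] += s[1];
#endif
        }
    }
#if defined(__MMX__)
    _mm_empty();  /* emms once, after all MMX work is finished */
#endif
}
```

The state only has to be cleared before x87 floating-point code runs again, so a single emms at the end of the outer function (or even further out in the caller) is enough.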
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> ... defining _GNU_SOURCE...
For the love of all that is holy, and some that is not, don't do that.
-- Luca & Mans