[Ffmpeg-devel] [PATCH] SSE counterpart of ff_imdct_calc_3dn2
Fri Aug 25 01:23:30 CEST 2006
On Fri, Aug 25, 2006 at 01:37:28AM +0300, Uoti Urpala wrote:
> On Thu, 2006-08-24 at 18:02 -0400, Rich Felker wrote:
> > On Thu, Aug 24, 2006 at 11:16:19PM +0300, Uoti Urpala wrote:
> > > > > Not necessarily, and certainly not gcc-compatible inline asm. How many
> > > > > asm routines are there in FFmpeg or MPlayer that could not achieve
> > > > > comparable speed with intrinsics only?
> > > >
> > > > s/comparable/same or better/. 1-5% slowdown is not acceptable. And
> > > > with this correction I suspect the answer is _NONE_.
> > > Didn't some of the workarounds for old gcc versions cause slowdowns in
> > > that range?
> > At worst they caused a 1% slowdown in a portion of the code that uses
> > maybe 5% of the overall cpu time, not a 1% overall slowdown, IIRC.
> Recent vorbis commit messages mention a 0.5% overall vorbis speedup from
> converting intrinsics to inline asm, and a 0.5% overall vorbis slowdown
> from supporting older gcc versions. So that's an order of magnitude more
> slowdown than you said they caused "at worst", and a counterexample to
> your claim that using intrinsics instead of asm would cause over 1%
This is because of the refusal to properly put the loop itself inside the asm, IIRC.
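
To illustrate what "putting the loop in the asm" means, here is a minimal sketch. It is not taken from the vorbis code: the function names `add_floats_c` and `add_floats_asm` are hypothetical, and it assumes gcc-style extended asm on x86-64 with SSE, with `n` a positive multiple of 4. The whole loop (loads, add, store, counter update, branch) lives in a single asm statement, so the compiler has no say in the loop's register allocation or scheduling.

```c
#include <stddef.h>

/* Plain C reference: dst[i] += src[i]. */
static void add_floats_c(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Hypothetical example: same operation with the loop inside the asm.
 * Assumes n is a multiple of 4; unaligned loads/stores are used so the
 * buffers need no particular alignment. */
static void add_floats_asm(float *dst, const float *src, size_t n)
{
    size_t bytes = n * sizeof(float);
    if (bytes == 0)
        return;

    /* Negative byte offset counting up to zero, relative to the buffer
     * ends, so the add that advances the offset also sets the flags
     * for the loop branch. */
    ptrdiff_t i = -(ptrdiff_t)bytes;

    __asm__ volatile(
        "1:                         \n\t"
        "movups (%1,%0), %%xmm0     \n\t"  /* load 4 floats from dst */
        "movups (%2,%0), %%xmm1     \n\t"  /* load 4 floats from src */
        "addps  %%xmm1, %%xmm0      \n\t"
        "movups %%xmm0, (%1,%0)     \n\t"  /* store result to dst    */
        "add    $16, %0             \n\t"  /* advance by 16 bytes    */
        "jl     1b                  \n\t"  /* loop while offset < 0  */
        : "+r"(i)
        : "r"((char *)dst + bytes), "r"((const char *)src + bytes)
        : "xmm0", "xmm1", "memory", "cc"
    );
}
#else
/* Fallback for other compilers/targets so the sketch still builds. */
#define add_floats_asm add_floats_c
#endif
```

Calling `add_floats_asm(a, b, 8)` on two 8-element float arrays should leave `a` with the same contents as `add_floats_c` would. Whether this form is actually faster than intrinsics for a given routine has to be measured.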