[Ffmpeg-devel] Re: fastmemcpy in ffmpeg

Rich Felker dalias
Wed Sep 27 17:59:57 CEST 2006

On Wed, Sep 27, 2006 at 12:49:16PM +0200, Gunnar von Boehn wrote:
> Rich,
> You seem to take this technical question personally.
> Please don't do this.

I always take bloat personally.

> I was stating known facts, you can take them or not.

I don't respect argument-by-insisting-your-claim-is-"known facts".

> >>- An optimized version will be about twice as fast
> >> for sizes above 500 bytes / 1 KB.
> >
> >
> >Proof???
> Don't get silly.
> For a start, I've written benchmarks for several CPUs in this regard. But 
> you don't need to trust me, take a look at the recommendations of AMD 
> and INTEL.
> Here is a link to AMD's recommended memcpy:
> http://www.greyhound-data.com/gunnar/glibc/memcpy_amd.cpp

This code is huge and ugly.
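The quoted "twice as fast above 500 bytes / 1 KB" claim can be checked directly rather than argued. A minimal timing harness, offered as a sketch and not as a benchmark from this thread, might look like this. time_copy is a hypothetical helper name, and clock() measures CPU time, so results are only indicative.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by memcpy and any candidate copy routine. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Average CPU time in nanoseconds per call when fn copies n bytes
 * from src to dst, repeated iters times. Calling through a volatile
 * function pointer keeps the compiler from inlining or dropping the
 * calls being measured. */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t n, long iters)
{
    copy_fn volatile call = fn;
    clock_t start, end;

    if (iters <= 0)
        return 0.0;
    start = clock();
    for (long i = 0; i < iters; i++)
        call(dst, src, n);
    end = clock();
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

To compare implementations, call time_copy once with memcpy and once with the candidate routine at each size of interest; n divided by the returned nanoseconds gives bytes per nanosecond, which is the same as GB/s.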

> A trivial copy, like you favor, has the side effect of trashing the 2nd 
> level cache. So you will lose your data and code cache which of course 
> will have a very negative impact on overall performance.

What do you mean by this? It won't "trash" the L2 cache any more than
any other reasonable implementation, unless you're talking about the
likes of "movntq" which is completely unacceptable for a
general-purpose memcpy and will result in very bad performance in the
worst case.
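For reference, a non-temporal ("streaming") copy of the kind discussed here writes around the cache instead of through it. The sketch below assumes an x86 compiler with SSE2 and uses the movntdq intrinsic _mm_stream_si128 as a stand-in for the MMX movntq; copy_nontemporal is a hypothetical name, and without SSE2 it falls back to plain memcpy. Because the destination does not end up in cache, a caller that reads it right after the copy has to fetch it back from RAM, which is one reason such a routine suits specific cases rather than a general-purpose memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy n bytes using non-temporal (streaming) stores where available.
 * Assumption: SSE2 movntdq (_mm_stream_si128) stands in for the MMX
 * movntq discussed above; both bypass the cache on the store side. */
static void *copy_nontemporal(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE2__)
    /* Copy bytes normally until d is 16-byte aligned, as movntdq requires. */
    while (n > 0 && ((uintptr_t)d & 15) != 0) {
        *d++ = *s++;
        n--;
    }
    /* Bulk: unaligned 16-byte loads, streaming 16-byte stores. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        n -= 16;
    }
    /* Streaming stores are weakly ordered; fence before dst is used. */
    _mm_sfence();
#endif
    /* Remaining tail (or the whole buffer without SSE2): ordinary copy. */
    memcpy(d, s, n);
    return dst;
}
```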

> For tiny copies, smaller than ONE CPU cache line, movsd is good.
> It's a fact that for bigger copies (of 1 KB or more) you will only
> achieve about 25%-60% of the performance of a streaming-optimized copy.
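Here movsd means the x86 string instruction (rep movsd), not the SSE2 scalar-double move of the same name. A minimal sketch of such a "trivial" copy, assuming GCC-style inline assembly on x86; copy_rep_movsd is a hypothetical name, and other targets use a portable fallback.

```c
#include <stddef.h>
#include <string.h>

/* "Trivial" copy in the style of rep movsd: move n/4 dwords, then the tail. */
static void *copy_rep_movsd(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = n / 4;
    size_t tail = n % 4;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* rep movsl (AT&T name for movsd) copies (E/R)CX dwords from [(E/R)SI]
     * to [(E/R)DI], advancing both pointers; the ABI guarantees DF=0. */
    __asm__ volatile("rep movsl"
                     : "+D"(d), "+S"(s), "+c"(dwords)
                     :
                     : "memory");
#else
    /* Portable fallback with the same effect on d and s. */
    memcpy(d, s, dwords * 4);
    d += dwords * 4;
    s += dwords * 4;
#endif
    /* Copy the 0-3 leftover bytes. */
    memcpy(d, s, tail);
    return dst;
}
```

Timing this against a streaming copy at sizes from a few bytes up to several KB is one way to test the 25%-60% figure.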

I'm extremely skeptical of the 25%. 60% I can totally believe, and
IMNSHO it's totally acceptable for the general-purpose implementation
not to be optimized for large copies. Programs needing specialized
memcpy performance (like media apps) are welcome to use their own
implementations. The system memcpy should be optimized for the sort of
app that does not merit custom asm.
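A program that does merit its own routine can keep the system memcpy for small sizes and switch to a private path for large ones. The sketch below is one such arrangement, assuming GCC or Clang for __builtin_prefetch; the function name app_copy, the 1 KB threshold, the 64-byte chunk and the 512-byte prefetch distance are all hypothetical values chosen for illustration, not tuned numbers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning values for illustration only; not measured. */
#define APP_COPY_THRESHOLD 1024 /* below this, defer to the system memcpy */
#define APP_COPY_CHUNK 64       /* bytes per step, one typical cache line */
#define APP_COPY_PREFETCH 512   /* how far ahead of the copy to prefetch */

/* Application-private copy: small sizes go to the system memcpy, large
 * ones are copied chunk by chunk while prefetching the source ahead. */
static void *app_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t off = 0;

    if (n < APP_COPY_THRESHOLD)
        return memcpy(dst, src, n);

    for (; off + APP_COPY_CHUNK <= n; off += APP_COPY_CHUNK) {
#if defined(__GNUC__)
        /* Prefetch is only a hint; the guard keeps the address in bounds.
         * Args: 0 = prefetch for reading, 0 = little temporal locality. */
        if (off + APP_COPY_PREFETCH < n)
            __builtin_prefetch(s + off + APP_COPY_PREFETCH, 0, 0);
#endif
        memcpy(d + off, s + off, APP_COPY_CHUNK);
    }
    /* Copy whatever is left after the last full chunk. */
    memcpy(d + off, s + off, n - off);
    return dst;
}
```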

> But you don't need to believe me, simply ask or trust AMD or INTEL or 
> IBM on these numbers.

LMAO, I _don't_ trust AMD or Intel. They want you to write code which
forces people to upgrade to their latest crap...

> What's the point of buying memory with DDR400 or faster
> if your memcpy cripples the bus transmission rates to PC133 speed?

1. It doesn't.
2. If the main traffic on the memory bus is memcpy, your program is
   crap. The only large memcpy should be from system to video memory,
   which requires a specialized implementation with movntq anyway.
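For scale, the DDR400 vs PC133 comparison in the quoted question works out as follows, assuming a single 64-bit (8-byte) channel and theoretical peak transfer rates:

```latex
\begin{aligned}
\text{DDR400:}\quad & 400\ \text{MT/s} \times 8\ \text{B} = 3200\ \text{MB/s} \\
\text{PC133:}\quad & 133.\overline{3}\ \text{MT/s} \times 8\ \text{B} \approx 1067\ \text{MB/s} \\
\text{ratio:}\quad & 1067 / 3200 \approx 0.33
\end{aligned}
```

So the quoted complaint amounts to a copy loop reaching about a third of DDR400's peak rate.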

