[Ffmpeg-devel] fastmemcpy in ffmpeg

Gunnar von Boehn gunnar
Mon Sep 25 15:55:57 CEST 2006


Hi Ulrich,

Ulrich von Zadow wrote:
> Gunnar von Boehn wrote:
> 
>>Hi
>>
>>Diego Biurrun wrote:
>>
>>>On Mon, Sep 25, 2006 at 10:47:40AM +0200, Michel Bardiaux wrote:
>>>
>>>
>>>>Silvano Galliani (kysucix) wrote:
>>>>
>>>>
>>>>>Is there some plan to include and use fastmemcpy implementation from
>>>>>mplayer?
>>
>>I've once collected and benchmarked a number of memcopy routines both
>>for x86 and PowerPC.
>>
>>http://www.greyhound-data.com/gunnar/glibc/
> 
> 
> I only found ppc data on the site, which, while very interesting, is
> only half of what you promised ;-). Did I just miss a link?

The detailed benchmark charts only show PPC CPUs, that is right.
But I did some tests on x86 (Intel/AMD) CPUs as well.

This overview chart shows the difference between the Linux memcpy and
the optimized versions on different CPUs, including x86.
http://www.greyhound-data.com/gunnar/glibc/membench_memcpy.gif

A simple but very effective step towards doubling the memcopy speed
is to prefetch (stream) the source in while copying. Just by
adding one prefetch instruction to the normal Linux memcpy you can
already speed it up a lot, by about 50%.

These few very simple rules work on nearly all CPU architectures.

The following two points are needed for best burst write speed:
- unroll the copy so that it writes whole cache lines
- align the copy so that the writes are aligned to cache line boundaries
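
A minimal sketch of these two points in portable C, not taken from the
benchmark sources. The function name and the 32-byte CACHE_LINE are
assumptions for illustration; the right line size depends on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 32-byte cache lines. Many CPUs use 64 or 128 instead. */
#define CACHE_LINE 32

/* Hypothetical helper: copy n bytes, writing whole aligned cache lines. */
void *copy_aligned_unrolled(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until the destination is cache line aligned. */
    size_t head = (size_t)(-(uintptr_t)d & (CACHE_LINE - 1));
    if (head > n)
        head = n;
    n -= head;
    while (head--)
        *d++ = *s++;

    /* Body: one full cache line per iteration, unrolled into four
     * 8-byte words. The fixed-size memcpy calls keep unaligned source
     * reads well defined; compilers turn them into plain loads/stores. */
    while (n >= CACHE_LINE) {
        uint64_t w0, w1, w2, w3;
        memcpy(&w0, s +  0, 8);
        memcpy(&w1, s +  8, 8);
        memcpy(&w2, s + 16, 8);
        memcpy(&w3, s + 24, 8);
        memcpy(d +  0, &w0, 8);
        memcpy(d +  8, &w1, 8);
        memcpy(d + 16, &w2, 8);
        memcpy(d + 24, &w3, 8);
        d += CACHE_LINE;
        s += CACHE_LINE;
        n -= CACHE_LINE;
    }

    /* Tail: whatever is left after the last full line. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Only the destination is aligned here, since it is the write side that
benefits from full-line bursts; the source may stay unaligned.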

Prefetch the source some cache lines ahead to prevent source data
stalls. The best prefetch distance depends on the CPU; typically
3-7 lines ahead gives very good numbers.
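
The prefetch rule can be sketched like this. It is an illustration
only: it relies on the GCC/Clang builtin __builtin_prefetch (a no-op
fallback is used elsewhere), and the function name, the 64-byte
CACHE_LINE and the PREFETCH_LINES value of 4 are assumptions that
would need tuning per CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_LINE     64  /* assumption: 64-byte cache lines */
#define PREFETCH_LINES 4   /* assumption: within the 3-7 range, tune per CPU */

#if defined(__GNUC__) || defined(__clang__)
/* rw = 0 (read), locality = 0 (streamed data, not needed again soon) */
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

/* Hypothetical helper: line-by-line copy with a source prefetch ahead. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t ahead = (size_t)PREFETCH_LINES * CACHE_LINE;

    while (n >= CACHE_LINE) {
        /* Request the line that will be read PREFETCH_LINES iterations
         * from now. The check keeps the pointer inside the source. */
        if (n > ahead)
            PREFETCH_READ(s + ahead);
        memcpy(d, s, CACHE_LINE);  /* fixed size, inlined by the compiler */
        d += CACHE_LINE;
        s += CACHE_LINE;
        n -= CACHE_LINE;
    }
    memcpy(d, s, n);  /* remaining bytes, fewer than one line */
    return dst;
}
```

To find a good distance for a given CPU, one would time this routine
on large buffers for each PREFETCH_LINES value from 3 to 7.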

On x86 it's good to use MMX registers/instructions for prefetching and
copying the data. By using MMX instructions you can memcpy without
totally losing (read: overwriting) your data cache during the copy.
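
A rough sketch of a cache-friendly x86 copy. Note the swap: it uses
SSE2 intrinsics rather than MMX, because they are easier to show in
portable C. The cache-preserving part is the non-temporal prefetch
and store hints, which is assumed here to be the effect meant above.
The function name, the 64-byte step and the 256-byte prefetch
distance are assumptions. Without SSE2 it falls back to plain memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _mm_prefetch, _mm_sfence */
#endif

/* Hypothetical helper: copy with non-temporal (streaming) stores. */
void *copy_streaming(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Streaming stores need a 16-byte aligned destination. */
    size_t head = (size_t)(-(uintptr_t)d & 15);
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    while (n >= 64) {
        /* Non-temporal prefetch of the source, 256 bytes ahead. */
        if (n > 256)
            _mm_prefetch((const char *)(s + 256), _MM_HINT_NTA);
        /* Unaligned loads from the source ... */
        __m128i a = _mm_loadu_si128((const __m128i *)(s +  0));
        __m128i b = _mm_loadu_si128((const __m128i *)(s + 16));
        __m128i c = _mm_loadu_si128((const __m128i *)(s + 32));
        __m128i e = _mm_loadu_si128((const __m128i *)(s + 48));
        /* ... and stores that bypass the data cache. */
        _mm_stream_si128((__m128i *)(d +  0), a);
        _mm_stream_si128((__m128i *)(d + 16), b);
        _mm_stream_si128((__m128i *)(d + 32), c);
        _mm_stream_si128((__m128i *)(d + 48), e);
        d += 64;
        s += 64;
        n -= 64;
    }
    _mm_sfence();     /* order the streaming stores before later writes */
    memcpy(d, s, n);  /* tail, fewer than 64 bytes */
#else
    memcpy(dst, src, n);  /* no SSE2 available: plain copy */
#endif
    return dst;
}
```

Streaming stores tend to pay off only for large copies; for small
buffers that are read again right away, normal stores are usually
the better choice.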

My focus was on PPC but some x86 routines should be included in the 
source of the Linux benchmarks. I can extract these routines from the 
sources for you if needed.

For copies bigger than 128 bytes a CPU-optimized routine will usually
be about twice as fast as the normal glibc Linux version.


Cheers
Gunnar



