[Ffmpeg-devel] [RFC] smallcpy for h264

Michael Niedermayer michaelni
Sat Oct 7 13:37:54 CEST 2006


Hi

On Sat, Oct 07, 2006 at 01:25:59PM +0200, Luca Barbato wrote:
> Michael Niedermayer wrote:
> > Hi
> > 
> > 
> > gcc on x86 replaces memcpy(constant) by inlined and fast code IIRC so
> > anything like the proposed patch needs to be very carefully benchmarked
> > on x86
> 
> that's why I'm posting it
> > 
> > additionally due to call overhead compared to inlined double/uint64_t based
> > copy most of this cannot be faster even if the functions could do the copy
> > in 0 cpu cycles
> 
> so do you think macros are better?

yes, and there should be 3 user selectable cases
1. always use memcpy and leave it to gcc
2. use generic uint64_t based copy
3. use cpu specific tricks which of course will break runtime cpu selection
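
to make the 3 cases concrete, here is a rough sketch of how such a compile
time selection could look. SMALLCPY_MODE, COPY8, COPY16, copy64_t and
smallcpy_selftest are hypothetical names made up for this illustration, they
are not existing FFmpeg code. it assumes a gcc compatible compiler (for the
may_alias attribute) and 8 byte aligned buffers for case 2, and it leaves
case 3 out because it depends on the cpu

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* SMALLCPY_MODE is a hypothetical build-time switch, not an FFmpeg option:
 * 0 = always memcpy, 1 = generic uint64_t copy, 2 = cpu specific (not shown). */
#ifndef SMALLCPY_MODE
#define SMALLCPY_MODE 1
#endif

#if SMALLCPY_MODE == 0
/* case 1: constant-size memcpy, left to the compiler to inline */
#define COPY8(d, s)  ((void)memcpy((d), (s), 8))
#define COPY16(d, s) ((void)memcpy((d), (s), 16))
#elif SMALLCPY_MODE == 1
/* case 2: generic 64-bit copy; both pointers must be 8-byte aligned.
 * may_alias (GCC/Clang extension) keeps the cast within the aliasing rules. */
typedef uint64_t __attribute__((may_alias)) copy64_t;
#define COPY8(d, s)  ((void)(*(copy64_t *)(d) = *(const copy64_t *)(s)))
#define COPY16(d, s) do { COPY8((d), (s)); \
                          COPY8((uint8_t *)(d) + 8, (const uint8_t *)(s) + 8); } while (0)
#else
#error "case 3 (cpu specific copy) is intentionally not sketched here"
#endif

/* Copies 16 bytes between two 8-byte aligned buffers; returns 1 on success. */
static int smallcpy_selftest(void)
{
    union { uint64_t align; uint8_t b[16]; } src, dst;
    int i;

    for (i = 0; i < 16; i++) {
        src.b[i] = (uint8_t)(i + 1);
        dst.b[i] = 0;
    }
    /* the union's uint64_t member gives the byte arrays 8-byte alignment */
    assert(((uintptr_t)src.b & 7) == 0 && ((uintptr_t)dst.b & 7) == 0);
    COPY16(dst.b, src.b);
    return memcmp(dst.b, src.b, 16) == 0;
}
```

because these are macros there is no call overhead in any mode, and building
with -DSMALLCPY_MODE=0 falls back to leaving everything to gcc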

but before i will agree to this i want to know
1. why we spend a significant amount of time doing small memcpys
2. why ppc doesn't inline memcpy like x86 does

furthermore these alignment related changes must be split, reviewed
and applied before any benchmarking makes sense (= your benchmark
of misaligned arrays with memcpy vs. your code with aligned arrays
might show more of the speed difference due to alignment and less of
that due to the actual code)
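
one way to separate the 2 effects is to time the same copy code on aligned
and on misaligned buffers first. the following is only a sketch under
assumptions: time_copy16, src_buf and dst_buf are hypothetical names, the
timer is plain clock() and the empty asm statement is a gcc/clang specific
barrier that keeps the loop from being optimized away

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Backing storage, 8-byte aligned via the uint64_t member. */
static union { uint64_t align; uint8_t b[64]; } src_buf, dst_buf;

/* Time `iters` (> 0) 16-byte copies with both pointers shifted by `offset`
 * bytes from an 8-byte boundary. Returns elapsed CPU seconds, or -1.0 if
 * the destination does not match the source afterwards. */
static double time_copy16(size_t offset, long iters)
{
    uint8_t *dst = dst_buf.b + offset;
    const uint8_t *src = src_buf.b + offset;
    clock_t start, end;
    long i;

    assert(offset + 16 <= sizeof(src_buf.b));
    for (i = 0; i < 16; i++)
        src_buf.b[offset + i] = (uint8_t)(i + 1);
    memset(dst_buf.b, 0, sizeof(dst_buf.b));

    start = clock();
    for (i = 0; i < iters; i++) {
        memcpy(dst, src, 16);  /* the copy under test; swap in other code here */
        /* GCC/Clang compiler barrier so the loop is not optimized away */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    end = clock();

    if (memcmp(dst, src, 16) != 0)
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

comparing time_copy16(0, n) with time_copy16(1, n) for the same n shows what
alignment alone costs for one implementation, any comparison between
implementations should then be done at equal alignment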

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is



