[FFmpeg-devel] [RFC] fix lpc_mmx.c compilation with --enable-pic

Reimar Döffinger Reimar.Doeffinger
Fri Jan 1 23:30:16 CET 2010


On Sun, Dec 13, 2009 at 10:18:01PM +0100, Reimar Döffinger wrote:
> On Tue, Nov 24, 2009 at 01:06:32PM +0100, Michael Niedermayer wrote:
> > On Sun, Nov 22, 2009 at 11:36:01AM +0100, Reimar Döffinger wrote:
> > > Hello,
> > > unfortunately gcc is as usual quite stupid so a constraint like
> > > "=m"(autoc[j]), "=m"(autoc[j+1]), "=m"(autoc[j+2]) takes up a lot of
> > > registers (strangely this also causes all other asm in the same function
> > > to fail with "out of registers", including any asm from inlined
> > > functions).
> > > Attached is a possible solution I found which seems reasonable to me.
> > > Comments?
> > 
> > >  lpc_mmx.c |   19 ++++++++++---------
> > >  1 file changed, 10 insertions(+), 9 deletions(-)
> > > 1eacb77301b919079355642b630288a25d1eebce  pic_lpc.diff
> > 
> > if it works and is not slower, ok
> 
> Anyone volunteering to benchmark? I am still limited to my Atom-based computer
> and I don't think benchmarks are that useful on it (and mostly I am just lazy).
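
For readers following the constraint problem quoted above, here is a minimal
sketch of one common way to avoid several "=m" outputs: pass the base pointer
once in a register and add a "memory" clobber. pic_lpc.diff is not reproduced
in this mail, so this is not necessarily what the patch does; store3() is a
hypothetical helper and not code from lpc_mmx.c.

```c
#include <assert.h>
#include <stddef.h>

/* store3() is a hypothetical helper for illustration; it is not code from
 * lpc_mmx.c. It writes a, b and c to out[0], out[1] and out[2]. */
static void store3(double *out, double a, double b, double c)
{
    assert(out != NULL);
#if defined(__GNUC__) && defined(__x86_64__)
    /* Instead of three "=m" outputs (each of which may need its own
     * address register), pass the base pointer once in a register and
     * declare a "memory" clobber so the compiler knows memory is written. */
    __asm__ volatile(
        "movsd %1,   (%0) \n\t"
        "movsd %2,  8(%0) \n\t"
        "movsd %3, 16(%0) \n\t"
        :
        : "r"(out), "x"(a), "x"(b), "x"(c)
        : "memory");
#else
    /* Portable fallback so the sketch builds on other compilers/targets. */
    out[0] = a;
    out[1] = b;
    out[2] = c;
#endif
}
```

Calling store3(autoc + j, x, y, z) would fill three consecutive entries. The
"memory" clobber is broader than three precise "=m" outputs and can limit what
the compiler reorders around the asm, which is one reason to benchmark such a
change.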

Ok, here are the numbers. I consider them good enough: the variance is
too high to show any difference between the two versions, though as
always I am no good at benchmarking.
Ok to apply considering these numbers?

Best (judged only by the 2048-run figure) of 2x5 runs on an AMD Phenom(tm) 9750 Quad-Core Processor:
Before:

1065030 dezicycles in ff_lpc_compute_autocorr_sse2, 1 runs, 0 skips
784355 dezicycles in ff_lpc_compute_autocorr_sse2, 2 runs, 0 skips
639530 dezicycles in ff_lpc_compute_autocorr_sse2, 4 runs, 0 skips
565688 dezicycles in ff_lpc_compute_autocorr_sse2, 8 runs, 0 skips
528453 dezicycles in ff_lpc_compute_autocorr_sse2, 16 runs, 0 skips
512771 dezicycles in ff_lpc_compute_autocorr_sse2, 32 runs, 0 skips
501901 dezicycles in ff_lpc_compute_autocorr_sse2, 64 runs, 0 skips
497215 dezicycles in ff_lpc_compute_autocorr_sse2, 128 runs, 0 skips
495358 dezicycles in ff_lpc_compute_autocorr_sse2, 256 runs, 0 skips
494296 dezicycles in ff_lpc_compute_autocorr_sse2, 512 runs, 0 skips
493776 dezicycles in ff_lpc_compute_autocorr_sse2, 1024 runs, 0 skips
493847 dezicycles in ff_lpc_compute_autocorr_sse2, 2048 runs, 0 skips

1093640 dezicycles in ff_lpc_compute_autocorr_sse2, 1 runs, 0 skips
795420 dezicycles in ff_lpc_compute_autocorr_sse2, 2 runs, 0 skips
644845 dezicycles in ff_lpc_compute_autocorr_sse2, 4 runs, 0 skips
574260 dezicycles in ff_lpc_compute_autocorr_sse2, 8 runs, 0 skips
539517 dezicycles in ff_lpc_compute_autocorr_sse2, 16 runs, 0 skips
517032 dezicycles in ff_lpc_compute_autocorr_sse2, 32 runs, 0 skips
510134 dezicycles in ff_lpc_compute_autocorr_sse2, 64 runs, 0 skips
500647 dezicycles in ff_lpc_compute_autocorr_sse2, 128 runs, 0 skips
496047 dezicycles in ff_lpc_compute_autocorr_sse2, 256 runs, 0 skips
496335 dezicycles in ff_lpc_compute_autocorr_sse2, 512 runs, 0 skips
495205 dezicycles in ff_lpc_compute_autocorr_sse2, 1024 runs, 0 skips
494260 dezicycles in ff_lpc_compute_autocorr_sse2, 2048 runs, 0 skips

After:
1131750 dezicycles in ff_lpc_compute_autocorr_sse2, 1 runs, 0 skips
820755 dezicycles in ff_lpc_compute_autocorr_sse2, 2 runs, 0 skips
658820 dezicycles in ff_lpc_compute_autocorr_sse2, 4 runs, 0 skips
575586 dezicycles in ff_lpc_compute_autocorr_sse2, 8 runs, 0 skips
533710 dezicycles in ff_lpc_compute_autocorr_sse2, 16 runs, 0 skips
512734 dezicycles in ff_lpc_compute_autocorr_sse2, 32 runs, 0 skips
502335 dezicycles in ff_lpc_compute_autocorr_sse2, 64 runs, 0 skips
497093 dezicycles in ff_lpc_compute_autocorr_sse2, 128 runs, 0 skips
495054 dezicycles in ff_lpc_compute_autocorr_sse2, 256 runs, 0 skips
494586 dezicycles in ff_lpc_compute_autocorr_sse2, 512 runs, 0 skips
494056 dezicycles in ff_lpc_compute_autocorr_sse2, 1024 runs, 0 skips
493315 dezicycles in ff_lpc_compute_autocorr_sse2, 2048 runs, 0 skips

1064190 dezicycles in ff_lpc_compute_autocorr_sse2, 1 runs, 0 skips
851065 dezicycles in ff_lpc_compute_autocorr_sse2, 2 runs, 0 skips
675667 dezicycles in ff_lpc_compute_autocorr_sse2, 4 runs, 0 skips
584622 dezicycles in ff_lpc_compute_autocorr_sse2, 8 runs, 0 skips
544371 dezicycles in ff_lpc_compute_autocorr_sse2, 16 runs, 0 skips
518779 dezicycles in ff_lpc_compute_autocorr_sse2, 32 runs, 0 skips
505830 dezicycles in ff_lpc_compute_autocorr_sse2, 64 runs, 0 skips
501376 dezicycles in ff_lpc_compute_autocorr_sse2, 128 runs, 0 skips
497203 dezicycles in ff_lpc_compute_autocorr_sse2, 256 runs, 0 skips
495284 dezicycles in ff_lpc_compute_autocorr_sse2, 512 runs, 0 skips
494523 dezicycles in ff_lpc_compute_autocorr_sse2, 1024 runs, 0 skips
494685 dezicycles in ff_lpc_compute_autocorr_sse2, 2048 runs, 0 skips
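
To make the "variance too high" point concrete, the 2048-run figures from the
four listings above can be compared directly. This is only a rough sketch: the
helper names are made up for illustration, and with two listings per variant
it is a sanity check rather than a statistical test.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest value in v[0..n-1]; n must be at least 1. */
static long min_of(const long *v, size_t n)
{
    assert(n > 0);
    long m = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] < m)
            m = v[i];
    return m;
}

/* Largest minus smallest value: a crude measure of run-to-run noise. */
static long spread_of(const long *v, size_t n)
{
    assert(n > 0);
    long lo = v[0], hi = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < lo)
            lo = v[i];
        if (v[i] > hi)
            hi = v[i];
    }
    return hi - lo;
}

/* 2048-run dezicycle counts copied from the listings above. */
static const long before_2048[] = { 493847, 494260 };
static const long after_2048[]  = { 493315, 494685 };
```

The best "before" figure is 493847 and the best "after" figure is 493315, a
gap of 532 dezicycles (roughly 0.1%). The two "after" listings alone differ by
1370, so the gap between the variants is smaller than the run-to-run noise.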


