[FFmpeg-devel] [PATCH] skip no-op loop iterations in ff_flac_compute_autocorr

Justin Ruggles justin.ruggles
Fri Apr 10 00:04:46 CEST 2009


Bobby Bingham wrote:
> On Wed, 08 Apr 2009 17:42:55 -0400
> Justin Ruggles <justin.ruggles at gmail.com> wrote:
> 
>> Bobby Bingham wrote:
>>> Attached patch skips a bunch of iterations which are no-ops because
>>> data1[-lag...-1] are all zero.
>> looks ok. do the ASM versions do the same?
> 
> No, the SSE2 version does not do this, but if I'm reading this right,
> changing it to do the same doesn't actually help ...
> 
> This is on an Athlon 64 X2, in case it matters.  I'm attaching the
> (rather trivial) patch in case it is useful on some other processor.
> 
> Before:
> 2918550 dezicycles in flac_compute_autocorr_sse2, 1 runs, 0 skips
> 2522720 dezicycles in flac_compute_autocorr_sse2, 2 runs, 0 skips
> 2319855 dezicycles in flac_compute_autocorr_sse2, 4 runs, 0 skips
> 2220241 dezicycles in flac_compute_autocorr_sse2, 8 runs, 0 skips
> 2169318 dezicycles in flac_compute_autocorr_sse2, 16 runs, 0 skips
> 2152224 dezicycles in flac_compute_autocorr_sse2, 32 runs, 0 skips
> 2138448 dezicycles in flac_compute_autocorr_sse2, 64 runs, 0 skips
> 2133790 dezicycles in flac_compute_autocorr_sse2, 128 runs, 0 skips
> 2132774 dezicycles in flac_compute_autocorr_sse2, 256 runs, 0 skips
> 2148665 dezicycles in flac_compute_autocorr_sse2, 512 runs, 0 skips
> 2136874 dezicycles in flac_compute_autocorr_sse2, 1024 runs, 0 skips
> 2132130 dezicycles in flac_compute_autocorr_sse2, 2048 runs, 0 skips
> 2139767 dezicycles in flac_compute_autocorr_sse2, 4096 runs, 0 skips
> 2146832 dezicycles in flac_compute_autocorr_sse2, 8191 runs, 1 skips
> 2147972 dezicycles in flac_compute_autocorr_sse2, 16382 runs, 2 skips
> 
> After:
> 2888910 dezicycles in flac_compute_autocorr_sse2, 1 runs, 0 skips
> 2516665 dezicycles in flac_compute_autocorr_sse2, 2 runs, 0 skips
> 2321807 dezicycles in flac_compute_autocorr_sse2, 4 runs, 0 skips
> 2223960 dezicycles in flac_compute_autocorr_sse2, 8 runs, 0 skips
> 2419460 dezicycles in flac_compute_autocorr_sse2, 16 runs, 0 skips
> 2275824 dezicycles in flac_compute_autocorr_sse2, 32 runs, 0 skips
> 2205534 dezicycles in flac_compute_autocorr_sse2, 64 runs, 0 skips
> 2169755 dezicycles in flac_compute_autocorr_sse2, 128 runs, 0 skips
> 2153054 dezicycles in flac_compute_autocorr_sse2, 256 runs, 0 skips
> 2143111 dezicycles in flac_compute_autocorr_sse2, 512 runs, 0 skips
> 2137918 dezicycles in flac_compute_autocorr_sse2, 1024 runs, 0 skips
> 2138707 dezicycles in flac_compute_autocorr_sse2, 2048 runs, 0 skips
> 2144707 dezicycles in flac_compute_autocorr_sse2, 4096 runs, 0 skips
> 2147715 dezicycles in flac_compute_autocorr_sse2, 8192 runs, 0 skips
> 2148168 dezicycles in flac_compute_autocorr_sse2, 16384 runs, 0 skips
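The no-op iterations described in the patch can be shown with a small C sketch. It uses a simplified loop (one lag per outer iteration, plain double arithmetic) rather than the actual ff_flac_compute_autocorr code, which this thread does not show. The function names autocorr_full, autocorr_skip and skipped_iterations are hypothetical. Because data1[-lag...-1] are all zero, every product data1[i] * data1[i - j] with i < j is zero. The inner loop can therefore start at i = j without changing the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified autocorrelation; NOT the FFmpeg implementation.
 * data1 must point 'lag' elements into a buffer whose entries
 * data1[-lag] .. data1[-1] are zero padding, so data1[i - j] stays in
 * bounds for 0 <= j <= lag. autoc receives lag + 1 values. */
static void autocorr_full(const double *data1, int len, int lag, double *autoc)
{
    assert(data1 != NULL && autoc != NULL && len >= 0 && lag >= 0);
    for (int j = 0; j <= lag; j++) {
        double sum = 0.0;
        /* For i < j, data1[i - j] is a zero padding sample, so those
         * iterations only add 0.0 to sum. */
        for (int i = 0; i < len; i++)
            sum += data1[i] * data1[i - j];
        autoc[j] = sum;
    }
}

/* Same result, but the inner loop starts at i = j, skipping the
 * iterations that would only read the zero padding. */
static void autocorr_skip(const double *data1, int len, int lag, double *autoc)
{
    assert(data1 != NULL && autoc != NULL && len >= 0 && lag >= 0);
    for (int j = 0; j <= lag; j++) {
        double sum = 0.0;
        for (int i = j; i < len; i++)
            sum += data1[i] * data1[i - j];
        autoc[j] = sum;
    }
}

/* Inner-loop iterations avoided per call, assuming lag < len:
 * 0 + 1 + ... + lag. */
static int skipped_iterations(int lag)
{
    return lag * (lag + 1) / 2;
}
```

In this sketch, calling both functions on the same zero-padded buffer gives identical autoc values. The work saved is lag * (lag + 1) / 2 multiply-adds per call out of (lag + 1) * len. When the block length is much larger than the lag, that saving is a small fraction of the total.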

Oh well, thanks for trying it out and for sending a patch.  I also have an
Athlon 64 X2, so someone else will need to test it on other CPUs to see
whether it helps there.

-Justin



