[Ffmpeg-devel] VP3/Theora Perfection
Michael Niedermayer
michaelni
Thu May 19 12:12:03 CEST 2005
Hi
On Thursday 19 May 2005 04:47, Mike Melanson wrote:
> Hi,
> I have replaced unpack_token() with a series of lookup tables in vp3.c.
> Now vp3data.h has more lines than vp3.c. Again, please test as I do not
> have great testing facilities right now. However, I did run a series of
> tests that validated a bunch of decoded tokens against the old function.
>
> Numbers for the speed freaks:
>
> [original]
> 1223 dezicycles in unpack_token, 32757 runs, 11 skips
> 1202 dezicycles in unpack_token, 65512 runs, 24 skips
> [new]
> 845 dezicycles in unpack_token, 32735 runs, 33 skips
> 841 dezicycles in unpack_token, 65466 runs, 70 skips
well, not here: after a cvs up, the speed of unpack_dct_coeffs (which includes
unpack_token()) dropped by 20%. To exclude possible effects of local
changes, I tried on a clean tree:
[original]
47208165 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
46909636 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
47450793 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
[new]
43178650 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
42991589 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
43081780 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
FLAGS=-O3 -g -Wall -Wno-switch -fomit-frame-pointer -mcpu=athlon -march=athlon
which matches your claim, but it didn't make sense: not only is my dev tree
reacting in the opposite direction, but the code shouldn't be faster, as you
replaced a single, often mispredicted jump in a jump table with a few if()s,
which likely won't be predicted any better.
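To make the trade-off concrete, here is a minimal sketch of the two dispatch styles being compared. It is not the code from vp3.c or vp3data.h: the token numbering, the Decoded struct, the table names and the function names are all hypothetical, chosen only to show one indirect jump (a switch that a compiler may turn into a jump table) next to table lookups followed by a few conditional branches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical token set for illustration only; NOT the real VP3 tokens. */
enum { TOKEN_COUNT = 8 };

typedef struct {
    int16_t coeff;    /* decoded coefficient value */
    uint8_t zero_run; /* number of zeros preceding the coefficient */
} Decoded;

/* Variant A: one switch. A compiler may lower this to a single
 * indirect jump through a jump table. */
static Decoded decode_switch(unsigned token, unsigned extra_bits)
{
    Decoded d = { 0, 0 };
    assert(token < TOKEN_COUNT);
    switch (token) {
    case 0: d.coeff = 1;  break;
    case 1: d.coeff = -1; break;
    case 2: d.coeff = 2;  break;
    case 3: d.coeff = -2; break;
    case 4: d.coeff = (int16_t)(3 + (extra_bits & 1)); break;
    case 5: d.coeff = (int16_t)(5 + (extra_bits & 3)); break;
    case 6: d.coeff = 1; d.zero_run = 1; break;
    case 7: d.coeff = 1; d.zero_run = (uint8_t)(2 + (extra_bits & 1)); break;
    }
    return d;
}

/* Variant B: lookup tables plus a few if()s. Smaller code, but each
 * if() is a conditional branch the CPU still has to predict. */
static const int16_t coeff_base[TOKEN_COUNT] = { 1, -1, 2, -2, 3, 5, 1, 1 };
static const uint8_t coeff_mask[TOKEN_COUNT] = { 0,  0, 0,  0, 1, 3, 0, 0 };
static const uint8_t run_base[TOKEN_COUNT]   = { 0,  0, 0,  0, 0, 0, 1, 2 };
static const uint8_t run_mask[TOKEN_COUNT]   = { 0,  0, 0,  0, 0, 0, 0, 1 };

static Decoded decode_table(unsigned token, unsigned extra_bits)
{
    Decoded d;
    assert(token < TOKEN_COUNT);
    d.coeff    = coeff_base[token];
    d.zero_run = run_base[token];
    if (coeff_mask[token])
        d.coeff = (int16_t)(d.coeff + (int)(extra_bits & coeff_mask[token]));
    if (run_mask[token])
        d.zero_run = (uint8_t)(d.zero_run + (extra_bits & run_mask[token]));
    return d;
}
```

Both variants return the same result for every token in this toy set, so any speed difference between them comes from code size, inlining and branch prediction rather than from the work done.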
another try with different cflags confirmed my suspicion: your new code seems
slower, but it's smaller, and gcc seems to inline it, which it didn't do previously.
[original]
41514189 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
41710143 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
41758835 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
[new]
43992551 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
44276594 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
43972657 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
OPTFLAGS=-O3 -g -Wall -Wno-switch -fomit-frame-pointer -mcpu=athlon
-march=athlon -finline-limit=2000
>
> What should I optimize next?
retry the same function with -finline-limit=2000 :)
--
Michael