[FFmpeg-devel] [PATCH] Unroll base64 decode loop.

Reimar Döffinger Reimar.Doeffinger at gmx.de
Sat Jan 21 12:51:58 CET 2012


On Sat, Jan 21, 2012 at 12:45:09PM +0100, Reimar Döffinger wrote:
> Around 50% faster.
> decode:       374139 -> 248852 decicycles
> syntax check: 236955 -> 123854 decicycles

Note that this is despite gcc failing completely and utterly:
seemingly at random, it makes the "goto out" path the "fast" path
in some cases and not in others.
The code the optimizer generates IMO simply makes no sense.
I did not try it with this code, but using the __builtin_expect
cluebat did not help one bit on the previous attempt (which did
not use the larger table and thus resulted in even messier code).
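
For readers unfamiliar with the hint mentioned above, here is a minimal
sketch of a 4-characters-per-iteration decode loop whose error exit is
marked as unlikely. It is not the code from the patch: the names
sketch_map, sketch_init, sketch_decode and UNLIKELY are hypothetical, the
table is a plain 256-entry lookup rather than the larger table the patch
uses, and padding handling is left out. __builtin_expect itself is a
GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper; __builtin_expect is only a hint to the compiler. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Maps an input byte to its 6-bit value; 0xff marks an invalid byte. */
static uint8_t sketch_map[256];

static void sketch_init(void)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    memset(sketch_map, 0xff, sizeof(sketch_map));
    for (int i = 0; i < 64; i++)
        sketch_map[(uint8_t)alphabet[i]] = (uint8_t)i;
}

/* Decodes complete 4-character groups and stops at the first invalid
 * byte (which includes '=' padding). Returns the bytes written to dst. */
static size_t sketch_decode(uint8_t *dst, size_t dst_size,
                            const char *src, size_t src_len)
{
    const uint8_t *in = (const uint8_t *)src;
    uint8_t *out = dst;

    assert(dst != NULL && src != NULL);

    while (src_len >= 4 && dst_size - (size_t)(out - dst) >= 3) {
        unsigned a = sketch_map[in[0]];
        unsigned b = sketch_map[in[1]];
        unsigned c = sketch_map[in[2]];
        unsigned d = sketch_map[in[3]];

        /* Valid values are 0..63, so bit 7 is set only for invalid input.
         * One combined test per group; the hint asks the compiler to lay
         * out the fall-through (valid) case as the fast path. */
        if (UNLIKELY((a | b | c | d) & 0x80))
            goto out;

        out[0] = (uint8_t)(a << 2 | b >> 4);
        out[1] = (uint8_t)(b << 4 | c >> 2);
        out[2] = (uint8_t)(c << 6 | d);

        in      += 4;
        src_len -= 4;
        out     += 3;
    }
out:
    return (size_t)(out - dst);
}
```

After calling sketch_init() once, sketch_decode(buf, sizeof(buf), "TWFu", 4)
writes the three bytes "Man" to buf. Whether the hint actually changes the
generated code depends on the compiler version and flags, which matches
the experience described above.
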
The numbers mean that it still needs about 24 cycles per byte on
the Phenom II. Not sure whether I should consider that good or bad...
At the lowest clock speed of 800 MHz, that works out to about
40 MB/s.
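
The conversion from cycles per byte to throughput can be checked with a
few lines of C. This is only a sketch; sketch_mb_per_s is a hypothetical
helper, and MB is taken to mean 10^6 bytes. The figures in the post are
rounded: exactly 24 cycles per byte at 800 MHz gives roughly 33 MB/s,
while 40 MB/s corresponds to 20 cycles per byte.

```c
#include <assert.h>

/* Throughput in MB/s (10^6 bytes per second) for a routine that costs
 * cycles_per_byte cycles per byte on a core running at clock_hz. */
static double sketch_mb_per_s(double clock_hz, double cycles_per_byte)
{
    assert(cycles_per_byte > 0.0);
    return clock_hz / cycles_per_byte / 1e6;
}
```

For example, sketch_mb_per_s(800e6, 24.0) evaluates to about 33.3 and
sketch_mb_per_s(800e6, 20.0) to 40.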

