[FFmpeg-devel] [PATCH] Unroll base64 decode loop.
michaelni at gmx.at
Sat Jan 21 15:58:45 CET 2012
On Sat, Jan 21, 2012 at 12:51:58PM +0100, Reimar Döffinger wrote:
> On Sat, Jan 21, 2012 at 12:45:09PM +0100, Reimar Döffinger wrote:
> > Around 50% faster.
> > decode: 374139 -> 248852 decicycles
> > syntax check: 236955 -> 123854 decicycles
> Note that this is despite gcc failing completely and utterly,
> randomly deciding to make the "goto out" path the "fast" path
> and sometimes not.
> The code the optimizer creates IMO simply makes no sense.
> I did not try it with this code, but using the __builtin_expect
> cluebat did not help one bit on the previous try (which did
> not use the larger table and thus resulted in even messier code).
> The numbers mean that it still needs about 24 cycles per byte on
> the Phenom2. Not sure if I should consider that good or bad...
I'd consider it bad if it was a human who wrote the asm :)
Also, it can probably be improved by making the table signed and making
invalid values negative. With that, if the bits get ORed together, the
final value will be negative if any input was, so fewer checks could
be needed.
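
A rough sketch of the signed-table idea, not the actual patch: the names
b64_table, b64_init_table and b64_decode_groups are made up for illustration,
the table is filled at runtime only to keep the example short, and '=' padding
and trailing partial groups are deliberately ignored. Invalid characters map
to -1, so after sign extension the top bit of the combined word is set if any
of the four inputs was invalid, and a single test per group is enough.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup table: 0..63 for valid characters, -1 for anything else. */
static int8_t b64_table[256];

/* Fill the table at runtime; a real implementation would likely use a
 * static const initializer instead. */
static void b64_init_table(void)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    for (int i = 0; i < 256; i++)
        b64_table[i] = -1;
    for (int i = 0; i < 64; i++)
        b64_table[(unsigned char)alphabet[i]] = (int8_t)i;
}

/* Decode complete 4-character groups only (no padding handling).
 * Returns the number of bytes written, or -1 on the first invalid group. */
static int b64_decode_groups(uint8_t *out, const char *in, size_t len)
{
    const uint8_t *src = (const uint8_t *)in;
    uint8_t *dst = out;

    for (; len >= 4; len -= 4, src += 4) {
        /* Each int8_t is sign-extended to int, then converted to uint32_t,
         * so -1 becomes 0xFFFFFFFF and keeps bit 31 set after shifting.
         * Valid values stay below 1 << 24, so bit 31 is clear for them. */
        uint32_t v = (uint32_t)b64_table[src[0]] << 18 |
                     (uint32_t)b64_table[src[1]] << 12 |
                     (uint32_t)b64_table[src[2]] <<  6 |
                     (uint32_t)b64_table[src[3]];

        /* One check for all four inputs instead of one check each. */
        if (v >> 31)
            return -1;

        *dst++ = (uint8_t)(v >> 16);
        *dst++ = (uint8_t)(v >> 8);
        *dst++ = (uint8_t)v;
    }
    return (int)(dst - out);
}
```

After calling b64_init_table() once, b64_decode_groups(out, "TWFu", 4) should
write the three bytes "Man". The shifts are done on unsigned values on purpose,
since left-shifting a negative int is undefined in C.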
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
During times of universal deceit, telling the truth becomes a
revolutionary act. -- George Orwell