[FFmpeg-devel] [PATCH rfc] use bswap builtins where available
Sat Aug 15 01:27:55 CEST 2009
On Fri, Aug 14, 2009 at 3:09 PM, Alexander Strange <astrange at ithinksw.com> wrote:
> gcc 4.2+ provides __builtin_bswap32/64. Since it's usually a good idea to
> use these instead of asm (they can be optimized more, don't clobber flags,
> their size is known, etc) I tried using them for bswap_32/64.
> The resulting binary is ~32kb smaller on x86-32; it actually has fewer bswap
> instructions (3658 vs 4072) but this is likely due to more optimizations.
> H.264 CABAC:
> old: avg 4.274 min 4.274 max 4.274 std.dev. 0.0
> new: avg 4.25 min 4.25 max 4.25 std.dev. 0.0
> old: avg 0.599 min 0.599 max 0.599 std.dev. 0.0
> new: avg 0.598 min 0.598 max 0.598 std.dev. 0.0
> Unfortunately the code for __builtin_bswap64+gcc 4.2+x86-32 is terrible,
> although fine in later versions, so it's under HAVE_FAST_64BIT for now.
> And there's no __builtin_bswap16; (x>>8)|(x<<8) generates rotates on its
> own even with gcc2.95, but I ended up with a slightly larger binary when I
> tried it here.
Shifts will generate new constants if you use literals. You might get away
with a const overloaded version.
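For reference, a minimal sketch of the approach under discussion: use the
builtins when the compiler has them, and fall back to shifts otherwise. The
sketch_* names and the SKETCH_HAVE_BSWAP_BUILTINS macro are hypothetical and
are not the identifiers from the patch. The availability test is also an
assumption: it relies on __has_builtin where the compiler offers it and
otherwise conservatively checks for gcc 4.3 or later.

```c
#include <stdint.h>

/* Hypothetical feature test, not the one used in the patch. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_bswap32) && __has_builtin(__builtin_bswap64)
#    define SKETCH_HAVE_BSWAP_BUILTINS 1
#  endif
#endif
#if !defined(SKETCH_HAVE_BSWAP_BUILTINS) && defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
#  define SKETCH_HAVE_BSWAP_BUILTINS 1
#endif
#ifndef SKETCH_HAVE_BSWAP_BUILTINS
#  define SKETCH_HAVE_BSWAP_BUILTINS 0
#endif

/* 16-bit swap written with shifts, as in the (x>>8)|(x<<8) form quoted above.
 * The cast truncates the promoted int result back to 16 bits. */
static inline uint16_t sketch_bswap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* 32-bit swap: builtin when available, shift-and-mask fallback otherwise. */
static inline uint32_t sketch_bswap32(uint32_t x)
{
#if SKETCH_HAVE_BSWAP_BUILTINS
    return __builtin_bswap32(x);
#else
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}

/* 64-bit swap: builtin when available, otherwise two 32-bit swaps
 * with the halves exchanged. */
static inline uint64_t sketch_bswap64(uint64_t x)
{
#if SKETCH_HAVE_BSWAP_BUILTINS
    return __builtin_bswap64(x);
#else
    return ((uint64_t)sketch_bswap32((uint32_t)x) << 32) |
           sketch_bswap32((uint32_t)(x >> 32));
#endif
}
```

The patch as described additionally keeps the 64-bit builtin under
HAVE_FAST_64BIT; the sketch leaves that gate out to stay self-contained.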
> Any different numbers for other architectures?
PPC can do loads and stores with endian swapping (stwbrx, for example), but not a bswap per se.
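To make the distinction concrete, here is a small portable sketch of a
byte-swapping load and store, as opposed to a register-to-register bswap. The
helper names are hypothetical. On big-endian PPC a compiler may be able to
turn these into lwbrx/stwbrx (lwbrx being the load counterpart of stwbrx), but
that is an assumption about code generation, not something the sketch
guarantees.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from memory,
 * independent of the host byte order. */
static inline uint32_t sketch_load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]         | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: write a 32-bit value to memory in little-endian
 * order, independent of the host byte order. */
static inline void sketch_store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

Because the swap is tied to the memory access here, code that swaps a value
already held in a register gets no help from this form.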