[FFmpeg-devel] Indeo3 replacement, part 2
Maxim
max_pole
Thu Oct 8 19:59:54 CEST 2009
Reimar Döffinger wrote:
> [...]
> These do the same thing to hi and lo independently, and each of these are 32 bits.
> I think this would be better as
> static inline uint32_t requant(uint32_t a) {
> #if HAVE_BIGENDIAN
> a &= 0xFF00FF00;
> a |= a >> 8;
> #else
> a &= 0x00FF00FF;
> a |= a << 8;
> #endif
> return a;
> }
>
> That is of course unless it makes more sense to get rid of the lo/hi
> split very early on and just use uint64_t - that depends on how much the
> compiler would mess that up on 32 bit architectures I think, for what I
> can tell it should work a lot better on 64 bit architectures.
>
>
OK, I've tested the possibility of using one uint64_t variable instead
of the hi/lo split. The really big problem is not the mess the compiler
makes on 32-bit machines but the endianness issue introduced by Intel's
design. Please look at the following scheme:
Indeo3 dyad correction (add 2 x 32bit delta):

          1st DWORD(lo)              2nd DWORD(hi)
         _______________            _______________
         B1, B2, B3, B4             B5, B6, B7, B8
    ------------------------   ------------------------
              byte1                      byte0

B1-B8 are pixels in memory, grouped into 2 x 32-bit DWORDs.
"byte0" and "byte1" are the bytes in the bitstream telling which delta
should be applied to each DWORD.
As you can see, the layout above is ALWAYS in little-endian order because
Intel = LE!
On a big-endian machine we need to do this processing in
reverse; otherwise it won't work right...
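
A minimal sketch of why the byte order matters here. The helper names are hypothetical and not taken from the patch. It only shows that the same four pixel bytes B1..B4 turn into different 32-bit values depending on the host byte order, so a packed delta prepared for one order does not line up with the pixels on the other:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load four consecutive pixel bytes (B1..B4)
 * as one 32-bit word in the byte order of the host CPU. */
static uint32_t load_native32(const uint8_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof(w));
    return w;
}

/* Value of the same four bytes when interpreted as little-endian:
 * B1 ends up in the least significant byte. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]       | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Value of the same four bytes when interpreted as big-endian:
 * B1 ends up in the most significant byte. */
static uint32_t load_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}
```

On a little-endian host load_native32() agrees with load_le32(), on a big-endian host with load_be32(). A delta word that is meant to change B1 therefore has to carry that delta in the opposite end of the word on the two kinds of machines.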
If we use a uint64_t variable split into hi/lo parts, all
we need to do is swap the order of the hi/lo parts of the delta table
when it is generated. The rest of the code remains unchanged because
we apply both delta parts separately in the right (little-endian) order,
like this:

    pix_lo = ref_pix_lo + delta_lo[byte1];
    pix_hi = ref_pix_hi + delta_lo[byte0];

If we use one monolithic uint64_t variable, we need to add some endianness
compensation, like this:

    if (HAVE_BIGENDIAN)
        FFSWAP(..., byte0, byte1);
    delta64 = ((uint64_t)delta_lo[byte0] << 32) + delta_lo[byte1];
    pix64   = ref_pix64 + delta64;

As one can see it will be slower on big-endian architectures.
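
To illustrate the compensation, here is a rough, self-contained sketch, not the actual decoder code. All function names are hypothetical, the endianness is probed at run time instead of using HAVE_BIGENDIAN, and it assumes that neither 32-bit addition overflows (a 64-bit add would otherwise carry into the other half). It compares the split variant with the monolithic one on the same 8 pixels:

```c
#include <stdint.h>
#include <string.h>

/* Run-time probe standing in for HAVE_BIGENDIAN (assumption for this sketch). */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Split variant: d_first is added to pixels B1..B4, d_second to B5..B8. */
static void add_split(uint8_t *pix, const uint8_t *ref,
                      uint32_t d_first, uint32_t d_second)
{
    uint32_t lo, hi;
    memcpy(&lo, ref,     4);
    memcpy(&hi, ref + 4, 4);
    lo += d_first;
    hi += d_second;
    memcpy(pix,     &lo, 4);
    memcpy(pix + 4, &hi, 4);
}

/* Combine both deltas into one 64-bit word. Which delta goes into the
 * upper 32 bits depends on the host byte order: this is the extra
 * compensation step. */
static uint64_t make_delta64(uint32_t d_first, uint32_t d_second)
{
    if (host_is_big_endian())
        return ((uint64_t)d_first << 32) | d_second;
    return ((uint64_t)d_second << 32) | d_first;
}

/* Monolithic variant: one 64-bit load, add and store. */
static void add_mono(uint8_t *pix, const uint8_t *ref,
                     uint32_t d_first, uint32_t d_second)
{
    uint64_t v;
    memcpy(&v, ref, 8);
    v += make_delta64(d_first, d_second);
    memcpy(pix, &v, 8);
}
```

As long as no carry crosses the 32-bit boundary, both variants write the same 8 bytes. The monolithic one pays for the byte-order decision in make_delta64(), unless that decision can be moved into the delta table generation.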
So I have the following dilemma:

monolithic uint64_t:
    advantages
    - the code is more compact and readable
    - better operation on 64-bit architectures
    drawbacks
    - requires more instructions on 32-bit architectures
    - requires extra code to handle endianness and therefore tends to be
      slower

split uint64_t (hi/lo):
    advantages
    - no extra endianness handling
    - better operation on 32-bit architectures
    drawbacks
    - separate code doing the same thing for both hi/lo parts is required
    - both the C code and the resulting machine code are bigger

Which design is preferable? I cannot decide...
Regards
Maxim