[FFmpeg-devel] Indeo3 replacement, part 2

Måns Rullgård mans
Fri Oct 2 15:18:12 CEST 2009


Maxim <max_pole at gmx.de> writes:

>>>> Thanks! All my pointers are always aligned at least on the int32_t
>>>> boundary. Should I rework my code to use casts all the way (remove all
>>>> AV_RNxx macros respectively)?
>>>>       
>>> if alignment is sufficient then a cast is better, int64_t needs 8 byte
>>> alignment though ...
>>>     
>>
>> Two aligned 32-bit accesses are generally at least as fast as one
>> unaligned 64-bit access.
>>
>>   
>
> Hmm, indeo3 switches between 4x4 and 8x8 block processing depending on
> the energy of the cell (low energy = big blocks, high energy = small
> blocks). The 4-byte alignment is guaranteed automatically because both
> width and height must be a multiple of 4. This is true for the luminance
> plane. For the chrominance planes I perform the proper 4-byte alignment
> during the buffer allocation.
>
> Both luma and chroma planes can contain cells coded using the 8x8 block
> mode. I'm about to rewrite the code for this mode using int64_t all the
> way. Until now I used two int32_t variables representing the high and
> low parts respectively. Using an int64_t instead makes the whole code
> much simpler while adding some small overhead (2-4 extra instructions)
> on the 32-bit machines. I cannot test this code on a 64-bit machine but
> I expect it will be significantly faster than the code using
> split hi/low variables...
> That would of course only take effect on 64-bit machines. On the
> 32-bit arch it doesn't matter because the memory operations for the
> int64_t will be coded using two 32-bit instructions anyway...

Many 32-bit machines, e.g. ARM, have double-word load/store
instructions.  On ARM, these instructions either require 8-byte
alignment (ARMv5) or take one cycle less if the address is 8-byte
aligned (v6 and v7).
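
To make the alignment condition concrete, here is a minimal C sketch.
It is not taken from the indeo3 patch; the helper names is_aligned8 and
align_up8 are hypothetical.  It only shows how one might check whether
an address sits on an 8-byte boundary, and how a line width in bytes
could be rounded up so that every line starts on such a boundary
(assuming the buffer base itself is 8-byte aligned).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p sits on an 8-byte boundary. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7) == 0;
}

/* Hypothetical helper: round a line width in bytes up to the next
 * multiple of 8, so consecutive lines keep 8-byte alignment. */
static size_t align_up8(size_t width)
{
    return (width + 7) & ~(size_t)7;
}
```

With a width that is only guaranteed to be a multiple of 4, align_up8
leaves multiples of 8 unchanged and pads the others by 4 bytes.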

> Should I align each line of the frame buffer on the 8-byte boundary or
> leave the AV_RN64/AV_WN64 macros intact?

AV_RN64 is much slower than two 32-bit aligned loads since it is
written to allow any misalignment.
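
As a rough illustration of the difference, consider the sketch below.
read64_unaligned_le is a hypothetical stand-in for a generic
"any alignment" 64-bit read that assembles the value byte by byte; the
real AV_RN64 definition depends on the target and compiler and is not
reproduced here.  read64_as_two_aligned32 is likewise a made-up name for
the alternative: two plain 32-bit loads from a pointer the caller knows
to be 4-byte aligned.

```c
#include <stdint.h>

/* Hypothetical generic read: works for any address because it only
 * touches single bytes, at the cost of eight loads plus shifts.
 * The bytes are combined in little-endian order. */
static uint64_t read64_unaligned_le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Alternative when 4-byte alignment is guaranteed: two ordinary
 * 32-bit loads, returned as separate halves. */
static void read64_as_two_aligned32(const uint32_t *p,
                                    uint32_t *first, uint32_t *second)
{
    *first  = p[0];
    *second = p[1];
}
```

The second form is what keeping two int32_t variables already amounts
to, so it needs no extra alignment beyond the 4 bytes the buffers
already have.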

-- 
Måns Rullgård
mans at mansr.com