[FFmpeg-devel] Indeo3 replacement, part 2

Maxim max_pole
Fri Oct 2 15:03:08 CEST 2009


Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
>
>   
>> On Fri, Oct 02, 2009 at 12:56:38AM +0200, Maxim wrote:
>>     
>>> Måns Rullgård wrote:
>>>       
>>>> Maxim <max_pole at gmx.de> writes:
>>>>
>>>>   
>>>>         
>>>>> Reimar Döffinger wrote:
>>>>>     
>>>>>           
>>>>>> On Tue, Sep 22, 2009 at 12:09:23AM +0200, Maxim wrote:
>>>>>>   
>>>>>> [...]
>>>>>>       
>>>>>>             
>>>>>>> /**
>>>>>>>  *  Average 4/8 pixels at once without rounding using softSIMD
>>>>>>>  */
>>>>>>> #define AVG_32(dst, src, ref)   AV_WN32((dst), ((AV_RN32(src) + AV_RN32(ref)) >> 1) & 0x7F7F7F7F)
>>>>>>> #define AVG_64(dst, src, ref)   AV_WN64((dst), ((AV_RN64(src) + AV_RN64(ref)) >> 1) & 0x7F7F7F7F7F7F7F7F)
>>>>>>>     
>>>>>>>         
>>>>>>>               
>>>>>> Are all of src, dst, ref unaligned in general? If not, you should be
>>>>>> using casts instead of AV_RN*
>>>>>>   
>>>>>>       
>>>>>>             
>>>>> Could someone skilled in the art explain me the difference between a
>>>>> cast and an AV_RNxx?
>>>>> I don't see any because the AV_RNxx macros use casts as well...
>>>>>     
>>>>>           
>>>> AV_RN* support unaligned reads, simple casts do not.
>>>>
>>>>   
>>>>         
>>>>> The code above looks more readable for me when using those macros than
>>>>> smth like this:
>>>>>
>>>>> #define AVG_64(dst, src, ref) \
>>>>>         *((uint64_t *)(dst)) = ((*((uint64_t *)(src)) + *((uint64_t *)(ref))) >> 1) & 0x7F7F7F7F7F7F7F7F
>>>>>     
>>>>>           
>>>> That will not work if any of the pointers are unaligned.
>>>>
>>>>   
>>>>         
>>> Thanks! All my pointers are always aligned at least on the int32_t
>>> boundary. Should I rework my code to use casts all the way (remove all
>>> AV_RNxx macros respectively)?
>>>       
>> if alignment is sufficient then a cast is better, int64_t needs 8 byte
>> alignment though ...
>>     
>
> Two aligned 32-bit accesses are generally at least as fast as one
> unaligned 64-bit access.
>
>   

Hmm, indeo3 switches between 4x4 and 8x8 block processing depending on
the energy of the cell (low energy = big blocks, high energy = small
blocks). For the luminance plane, 4-byte alignment is guaranteed
automatically because both width and height must be multiples of 4.
For the chrominance planes I enforce 4-byte alignment during the
buffer allocation.

Both luma and chroma planes can contain cells coded using the 8x8 block
mode. I'm about to rewrite the code for this mode using int64_t all the
way. Until now I have used two int32_t variables holding the high and
low parts respectively. Using an int64_t instead makes the whole code
much simpler while adding a small overhead (2-4 extra instructions)
on 32-bit machines. I cannot test this code on a 64-bit machine, but
I expect it to be significantly faster than the code using a split
hi/low variable.
That speedup would of course only show up on 64-bit machines. On a
32-bit arch it doesn't matter because the memory operations on an
int64_t will be compiled into two 32-bit instructions anyway.
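
To make the comparison concrete, here is a rough sketch of both variants
of the 8-pixel average. The rd32/wr32/rd64/wr64 helpers are hypothetical
stand-ins for the AV_RNxx/AV_WNxx macros (not their actual
implementation); they use memcpy so the sketch is safe at any alignment.
It also assumes every pixel value fits in 7 bits, which is what the 0x7F
mask in the macros suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical alignment-safe accessors (stand-ins for AV_RN / AV_WN). */
static uint32_t rd32(const void *p) { uint32_t v; memcpy(&v, p, sizeof(v)); return v; }
static uint64_t rd64(const void *p) { uint64_t v; memcpy(&v, p, sizeof(v)); return v; }
static void wr32(void *p, uint32_t v) { memcpy(p, &v, sizeof(v)); }
static void wr64(void *p, uint64_t v) { memcpy(p, &v, sizeof(v)); }

/* Average 8 pixels with a single 64-bit operation (rounds down).
 * Assumption: every input byte is <= 0x7F, so a byte sum never
 * carries into the neighbouring byte. */
static void avg8_u64(uint8_t *dst, const uint8_t *src, const uint8_t *ref)
{
    wr64(dst, ((rd64(src) + rd64(ref)) >> 1) & UINT64_C(0x7F7F7F7F7F7F7F7F));
}

/* The same average done as two independent 32-bit halves. */
static void avg8_2x32(uint8_t *dst, const uint8_t *src, const uint8_t *ref)
{
    wr32(dst,     ((rd32(src)     + rd32(ref))     >> 1) & UINT32_C(0x7F7F7F7F));
    wr32(dst + 4, ((rd32(src + 4) + rd32(ref + 4)) >> 1) & UINT32_C(0x7F7F7F7F));
}

/* Returns 1 if both variants match the plain per-byte average. */
static int avg8_variants_agree(void)
{
    const uint8_t src[8] = { 0, 1, 2, 3, 0x7F, 0x7E, 64, 127 };
    const uint8_t ref[8] = { 0, 2, 2, 4, 0x7F, 0x01, 65,   0 };
    uint8_t a[8], b[8];
    int i;

    avg8_u64(a, src, ref);
    avg8_2x32(b, src, ref);
    for (i = 0; i < 8; i++) {
        assert(src[i] <= 0x7F && ref[i] <= 0x7F); /* 7-bit precondition */
        if (a[i] != ((src[i] + ref[i]) >> 1))
            return 0;
    }
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Both functions produce the same bytes, so the choice between them only
affects speed and the alignment required when the accessors are replaced
by plain casts.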

Should I align each line of the frame buffer on an 8-byte boundary, or
keep the AV_RN64/AV_WN64 macros?
Aligning every line would significantly increase the buffer
requirements though...
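
One way to put a number on the extra memory is to compare the line
pitch rounded up to 4 bytes with the pitch rounded up to 8 bytes. The
sketch below does only that arithmetic; align_up and
extra_bytes_for_8byte_pitch are hypothetical names, and the plane sizes
in the usage note are made-up examples. It covers the pitch only: it
does not check that the buffer start or each 8-pixel block within a
line falls on a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Extra bytes one plane needs when each line is padded to an 8-byte
 * pitch instead of a 4-byte pitch. */
static size_t extra_bytes_for_8byte_pitch(size_t width, size_t height)
{
    size_t pitch4 = align_up(width, 4);
    size_t pitch8 = align_up(width, 8);
    return (pitch8 - pitch4) * height;
}
```

For a hypothetical 100x60 plane this gives 4 extra bytes per line (240
bytes in total), and for an 80x60 plane it gives none because 80 is
already a multiple of 8.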

Regards
Maxim


