[FFmpeg-devel] [PATCH] Altivec version of h264_idct_add

Luca Barbato lu_zero
Sun Jun 3 06:38:27 CEST 2007


David Conrad wrote:
> On Jun 2, 2007, at 10:15 PM, Luca Barbato wrote:
> 
>> Loren Merritt wrote:
>>>
>>> The switch could be changed to a table if it matters.
>>
>> In theory vec_ste is all we need here; sadly, I cannot manage to get it
>> working right for the unaligned cases.
> 
> I've never really looked at vec_ste before today, but it seems that
> vec_ste will always write the first element of the vector to the
> rounded-down 16-byte address, and to store to an unaligned address you
> have to move the data in the vector and store that element. The attached
> patch does this with a permute and uses it instead of the switch. It
> requires an additional 4 permutes and a constant vector in the aligned case,
> but it seems to be a bit faster overall on my G4.

vec_splat() should be enough (it spares another perm).
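
To make the point concrete, here is a rough scalar C model of how I
understand vec_ste to behave for 32-bit elements: the address is rounded
down to the element size, and the element written is the one whose index
matches that address's position inside its 16-byte block. Under that
reading, splatting the wanted word across the vector lets a single vec_ste
hit any 4-byte-aligned destination. The vec4_model type and the model_* and
store_first_word names are made up for illustration and are not AltiVec or
FFmpeg API; offsets are relative to a buffer assumed to start on a 16-byte
boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a vector of four 32-bit words. */
typedef struct { uint32_t w[4]; } vec4_model;

/* Assumed vec_ste behaviour for 32-bit elements: round the offset down to
 * a 4-byte boundary, then store the element whose index equals the word
 * position of that offset within its 16-byte block. */
static void model_vec_ste(vec4_model v, size_t byte_off, uint8_t *mem)
{
    size_t ea  = byte_off & ~(size_t)3;
    size_t idx = (ea & 15) / 4;
    memcpy(mem + ea, &v.w[idx], 4);
}

/* Model of vec_splat: copy element n into every slot. */
static vec4_model model_vec_splat(vec4_model v, unsigned n)
{
    vec4_model r;
    for (int i = 0; i < 4; i++)
        r.w[i] = v.w[n & 3];
    return r;
}

/* Store v.w[0] at any 4-byte-aligned offset: splat first, then store. */
static void store_first_word(vec4_model v, size_t byte_off, uint8_t *mem)
{
    assert((byte_off & 3) == 0);
    model_vec_ste(model_vec_splat(v, 0), byte_off, mem);
}
```

Without the splat, model_vec_ste at offset 4 would write v.w[1], not
v.w[0]; with it, every slot holds the same word, so the slot picked by the
address no longer matters.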

> +    vec_u8_t repeatperm = (vec_u8_t)AVV(0x00, 0x01, 0x02, 0x03, 0x00, 0x01, 0x02, 0x03,
> +                                        0x00, 0x01, 0x02, 0x03, 0x00, 0x01, 0x02, 0x03);

lu

-- 

Luca Barbato

Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero
