[FFmpeg-devel] [PATCH] h264 luma interpolation 8x8 for altivec

Tue Jun 26 13:07:13 CEST 2007

Hi,

On 6/18/07, Luca Barbato <lu_zero at gentoo.org> wrote:
>
> Mauricio Alvarez wrote:
> > Hi All,
> >
> > Here I'm sending a patch that adds support for luma interpolation of 8x8
> > blocks using Altivec.
>
> first, the patch is about 700 lines, a bit big, so I'll be slow
> commenting, maybe you should try to split it in pieces.

OK, I have split the patch into several parts.

part1: cosmetics: change all the variable definitions in h264_altivec.c and
h264_template_altivec.c to use the data types defined in types_altivec.h,
including the use of LOAD_ZERO and the corresponding zero_xxxxv
definitions.

part2: removes the alignment correction of the destination pointers in the
luma_16x16 interpolations. As I mentioned before, I ran a test and these
pointers are always aligned to 128 bits. The only source of misalignment
(with respect to 128 bits) is the variable block size, but in the 16x16 case
the destination is always aligned.
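
To illustrate the part2 reasoning, here is a minimal sketch in C. It assumes
that the plane base address is 16-byte aligned and that the line size is a
multiple of 16 (neither is stated above), so the alignment of a block pointer
depends only on its byte offset. block_offset() and offset_is_aligned16() are
hypothetical helpers for this mail, not FFmpeg functions.

```c
#include <stddef.h>

/* Byte offset of the block at block coordinates (bx, by) inside a plane.
 * Assumption: the plane base is 16-byte aligned and linesize is a
 * multiple of 16, so pointer alignment equals offset alignment. */
static size_t block_offset(size_t linesize, int bx, int by, int size)
{
    return (size_t)by * (size_t)size * linesize + (size_t)bx * (size_t)size;
}

/* Returns 1 when the offset is a multiple of 16 bytes (128 bits). */
static int offset_is_aligned16(size_t offset)
{
    return (offset & 0x0f) == 0;
}
```

Under those assumptions every 16x16 block offset is a multiple of 16, while
an 8x8 block in an odd column (bx = 1, 3, ...) starts 8 bytes past a 16-byte
boundary.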

part3: adds the luma_8x8 interpolation routines. I made a complete patch for
this. These functions are very similar to the 16x16 ones, but take into
account the alignment of the 8x8 cases and other minor details.

> > +static void PREFIX_h264_qpel8_h_lowpass_altivec(uint8_t * dst,
> > uint8_t * src, int dstStride, int srcStride) {
> > +  POWERPC_PERF_DECLARE(PREFIX_h264_qpel8_h_lowpass_num, 1);
>
> DO you really use this? I'm actively deprecating it since last year and
> probably I'll remove it anytime soon if nobody screams, I think dtrace
> on macosx and oprofile on linux cover all our performance counting needs

I'm not using them for profiling, so I have not included them in the new
patch.

> > \
> > static void OPNAME ## h264_qpel ## SIZE ## _mc10_ ## CODETYPE(uint8_t
> >*dst, uint8_t *src, int stride){ \
> >-    DECLARE_ALIGNED_16(uint8_t, half[SIZE*SIZE]);\
> >-    put_h264_qpel ## SIZE ## _h_lowpass_ ## CODETYPE(half, src, SIZE,
> >stride);\
> >+    DECLARE_ALIGNED_16(uint8_t, half[16*16]);\
> >+    put_h264_qpel ## SIZE ## _h_lowpass_ ## CODETYPE(half, src, 16,
> >stride);\
> >     OPNAME ## pixels ## SIZE ## _l2_ ## CODETYPE(dst, src, half,
> >stride, stride, SIZE);\
>
> doesn't look right

Well, the possible values of SIZE are 16 and 8 (for Altivec), but in both
cases the loads and stores on the temporary arrays are 16 bytes wide.
Because of that I replaced half[SIZE*SIZE] with half[16*16] and pass 16
instead of SIZE as the stride of the temporary array. With a 16-byte stride
every row of the temporary array starts on a 16-byte boundary, so we avoid
extra alignment checks in the 8x8 case.
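
As a rough illustration of why the stride matters, the sketch below counts
how many rows of a block start off a 16-byte boundary for a given stride of
the temporary array. It assumes the array itself is 16-byte aligned (as
DECLARE_ALIGNED_16 suggests); tmp_row_offset() and misaligned_rows() are
hypothetical names used only here.

```c
#include <stddef.h>

/* Byte offset of row y inside a temporary array with the given stride. */
static size_t tmp_row_offset(int y, int tmp_stride)
{
    return (size_t)y * (size_t)tmp_stride;
}

/* Number of rows of a size x size block whose first byte is not on a
 * 16-byte boundary, assuming the array base is 16-byte aligned. */
static int misaligned_rows(int size, int tmp_stride)
{
    int y, count = 0;

    for (y = 0; y < size; y++)
        if (tmp_row_offset(y, tmp_stride) % 16 != 0)
            count++;
    return count;
}
```

With a stride of 8, every second row of an 8x8 block would be misaligned;
with a stride of 16, none is, at the cost of a temporary array that is larger
than the 8x8 block needs.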

>-  if ( (unsigned long) dst & 0x0f) {
> ...
> >+  if (((unsigned long)dst) % 16 == 0) {
>
> hm..

That was corrected.
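
For reference, the two conditions quoted above have opposite senses: the mask
form is true for a misaligned address, the modulo form is true for an aligned
one, so the branches have to be swapped when moving from one to the other. A
small sketch, with hypothetical helper names and the address taken as a plain
integer:

```c
/* True (1) when addr is NOT on a 16-byte boundary: the "& 0x0f" form. */
static int is_misaligned_mask(unsigned long addr)
{
    return (addr & 0x0f) != 0;
}

/* True (1) when addr IS on a 16-byte boundary: the "% 16 == 0" form. */
static int is_aligned_mod(unsigned long addr)
{
    return addr % 16 == 0;
}
```

For any address, exactly one of the two helpers returns 1.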

I guess that's all for now...

Thanks for your comments,

Mauricio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part1_h264_altivec_data_types.diff
Type: text/x-diff
Size: 50398 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070626/67d0d8bc/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part2_h264_altivec_align.diff
Type: text/x-diff
Size: 4679 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070626/67d0d8bc/attachment-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part3_h264_altivec_luma8x8.diff
Type: text/x-diff
Size: 13935 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070626/67d0d8bc/attachment-0002.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part3_h264_template_altivec_luma8x8.diff
Type: text/x-diff
Size: 14531 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070626/67d0d8bc/attachment-0003.diff>