[Ffmpeg-devel] patch: altivec optimizations for h264 decoder

Mon Feb 6 13:39:33 CET 2006

Hi

On Mon, Feb 06, 2006 at 11:24:14AM +0100, Mauricio Alvarez wrote:
> Hi all
> 
> As a part of my academic research on architectures for video
> decoding I am doing some optimizations to the h.264 decoder for the ppc
> architecture using altivec, and I want to submit them back to the ffmpeg
> project.
> 
> I have implemented the following functions:
> - luma motion compensation for 8x8 and 4x4 pixel blocks
> - chroma motion compensation for 4x4 pixel blocks
> - inverse transforms: 8x8 and 4x4
> 
> i) For the 4x4 inverse transform I have implemented two versions: the
> first one, called ff_h264_idct_add_altivec, implements the transform
> with the same algorithm as the C version. The second one,
> ff_h264_idct_add_altivec_mat, implements an optimized matrix
> multiply algorithm described in the paper by Chen et al. [1]. In the
> altivec implementation the second (matrix) algorithm has a speed-up of
> 2.95 with respect to the C version while the first version has 1.55.
> 
> ii) The 8x8 luma motion compensation implementation with altivec has a
> 2.12 speed-up compared with the C version and the 4x4 has 1.30.
> 
> iii) The chroma 4x4 motion compensation has a speed-up of 1.85, again
> compared with the C version.
> 
> iv) I have run the regression tests and the new optimizations pass
> them. I have also decoded some videos [2] coded with the JM and x264
> encoders at HD resolution and all of them decode correctly.
> The speed-ups for the sequences used are listed in the following table:
> 
> Coding options:
> - resolution: 1920x1088p25,
> - profile: main, level: 5.0
> - qp for I,P slices: 22
> - qp for B slices: 24
> - coded sequence: I-P-B-B-P-B-B
> - direct mode: temporal
> - Weighted prediction
> 
> 
> sequence     ffmpeg-cvs   ffmpeg-patch   speed-up
>              time [s]     time [s]
> pedestrian   11.89        10.15          17.14 %
> riverbed     19.11        17.73           7.78 %
> blue sky     11.33        10.13          11.85 %
> rush hour    12.34        11.24           9.79 %
> AVG                                      11.64 %
> 
> I hope the patch is OK for the FFmpeg developers. Any comments or
> suggestions to improve the patch are welcome.
> 
> Mauricio Alvarez
> Department of Computer Architecture
> Universitat Politècnica de Catalunya
> Barcelona-Spain.
> 
> [1] Yen-Kuang Chen, Eric Q. Li, Xiaosong Zhou, and Steven Ge.
> Implementation of H.264 Encoder and Decoder on Personal Computers.
> Journal of Visual Communication and Image Representation, July 2005.
> 
> [2] Mpeg test sequences at HD resolution
> http://www.ldv.ei.tum.de/liquid.php?page=70
> 

[...]
> +      } break;										
> +    }
> +  
> +    vector unsigned char vdst_mask = vec_lvsl(0, dst);

This mixes declarations and statements. Romain, is this an issue for ppc-asm,
or do all compilers which support ppc-asm support this too?
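
For reference, a minimal sketch of the two usual ways to avoid a declaration
after a statement, in case a C89-only compiler has to build this. It is plain
C rather than AltiVec so that it stands alone; the function names
(sum_bytes_c89, sum_bytes_scoped) are made up for illustration and are not
from the patch.

```c
#include <stddef.h>

/* Hypothetical helper, not from the patch: sums n bytes.
 * Variant 1: hoist every declaration to the top of the function body,
 * before the first statement. */
static unsigned sum_bytes_c89(const unsigned char *src, size_t n)
{
    unsigned sum = 0;
    size_t i;

    if (!src)
        return 0;

    for (i = 0; i < n; i++)
        sum += src[i];
    return sum;
}

/* Variant 2: when the declaration has to come after a statement,
 * open a nested block; declarations at the start of any block are
 * valid in C89. */
static unsigned sum_bytes_scoped(const unsigned char *src, size_t n)
{
    if (!src)
        return 0;
    {
        unsigned sum = 0;
        size_t i;

        for (i = 0; i < n; i++)
            sum += src[i];
        return sum;
    }
}
```

In the quoted hunk the second form would probably be the smaller change, since
vdst_mask is declared right after the switch.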

[...]

The patch is also full of tabs and trailing whitespace; whoever applies it
will have to run this through clean_diff ...

The mixed indentation style is ugly too, but the files are already in this
messed-up mix so it's ok; fixing the indentation of the whole ppc/* should be
a separate change if we do it ...

[...]
> +  if ( (unsigned long)dst & 0xF){		/* unaligned access to dst for add */
> +
> +    switch ((unsigned long)dst % 16){    						

hmm why not &0xF in both?
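
To spell it out: for an unsigned value the two forms give the same result, so
using one of them consistently costs nothing. A small sketch, assuming
uintptr_t from <stdint.h> for the pointer cast (the patch casts to unsigned
long); the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers, not from the patch: byte offset of p within
 * its 16-byte line, written both ways. */
static unsigned misalign16_mask(const void *p)
{
    return (unsigned)((uintptr_t)p & 0xF);
}

static unsigned misalign16_mod(const void *p)
{
    /* For unsigned operands, x % 16 equals x & 0xF. */
    return (unsigned)((uintptr_t)p % 16);
}
```

With a single helper like this, both the if and the switch could test the same
expression.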

[...]
> @@ -264,3 +816,5 @@ void dsputil_h264_init_ppc(DSPContext* c
>      // ... pending ...
>    }
>  }
> +
> +

hmm

[...]
> +    signed int ABCD[4] __attribute__((aligned(16)));

please use the new DECLARE_ALIGNED macros
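
Roughly what such a macro buys us, as a sketch only: SKETCH_DECLARE_ALIGNED
below is a stand-in I made up for illustration, it is not the definition in
the tree, so check the real macro's name and argument order before using it.

```c
#include <stdint.h>

/* Hypothetical stand-in for an alignment macro: keeps the
 * compiler-specific attribute in one place instead of in every
 * declaration. */
#if defined(__GNUC__)
#  define SKETCH_DECLARE_ALIGNED(n, t, v) t v __attribute__((aligned(n)))
#else
   /* Assumption: other compilers need their own spelling here; this
    * fallback compiles but does NOT guarantee the alignment. */
#  define SKETCH_DECLARE_ALIGNED(n, t, v) t v
#endif

/* Returns 1 if a local declared through the macro is 16-byte aligned. */
static int abcd_is_aligned(void)
{
    SKETCH_DECLARE_ALIGNED(16, signed int, ABCD[4]);

    ABCD[0] = 0;
    return ((uintptr_t)ABCD & 0xF) == 0;
}
```

The point is that only the macro has to change when a compiler spells the
attribute differently.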

[...]

Romain, please review and test; you are the ppc maintainer.

-- 
Michael