[FFmpeg-devel] SH4: optimization attempts

Guennadi Liakhovetski g.liakhovetski
Thu Jan 27 10:27:03 CET 2011


Hi all

I have been trying for some time to optimize video and audio decoding
for some "popular" codecs on SuperH (SH4A) CPUs. The targets have been vp8,
mp3, vorbis, aac and h.264, but I have also tried some others where I
thought I could do somewhat better than the present generic code.

A few words about the SH4A family: it has an FPU, multiply-and-accumulate
instructions for both integers and fp, and double-precision fp. It also
supports some operations on vectors of 4 fp registers: inner product and
multiplication by a 4x4 matrix, both with 28-bit precision. It has
approximate sin and cos operations. The CPU also supports parallel
instruction execution.

Attached are my various attempts. Of them, the mp3 patch is already known:
I have posted a couple of iterations of it to the list and will have to do
a new one... The others are for info. Below is a table of my profiling
results and optimization attempts. I'd be glad to hear any comments on
this and further optimization ideas.

Some of the proposed patches are generic, like the vp8 one, which replaces
multiplication by addition in several filter functions, but, as you can
see below, it didn't bring any improvement. Still, it might be better to
switch to those versions, because I think that on other CPUs this might
make a difference, and it also looks better.

Further, I'll be attending this year's FOSDEM on the first weekend of
February, so I would be happy to continue discussing any of these topics
there too.
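
To illustrate the kind of change meant by replacing multiplication by
addition in the vp8 filter functions, here is a minimal sketch. The
function names and the exact filter term are hypothetical and are not
taken from vp8.diff; the sketch only assumes a term of the form
3 * (q0 - p0).

```c
#include <stdint.h>

/* Hypothetical filter term written with a multiplication. */
static inline int filter_term_mul(int p0, int q0)
{
    return 3 * (q0 - p0);
}

/* The same value computed with additions only (illustrative rewrite). */
static inline int filter_term_add(int p0, int q0)
{
    int d = q0 - p0;   /* difference of the two pixels */
    return d + d + d;  /* 3 * d without a multiply instruction */
}
```

Both functions return the same value for every pair of 8-bit pixel
inputs. Many compilers perform this kind of rewrite themselves for small
constant multipliers, so whether the source-level change matters depends
on the toolchain.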

Test cases: decoding of

codec   profile fn                    %   optimization          time-drop, %

MP3     apply_window_mp3_c            53  SUM8_* additions      42
        MULH                          17

VP8     vp8_decode_frame              14  *filter*: '*' to '+'  0
        vp8_h_loop_filter16_inner_c   10
        vp8_h_loop_filter8uv_c        10
        vp8_v_loop_filter8uv_c        10
        vp8_v_loop_filter16_c         10
        vp8_v_loop_filter16_inner_c   9
        vp8_h_loop_filter16_c         8

Vorbis  vorbis_residue_decode         25  .vector_fmul()        0
        pass                          16
        ff_imdct_half_c               9

AAC     decode_spectrum_and_dequant   14  VMUL2, VMUL4          0
        ff_imdct_half_c               12
        pass                          11

ALS     decode_var_block_data         30  multiple              15
        get_bits1                     16
        decode_rice                   14
        get_unary                     14

h.264   get_cabac                     9
        put_h264_qpel_v_lowpass       9
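
For readers unfamiliar with the MP3 rows, the two hot spots can be
sketched in plain C as follows. This is an illustration only, not the
code from mp3.diff or mathops.diff: it assumes MULH denotes the high 32
bits of a signed 32x32-bit product, and that the SUM8_* helpers accumulate
eight strided products. The names mulh and sum8 and the stride parameter
are hypothetical.

```c
#include <stdint.h>

/* High 32 bits of the signed 64-bit product of two 32-bit values.
 * Right-shifting a negative value is implementation-defined in C;
 * common compilers use an arithmetic shift. */
static inline int32_t mulh(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Illustrative 8-tap multiply-accumulate over strided inputs:
 * sum of w[i * stride] * p[i * stride] for i = 0..7. */
static inline int64_t sum8(const int32_t *w, const int32_t *p, int stride)
{
    int64_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (int64_t)w[i * stride] * p[i * stride];
    return sum;
}
```

Loops of this shape are the natural candidates for the integer
multiply-and-accumulate instructions mentioned above.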

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aac.diff
Type: text/x-diff
Size: 299 bytes
Desc: 
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: als.diff
Type: text/x-diff
Size: 10522 bytes
Desc: 
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dsputil.diff
Type: text/x-diff
Size: 15594 bytes
Desc: 
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment-0002.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vp8.diff
Type: text/x-diff
Size: 3138 bytes
Desc: 
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment-0003.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mp3.diff
Type: text/x-diff
Size: 1954 bytes
Desc: 
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment-0004.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mathops.diff
Type: text/x-diff
Size: 6867 bytes
Desc: 
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110127/440d06b3/attachment-0005.diff>


