[FFmpeg-devel] [PATCH] Extra build options for ALS (and others)

Thilo Borgmann thilo.borgmann
Fri Nov 27 17:09:35 CET 2009


Måns Rullgård wrote:
> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
> 
>> Måns Rullgård wrote:
>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>
>>>> Måns Rullgård wrote:
>>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> recently the need for an extra build option for the ALS decoder arose.
>>>>> Is it impossible to achieve the desired outcome with some combination
>>>>> of always_inline, noinline, and flatten attributes?
>>>> No. See [PATCH] Split reading and decoding of blocks in ALS.
>>>>
>>>> Although I've managed to have the functions from the alsdec.c inlined
>>>> manually according to the grep'ed output of the assembler code, it seems
>>>> like it is not enough to manually inline functions from within that .c
>>>> file only using this technique.
>>> I'm confused.  Can it be done in the C code only or not?  This kind of
>>> issue should really not be solved in the makefile.
>> The issue is the big slowdown. The patch that causes this splits a big
>> function into two, which are then called successively.
>>
>> To overcome the slowdown issue, I inspected the functions being inlined
>> with and without the -finline-limit option. I can use av_always_inline
>> for many functions within alsdec.c to have the same functions inlined
>> like -finline-limit does.
>>
>> Unfortunately, using -finline-limit removes the slowdown introduced by
>> the patch while using av_always_inline does not.
> 
> So it's not doing the same thing.  What is it doing differently?
> Where did you get the limit number from?
> 

All function calls within alsdec.s when using -finline-limit=4096:
   1 	call	L1102
   1 	call	L138
   1 	call	L456
   2 	call	L___udivdi3$stub
  10 	call	L_av_freep$stub
   1 	call	L_av_get_bits_per_sample_format$stub
  12 	call	L_av_log$stub
   5 	call	L_av_log_missing_feature$stub
   8 	call	L_av_malloc$stub
   2 	call	L_av_mallocz$stub
   1 	call	L_ff_mpeg4audio_get_config$stub
   6 	call	L_memcpy$stub
   2 	call	L_memmove$stub
   1 	call	L_memset$stub
   2 	call	_decode_blocks_ind
   4 	call	_decode_end
  36 	call	_decode_rice
  10 	call	_get_bits_long
  11 	call	_parse_bs_info
   2 	call	_zero_remaining


All function calls within alsdec.s when using av_always_inline in many
places. The attributes are placed so that the same functions from
alsdec.c get inlined as in the unpatched alsdec.c built without any
extra build option:
   1 	call	L1561
   1 	call	L176
   1 	call	L21
   2 	call	L___udivdi3$stub
  10 	call	L_av_freep$stub
   1 	call	L_av_get_bits_per_sample_format$stub
  13 	call	L_av_log$stub
   5 	call	L_av_log_missing_feature$stub
   8 	call	L_av_malloc$stub
   2 	call	L_av_mallocz$stub
   1 	call	L_ff_mpeg4audio_get_config$stub
   1 	call	L_memcpy$stub
   1 	call	L_memmove$stub
   2 	call	L_memset$stub
   8 	call	___inline_memcpy_chk
   2 	call	___inline_memmove_chk
   6 	call	_align_get_bits
   5 	call	_av_ceil_log2
   4 	call	_av_clip
   4 	call	_decode_end
  47 	call	_get_bits
  90 	call	_get_bits1
   3 	call	_get_bits_count
  61 	call	_get_bits_left
  39 	call	_get_bits_long
   4 	call	_get_sbits_long
  60 	call	_get_unary
   2 	call	_init_get_bits
   3 	call	_parse_bs_info
   3 	call	_read_time
   7 	call	_skip_bits
   2 	call	_skip_bits1
   5 	call	_skip_bits_long


So -finline-limit also inlines many functions in the object file that
are not defined in alsdec.c (get_bits, get_bits1, get_unary, skip_bits
and so on), which might be the reason for the performance difference.
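
To illustrate the point about helpers that are not defined in alsdec.c,
here is a minimal sketch of how the two GCC attributes differ. All names
in it (BitReader, read_bit, read_unary, sum_unary_codes) are hypothetical
stand-ins and not the FFmpeg get_bits API. always_inline forces the marked
function into its callers but leaves the calls inside it to the normal
heuristics, whereas flatten on the caller asks GCC to inline every call
made inside it, if possible. Whether flatten would reproduce the
-finline-limit result for alsdec.c has not been measured here.

```c
#include <stdint.h>

/* Hypothetical stand-in for a small bit reader that would normally live
 * in a header. This is NOT the FFmpeg GetBitContext implementation. */
typedef struct BitReader {
    const uint8_t *buf;
    unsigned       pos;   /* current position in bits, MSB first */
} BitReader;

/* Plain "static inline": the compiler decides whether to inline it. */
static inline unsigned read_bit(BitReader *br)
{
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* always_inline: forces THIS function to be inlined into its callers.
 * It does not by itself force the read_bit() calls inside it. */
static inline __attribute__((always_inline))
unsigned read_unary(BitReader *br, unsigned max_len)
{
    unsigned n = 0;
    /* Count 1 bits until a 0 bit is consumed or max_len is reached. */
    while (n < max_len && read_bit(br))
        n++;
    return n;
}

/* flatten: asks GCC to inline every call inside this function, if
 * possible, including helpers such as read_bit(). */
__attribute__((flatten))
unsigned sum_unary_codes(const uint8_t *buf, unsigned count, unsigned max_len)
{
    BitReader br = { buf, 0 };
    unsigned sum = 0;
    for (unsigned i = 0; i < count; i++)
        sum += read_unary(&br, max_len);
    return sum;
}
```

Compiling this with -S and counting the remaining call instructions, with
and without the flatten attribute, gives the same kind of comparison as
the two lists above.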

But using -finline-limit does not yield a speed gain for the unpatched
file! So there might be something else going on that I don't see.

The value of 4096 was chosen arbitrarily. As long as I don't know
exactly why -finline-limit removes the slowdown, and whether it can be
replaced by another approach, there is no point in figuring out a more
optimal value...

-Thilo


