[FFmpeg-devel] [PATCH] avcodec/cbrt_tablegen: avoid pow and speed up cbrt_tableinit

Ganesh Ajjanagadde gajjanagadde at gmail.com
Thu Nov 26 04:10:33 CET 2015

On Wed, Nov 25, 2015 at 6:32 PM, Ganesh Ajjanagadde
<gajjanagadde at gmail.com> wrote:
> On Wed, Nov 25, 2015 at 6:19 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> Hi,
>> On Wed, Nov 25, 2015 at 5:17 PM, Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> wrote:
>>> On systems having cbrt, there is no reason to use the slow pow function.
>>> Sample benchmark (x86-64, Haswell, GNU/Linux):
>>> new:
>>> 5124920 decicycles in cbrt_tableinit,       1 runs,      0 skips
>>> old:
>>> 12321680 decicycles in cbrt_tableinit,       1 runs,      0 skips
>>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>>> -------------------------------------------------------------------------------
>>> What I wonder about is why --enable-hardcoded-tables is not the default for
>>> FFmpeg. Unless I am missing something, static storage is anyway allocated
>>> even if hardcoded tables are not used, and the cost is deferred to runtime
>>> instead of build time. Thus binary size should not be affected, but users
>>> burn cycles unnecessarily for every codec having these kinds of tables. I
>>> have these patches, simply because at the moment users are paying a price
>>> for the typical default.
>>> ---
>>>  libavcodec/cbrt_tablegen.h | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>> This has been discussed extensively in the past...
> Can you please give a link and/or timeframe to search for?

For anyone interested, I dug up at least one extensive thread on this:

So I now see that static tables (even with the array size explicitly
written down) do not necessarily occupy memory until runtime. That was
the root of my question. Now that I understand it, I realize that
picking a default is not easy and is very much environment-dependent:
the cost of, e.g., a TLB miss or page fault due to a larger binary can
be a serious issue. As a practical example, servers with beefed-up
hardware may happily accept the larger memory footprint, while
lightweight devices may care a lot about memory requirements.
Ultimately, it is impossible to satisfy both simultaneously, and
heuristically picking a default at configure time is not
Nevertheless, FFmpeg should IMHO make a best-effort attempt. For
clients who use hardcoded tables, there are no issues. For clients who
prefer a smaller binary, we should make the runtime
"loading/initialization" as fast as possible. This is not the kind of
place where micro-optimizations are worthwhile, but simple, clear
optimizations with obvious benefits are certainly valuable IMHO. As
such, I consider these patches worth pursuing, especially since I have
demonstrated improvements of over 50% in both cases with minimal diffs.

