[FFmpeg-devel] [PATCH] avcodec/cbrt_tablegen: avoid pow and speed up cbrt_tableinit

Ganesh Ajjanagadde gajjanag at mit.edu
Thu Nov 26 04:46:13 CET 2015


On Wed, Nov 25, 2015 at 10:13 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Wed, Nov 25, 2015 at 8:48 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
> wrote:
>>
>> On Wed, Nov 25, 2015 at 8:29 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
>> wrote:
>> > On Wed, Nov 25, 2015 at 8:19 PM, Ronald S. Bultje <rsbultje at gmail.com>
>> > wrote:
>> >> Hi,
>> >>
>> >> On Wed, Nov 25, 2015 at 7:36 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
>> >> wrote:
>> >>
>> >>> On Wed, Nov 25, 2015 at 6:49 PM, James Almer <jamrial at gmail.com>
>> >>> wrote:
>> >>> > On 11/25/2015 8:32 PM, Ganesh Ajjanagadde wrote:
>> >>> >> On Wed, Nov 25, 2015 at 6:19 PM, Ronald S. Bultje
>> >>> >> <rsbultje at gmail.com>
>> >>> wrote:
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> On Wed, Nov 25, 2015 at 5:17 PM, Ganesh Ajjanagadde <
>> >>> gajjanagadde at gmail.com>
>> >>> >>> wrote:
>> >>> >>>>
>> >>> >>>> On systems having cbrt, there is no reason to use the slow pow
>> >>> function.
>> >>> >>>>
>> >>> >>>> Sample benchmark (x86-64, Haswell, GNU/Linux):
>> >>> >>>> new:
>> >>> >>>> 5124920 decicycles in cbrt_tableinit,       1 runs,      0 skips
>> >>> >>>>
>> >>> >>>> old:
>> >>> >>>> 12321680 decicycles in cbrt_tableinit,       1 runs,      0 skips
>> >>> >>>>
>> >>> >>>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> >>> >>>>
>> >>> >>>>
>> >>>
>> >>> -------------------------------------------------------------------------------
>> >>> >>>> What I wonder about is why --enable-hardcoded-tables is not the
>> >>> default
>> >>> >>>> for
>> >>> >>>> FFmpeg. Unless I am missing something, static storage is anyway
>> >>> allocated
>> >>> >>>> even
>> >>> >>>> if hardcoded tables are not used, and the cost is deferred to
>> >>> >>>> runtime
>> >>> >>>> instead of
>> >>> >>>> build time. Thus binary size should not be affected, but users
>> >>> >>>> burn
>> >>> cycles
>> >>> >>>> unnecessarily for every codec having these kinds of tables. I
>> >>> >>>> have
>> >>> these
>> >>> >>>> patches,
>> >>> >>>> simply because at the moment users are paying a price for the
>> >>> >>>> typical
>> >>> >>>> default.
>> >>> >>>> ---
>> >>> >>>>  libavcodec/cbrt_tablegen.h | 6 +++---
>> >>> >>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>> >>> >>>
>> >>> >>>
>> >>> >>> This has been discussed extensively in the past...
>> >>> >>
>> >>> >> Can you please give a link and/or timeframe to search for?
>> >>> >>
>> >>> >>>
>> >>> >>> As for the patch, don't forget that tablegen runs on the host
>> >>> >>> (build),
>> >>> not
>> >>> >>> target (runtime), whereas libm.h is for target (runtime) and may
>> >>> >>> not be
>> >>> >>> compatible. I believe that's why we don't use libm.h in tablegen
>> >>> >>> files.
>> >>> >>
>> >>> >> I don't understand this, it seems to me like any other code (at
>> >>> >> least
>> >>> >> in the default configure), it gets called, and like all other such
>> >>> >> things, we use libavutil/libm for hackery. This host/target
>> >>> >> business
>> >>> >> affects other things as well. What is the issue?
>> >>> >
>> >>> > libavutil/libm.h uses defines from config.h, which are based on the
>> >>> tests run
>> >>> > by configure for the target, and not the host where compilation
>> >>> > takes
>> >>> place.
>> >>> > The tablegen applications all run at compile time. What is available
>> >>> > on
>> >>> the
>> >>> > target may not be on the host.
>> >>>
>> >>> Ok. So I would like an answer to two simple questions that are outside
>> >>> my knowledge or interest.
>> >>>
>> >>> Is it possible with some hackery to get this change through, or not?
>> >>> If so, what is it?
>> >>
>> >>
>> >> You need to understand the issue before you can evaluate hacks.
>> >>
>> >> The issue is:
>> >> - I'm using a linux x86-64 machine using gcc as a compiler, with
>> >> libc=glibc
>> >> 2.18 (A);
>> >> - to build a binary that will run on a Windows Mobile ARMv7 machine,
>> >> with
>> >> libC=something-from-Microsoft (B).
>> >>
>> >> tablegen runs on A, but ffmpeg.exe runs on B. libavutil/libm.h only
>> >> works
>> >> for B. If you want a version of libm.h on A, you need to generate a
>> >> version
>> >> of libm.h that works on A. There is no relationship between A and B,
>> >> and
>> >> thus there can not possibly ever be any relationship between A's libm.h
>> >> and
>> >> B's libavutil/libm.h.
>> >>
>> >> It's probably possible to generate a version of libm.h for A, but
>> >> that's
>> >> not so much a coding issue, as it is an issue of understanding the
>> >> build
>> >> system and including detection for stuff on machine A, as opposed to
>> >> machine B (which is what most of configure does).
>> >
>> > Thanks a lot for the detail. So how about using a local
>> > #ifndef cbrt
>> > #define cbrt(x) pow(x, 1 / 3.0)
>> > code...
>> > #undef cbrt // at the very end of the file
>> > #endif
>>
>> Not that simple, something more like
>> #ifndef cbrt
>> #define ff_cbrt(x) pow(x, 1/3.0)
>> #else
>> #define ff_cbrt(x) cbrt(x)
>> code...
>> #undef ff_cbrt // at the very end of the file
>> #endif
>>
>> - this will resolve a glitch with the above in not undef'ing an
>> important symbol (all this is of course without including
>> libavutil/libm.h).
>
>
> I don't think cbrt is a macro? But anyway, I don't think any of this is
> meaningful, you're basically creating a crappy per-file libm.h for tablegen
> files which will be duplicated all over the place. How about we do this
> correctly, if we do it at all?

I lack the knowledge for doing this in configure, all this avutil/libm
and compat business is black magic to me, and worse still I can hardly
test any of it since I run exclusively a GNU/Linux setup, as evidenced
by some recent patches. I can put in an effort if someone is willing
to review and if it is reasonably simple in configure.

>
> (Don't forget these table inits run once per process, so the cost of
> increased code messiness and code complexity has a much higher weight than
> it does in normal situations. Runtime isn't that important because it only
> runs once.)

I doubt the patch itself qualifies as complex/messy as is. It is all
this cross compiling business and associated hackery that is.

An extra 0.1 ms of CPU burning across 1000's of clients across the
globe is significant, especially when it is trivially avoided, going
back to the above point.

>
> Ronald


More information about the ffmpeg-devel mailing list