[FFmpeg-devel] [PATCH] avcodec/nellymoserenc: avoid wasteful pow

Kacper Michajlow kasper93 at gmail.com
Fri Dec 18 10:06:35 CET 2015


One minor nitpick about the commit message: it would help to mention which
compiler was used to generate the benchmarked code. For example, Clang 3.7
replaces pow(2, ...) with an exp2(...) call on its own, so you probably used gcc.
Anyway, since it is already merged, take my reply as a hint for next
time :)

Regards,
Kacper
On 17 Dec 2015 5:14 PM, "Ganesh Ajjanagadde" <gajjanag at mit.edu> wrote:

> On Tue, Dec 15, 2015 at 6:40 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
> > On Tue, Dec 15, 2015 at 5:25 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
> >> On Tue, Dec 15, 2015 at 2:23 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >>> On Wed, Dec 09, 2015 at 06:55:25PM -0500, Ganesh Ajjanagadde wrote:
> > [...]
> >>>>
> >>>> diff --git a/libavcodec/nellymoserenc.c b/libavcodec/nellymoserenc.c
> >>>> index d998dba..e6023e3 100644
> >>>> --- a/libavcodec/nellymoserenc.c
> >>>> +++ b/libavcodec/nellymoserenc.c
> >>>> @@ -179,8 +179,15 @@ static av_cold int encode_init(AVCodecContext *avctx)
> >>>>
> >>>>      /* Generate overlap window */
> >>>>      ff_init_ff_sine_windows(7);
> >>>> -    for (i = 0; i < POW_TABLE_SIZE; i++)
> >>>> -        pow_table[i] = pow(2, -i / 2048.0 - 3.0 + POW_TABLE_OFFSET);
> >>>> +    pow_table[0] = 1;
> >>>> +    pow_table[1024] = M_SQRT1_2;
> >>>> +    for (i = 1; i < 513; i++) {
> >>>> +        double tmp = exp2(-i / 2048.0);
> >>>> +        pow_table[i] = tmp;
> >>>> +        pow_table[1024-i] = M_SQRT1_2 / tmp;
> >>>> +        pow_table[1024+i] = tmp * M_SQRT1_2;
> >>>> +        pow_table[2048-i] = 0.5 / tmp;
> >>>
> >>> How much overall init time is gained by this?
> >>> That is, the time in ffmpeg main() from start to finish when just
> >>> opening the file with no decoding, aka ./ffmpeg -i somefile
> >>
> >> I don't know; all I know is that cycles are being wasted unnecessarily.
> >> I will add cycle numbers.
> >>
> >
> > Here they are:
> > proposed: 424160 decicycles in pow_table,     512 runs,      0 skips
> > exp2 only: 1262093 decicycles in pow_table,     512 runs,      0 skips
> > old: 2849085 decicycles in pow_table,     512 runs,      0 skips
> >
> > Thus old to exp2 is roughly a 2.25x speedup, and exp2 to proposed is
> > roughly a 3x speedup, for a net speedup of about 6.7x.
>
> Took Michael's comment as a general ack, so pushed with the addition of a
> comment and the cycle numbers.

