[FFmpeg-devel] [PATCH 09/10] avfilter/vsrc_mandelbrot: use hypot()
Ganesh Ajjanagadde
gajjanag at mit.edu
Mon Nov 23 19:57:24 CET 2015
On Mon, Nov 23, 2015 at 1:02 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Nov 23, 2015 at 12:43:52PM -0500, Ganesh Ajjanagadde wrote:
>> On Sun, Nov 22, 2015 at 3:56 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
>> > On Sun, Nov 22, 2015 at 3:07 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> On Sun, Nov 22, 2015 at 12:05:49PM -0500, Ganesh Ajjanagadde wrote:
>> >>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> >>> ---
>> >>> libavfilter/vsrc_mandelbrot.c | 2 +-
>> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> >>>
>> >>> diff --git a/libavfilter/vsrc_mandelbrot.c b/libavfilter/vsrc_mandelbrot.c
>> >>> index 950c5c8..a0c101e 100644
>> >>> --- a/libavfilter/vsrc_mandelbrot.c
>> >>> +++ b/libavfilter/vsrc_mandelbrot.c
>> >>> @@ -291,7 +291,7 @@ static void draw_mandelbrot(AVFilterContext *ctx, uint32_t *color, int linesize,
>> >>>
>> >>> use_zyklus= (x==0 || s->inner!=BLACK ||color[x-1 + y*linesize] == 0xFF000000);
>> >>> if(use_zyklus)
>> >>> - epsilon= scale*1*sqrt(SQR(x-s->w/2) + SQR(y-s->h/2))/s->w;
>> >>> + epsilon= scale*hypot(x-s->w/2, y-s->h/2)/s->w;
>> >>
>> >> old:
>> >> 704 decicycles in hypo, 1048570 runs, 6 skips
>> >>
>> >> new:
>> >> 1075 decicycles in hypo, 1048566 runs, 10 skips
>> >>
>> >> that is from START/STOP_TIMER over hypot()
>> >>
>> >> the code is speed relevant as it's executed per pixel
>> >
>> > Thanks for testing. Looking more closely, I see no reason for
>> > expensive sqrt calls anyway: one can simply square both sides; it
>> > should be cheaper. Will rework, post benchmark if it is indeed faster
>> > and does not suffer from floating point overflow, else will simply
>> > push a trivial removal of the "1".
>>
>> It seems like getting rid of the sqrt altogether has a very slight
>> positive impact (if any at all). I can post the patch, but would like
>> to know what to benchmark. There are numerous choices, e.g.
>> draw_mandelbrot as a whole, the outer loop, or the inner loop.
>> I personally think the inner x loop (lines 268-388) is a good place to
>> look at, since the difference is very small anyway, and further
>> localization is impossible.
>
> please post the patch
Benchmark posted first, to see whether it is considered interesting enough.
Benchmark over the whole of draw_mandelbrot using START/STOP_TIMER on x86-64,
Haswell, GNU/Linux; command line:
ffmpeg -v error -f lavfi -i mandelbrot -f null -
new (draw_mandelbrot):
2145115730 decicycles in draw_mandelbrot, 1 runs, 0 skips
1728922365 decicycles in draw_mandelbrot, 2 runs, 0 skips
1564209877 decicycles in draw_mandelbrot, 4 runs, 0 skips
1638069308 decicycles in draw_mandelbrot, 8 runs, 0 skips
1823837319 decicycles in draw_mandelbrot, 16 runs, 0 skips
2076266287 decicycles in draw_mandelbrot, 32 runs, 0 skips
2445350509 decicycles in draw_mandelbrot, 64 runs, 0 skips
3076303786 decicycles in draw_mandelbrot, 128 runs, 0 skips
3976923705 decicycles in draw_mandelbrot, 256 runs, 0 skips
7601861275 decicycles in draw_mandelbrot, 512 runs, 0 skips
20857881401 decicycles in draw_mandelbrot, 1024 runs, 0 skips
old (draw_mandelbrot):
2134144710 decicycles in draw_mandelbrot, 1 runs, 0 skips
1801213045 decicycles in draw_mandelbrot, 2 runs, 0 skips
1632609867 decicycles in draw_mandelbrot, 4 runs, 0 skips
1668765306 decicycles in draw_mandelbrot, 8 runs, 0 skips
1845019538 decicycles in draw_mandelbrot, 16 runs, 0 skips
2064489945 decicycles in draw_mandelbrot, 32 runs, 0 skips
2485309937 decicycles in draw_mandelbrot, 64 runs, 0 skips
3149893392 decicycles in draw_mandelbrot, 128 runs, 0 skips
4048037523 decicycles in draw_mandelbrot, 256 runs, 0 skips
7954800255 decicycles in draw_mandelbrot, 512 runs, 0 skips
21393227201 decicycles in draw_mandelbrot, 1024 runs, 0 skips
[...]