[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
Ganesh Ajjanagadde
gajjanag at mit.edu
Tue Oct 13 07:16:56 CEST 2015
On Tue, Oct 13, 2015 at 1:03 AM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
> On Tue, Oct 13, 2015 at 12:44 AM, Carl Eugen Hoyos <cehoyos at ag.or.at> wrote:
>> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>>> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
>>> > Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>>> >
>>> >> Bench from libavfilter/astats on a 15 min clip.
>>> >
>>> > I believe that your test would indicate that the
>>> > old variant is faster or that no result can be
>>> > given which is what my tests show.
>>>
>>> Look at the bench and the numbers again, I have
>>> provided it above.
>>
>> Ok:
>> old:
>> 389 decicycles in abs, 64 runs, 0 skips
>> 350 decicycles in abs, 128 runs, 0 skips
>> 331 decicycles in abs, 256 runs, 0 skips
>> 321 decicycles in abs, 512 runs, 0 skips
>> 319 decicycles in abs, 1024 runs, 0 skips
>> 318 decicycles in abs, 2048 runs, 0 skips
>> 315 decicycles in abs, 4096 runs, 0 skips
>> 317 decicycles in abs, 8192 runs, 0 skips
>> 335 decicycles in abs, 16384 runs, 0 skips
>> 335 decicycles in abs, 32768 runs, 0 skips
>>
>> new:
>> 382 decicycles in abs, 64 runs, 0 skips
>> 361 decicycles in abs, 128 runs, 0 skips
>> 356 decicycles in abs, 256 runs, 0 skips
>> 334 decicycles in abs, 512 runs, 0 skips
>> 322 decicycles in abs, 1024 runs, 0 skips
>> 317 decicycles in abs, 2048 runs, 0 skips
>> 315 decicycles in abs, 4096 runs, 0 skips
>> 341 decicycles in abs, 8192 runs, 0 skips
>> 363 decicycles in abs, 16383 runs, 1 skips
>> 342 decicycles in abs, 32767 runs, 1 skips
>> Numbers with high skips or low runs are not so
>> relevant afaik.
>
> Not so relevant, but as I said: it is still better.
>
>>
>>> They are essentially identical in the best case
>>> (most number of runs), the new variant is faster in
>>> the worst case.
>>
>> I would say the opposite is true but we can certainly
>> agree that there is no proof that one is faster.
>
> Do a random float test, the difference is more pronounced.
Simple bench for all abs stuff:
#include <math.h>
#include <time.h>
#include <float.h>
#include <stdlib.h>
#include <stdio.h>

#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))
#define NUM_TRIALS 100000
#define NUM_ITER 100000

static float f[NUM_TRIALS];
static double g[NUM_TRIALS];
static int i[NUM_TRIALS];
static long long ll[NUM_TRIALS];

int main(void) {
    int c, d;
    clock_t start, end;
    double time;
    float abs_f;
    double abs_d;
    int abs_i;
    long long abs_ll;

    /* fill each array with random values of the matching type */
    for (c = 0; c < NUM_TRIALS; ++c) {
        ll[c] = random();
        i[c] = rand();
        f[c] = (float)rand()/(float)(RAND_MAX/FLT_MAX);
        g[c] = (double)random()/(double)(RAND_MAX/DBL_MAX);
    }

    /* float: fabsf vs FFABS */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            f[c] = fabsf(f[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("fabsf: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            f[c] = FFABS(f[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    /* double: fabs vs FFABS */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            g[c] = fabs(g[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("fabs: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            g[c] = FFABS(g[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    /* int: abs vs FFABS */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            i[c] = abs(i[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("abs: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            i[c] = FFABS(i[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    /* long long: llabs vs FFABS */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            ll[c] = llabs(ll[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("llabs: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            ll[c] = FFABS(ll[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    return 0;
}
>
>>
>>> You have not provided a bench proving otherwise.
>>
>> old:
>> user 0m20.338s
>> user 0m20.408s
>> user 0m20.287s
>> user 0m20.365s
>> user 0m20.208s
>> new:
>> user 0m20.197s
>> user 0m20.577s
>> user 0m20.434s
>> user 0m20.322s
>> user 0m20.356s
Am also curious how you got your bench. What platform, what command line?
>
> The difference here is imo too small to say anything. My point is
> precisely this: on most inputs, there is no difference. On bad (worst
> case) inputs, using fabs instead of the macro is far superior. The
> random float bench proves this. Translating that to some audio file
> should be easy: I suspect placing most samples near a silence value
> (0) does this.
>
>>
>>> > I am not sure if it makes sense to apply a patch
>>> > that is meant to improve speed if this improvement
>>> > can't be shown.
>>>
>>> I believe I have shown it above clearly.
>>
>> Imo, you have shown clearly that neither variant can
>> be shown to be faster.
Now I have, with the random bench above.
>>
>> Carl Eugen
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel