[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

Tue Oct 13 07:16:56 CEST 2015

On Tue, Oct 13, 2015 at 1:03 AM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
> On Tue, Oct 13, 2015 at 12:44 AM, Carl Eugen Hoyos <cehoyos at ag.or.at> wrote:
>> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>>> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
>>> > Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>>> >
>>> >> Bench from libavfilter/astats on a 15 min clip.
>>> >
>>> > I believe that your test would indicate that the
>>> > old variant is faster or that no result can be
>>> > given which is what my tests show.
>>>
>>> Look at the bench and the numbers again, I have
>>> provided it above.
>>
>> Ok:
>> old:
>>     389 decicycles in abs,      64 runs,      0 skips
>>     350 decicycles in abs,     128 runs,      0 skips
>>     331 decicycles in abs,     256 runs,      0 skips
>>     321 decicycles in abs,     512 runs,      0 skips
>>     319 decicycles in abs,    1024 runs,      0 skips
>>     318 decicycles in abs,    2048 runs,      0 skips
>>     315 decicycles in abs,    4096 runs,      0 skips
>>     317 decicycles in abs,    8192 runs,      0 skips
>>     335 decicycles in abs,   16384 runs,      0 skips
>>     335 decicycles in abs,   32768 runs,      0 skips
>>
>> new:
>>     382 decicycles in abs,      64 runs,      0 skips
>>     361 decicycles in abs,     128 runs,      0 skips
>>     356 decicycles in abs,     256 runs,      0 skips
>>     334 decicycles in abs,     512 runs,      0 skips
>>     322 decicycles in abs,    1024 runs,      0 skips
>>     317 decicycles in abs,    2048 runs,      0 skips
>>     315 decicycles in abs,    4096 runs,      0 skips
>>     341 decicycles in abs,    8192 runs,      0 skips
>>     363 decicycles in abs,   16383 runs,      1 skips
>>     342 decicycles in abs,   32767 runs,      1 skips
>> Numbers with high skips or low runs are not so
>> relevant afaik.
>
> Not so relevant, but as I said: it is still better.
>
>>
>>> They are essentially identical in the best case
>>> (most number of runs), the new variant is faster in
>>> the worst case.
>>
>> I would say the opposite is true but we can certainly
>> agree that there is no proof that one is faster.
>
> Do a random float test, the difference is more pronounced.

Simple bench for all abs stuff:

#include <math.h>
#include <time.h>
#include <float.h>
#include <stdlib.h>
#include <stdio.h>

#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))
#define NUM_TRIALS 100000
#define NUM_ITER 100000

static float f[NUM_TRIALS];
static double g[NUM_TRIALS];
static int i[NUM_TRIALS];
static long long ll[NUM_TRIALS];

int main(void) {
    int c, d;
    clock_t start, end;
    double time;
    float abs_f;
    double abs_d;
    int abs_i;
    long long abs_ll;

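    /* Fill the arrays with pseudo-random inputs. Note that rand() and
     * random() only return non-negative values. */
    for (c = 0; c < NUM_TRIALS; ++c) {
        ll[c] = random();
        i[c] = rand();
        f[c] = (float)rand()/(float)(RAND_MAX/FLT_MAX);
        g[c] = (double)random()/(double)(RAND_MAX/DBL_MAX);
    }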

    /* float: fabsf() */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            f[c] = fabsf(f[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("fabsf: %lf\n", time);

    /* float: FFABS macro */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            f[c] = FFABS(f[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    /* double: fabs() */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            g[c] = fabs(g[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("fabs: %lf\n", time);

    /* double: FFABS macro */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            g[c] = FFABS(g[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    /* int: abs() */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            i[c] = abs(i[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("abs: %lf\n", time);

    /* int: FFABS macro */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            i[c] = FFABS(i[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    /* long long: llabs() */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            ll[c] = llabs(ll[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("llabs: %lf\n", time);

    /* long long: FFABS macro */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            ll[c] = FFABS(ll[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    return 0;
}
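
For context on why the float variants can differ at all, here is a
minimal sketch of the two approaches side by side. macro_abs and
mask_abs are hypothetical helper names used only for illustration, and
mask_abs assumes float is a 32-bit IEEE 754 value; it is not what any
particular libm actually ships.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* FFABS applied to a float: a comparison followed by a select/negate. */
float macro_abs(float x)
{
    return FFABS(x);
}

/* Hypothetical helper: absolute value by clearing the sign bit.
 * Assumes float is a 32-bit IEEE 754 value. */
float mask_abs(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof(bits));   /* reinterpret without aliasing issues */
    bits &= UINT32_C(0x7fffffff);      /* drop the sign bit, keep the rest */
    memcpy(&x, &bits, sizeof(x));
    return x;
}
```

The two agree on ordinary values but not on negative zero:
FFABS(-0.0f) stays -0.0f because -0.0f >= 0 is true, while fabsf(-0.0f)
and mask_abs(-0.0f) return +0.0f. As far as I can tell, that is one
reason a compiler cannot always reduce the macro to a plain sign-bit
mask without relaxed floating-point flags.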

>
>>
>>> You have not provided a bench proving otherwise.
>>
>> old:
>> user    0m20.338s
>> user    0m20.408s
>> user    0m20.287s
>> user    0m20.365s
>> user    0m20.208s
>> new:
>> user    0m20.197s
>> user    0m20.577s
>> user    0m20.434s
>> user    0m20.322s
>> user    0m20.356s
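
To put the ten timings above side by side, here is a minimal sketch
that computes the mean and the spread (max minus min) of each set. The
names old_user, new_user, mean_of and spread_of are made up for this
example; the values are copied from the quoted output.

```c
#include <stddef.h>

/* The "user" times quoted above, in seconds. */
const double old_user[] = { 20.338, 20.408, 20.287, 20.365, 20.208 };
const double new_user[] = { 20.197, 20.577, 20.434, 20.322, 20.356 };

/* Arithmetic mean of n samples (n must be > 0). */
double mean_of(const double *x, size_t n)
{
    double sum = 0.0;
    size_t k;

    for (k = 0; k < n; ++k)
        sum += x[k];
    return sum / (double)n;
}

/* Spread of n samples: largest minus smallest (n must be > 0). */
double spread_of(const double *x, size_t n)
{
    double lo = x[0], hi = x[0];
    size_t k;

    for (k = 1; k < n; ++k) {
        if (x[k] < lo)
            lo = x[k];
        if (x[k] > hi)
            hi = x[k];
    }
    return hi - lo;
}
```

Fed these numbers, the means come out at about 20.32 s (old) and
20.38 s (new), a gap of roughly 0.06 s, while each set on its own
spreads over 0.2 s or more. The gap therefore sits inside the
run-to-run noise.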

I am also curious how you got your bench. What platform, what command line?

>
> The difference here is imo too small to say anything. My point is
> precisely this: on most inputs, there is no difference. On bad (worst
> case) inputs, using fabs instead of the macro is far superior. The
> random float bench proves this. Translating that to some audio file
> should be easy: I suspect placing most samples near a silence value
> (0) does this.
>
>>
>>> > I am not sure if it makes sense to apply a patch
>>> > that is meant to improve speed if this improvement
>>> > can't be shown.
>>>
>>> I believe I have shown it above clearly.
>>
>> Imo, you have shown clearly that neither variant can
>> be shown to be faster.

Now I have, with the random bench above.

>>
>> Carl Eugen