[FFmpeg-devel] [PATCH] 1D DCT for dsputil

Michael Niedermayer michaelni
Wed Jan 20 01:13:53 CET 2010


On Mon, Jan 18, 2010 at 11:49:03PM -0500, Vitor Sessak wrote:
> Vitor Sessak wrote:
>> Loren Merritt wrote:
>>> On Mon, 18 Jan 2010, Vitor Sessak wrote:
>>>
>>>> + data[i    ] =   COS(s,n,i) * val1 + SIN(s,n,i) * val2;
>>>> + data[i + 1] =   SIN(s,n,i) * val1 - COS(s,n,i) * val2;
>>>
>>> data aliases costab, so the SIN/COS loads will be duplicated.
>> Done.
>>>> + float tmp1 = data[i        ] * (1./n);
>>>> + float tmp2 = data[n - i - 1] * (1./n);
>>>> + float sin1 = 0.5/SIN(s,n,2*i+1);
>>>
>>> division?
>> I don't see how it is avoidable, I've tried a LUT and it is slower.
>
> I made a stupid mistake that was getting the benchmarks wrong. Actually a 
> LUT is faster. New patch attached.
>
[...]
> +static void ff_dct_calc_c(DCTContext *ctx, FFTSample *data)
> +{
> +    int n = 1 << ctx->nbits;
> +    int i;
> +
> +    if (ctx->inverse) {
> +        float next = data[n - 1];
> +
> +        for (i = n - 2; i >= 2; i -= 2) {
> +            float val1 = data[i    ];
> +            float val2 = data[i - 1] - data[i + 1];
> +            float c = COS(ctx, n, i);
> +            float s = SIN(ctx, n, i);
> +
> +            data[i    ] = c * val1 + s * val2;
> +            data[i + 1] = s * val1 - c * val2;
> +        }
> +
> +        data[1] = 2 * next;
> +
> +        ff_rdft_calc(&ctx->rdft, data);
> +
> +        for (i = 0; i < n / 2; i++) {
> +            float tmp1 = data[i        ] * (1. / n);
> +            float tmp2 = data[n - i - 1] * (1. / n);

Putting "float f = 1. / n;" before the loop would make sure the compiler
does not redo the division on every iteration.
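
A minimal sketch of that hoisting, for illustration only. The helper name
scale_mirrored_pairs is hypothetical and not part of the patch; it only
shows the reciprocal being computed once and reused for both elements of
each mirrored pair. It assumes n is a power of two, as in the patch
(n = 1 << ctx->nbits), in which case 1/n is exactly representable as a float.

```c
/* Hypothetical helper, not from the patch: scale the mirrored pairs
 * (data[i], data[n - i - 1]) by 1/n with a single division. */
static void scale_mirrored_pairs(float *data, int n)
{
    const float f = 1.0f / n;   /* hoisted: computed once, before the loop */
    int i;

    for (i = 0; i < n / 2; i++) {
        data[i]         *= f;   /* multiply instead of dividing per element */
        data[n - i - 1] *= f;
    }
}
```

In the real function the scaled values feed straight into tmp1 and tmp2
rather than being stored back, so only the "float f" line would move.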


> +            float csc = ctx->csc2[i];
> +
> +            data[i        ] = tmp1 + tmp2 + csc * (tmp1 - tmp2);
> +            data[n - i - 1] = tmp1 + tmp2 - csc * (tmp1 - tmp2);

Do we trust the compiler that much?

float csc = ctx->csc2[i] * (tmp1 - tmp2);
tmp1 += tmp2;
data[i        ] = tmp1 + csc;
data[n - i - 1] = tmp1 - csc;

(If both are the same speed, pick whichever you prefer.)
Also, the patch is pretty much OK; commit if there are no other comments.
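
For comparison, here are the two forms side by side as a standalone sketch.
The function names and the out0/out1 parameters are hypothetical and exist
only for this illustration; in the patch the results go to data[i] and
data[n - i - 1]. The second form computes the product and the sum once
each, so it does not rely on the compiler to merge the repeated
subexpressions.

```c
/* Form used in the patch: the sum and the product each appear twice. */
static void butterfly_repeated(float tmp1, float tmp2, float csc,
                               float *out0, float *out1)
{
    *out0 = tmp1 + tmp2 + csc * (tmp1 - tmp2);
    *out1 = tmp1 + tmp2 - csc * (tmp1 - tmp2);
}

/* Suggested form: one multiply and one add are shared by both outputs. */
static void butterfly_shared(float tmp1, float tmp2, float csc,
                             float *out0, float *out1)
{
    float prod = csc * (tmp1 - tmp2);   /* shared product */
    float sum  = tmp1 + tmp2;           /* shared sum */

    *out0 = sum + prod;
    *out1 = sum - prod;
}
```

Whether the two compile to the same code depends on the compiler and
flags, so benchmarking both is still the way to decide.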

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato