[FFmpeg-devel] [PATCH][RFC] Lagarith Decoder.

Wed Aug 12 16:41:01 CEST 2009

Reimar Döffinger <Reimar.Doeffinger at gmx.de> writes:

> On Wed, Aug 12, 2009 at 02:12:55PM +0200, Michael Niedermayer wrote:
>> On Mon, Aug 10, 2009 at 11:42:19PM -0600, Nathan Caldwell wrote:
>> > On Sat, Aug 8, 2009 at 6:32 AM, Michael Niedermayer<michaelni at gmx.at> wrote:
>> > >> +/* Fast round up to least power of 2 >= to x */
>> > >> +static inline uint32_t clp2(uint32_t x)
>> > >> +{
>> > >> +    x--;
>> > >> +    x |= (x >> 1);
>> > >> +    x |= (x >> 2);
>> > >> +    x |= (x >> 4);
>> > >> +    x |= (x >> 8);
>> > >> +    x |= (x >> 16);
>> > >> +    return x+1;
>> > >> +}
>> > >
>> > > is 1<<av_log2(x) faster?
>> > 
>> > Might be, but it gives different results, so it's a moot point.
>> 
>> 2<<av_log2(x-1)
>> or whatever
>
> Well, that all depends on what input range is needed.
> E.g. for 0 the documentation does not match the behaviour
> for the original function (returns 0 which is not even a
> power of 2).
> In the worst case, you'd have to do
> return x > 1 ? 2 << av_log2(x - 1) : x;
> I think, which has a small but still existing chance of
> being faster.

That's still easy to optimise, at least for ARM:

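subs  r1, r0, #1      @ r1 = x - 1, set flags (x in r0)
clz   r1, r1          @ r1 = leading zeros of x - 1
movgt r0, #2          @ if x - 1 > 0: r0 = 2
rsb   r1, r1, #31     @ r1 = 31 - clz = log2(x - 1)
lslgt r0, r0, r1      @ if x - 1 > 0: r0 = 2 << log2(x - 1)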

This should be about twice as fast as the shift/or version.
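
For comparison, the two C variants discussed above can be sketched side
by side. This is only an illustration: ilog2_u32() is a hypothetical
stand-in for av_log2() so that the snippet compiles outside FFmpeg, and
the function names are invented for this example.

```c
#include <stdint.h>

/* Hypothetical stand-in for av_log2(): index of the highest set bit.
 * Only meaningful for v > 0. */
static int ilog2_u32(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Shift/or version, as in the patch. */
static uint32_t clp2_shift_or(uint32_t x)
{
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* log2-based version from the thread; 0 and 1 are returned unchanged. */
static uint32_t clp2_log2(uint32_t x)
{
    return x > 1 ? (uint32_t)2 << ilog2_u32(x - 1) : x;
}

/* Returns 1 if both versions agree for every x in [0, limit].
 * limit must be below UINT32_MAX, otherwise the loop never ends. */
static int clp2_versions_agree(uint32_t limit)
{
    for (uint32_t x = 0; x <= limit; x++)
        if (clp2_shift_or(x) != clp2_log2(x))
            return 0;
    return 1;
}
```

Calling clp2_versions_agree() over the input range the decoder actually
uses is a cheap way to confirm that the replacement is safe there.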

-- 
Måns Rullgård
mans at mansr.com