# [FFmpeg-devel] [PATCH][RFC] Lagarith Decoder.

Måns Rullgård mans
Fri Aug 14 23:32:58 CEST 2009

```
Nathan Caldwell <saintdev at gmail.com> writes:

> On Wed, Aug 12, 2009 at 7:54 AM, Reimar
> Döffinger<Reimar.Doeffinger at gmx.de> wrote:
>> On Wed, Aug 12, 2009 at 02:12:55PM +0200, Michael Niedermayer wrote:
>>> On Mon, Aug 10, 2009 at 11:42:19PM -0600, Nathan Caldwell wrote:
>>> > On Sat, Aug 8, 2009 at 6:32 AM, Michael Niedermayer<michaelni at gmx.at> wrote:
>>> > >> +/* Fast round up to least power of 2 >= to x */
>>> > >> +static inline uint32_t clp2(uint32_t x)
>>> > >> +{
>>> > >> +    x--;
>>> > >> +    x |= (x >> 1);
>>> > >> +    x |= (x >> 2);
>>> > >> +    x |= (x >> 4);
>>> > >> +    x |= (x >> 8);
>>> > >> +    x |= (x >> 16);
>>> > >> +    return x+1;
>>> > >> +}
>>> > >
>>> > > is 1<<av_log2(x) faster?
>>> >
>>> > Might be, but it gives different results, so it's a moot point.
>>>
>>> 2<<av_log2(x-1)
>>> or whatever
>>
>> Well, that all depends on what input range is needed.
>> E.g. for 0 the documentation does not match the behaviour
>> for the original function (returns 0 which is not even a
>> power of 2).
>> In the worst case, you'd have to do
>> return x > 1 ? 2 << av_log2(x - 1) : x;
>> I think, which has a small but still existing chance of
>> being faster.
>
> Well, that went OT rather quickly, lol.
> 0 input doesn't really matter. If we have a cumulative probability of
> 0, then that means all probabilities are 0 and we have larger problems
> than nearest power of 2 being incorrect.
> Anyway, for my tests clp2 was faster than av_log2 by quite a large
> margin (~2000 dezicycles for av_log2 vs. ~400 dezicycles for clp2,
> tested on both Core2 and lolAtom and got the same results). However
> this is only run once per plane, and av_log2 looks cleaner, so I'll

Did you try using an av_log2() implementation using CLZ, BSR or
similar instructions?  The shift/or sequence above may or may not be
faster than the current av_log2().  I timed a few variants on ARM, and
got these numbers:

2<<av_log2(x-1) w/ gcc:                 14 cycles
2<<av_log2(x-1) naively hand-assembled: 11
clp2() above w/ gcc (doesn't mess up):  12
hand-written asm using CLZ:              5

--
Måns Rullgård
mans at mansr.com

```
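
For reference, the two approaches compared in this thread can be sketched side by side. This is only an illustration, not code from the patch or from libavutil: `clp2()` is the shift/or routine quoted above, while `clp2_clz()` is a hypothetical name for a variant that assumes the GCC/Clang builtin `__builtin_clz` (which typically compiles to CLZ on ARM or BSR/LZCNT on x86) and a 32-bit `unsigned int`. It follows the `x > 1 ? 2 << log2(x - 1) : x` form suggested in the quoted discussion, so it returns the same values as `clp2()`, including 0 for an input of 0.

```c
#include <stdint.h>

/* Shift/or version from the quoted patch: smallest power of 2 >= x.
 * Returns 0 for x == 0 and for x > 2^31, where the result wraps. */
static inline uint32_t clp2(uint32_t x)
{
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Hypothetical CLZ-based variant (the name is an assumption).
 * 31 - __builtin_clz(v) is floor(log2(v)) for v != 0; the builtin is
 * undefined for 0, so x <= 1 is handled before calling it. */
static inline uint32_t clp2_clz(uint32_t x)
{
    if (x <= 1)
        return x;
    return UINT32_C(2) << (31 - __builtin_clz(x - 1));
}
```

Whether the builtin version is actually faster depends on the compiler and target, as the ARM timings above suggest, so it would need measuring on each platform of interest.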