[FFmpeg-devel] Pipeline: H.264 speed improvements

Wed Dec 24 01:46:23 CET 2008

On Tue, Dec 23, 2008 at 7:40 PM, Guillaume POIRIER <poirierg at gmail.com> wrote:
> Hello,
>
> On Wed, Dec 24, 2008 at 12:02 AM, Jason Garrett-Glaser
> <darkshikari at gmail.com> wrote:
>>
>> For ARM this can be special-cased.  Intel CPUs have a 1-3 cycle CLZ
>> (depends on the CPU) but on AMD chips this can cost >10 cycles, so a
>> table is generally preferred on x86.
>
> The PPC970 (aka G5) has a 2 cycle latency for cntlzw and can do 2 of
> these per cycle.
> The PPC7450 (aka G4) has a 1 cycle latency.
>
> Note that to the best of my knowledge, there's no PPC inline assembly
> in FFmpeg, so this information is quite theoretical, all the more so
> since I have never written a single PPC function in assembly.

Couldn't the GCC intrinsic __builtin_clz() be used?  AFAIK it has been
supported by GCC for quite a while (I know 3.4 supports it).
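
A minimal sketch of what that could look like, not actual FFmpeg code.
The names clz32() and clz32_portable() are made up for illustration, and
it assumes unsigned int is 32 bits wide.  Note that __builtin_clz() is
undefined for an input of 0, so that case is handled explicitly.  The
fallback here is a plain shift loop rather than the table approach
mentioned above, just to keep the example short.

```c
#include <stdint.h>

/* Hypothetical portable fallback: count leading zero bits with a
 * shift-and-test loop. Returns 32 for x == 0. */
int clz32_portable(uint32_t x)
{
    int n = 0;

    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical wrapper: use the GCC builtin when available.
 * Assumes unsigned int is 32 bits, since __builtin_clz() takes
 * an unsigned int. */
int clz32(uint32_t x)
{
#if defined(__GNUC__)
    /* __builtin_clz(0) is undefined, so guard it. */
    return x ? __builtin_clz(x) : 32;
#else
    return clz32_portable(x);
#endif
}
```

On targets with a native count-leading-zeros instruction (cntlzw on PPC,
CLZ on ARM), the compiler can lower the builtin to that instruction, so
no inline assembly would be needed.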

Dark Shikari