[Ffmpeg-devel] [RFC] Addition of JIT accelerated scaler for ARM into libswscale

Måns Rullgård mru
Tue Jan 23 00:43:40 CET 2007

"Guillaume POIRIER" <poirierg at gmail.com> writes:

> Hi,
> On 1/22/07, Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:
> [..]
>> A natural solution for getting good scaler performance is to use JIT-style
>> dynamic code generation. I spent two full days last weekend and got
>> an initial scaler implementation working (it is quite simple and
>> straightforward and uses less than 300 lines of code):
>> https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer
>> Its API is quite similar to libswscale's, but a bit simplified. You need to
>> initialize the scaler context by providing the source and destination
>> resolutions and a quality level setting. Code for scaling a horizontal
>> line of pixels is dynamically generated at this stage. Once the context is
>> initialized, it can be used to scale planar YUV image data and get results
>> in YUY2 format.
> I may sound like a rookie for asking this, but could you tell me what
> precisely dynamic code generation allows one to do that can't be done
> with "straight code"?
> Also, why can (optimized) dynamic code be faster than "straight code"?

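Code specialized for one particular scaling can be more efficient than
code that handles the generic case.  For instance, suppose you want to
do bilinear upscaling by a factor of 2.  Code that only does exactly
this scaling is simpler than code that can scale by arbitrary factors,
and a runtime code generator can emit just that specialized code once
the source and destination sizes are known.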
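
To make the difference concrete, here is a minimal sketch in plain C
(not JIT output, and not taken from libswscale or the Nokia 770
scaler).  The names hscale_generic and hscale_2x, and the choice of
16.16 fixed point, are assumptions made only for this illustration.
It compares a horizontal bilinear scaler for arbitrary ratios with
one hard-wired for a factor of 2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic path: bilinear horizontal scaling by an
 * arbitrary ratio, stepping through the source in 16.16 fixed point. */
void hscale_generic(uint8_t *dst, size_t dst_w,
                    const uint8_t *src, size_t src_w)
{
    assert(src_w >= 1 && dst_w >= 1);
    uint32_t step = (uint32_t)(((uint64_t)src_w << 16) / dst_w);
    uint32_t pos = 0;

    for (size_t i = 0; i < dst_w; i++) {
        size_t x = pos >> 16;                    /* integer source index */
        uint32_t frac = pos & 0xFFFF;            /* blend weight, 0..65535 */
        size_t x1 = (x + 1 < src_w) ? x + 1 : x; /* clamp at the right edge */

        dst[i] = (uint8_t)((src[x] * (65536 - frac)
                            + src[x1] * frac + 32768) >> 16);
        pos += step;
    }
}

/* Hypothetical specialized path: exactly 2x upscaling.  The position
 * tracking, the multiplies and the per-pixel clamp all disappear.
 * dst must have room for 2 * src_w pixels. */
void hscale_2x(uint8_t *dst, const uint8_t *src, size_t src_w)
{
    assert(src_w >= 1);
    for (size_t x = 0; x + 1 < src_w; x++) {
        dst[2 * x]     = src[x];                              /* copy */
        dst[2 * x + 1] = (uint8_t)((src[x] + src[x + 1] + 1) >> 1); /* average */
    }
    /* Last source pixel has no right neighbour, so it is repeated. */
    dst[2 * src_w - 2] = src[src_w - 1];
    dst[2 * src_w - 1] = src[src_w - 1];
}
```

With dst_w equal to 2 * src_w, the two functions should produce the
same pixels, but the inner loop of the second is a copy, an add and a
shift.  A runtime generator can go one step further and emit a loop
like that for whatever ratio the context was initialized with.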

> I have never written a single line of such kind of code, so I'm
> curious. Plus, modern CPUs (PPC, x86 at least) make it harder to
> program efficient dynamic code, so I heard.

Modern CPUs require proper instruction scheduling to perform
optimally, that is true.  There is nothing that prevents runtime
generated code from being optimally scheduled.

> For instance, if I remember correctly, the P4 flushes its trace cache
> whenever the code cache is written... pretty inefficient, isn't it?

Everybody knows the P4 is a disaster.  Even Intel admits it now, and
there is nothing left of the P4 in the Core 2 design.

Måns Rullgård
mru at inprovide.com
