[FFmpeg-devel] [PATCH] Efficiently support several output pixel formats in Cinepak decoder

Wed Feb 8 09:45:52 EET 2017

On Tue, Feb 07, 2017 at 12:54:23PM -0500, Ronald S. Bultje wrote:
> On Tue, Feb 7, 2017 at 12:04 PM, <u-9iep at aetey.se> wrote:
> 
> > cinepak, rgb24            19.7     (via the fast bilinear swscaler)
> > cinepak, internal rgb565   6.0
> 
> 
> The reason that your decoder-integrated colorspace convertor is so much
> faster than swscale is because swscale is converting internally to yuv420p
> using a scaling filter, and then back to rgb565 using another scaling
> filter.

(If that is indeed what happens, it hardly sounds efficient, and the
conversion to yuv420p is not lossless in itself either. In other words, a shame?)

> This is "easily" solved by adding a direct (possibly
> x86-simd-accelerated) rgb24-to-rgb565 converter into
> libswscale/swscale_unscaled.c, which would likely have nearly identical
> performance to what you're seeing here. Possibly even faster, because
> you're allowing for simd optimizations.

Unfortunately, this cannot come anywhere near identical performance, because
it would have to process several times more data (the whole frame rather than
the codebook).
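To make the frame-versus-codebook point concrete, here is a minimal C sketch
under stated assumptions, not FFmpeg code. The struct CodebookEntryYUV and both
function names are hypothetical. The entry layout (four luma samples of a 2x2
vector plus one signed chroma pair) and the Cinepak-style colour approximation
R = Y + 2V, G = Y - U/2 - V, B = Y + 2U are assumptions for illustration. The
frame-level function stands in for what a direct rgb24-to-rgb565 path in
libswscale would have to do on every frame.

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into RGB565: 5 bits red, 6 bits green, 5 bits blue. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Clamp an intermediate colour value to the 0..255 range. */
static uint8_t clip_u8(int x)
{
    return (uint8_t)(x < 0 ? 0 : x > 255 ? 255 : x);
}

/* Hypothetical codebook entry: luma of a 2x2 vector plus one chroma pair. */
typedef struct {
    uint8_t y[4];
    int8_t  u, v;
} CodebookEntryYUV;

/* Convert n codebook entries (4 pixels each) to RGB565 once, whenever the
 * codebook is updated. Uses the assumed approximation
 * R = Y + 2V, G = Y - U/2 - V, B = Y + 2U.
 * Returns the number of pixels converted. */
static long convert_codebook_rgb565(const CodebookEntryYUV *cb, int n,
                                    uint16_t (*out)[4])
{
    for (int i = 0; i < n; i++) {
        int u = cb[i].u, v = cb[i].v;
        for (int k = 0; k < 4; k++) {
            int y = cb[i].y[k];
            out[i][k] = pack_rgb565(clip_u8(y + 2 * v),
                                    clip_u8(y - u / 2 - v),
                                    clip_u8(y + 2 * u));
        }
    }
    return (long)n * 4;
}

/* A frame-level rgb24 -> rgb565 pass must read and write every pixel
 * of every frame. Returns the number of pixels converted. */
static long convert_frame_rgb24_to_rgb565(const uint8_t *src, int w, int h,
                                          uint16_t *dst)
{
    long count = (long)w * h;
    for (long i = 0; i < count; i++)
        dst[i] = pack_rgb565(src[3 * i], src[3 * i + 1], src[3 * i + 2]);
    return count;
}
```

For a 320x240 frame the frame pass handles 76800 pixels, while converting one
full 256-entry codebook handles 1024. The real ratio depends on how many strips
and codebooks a stream uses and how often they are updated, and the frame pass
comes on top of the decoder having already written the rgb24 frame.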

Besides that, there would be at least one extra copy operation over
each frame, even if the conversion itself were infinitely fast.

Generally:

I value layered design as much as you do, but it introduces limitations.

For comparison, a well-known example from a different domain:
ZFS. It bypassed several "design layers" in the storage subsystem,
which allowed a lot of improvement and did not render ZFS unmaintainable.

The shortcuts I add (I do not introduce the practice, I just add to the
shortcuts already present) are of exactly the same nature as a specialized
converter in libswscale. I just put them in a place where they are several
times more efficient, because there is several times less data to handle.
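As an illustration of why the decoder is the cheaper place, here is a minimal
C sketch, assuming the codebook has already been converted to RGB565 and is
stored as a flat table of four uint16_t values per entry. Once that is done,
painting a 4x4 block is a pure copy with no per-pixel colour math. The function
names are hypothetical, and the V1/V4 block layout shown (V4: one entry per 2x2
quadrant; V1: one entry with each pixel scaled up to 2x2) is an assumption
based on how Cinepak is commonly described.

```c
#include <stddef.h>
#include <stdint.h>

/* V4 mode: four entries, one per 2x2 quadrant (top-left, top-right,
 * bottom-left, bottom-right). Entry pixels are in row order:
 * [0] [1] on top, [2] [3] below. dst is the block's top-left pixel,
 * stride is in pixels, and entry i starts at cb[4 * i]. */
static void paint_v4_rgb565(uint16_t *dst, ptrdiff_t stride,
                            const uint16_t *cb, const uint8_t idx[4])
{
    for (int q = 0; q < 4; q++) {
        const uint16_t *e = cb + 4 * idx[q];
        uint16_t *p = dst + (q >> 1) * 2 * stride + (q & 1) * 2;
        p[0] = e[0];
        p[1] = e[1];
        p[stride] = e[2];
        p[stride + 1] = e[3];
    }
}

/* V1 mode: a single entry, each of its pixels replicated into a 2x2 patch. */
static void paint_v1_rgb565(uint16_t *dst, ptrdiff_t stride,
                            const uint16_t *cb, uint8_t idx)
{
    const uint16_t *e = cb + 4 * idx;
    for (int k = 0; k < 4; k++) {
        uint16_t *p = dst + (k >> 1) * 2 * stride + (k & 1) * 2;
        p[0] = p[1] = p[stride] = p[stride + 1] = e[k];
    }
}
```

Supporting another output format in this scheme means another codebook
conversion and another set of copy loops for that pixel size, while the
per-frame work stays a copy.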

Cinepak is hardly going to make an impact similar to ZFS now :)
but it was in fact very big in its day.

In certain respects, thanks to its original design, it is still superior to
virtually anything else at hand. With the proposed optimization we get the
best out of its virtues. It would be a waste to ignore that possibility.

Regards,
Rune