On Tue, Jan 23, 2007 at 12:22:26AM +0200, Siarhei Siamashka wrote:
> Hello All,
> First some background information and rationale.
> Nokia 770 [1] graphics chip has support for packed YUV422 color format
> (IMGFMT_YUY2 according to ffmpeg classification) but does not support 
> scaling (except for pixel doubling feature which can scale image exactly
> twice). So fullscreen video playback suffers a severe performance penalty 
> if it needs scaling. And I got some information that PXA270 in latest Sharp
> Zaurus PDA [2] also doesn't have hardware scaling capabilities, but does 
> support YUV colorspace (which formats exactly are supported still needs to 
> be clarified). So developing a fast ARM optimized scaler for these and similar
> devices makes sense.
> A natural solution for getting good scaler performance is to use JIT style 
> dynamic code generation. I spent full two days on the last weekend and got
> some initial scaler implementation working (it is quite simple and
> straightforward and uses less than 300 lines of code):
> https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer
> Its API is quite similar to libswscale, but a bit simplified. You need to
> initialize scaler context by providing source and destination resolution, 
> and also quality level setting. Code for scaling of a horizontal line of
> pixels is dynamically generated on this stage. Once context is initialized, 
> it can be used to scale planar YUV image data and get results in YUY2 
> format.
> Horizontal scaler works in the following way: each pixel in the destination
> buffer is either a copy of some pixel in the source buffer or an average value
> (1:1 proportion) of two nearest pixels. It is possible to extend scaling
> precision to add averaging proportions 1:3 and 3:1 with almost no 
> overhead.  Vertical scaling now just maps some source buffer line to each
> destination buffer line, but it can probably be extended to add support for
> 1:1 proportion averaging of two neighbour source pixel lines to get
> destination buffer line.
> So depending on quality setting, we get either nearest neighbour scaler or
> some kind of simplified low precision bilinear scaler. In order to estimate
> performance, I did some benchmarks with mplayer_1.0rc1-maemo.8 [3] which
> already has this JIT code in use.
> # mplayer -nosound -benchmark -quiet -endpos 100 [scaler_settings] video.avi
> *** -sws 4 ***
> SwScaler: Nearest Neighbor / POINT scaler, from yuv420p to yuyv422 using C
> SwScaler: using C scaler for horizontal scaling
> SwScaler: using n-tap C scaler for vertical scaling (BGR)
> SwScaler: 640x272 -> 400x170
> BENCHMARKs: VC:  62.645s VO:  58.738s A:   0.000s Sys:   1.053s =  122.435s
> BENCHMARK%: VC: 51.1654% VO: 47.9746% A:  0.0000% Sys:  0.8599% = 100.0000%
> *** -sws 1 ***
> SwScaler: BILINEAR scaler, from yuv420p to yuyv422 using C
> SwScaler: using C scaler for horizontal scaling
> SwScaler: using n-tap C scaler for vertical scaling (BGR)
> SwScaler: 640x272 -> 400x170
> BENCHMARKs: VC:  64.029s VO: 164.350s A:   0.000s Sys:   1.321s =  229.700s
> BENCHMARK%: VC: 27.8750% VO: 71.5500% A:  0.0000% Sys:  0.5750% = 100.0000%
> *** JIT scaler, quality = 1 (nearest neighbour) ***
> [nokia770] Using ARM JIT scaler (quality=1) to scale 640x272 => 400x170
> BENCHMARKs: VC:  63.033s VO:   5.585s A:   0.000s Sys:   0.940s =   69.559s
> BENCHMARK%: VC: 90.6193% VO:  8.0295% A:  0.0000% Sys:  1.3512% = 100.0000%
> *** JIT scaler, quality = 2 (use pixel copy or 1:1 proportion averaging for
> horizontal scaling, nearest neighbour for vertical scaling) ***
> [nokia770] Using ARM JIT scaler (quality=2) to scale 640x272 => 400x170
> BENCHMARKs: VC:  62.893s VO:   7.551s A:   0.000s Sys:   1.000s =   71.444s
> BENCHMARK%: VC: 88.0310% VO: 10.5686% A:  0.0000% Sys:  1.4004% = 100.0000%
> So performance improvement over standard libswscale scalers (first two runs)
> is really huge. JIT scaler with quality setting 1 and nearest neighbour scaler
> from libswscale are direct competitors here and JIT scaler implementation is
> 10x faster :)
> Using JIT scaler quality 2 settings, I can see some 'sparkles' in the image on
> vertical panning scenes, but horizontal panning looks ok. So I expect a
> good quality after improving vertical scaling by adding lines averaging.
> Now I wonder if it would be a good idea to include this JIT scaler for ARM
> into ffmpeg and what are the requirements for that? Of course I will clean up
> this code first, add more sanity checks and comments (most likely on next
> weekend). But I'm more worried about integration into libswscale code without
> turning it into a mess.

no, there won't be any mess, see swscale.c around line 2047, there are plenty
of special case converters, just add yours there too (I am assuming that your
converter does YV12 -> YUV422 + vertical + horizontal scaling).
if you want your converter to also be used just as a horizontal scaler
together with the existing code for vertical scaling and colorspace
conversion, then see hyscale & hcscale in swscale_template.c, but please
don't follow the bad example there of just dumping the asm in there ...
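
For reference, a minimal plain C sketch of the horizontal step described in
the quoted mail (each destination pixel is either a copy of a source pixel or
a 1:1 average of its two nearest pixels). This is neither the JIT code nor
libswscale code: the name hscale_copy_or_avg, the 16.16 fixed-point stepping
and the 1/4 and 3/4 thresholds are assumptions made purely for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper, not part of libswscale or the Nokia 770 scaler.
 * Scales one line of 8-bit samples from src_w to dst_w pixels.
 * Assumes 0 < src_w < 65536 so the 16.16 fixed-point math cannot overflow. */
static void hscale_copy_or_avg(uint8_t *dst, int dst_w,
                               const uint8_t *src, int src_w)
{
    if (dst_w <= 0 || src_w <= 0)
        return;

    /* Source step per destination pixel, in 16.16 fixed point. */
    uint32_t step = ((uint32_t)src_w << 16) / (uint32_t)dst_w;
    uint32_t pos  = 0;

    for (int x = 0; x < dst_w; x++) {
        int      i    = (int)(pos >> 16);             /* left source pixel   */
        int      j    = (i + 1 < src_w) ? i + 1 : i;  /* right one, clamped  */
        uint32_t frac = pos & 0xFFFF;                 /* position between    */

        if (frac < 0x4000)          /* close to the left pixel: copy it      */
            dst[x] = src[i];
        else if (frac < 0xC000)     /* roughly in the middle: 1:1 average    */
            dst[x] = (uint8_t)((src[i] + src[j] + 1) >> 1);
        else                        /* close to the right pixel: copy it     */
            dst[x] = src[j];

        pos += step;
    }
}
```

A JIT version would make the same copy/average decision once per destination
pixel at context setup and emit straight-line code for it, so nothing is
decided in the per-line loop.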

> 1. Is there any documentation about internal libswscale structure and
> some hacking guidelines?

no, but such docs would be welcome as a patch :)

> 2. I see that scalers from libswscale have to support slices. Is it the only
> extra requirement or I should be aware of something else?

negative stride maybe, but that shouldn't be a problem I guess
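
To illustrate the stride point, a small C sketch of a nearest neighbour
vertical mapping that keeps strides signed, so a bottom-up image (negative
stride, pointer at the last stored row) works without special cases. The
function name vscale_nearest and its parameter list are hypothetical, not
taken from libswscale or from the JIT scaler.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: maps one source line to each destination line.
 * Strides are ptrdiff_t so they may be negative; src and dst point at
 * logical row 0, wherever that row sits in memory. */
static void vscale_nearest(uint8_t *dst, ptrdiff_t dst_stride, int dst_h,
                           const uint8_t *src, ptrdiff_t src_stride, int src_h,
                           size_t row_bytes)
{
    for (int y = 0; y < dst_h; y++) {
        /* Nearest (floor) source line for destination line y. */
        int sy = (int)((int64_t)y * src_h / dst_h);

        /* Signed pointer arithmetic: a negative stride walks backwards. */
        memcpy(dst + y * dst_stride, src + sy * src_stride, row_bytes);
    }
}
```

The important detail is only that the row offset is computed with a signed
type; storing a stride in an unsigned variable is the usual way this breaks.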

> 3. What would be the best mapping of the scaling methods used in this JIT
> scaler code to libswscale scaling algorithms (nearest neighbour is clear, but
> I'm not sure about the rest).

SWS_FAST_BILINEAR for inaccurate bilinear, it's currently used for our x86 JIT
scaler so this seems like a good choice

SWS_BILINEAR for (completely) correct bilinear scaling

and we can add more SWS_* if needed ...
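
A possible shape for that mapping, as a C sketch. The flag values are copied
here so the fragment builds on its own and are meant to mirror swscale.h; the
function jit_quality_for_flags and the convention that 0 means "not handled,
fall back to the generic scaler" are assumptions for illustration only.

```c
/* Local copies so the sketch is self-contained; in real code these come
 * from swscale.h. */
#ifndef SWS_FAST_BILINEAR
#define SWS_FAST_BILINEAR 1
#endif
#ifndef SWS_BILINEAR
#define SWS_BILINEAR      2
#endif
#ifndef SWS_POINT
#define SWS_POINT         0x10
#endif

/* Hypothetical mapping from libswscale flags to the JIT quality levels
 * mentioned in the quoted mail (1 = nearest neighbour, 2 = copy or 1:1
 * averaging). Returns 0 when the JIT scaler should not be used. */
static int jit_quality_for_flags(int flags)
{
    if (flags & SWS_POINT)
        return 1;   /* nearest neighbour */
    if (flags & SWS_FAST_BILINEAR)
        return 2;   /* inaccurate bilinear */
    return 0;       /* e.g. SWS_BILINEAR: leave it to the exact C scaler */
}
```

Returning 0 for SWS_BILINEAR keeps the accurate path untouched, so nobody who
asked for correct bilinear output silently gets the approximate one.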

