[Ffmpeg-devel] [RFC] Addition of JIT accelerated scaler for ARM into libswscale
Siarhei Siamashka
siarhei.siamashka
Mon Jan 22 23:22:26 CET 2007
Hello All,
First some background information and rationale.
The Nokia 770 [1] graphics chip supports the packed YUV422 color format
(IMGFMT_YUY2 in MPlayer's classification) but does not support scaling,
except for a pixel doubling feature that can scale the image by exactly 2x.
So fullscreen video playback suffers a severe performance penalty when
scaling is needed. I have also heard that the PXA270 in the latest Sharp
Zaurus PDA [2] lacks hardware scaling as well, but does support YUV
colorspaces (exactly which formats are supported still needs to be
clarified). So developing a fast ARM-optimized scaler for these and similar
devices makes sense.
A natural way to get good scaler performance is JIT-style dynamic code
generation. I spent two full days last weekend and got an initial scaler
implementation working (it is quite simple and straightforward, and uses
fewer than 300 lines of code):
https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer
Its API is quite similar to that of libswscale, but a bit simplified. You
need to initialize a scaler context by providing the source and destination
resolutions and a quality level setting. Code for scaling a horizontal line
of pixels is dynamically generated at this stage. Once the context is
initialized, it can be used to scale planar YUV image data and get the
results in YUY2 format.
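For readers unfamiliar with this kind of scaler, here is a plain C sketch (no code generation) of what such an API might compute, assuming even widths and showing only the quality=1 (nearest neighbour) path. All names (`scaler_ctx`, `scaler_init`, `scaler_scale`, `scaler_free`) are hypothetical and are not the API of the code in the repository above; the per-pixel source column tables are presumably what a JIT scaler bakes directly into its generated load/store instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scaler context; names are illustrative only. */
typedef struct {
    int src_w, src_h, dst_w, dst_h;
    int quality;    /* stored for completeness; only quality 1 is shown */
    int *luma_x;    /* source luma column for each destination luma pixel */
    int *chroma_x;  /* source chroma column for each destination Y0 U Y1 V pair */
} scaler_ctx;

void scaler_free(scaler_ctx *c)
{
    if (!c) return;
    free(c->luma_x);
    free(c->chroma_x);
    free(c);
}

/* Precompute nearest-neighbour source columns once, at init time. */
scaler_ctx *scaler_init(int src_w, int src_h, int dst_w, int dst_h, int quality)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    assert(src_w % 2 == 0 && src_h % 2 == 0 && dst_w % 2 == 0);
    scaler_ctx *c = calloc(1, sizeof(*c));
    if (!c) return NULL;
    c->src_w = src_w; c->src_h = src_h;
    c->dst_w = dst_w; c->dst_h = dst_h;
    c->quality = quality;
    c->luma_x = malloc((size_t)dst_w * sizeof(int));
    c->chroma_x = malloc((size_t)(dst_w / 2) * sizeof(int));
    if (!c->luma_x || !c->chroma_x) { scaler_free(c); return NULL; }
    for (int x = 0; x < dst_w; x++)
        c->luma_x[x] = (int)((int64_t)x * src_w / dst_w);
    for (int x = 0; x < dst_w / 2; x++)
        c->chroma_x[x] = (int)((int64_t)x * (src_w / 2) / (dst_w / 2));
    return c;
}

/* Scale planar YUV 4:2:0 into packed YUY2 (Y0 U Y1 V), nearest neighbour
   in both directions: each destination line copies one source line. */
void scaler_scale(const scaler_ctx *c,
                  const uint8_t *y, int y_stride,
                  const uint8_t *u, const uint8_t *v, int uv_stride,
                  uint8_t *dst, int dst_stride)
{
    for (int dy = 0; dy < c->dst_h; dy++) {
        int sy = (int)((int64_t)dy * c->src_h / c->dst_h);
        const uint8_t *yl = y + (size_t)sy * y_stride;
        const uint8_t *ul = u + (size_t)(sy / 2) * uv_stride;
        const uint8_t *vl = v + (size_t)(sy / 2) * uv_stride;
        uint8_t *out = dst + (size_t)dy * dst_stride;
        for (int x = 0; x < c->dst_w / 2; x++) {
            out[4 * x + 0] = yl[c->luma_x[2 * x]];
            out[4 * x + 1] = ul[c->chroma_x[x]];
            out[4 * x + 2] = yl[c->luma_x[2 * x + 1]];
            out[4 * x + 3] = vl[c->chroma_x[x]];
        }
    }
}
```

With the benchmark sizes, `scaler_init(640, 272, 400, 170, 1)` would precompute 400 luma and 200 chroma column offsets.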
The horizontal scaler works in the following way: each pixel in the
destination buffer is either a copy of some pixel in the source buffer or
the average (1:1 proportion) of the two nearest pixels. It is possible to
extend scaling precision by adding averaging proportions 1:3 and 3:1 with
almost no overhead. Vertical scaling currently just maps some source buffer
line to each destination buffer line, but it can probably be extended to
support 1:1 averaging of two neighbouring source lines to get each
destination buffer line.
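One possible reading of the quality=2 horizontal rule, written as a plain C reference loop rather than generated code: the bilinear weight of the right neighbour is rounded to 0, 1/2 or 1, so each output sample is either a copy or a 1:1 average. The 16.16 fixed-point stepping, the quarter-way thresholds and the name `scale_line_q2` are assumptions for illustration, not taken from the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical quality=2 horizontal scaler for one line of 8-bit samples.
   Right-neighbour weight is rounded to 0, 1/2 or 1 (thresholds assumed). */
void scale_line_q2(const uint8_t *src, int src_w, uint8_t *dst, int dst_w)
{
    assert(src_w > 0 && dst_w > 0);
    for (int x = 0; x < dst_w; x++) {
        /* source position of destination pixel x, in 16.16 fixed point */
        int64_t pos = ((int64_t)x * src_w << 16) / dst_w;
        int i = (int)(pos >> 16);
        int frac = (int)(pos & 0xFFFF);
        int j = (i + 1 < src_w) ? i + 1 : i;  /* clamp at the right edge */
        if (frac < 0x4000)
            dst[x] = src[i];                   /* weight ~0: copy left pixel */
        else if (frac >= 0xC000)
            dst[x] = src[j];                   /* weight ~1: copy right pixel */
        else
            dst[x] = (uint8_t)((src[i] + src[j]) >> 1); /* 1:1 average */
    }
}
```

In a JIT scaler the copy-or-average choice depends only on `x`, so it could presumably be made once at code generation time, leaving the generated inner loop free of branches.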
So depending on the quality setting, we get either a nearest neighbour
scaler or a kind of simplified low-precision bilinear scaler. In order to
estimate performance, I ran some benchmarks with mplayer_1.0rc1-maemo.8 [3],
which already uses this JIT code.
# mplayer -nosound -benchmark -quiet -endpos 100 [scaler_settings] video.avi
*** -sws 4 ***
SwScaler: Nearest Neighbor / POINT scaler, from yuv420p to yuyv422 using C
SwScaler: using C scaler for horizontal scaling
SwScaler: using n-tap C scaler for vertical scaling (BGR)
SwScaler: 640x272 -> 400x170
BENCHMARKs: VC: 62.645s VO: 58.738s A: 0.000s Sys: 1.053s = 122.435s
BENCHMARK%: VC: 51.1654% VO: 47.9746% A: 0.0000% Sys: 0.8599% = 100.0000%
*** -sws 1 ***
SwScaler: BILINEAR scaler, from yuv420p to yuyv422 using C
SwScaler: using C scaler for horizontal scaling
SwScaler: using n-tap C scaler for vertical scaling (BGR)
SwScaler: 640x272 -> 400x170
BENCHMARKs: VC: 64.029s VO: 164.350s A: 0.000s Sys: 1.321s = 229.700s
BENCHMARK%: VC: 27.8750% VO: 71.5500% A: 0.0000% Sys: 0.5750% = 100.0000%
*** JIT scaler, quality = 1 (nearest neighbour) ***
[nokia770] Using ARM JIT scaler (quality=1) to scale 640x272 => 400x170
BENCHMARKs: VC: 63.033s VO: 5.585s A: 0.000s Sys: 0.940s = 69.559s
BENCHMARK%: VC: 90.6193% VO: 8.0295% A: 0.0000% Sys: 1.3512% = 100.0000%
*** JIT scaler, quality = 2 (use pixel copy or 1:1 proportion averaging for
horizontal scaling, nearest neighbour for vertical scaling) ***
[nokia770] Using ARM JIT scaler (quality=2) to scale 640x272 => 400x170
BENCHMARKs: VC: 62.893s VO: 7.551s A: 0.000s Sys: 1.000s = 71.444s
BENCHMARK%: VC: 88.0310% VO: 10.5686% A: 0.0000% Sys: 1.4004% = 100.0000%
So the performance improvement over the standard libswscale scalers (first
two runs) is really huge. The JIT scaler with quality setting 1 and the
nearest neighbour scaler from libswscale are direct competitors here, and
the JIT scaler is about 10x faster (VO time 5.585s vs 58.738s) :)
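The 10x figure can be checked against the VO times of the first and third runs; the total times show the end-to-end gain, which is smaller because decoding (VC, about 63 s in every run) is unchanged:

```latex
\frac{58.738\ \mathrm{s}}{5.585\ \mathrm{s}} \approx 10.5,
\qquad
\frac{122.435\ \mathrm{s}}{69.559\ \mathrm{s}} \approx 1.76
```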
With the JIT scaler quality 2 setting, I can see some 'sparkles' in the
image in vertically panning scenes, but horizontal panning looks OK. So I
expect good quality once vertical scaling is improved by adding line
averaging.
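Line averaging maps well onto 32-bit registers: four 8-bit samples can be averaged at once with the well-known SWAR identity floor((a + b) / 2) = (a & b) + ((a ^ b) >> 1), masked so that no bit crosses a byte lane. The C sketch below averages two source lines into one destination line; the function names are hypothetical, and the use of this trick (rather than whatever the final ARM code does) is an assumption. A 3:1 weighting could be approximated by nesting, avg(a, avg(a, b)), which hints at why the 1:3 and 3:1 proportions might be cheap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rounded-down average of four packed 8-bit values at once.
   (a & b) keeps the shared bits, ((a ^ b) >> 1) adds half of the differing
   bits; masking with 0xFE.. stops bit 0 of one byte leaking into the next. */
static inline uint32_t avg_u8x4(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Hypothetical helper: destination line = 1:1 average of two source lines. */
void average_lines(const uint8_t *line0, const uint8_t *line1,
                   uint8_t *dst, size_t n)
{
    assert(line0 != NULL && line1 != NULL && dst != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t a, b, r;
        memcpy(&a, line0 + i, 4);  /* memcpy avoids unaligned-access UB */
        memcpy(&b, line1 + i, 4);
        r = avg_u8x4(a, b);
        memcpy(dst + i, &r, 4);
    }
    for (; i < n; i++)             /* leftover bytes, same rounding */
        dst[i] = (uint8_t)((line0[i] + line1[i]) >> 1);
}
```

Because each byte lane is averaged independently, the result does not depend on the CPU's endianness.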
Now I wonder whether it would be a good idea to include this JIT scaler for
ARM in ffmpeg, and what the requirements for that would be. Of course I will
clean up this code first and add more sanity checks and comments (most
likely next weekend). But I'm more worried about integrating it into the
libswscale code without turning it into a mess.
1. Is there any documentation about the internal structure of libswscale,
and some hacking guidelines?
2. I see that scalers in libswscale have to support slices. Is that the only
extra requirement, or should I be aware of something else?
3. What would be the best mapping of the scaling methods used in this JIT
scaler to libswscale scaling algorithms (nearest neighbour is clear, but
I'm not sure about the rest)?
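On slices: `sws_scale()` receives the source image in horizontal bands (its `srcSliceY` and `srcSliceH` parameters). For the current nearest-neighbour vertical mapping, the sketch below finds which destination lines can be produced from one slice, assuming sy = floor(dy * src_h / dst_h) and slice boundaries on even lines so that 4:2:0 chroma lines stay inside the slice; `slice_to_dst_lines` is a hypothetical helper. Line averaging would complicate this, since a destination line may need the last line of one slice and the first line of the next, so the previous slice's last line would presumably have to be kept around.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling of a / b for a >= 0 and b > 0. */
static int64_t div_ceil(int64_t a, int64_t b) { return (a + b - 1) / b; }

/* Hypothetical helper: with sy = floor(dy * src_h / dst_h), return the
   half-open range [*dy_begin, *dy_end) of destination lines whose source
   line falls inside the slice [slice_y, slice_y + slice_h). */
void slice_to_dst_lines(int src_h, int dst_h, int slice_y, int slice_h,
                        int *dy_begin, int *dy_end)
{
    assert(src_h > 0 && dst_h > 0 && slice_y >= 0 && slice_h >= 0);
    assert(slice_y + slice_h <= src_h);
    /* floor(dy*src_h/dst_h) >= S  <=>  dy >= ceil(S*dst_h/src_h) */
    *dy_begin = (int)div_ceil((int64_t)slice_y * dst_h, src_h);
    *dy_end   = (int)div_ceil((int64_t)(slice_y + slice_h) * dst_h, src_h);
}
```

Consecutive slices then yield adjacent, non-overlapping destination ranges, so each destination line is written exactly once.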
[1] http://en.wikipedia.org/wiki/Nokia_770
[2] http://en.wikipedia.org/wiki/Zaurus
[3] https://garage.maemo.org/projects/mplayer/