[Ffmpeg-devel] [PATCH] simple_idct_armv5te optimization

Siarhei Siamashka siarhei.siamashka
Sat Sep 30 23:08:24 CEST 2006


On Saturday 30 September 2006 23:17, Måns Rullgård wrote:

> > On Saturday 30 September 2006 20:41, Michael Niedermayer wrote:
> >
> >> patch looks ok (assuming its also faster with an actual video, instead
> >> of just dct-test)
> >
> > A good point. Actually, on real video I was unable to see any visible
> > difference. And considering the increased code size, I now think it
> > might theoretically even cause some slowdown if the program runs out of
> > instruction cache. I remember a discussion on the mplayer developers'
> > mailing list about the h264 decoder and -O4 vs. -O2.
> >
> > So it should be carefully benchmarked and investigated. Regarding
> > the current 'simple_idct_armv5.S', a strange thing is that it
> > provides some performance improvement over the older armv4 code for
> > mpeg1 (up to 10%), but has almost no effect for mpeg4
> > (within 1-2%) in my tests. And according to profiling (on an x86
> > computer, unfortunately, but with the 'generic' cpu and MMX/SSE and other
> > stuff disabled), both mpeg1 and mpeg4 heavily use IDCT, so some
> > effect should have been observed. There should be some
> > explanation. I'll try to find a way to measure the effects of both data
> > and instruction caches.
>
> I'll have a look at it when I get time.  Unfortunately, that will
> probably not be within the next few days.

Well, it appears that was my mistake. My earlier decoding performance tests
used 300-400kbps mpeg4 videos (transcoded for the Nokia 770). I just took a
512x288 1300kbps video clip, benchmarked its decoding, and got almost
the same improvement (~10%) as observed earlier with 1150kbps mpeg1 files.
So the benchmark numbers for this file rank as follows:

old armv4 code: 312 seconds
current armv5te code: 287 seconds
current code with my last patch: 285 seconds

I ran it several times and the results seem to be consistent to within
about 1 second.
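For reference, the timings above can be turned into relative gains with a few lines of C. This is just an illustrative sketch; the helper names are made up for this mail and are not part of FFmpeg. By this arithmetic the armv5te code cuts the armv4 decoding time by about 8% (roughly 1.09x throughput), and the patch adds a further gain of under 1% on top of that.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage of wall time saved going from old_s to new_s seconds. */
double time_saved_pct(double old_s, double new_s)
{
    assert(old_s > 0.0 && new_s > 0.0);
    return 100.0 * (old_s - new_s) / old_s;
}

/* Throughput ratio: how many times faster the new code decodes the file. */
double speedup(double old_s, double new_s)
{
    assert(old_s > 0.0 && new_s > 0.0);
    return old_s / new_s;
}

/* Print both views of one comparison on a single line. */
void report(const char *label, double old_s, double new_s)
{
    printf("%-28s %5.1f%% less time, %.3fx throughput\n",
           label, time_saved_pct(old_s, new_s), speedup(old_s, new_s));
}
```

Usage: report("armv4 -> armv5te", 312, 287); report("armv4 -> patched armv5te", 312, 285); report("armv5te -> patched", 287, 285);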

Another interesting observation is that the Nokia 770 is able not only to
decode this file but also to play it (without scaling and with somewhat jerky
playback), which I did not expect. So it probably has the potential to play
even unconverted video after all :)

So the test with actual video now shows at least some performance
improvement and confirms the dct-test results. I haven't seen any
regressions with it yet (though it would be a good idea to pay more attention
to code size next time). It does not help much with low bitrate files
(so it is probably worth hunting for other bottlenecks there), but it can help
with heavier bitrates.

I have also benchmarked simple_idct_armv5te by running it:
1. many times on the same buffer (data is cached)
2. walking through a big array and feeding it a new buffer each time (data is
read from memory)
Memory access seems to introduce about a 20% slowdown here, so adding some
kind of prefetch to the code might help too.
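The two measurement modes (plus the suggested prefetch) could be reproduced with a harness roughly like this. It is a sketch under assumptions: idct_under_test() is a hypothetical stand-in that only touches the 64 coefficients, so the real IDCT should be substituted when running on the device; timing uses the standard clock(); and the prefetch uses the GCC/Clang __builtin_prefetch hint, which on ARMv5TE targets can map to the PLD instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define BLOCK_COEFFS 64 /* one 8x8 block of int16_t coefficients = 128 bytes */

/* Written after each run so the compiler cannot discard the benchmarked work. */
volatile int16_t bench_sink;

/* Hypothetical stand-in for the IDCT under test (e.g. simple_idct_armv5te).
 * It merely reads and writes every coefficient; replace it with the real
 * transform when measuring on the target. */
static void idct_under_test(int16_t *block)
{
    for (int i = 0; i < BLOCK_COEFFS; i++)
        block[i] = (int16_t)((block[i] >> 1) + i);
}

/* Mode 1: same buffer on every call, so the data stays in the cache. */
double bench_cached(long iterations)
{
    int16_t block[BLOCK_COEFFS] = {0};
    clock_t t0 = clock();
    for (long i = 0; i < iterations; i++)
        idct_under_test(block);
    clock_t t1 = clock();
    bench_sink = block[0];
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Mode 2: walk through a large array so each call works on a block that is
 * (ideally) no longer cached. If use_prefetch is nonzero, the start of the
 * next block is hinted into the cache first. A 128-byte block may span
 * several cache lines, so a real version might issue one hint per line. */
double bench_streaming(size_t nblocks, long iterations, int use_prefetch)
{
    assert(nblocks > 0);
    int16_t *blocks = calloc(nblocks * BLOCK_COEFFS, sizeof *blocks);
    assert(blocks != NULL);
    clock_t t0 = clock();
    for (long i = 0; i < iterations; i++) {
        size_t b = (size_t)i % nblocks;
        if (use_prefetch)
            __builtin_prefetch(&blocks[((b + 1) % nblocks) * BLOCK_COEFFS]);
        idct_under_test(&blocks[b * BLOCK_COEFFS]);
    }
    clock_t t1 = clock();
    bench_sink = blocks[0];
    free(blocks);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Comparing bench_cached(N) with bench_streaming(nblocks, N, 0), where nblocks * 128 bytes is well above the data cache size, gives the cached/uncached ratio; bench_streaming(nblocks, N, 1) then shows whether a one-block-ahead hint recovers any of the ~20%.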



