[FFmpeg-devel] Some optimization on JPEG decoding

Michael Niedermayer michaelni
Tue Jun 26 22:38:40 CEST 2007


On Tue, Jun 26, 2007 at 06:19:58PM +0200, Cyril Russo wrote:
> Hi all,
>   Here are some simple ideas I've implemented on my local copy of
> libavcodec which might be of interest to you.
> Concerning JPEG decoding, I've added support for thumbnail decoding.
> The idea is to decode only the DC coefficient of each DCT block, and 
> produce a picture which is 8 times smaller in width and height.
> The new thumbnail decoding uses its own decode_block, which ignores the 
> AC part of the DCT.
> It also uses its own decode_scan function, which shortcuts the iDCT call 
> into a simple "*ptr = dcVal >> 3;"
> As a result, decoding a classic 5MP JPEG picture takes 110ms (averaged 
> over 272 frames) on my computer (plus the downsampling, not included), 
> while the new thumbnail decoding takes only 55ms (averaged over 272 frames).
> So, if you need to generate thumbnails quickly this is clearly a good 
> optimization (50% less computation time).

IIRC lowres mode is already supported in jpeg; if you have some improvements
for that, they are welcome
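
For readers following the thread, the DC-only shortcut quoted above can be sketched in C roughly as follows. This is a minimal illustration, not the poster's patch and not FFmpeg's lowres code: the helper names (clip_u8, dc_thumbnail) and the buffer layout are hypothetical. It assumes the DC values are already dequantized and that the JPEG level shift of 128 has not been folded into them; a decoder that folds the shift into the DC predictor would drop the "+ 128".

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 8-bit sample range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical helper: dc[] holds one dequantized DC coefficient per 8x8
 * block, in raster order (blocks_w x blocks_h). Each block becomes one
 * output pixel, so the picture shrinks by 8 in both dimensions.
 */
static void dc_thumbnail(const int16_t *dc, int blocks_w, int blocks_h,
                         uint8_t *dst, ptrdiff_t stride)
{
    for (int y = 0; y < blocks_h; y++) {
        for (int x = 0; x < blocks_w; x++) {
            /* With only the DC term present, the 8x8 inverse DCT yields
             * DC/8 for every sample. The >> assumes an arithmetic shift
             * for negative values, as on common compilers. */
            int mean = dc[y * blocks_w + x] >> 3;
            dst[y * stride + x] = clip_u8(mean + 128); /* assumed level shift */
        }
    }
}
```

A caller would run this once per colour plane, with blocks_w and blocks_h being the plane dimensions divided by 8.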

> The other idea I've implemented is about speeding up the JPEG decoding 
> for current code.
> Current code does (pseudo code) :
>    1) for all macro blocks
>       1) Is it progressive ?
>           1) Ok, decode block
>           2) Not ok, decode block
>       2) Is it progressive ?
>           1) Ok, idct_put
>           2) Not ok, idct_add
> My code does:
> 1) Is it progressive ?
>     1) Ok,  for all macro blocks
>         1) decode blocks (plural here, my code currently does 32 blocks 
> in a batch)
>         2) idct_put
>     2) Not ok, for all macro blocks
>         1) decode blocks (plural here, my code currently does 32 blocks 
> in a batch)
>         2) idct_add

if it's clean (no code duplication but rather uses always_inline) and faster
then it's welcome
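
One way to hoist the mode check out of the loop without duplicating the loop body is to write the body once in an always-inlined function that takes the mode as a parameter, then call it with a literal constant from two thin wrappers. The sketch below is only an illustration of that pattern: ALWAYS_INLINE is a local stand-in for FFmpeg's av_always_inline (GCC/Clang attribute syntax assumed), and put_block, add_block and the decode_scan_* names are hypothetical placeholders, not the real mjpeg functions. The flag is called `accumulate` rather than `progressive` because which JPEG mode maps to put and which to add is left to the real decoder.

```c
#include <stdint.h>

/* Stand-in for av_always_inline; attribute syntax is GCC/Clang specific. */
#if defined(__GNUC__)
#define ALWAYS_INLINE static inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE static inline
#endif

enum { BLOCK_COEFFS = 64 };

/* Hypothetical placeholder for an idct_put-style routine: overwrite. */
static void put_block(int16_t *dst, const int16_t *src)
{
    for (int i = 0; i < BLOCK_COEFFS; i++)
        dst[i] = src[i];
}

/* Hypothetical placeholder for an idct_add-style routine: accumulate. */
static void add_block(int16_t *dst, const int16_t *src)
{
    for (int i = 0; i < BLOCK_COEFFS; i++)
        dst[i] += src[i];
}

/* The loop body exists once. `accumulate` is a literal constant at each
 * call site below, so after inlining the compiler can drop the branch. */
ALWAYS_INLINE void decode_scan_impl(int16_t *dst, const int16_t *src,
                                    int nb_blocks, int accumulate)
{
    for (int b = 0; b < nb_blocks; b++) {
        if (accumulate)
            add_block(dst + b * BLOCK_COEFFS, src + b * BLOCK_COEFFS);
        else
            put_block(dst + b * BLOCK_COEFFS, src + b * BLOCK_COEFFS);
    }
}

static void decode_scan_put(int16_t *dst, const int16_t *src, int nb_blocks)
{
    decode_scan_impl(dst, src, nb_blocks, 0);
}

static void decode_scan_add(int16_t *dst, const int16_t *src, int nb_blocks)
{
    decode_scan_impl(dst, src, nb_blocks, 1);
}

/* The mode is tested once per scan instead of once per block. */
static void decode_scan(int16_t *dst, const int16_t *src, int nb_blocks,
                        int accumulate)
{
    if (accumulate)
        decode_scan_add(dst, src, nb_blocks);
    else
        decode_scan_put(dst, src, nb_blocks);
}
```

Whether this is actually faster than the per-block test depends on the compiler and on branch prediction, so it would need to be benchmarked.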

> The 1.1.1 part decodes 32 DCT blocks sequentially (so the processor can 
> keep the 32 DCT blocks in cache), and part 1.1.2 performs 32 iDCTs 
> sequentially (again, this clearly improves cache locality).
> The modification improved the decoding time to 92ms (averaged over 272 
> frames) on my computer. This is a 16% speedup.
> I've tried different DCT sequence sizes, and 32 is quite good (32 blocks 
> take exactly 4096 bytes).
> I think the same idea could be applied to other codecs as well.
> I've tried to decode all the DCT blocks first, then perform all the IDCTs, 
> in 2 separate passes. There was no speed increase, as the DCT coefficients 
> take twice the space of the current picture plane, so we soon run out of cache.
> It might be of interest however to perform the IDCT on the GPU (if 
> anyone is interested, I should still have some code about this).
> From NVidia's own tests, the IDCT on the GPU takes 20x less time than the 
> CPU version, so it might finally be worth the double memory requirement.
> If anyone is interested, please mail me, I'll send my changes.

you can post them here
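
The batching idea quoted above (decode a run of blocks, then transform the same run while the coefficients are still in cache) can be sketched as below. The 4096-byte figure is consistent with 32 blocks of 64 coefficients at 2 bytes each, which is what the sketch assumes. decode_one and transform_one are hypothetical placeholders standing in for the entropy decoder and the IDCT; they are not FFmpeg functions and do no real decoding.

```c
#include <stdint.h>
#include <string.h>

enum { BLOCK_COEFFS = 64, BATCH = 32 };

typedef int16_t coeff_block[BLOCK_COEFFS];

/* 32 blocks * 64 coefficients * 2 bytes = 4096 bytes. */
_Static_assert(sizeof(coeff_block[BATCH]) == 4096, "batch must be 4096 bytes");

/* Placeholder for entropy decoding: copies 64 coefficients from `in`. */
static void decode_one(int16_t *blk, const int16_t *in)
{
    memcpy(blk, in, sizeof(coeff_block));
}

/* Placeholder for IDCT + store: level-shifts and clamps each value. */
static void transform_one(uint8_t *out, const int16_t *blk)
{
    for (int i = 0; i < BLOCK_COEFFS; i++) {
        int v = blk[i] + 128;
        out[i] = v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
    }
}

static void process_blocks(uint8_t *out, const int16_t *in, int nb_blocks)
{
    coeff_block batch[BATCH]; /* small scratch buffer, reused per batch */

    for (int first = 0; first < nb_blocks; first += BATCH) {
        int left = nb_blocks - first;
        int n = left < BATCH ? left : BATCH;

        /* Pass 1: decode up to BATCH blocks into the scratch buffer. */
        for (int i = 0; i < n; i++)
            decode_one(batch[i], in + (first + i) * BLOCK_COEFFS);

        /* Pass 2: transform the same blocks while they are likely cached. */
        for (int i = 0; i < n; i++)
            transform_one(out + (first + i) * BLOCK_COEFFS, batch[i]);
    }
}
```

BATCH is the knob to tune: the quoted measurements found 32 to work well on one machine, and other cache sizes may favour a different value.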

> BTW, my branch is different from the current SVN version, and I haven't 
> even tried to comply with whatever the coding style of the moment is.
> I clearly don't have time to rewrite the file multiple times, like last 
> time. If you are in the mood to do it, you're welcome.

well, if someone (you or someone else) does provide clean patches which
pass review then they are welcome
messy patches though won't reach svn no matter what great improvements
they provide; you can fork ffmpeg and learn on your own why applying
messy patches is a very very bad idea

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates