[FFmpeg-devel] [PATCH] More H.264 decoding speed tweaks

Loren Merritt lorenm
Mon Jun 23 21:48:05 CEST 2008


On Sun, 22 Jun 2008, Jason Garrett-Glaser wrote:

> Odd that my benchmarks were otherwise; perhaps it's more
> source-dependent than I thought?

16 more movies, or clips thereof. Again 10 runs each, on a Core 2 E6600.
  704x400   706kbit  svn:  78.61 +/- 0.12  patched:  82.03 +/- 0.11  (+4.35% +/- 0.21%)
  704x400   771kbit  svn:  61.56 +/- 0.12  patched:  64.64 +/- 0.11  (+5.01% +/- 0.27%)
1280x720  1170kbit  svn: 196.65 +/- 0.27  patched: 208.57 +/- 0.40  (+6.07% +/- 0.25%)
1280x720  1604kbit  svn: 207.14 +/- 0.18  patched: 219.12 +/- 0.25  (+5.79% +/- 0.15%)
  704x480  1730kbit  svn: 124.98 +/- 0.17  patched: 127.23 +/- 0.20  (+1.80% +/- 0.21%)
1280x720  3650kbit  svn:  75.65 +/- 0.07  patched:  78.17 +/- 0.09  (+3.33% +/- 0.15%)
1920x1080 3658kbit  svn:  20.48 +/- 0.04  patched:  21.52 +/- 0.04  (+5.03% +/- 0.30%)
1280x528  4434kbit  svn:  39.92 +/- 0.05  patched:  40.70 +/- 0.06  (+1.96% +/- 0.20%)
1280x544  6399kbit  svn:  25.65 +/- 0.04  patched:  25.96 +/- 0.05  (+1.21% +/- 0.26%)
1280x534  6868kbit  svn: 230.24 +/- 0.36  patched: 234.44 +/- 0.26  (+1.82% +/- 0.19%)
1280x536  6964kbit  svn:  29.60 +/- 0.05  patched:  29.94 +/- 0.05  (+1.15% +/- 0.22%)
1920x784  7052kbit  svn:  29.46 +/- 0.05  patched:  30.13 +/- 0.04  (+2.30% +/- 0.21%)
1920x1040 7352kbit  svn: 197.70 +/- 0.24  patched: 202.06 +/- 0.24  (+2.20% +/- 0.17%)
1920x1040 7457kbit  svn:  44.82 +/- 0.09  patched:  46.53 +/- 0.05  (+3.81% +/- 0.23%)
1280x536  7534kbit  svn:  16.26 +/- 0.03  patched:  16.40 +/- 0.02  (+0.85% +/- 0.22%)
1280x536  7670kbit  svn:  62.17 +/- 0.04  patched:  63.59 +/- 0.06  (+2.29% +/- 0.12%)

... so yes, the amount is source-dependent, but I failed to find any source
where dc_add wasn't a gain.

> One question, Loren, while we're on this topic; how should your SSE2
> iDCT4x4 be implemented when we start bringing x264's nasm asm to
> ffmpeg?  Should we check the DC coefficient for each 8x4 block and
> make an 8x4 dc_add, or what?

Yes. The question is how to modify the C so that it can use 8x4 without
losing speed on non-SSE2 CPUs.
(a) Duplicate the loops and put a CPU check in the C.
(b) Define an 8x4 MMX IDCT which just calls two 4x4 MMX IDCTs. Check two
blocks at a time, and call either the 8x4 DC add, the 8x4 IDCT, or the 4x4
version of each. DC add uses packed bytes, so in MMX it can handle widths
up to 8 at no speed loss, which might make up for the extra branch.

--Loren Merritt