[FFmpeg-devel] Once again: Multithreaded H.264 decoding with ffmpeg?

Alexander Strange astrange
Fri May 30 08:04:42 CEST 2008


On May 30, 2008, at 1:52 AM, Jason Garrett-Glaser wrote:

>>> I have been looking into the h264 code and each piece of H.264
>>> documentation I could get my hands on. And I have the impression that
>>> some of the decoding steps (namely residual decoding, deblocking) could
>>> be parallelized quite well. But I don't have any idea how much time the
>>> individual decoding steps take. Does someone happen to have some
>>> numbers? Or a hint how to measure this myself?
>
> [Profile courtesy of Loren Merritt]
>
> ffh264 svn-r11870 (2008-02-04)
> CPU: Core 2, speed 2400.75 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
> unit mask of 0x00 (Unhalted core cycles) count 100000
> samples  %        symbol name
> 168093    9.2010  decode_mb_cabac
> 165494    9.0587  decode_cabac_residual
> 133817    7.3248  fill_caches
> 115161    6.3036  hl_decode_mb_simple
> 111744    6.1166  h264_#_loop_filter_luma_mmx2
> 101511    5.5565  put_h264_chroma_mc8_mmx
> 88055     4.8199  h264_#_loop_filter_chroma_mmx2
> 72618     3.9749  filter_mb_fast
> 70919     3.8819  get_cabac_noinline
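
To read the profile above per decoding step rather than per function, the percentages can be summed by stage. The following is a minimal sketch, not part of the original profile: the stage names and the substring rules that map symbols to stages are assumptions made for illustration, and only the nine symbols quoted above are included, so the totals cover roughly 56% of all samples.

```python
from collections import defaultdict

# The nine rows quoted in the profile: samples, percent, symbol name.
PROFILE = """\
168093    9.2010  decode_mb_cabac
165494    9.0587  decode_cabac_residual
133817    7.3248  fill_caches
115161    6.3036  hl_decode_mb_simple
111744    6.1166  h264_#_loop_filter_luma_mmx2
101511    5.5565  put_h264_chroma_mc8_mmx
88055     4.8199  h264_#_loop_filter_chroma_mmx2
72618     3.9749  filter_mb_fast
70919     3.8819  get_cabac_noinline
"""

# Hypothetical grouping: (substring of the symbol, stage label).
# These rules are an assumption for illustration; the first match wins.
STAGES = [
    ("cabac", "entropy decoding (CABAC)"),
    ("loop_filter", "deblocking"),
    ("filter_mb", "deblocking"),
    ("_mc", "motion compensation"),
    ("fill_caches", "neighbour cache setup"),
]


def stage_of(symbol):
    """Return the stage label for a symbol, or 'other' if no rule matches."""
    for needle, stage in STAGES:
        if needle in symbol:
            return stage
    return "other"


def summarize(text):
    """Sum the percent column per stage for every three-column row."""
    totals = defaultdict(float)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip headers or blank lines
        _samples, percent, symbol = parts
        totals[stage_of(symbol)] += float(percent)
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(summarize(PROFILE).items(), key=lambda kv: -kv[1])
    for stage, pct in ranked:
        print(f"{pct:6.2f}%  {stage}")
```

Under this grouping the CABAC functions come out ahead of the deblocking functions among the listed symbols; hl_decode_mb_simple falls into "other" because it spans several steps.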

There's been some work since then: Loren wrote SSSE3 MC (motion
compensation) functions, and the rest might be a bit faster too. I'd
guess fill_caches and the loop filter matter more now. If you want to
look at those, that would be great, but make sure you're good at
assembly first.

(I have some patches in my head that will improve  
decode_cabac_residual, but you'd like me to do frame multithreading  
first, right?)



