[FFmpeg-devel] multithreaded H.264 / macroblock layer

Fri Jun 12 17:34:57 CEST 2009

 Hello!

I am new to the mailing list.
I am currently working on my diploma thesis on the parallelization of an
H.264 decoder, and I would like to use FFmpeg for it.
I have spent some time reading the spec and a number of papers, many of
which propose macroblock-level parallelism.

As I understand it, a picture-based parallelization approach scales very
poorly when pictures are processed "as a whole".
However, one could start processing the macroblocks of a new picture while
the previous one is still being decoded (provided the motion vectors point
into a region of the reference picture that has already been decoded). One
problem I see with this solution is that managing the reference picture
lists gets more complicated when several pictures are decoded at the same
time (speaking in general here, I don't know enough yet about FFmpeg's
implementation). Maybe someone could tell me how complex this would really
be?
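
To make the "motion vectors within an already decoded range" condition
concrete, here is a minimal sketch in C. It is not taken from FFmpeg: the
function names, the parameters and the rows_done progress counter are
hypothetical. It assumes quarter-sample luma motion vectors and the 6-tap
luma interpolation filter of H.264, which reads up to 3 rows below the block
at fractional positions, and it assumes rows_done counts reference rows that
are completely finished, deblocking included.

```c
#include <assert.h>

/* Lowest row of the reference picture that a luma block reads.
 * blk_y, blk_h: top row and height of the block in the current picture.
 * mv_y_qpel:    vertical motion vector in quarter-sample units.
 * pic_height:   picture height in rows; reads outside are clamped to the
 *               picture edge. All names here are hypothetical. */
static int lowest_ref_row(int blk_y, int blk_h, int mv_y_qpel, int pic_height)
{
    assert(blk_h > 0 && pic_height > 0);

    /* Split the vector into a floored integer part and a 0..3 fraction,
     * without relying on right-shifting a negative value. */
    int frac     = ((mv_y_qpel % 4) + 4) % 4;
    int int_part = (mv_y_qpel - frac) / 4;

    int row = blk_y + int_part + blk_h - 1;
    if (frac)
        row += 3;               /* extra rows read by the 6-tap filter */

    if (row < 0)
        row = 0;
    if (row > pic_height - 1)
        row = pic_height - 1;
    return row;
}

/* Nonzero if the reference picture has finished at least the needed rows. */
static int ref_rows_ready(int lowest_row, int rows_done)
{
    return lowest_row < rows_done;
}
```

A decoding thread would evaluate ref_rows_ready() before motion compensation
and wait if it returns 0. This only covers the sample dependency; the
reference list management asked about above is a separate problem.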

Since I have practically no experience in C programming (I hardly know more
than the theory), I want to try something that can be implemented in a
straightforward way without major changes to the code. That is another
reason why I want to go for macroblock-level parallelism.

The idea is to decode the macroblocks of a picture in a diagonal wavefront
from top-left to bottom-right. That way, the macroblocks on a diagonal line
are independent of each other and can be decoded in parallel (the diagonal
is not exactly 45 degrees, because of the dependency on the top-right MB).
I am aware that entropy decoding cannot be parallelized at the macroblock
level. Therefore, the first step I would like to try is to separate the
entropy decoding from the rendering of the image samples in the function
decode_slice() in the file h264.c.
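
The wavefront order described above can be sketched as follows. This is an
illustration only, with hypothetical helper names that do not exist in
h264.c. It assumes each macroblock depends on its left, top-left, top and
top-right neighbours, so macroblock (x, y) can be placed on wave x + 2*y and
all macroblocks sharing a wave index are independent of each other.

```c
#include <assert.h>

/* Wave index of macroblock (mb_x, mb_y). The left and top-right neighbours
 * are on wave - 1, the top neighbour on wave - 2 and the top-left one on
 * wave - 3, so every dependency is finished before this wave starts. */
static int mb_wave(int mb_x, int mb_y)
{
    return mb_x + 2 * mb_y;
}

/* Number of waves needed for a picture of mb_width x mb_height MBs. */
static int wave_count(int mb_width, int mb_height)
{
    assert(mb_width > 0 && mb_height > 0);
    return mb_wave(mb_width - 1, mb_height - 1) + 1;
}

/* Collect the macroblocks of wave w into xs[]/ys[] (each with room for
 * mb_height entries) and return how many there are. These are the MBs
 * that could be handed to worker threads at the same time. */
static int mbs_in_wave(int w, int mb_width, int mb_height, int *xs, int *ys)
{
    int n = 0;

    assert(w >= 0);
    for (int y = 0; y < mb_height; y++) {
        int x = w - 2 * y;      /* at most one MB per row lies on wave w */
        if (x >= 0 && x < mb_width) {
            xs[n] = x;
            ys[n] = y;
            n++;
        }
    }
    return n;
}
```

For a picture 5 MBs wide and 3 MBs high this gives 9 waves, and wave 4
contains the three macroblocks (4,0), (2,1) and (0,2). The available
parallelism therefore ramps up at the start of a picture and down again at
its end.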

I think the functions that have to be separated are decode_mb_cabac(h)
(or its CAVLC counterpart) and hl_decode_mb(h), both of which are called in
the for-loop that processes all the macroblocks in raster scan order. After
my first unsuccessful attempts to separate the functions by introducing a
separate loop for rendering, I noticed that motion and intra prediction data
is probably only stored for the current macroblock, so I will have to
introduce some arrays that store the entropy-decoded data of a whole slice
before the rendering stage is executed. I have already browsed the different
data structures (namely MpegEncContext and H264Context), but since I do not
know much about the implementation of the H.264 decoder, finding the
structures concerned is quite difficult. I am also wondering by which
logical criteria the data is split between MpegEncContext and H264Context,
since macroblock-related data can be found in both of them. Do I have to
buffer data from both of these structures?
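
The two-stage structure I have in mind could look roughly like the sketch
below. Everything in it is hypothetical: MBSyntax, DemoCtx,
entropy_decode_mb() and render_mb() are stand-ins for illustration and are
not FFmpeg structures or functions, and the fields listed in MBSyntax are
only a guess at what would have to be buffered (assuming 4:2:0 video, i.e.
384 coefficients per macroblock).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-macroblock buffer for the output of entropy decoding. */
typedef struct MBSyntax {
    int     mb_type;
    int8_t  intra_pred_mode[16];  /* one mode per 4x4 luma block */
    int8_t  ref_idx[2][4];        /* per list, per 8x8 partition */
    int16_t mv[2][16][2];         /* per list, per 4x4 block, (x, y) */
    int16_t coeffs[384];          /* 256 luma + 2 * 64 chroma (4:2:0) */
} MBSyntax;

/* Hypothetical decoder state; only counters so the sketch can run. */
typedef struct DemoCtx {
    int  next_type;
    int  rendered_count;
    long rendered_sum;
} DemoCtx;

/* Stand-in for the entropy decoding stage (strictly sequential). */
static int entropy_decode_mb(DemoCtx *c, MBSyntax *out)
{
    out->mb_type = c->next_type++;
    return 0;
}

/* Stand-in for the rendering stage (prediction, IDCT, ...). */
static void render_mb(DemoCtx *c, const MBSyntax *in, int mb_x, int mb_y)
{
    (void)mb_x;
    (void)mb_y;
    c->rendered_count++;
    c->rendered_sum += in->mb_type;
}

/* Pass 1 entropy-decodes the whole slice into a buffer, pass 2 renders
 * from that buffer. Returns 0 on success, -1 on error. */
static int decode_slice_two_pass(DemoCtx *c, int mb_width, int mb_height)
{
    if (mb_width <= 0 || mb_height <= 0)
        return -1;

    size_t    n   = (size_t)mb_width * (size_t)mb_height;
    MBSyntax *buf = calloc(n, sizeof(*buf));
    if (!buf)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (entropy_decode_mb(c, &buf[i]) < 0) {
            free(buf);
            return -1;
        }
    }

    /* Raster order here; this loop is the one that a wavefront
     * schedule would replace. */
    for (int y = 0; y < mb_height; y++)
        for (int x = 0; x < mb_width; x++)
            render_mb(c, &buf[(size_t)y * mb_width + x], x, y);

    free(buf);
    return 0;
}
```

The sketch treats a slice as a full rectangle of macroblocks for simplicity.
Which fields of the two context structures would really have to go into such
a buffer is exactly what I have not been able to work out yet.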

Maybe somebody could help me with my problem and / or provide me with some
documentation?

Thanks in advance
Martin Brocksch