Sun May 31 23:21:24 CEST 2009

If I understand correctly, your patch splits entropy decoding and rendering
into two threads. The threads save / restore the macroblock information
into / from a data structure called H264mb. You introduced two of these,
which are read / written alternately, hence you get a "pipelined" design
with the two stages mentioned above. Please correct me if I am wrong (and if
you want to invest time in reading your old patch again :)

Concerning your performance remarks, I am wondering where the bad behaviour
comes from. Probably the high utilization comes from the frequent memory
operations when saving / restoring the macroblocks!? Maybe this would
improve if one could map entropy decoding & rendering of one set of
macroblocks to one and the same core? From my understanding of the current
code, the assigned core may also change for one set of macroblocks, probably
lowering the cache hit rate? But I'm not familiar with thread creation and
the avctx->execute syntax at all, so excuse me if I'm talking rubbish.

I'm trying to integrate your patch into an up-to-date version of ffmpeg (for
now I'm getting a segmentation fault when the threads are created :). Once I
manage to do this, I think I will try to parallelize the rendering of the
macroblocks, as I explained in my first mail.

Thanks for your help. Maybe I'll bother you with questions again in a few
days :)

More information about the ffmpeg-devel mailing list