[FFmpeg-devel] hardware aided video decoding

Fri Jul 6 14:13:54 CEST 2007

On Fri, 6 Jul 2007 11:40:50 +0200
Michael Niedermayer <michaelni at gmx.at> wrote:

> implement the whole decoder on the card :)

If possible, yes. But I doubt we have enough real estate on
the FPGA to build a graphics card _and_ a video decoder.

> for mpeg1/2/4 bitstream parsing (vlc decoding and related stuff) takes
> 1/3 of the cpu time last time i checked so gains with doing that on
> the CPU and transferring to the card would be limited, also h.264
> has significantly more complex bitstream parsing so i would guess
> the gains are even smaller but instead of guessing i would suggest
> that you try a profiler to get some exact answers (don't forget to
> disable all inlining ...)

That's my intention, but first I have to know where to start,
implement a few things, see how they work, and so on.
And of course I need the hardware to run it on :)

And as Luca suggested, my idea was to start at the back:
first try MC, then iDCT, then, if there is still some space left,
go to VLC/RLE and bitstream decoding.

On the other hand, I would like to keep it as generic
as possible, so that most 8x8 DCT based codecs could be
accelerated. That would mean that the software player
has to take the bitstream apart, convert the data into
something the card can digest, and feed it to the card.

> now if we look at just mpeg1/2/4 and the case that you don't want
> to implement the whole decoder on the card ...
> then the most obvious things to do are:
> 
> do the RLE + zigzag/alt scan decoding of coeffs and the IDCT on the card
> if you do just the IDCT on the card then you have to transfer 3+ times
> the data from the cpu to the card as IDCT coeffs are 16bit and there
> are as many as pixels, if you do the RLE & zigzag stuff on the card too
> then significantly less data would be transmitted, as 95% or
> so of the coeffs are 0 and as the coeffs are stored as vlc coded
> zero run + sign + level + last_bit in the bitstream

Why would you start at RLE and zigzag? 
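
To make the RLE + zigzag step in the quoted paragraph concrete, here is a
minimal sketch in C. It assumes the VLC decoding has already been done on the
CPU and that only (zero run, signed level) pairs are sent over the bus; the
names RunLevel and expand_block are hypothetical, not FFmpeg API, and
dequantization is left out. The expansion into a full 64-coefficient block,
which is mostly zeros, is what would then happen on the card.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Classic 8x8 zigzag scan: scan position -> raster index in the block. */
static const uint8_t zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Hypothetical transfer format: number of zeros to skip, then one
 * nonzero coefficient. */
typedef struct {
    uint8_t run;    /* zero coefficients preceding this one */
    int16_t level;  /* signed coefficient value */
} RunLevel;

/* Expand n run/level pairs into a raster-order 8x8 coefficient block.
 * Returns 0 on success, -1 if the pairs run past the end of the block. */
int expand_block(const RunLevel *rl, int n, int16_t block[64])
{
    int pos = 0;

    assert(rl && block);
    memset(block, 0, 64 * sizeof(block[0]));

    for (int i = 0; i < n; i++) {
        pos += rl[i].run;           /* skip the run of zeros */
        if (pos > 63)
            return -1;              /* malformed input */
        block[zigzag[pos]] = rl[i].level;
        pos++;
    }
    return 0;
}
```

With three nonzero coefficients, only three small pairs cross the bus instead
of 64 16-bit values, which is the saving the quoted paragraph describes.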

> the next obvious step is to do the motion compensation on the card too
> for mpeg1/2 and simple profile mpeg4 this should be easy
> mpeg4 ASP adds gmc/qpel which is much more complex
> 
> note! if you do not do MC on the card and the result of IDCT +
> user provided MC frame ends in video memory then the CPU doing MC
> of the next frame has to read from video mem somehow

That's why I would rather start at MC than at RLE.
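
As an illustration of why MC wants the reference frame close by, here is a
minimal sketch of MPEG-1/2 style half-pel motion compensation in C. The
function name mc_halfpel and its parameters are hypothetical. It assumes the
motion vector is given in half-sample units and that the caller guarantees the
referenced area lies inside a (padded) reference frame.

```c
#include <assert.h>
#include <stdint.h>

/* Copy a w x h block from the reference frame, displaced by a motion
 * vector in half-sample units, using bilinear interpolation with
 * rounding. `ref` points at the block's own position in the reference
 * frame. No bounds checking is done (assumed: padded reference). */
void mc_halfpel(uint8_t *dst, int dst_stride,
                const uint8_t *ref, int ref_stride,
                int mvx, int mvy, int w, int h)
{
    assert(dst && ref && w > 0 && h > 0);

    /* Fractional (half-pel) flags and floored integer offsets. */
    int fx = mvx & 1;
    int fy = mvy & 1;
    const uint8_t *src = ref + ((mvy - fy) / 2) * ref_stride
                             + (mvx - fx) / 2;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* With fx or fy zero the duplicated taps collapse to
             * (a + b + 1) >> 1 or to a plain copy. */
            int a = src[x];
            int b = src[x + fx];
            int c = src[x + fy * ref_stride];
            int d = src[x + fy * ref_stride + fx];
            dst[x] = (uint8_t)((a + b + c + d + 2) >> 2);
        }
        src += ref_stride;
        dst += dst_stride;
    }
}
```

Every predicted pixel reads up to four reference pixels, so if the reference
frame lives in video memory and MC runs on the CPU, all of those reads have to
cross the bus.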

> now h.264 does not contain anything shareable with mpeg1/2/4
> both idct and MC are different

How different are they? Can they be abstracted enough
so that a common iDCT and MC could be used for both?

> also for h.264 doing just IDCT is likely not going to work, that is
> having intra prediction done on the cpu which needs to read from the
> previous 4x4 IDCT result is just going to be a nightmare

Do I understand you correctly that the IDCT results depend on
the results of the previous block?
(OK, I have to read some H.264 documentation.)
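
To show where the dependency comes from, here is a minimal sketch in C of
H.264-style 4x4 intra DC prediction for the case where both the top and the
left neighbours are available. The function name recon_intra4x4_dc is
hypothetical, and `residual` stands in for the output of the 4x4 inverse
transform. The transform itself does not depend on the previous block; the
prediction that its output is added to does, because it is built from
already reconstructed neighbouring pixels.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reconstruct one 4x4 block at (bx, by) in place: predict a flat DC
 * value from the 4 pixels above and the 4 pixels to the left, then add
 * the residual. Requires bx >= 1 and by >= 1 so both neighbours exist. */
void recon_intra4x4_dc(uint8_t *frame, int stride, int bx, int by,
                       const int16_t residual[16])
{
    assert(frame && residual && bx >= 1 && by >= 1);

    uint8_t *blk = frame + by * stride + bx;
    int sum = 4;                        /* rounding offset */

    for (int i = 0; i < 4; i++) {
        sum += blk[i - stride];         /* row above the block */
        sum += blk[i * stride - 1];     /* column left of the block */
    }
    int dc = sum >> 3;

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            blk[y * stride + x] = clip_u8(dc + residual[y * 4 + x]);
}
```

A block can only be predicted after its left and top neighbours have been
fully reconstructed, so splitting the inverse transform (card) from the
prediction (CPU) would mean a round trip per 4x4 block.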

> and idct+mc+intra prediction would still require the cpu to read and write the
> whole frame to apply the loop filter ...
> 
> for VC1 ask kostya ...

Earth to Konstantin, do you read me? :-)

> > Yes, i had a look at the few hardware h.264 decoders around,
> > but those all seem to be built around a CPU or DSP core with
> > a few additional special instructions needed for decoding.
> 
> yes, put a CPU and DSP on the card that should do too :)
> and don't forget adding special instructions for CABAC and MC

Yes, that would do, but 1) it's very expensive and 2) it uses
a lot of power.

			Attila Kinali

-- 
Praised are the Fountains of Shelieth, the silver harp of the waters,
But blest in my name forever this stream that stanched my thirst!
                         -- Deed of Morred