[FFmpeg-devel] hardware aided video decoding

Fri Jul 6 15:00:30 CEST 2007

On Fri, Jul 06, 2007 at 02:13:54PM +0200, Attila Kinali wrote:
> On Fri, 6 Jul 2007 11:40:50 +0200
> Michael Niedermayer <michaelni at gmx.at> wrote:

[...] 
> 
> Why would you start at RLE and zigzag? 

Most H.26[1-3]-based codecs can be described this way:
1. Read coefficients in (skip, value) form
2. Fill an 8x8 block from them by undoing the RLE and dezigzagging
3. Perform IDCT
4. Do motion compensation
5. Perform YUV->RGB conversion
6. Postprocessing
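
To make step 2 concrete, here is a minimal sketch in Python (not FFmpeg code). The names zigzag_order and fill_block are made up for this mail, and the scan is the classic zigzag generated on the fly; real codecs may use other scan tables (e.g. alternate scans), so treat it as an illustration only.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in classic zigzag scan order for an n x n block."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals (r + c constant); odd diagonals run with the row
    # increasing, even diagonals with the row decreasing.
    return sorted(
        coords,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def fill_block(pairs, n=8):
    """Expand (skip, value) pairs into an n x n block (hypothetical helper).

    'skip' is the number of zero coefficients preceding 'value' in scan order.
    """
    scan = zigzag_order(n)
    block = [[0] * n for _ in range(n)]
    pos = 0
    for skip, value in pairs:
        pos += skip                      # undo the run-length coding
        if pos >= n * n:
            raise ValueError("coefficient run goes past the end of the block")
        r, c = scan[pos]                 # undo the zigzag
        block[r][c] = value
        pos += 1
    return block


block = fill_block([(0, 10), (1, -3), (0, 5)])
```

Everything after this point (IDCT, MC and so on) only sees the filled block, which is why the steps from 2 onwards look the same across these codecs.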

The real difference between them is in the first step. IIRC, there is
something in the FFmpeg TODO about using this as an intermediate format, and
there was a patch to losslessly transcode from MSMPEG-4 (old DivX) to MPEG-4
(XviD). So it would be ideal to move this common decoding part to the
dedicated decoder.

> > the next obvious step is to the motion compensation on the card too
> > for mpeg1/2 and simple profile mpeg4 this should be easy
> > mpeg4 ASP adds gmc/qpel which is much more complex

That's true, but it may also be easier to do on the card.

> > note! if you do not do MC on the card and the result of IDCT +
> > user provided MC frame ends in video memory then the CPU doing MC
> > of the next frame has to read from video mem somehow
> 
> That's why i would rather start at MC than at RLE
> 
> > now h.264 does not contain anything shareable with mpeg1/2/4
> > both idct and MC is different
> 
> How much different are they? Can it be abstracted enough
> so that a common iDCT and MC could be used for both?

H.264 (in the common case) operates on 4x4 blocks. Only the MC might be shared.
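
To show how different the transform is, here is a rough Python sketch of the H.264 4x4 inverse integer transform as I remember it from the spec (input is already dequantized coefficients). The function name h264_idct4x4 is made up, and this is not the FFmpeg implementation. Note that it is only adds and shifts, nothing like the 8x8 IDCT.

```python
def h264_idct4x4(coeffs):
    """Inverse 4x4 integer transform on a list of 4 rows of dequantized coefficients."""
    def butterfly(a0, a1, a2, a3):
        # One-dimensional 4-point inverse transform: adds and shifts only.
        e0 = a0 + a2
        e1 = a0 - a2
        e2 = (a1 >> 1) - a3
        e3 = a1 + (a3 >> 1)
        return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]

    # First pass over the rows, second pass over the columns.
    rows = [butterfly(*row) for row in coeffs]
    cols = [butterfly(*(rows[r][c] for r in range(4))) for c in range(4)]
    # cols is indexed [column][row]; transpose back, then round and scale by 1/64.
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]


dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
residual = h264_idct4x4(dc_only)
```

The output is a residual that still has to be added to the prediction, so it does not remove the dependency on neighbouring blocks for intra prediction.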

> > also for h.264 doing just IDCT is likely not going to work, that is
> > having intra prediction done on the cpu which needs to read from the
> > previous 4x4 IDCT result is just going to be a nightmare
> 
> Do i understand you correctly, that the IDCT results depend on
> the results of the previous block? 
> (ok, i have to read some h.264 docu)
> 
> > and idct+mc+intra prediction would still require the cpu to read and write the
> > whole frame to apply the loop filter ...
> > 
> > for VC1 ask kostya ...
> 
> Earth to Konstantin, do you read me? :-)

Yes, I do. Greetings from Eastern Europe ;-).
The two main differences between VC-1 and MPEG-4 ASP are the 4x4/8x4/4x8
transforms and slightly different motion compensation.
RealVideo 4, on the other hand, is closer to H.264 but with a different
bitstream format.

And now to wavelets.
I think it's easy to do on a video card but it will eat a lot of memory.
The principle is as follows: read data from one memory location, process it
and store it at another place in video memory (may I say the word 'shaders'?).
An in-place transform will be slower and less efficient.
Also, while the DCT is (almost) standard, wavelet operations may differ
(see Snow for example).
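
A small Python sketch of the read-from-one-buffer, write-to-another idea, using one level of an integer Haar-like transform (the S-transform). This is chosen only because it is short; it is not meant to be the filter Snow uses, and the function names are made up. The second buffer is where the extra memory goes.

```python
def haar_forward(src):
    """One level of an integer Haar (S-transform), written out of place.

    Low-pass values go to the first half of dst, high-pass to the second half.
    """
    if len(src) % 2:
        raise ValueError("input length must be even")
    half = len(src) // 2
    dst = [0] * len(src)                 # separate output buffer
    for i in range(half):
        a, b = src[2 * i], src[2 * i + 1]
        dst[i] = (a + b) >> 1            # low band: floor of the average
        dst[half + i] = a - b            # high band: difference
    return dst


def haar_inverse(src):
    """Exact inverse of haar_forward, also out of place."""
    half = len(src) // 2
    dst = [0] * len(src)
    for i in range(half):
        low, high = src[i], src[half + i]
        a = low + ((high + 1) >> 1)
        dst[2 * i] = a
        dst[2 * i + 1] = a - high
    return dst


coeffs = haar_forward([5, 2, 2, 5])
restored = haar_inverse(coeffs)
```

Each output sample depends only on a fixed pair of input samples, so every output position can be computed independently, which is what maps nicely onto shaders.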

> > > Yes, i had a look at the few hardware h.264 decoders around,
> > > but those seem all to be built around a CPU or DSP core with
> > > a few additional special instructions needed for decoding.
> > 
> > yes, put a CPU and DSP on the card that should do too :)
> > and dont forget adding special instructions for CABAC and MC
> 
> Yes, it should do but 1) it's very expensive and 2) uses a lot
> of power.

IIRC, DSPs are bad at bit parsing, so a CPU should be added to perform
this operation (with three instructions - 'Load data', 'Do CABAC decoding'
and 'Store decoded bit' :-P ).

> 			Attila Kinali
> 
> -- 
> Praised are the Fountains of Shelieth, the silver harp of the waters,
> But blest in my name forever this stream that stanched my thirst!
>                          -- Deed of Morred