[FFmpeg-devel] [PATCH 01/N] RV30/40 Decoder - Core

Sun Dec 9 08:46:22 CET 2007

Just some comments on implementation for now.

On Sun, Dec 09, 2007 at 12:00:04AM +0100, Michael Niedermayer wrote:
> On Sat, Dec 08, 2007 at 12:10:17PM +0200, Kostya wrote:
> > On Sat, Dec 08, 2007 at 05:32:07AM +0100, Michael Niedermayer wrote:
> > > On Fri, Dec 07, 2007 at 09:04:07AM +0200, Kostya wrote:
> > [...]
> > > > > please check if the following are identical (+-1 doesnt count as
> > > > > different)
> > > > > 
> > > > > svq3_get_ue_golomb() and 
> > > > > svq3_get_se_golomb()
> > > > > 
> > > > > and if they are identical post START/STOP_TIMER scores
> > > > > also id like to see START/STOP_TIMER scores in that case for the
> > > > > original vlc based code
> > > > > 
> > > > > the fastest should be placed in golomb.c/h
> > > > > and used
> > > > 
> > > > I don't know when I can do benchmarks but switched to svq3_get_[su]e_golomb()
> > > > anyway.
> > > 
> > > ok, just dont forget about the benchmarks please
> > 
> > Hmm, I don't have large and long enough clip to show timer values.
> 
> no? it shows nothing? just a blank screen? or all have the same value? but
> then you can show them ...
> 
> you can also put the START/STOP_TIMER over a block or mb decode function
> or over the for loop in svq3.c after
> /* decode prediction codes for luma blocks */
> that contains always 8 svq3_get_ue_golomb()
> 
> 
> > I guess it is fast enough. For the reference, rv34_inv_transform() takes 50 dezicycles.
> 
> its not the question if its fast enough i want to know which is the fastest

RV30 uses 8 of such codes for intra types and 2 for motion vectors.
I tested it with rv34_decode_mb_header() on VC-RV30/simpsons-clip.rm
Current implementation
1111 dezicycles in rv34_decode_mb_header, 1048018 runs, 558 skips
My implementation
1126 dezicycles in rv34_decode_mb_header, 1047952 runs, 624 skips

> >  
> > [...]
> > > [...]
> > > > /** essential slice information */
> > > > typedef struct SliceInfo{
> > > >     int type;              ///< slice type (intra, inter)
> > > >     int size;              ///< size of the slice in bits
> > > >     int quant;             ///< quantizer used for this slice
> > > >     int vlc_set;           ///< VLCs used for this slice
> > > >     int start, end;        ///< start and end macroblocks of the slice
> > > >     int width;             ///< coded width
> > > >     int height;            ///< coded height
> > > > }SliceInfo;
> > > > 
> > > > /** decoder context */
> > > > typedef struct RV34DecContext{
> > > >     MpegEncContext s;
> > > >     int8_t *intra_types_hist;///< old block types, used for prediction
> > > >     int8_t *intra_types;     ///< block types
> > > >     int block_start;         ///< start of slice in blocks
> > > >     uint8_t *luma_dc_quant_i;///< luma subblock DC quantizer for intraframes
> > > >     uint8_t *luma_dc_quant_p;///< luma subblock DC quantizer for interframes
> > > > 
> > > >     RV34VLC *cur_vlcs;       ///< VLC set used for current frame decoding
> > > >     int bits;                ///< slice size in bits
> > > >     H264PredContext h;       ///< functions for 4x4 and 16x16 intra block prediction
> > > >     SliceInfo si;            ///< current slice information
> > > >     uint8_t *slice_data;     ///< saved slice data
> > > > 
> > > >     int *mb_type;            ///< internal macroblock types
> > > >     int block_type;          ///< current block type
> > > >     int luma_vlc;            ///< which VLC set will be used for decoding of luma blocks
> > > >     int chroma_vlc;          ///< which VLC set will be used for decoding of chroma blocks
> > > >     int is16;                ///< current block has additional 16x16 specific features or not
> > > >     int dmv[4][2];           ///< differential motion vectors for the current macroblock
> > > > 
> > > >     int rv30;                ///< indicates which RV variasnt is currently decoded
> > > >     int rpr;                 ///< one field size in RV30 slice header
> > > > 
> > > >     int avail[4];            ///< whether left, top, top rights and top left MBs are available
> > > > 
> > > >     int (*parse_slice_header)(struct RV34DecContext *r, GetBitContext *gb, SliceInfo *si);
> > > >     int (*decode_mb_info)(struct RV34DecContext *r);
> > > >     int (*decode_intra_types)(struct RV34DecContext *r, GetBitContext *gb, int8_t *dst);
> > > >     void (*loop_filter)(struct RV34DecContext *r);
> > > > }RV34DecContext;
> > > 
> > > SliceInfo.size  is redundant with bits
> > > SliceInfo.quant with qscale
> > > SliceInfo.start with block_start
> > 
> > dropped size, and start is used in calculations.
> 
> block_start=start
> both are used, but they have the same value you can drop one if not
> id like to hear what the difference between the 2 is
> 
> 
> > Maybe it will be used in parallel decoding as well.
> 
> i thought you said its fast enough? 

good point 

[...]
> >     if(!has_C){
> >         if(!has_B || (!has_A && !r->rv30)){
> >             C[0] = A[0];
> >             C[1] = A[1];
> >         }else{
> >             C[0] = s->current_picture_ptr->motion_val[0][mv_pos-s->b8_stride-1][0];
> >             C[1] = s->current_picture_ptr->motion_val[0][mv_pos-s->b8_stride-1][1];
> >         }
> >     }
> 
> this looks suspect, i have my doubts about it being correct
> first for a 16x8 block at the left border it would set C from uninitalized
> memory, possibly even segfaulting
> second it would use previous slices for prediction
> 
> but then rv is crap so i guess your code is correct ...

It checks r->avail before reading from it.
And I agree with the statement "rv is crap".

[...]
> 
> 
> [...]
> >     if(!r->avail[0])
> >         has_A = 0;
> >     else{
> >         int A_type = r->mb_type[mb_pos - 1];
> >         has_A = A_type == RV34_MB_B_BIDIR || A_type == (dir ? RV34_MB_B_BACKWARD : RV34_MB_B_FORWARD);
> > 
> >         if(has_A){
> >             A[0] = s->current_picture_ptr->motion_val[dir][mv_pos - 1][0];
> >             A[1] = s->current_picture_ptr->motion_val[dir][mv_pos - 1][1];
> >         }
> >     }
> >     if(!r->avail[1]){
> >         has_B = 0;
> >     }else{
> >         int B_type = r->mb_type[mb_pos - s->mb_stride];
> >         has_B = B_type == RV34_MB_B_BIDIR || B_type == (dir ? RV34_MB_B_BACKWARD : RV34_MB_B_FORWARD);
> > 
> >         if(has_B){
> >             B[0] = s->current_picture_ptr->motion_val[dir][mv_pos - s->b8_stride][0];
> >             B[1] = s->current_picture_ptr->motion_val[dir][mv_pos - s->b8_stride][1];
> >         }
> >     }
> >     if(r->avail[2]){
> >         int C_type = r->mb_type[mb_pos - s->mb_stride + 1];
> >         has_C = C_type == RV34_MB_B_BIDIR || C_type == (dir ? RV34_MB_B_BACKWARD : RV34_MB_B_FORWARD);
> > 
> >         if(has_C){
> >             C[0] = s->current_picture_ptr->motion_val[dir][mv_pos - s->b8_stride + 1][0];
> >             C[1] = s->current_picture_ptr->motion_val[dir][mv_pos - s->b8_stride + 1][1];
> >         }
> >     }else if(r->avail[3]){
> >         int C_type = r->mb_type[mb_pos - s->mb_stride - 1];
> >         has_C = C_type == RV34_MB_B_BIDIR || C_type == (dir ? RV34_MB_B_BACKWARD : RV34_MB_B_FORWARD);
> > 
> >         if(has_C){
> >             C[0] = s->current_picture_ptr->motion_val[dir][mv_pos - s->b8_stride - 1][0];
> >             C[1] = s->current_picture_ptr->motion_val[dir][mv_pos - s->b8_stride - 1][1];
> >         }
> >     }else
> >         has_C = 0;
> 
> the code above is quadrupled
> also with the normal lavc mb_types the direction would be just a bit and
> wouldnt need long and obfuscated checks

You mean something block_type & MB_TYPE_L0 ? 

> [...]
> > /**
> >  * generic motion compensation function
> >  *
> >  * @param r decoder context
> >  * @param block_type type of the current block
> >  * @param xoff horizontal offset from the start of the current block
> >  * @param yoff vertical offset from the start of the current block
> >  * @param mv_off offset to the motion vector information
> >  * @param width width of the current partition in 8x8 blocks
> >  * @param height height of the current partition in 8x8 blocks
> >  */
> > static inline void rv34_mc(RV34DecContext *r, const int block_type,
> >                           const int xoff, const int yoff, int mv_off,
> >                           const int width, const int height, int dir,
> >                           qpel_mc_func (*qpel_mc)[16],
> >                           h264_chroma_mc_func (*chroma_mc))
> > {
[skipped body]
> > }
> 
> how does this differ from the equivalent functions in h264.c?
> that is mc_dir_part() IIRC

It will also handle 1/3pel for RV30

> [...]
> >         for(j = 0; j < 2; j++){
> >             for(i = 0; i < 2; i++){
> >                 s->current_picture_ptr->motion_val[0][s->mb_x * 2 + s->mb_y * 2 * s->b8_stride + i + j*s->b8_stride][0] = 0;
> >                 s->current_picture_ptr->motion_val[0][s->mb_x * 2 + s->mb_y * 2 * s->b8_stride + i + j*s->b8_stride][1] = 0;
> >             }
> >         }
> 
> fill_rectangle() should be used here, yes please
> move fill_rectangle() into a file from where it can be used, its a very
> usefull function

I guess it should be moved to h264.h. I will try to use it where appropriate.

> >         return 0;
> 
> >     case RV34_MB_SKIP:
> >         if(s->pict_type == P_TYPE){
> >             rv34_pred_mv(r, block_type, 0);
> 
> makes no sense, the predictor is always 0,0

Still, motion vectors need zeroing. Will replace with fill_rectangle() 

> [...]
> >     if(!is16){
> >         for(j = 0; j < 4; j++){
> >             no_left = !r->avail[0];
> >             for(i = 0; i < 4; i++, cbp >>= 1, Y += 4){
> >                 no_topright = no_up || (i==3 && j) || (i==3 && !j && (s->mb_x-1) == s->mb_width);
> >                 rv34_pred_4x4_block(r, Y, s->linesize, ittrans[intra_types[i]], no_up, no_left, i || (j==3), no_topright);
> >                 no_left = 0;
> >                 if(cbp & 1)
> >                     rv34_add_4x4_block(Y, s->linesize, s->block[(i>>1)+(j&2)], (i&1)*4+(j&1)*32);
> >             }
> >             no_up = 0;
> >             Y += s->linesize * 4 - 4*4;
> >             intra_types += s->b4_stride;
> >         }
> 
> h264 uses a MUCH simpler system to keep track of block availability
> take a small array and fill it with mb_types/border availability for the
> current MB and then just use block-8 block-1 block-8+1
> 
> with that it would just be
> has_topright= avail[i + 8*j - 7];
> 
> instead of
> no_topright = no_up || (i==3 && j) || (i==3 && !j && (s->mb_x-1) == s->mb_width);

time to adopt it then...    

[...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB