[FFmpeg-devel] [PATCH] H.264/AVCHD interlaced fixes

Laurent Aimar fenrir
Sun Feb 1 11:50:42 CET 2009


On Sat, Jan 31, 2009, Ivan Schreter wrote:
> Laurent Aimar wrote:
> > On Sat, Jan 31, 2009, Ivan Schreter wrote:
> >   
> >> To support key frames, SEI message recovery point is decoded and recovery
> >> frame count stored on the context. This is then used to set key frame flag
> >> in decoding process.
> >>     
> >  You are misusing the SEI recovery point semantic.
> > D.2.7 of ITU H264 says:
> > [...]
> >  So, a frame count >= 0 does not mean that the frame is a key frame BUT that
> >   
> Yes and no. We already had this discussion with Michael and at last I 
> agreed to him that "key frames" in the sense of ffmpeg are the frames 
> where we can restart decoding safely, i.e., the frames having SEI 
> recovery frame count.
> > if you reset the decoder and start by decoding the picture with the SEI, and
> > you throw away the first N decoded frames (output in presentation order), then
> > from then on you have acceptable frames for display.
> >   
> Of course, the user needs to decode at least recovery frame count frames 
> in order to get pictures of acceptable quality.
 Not really "drop the first N decoded frames", but "drop the first N frames output
by the decoder" — which is not the same thing once reordering is involved.
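The convention described here — drop the first recovery_frame_cnt frames *output* by the decoder, not the first ones fed to it — can be sketched as follows. The RecoveryState type and helper are hypothetical illustrations, not FFmpeg API:

```c
/* Hypothetical helper (not FFmpeg API): tracks how many output frames
 * still have to be dropped after seeking to an SEI recovery point. */
typedef struct RecoveryState {
    int frames_to_drop;   /* recovery_frame_cnt from the SEI message */
} RecoveryState;

/* Call once per frame *output* by the decoder, in presentation order.
 * Returns 1 if that frame may be displayed, 0 if it must be dropped. */
static int recovery_frame_displayable(RecoveryState *rs)
{
    if (rs->frames_to_drop > 0) {
        rs->frames_to_drop--;
        return 0;          /* still converging towards the recovery point */
    }
    return 1;              /* acceptable for display from now on */
}
```

Because the counter is decremented on decoder *output*, reordering (B frames decoded out of presentation order) does not disturb the count.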

> There even is a variable 
> prepared for this in AVPacket (convergence_duration), which is supposed 
> to address this (so the user knows how much to decode).
> > Example with a simple GOP structure with standard I/P, you can have
> > Type:               I P P P I
> > recovery_frame_cnt: 0 3 2 1 0
> >   
> You are mistaken here. SEI recovery point message is generally NOT 
> present.
 Well, you are hoping that, but maybe it is true for streams in the wild.

> The reasoning behind having SEI recovery point is different: In 
> H.264, a P or B frame does not necessarily refer only to frames starting 
> with last I frame. It can refer also to frames _before_ start of current 
> GOP, i.e., to older I/P frames. Well, actually, the term "frame" is 
> incorrect here. H.264 uses term "slice", which can represent anything 
> from a few macroblocks through a field up to a whole frame. Each slice 
> can be I/P/B/SI/SP and not all slices in the frame have to be of same type.
> Let's take a simple example: I(0) B(-2) B(-1) P(3) B(1) B(2) P(6) B(4) 
> B(5) I (9) B(7) B(8) P(12) B(10) B(11)  ...
> Now, let's assume, an object displayed in P(3) got hidden while 
> displaying I(9) and reappeared in frame B(10). The encoder can either 
> encode the object anew, or it can simply let B(10) refer to P(3). 
> However, P(3) is before I(9), so restarting from I(9) would break 
> display of B(10).
> To address this problem, SEI recovery frame cnt is associated with I(9), 
> telling the decoder it has to decode at least recovery_frame_cnt 
> (whatever it is) frames, before effects of B(10) will disappear.
> exact_match flag specifies, whether it's going to be exact match or 
> approximate. For approximate, it's still an acceptable picture for 
> display, but not 1:1 as decoded when starting with SEI with exact match. 
> For instance, if we had only P frames in sequence, after a while the 
> picture decoded from any starting point starts looking like the original 
> picture. So for such purpose one could use approximate match SEI 
> recovery point. I haven't seen such sample yet, though.
> BTW, in the samples I have SEI recovery point with exact match and 
> recovery_frame_count = 0 is present for I frames in GOP, since the files 
> I have do not refer to frames before current I frame. Therefore I wrote 
> first version which will work correctly at least with recovery_frame_cnt 
> == 0.
> > I think that the only safe case is when recovery_frame_cnt is 0 and
> > exact_match_flag is true.
> >   
> This is the case in my samples.
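For reference, the SEI recovery point payload discussed above is tiny (H.264, D.1.7): recovery_frame_cnt coded as ue(v) Exp-Golomb, then exact_match_flag (1 bit), broken_link_flag (1 bit) and changing_slice_group_idc (2 bits). A minimal stand-alone sketch of decoding it, using a simplified bit reader rather than FFmpeg's GetBitContext:

```c
#include <stdint.h>

/* Simplified MSB-first bit reader (stand-in for GetBitContext). */
typedef struct BitReader {
    const uint8_t *buf;
    int pos;              /* current bit position */
} BitReader;

static unsigned get_bit(BitReader *br)
{
    unsigned b = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return b;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    while (n--)
        v = (v << 1) | get_bit(br);
    return v;
}

/* ue(v): count leading zero bits, then read that many suffix bits. */
static unsigned get_ue_golomb(BitReader *br)
{
    int zeros = 0;
    while (get_bit(br) == 0)
        zeros++;
    return (1u << zeros) - 1 + get_bits(br, zeros);
}

typedef struct RecoveryPoint {
    unsigned recovery_frame_cnt;
    unsigned exact_match_flag;
    unsigned broken_link_flag;
    unsigned changing_slice_group_idc;
} RecoveryPoint;

/* Field order as given in H.264 D.1.7. */
static void parse_recovery_point(BitReader *br, RecoveryPoint *rp)
{
    rp->recovery_frame_cnt       = get_ue_golomb(br);
    rp->exact_match_flag         = get_bit(br);
    rp->broken_link_flag         = get_bit(br);
    rp->changing_slice_group_idc = get_bits(br, 2);
}
```

The "safe" case from this discussion is then simply `recovery_frame_cnt == 0 && exact_match_flag`.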
> >> In the parser, it is used to communicate (synthetic)
> >> picture type to libavformat's av_read_frame() routine to correctly set key
> >> flag and compute missing PTS/DTS timestamps.
> >>     
> >  Missing PTS/DTS can only be correctly recreated if the h264 parser implements
> > a complete DPB buffer handler.
> >  I/P/B in h264 just specify the tools available, and not at all the frame
> > order(unlike in mpeg2 and mpeg4 part 2).
> >  For example, you can use B frames instead of P frames without changing the
> > order of decoding and presentation, the B simply using past references.
> >   
> Uhm, again, those are not "frames". For instance, "I-frame" of 
> interlaced H.264 video can be composed of one "I-slice" in first field 
> picture and one "P-slice" in second field, which refers to the first 
> field. This is also the case in AVCHD samples from recent camcorders.
 Sorry, it was a shortcut. I should have said "in the case of a stream for which
every picture is coded as a frame using only one type of slice per picture, the
type being X"...
 In this mail, every time I speak about an X frame, it is defined as above.

 But my example stands: in a stream like the one I described, the picture type has
no relation to the timestamps.

> I'm not claiming all cases are handled. I just want to help support 
> AVCHD camcorders finally.
> As for the timestamps, I and P "frames" must declare both PTS/DTS in 
> H.222.0 stream. B "frames" don't have to (although in my sample files 
> they do).
 I don't think so. The only mandatory things are (from memory):
 - a PTS at least every 700 ms;
 - if the DTS is written, the PTS must be written too;
 - if a PTS is written but not the DTS, then dts == pts.

So for a GOP of about 500 ms, you could have a PTS/DTS only on the I/key frame.
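The three rules above can be folded into one small helper. The NOPTS sentinel and the function are illustrative assumptions, not the actual MPEG-TS demuxer code:

```c
#include <stdint.h>

#define NOPTS INT64_MIN   /* sentinel: timestamp not written in the stream */

/* Effective DTS of a PES packet under the rules listed above:
 * - DTS written: use it (the PTS must then be written too);
 * - PTS only:    dts == pts by definition;
 * - neither:     the parser has to recover it some other way. */
static int64_t effective_dts(int64_t pts, int64_t dts)
{
    if (dts != NOPTS)
        return dts;
    if (pts != NOPTS)
        return pts;
    return NOPTS;
}
```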

> Correct computation of PTS/DTS is already handled in libavformat.
 For example, with a classic H.264 B-pyramid, I am not sure the parser handles it,
but I have not checked.

> >> To support interlaced mode, it was needed to collate two field pictures
> >> into one buffer to fulfill av_read_frame() contract - i.e., reading one
> >> whole frame per call.
> >>     
> >  This will limit you to supporting only a subset of H.264. Nothing prevents
> > an H.264 stream from first encoding two top fields and then the two bottoms.
> > (I am not sure I have ever seen such a stream.)
> >   
> Yes. With or without my patch, it nevertheless wouldn't work. It can be 
> added in the future by reordering the fields by frame number in the 
> parser (I want to eventually implement pairing based on frame number, as 
> I already wrote). But I cannot really imagine, someone would produce 
> such a brain-damaged stream...
 Well, for example you could encode every top field as a P picture and every bottom
field as a B picture. I do not think that is brain-damaged, and such a stream will
not have consecutive top and bottom fields in coding order.
 Now again, I do not think such a stream exists in the wild. It was just an example.
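The pairing scheme under discussion — buffer a top field picture and emit one frame when the matching bottom field arrives — can be sketched like this. Types and names are hypothetical, and a bottom-field-first or non-interleaved stream deliberately falls through unhandled, matching the limitation being debated:

```c
enum FieldType { FRAME_PIC, TOP_FIELD, BOTTOM_FIELD };

typedef struct FieldPairer {
    int have_top;   /* a top field is buffered, waiting for its bottom */
} FieldPairer;

/* Feed one coded picture; returns 1 when a complete frame can be
 * handed to av_read_frame(), 0 when we must keep waiting. */
static int field_pairer_push(FieldPairer *fp, enum FieldType t)
{
    switch (t) {
    case FRAME_PIC:
        return 1;              /* whole frame coded as one picture */
    case TOP_FIELD:
        fp->have_top = 1;      /* buffer it, expect the bottom next */
        return 0;
    case BOTTOM_FIELD:
        if (fp->have_top) {    /* top+bottom pair completes a frame */
            fp->have_top = 0;
            return 1;
        }
        return 0;              /* bottom-field-first: not handled here */
    }
    return 0;
}
```

A more robust pairer would match fields by frame number instead of relying on adjacency, as proposed earlier in the thread.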

> >> There is one open point, though: Although it seems that top field pictures
> >> are always preceding matching bottom field pictures, this is not fixed in
> >> the standard. Current implementation relies on this.
> >>     
> >  This cannot work correctly; bottom-field-first videos are common.
> >   
> Give me a sample. Note: only interlaced videos coded as field pictures 
> (not whole frames) bottom-field-first won't work (currently they don't 
> work anyway, so what). Videos coded by frame pictures or non-interlaced 
> will work correctly.
 I will try to find one that I can share. It exists only for SD video (as HD
is always(?) top field first).


