[FFmpeg-devel] [PATCH] H.264/AVCHD interlaced fixes
Sat Jan 31 22:20:06 CET 2009
Laurent Aimar wrote:
> On Sat, Jan 31, 2009, Ivan Schreter wrote:
>> To support key frames, SEI message recovery point is decoded and recovery
>> frame count stored on the context. This is then used to set key frame flag
>> in decoding process.
> You are misusing the SEI recovery point semantic.
> D.2.7 of ITU H264 says:
> So, a frame count >= 0 does not mean that the frame is a key frame BUT that
Yes and no. We already had this discussion with Michael, and in the end I
agreed with him that "key frames" in the FFmpeg sense are the frames
where decoding can be restarted safely, i.e., the frames carrying an SEI
recovery point.
> if you reset the decoder, and start by decoding the picture with the SEI, and
> you throw away the N first decoded frames (output in presentation order), then
> from now on you have acceptable frames for display.
Of course, the user needs to decode at least recovery_frame_cnt frames
in order to get pictures of acceptable quality. There is even a field
prepared for this in AVPacket (convergence_duration), which is supposed
to address exactly that (so the user knows how much to decode).
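The rule described above can be sketched in a few lines. This is an
illustrative sketch only; the struct and function names are hypothetical
and do not reflect the actual libavcodec/libavformat layout:

```c
#include <assert.h>

/* Hypothetical sketch: a frame carrying an SEI recovery point is
 * treated as a "key frame" in the FFmpeg sense (a safe restart
 * position), and recovery_frame_cnt tells the caller how many more
 * frames must be decoded before output is of acceptable quality. */
typedef struct {
    int recovery_frame_cnt;   /* from SEI recovery point; -1 if absent */
} SeiState;

static int is_restart_point(const SeiState *s)
{
    return s->recovery_frame_cnt >= 0;
}

/* How many frames to decode before display is acceptable (-1: unknown,
 * no recovery point seen). */
static int frames_until_acceptable(const SeiState *s)
{
    return is_restart_point(s) ? s->recovery_frame_cnt : -1;
}
```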
> Example with a simple GOP structure with standard I/P, you can have
> Type: I P P P I
> recovery_frame_cnt: 0 3 2 1 0
You are mistaken here. The SEI recovery point message is generally NOT
present. The reasoning behind having SEI recovery points is different: in
H.264, a P or B frame does not necessarily refer only to frames starting
with the last I frame. It can also refer to frames _before_ the start of
the current GOP, i.e., to older I/P frames. Actually, the term "frame" is
incorrect here. H.264 uses the term "slice", which can represent anything
from a few macroblocks through a field up to a whole frame. Each slice
can be I/P/B/SI/SP, and not all slices in a frame have to be of the same type.
Let's take a simple example (decoding order, with presentation numbers in
parentheses): I(0) B(-2) B(-1) P(3) B(1) B(2) P(6) B(4) B(5) I(9) B(7)
B(8) P(12) B(10) B(11) ...
Now let's assume an object displayed in P(3) gets hidden while I(9) is
displayed and reappears in frame B(10). The encoder can either encode the
object anew, or it can simply let B(10) refer to P(3). However, P(3)
precedes I(9), so restarting decoding from I(9) would break the display
of B(10).
To address this problem, an SEI recovery point with recovery_frame_cnt is
associated with I(9), telling the decoder it has to decode at least
recovery_frame_cnt frames (whatever that value is) before effects such as
B(10)'s dependency on P(3) disappear. The exact_match flag specifies
whether the recovered picture will match exactly or only approximately.
With an approximate match, the picture is still acceptable for display,
but not bit-identical to the one obtained when decoding from the start of
the stream.
For instance, if we had only P frames in a sequence, after a while the
picture decoded from any starting point would start looking like the
original picture. For such a purpose one could use an approximate-match
SEI recovery point. I haven't seen such a sample yet, though.
BTW, in the samples I have, an SEI recovery point with exact match and
recovery_frame_count = 0 is present for the I frames in each GOP, since
these files do not refer to frames before the current I frame. Therefore
I wrote a first version which will work correctly at least with
recovery_frame_cnt = 0.
> I think that the only safe case is when recovery_frame_cnt is 0 and
> exact_match_flag is true.
This is the case in my samples.
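Laurent's "only safe case" above reduces to a one-line predicate. A
minimal sketch, with a hypothetical function name; only when both
conditions hold can a player seek to the recovery point and display
output immediately:

```c
#include <assert.h>

/* Hypothetical predicate for the "only safe case" quoted above:
 * seeking directly to a recovery point and displaying right away is
 * only guaranteed correct when no further frames need to be decoded
 * (recovery_frame_cnt == 0) and the match is exact. */
static int safe_immediate_seek(int recovery_frame_cnt, int exact_match_flag)
{
    return recovery_frame_cnt == 0 && exact_match_flag;
}
```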
>> In the parser, it is used to communicate (synthetic)
>> picture type to libavformat's av_read_frame() routine to correctly set key
>> flag and compute missing PTS/DTS timestamps.
> Missing PTS/DTS can only be correctly recreated if the h264 parser implements
> a complete DPB buffer handler.
> I/P/B in h264 just specify the tools available, and not at all the frame
> order(unlike in mpeg2 and mpeg4 part 2).
> For example, you can use B frames instead of P frames without changing the
> order of decoding and presentation, the B simply using past references.
Uhm, again, those are not "frames". For instance, an "I frame" of
interlaced H.264 video can be composed of one I slice in the first field
picture and one P slice in the second field, which refers to the first
field. This is also the case in AVCHD samples from recent camcorders.
I'm not claiming all cases are handled. I just want to finally help
support AVCHD camcorders.
As for the timestamps, I and P "frames" must declare both PTS and DTS in
an H.222.0 stream; B "frames" don't have to (although in my sample files
they do). Correct computation of PTS/DTS is already handled in libavformat.
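For the case where a B picture carries only a PTS, H.222.0 defines the
decoding timestamp to equal the presentation timestamp (there is no
reordering delay for such a picture). A minimal sketch of that fallback,
with a stand-in for AV_NOPTS_VALUE:

```c
#include <assert.h>
#include <stdint.h>

#define NOPTS INT64_MIN  /* stand-in for AV_NOPTS_VALUE */

/* Sketch of the H.222.0 rule relevant here: when a PES packet carries
 * a PTS but no DTS, the decoding timestamp equals the presentation
 * timestamp -- the usual case for non-reference B pictures. */
static int64_t effective_dts(int64_t pts, int64_t dts)
{
    return dts != NOPTS ? dts : pts;
}
```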
>> To support interlaced mode, it was needed to collate two field pictures
>> into one buffer to fulfill av_read_frame() contract - i.e., reading one
>> whole frame per call.
> This will limit you to supporting only a subset of H264. Nothing prevents
> an H264 stream from first encoding two top fields and then the two bottom
> fields. (I am not sure I have seen such streams.)
Yes. But with or without my patch, such a stream wouldn't work anyway.
Support can be added in the future by reordering the fields by frame
number in the parser (I eventually want to implement pairing based on
frame number, as I already wrote). But I cannot really imagine someone
producing such a brain-damaged stream...
>> There is one open point, though: Although it seems that top field pictures
>> are always preceding matching bottom field pictures, this is not fixed in
>> the standard. Current implementation relies on this.
> This cannot work correctly; bottom-field-first videos are common.
Give me a sample. Note that only interlaced videos coded as field
pictures (not whole frames) with the bottom field first won't work (they
currently don't work anyway, so nothing is lost). Videos coded as frame
pictures, as well as non-interlaced videos, will work correctly.
Again, this will be addressed later via frame matching for field
pictures. But one has to start somewhere, and as I mentioned, I don't
know whether I can continue working on this, so something that works at
least partially is better than something that doesn't work at all!
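The pairing rule the patch relies on can be written as a tiny state
machine. This is a sketch under the stated assumption (top field always
precedes its matching bottom field); the names are hypothetical, and
frame-number based matching would eventually replace this adjacency rule:

```c
#include <assert.h>

typedef enum { FIELD_NONE, FIELD_TOP, FIELD_BOTTOM } Field;

/* Sketch of the collation rule: a top field opens a frame, the
 * immediately following bottom field completes it.  Returns 1 when a
 * complete frame can be handed to av_read_frame(), 0 otherwise.  A
 * bottom field without a pending top field (the bottom-field-first
 * case) does not complete a frame -- matching the current limitation
 * discussed above. */
static int push_field(Field *pending, Field f)
{
    if (f == FIELD_BOTTOM && *pending == FIELD_TOP) {
        *pending = FIELD_NONE;   /* pair complete -> emit frame */
        return 1;
    }
    *pending = f;                /* start (or restart) a frame */
    return 0;
}
```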