[FFmpeg-devel] [PATCH] AVCHD/H.264 parser: determination of frame type, question about timestamps

Mon Jan 26 08:42:17 CET 2009

Hi,

Michael Niedermayer wrote:
>> You contradict yourself.
>>     
>
> no
>
>   
>> On the one side you say get_ue_golomb_31() must 
>> be enough (yes, it is, since legal values for slice_type are 0..9), OTOH 
>> you want to check it for non-negative (_although_ legal values are in 
>> range 0..9, i.e., non-negative).
>>     
>
> yes
> we cannot just set pict_type to some random value, this could have
> unpredictable effects, people can feed all kinds of trash into the
> decoder, it must not give them a root shell ;)
>   
Correct. As architect/developer of an enterprise database system kernel 
I know very well what you mean. So don't worry too much about it. I'm 
dropping this topic for now.
> [...]
>   
>> For instance, imagine a video editing application (like kdenlive) needs 
>> to seek to frame X. There is a SEI recovery point at X-3 with 
>> convergence_duration == 5. How can the application now get to frame X? 
>> It asks to seek to X, which will position the stream at key frame at 
>> position X-3.
>>     
>
> no it will not do this it will position the stream farther to the left
> to ensure that X is after the convergence point when the application
> asks for <=X if it asks just for something randomly around X it might
> end up at X-3 but then that was what it asked for
>   
After sending the mail I got the same idea. So in the example case, 
you'd position the stream to the recovery point frame with distance from 
X >= convergence_duration, so the application can then decode up to X 
and have frame X correctly decoded.

OK, I'll try to write an appropriate patch for it. That's not in 
decoder, though, but in libavformat, then.

To summarize:

    * Decode SEI recovery point
    * A frame with SEI recovery point is a key frame (IDR frames are
      also key frames)
    * av_seek_frame() will position to recovery point with distance >=
      convergence_duration before X

Any pointers on how to implement the last one most effectively? I didn't 
study libavformat too much yet.

>> [...]
>>>> which is IMHO broken. Of course, we could communicate with it by setting 
>>>> pict_type to FF_I_TYPE for keyframes only (IDR frames and frames after 
>>>> recovery point), for other frames containing I- and P-slices to 
>>>> FF_P_TYPE and for B-frames to FF_B_TYPE. But I don't like it much. Any 
>>>> idea, how to do it correctly without the need to touch other codecs?
>>>>     
>>>>         
>>> pict_type from the parse context should likely be split in pict_type and
>>> keyframe
>>>   
>>>       
>> Actually, we already have a flag field on AVPacket coming from the 
>> parser, but compute_pkt_fields() doesn't believe it and resets it based 
>> on pict_type from parse context instead.
>>     
>
> the parser works with char* buffers not AVPackets
>   
Yes, sorry, I was referring to av_read_frame(), which returns AVPacket.

However, why do we need pict_type at all? I/P/B-frames are 
MPEG-specific. Actually, I believe we should change it and return two 
flags - delayed and key frame. This would make it IMHO cleaner and more 
general than testing for pict_type.
>>>> Further, I found out that av_read_frame() sometimes reads only one field 
>>>> of a frame (especially in interlaced mpegts AVCHD files), which confuses 
>>>> the rest of ffmpeg. I suppose, this is a bug in h264_parse(), which 
>>>> returns a single field instead of whole frame (when a frame is coded as 
>>>> two fields in two pictures), but I didn't find a way yet, how to address 
>>>> the problem. Any idea?
>>>>     
>>>>         
>>> keep in mind that AFIAK h264 allows unpaired fields that is
>>> 1 frame, 1 field, 1 frame
>>> I have no samples that actually does this so we might get away with ignoring
>>> it but if we choose to not support it we should have some argument why
>>> it would be too complex ..
>>>   
>>>       
>> Umm... You are referring to D.2.2 (picture timing semantics). 
>>     
>
> no, i was not refering to that
>   
But?
> [...]
>> Of course, we still have the problem of frame doubling/tripling and 
>> having 3 fields per picture, eventually with one of them repeated 
>> (pic_struct codes 5-8).
>>     
>
> no we do not have a problem with this, we do not and never will duplicate
> anything we just export the information and the app can do with it what it
> wants, thats also exactly what we do in mpeg2
>   
We don't export the information, do we? But you are right, with frame 
doubling and tripling there is no problem - the application code will 
handle it by itself anyway by displaying last frame longer. As for 
having 3 fields per picture, I'm not so sure it will currently work. At 
least timing will be wrong. Consider following:

We have a stream with pictures containing (T1/B1/T2==T1), (B2/T3/B3==B2) 
fields. That's two H.264 pictures, but 3 frames. Each av_read_frame() 
should return a packte containing exactly single frame. But we have just 
2 packets, which need to be returned in 3 calls to av_read_frame(), 
according to API. Further, the DTS must be set correctly as well for the 
three AVPackets in order to get the timing correct. How do you want to 
handle this?

And as already mentioned, the case with (T1), (B1), (T2), (B2), we are 
returning 4 packets via av_read_frame() for 2 frames, which is against 
API. How to handle this? My idea was delaying return from h264_parse, 
until second field also parsed or handling this in libavformat 
appropriately by parsing further (probably cleaner, but we need 
additional flag how to report it to libavformat - something like pushing 
pic_struct to libavformat).
>> FYI, I started adding some additional constants to the h264 code for 
>> other SEI messages and for parsing SEI recovery point as the beginning, 
>> as this will be needed in all cases.
>>     
>
> good, and the patch for it?
>   
I just started. No patch yet. I first want to discuss how it should work 
before implementing too much code which will be eventually thrown out.

Regards,

Ivan