[FFmpeg-devel] [PATCH] correct MPEG video parser not to return incomplete frames

Sun Sep 6 17:33:34 CEST 2009

Michael Niedermayer wrote:
> [...]
>>> Also the mpeg demuxer returns random trash prior to the first frame, this 
>>> IS
>>> a bug in your seeking code. Fix it please! And before you start arguing 
>>> about
>>> how unlikely it would cause a problem in mpeg video, it will cause 
>>> problems
>>> in mpeg audio as well.
>>>   
>>>       
>> I don't understand what random trash is returned from MPEG demuxer prior to 
>> first frame. Also, I don't understand why the seeking code should be 
>> causing this, since it positions the file to start reading at the packet 
>> with start of a frame. Old seeking code didn't do it differently...
>>
>> How can I reproduce the problem? With which sample file?
>>     
>
> any mpeg-ps file created with ffmpeg
>   

Okaaay... My seeking patches were only applied to mpeg-TS (transport 
stream). I did no change to mpeg-PS (program stream) yet. So it's 
DEFINITELY not a bug in my seeking code, as it's not even active there! 
I wanted to fix the parser first, before activating it there.

> [...]
> you seem to belive that you can cut an ES stream at any byte position and the
> decoder can even in theory pick up decoding at the first complete frame while
> discarding previous frames.
> for mpeg video that does work in general though not in all corner cases.
> for mpeg audio it does not work, there is nothing preventing the startcode
> from occuring inside a frame (unless i am too tired and silly but i dont
> think so ...)
>   

Actually, I don't believe that. MPEG format is unfortunately brain-dead 
enough not to support resync at first complete frame in 100% of the 
cases. So in theory, it doesn' work. It just seems to work in practice 
with streams I have at hand.

But don't get me wrong, I'm not arguing with you that the parser can be 
left untouched. I believe your proposal with 
AVStream.initial_bytes_to_discard is actually a good way to go and will 
make the complete solution work 100% (hopefully).

> To make it even more clear what the problem is, look at this example, 
> the seeking code seeks to packet 5, with the expectation that the decoder
> will decode the first frame in there but there is a startcode emulation
> in the previous partial frame causing the decoder not only to return noise
> but also to miss the first complete frame. The only way for the decoder to
> avoid this is to drop the first few frames but if it does, it starts decoding
> later than when the seeking code expects.
> Its the seeking code that should start a little earlier and discard data
> to make sure synchronization of frame boundaries did happen
>   

OK. But the question is, how much earlier? Theoretically, we could have 
such a bad file, where we can possibly never synchronize, or am I 
mistaken? I'll have to look in the standard for details.

> its what i suggested already previously for fixing the video corner cases
>   

To which thread/post are you referring?

>>> The goal of the seeking code is to be exact, if now the first frame can be
>>> invalid without this being detectable the decoder or parser would have to
>>> discard it and this would break exactness
>>>   
>>>       
>> I agree with you. But again, the parser has to somehow detect the broken 
>> non-frame. And you disagree with detection of non-frame by detecting 
>> missing picture start code and slice start code (concretely, for MPEG 
>> video). So how am I supposed to detect such non-frame, so I can initialize 
>> AVStream.initial_bytes_to_discard?
>>     
>
> you start a frame or 2 earlier and throw them away, with each the probability
> of being off drops
>   

This basically answers my question above. So you just want to reduce the 
probability. I.e., I could do it in this way in the 
ff_gen_syncpoint_search routine:

    * Back off to position X.
    * Start reading frames from this position and for each stream count
      number of parsed frames.
    * When at least 2 frames have been read, only then actually consider
      the frame for further processing (keyframe detection, timestamp
      comparison and the like).
    * Exception: We are already at position 0 in the file (then, it's
      equivalent starting playing from the beginning, so no frames
      should be ignored).

Further, the parser needs to be enhanced to somehow store in-packet 
position of the frame, so AVStream.initial_bytes_to_discard can be 
implemented. I suppose, frame start offset in packet can be handled in 
similar manner as cur_frame_pos. Right?

>
>   
>>> [...]
>>>   
>>>       
>>>>> [...]
>>>>> IIRC our parser misassociates a timestamp in the partial frame case, 
>>>>> this
>>>>> has to be fixed _first_. I am against adding new features on top of 
>>>>> known
>>>>> bugs.
>>>>>         
>>>>>           
>>>> I looked at the code in mpegvideo_parse() and ff_mpeg1_find_frame_end(). 
>>>> To me, it seems like ff_mpeg1_find_frame_end() misassociates the 
>>>> timestamp. Namely, the timestamp is fetched there, but actually the 
>>>> _previous_ frame (or in this case, a non-frame) is returned from the 
>>>> parsing routine.
>>>>
>>>> I'm not sure how to fix this, though. Possibly, instead of fetching the 
>>>> timestamp in ff_mpeg1_find_frame_end(), it should be done in this way:
>>>>
>>>>     
>>>>         
>>>   
>>>       
>>>>    * If the frame starts at offset 0 in the PES packet AND we don't
>>>>      have anything in the buffer, then fetch the timestamp.
>>>>     
>>>>         
>>> sounds like another one of your not always working hacks
>>>   
>>>       
>> Grrrr....
>>
>> Man, I'm trying to help! Instead of bashing, make a proposal how to do it!
>>     
>
> iam not sure how to implement all the stuff down to the finest detail without
> new issues poping up, but iam sure you take this too serious ...
>
> the first partial frame has no pic startcode, it cant otherwise it would be
> the correct target for the timestamp so  why is our parser associateing 
> the timestamo with it? ive not had time to look at the actual code more
> carefully sadly ...
> The timestamp association rules are quite clear in mpeg, the parser really
> should not need any black magic to get it right even if it doesnt know if the
> first part is a complete frame or not.
>   

As I understand it, if a packet contains some data with a start code in 
there, the data before startcode is part of previous frame and the frame 
with its data starting at the first startcode should get the timestamp 
specified in this packet.

MPEG video parser fetches the timestamp when finding the new start code. 
Data before the start code are added to the buffer containing previous 
frame. This previous frame is returned in this call of the parser, since 
it has been completely read. But the parser's timestamp is already 
fetched. So the previous frame will get the timestamp. Next frame just 
beginning in the packet will not get the correct timestamp.

If the start code is at position 0 in the packet, then the frame will 
get the correct timestamp.

I suppose, timestamp fetching must be instead postponed by remembering 
the position of the start code in the file and fetching the timestamp 
when the buffer is completed instead. I.e., instead of fetching the 
timestamp in ff_mpeg1_find_frame_end() we'd just store the position on 
parser context there and fetch the timestamp in mpegvideo_parse() 
associated with this position just before returning the new frame back. 
Would you agree with this?

I'm just unsure, how long the association of timestamps with positions 
is kept in lavf. Possibly, this buffer overflows before the packet is 
completely constructed, so the timestamp has to be fetched earlier (in 
the next parser call after returning previous frame).

My first patch for mpegvideo_parser.c was incorrect, since it only 
corrects associating the first timestamp of the first frame by 
discarding the non-frame at the beginning of the packet. But the 
remaining frames will also get wrong timestamps, if I see it correctly.

>
>   
>> I'm again at the point of simply giving up and investing my limited time 
>> into something else, where people value the contribution a bit more...
>>     
>
> your contribution is valued but implementing this is not a trivial task and
> writing a functioning implementation does take time, you seem to expect that
> a quick implementation would fully work
>   

No. I'm just expecting constructive criticism instead of bashing.

I simply find comments like "sounds like another one of your not always 
working hacks" or "is complete nonsense" insulting and inappropriate, 
especially if no constructive criticism follows.

Regards,

Ivan