[FFmpeg-devel] [PATCH] correct MPEG video parser not to return incomplete frames

Mon Aug 17 02:59:06 CEST 2009

On Sun, Aug 16, 2009 at 03:43:46PM +0200, Ivan Schreter wrote:
> Hi Michael,
>
> (I'm keeping whole history, as it's long ago I responded last time in this 
> thread)
>
> Michael Niedermayer wrote:
>> On Sun, Aug 09, 2009 at 11:45:17AM +0200, Ivan Schreter wrote:
>>   
>>> Hi Michael,
>>>
>>> Michael Niedermayer wrote:
>>>     
>>>> On Sun, Aug 02, 2009 at 05:41:00PM +0200, Ivan Schreter wrote:
>>>>         
>>>>> Michael Niedermayer wrote:
>>>>>             
>>>>>> On Sun, Aug 02, 2009 at 05:11:55PM +0200, Ivan Schreter wrote:
>>>>>>                   
>>>>>>> Hi,
>>>>>>>
>>>>>>> Michael Niedermayer wrote:
>>>>>>>                         
>>>>>>>> On Sun, Aug 02, 2009 at 03:58:59PM +0200, Ivan Schreter wrote:
>>>>>>>> [...]
>>>>>>>>                                 
>>>>>>>>> @@ -146,7 +153,12 @@
>>>>>>>>>      /* we have a full frame : we just parse the first few MPEG 
>>>>>>>>> headers
>>>>>>>>>         to have the full timing information. The time take by this
>>>>>>>>>         function should be negligible for uncorrupted streams */
>>>>>>>>> -    mpegvideo_extract_headers(s, avctx, buf, buf_size);
>>>>>>>>> +    if (mpegvideo_extract_headers(s, avctx, buf, buf_size) < 0) {
>>>>>>>>> +        /* garbled frame, ignore (possibly first read after seek) 
>>>>>>>>> */
>>>>>>>>> +        *poutbuf = NULL;
>>>>>>>>> +        *poutbuf_size = 0;
>>>>>>>>> +        return next;
>>>>>>>>> +    }
>>>>>>>>>  #if 0
>>>>>>>>>      printf("pict_type=%d frame_rate=%0.3f repeat_pict=%d\n",
>>>>>>>>>             s->pict_type, (double)avctx->time_base.den / 
>>>>>>>>> avctx->time_base.num, s->repeat_pict);
>>>>>>>>>                                         
>>>>>>>> as already said elsewhere, its not the job of parsers to discard 
>>>>>>>> data
>>>>>>>> it would break decoders that are capable to decode damaged frames, 
>>>>>>>> like
>>>>>>>> our decoder.
>>>>>>>>                                 
>>>>>>> The problem is not in the decoder, but rather parser. Namely, 
>>>>>>> associated data (PTS, DTS, position, etc.) are passed incorrectly.
>>>>>>>                         
>>>>>> your patch _creates_ a new problem for the decoder
>>>>>>
>>>>>>
>>>>>>                   
>>>>>>>> in addition to this your patch is VERY broken in a subtile and hard 
>>>>>>>> to
>>>>>>>> debug way, that is it will detect 99% of the incomplete frames but 
>>>>>>>> not
>>>>>>>> all from what it doesnt detect again most but not all would be fine,
>>>>>>>> that would be a nightmare to debug as it would occur so rarely and
>>>>>>>> require exact seeking to the problematic spot to reproduce ...
>>>>>>>> the issue and that might not be the only one is that there can be
>>>>>>>> things prior to the picture start code that can matter ...
>>>>>>>>                                 
>>>>>>> It is also not the intention to detect 100% of incomplete frames. The 
>>>>>>> intention is only to detect partial frame after the seek, so we don't 
>>>>>>> return nonsense back (which breaks seeking).
>>>>>>>                         
>>>>>> but it does not detect 100% of the partial frames after a seek
>>>>>>                   
>>>>> Actually, it does a good job.
>>>>>             
>>>> it does an incomplete job
>>>>
>>>>
>>>>         
>>>>> If the stream is NOT broken:
>>>>>
>>>>>    * If the buffer for the frame contains both picture start code and
>>>>>      slice, then the frame is complete and PTS/DTS belongs to this 
>>>>> frame.
>>>>>             
>>>> 3rd try
>>>>
>>>> PES pkt0       | PES pkt1                     seq start code | picture 
>>>> start code | slice ..
>>>>
>>>> PES pkt1 contains both but is NOT a complete frame and depending on what 
>>>> is
>>>> in the seq header and the previous seen seq headers the decoder can fail
>>>>         
>>> I know about this. I didn't come up with a solution that would also 
>>> address this and I'm afraid there is no good solution to this.
>>>
>>> OTOH, for real-world files which someone wants to play or edit, I doubt 
>>> the video size and time base will change in one file. And if someone has 
>>> such files, it will be with current implementation even more broken on 
>>> seeking.
>>>
>>> The problem of handling seq start code on seek is just academical - it is 
>>> NOT present in all frames and seeking to a frame which doesn't have seq 
>>> start code might still produce garbled results. The same holds for ext 
>>> start code, which is also parsed by MPEG video parser.
>>>     
>>
>> you may consider it academical but it is not
>> heres another example
>> 1. seek
>> 2. stream copy
>>
>> even with a single width/height this will loose the seq header in the case
>> above rendering part of the file unplayable
>>   
> For 2, I agree. For 1, we anyway don't handle it correctly at all. I'm 
> addressing the problem with seek, actually, but not the problem of seq 
> header.
>
>> and no i dont care if it just happens rarely, it can happen and its not 
>> the
>> correct behavior in the "what the user expects" sense
>>
>>
>> [...]
>>   
>>>>>>> Possibly, the check could be made in a "fresh" parser only, so that 
>>>>>>> we at most lose very first broken frame of a stream and it would 
>>>>>>> still correct the seeking...
>>>>>>>
>>>>>>> Otherwise, how would you do it? The problem:
>>>>>>>                         
>>>>>> i dont know, my first thought would be just to discard the first frame
>>>>>> the parser outputs after seeking no matter if it looks valid or not.
>>>>>> Of course thats just a thought and may have issues as well i dont know
>>>>>>                   
>>>>> That won't work at all (or in your own words, it is VERY broken). In 
>>>>> seek, we reposition the file pointer to the first PES packet containing 
>>>>> data of the key frame.             
>>>> no, we dont, thats maybe what your patches do but they arent approved, 
>>>> given
>>>> that its no big issue to reposition one frame earlier OR keep track of 
>>>> how
>>>> many byetes need to be skiped it doesnt seem that this is an issue
>>>>         
>>> Seeking one frame earlier might work, though I find it being an ugly 
>>> workaround.
>>>
>>> I don't understand what you mean with "keep track of how many bytes need 
>>> to be skipped", but it also sounds like a workaround.
>>>     
>>
>> if you have a PES packet with 3 frames, seeking to the one you really want
>> by skiping 0-2 frames seems to make sense on its own, i wouldnt call it a
>> workaround
>>
>>
>>   
>>>>         
>>>>> This PES packet might contain the rest of previous frame. Now, 
>>>>> read_frame must read the key frame. If you discard the first frame, 
>>>>> either of this could happen:
>>>>>
>>>>>    * If the key frame begins at offset 0 of the PES packet, you discard
>>>>>      the very key frame you wanted to have.
>>>>>    * If the key frame begins at offset >0 of the PES packet, you
>>>>>      discard the invalid frame, BUT your key frame won't carry DTS/PTS
>>>>>      timestamps. Thus, it's possible to decode it, but the user won't
>>>>>      know what frame it is. It is equally bad.
>>>>>
>>>>>             
>>>>         
>>>>> Therefore I believe my fix is a good one, when I add that check for a 
>>>>> fresh parser, so it will be only executed just after seek, in order not 
>>>>> to cause problems when decoding broken streams.
>>>>>             
>>>> as said 3 times alraedy your code does not work reliable so we will have
>>>> to do something else
>>>>         
>>> Yes, but as an iterim solution it works.
>>>     
>>
>> for h264 the interim solution does not seem to work (raw h264 amongth 
>> others)
>>
>>
>>   
>>>>> Current situation is bad, since seeking of MPEG streams is completely 
>>>>> broken (not only due to this problem, but I'm working on the other 
>>>>> problem as well).
>>>>>             
>>>> yes i know and agree of course about that one
>>>>         
>>> Then help me please find a correct solution.
>>>     
>>
>> i really dont see where the problem comes from
>> you do serach for a keyframe somehow already, so you must at that point 
>> know
>> where in the PES packet the frame is, skip whats before (minus seq headers
>> and such) from the parser output, that seems to be all thats needed 
>> assuming
>> the keyframe choosen is the correct one in the new seeking semantics
>> but maybe i miss something ...
>>   
> That's exactly what I do!
>
> But the problem is:
>
>    * I find the PES packet where my frame starts and return position of
>      this frame.
>    * I reposition the file pointer to this PES packet.
>    * Parser is reset.
>    * First frame read from the parser is the remainder of the previous
>      frame (starting somewhere hundred PES packets earlier).
>    * Parser assigns PTS/DTS of the frame actually starting in this PES
>      packet to this broken frame with remainder of the previous frame.
>    * Second frame read from the parser is the key frame, but timing
>      info is wrong - PTS/DTS contains AV_NOPS_VALUE and cannot be
>      computed correctly.
>
> Strictly speaking, the frame stream coming out of the parser is as 
> decodable as ever, but since timestamps are messed up, it's impossible to 
> correctly play it from this position.
>
> Unfortunately, I don't have any chance to detect that whatever I find 
> before first picture start code in the PES packet is not a frame, except by 
> checking whether there is a picture start code and a slice in there.
>
> PTS/DTS in the PES packet is associated by lavf/utils.c with first frame 
> returned, which starts in this PES packet. Since our partial frame "starts" 
> in this PES packet, PTS/DTS is associated with it instead of with the 
> actually starting frame in this PES packet.
>
> So, when I simply skip the broken frame at the beginning, first frame read 
> by the parser will be the frame which really starts in the PES packet and 
> it gets the proper timing info associated.
>
> When decoding the file from the beginning, we find the proper beginning of 
> the frame whose last part is in my PES packet and thus don't steal PTS/DTS 
> of the frame really starting in the PES packet.
>
> I hope I explained it somewhat clearly now...

i understand your explanation but i dont understand the problem

the parser, after an reset misassociates the first timestamp to something
that is not a frame and i guess it also is not detect as a frame.
That is a bug, it has to be fixed (and not by discarding this partial
frame)

the second part is that after seeking you know where in a PES packet your
frame starts still the output leaving lavf contains things prior to that
frame, this is a bug in your code, please fix this.
If there are 3 keyframes in a PES packet, it should be possible to seek
to any of them and receive exactly that keyframe and not always the first
of the PES packet. To me this seems nothing except a single variable that
says how much to skip before output.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In fact, the RIAA has been known to suggest that students drop out
of college or go to community college in order to be able to afford
settlements. -- The RIAA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090817/60eb3477/attachment.pgp>