[FFmpeg-devel] [PATCH] correct MPEG video parser not to return incomplete frames

Michael Niedermayer michaelni
Thu Aug 27 03:17:52 CEST 2009

On Sat, Aug 22, 2009 at 10:07:48AM +0200, Ivan Schreter wrote:
> Hi Michael,
> Michael Niedermayer wrote:
>> [...]
>>>> i really dont see where the problem comes from
>>>> you do serach for a keyframe somehow already, so you must at that point 
>>>> know
>>>> where in the PES packet the frame is, skip whats before (minus seq 
>>>> headers
>>>> and such) from the parser output, that seems to be all thats needed 
>>>> assuming
>>>> the keyframe choosen is the correct one in the new seeking semantics
>>>> but maybe i miss something ...
>>> That's exactly what I do!
>>> But the problem is:
>>>    * I find the PES packet where my frame starts and return position of
>>>      this frame.
>>>    * I reposition the file pointer to this PES packet.
>>>    * Parser is reset.
>>>    * First frame read from the parser is the remainder of the previous
>>>      frame (starting somewhere hundred PES packets earlier).
>>>    * Parser assigns PTS/DTS of the frame actually starting in this PES
>>>      packet to this broken frame with remainder of the previous frame.
>>>    * Second frame read from the parser is the key frame, but timing
>>>      info is wrong - PTS/DTS contains AV_NOPS_VALUE and cannot be
>>>      computed correctly.
>>> Strictly speaking, the frame stream coming out of the parser is as 
>>> decodable as ever, but since timestamps are messed up, it's impossible to 
>>> correctly play it from this position.
>>> Unfortunately, I don't have any chance to detect that whatever I find 
>>> before first picture start code in the PES packet is not a frame, except 
>>> by checking whether there is a picture start code and a slice in there.
>>> PTS/DTS in the PES packet is associated by lavf/utils.c with first frame 
>>> returned, which starts in this PES packet. Since our partial frame 
>>> "starts" in this PES packet, PTS/DTS is associated with it instead of 
>>> with the actually starting frame in this PES packet.
>>> So, when I simply skip the broken frame at the beginning, first frame 
>>> read by the parser will be the frame which really starts in the PES 
>>> packet and it gets the proper timing info associated.
>>> When decoding the file from the beginning, we find the proper beginning 
>>> of the frame whose last part is in my PES packet and thus don't steal 
>>> PTS/DTS of the frame really starting in the PES packet.
>>> I hope I explained it somewhat clearly now...
>> i understand your explanation but i dont understand the problem
>> the parser, after an reset misassociates the first timestamp to something
>> that is not a frame and i guess it also is not detect as a frame.
>> That is a bug, it has to be fixed (and not by discarding this partial
>> frame)
> Uhm, as you say, the parser misassociates the timestamp to something that 
> is not a frame. It will be then refused by the decoder, since it doesn't 
> contain minimum necessary information (picture start code and slice start 
> code).

> So is something missing picture start code and slice start code in your 
> opinion a partial frame or is it something not a frame?

it can be both, this depends on what else is in there
if you look at mpeg4 (a)sp then its possible to even redundantly store
various things from the picture headers in the slice headers

> If the latter, then the parser doesn't need to return it at all, right? (at 
> least after a seek, stream copy is maybe different issue)

>> the second part is that after seeking you know where in a PES packet your
>> frame starts still the output leaving lavf contains things prior to that
>> frame, this is a bug in your code, please fix this.
> I don't know where in PES packet the frame starts. This information is not 
> propagated out of the parser. The only thing we know is in which PES packet 
> the frame starts and we can also position the file pointer only by PES 
> packet.

> OK, it would be maybe possible to get the additional information about 
> where in PES packet the frame starts and handle it in generic parser. BUT: 

its certainly possible

> This won't work as well. Namely, this would only fix the problem with first 
> stream encountered after a seek. If there is a PES packet for stream 1 
> followed by a PES packet for stream 2, which has exactly the same problem 
> (rest of previous frame without picture start code and slice start code and 
> start of a new frame and contains PTS/DTS), then the problem would simply 
> happen in stream 2.

and that amount is discarded after the parser

>> If there are 3 keyframes in a PES packet, it should be possible to seek
>> to any of them and receive exactly that keyframe and not always the first
>> of the PES packet. To me this seems nothing except a single variable that
>> says how much to skip before output.
> That would be optimal, yes. But again, this single variable would solve 
> this case only, but wouldn't solve the problem of partial frames, at least 
> not in other streams except the stream in very first packet (see above). 

> OTOH, you yourself said in another post that it's no problem to seek a few 
> frames earlier... 

yes, but to me this all looks like just 5 lines of code to handle, and if
so i think it would be nice to have even if it isnt exactly essential

> Another issue is, you could have two PES packets, in 
> first PES packet stream 1 with 3 frames PTS 2, 3 and 4 and in second PES 
> packet 3 frames from stream 2 with PTS 1, 2 and 3. How do you seek to PTS 
> 3? 

> You can't, 

ohh i can :)

> you'd have to keep the information per-stream. 

thats what i meant to suggest

> And even with 
> that, not only single offset, but also number of frames would have to be 
> kept (in case several frames are interleaved), which would make the parser 
> code quite hard to understand (not that it is today easy to understand...).

i dont know what you talk about
if(st->initial_bytes_to_discard >= pkt->size){
    st->initial_bytes_to_discard -= pkt->size
    discard packet

> Therefore, I propose the following:
>    * If the parser is newly-created when parsing starts at the
>      beginning of the file, handle everything as-is.
>    * If the parser is reset after a seek, set a flag to tell it to
>      ignore the first partial non-frame (which doesn't contain picture
>      start code and slice start code), as this anyway can't be decoded
>      at all.
> Would that be OK? Or do you have some different idea? I feel this is a 

IIRC our parser misassociates a timestamp in the partial frame case, this
has to be fixed _first_. I am against adding new features on top of known

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090827/4acae016/attachment.pgp>

More information about the ffmpeg-devel mailing list