[FFmpeg-devel] [PATCH] First shot at AVCHD seeking via new seeking API

Sun Aug 9 10:47:44 CEST 2009

Hi Baptiste,

Baptiste Coudurier wrote:
> [...]
> I think you can check for st->discard and therefore don't need active 
> flag.

True. I'll change it.

>
> >>>> [...]
>>>>>
>>>>> Do term_pos and first_pos need to be per stream ?
>>>> Yes. Reason as above - frames for various streams interleaved in
>>>> packets, so the starting position of frames in various streams don't
>>>> necessarily correlate with their arrival via av_read_frame().
>>>
>>> Scenario is currently for first iteration:
>>> seek at pos
>>> read frames until you find hi keyframes for all streams.
>>> The Lowest pos is the first keyframe found. It may not be in order but
>>> that doesn't matter at the first iteration.
>>>
>>> On next iterations you have no interest in going past the first pos
>>> since read_frame will and if not must return the same frames in the
>>> same order at the same pos, whatever interleaving there is. If it
>>> doesn't problem is somewhere else and must be fixed.
>>>
>>> So you only need one first_pos for all the streams.
>>> And next iterations will find keyframes before pos. That's why I don't
>>> think you need term_pos nor terminated either.
>> Unfortunately, no. Again, consider this case:
>>
>> pos - 1: 1st PES packet of video frame
>> pos + 0: PES packet containing audio frame
>> pos + 1: 2nd PES packet of video frame
>>
>> now, we start reading from somewhere before pos. read_frame will first
>> return audio frame at pos, where we'd stop reading.
>
> For the first iteration we continue reading until the hi keyframe is 
> found, if I understood correctly.
Yes. But this may be one or more key frames too late, so it still might 
be updated in subsequent iterations.

>
>> But the video frame,  which begins at pos - 1 would not be read, 
>> since  > we need to continue  reading to read the frame with position 
>> pos - 1.
> > This is no bug, but a feature...
>
> Yes, however if lo keyframe is not found, code will seek back before 
> pos again to find the lo keyframe for video.
> And in this second iteration you don't need to read past the previous 
> pos.
Correct per stream.

>
> >>>> [...]
>>>>>
>>>>> If I understood correctly, at first iteration you should find high
>>>>> keyframes for all streams.
>>>> Not necessarily. Normally, yes, but there is a special case, with key
>>>> frame in a stream beginning before rough position and ending after it.
>>>> The frame won't be returned in the first iteration, only possibly a
>>>> later key frame. It will only be found in later iteration (or not 
>>>> found
>>>> at all, if seeking against end of the file).
>>>
>>> You are describing the case of the first keyframe here right ?
>>> It will be returned at the next iteration so it's fine and lo will be
>>> updated.
>> I'm describing a case where:
>>
>> pos - 1: 1st PES packet of key frame of stream 1
>> pos + 0: 1st and only PES packet of key frame of stream 0
>> pos + 1: 2nd and last PES packet of key frame of stream 1
>>
>> Generic seeking routine might return position of key frame of stream 0,
>> since it's PTS matches the best. But actually, position pos - 1 is the
>> correct one, since otherwise the decoder would not synchronize correctly
>> at requested timestamp.
>
> Yes, yo have to seek back before pos - 1 to find the lo keyframe, that 
> will be done during the second iteration anyway.
The problem is, if I just remember, I read frames up to pos + 0, when 
reading starting from position <= pos - 1, I'll first read first PES 
packet of key frame of stream 1, which is put into a buffer. Then, I 
read first PES packet of key frame of stream 0, which represents a 
complete frame, and reading stops, because if I keep just the global 
position, I think I've read everything. But actually, I have to continue 
reading second PES packet of key frame of stream 1, which is already 
past pos + 0.

But if I remember that in stream 1 the last known position of a frame 
is, say, pos + 2, I know I have to stop reading at pos + 2 - but only 
for stream 1. For stream 0, I have to terminate reading at pos already. 
Therefore I'm storing termination position per stream, otherwise I'd 
miss some frames.

I hope it's clear now, why I've done it this way.

> [...]
>
> >>>> [...]
>>>>>
>>>>> You can shortcut by checking if user specified a max_pos > ts or
>>>>> min_pos < ts. In this case only found_lo or found_hi matters.
>>>> I'm not sure.
>>>>
>>>> The max_pos/min_pos/max_ts/min_ts don't say "don't give me anything
>>>> outside this range", rather "make sure my decoder output 
>>>> synchronizes in
>>>> this range". I.e., it's perfectly OK to return a position before
>>>> min_pos/min_ts.
>>>
>>> I think, given the parameters name, it exactly says don't give me
>>> anything outside this range, that's why the API has been extended.
>>> So it's not ok to return anything outside this range.
>> According to the API docs:
>>
>> "Seeking will be done so that the point from which all active streams
>> can be presented successfully will be closest to ts and within 
>> min/max_ts."
>>
>> It says, "the point from which all active streams can be presented
>> successfully", i.e., where they synchronize or put differently, where
>> each of them encounters a key frame, must be within min/max_ts, NOT the
>> point where we reposition the file. Therefore, data _before_ min_ts must
>> be taken into account and therefore the current code is correct.
>
> Yes, this is what is in the docs, I said you could shortcut if ts == 
> max_ts or ts == min_ts.
If ts == max_ts, then indeed it's possible to short-cut using lo-frame 
only. But in order to find highest lo-frame, nevertheless I need to 
iterate until I find a frame past ts/max_ts, which is the hi-frame. So 
there's not much gain of doing this optimization.

But you brought me to another idea: Instead of using INT64_MAX for 
position in first iteration, I can use max_ts, so it will effectively 
make exactly the optimization you proposed. I'll change it this way 
(I've already changed it for byte seeking by setting term_pos to maximum 
position, so instead of term_pos we'd have term_ts for both byte seeking 
and timestamp seeking).

For ts == ts_min, it's not possible to optimize, as we can possibly have 
a key frame in stream 0 before ts_min and in stream 1 between ts_min and 
ts_max, so in order to make sure output synchronizes between ts_min and 
ts_max, we need to position the file actually before ts_min.

>
> I think it's more: first correctly decoded (except is AVSEEK_FLAG_ANY 
> is given) frame pts or pos will be within the range.
>
> Explicitly, if you have a keyframe > ts and max_ts == ts or keyframe < 
> min_ts and ts == min_ts it's not ok.
Yes, provided you have a single active stream. Then it's indeed not OK. 
But if n streams are involved, you can have 0 to n-1 streams which have 
keyframe < min_ts (but NOT keyframe > max_ts), in order to synchronize 
output between min_ts and max_ts for _all_ streams.

>
> Users wants accurate seeking.
Exactly. That's why I'm working on it :-)

>
> > [...]
> >
>> +    int64_t term_pos;   ///<  Termination position, where we already 
>> read packets.
>> +    int64_t first_pos;  ///<  First packet position in this iteration.
>> +    int     terminated; ///<  Termination flag.
>
> Can you try to remove these ?
See above, it's IMHO not possible.

> [...]
>> +                    if (!sp->found_hi) {
>> +                        // No high frame exists for this stream
>> +                        sp->found_hi = 1;
>> +                        (*found_hi)++;
>> +                        sp->pts_hi = INT64_MAX;
>> +                        sp->pos_hi = INT64_MAX;
>
> Why do you set pts and pos ?
> Can't you just check if pts_hi is set or not ?
Just defensive coding. I've learned my lessons doing professional system 
programming in last 16 years...

> [...]
>> +        dts = pkt.dts;
>> +        if (pts == AV_NOPTS_VALUE) {
>> +            // Some formats don't provide PTS, only DTS.
>> +            pts = dts;
>> +        }
>
> This is wrong if we have delay.
That's true. Actually, I'm relying PTS will be provided for streams 
having delay. This is also the case for MPEG-PS/TS, but there are some 
formats which don't have delay and only provide DTS, but no PTS. So it's 
again just defensive coding. For MPEG-PS/TS it's not relevant, but if 
someone wants to use the code for a different format, he'd first need to 
find out why it doesn't work. With this fallback, it will work.

Ok, so I'll implement your proposal regarding active flag and the 
optimization for ts == max_ts. Per-stream term position is in my opinion 
still necessary. Will it be OK to commit then?

Regards,

Ivan