[Ffmpeg-devel] [RFC] Improvement for the odd timestamp generation when parser is in use.

Mon Mar 19 01:47:29 CET 2007

Hi,

I noticed some odd behavior in libavformat when an AVParser is used to
assemble complete frames. The timestamps of the output packets get very jumpy
and can't easily be attributed to the original packet they came from.

This post ended up quite a bit longer than I expected, so if you are only
interested in the proposed change to avformat/parser, here it is: always
output the timestamp of the packet a frame started in, and provide the byte
offset from that timestamp to where the frame starts, so that the player (or
avformat, if that is ok to do internally) can correct the timestamp, at least
for cbr streams.

It's a bit hard to explain what is going on, so I'll try using an example.

Assume we have a cbr stream whose timebase is set so that pts/dts values can
be interpreted as the byte position in the stream (ac3 in avi does this, for
example, and is where I found the issue). Now assume each packet coming out
of the demuxer is 10 bytes (or duration 10, as our timebase equates the two),
while each full frame the parser finds is 6 bytes. These assumptions come
from a similar situation with ac3 in avi, where each avi packet was about
23xx bytes and each ac3 frame 17xx-something bytes; I don't remember the
exact figures.

These are the timestamps we will then get out (best viewed with a
fixed-width font):

pk1  i_size: 10  i_dts: 0   o_size: 6  o_dts: -   o_adts: -   actual: 0
     i_size: 4   i_dts: -   o_size: -  o_dts: -   o_adts: -   actual:
pk2  i_size: 10  i_dts: 10  o_size: 6  o_dts: 0   o_adts: 0   actual: 6
     i_size: 8   i_dts: -   o_size: 6  o_dts: 10  o_adts: 10  actual: 12
     i_size: 2   i_dts: -   o_size: -  o_dts: -   o_adts: -   actual:
pk3  i_size: 10  i_dts: 20  o_size: 6  o_dts: -   o_adts: 16  actual: 18
     i_size: 6   i_dts: -   o_size: 6  o_dts: 20  o_adts: 20  actual: 24
pk4  i_size: 10  i_dts: 30  o_size: 6  o_dts: -   o_adts: 26  actual: 30
     i_size: 4   i_dts: -   o_size: -  o_dts: -   o_adts: -   actual:
pk5  i_size: 10  i_dts: 40  o_size: 6  o_dts: 30  o_adts: 30  actual: 36
     i_size: 4   i_dts: -   o_size: -  o_dts: -   o_adts: -   actual: -
pk6  i_size: 10  i_dts: 50  o_size: 6  o_dts: 40  o_adts: 40  actual: 42

i_size: data from the demuxer (or what is left after the previous output)
i_dts: timestamp on the packet coming from the demuxer (reset to nopts after
anything has been consumed)
o_size: packet out from the parser
o_dts: timestamp out from the parser
o_adts: timestamp out from libavformat (cur_dts + duration, if the parser
didn't give anything)
actual: byte position in the stream where the output frame really starts

So, we get the timestamps -,0,10,16,20,26,30,40 out of libavformat (the
o_adts column). This gives the following dts differences: -,10,6,4,6,4,10,
where the 16 and 26 timestamps are invented ones based on the previous
frame's timestamp + duration (cur_dts). This makes it rather hard to know
when to trust timestamps coming out of libavformat.
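
To make the jitter concrete, here is a small sketch that recomputes the
differences above and also the error of each output timestamp against the
"actual" column of the table. The helper names and the NO_TS constant are
made up for this illustration and are not part of libavformat.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for AV_NOPTS_VALUE in this sketch (hypothetical name). */
#define NO_TS INT64_MIN

/* delta[i] = ts[i] - ts[i-1]; NO_TS when either side is missing. */
static void ts_deltas(const int64_t *ts, size_t n, int64_t *delta)
{
    assert(ts && delta);
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || ts[i] == NO_TS || ts[i - 1] == NO_TS)
            delta[i] = NO_TS;
        else
            delta[i] = ts[i] - ts[i - 1];
    }
}

/* err[i] = actual[i] - ts[i]; NO_TS when no timestamp was reported. */
static void ts_errors(const int64_t *ts, const int64_t *actual,
                      size_t n, int64_t *err)
{
    assert(ts && actual && err);
    for (size_t i = 0; i < n; i++)
        err[i] = (ts[i] == NO_TS) ? NO_TS : actual[i] - ts[i];
}
```

Fed with the o_adts column (-,0,10,16,20,26,30,40) and the actual column
(0,6,12,18,24,30,36,42), the differences come out as 10,6,4,6,4,10 and the
errors as 6,2,2,4,4,6,2, so the reported timestamp is never right after the
first frame and the size of the error keeps changing.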

Now, if frame sizes are small, this isn't a huge deal, as the absolute
timestamp error isn't that large. However, in the case of AC3 or DTS frames
that are to be passed directly to a rendering device, there is trouble, as
you never know which timestamp to sync to.

The best way I can see to handle this is to make the parser always output
the timestamp of the packet the current frame started in. (Currently it only
does this if the demux packet that resulted in the previous frame had a
timestamp and it's the first time we use data from it.) This would result in
multiple packets with the same timestamp, which might not be ideal, but it
would at least be consistent.

Now, if the above is acceptable, then by making the parser also report the
number of bytes between the given timestamp and the start of the frame, a
player could, at least for cbr streams, correct the timestamp of the demux
packet. The correction could even be done in avformat by default, but that
might not be good in the case of stream copy.
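
As a sketch of what the proposal would give for the 10-byte packet / 6-byte
frame example (timestamps counted in bytes), the following computes the
packet timestamp and byte offset for each frame and the corrected timestamp
a player could derive from them. The FrameStamp struct and both helpers are
hypothetical names for this illustration only; they assume fixed packet and
frame sizes and are not an existing parser interface.

```c
#include <assert.h>
#include <stdint.h>

/* What the parser would report per frame under the proposal. */
typedef struct FrameStamp {
    int64_t pkt_dts;     /* dts of the demux packet the frame starts in */
    int64_t byte_offset; /* bytes from that packet's start to the frame */
} FrameStamp;

/* Toy model: packet n starts at byte n * pkt_size and carries that value
 * as its dts; frame n starts at byte n * frame_size. */
static FrameStamp stamp_frame(int64_t frame_index, int64_t frame_size,
                              int64_t pkt_size)
{
    assert(frame_index >= 0 && frame_size > 0 && pkt_size > 0);
    int64_t start = frame_index * frame_size;
    FrameStamp fs;
    fs.pkt_dts     = (start / pkt_size) * pkt_size;
    fs.byte_offset = start - fs.pkt_dts;
    return fs;
}

/* With one timestamp unit per byte the correction is a plain addition. */
static int64_t corrected_dts(FrameStamp fs)
{
    return fs.pkt_dts + fs.byte_offset;
}
```

For frames 0..7 this yields packet timestamps 0,0,10,10,20,30,30,40 with
byte offsets 0,6,2,8,4,0,6,2. The raw timestamps repeat, as noted above, but
adding the offset recovers the actual positions 0,6,12,18,24,30,36,42.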

I implemented the above in our player, and it works very well. However, I'm
not sure how good an idea it is to use the parser's internal variables from
outside the public interface. That misuse of internal variables could be
removed if the parser simply gave the byte offset directly. (This is not
code I expect to go into avformat; it is only here as an RFC and for possible
use by somebody who needs to improve the accuracy of libavformat timestamps.)

          AVStream* s = m_pFormatContext->streams[pkt.stream_index];
          if(s->parser && s->need_parsing && s->codec->bit_rate)
          {

// START PARSER PART
            AVCodecParserContext* pc = s->parser;
            int k = pc->cur_frame_start_index;
            for(int i = 0; i < AV_PARSER_PTS_NB; i++) {
                if (pc->frame_offset >= pc->cur_frame_offset[k]
                 && pc->cur_frame_dts[k] != AV_NOPTS_VALUE)
                    break;
                k = (k - 1) & (AV_PARSER_PTS_NB - 1);
            }

            // how far after this timestamp are we
            int64_t bytes = pc->frame_offset - pc->cur_frame_offset[k];
// END PARSER PART

            // convert the byte distance to stream time_base units (cbr only)
            int64_t offset = av_rescale_rnd(bytes, s->time_base.den*8,
                                            s->codec->bit_rate * s->time_base.num,
                                            AV_ROUND_NEAR_INF);

            // if the packet this started in has a timestamp, interpolate
            // from that
            if(pc->cur_frame_dts[k] != AV_NOPTS_VALUE)
              pkt.dts = pc->cur_frame_dts[k] + offset;
            else
              pkt.dts = AV_NOPTS_VALUE;

            if(pc->cur_frame_pts[k] != AV_NOPTS_VALUE)
              pkt.pts = pc->cur_frame_pts[k] + offset;
            else
              pkt.pts = AV_NOPTS_VALUE;
          }
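
For reference, the av_rescale_rnd() call above computes
bytes * 8 * time_base.den / (bit_rate * time_base.num), rounded to nearest.
Below is a standalone sketch of the same arithmetic without libavutil. The
function name is made up, it only handles non-negative byte counts, and
unlike av_rescale_rnd() it does nothing to protect the intermediate product
against 64-bit overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte distance in a cbr stream to time base ticks.
 * seconds = bytes * 8 / bit_rate, ticks = seconds * tb_den / tb_num. */
static int64_t bytes_to_ticks(int64_t bytes, int64_t bit_rate,
                              int64_t tb_num, int64_t tb_den)
{
    assert(bytes >= 0 && bit_rate > 0 && tb_num > 0 && tb_den > 0);
    int64_t num = bytes * 8 * tb_den;
    int64_t den = bit_rate * tb_num;
    /* round to nearest, halves rounded up (inputs are non-negative) */
    return (num + den / 2) / den;
}
```

With illustrative numbers, 1000 bytes into a 384000 bit/s stream with a
1/1000 time base is 20.83 ms, which rounds to 21 ticks. In the toy example
from earlier, where one tick equals one byte (bit_rate 8, time base 1/1),
6 bytes map to 6 ticks.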

/Regards

Joakim