[FFmpeg-devel] [PATCH] flvenc: Fix sequence header update timestamps

Fri May 11 02:08:41 EEST 2018

On Fri, May 11, 2018 at 1:39 AM, Alex Converse <alex.converse at gmail.com> wrote:
> On Thu, May 3, 2018 at 10:58 AM, Jan Ekström <jeebjp at gmail.com> wrote:
>>
>> Hi,
>>
>> While the timestamp field is 8+24=32 bits, the DTS value is int64_ts.
>> Thus, while I'm really bad at doing wrap-around logic, wouldn't it be
>> something a la ´uint32_t wrapped_timestamp = (uint32_t)(ts %
>> UINT32_MAX)`?
>>
>
> See the note about timestamps below.
>
>> Additionally, I did briefly go through E.4.1 (FLV Tag) in the spec and
>> it doesn't really seem to define whether this value is PTS or DTS...
>> Is this defined anywhere proper? Given FLV's age I wouldn't be
>> surprised that the answer would be "effectively DTS", of course. But
>> if you've seen what  Adobe's implementation writes with B-frames it'd
>> be interesting to see.
>>
>
> FLV AVC packets have a field called CompositionTime described to match
> 14496-12. FLV requires this composition offset to be zero for sequence header
> type packets. This strongly indicates to me that we want DTS here. In addition
> the current muxer and demuxer already use dts pretty consistently for this
> value (and use cts = pts - dts).
>

Alright, so dts it is... That only leaves out the fact that we're
stuffing int64_t into an "unsigned int", which most likely will not
warn on 64bit linux, but most likely will on various architectures
and/or systems.

So we:
1) take in the dts as int64_t.
2) apply wrap-over at either 32 or 31 bits depending on if the signed
value is ever supposed to be negative. Depends on how wrap-around in
FLV is supposed to be handled (aka "Is the negative area ever supposed
to be used from the 32bit signed integer field?").
3) Write it there.

>
>> >      int64_t data_size;
>> >      AVIOContext *pb = s->pb;
>> >      FLVContext *flv = s->priv_data;
>> > @@ -492,8 +492,8 @@ static void flv_write_codec_header(AVFormatContext* s, AVCodecParameters* par) {
>> >                  par->codec_type == AVMEDIA_TYPE_VIDEO ?
>> >                          FLV_TAG_TYPE_VIDEO : FLV_TAG_TYPE_AUDIO);
>> >          avio_wb24(pb, 0); // size patched later
>> > -        avio_wb24(pb, 0); // ts
>> > -        avio_w8(pb, 0);   // ts ext
>> > +        avio_wb24(pb, ts & 0xFFFFFF);
>> > +        avio_w8(pb, (ts >> 24) & 0x7F);
>>
>> Looking at the spec this is bottom 24 bits of the timestamp and top 8,
>> thus why is the second one not & 0xFF ?
>>
>
> The mapping from dts to the flv timestamp we write to the stream is identical
> for to what we do for non header packets. [1] I can pull this into a separate
> function but it's only two lines so it didn't seem worth while.

Yes, I think I commented on this that I hadn't noticed the signedness
of the value there :) So this is 100% a-OK.

>
>> >          avio_wb24(pb, 0); // streamid
>> >          pos = avio_tell(pb);
>> >          if (par->codec_id == AV_CODEC_ID_AAC) {
>> > @@ -756,7 +756,7 @@ static int flv_write_header(AVFormatContext *s)
>> >      }
>> >
>> >      for (i = 0; i < s->nb_streams; i++) {
>> > -        flv_write_codec_header(s, s->streams[i]->codecpar);
>> > +        flv_write_codec_header(s, s->streams[i]->codecpar, 0);
>> >      }
>> >
>>
>> Is it customary that even if you've got a live stream going, that the
>> start point of a given ingest is always zero? In that case you might
>> want to take mention of the initial dts, and then write offsets
>> compared to that? Otherwise if you have an initial offset of, say, 5s,
>> and then you a 5s GOP with an update, then while the difference to the
>> initial timestamp would be 5s, using the original dts field you would
>> get the second header having a value of 10s there. Or am I
>> misunderstanding something here?
>
> The FLV file (on disk) format requires the initial DTS to be zero. For FLV
> over RTMP this is not the case.
>

Oh, interesting... makes sense, of course, for continuing a stream
relative to the last timestamp of the previous ingest.

> At one point the flv muxer handled that itself, but that changed to handle
> copyts better.[2]
>
> For non-zero starting DTS should header packets have non-zero
> timestamp? Maybe but that's the case irrespective of weather or
> not we offset late extradata. Meanwhile many popular demuxers
> (avformat included) special case initial extradata.

Interesting, so in theory we should be making the initial
initialization data to be according to the DTS according to your
reading? Or is the initial extradata packet always supposed to be
zero, irrelevant of the actual initial content DTS?

As soon as these questions can be answered I think we've got the ball
rolling on getting this in. As I mentioned already, this is an
improvement.

Best regards,
Jan