[FFmpeg-devel] [PATCH] avformat: add AV1 RTP depacketizer and packetizer

Fri Dec 13 15:44:25 EET 2024

On Thursday, December 12th, 2024 at 4:18 AM, Chris Hodges <Chris.Hodges at axis.com> wrote:

> Hi Tristan,
> 
> thanks for taking the time reviewing.
> 
> On 12/6/24 18:39, Tristan Matthews via ffmpeg-devel wrote:
> 
> > Hi (also apologies if my client mangles the inline version of the patch, it's the first time I've tried to review an attached patch with it)...
> 
> > > + av_log(ctx, AV_LOG_DEBUG, "RTP Packet %d in (%x), len=%d:\n",
> > > + seq, flags, len);
> > > + av_hex_dump_log(ctx, AV_LOG_TRACE, buf, FFMIN(len, 128));
> > 
> > I think this is probably too verbose for DEBUG level (since it's called every packet)...maybe ifdef this bit out.
> 
> 
> Added #ifdefs and changed level to TRACE.
> 
> > > + if (is_keyframe) {
> > > + av_log(ctx, AV_LOG_DEBUG, "Marking FIRST packet\n");
> > > + aggr_hdr |= AV1F_AGGR_HDR_FIRST_PKT;
> > > + }
> > 
> > I noticed that other implementations only set this bit if we're dealing with a keyframe that is also accompanied by a
> > sequence header (since an encoder may emit a keyframe without one)...the spec is a bit vague about this but it
> > may be safer?
> 
> 
> I've added scanning the frame for the sequence header to make it a bit
> more safe against intra-only frames inside a GOP. Please notice that
> this is not fool-proof. According to the AV1 spec, a bit-identical copy
> of the I-frame sequence header MAY appear in every subsequential frame.
> I am not aware of any encoder doing this, however.
> 
> After I wrote this, I revisited the condition that sets in
> AV_PKT_FLAG_KEY in the AV1 parsers and librav1e and libsvtav1 seem to do
> this only on KEY_FRAME, not on INTRA_ONLY or SWITCH_FRAME, which should
> make it safe to NOT scan for the sequence header. So it might be up for
> discussion if this is really needed or just causes unnecessary overhead.
> 
> Do the other implementations you mentioned have that kind of information
> by parsing OBUs or from the AV1 encoder?

In both they do it by parsing OBUs:

- This boolean is later used to build the aggregation header: https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/blob/main/net/rtp/src/av1/pay/imp.rs?ref_type=heads#L285

- Here they are storing the OBUs in memory and have access to that info directly:
https://chromium.googlesource.com/external/webrtc/+/HEAD/modules/rtp_rtcp/source/rtp_packetizer_av1.cc#346

> 
> I'm putting it in a #if section for now...
> 
> > > + av_log(ctx, AV_LOG_TRACE, "AV1 Frame %d in (%x), size=%d:\n",
> > > + rtp_ctx->seq, rtp_ctx->flags, frame_size);
> > > + av_hex_dump_log(ctx, AV_LOG_TRACE, frame_buf, FFMIN(frame_size, 128));
> > 
> > This is also too frequent/verbose I think.
> 
> 
> Put into #ifdef.
> 
> > > + if ((obu_type == AV1_OBU_TEMPORAL_DELIMITER) ||
> > > + (obu_type == AV1_OBU_TILE_LIST)) {
> > > + // ignore and remove according to spec
> > > + obu_ptr += obu_size;
> > > + continue;
> > > + }
> > 
> > You also want to ignore AV1_OBU_PADDING here I believe.
> 
> 
> The AV1 RTP spec draft does not explicitly mention OBU_PADDING, however,
> it makes sense to remove it. I've added it (with a comment).
> 
> > > + av_log(ctx, AV_LOG_TRACE, "Sending FRAG packet %ld/%d, %d OBUs\n",
> > > + pkt_ptr - rtp_ctx->buf, rtp_ctx->max_payload_size, num_obus);
> > > + av_hex_dump_log(ctx, AV_LOG_TRACE, rtp_ctx->buf, FFMIN(pkt_ptr - rtp_ctx->buf, 128));
> > 
> > Again this may be too frequent/verbose.
> 
> 
> Done.
> 
> I also found and fixed a bug in the decoder regarding fragmented OBUs
> with "unlucky" growths regarding the number of LEB bytes, when the
> fragmented OBU wasn't the last one in the temporal unit.
> 

Nice, looking forward to testing the next iteration.

Best,
Tristan