[FFmpeg-trac] #3354(avcodec:new): enhancement: Zero latency av_read_frame()
FFmpeg
trac at avcodec.org
Mon Jan 27 15:40:46 CET 2014
#3354: enhancement: Zero latency av_read_frame()
-------------------------------------+-------------------------------------
Reporter: pjw | Type: enhancement
Status: new |
Component: avcodec | Priority: normal
Keywords: mpegts, latency | Version: git-master
Blocking: | Blocked By:
Analyzed by developer: 0 | Reproduced by developer: 0
-------------------------------------+-------------------------------------
== Summary ==
I'm using FFmpeg to set up a (very) low-latency video streaming application
(in C++). FFmpeg is used on both the server and the client side. The video
is encoded as H.264 (i.e. "libx264") and carried in a transport stream
(mpegts) over UDP.
Up to now I've been able to reduce the latency to: encoding + transport +
decoding + one-frame-time. I'd like it to be just encoding + transport +
decoding.
The problem seems to be that {{{av_read_frame()}}} always holds back one
video frame, i.e. frame ''n'' is returned only when {{{av_read_frame()}}}
for ''n+1'' is called. I'd like {{{av_read_frame()}}} to return a frame as
soon as possible, without any delay.
== How to test ==
Besides the video stream, I've added an extra data stream. It is used to
transport a timestamp alongside each video frame. The timestamp
represents the point at which the data is sent. A data packet and a video
packet have the same PTS. In the client code, I can synchronize the data
stream packets with the video stream packets using PTS. Now I know when
the data packet and video frame were sent and I also know when they arrived.
This allows me to calculate the transport-delay. This is of course only
true when the server and client use the same clock. In my tests I executed
the server and client code on the same host.
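For reference, a minimal sketch of the timing helper used in the code below. The ticket does not show how {{{get_system_time_since_epoch_in_seconds()}}} is implemented, so this version based on {{{std::chrono::system_clock}}} is an assumption, and {{{transport_delay_ms()}}} is a hypothetical helper added for illustration:

```cpp
#include <chrono>

// Assumed implementation of the helper used in the server and client code:
// wall-clock time since the epoch, in seconds, as a double.
double get_system_time_since_epoch_in_seconds()
{
    using namespace std::chrono;
    const auto since_epoch = system_clock::now().time_since_epoch();
    return duration<double>(since_epoch).count();
}

// Hypothetical helper: transport delay in milliseconds.
// Only meaningful when both timestamps come from the same clock,
// e.g. server and client running on the same host.
double transport_delay_ms(double t_sent_s, double t_recv_s)
{
    return (t_recv_s - t_sent_s) * 1e3;
}
```

The server would store the result of the first function in the data packet, and the client would pass that value together with its own receive time to the second.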
The transport delay of the data packet is sub-millisecond, as expected.
However, the delay of the video frame is ~25ms (presumably 20ms = one
frame time at 50 Hz + 5ms for encoding/decoding). I expected it to be just
~5ms, i.e. encoding/decoding time.
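The arithmetic behind these numbers, as a small sketch. The function names are hypothetical, and the 5ms codec time is the estimate from the paragraph above, not a separately measured value:

```cpp
// Duration of one frame in milliseconds for a given frame rate in Hz.
constexpr double frame_period_ms(double fps)
{
    return 1000.0 / fps;
}

// Expected delay of a video frame relative to its data packet, assuming
// codec_ms of encoding + decoding time and held_frames frames being held
// back somewhere on the receiving side.
constexpr double expected_delay_ms(double codec_ms, double fps, int held_frames)
{
    return codec_ms + held_frames * frame_period_ms(fps);
}
```

With {{{fps = 50}}} and {{{codec_ms = 5}}}, one held-back frame gives 25ms, and no held-back frame gives the 5ms I'm aiming for.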
Wireshark shows that the video packets and the private data packet for a
frame are sent at the same time. So the delay of one frame (20ms) is
introduced by the client code. I think it is caused by (a combination of)
mpegts and h264_parser.
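To illustrate the suspected mechanism, here is a toy model of a splitter that can only tell that a frame is complete once it sees the start of the next one. This is not FFmpeg's actual mpegts or h264_parser code; {{{BoundaryParser}}} and its {{{push()}}} signature are made up for this sketch:

```cpp
#include <string>
#include <utility>

// Toy frame splitter: data is collected until a chunk arrives that is
// marked as the start of a new frame. Only then is the previous frame
// known to be complete and handed out.
class BoundaryParser {
public:
    // Feed one chunk of data. is_frame_start marks a chunk that begins a
    // new frame. Returns true and fills 'out' when the previously
    // collected frame has just been completed by this call.
    bool push(const std::string& chunk, bool is_frame_start, std::string& out)
    {
        bool emitted = false;
        if (is_frame_start && !pending_.empty()) {
            out = std::move(pending_);
            pending_.clear();
            emitted = true;
        }
        pending_ += chunk;
        return emitted;
    }

private:
    std::string pending_; // data of the frame currently being collected
};
```

In this model frame ''n'' is returned only by the call that delivers the first chunk of frame ''n+1'', which matches the one-frame delay described above.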
== Server code ==
This is what I did on the server side. Note that this is not a working
example. It has been stripped to keep it short(-ish):
{{{
avformat_alloc_output_context2(&mFormatContext, nullptr,
"mpegts", "udp://239.192.100.100:12345");
// These flags don't seem to help.
// mFormatContext->avio_flags |= AVIO_FLAG_DIRECT;
// mFormatContext->flags |= AVFMT_FLAG_FLUSH_PACKETS;
// Add video stream:
mCodec = avcodec_find_encoder_by_name("libx264");
mVidStream = avformat_new_stream(mFormatContext, mCodec);
mVidStream->id = mFormatContext->nb_streams - 1;
mCodecContext = mVidStream->codec;
mCodecContext->codec_id = mCodec->id;
mCodecContext->bit_rate = mBitrate;
mCodecContext->width = 1280;
mCodecContext->height = 720;
mCodecContext->gop_size = 1;
mCodecContext->pix_fmt = AV_PIX_FMT_YUV420P;
mCodecContext->time_base.den = 50; // 50 Hz
mCodecContext->time_base.num = 1;
mCodecContext->max_b_frames = 0;
mCodecContext->thread_count = mThreads; // Tried 1 - 4; all have the same effect.
mCodecContext->thread_type = FF_THREAD_SLICE;
// These options also don't have the desired effect.
// mCodecContext->flags |= CODEC_FLAG_LOW_DELAY;
// mCodecContext->flags2 |= CODEC_FLAG2_FAST;
// Add (private) data stream. Will be used to send a packet containing a
// timestamp alongside each video frame.
mDataStream = avformat_new_stream(mFormatContext, nullptr);
mDataStream->id = mFormatContext->nb_streams - 1;
mDataStream->codec = avcodec_alloc_context3(nullptr);
mDataStream->codec->codec_type = AVMEDIA_TYPE_DATA;
mDataStream->codec->codec_id = AV_CODEC_ID_SMPTE_KLV;
mDataCodecContext = mDataStream->codec;
if (mFormatContext->oformat->flags & AVFMT_GLOBALHEADER)
{
mCodecContext->flags |= CODEC_FLAG_GLOBAL_HEADER;
mDataCodecContext->flags |= CODEC_FLAG_GLOBAL_HEADER;
}
// No delay please! :-)
av_opt_set(mCodecContext->priv_data, "preset", "ultrafast", 0);
av_opt_set(mCodecContext->priv_data, "tune", "zerolatency", 0);
avcodec_open2(mCodecContext, mCodec, nullptr);
avformat_write_header(mFormatContext, nullptr);
//
// encode loop:
//
fill_yuv_image(&mDstPicture, frame_count++,
mCodecContext->width, mCodecContext->height);
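AVPacket pkt;
av_init_packet(&pkt);
// av_init_packet() does not touch data/size; clear them so the encoder
// allocates the output buffer itself.
pkt.data = nullptr;
pkt.size = 0;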
int got_packet;
int err = avcodec_encode_video2(mCodecContext, &pkt, mFrame, &got_packet);
if (err < 0) return;
if (!err && got_packet && pkt.size)
{
pkt.stream_index = mVidStream->index;
pkt.duration = 0;
// Send a data packet alongside each video frame.
AVPacket dataPkt;
av_init_packet(&dataPkt);
dataPkt.stream_index = mDataStream->index;
dataPkt.pts = pkt.pts; // Same as video
dataPkt.dts = pkt.dts;
double tNow = get_system_time_since_epoch_in_seconds();
dataPkt.data = (unsigned char*) &tNow;
dataPkt.size = sizeof(double);
// Write side data.
av_write_frame(mFormatContext, &dataPkt);
// Flush; probably not necessary but should not hurt.
av_write_frame(mFormatContext, nullptr);
// Write video frame.
err = av_write_frame(mFormatContext, &pkt);
// Flush; again.. probably not necessary but should not hurt.
err = av_write_frame(mFormatContext, nullptr);
}
// PTS for next frame
mFrame->pts += av_rescale_q(1, mVidStream->codec->time_base,
mVidStream->time_base);
}}}
== Client code ==
This is the client code. Also stripped in an attempt to keep it short:
{{{
avformat_open_input(&mFormatContext, "udp://239.192.100.100:12345",
nullptr, nullptr);
// These flags work! But this negates the use of av_read_frame() as it now
// does not guarantee to return one frame.
// mFormatContext->flags |= AVFMT_FLAG_NOPARSE | AVFMT_FLAG_NOFILLIN;
// These flags seem to have no effect.
// mFormatContext->flags |= AVFMT_FLAG_NOBUFFER;
// mFormatContext->flags |= AVFMT_FLAG_FLUSH_PACKETS;
// mFormatContext->avio_flags |= AVIO_FLAG_DIRECT;
avformat_find_stream_info(mFormatContext, nullptr);
// Video stream.
mVideoStreamIdx = av_find_best_stream(mFormatContext, AVMEDIA_TYPE_VIDEO,
-1, -1, nullptr, 0);
AVStream* st = mFormatContext->streams[mVideoStreamIdx];
// This flag works! But this negates the use of av_read_frame() as it now
// does not guarantee to return one frame.
// st->need_parsing = AVSTREAM_PARSE_NONE;
// find decoder for the stream
AVCodecContext* dec_ctx = st->codec;
AVCodec* dec = avcodec_find_decoder(dec_ctx->codec_id);
if (dec->capabilities & CODEC_CAP_TRUNCATED)
dec_ctx->flags |= CODEC_FLAG_TRUNCATED;
dec_ctx->thread_type = FF_THREAD_SLICE;
dec_ctx->thread_count = mThreads;
// These don't have the desired effect:
// dec_ctx->flags |= CODEC_FLAG_LOW_DELAY;
// dec_ctx->flags2 |= CODEC_FLAG2_FAST;
// dec_ctx->flags2 |= CODEC_FLAG2_CHUNKS;
// dec_ctx->refcounted_frames = 1;
avcodec_open2(dec_ctx, dec, nullptr);
mVideoStream = mFormatContext->streams[mVideoStreamIdx];
mVideoDecodeContext = mVideoStream->codec;
mDataStreamIdx = av_find_best_stream(mFormatContext, AVMEDIA_TYPE_DATA,
-1, -1, nullptr, 0);
//
// Decoding loop:
//
static std::queue<std::pair<int64_t, double> > ptsDb;
AVPacket pkt;
av_init_packet(&pkt);
pkt.data = nullptr;
pkt.size = 0;
// wait for data.
if (av_read_frame(mFormatContext, &pkt) < 0)
return;
double tRecv = get_system_time_since_epoch_in_seconds();
if (pkt.stream_index == mDataStreamIdx)
{
double tData = *(double*) (pkt.data);
printf("DAT PTS %li\trecv'd @ %.2lf [ms], trans delay %.4lf [ms]\n",
pkt.pts, tRecv * 1e3, (tRecv - tData) * 1e3);
ptsDb.emplace(std::make_pair(pkt.pts, tRecv));
}
else if (pkt.stream_index == mVideoStreamIdx)
{
// Quick hack to sync data stream packets with video packets.
// tData stays 0 if no data packet with this PTS was received.
double tData = 0;
while (!ptsDb.empty())
{
std::pair<int64_t, double> elem = ptsDb.front();
if (elem.first > pkt.pts)
{
// Belongs to a later frame; keep it queued.
break;
}
ptsDb.pop();
if (elem.first == pkt.pts)
{
tData = elem.second;
break;
}
// elem.first < pkt.pts: stale entry, already dropped above.
}
printf("VID PTS %li\trecv'd @ %.2lf [ms], delta with data %.2lf [ms] (%zu)\n",
pkt.pts, tRecv * 1e3, (tRecv - tData) * 1e3, ptsDb.size());
// decode video frame
int got_frame = 0;
mFrame = avcodec_alloc_frame();
double t1 = get_system_time_since_epoch_in_seconds();
avcodec_decode_video2(mVideoDecodeContext, mFrame, &got_frame, &pkt);
double t2 = get_system_time_since_epoch_in_seconds();
if (got_frame)
printf("[DEBUG] Got frame VID PTS %lli\tdecoding time %.2lf [ms]\n",
mFrame->pkt_pts, (t2 - t1) * 1e3);
avcodec_free_frame(&mFrame);
}
// Free the packet for data packets as well, not only for video packets.
av_free_packet(&pkt);
}}}
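The PTS matching in the decoding loop can also be written as a standalone function, which is easier to check in isolation. This is a sketch of the same idea, not part of the stripped client code; {{{PtsQueue}}} and {{{match_data_timestamp()}}} are hypothetical names:

```cpp
#include <cstdint>
#include <queue>
#include <utility>

// Queue of (PTS of data packet, time the data packet was received).
using PtsQueue = std::queue<std::pair<std::int64_t, double>>;

// Looks up the data packet that has the same PTS as a video packet.
// Entries older than video_pts are dropped; entries for later frames
// stay queued. Returns true and sets t_data on a match, false otherwise.
bool match_data_timestamp(PtsQueue& db, std::int64_t video_pts, double& t_data)
{
    while (!db.empty()) {
        const std::pair<std::int64_t, double> elem = db.front();
        if (elem.first > video_pts)
            return false; // belongs to a later frame, keep it
        db.pop();
        if (elem.first == video_pts) {
            t_data = elem.second;
            return true;
        }
        // elem.first < video_pts: stale entry, already dropped
    }
    return false;
}
```

Returning a flag instead of a zero timestamp makes it explicit when a video packet arrives without a matching data packet.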
As noted in the code above, the flags
{{{AVFMT_FLAG_NOPARSE | AVFMT_FLAG_NOFILLIN}}} and/or
{{{AVSTREAM_PARSE_NONE}}} seem to (almost) do what I want. My understanding
is that these flags essentially disable the functionality which ensures
that each call returns one complete frame. So once they are set there is no
way of knowing when a frame is ready to be decoded, which makes them
unusable for this purpose.
--
Ticket URL: <https://trac.ffmpeg.org/ticket/3354>
FFmpeg <http://ffmpeg.org>
FFmpeg issue tracker