[FFmpeg-devel] TR-03 implementation

Éloi Bail eloi.bail at savoirfairelinux.com
Thu Feb 16 21:22:04 EET 2017


Hi, 

In November, we wrote to the mailing list about implementing support for TR-03 in FFmpeg [1].
There were some doubts in the FFmpeg community about whether FFmpeg
could handle demuxing 3 Gb/s of RTP input without significantly modifying the
RTP demuxer and/or doing kernel bypass.

CBC/Radio Canada contracted us to test what was possible and to try to implement TR-03
in FFmpeg. Using two servers connected through a switch by a 10 Gb/s fibre optic link, we
performed several tests with various tools, which showed that it should be possible to
receive and demux 3 Gb/s of RTP raw video with large enough RX queues in the
NIC and the socket. We then patched FFmpeg to support depayloading 8-bit and 10-bit
raw video [2] and to process the input stream on a separate thread [3]. This allowed us
to successfully receive a 3 Gb/s raw video stream in FFmpeg and write the raw video to
disk. We were also able to transcode it to H.264.

Thus it seems to us that FFmpeg should be able to support TR-03 without significant
modifications or kernel bypass.

Below is a more detailed description of our testing and development process:

1. In the Linux kernel: using the iperf tool, we verified that the Linux
kernel is able to handle 3 Gb/s of UDP traffic with payload sizes of 800 to
1450 bytes.
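As a side note, one way to check whether the kernel itself drops UDP datagrams
during such a test is to compare the UDP error counters in /proc/net/snmp before
and after the run. The small program below is only an illustration (it is not
part of the patches); it prints the InErrors and RcvbufErrors counters:

/* Sketch: print the kernel UDP error counters from /proc/net/snmp.
 * Illustration only; compare the values before and after a test run
 * to see whether the kernel dropped datagrams. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/net/snmp", "r");
    char names[512], values[512];

    if (!f)
        return 1;

    /* The "Udp:" section is two consecutive lines: field names, then values. */
    while (fgets(names, sizeof(names), f)) {
        char *sn, *sv, *n, *v;

        if (strncmp(names, "Udp:", 4) || !fgets(values, sizeof(values), f))
            continue;

        n = strtok_r(names,  " \n", &sn);
        v = strtok_r(values, " \n", &sv);
        while (n && v) {
            if (!strcmp(n, "InErrors") || !strcmp(n, "RcvbufErrors"))
                printf("%s = %s\n", n, v);
            n = strtok_r(NULL, " \n", &sn);
            v = strtok_r(NULL, " \n", &sv);
        }
        break;
    }
    fclose(f);
    return 0;
}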

2. Using a simple RTP demuxer, we verified that a user-space program is able
to handle a 3 Gb/s stream without dropping packets. When adding an
increasing amount of processing per packet, we observed that packets
eventually get dropped. We concluded that per-packet processing must be
kept minimal to receive a 3 Gb/s video stream.
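To give an idea of what "minimal processing" means, the kind of receive loop used
for this test boils down to something like the sketch below (an illustration, not
the actual test program; the port number and buffer sizes are arbitrary). It only
looks at the RTP sequence number to count gaps:

/* Minimal RTP drop counter: a recv() loop that only inspects the 16-bit RTP
 * sequence number. Port and buffer sizes are arbitrary; illustration only. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = { 0 };
    uint8_t buf[2048];
    uint16_t expected = 0;
    int have_seq = 0;
    unsigned long long packets = 0, lost = 0;

    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(5004);              /* arbitrary test port */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n < 12)                                  /* shorter than an RTP header */
            continue;
        uint16_t seq = (buf[2] << 8) | buf[3];       /* RTP sequence number */
        if (have_seq)
            lost += (uint16_t)(seq - expected);      /* gap size, modulo 2^16 */
        expected = seq + 1;
        have_seq = 1;
        if (++packets % 1000000 == 0)
            printf("%llu packets, %llu lost\n", packets, lost);
    }
    return 0;
}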

3. We experimented with GStreamer, which already implements an RTP raw video muxer
/ demuxer. We were able to send a 3 Gb/s video stream without dropping any
packets. On reception, we experienced around 20% packet loss with a 3 Gb/s
video stream because the thread in charge of reading the socket was using 100%
of a CPU core. The GStreamer team is aware of this and has ideas to significantly
reduce the CPU usage by grouping the per-packet processing with the recvmmsg syscall.
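For reference, batching with recvmmsg looks roughly like the following sketch
(our illustration of the idea, not GStreamer code): a single syscall fetches up
to 64 datagrams instead of one recv() per packet, which amortizes the syscall
cost over the whole batch.

/* Sketch of batched reception with recvmmsg(): one syscall returns up to
 * BATCH datagrams, amortizing the syscall cost. Illustration of the idea,
 * not GStreamer code. */
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH    64
#define PKT_SIZE 2048

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = { 0 };
    static char bufs[BATCH][PKT_SIZE];
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];

    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(5004);              /* arbitrary test port */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base            = bufs[i];
        iov[i].iov_len             = PKT_SIZE;
        msgs[i].msg_hdr.msg_iov    = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    for (;;) {
        /* Blocks until at least one datagram arrives, returns up to BATCH. */
        int n = recvmmsg(fd, msgs, BATCH, 0, NULL);
        if (n < 0)
            break;
        for (int i = 0; i < n; i++) {
            /* msgs[i].msg_len bytes of packet i are in bufs[i]: process here. */
        }
    }
    return 0;
}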

4. We implemented an RTP demuxer compatible with RFC 4175 for the 4:2:2 8-bit
and 4:2:2 10-bit pixel formats [2].
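What makes such a demuxer workable at these rates is the RFC 4175 payload
header: after the RTP header comes a 16-bit extended sequence number followed
by one or more (length, line number, offset) segment headers, the last one
marked by a cleared continuation bit, so each packet can be written directly
to its position in the frame. A simplified parsing sketch (not the exact code
from the patch in [2]):

/* Sketch of RFC 4175 payload header parsing: a 16-bit extended sequence
 * number, then one or more 6-byte segment headers (length, F bit + line
 * number, C bit + offset), followed by the pixel data of each segment.
 * Simplified compared to the actual patch in [2]. */
#include <stdint.h>
#include <stdio.h>

static uint16_t rd16(const uint8_t *p)
{
    return (p[0] << 8) | p[1];
}

/* Placeholder: the real code copies the segment into the frame buffer. */
static void copy_segment(const uint8_t *data, int len, int line, int offset)
{
    printf("line %d, offset %d, %d bytes\n", line, offset, len);
}

/* Returns the number of bytes consumed, or -1 on a malformed header. */
int parse_rfc4175_payload(const uint8_t *payload, int size)
{
    int pos = 2;                        /* skip the extended sequence number */
    int headers_end = pos, data_pos;

    /* First pass: find where the segment headers stop (continuation bit). */
    do {
        if (headers_end + 6 > size)
            return -1;
        headers_end += 6;
    } while (payload[headers_end - 2] & 0x80);       /* C bit of the offset */

    /* Second pass: hand each segment of pixel data to its line / offset. */
    data_pos = headers_end;
    while (pos < headers_end) {
        int length = rd16(payload + pos);
        int line   = rd16(payload + pos + 2) & 0x7fff;   /* mask the F bit */
        int offset = rd16(payload + pos + 4) & 0x7fff;   /* mask the C bit */

        if (data_pos + length > size)
            return -1;
        copy_segment(payload + data_pos, length, line, offset);
        data_pos += length;
        pos      += 6;
    }
    return data_pos;
}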

* Looking at the FFmpeg tool code, we saw that separate input threads are used
only if there is more than one input. With a minimal pipeline that reads
an RTP stream from a socket and writes the raw video into a file, we
observed that packets were dropped because too much time was spent on
packet processing.

We modified the FFmpeg tool to force the use of a dedicated input thread.
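The idea is simply to keep one thread doing nothing but draining the socket,
so that demuxing or disk I/O can never stall packet reception. A rough sketch
of the pattern (an illustration only, not the actual FFmpeg change):

/* Sketch of a dedicated input thread: one thread does nothing but drain the
 * UDP socket into a ring of preallocated packet buffers, so that demuxing or
 * disk I/O on the consumer side can never stall packet reception. This
 * illustrates the pattern only; it is not the actual FFmpeg change. */
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define RING_SIZE 4096
#define PKT_SIZE  2048

static uint8_t  ring[RING_SIZE][PKT_SIZE];
static int      ring_len[RING_SIZE];
static unsigned head, tail;          /* head: reader thread, tail: consumer */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* Reader thread: started with pthread_create(); arg points to the socket fd. */
void *input_thread(void *arg)
{
    int fd = *(int *)arg;
    uint8_t pkt[PKT_SIZE];

    for (;;) {
        ssize_t n = recv(fd, pkt, sizeof(pkt), 0);
        if (n <= 0)
            continue;
        pthread_mutex_lock(&lock);
        if (head - tail < RING_SIZE) {          /* drop if the ring is full */
            memcpy(ring[head % RING_SIZE], pkt, n);
            ring_len[head % RING_SIZE] = n;
            head++;
            pthread_cond_signal(&cond);
        }
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Consumer side: the demuxing / writing loop calls this instead of reading
 * the socket directly; it blocks until a packet is available. */
int next_packet(uint8_t *out)
{
    int n;

    pthread_mutex_lock(&lock);
    while (head == tail)
        pthread_cond_wait(&cond, &lock);
    n = ring_len[tail % RING_SIZE];
    memcpy(out, ring[tail % RING_SIZE], n);
    tail++;
    pthread_mutex_unlock(&lock);
    return n;
}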

5. Several queues are involved between packet reception and packet processing.
Tuning each of them allowed us to reach zero dropped packets:

* In the NIC queue: using ethtool, we increased the RX ring size from 453
to its maximum (4078) to avoid packets being dropped in the NIC queue.

* In the kernel queue: we observed no dropped packets after increasing the
socket receive queue size to 16 MB (see the sketch after this list).

* In the jitter buffer queue (FFmpeg): by default the jitter buffer is
sized for 500 packets. With 1080p raw video (RFC 4175), we calculated
that one video frame amounts to around 3000 packets (for 4:2:2 8-bit, roughly
1920 x 1080 x 2 bytes per frame, at about 1400 bytes of payload per packet).

To be more resilient to packet reordering, we could increase the size of
the jitter buffer, but we observed that a large jitter buffer adds
significant per-packet processing and thus leads to packets being dropped
in the kernel. In addition, RFC 4175 provides a mechanism to be resilient
to packet reordering within a video frame.
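Regarding the first two queues above: the NIC RX ring can be enlarged with
ethtool (e.g. ethtool -G <interface> rx 4078), and the kernel-side receive
queue either through the net.core.rmem_default / net.core.rmem_max sysctls or
per socket with setsockopt(SO_RCVBUF), whose effective value is capped by
net.core.rmem_max. A minimal sketch of the socket-buffer part (the 16 MB figure
matches the tuning above; the code itself is only an illustration):

/* Sketch: enlarge the kernel receive queue of a UDP socket to 16 MB with
 * SO_RCVBUF. The value actually granted is capped by the net.core.rmem_max
 * sysctl, so that limit has to be raised beforehand. Illustration only. */
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int requested = 16 * 1024 * 1024;                /* 16 MB, as in our tests */
    int granted;
    socklen_t len = sizeof(granted);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) < 0)
        perror("setsockopt(SO_RCVBUF)");

    /* The kernel reports the effective size: it doubles the requested value
     * for bookkeeping and clamps it to net.core.rmem_max. */
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len);
    printf("receive buffer: requested %d bytes, got %d bytes\n", requested, granted);
    return 0;
}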

Results:
* With:
- our test setup composed of 2 servers running CentOS 7 linked by a 10 Gb/s
switch,
- our modified FFmpeg, handling RFC 4175 and with improved reading
performance,
- the NIC and kernel queues tuned and the FFmpeg jitter buffer disabled,

we were able to:
- send a 3 Gb/s video stream with GStreamer,
- receive with FFmpeg a 3 Gb/s 4:2:2 8-bit video stream without dropping any
packets or producing any video artifacts.
* However, using the 4:2:2 10-bit (packed) pixel format, we encountered a
performance degradation. 4:2:2 10-bit packed is not supported as such in
FFmpeg, so we decided to convert it to a 4:2:2 10-bit planar format. We
believe that this conversion adds too much processing per packet and thus
leads to dropped packets.
We are nevertheless able to stream (and live transcode) 1080p 60 fps 4:2:2 10-bit
without dropping packets; in reception the bandwidth is around 2.2 Gb/s.
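For the 10-bit case, the extra per-packet cost comes from the unpacking step:
RFC 4175 carries 4:2:2 10-bit video as 5-byte pgroups (Cb, Y0, Cr, Y1, 10 bits
each, covering two pixels), which have to be expanded to 16-bit planar samples
for a format such as FFmpeg's yuv422p10. A sketch of that inner loop (an
illustration, not the exact code from our patch):

/* Sketch: unpack one RFC 4175 4:2:2 10-bit pgroup (5 bytes holding Cb, Y0,
 * Cr, Y1 at 10 bits each, i.e. two pixels) into 16-bit planar samples as
 * used by FFmpeg's yuv422p10 format. Illustration only. */
#include <stdint.h>

static void unpack_pgroup_422_10(const uint8_t *src,
                                 uint16_t *y, uint16_t *cb, uint16_t *cr)
{
    /* 40 packed bits, most significant bit first. */
    uint64_t bits = ((uint64_t)src[0] << 32) | ((uint64_t)src[1] << 24) |
                    ((uint64_t)src[2] << 16) | ((uint64_t)src[3] <<  8) |
                     (uint64_t)src[4];

    cb[0] = (bits >> 30) & 0x3ff;       /* Cb, shared by both pixels */
    y[0]  = (bits >> 20) & 0x3ff;       /* Y of the first pixel      */
    cr[0] = (bits >> 10) & 0x3ff;       /* Cr, shared by both pixels */
    y[1]  =  bits        & 0x3ff;       /* Y of the second pixel     */
}

/* Unpack one scan line; width must be even, src holds width / 2 pgroups. */
void unpack_line_422_10(const uint8_t *src, int width,
                        uint16_t *y, uint16_t *cb, uint16_t *cr)
{
    for (int i = 0; i < width / 2; i++)
        unpack_pgroup_422_10(src + 5 * i, y + 2 * i, cb + i, cr + i);
}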

[1]: http://ffmpeg.org/pipermail/ffmpeg-devel/2016-November/202554.html 
[2]: http://ffmpeg.org/pipermail/ffmpeg-devel/2017-February/207253.html 

