[FFmpeg-trac] #4984(undetermined:new): ffmpeg amerge and amix filter delay when working with RTSP

FFmpeg trac at avcodec.org
Tue Nov 3 09:12:37 CET 2015


#4984: ffmpeg amerge and amix filter delay when working with RTSP
-------------------------------------+-------------------------------------
             Reporter:  leogsa       |                     Type:  defect
               Status:  new          |                 Priority:  normal
            Component:               |                  Version:
  undetermined                       |  unspecified
             Keywords:  RTSP         |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
 ffmpeg amerge and amix filter delay

 I need to take audio streams from several IP cameras and merge them into
 one stream, so that they sound simultaneously.

 I tried the "amix" filter (for testing purposes I take the audio stream
 twice from the same camera; yes, I also tried two cameras - the result is
 the same):

 ffmpeg -i rtsp://user:pass@172.22.5.202 -i rtsp://user:pass@172.22.5.202
 -map 0:a -map 1:a  -filter_complex
 amix=inputs=2:duration=first:dropout_transition=3  -ar 22050 -vn -f flv
 rtmp://172.22.45.38:1935/live/stream1

 Result: I say "hello" and hear the first "hello" in the speakers, then one
 second later the second "hello" - instead of hearing both "hello"s
 simultaneously.

 I also tried the "amerge" filter:

 ffmpeg -i rtsp://user:pass@172.22.5.202 -i rtsp://user:pass@172.22.5.202
 -map 0:a -map 1:a  -filter_complex amerge -ar 22050 -vn -f flv rtmp://
 172.22.45.38:1935/live/stream1

 Result: the same as in the first example, but now I hear the first "hello"
 in the left speaker and one second later the second "hello" in the right
 speaker, instead of hearing both "hello"s in both speakers simultaneously.

 Here is the full command-line output for both variants. amix:

     ffmpeg -i rtsp://admin:12345@172.22.5.202 -i
 rtsp://admin:12345@172.22.5.202 -map 0:a -map 1:a -filter_complex
 amix=inputs=2:duration=longest:dropout_transition=0 -vn -ar 22050 -f flv
 rtmp://172.22.45.38:1935/live/stream1
     ffmpeg version N-76031-g9099079
 Copyright (c) 2000-2015 the FFmpeg developers
       built with gcc 4.4.7 (GCC) 20120313 (Red Hat 4.4.7-16)
       configuration: --enable-gpl --enable-libx264 --enable-libmp3lame
 --enable-nonfree --enable-version3
       libavutil      55.  4.100 / 55.  4.100
       libavcodec     57.  6.100 / 57.  6.100
       libavformat    57.  4.100 / 57.  4.100
       libavdevice    57.  0.100 / 57.  0.100
       libavfilter     6. 11.100 /  6. 11.100
       libswscale      4.  0.100 /  4.  0.100
       libswresample   2.  0.100 /  2.  0.100
       libpostproc    54.  0.100 / 54.  0.100
     Input #0, rtsp, from 'rtsp://admin:12345@172.22.5.202':
       Metadata:
         title           : Media Presentation
       Duration: N/A, start: 0.032000, bitrate: N/A
         Stream #0:0: Video: h264 (Baseline), yuv420p, 1280x720, 20 fps, 25
 tbr, 90k tbn, 40 tbc
         Stream #0:1: Audio: adpcm_g726, 8000 Hz, mono, s16, 16 kb/s
         Stream #0:2: Data: none
     Input #1, rtsp, from 'rtsp://admin:12345@172.22.5.202':
       Metadata:
         title           : Media Presentation
       Duration: N/A, start: 0.032000, bitrate: N/A
         Stream #1:0: Video: h264 (Baseline), yuv420p, 1280x720, 20 fps, 25
 tbr, 90k tbn, 40 tbc
         Stream #1:1: Audio: adpcm_g726, 8000 Hz, mono, s16, 16 kb/s
         Stream #1:2: Data: none
     Output #0, flv, to 'rtmp://172.22.45.38:1935/live/stream1':
       Metadata:
         title           : Media Presentation
         encoder         : Lavf57.4.100
         Stream #0:0: Audio: mp3 (libmp3lame) ([2][0][0][0] / 0x0002),
 22050 Hz, mono, fltp (default)
         Metadata:
           encoder         : Lavc57.6.100 libmp3lame
     Stream mapping:
       Stream #0:1 (g726) -> amix:input0
       Stream #1:1 (g726) -> amix:input1
       amix -> Stream #0:0 (libmp3lame)
     Press [q] to stop, [?] for help
     [rtsp @ 0x2689600] Thread message queue blocking; consider raising the
 thread_queue_size option (current value: 8)
     [rtsp @ 0x2727c60] Thread message queue blocking; consider raising the
 thread_queue_size option (current value: 8)
     [rtsp @ 0x2689600] max delay reached. need to consume packet
     [NULL @ 0x268c500] RTP: missed 38 packets
     [rtsp @ 0x2689600] max delay reached. need to consume packet
     [NULL @ 0x268d460] RTP: missed 4 packets
     [flv @ 0x2958360] Failed to update header with correct duration.
     [flv @ 0x2958360] Failed to update header with correct filesize.
     size=      28kB time=00:00:06.18 bitrate=  36.7kbits/s
     video:0kB audio:24kB subtitle:0kB other streams:0kB global headers:0kB
 muxing overhead: 16.331224%

 and amerge:

 # ffmpeg -i rtsp://admin:12345@172.22.5.202 -i
 rtsp://admin:12345@172.22.5.202 -map 0:a -map 1:a -filter_complex amerge
 -vn -ar 22050 -f flv rtmp://172.22.45.38:1935/live/stream1
     ffmpeg version N-76031-g9099079 Copyright (c) 2000-2015 the FFmpeg
 developers
       built with gcc 4.4.7 (GCC) 20120313 (Red Hat 4.4.7-16)
       configuration: --enable-gpl --enable-libx264 --enable-libmp3lame
 --enable-nonfree --enable-version3
       libavutil      55.  4.100 / 55.  4.100
       libavcodec     57.  6.100 / 57.  6.100
       libavformat    57.  4.100 / 57.  4.100
       libavdevice    57.  0.100 / 57.  0.100
       libavfilter     6. 11.100 /  6. 11.100
       libswscale      4.  0.100 /  4.  0.100
       libswresample   2.  0.100 /  2.  0.100
       libpostproc    54.  0.100 / 54.  0.100
     Input #0, rtsp, from 'rtsp://admin:12345@172.22.5.202':
       Metadata:
         title           : Media Presentation
       Duration: N/A, start: 0.064000, bitrate: N/A
         Stream #0:0: Video: h264 (Baseline), yuv420p, 1280x720, 20 fps, 25
 tbr, 90k tbn, 40 tbc
         Stream #0:1: Audio: adpcm_g726, 8000 Hz, mono, s16, 16 kb/s
         Stream #0:2: Data: none
     Input #1, rtsp, from 'rtsp://admin:12345@172.22.5.202':
       Metadata:
         title           : Media Presentation
       Duration: N/A, start: 0.032000, bitrate: N/A
         Stream #1:0: Video: h264 (Baseline), yuv420p, 1280x720, 20 fps, 25
 tbr, 90k tbn, 40 tbc
         Stream #1:1: Audio: adpcm_g726, 8000 Hz, mono, s16, 16 kb/s
         Stream #1:2: Data: none
     [Parsed_amerge_0 @ 0x3069cc0] No channel layout for input 1
     [Parsed_amerge_0 @ 0x3069cc0] Input channel layouts overlap: output
 layout will be determined by the number of distinct input channels
     Output #0, flv, to 'rtmp://172.22.45.38:1935/live/stream1':
       Metadata:
         title           : Media Presentation
         encoder         : Lavf57.4.100
         Stream #0:0: Audio: mp3 (libmp3lame) ([2][0][0][0] / 0x0002),
 22050 Hz, stereo, s16p (default)
         Metadata:
           encoder         : Lavc57.6.100 libmp3lame
     Stream mapping:
       Stream #0:1 (g726) -> amerge:in0
       Stream #1:1 (g726) -> amerge:in1
       amerge -> Stream #0:0 (libmp3lame)
     Press [q] to stop, [?] for help
     [rtsp @ 0x2f71640] Thread message queue blocking; consider raising the
 thread_queue_size option (current value: 8)
     [rtsp @ 0x300fb40] Thread message queue blocking; consider raising the
 thread_queue_size option (current value: 8)
     [rtsp @ 0x2f71640] max delay reached. need to consume packet
     [NULL @ 0x2f744a0] RTP: missed 18 packets
     [flv @ 0x3058b00] Failed to update header with correct duration.
     [flv @ 0x3058b00] Failed to update header with correct filesize.
     size=      39kB time=00:00:04.54 bitrate=  70.2kbits/s
     video:0kB audio:36kB subtitle:0kB other streams:0kB global headers:0kB
 muxing overhead: 8.330614%

 UPDATE 30 Oct 2015: I found an interesting detail when connecting two
 cameras (they have different microphones, so I can hear the difference
 between them): the order of the "hello"s from the different cams depends
 on the ORDER OF INPUTS.

 with command

 ffmpeg -i rtsp://cam2 -i rtsp://cam1 -map 0:a -map 1:a -filter_complex
 amix=inputs=2:duration=longest:dropout_transition=0 -vn -ar 22050 -f flv
 rtmp://172.22.45.38:1935/live/stream1

 I hear "hello" from the 1st cam and then, one second later, "hello" from
 the 2nd cam.

 -----

 with command

 ffmpeg -i rtsp://cam1 -i rtsp://cam2 -map 0:a -map 1:a -filter_complex
 amix=inputs=2:duration=longest:dropout_transition=0 -vn -ar 22050 -f flv
 rtmp://172.22.45.38:1935/live/stream1

 I hear "hello" from the 2nd cam and then, one second later, "hello" from
 the 1st cam.

 So, as I understand it, ffmpeg reads the inputs not simultaneously but in
 the order in which they are given.
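 One variant I could still try, based on the "Thread message queue blocking"
 warnings in the logs above (untested; cam1/cam2 URLs are placeholders):
 raise the per-input packet queue and resample each input against its own
 timestamps before mixing.

 ```shell
 # Sketch only: -thread_queue_size enlarges the input packet queue the log
 # complains about (current value: 8), and aresample=async=1 stretches or
 # squeezes each audio input to match its timestamps before amix combines
 # them.
 ffmpeg -thread_queue_size 512 -i rtsp://cam1 \
        -thread_queue_size 512 -i rtsp://cam2 \
        -filter_complex "[0:a]aresample=async=1[a0];\
 [1:a]aresample=async=1[a1];\
 [a0][a1]amix=inputs=2:duration=longest:dropout_transition=0" \
        -vn -ar 22050 -f flv rtmp://172.22.45.38:1935/live/stream1
 ```

 Whether this actually aligns the streams on live RTSP input is exactly
 what this ticket is about; with local files it should be a no-op.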

 P.S. FILES are mixed and merged perfectly with the same commands.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/4984>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

