[FFmpeg-trac] #10399(undetermined:new): Output presents "crackling" static sounds

FFmpeg trac at avcodec.org
Sat Jun 3 12:18:58 EEST 2023


#10399: Output presents "crackling" static sounds
-------------------------------------+-------------------------------------
             Reporter:  drive4code   |                     Type:  defect
               Status:  new          |                 Priority:  important
            Component:               |                  Version:
  undetermined                       |  unspecified
             Keywords:  AAC volume   |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
 Summary of the bug:
 The output file contains crackling artifacts

 What I Was Trying To Accomplish:
 I was running a GitHub project called cleanvid, which censors profanity
 in videos. It mutes the audio wherever profanity is found by generating
 the ffmpeg command given under "How to reproduce". Here are the steps it
 takes, as described in the project's README:

 "
 cleanvid is a little script to mute profanity in video files in a few
 simple steps:

 1. The user provides as input a video file and matching .srt subtitle file.
 If subtitles are not provided explicitly, they will be extracted from the
 video file if possible; if not, subliminal is used to attempt to download
 the best matching .srt file.

 2. pysrt is used to parse the .srt file, and each entry is checked against
 a list of profanity or other words or phrases you'd like muted. Mappings
 can be provided (eg., map "sh*t" to "poop"), otherwise the word will be
 replaced with *****.

 3. A new "clean" .srt file is created, with only those phrases containing
 the censored/replaced objectional language.

 **4. ffmpeg is used** to create a cleaned video file. This file contains
 the original video stream, but the audio stream is muted during the
 segments containing objectional language. The audio stream is re-encoded
 as AAC and remultiplexed back together with the video. Optionally, the
 clean .srt file can be embedded in the cleaned video file as a subtitle
 track.
 "
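
 Step 4 of the README can be sketched roughly as follows. This is a
 minimal illustration, not cleanvid's actual code: the helper name
 build_mute_filter and its list-of-tuples input are assumptions made for
 this example. Only the volume filter syntax is taken from the command
 reported in this ticket.

```python
from typing import Iterable, Tuple


def build_mute_filter(intervals: Iterable[Tuple[float, float]]) -> str:
    """Return an -af filter string that silences each (start, end) range.

    Hypothetical helper: emits one volume filter per range and chains them
    with commas, mirroring the filter string used in this ticket.
    """
    parts = [
        f"volume=enable='between(t,{start:.3f},{end:.3f})':volume=0"
        for start, end in intervals
    ]
    return ",".join(parts)


if __name__ == "__main__":
    # Time ranges in seconds, copied from the command reported in this ticket.
    print(build_mute_filter([(0.260, 0.380), (1.320, 3.020), (30.860, 31.460)]))
```

 The resulting string is what gets passed to ffmpeg's -af option, so each
 muted subtitle entry adds one more volume filter to the chain.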

 Problem Encountered:
 The output contains occasional artifacts. Adding the "-aac_tns 0" flag
 seems to mitigate the problem, but on long videos a large portion still
 shows artifacting (I've seen a 20-minute stretch in a 1 hour 30-minute
 video); it then improves over the rest of the video but never fully
 disappears. Without the flag, the problem is consistent and constant
 throughout the video.

 Additional Notes:
 The problem seems to be somewhat mitigated by the "-aac_tns 0" flag and
 worsened by the "-b:a 640k" flag.
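
 To compare the three cases mentioned here (default settings, "-aac_tns 0",
 and "-b:a 640k"), the command variants could be generated along these
 lines. This is only a sketch: build_ffmpeg_args, its parameters, and the
 output file names are hypothetical, and the "-report" option from the
 original command is left out. The flags themselves are the ones quoted in
 this ticket.

```python
import shlex
from typing import List, Optional


def build_ffmpeg_args(
    src: str,
    dst: str,
    audio_filter: str,
    disable_tns: bool = False,
    audio_bitrate: Optional[str] = None,
) -> List[str]:
    """Assemble the ffmpeg argument list for one encoder configuration."""
    # Base command from this ticket: copy video, drop subtitles, re-encode audio as AAC.
    args = ["ffmpeg", "-y", "-i", src, "-sn", "-c:v", "copy",
            "-af", audio_filter, "-c:a", "aac"]
    if disable_tns:
        # Reported here as reducing the crackling.
        args += ["-aac_tns", "0"]
    if audio_bitrate is not None:
        # Reported here as making the crackling worse at 640k.
        args += ["-b:a", audio_bitrate]
    args.append(dst)
    return args


if __name__ == "__main__":
    # Single mute range taken from the reported command, kept short for readability.
    mute = "volume=enable='between(t,0.260,0.380)':volume=0"
    variants = {
        "default.mp4": {},
        "no_tns.mp4": {"disable_tns": True},
        "640k.mp4": {"audio_bitrate": "640k"},
    }
    for dst, options in variants.items():
        # Print shell-quoted commands instead of running them.
        print(shlex.join(build_ffmpeg_args("input.mp4", dst, mute, **options)))
```

 Printing the commands rather than executing them keeps the sketch free of
 side effects; each line can be pasted into a shell to produce one output
 file per configuration for listening tests.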

 How to reproduce:
 Run the following on the uploaded input and listen for the artifacts.
 Because the clip is short, it may take multiple attempts to hear them:
 {{{
 % ffmpeg -y -i input.mp4 -sn -c:v copy -af \
 "volume=enable='between(t,0.260,0.380)':volume=0,volume=enable='between(t,1.320,3.020)':volume=0,volume=enable='between(t,30.860,31.460)':volume=0" \
 -c:a aac -report output.mp4
 ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg
 developers
 built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
 }}}

 Loglevel Output:
 The full log was too large to attach, so I can only provide the first and
 last parts:
 {{{
 % ffmpeg -v 9 -loglevel 99 -i input.mp4
 ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg
 developers
   built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
   configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1
 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu
 --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl
 --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom
 --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca
 --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite
 --enable-libfontconfig --enable-libfreetype --enable-libfribidi
 --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame
 --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus
 --enable-libpulse --enable-librabbitmq --enable-librubberband
 --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex
 --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame
 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp
 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq
 --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl
 --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg
 --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883
 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
   libavutil      56. 70.100 / 56. 70.100
   libavcodec     58.134.100 / 58.134.100
   libavformat    58. 76.100 / 58. 76.100
   libavdevice    58. 13.100 / 58. 13.100
   libavfilter     7.110.100 /  7.110.100
   libswscale      5.  9.100 /  5.  9.100
   libswresample   3.  9.100 /  3.  9.100
   libpostproc    55.  9.100 / 55.  9.100
 Splitting the commandline.
 Reading option '-v' ... matched as option 'v' (set logging level) with
 argument '9'.
 Reading option '-loglevel' ... matched as option 'loglevel' (set logging
 level) with argument '99'.
 Reading option '-i' ... matched as input url with argument 'input.mp4'.
 Finished splitting the commandline.
 Parsing a group of options: global .
 Applying option v (set logging level) with argument 9.
 Successfully parsed a group of options.
 Parsing a group of options: input url input.mp4.
 Successfully parsed a group of options.
 Opening an input file: input.mp4.
 [NULL @ 0x55967fe54240] Opening 'input.mp4' for reading
 [file @ 0x55967fe54ec0] Setting default whitelist 'file,crypto,data'
 Probing mov,mp4,m4a,3gp,3g2,mj2 score:100 size:2048
 Probing mp3 score:1 size:2048
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] Format mov,mp4,m4a,3gp,3g2,mj2
 probed with size=2048 and score=100
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'ftyp' parent:'root' sz:
 24 8 23353518
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] ISO: File Type Major Brand:
 mp42
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'uuid' parent:'root' sz:
 40 32 23353518
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'mdat' parent:'root' sz:
 23319625 72 23353518
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'moov' parent:'root' sz:
 33829 23319697 23353518
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'mvhd' parent:'moov' sz:
 108 8 33821
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] time scale = 48000
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'trak' parent:'moov' sz:
 19339 116 33821
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'tkhd' parent:'trak' sz:
 92 8 19331
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'mdia' parent:'trak' sz:
 19239 100 19331
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'mdhd' parent:'mdia' sz:
 32 8 19231
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'hdlr' parent:'mdia' sz:
 45 40 19231
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] ctype=[0][0][0][0]
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] stype=vide
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'minf' parent:'mdia' sz:
 19154 85 19231
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'vmhd' parent:'minf' sz:
 20 8 19146
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'dinf' parent:'minf' sz:
 36 28 19146
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'dref' parent:'dinf' sz:
 28 8 28
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] Unknown dref type 0x206c7275
 size 12
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'stbl' parent:'minf' sz:
 19090 64 19146
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'stsd' parent:'stbl' sz:
 150 8 19082
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] size=134 4CC=avc1 codec_type=0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'avcC' parent:'stsd' sz:
 48 8 48
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'stts' parent:'stbl' sz:
 32 158 19082
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] track[0].stts.entries = 2
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] sample_count=4150,
 sample_duration=1000
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] sample_count=1,
 sample_duration=740
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'ctts' parent:'stbl' sz:
 24 190 19082
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] track[0].ctts.entries = 1
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] count=4151, duration=3300
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] dts shift 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'stsc' parent:'stbl' sz:
 1072 214 19082
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] track[0].stsc.entries = 88
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'stsz' parent:'stbl' sz:
 16624 1286 19082
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] sample_size = 0 sample_count =
 4151
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'stco' parent:'stbl' sz:
 884 17910 19082
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] type:'stss' parent:'stbl' sz:
 296 18794 19082
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] keyframe_count = 70
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 0,
 offset 50, dts 0, size 63932, distance 0, keyframe 1
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 1,
 offset fa0c, dts 1000, size 218, distance 1, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 2,
 offset fae6, dts 2000, size 344, distance 2, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 3,
 offset fc3e, dts 3000, size 117, distance 3, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 4,
 offset fcb3, dts 4000, size 367, distance 4, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 5,
 offset fe22, dts 5000, size 226, distance 5, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 6,
 offset ff04, dts 6000, size 1268, distance 6, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 7,
 offset 103f8, dts 7000, size 1152, distance 7, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 8,
 offset 10878, dts 8000, size 8271, distance 8, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 9,
 offset 128c7, dts 9000, size 5689, distance 9, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 10,
 offset 13f00, dts 10000, size 8869, distance 10, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 11,
 offset 161a5, dts 11000, size 3581, distance 11, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 12,
 offset 16fa2, dts 12000, size 7453, distance 12, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 13,
 offset 18cbf, dts 13000, size 7036, distance 13, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 14,
 offset 1a83b, dts 14000, size 9793, distance 14, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 15,
 offset 1ce7c, dts 15000, size 4045, distance 15, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 16,
 offset 1f6a5, dts 16000, size 8956, distance 16, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 17,
 offset 219a1, dts 17000, size 9253, distance 17, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 18,
 offset 23dc6, dts 18000, size 10984, distance 18, keyframe 0
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] AVIndex stream 0, sample 19,
 offset 268ae, dts 19000, size 5354, distance 19, keyframe 0

 ...

 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] All info found
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] stream 0: start_time: 0.055
 duration: 69.179
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] stream 1: start_time: 0
 duration: 69.248
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] format: start_time: 0 duration:
 69.259 (estimate from stream) bitrate=2697 kb/s
 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55967fe54240] After
 avformat_find_stream_info() pos: 122805 bytes read:196065 seeks:2
 frames:17
 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
   Metadata:
     major_brand     : mp42
     minor_version   : 0
     compatible_brands: mp41isom
     creation_time   : 2023-06-03T08:37:14.000000Z
   Duration: 00:01:09.26, start: 0.000000, bitrate: 2697 kb/s
   Stream #0:0(und), 16, 1/60000: Video: h264 (Main), 1 reference frame
 (avc1 / 0x31637661), yuv420p(left), 1920x1080 (1920x1088) [SAR 1:1 DAR
 16:9], 0/1, 2533 kb/s, 60 fps, 60 tbr, 60k tbn, 120 tbc (default)
     Metadata:
       creation_time   : 2023-06-03T08:37:14.000000Z
       handler_name    : VideoHandler
       vendor_id       : [0][0][0][0]
       encoder         : AVC Coding
   Stream #0:1(und), 1, 1/48000: Audio: aac (LC) (mp4a / 0x6134706D), 48000
 Hz, stereo, fltp, 163 kb/s (default)
     Metadata:
       creation_time   : 2023-06-03T08:37:14.000000Z
       handler_name    : SoundHandler
       vendor_id       : [0][0][0][0]
 Successfully opened the file.
 At least one output file must be specified
 [AVIOContext @ 0x55967fe5d280] Statistics: 196065 bytes read, 2 seeks
 }}}
-- 
Ticket URL: <https://trac.ffmpeg.org/ticket/10399>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

