[FFmpeg-user] Adding narration to a video

John G. Heim jheim at math.wisc.edu
Mon Dec 2 19:33:42 EET 2024


Hi, as unlikely as it might seem, I'm a blind person trying to create a 
promotional video. I'm in a group of disabled rock climbers and I have a 
video taken with a GoPro attached to my climbing helmet the first time I 
rappelled down a cliff.  I want to add narration explaining what is 
going on and talking up the group itself. I recorded a bunch of mp3 
files with the audio narration. I've been googling for weeks on how to 
add those audio files into the video stream but I just cannot get it 
exactly right no matter what I do. Here is what I have tried so far:

1. Programmatically cut the video into segments so that a segment cut
out of the middle would exactly match the length of my audio, added the
audio to that segment, and patched the segments back together.

Actually, this works pretty well except that there is a little jump at 
each of the splices. Still, I figure there has to be a better way.
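
For what it's worth, the splicing step can be driven by ffmpeg's concat
demuxer; the segment names below are made up. The little jump at each
splice is often a sign that a stream-copied cut landed between
keyframes, which re-encoding at the join avoids:

```shell
# Hypothetical segment names; each line of the list file is one piece
# of the final video, in playback order.
printf "file '%s'\n" seg-000.mp4 seg-028.mp4 seg-045.mp4 > segments.txt
cat segments.txt
# The stitch itself would then be something like (re-encoding, because
# stream copy can only cut on keyframes, which causes the jump):
#   ffmpeg -f concat -safe 0 -i segments.txt -c:v libx264 -c:a aac joined.mp4
```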

2. Programmatically strung together a command using itsoffset to insert
each clip at the appropriate point. I know, you're saying itsoffset
doesn't work for audio -- but it's not really my fault I thought this
would work; there are lots of posts out there saying it does. I pondered
adapting my script to use adelay in a filter_complex, but it got to be
too complicated.
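
For the record, the adelay version for a single clip is not too bad.
This is only a sketch with made-up file names; the filter string is the
part worth checking:

```shell
# Delay the narration to its offset, then mix it over the original
# track. 28000 ms comes from the 0028000-intro.mp3 naming scheme.
delay_ms=28000
filter="[1:a]adelay=${delay_ms}:all=1[nar];[0:a][nar]amix=inputs=2:duration=first[aout]"
echo "$filter"
# It would be applied with something like:
#   ffmpeg -i original.mp4 -i 0028000-intro.mp3 -filter_complex "$filter" \
#          -map 0:v -map "[aout]" -c:v copy -c:a aac out.mp4
```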

3. Programmatically looped through each sound file, using a
filter_complex clause with adelay to add the audio at the appropriate
time. Each iteration of the loop took about 45 seconds to execute on my
20-core i5 with 32 GB of RAM and an SSD. I'd be okay with that, but it
didn't work: the original audio was muffled and the clips weren't where
I wanted them.
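
One thing that may explain both symptoms here: running ffmpeg once per
clip re-encodes the whole video every iteration (hence the 45 seconds),
and amix scales every input down by default, which would account for
the muffled original track. A sketch of building a single filtergraph
for one pass instead, with hypothetical file names following the
<milliseconds>-<label>.mp3 scheme (amix's normalize option disables the
scaling):

```shell
# Hypothetical narration files, named <milliseconds>-<label>.mp3.
files=(0028000-intro.mp3 0069000-slip.mp3)
filter=""
mix="[0:a]"            # the original soundtrack is the first mix input
i=1
for f in "${files[@]}"; do
  ms=${f%%-*}          # digits before the first dash, e.g. 0028000
  ms=$((10#$ms))       # force base 10 so leading zeros aren't octal
  filter+="[${i}:a]adelay=${ms}:all=1[n${i}];"
  mix+="[n${i}]"
  i=$((i + 1))
done
filter+="${mix}amix=inputs=${i}:duration=first:normalize=0[aout]"
echo "$filter"
# One ffmpeg run would then take all the inputs at once, e.g.:
#   ffmpeg -i original.mp4 -i 0028000-intro.mp3 -i 0069000-slip.mp3 \
#          -filter_complex "$filter" -map 0:v -map "[aout]" \
#          -c:v copy -c:a aac out.mp4
```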

4. Programmatically used sox to splice the audio files together, padding
with silence so the narration would line up with the appropriate places
in the video. If I play the original video and the spliced audio file in
separate processes at the same time, it sounds perfect -- the narration
lines up exactly with the video. But when I merge them, they don't line
up at all. This one is particularly puzzling. Not only does the
narration not line up, some of the clips are repeated. In the audio
file, I say "Here I lose my footing." just once, at 1:09. On the video,
it doesn't play until 1:54, but then it plays again at 2:33 and again at
2:51. Weird.

Here is the command I used to try this:

ffmpeg -i original.mp4 -i narration.mp3 -c:v copy \
     -filter_complex "[0:a][1:a] amix=inputs=2:duration=longest [audioin]" \
     -map 0:v -map "[audioin]" \
     -y garbage.mp4
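
A guess, not a verified fix: a long concatenated MP3 carries no
reliable timestamps of its own, and MP3 encoder padding can shift the
start, so decoding it to WAV before mixing is worth a try -- along with
amix's normalize=0 so the original track isn't quieted. Only the filter
string below is executable as-is; the commands are a sketch using the
file names from this post:

```shell
# Mix the (pre-aligned) narration over the original audio without
# amix's default per-input volume scaling.
filter="[0:a][1:a]amix=inputs=2:duration=first:normalize=0[aout]"
echo "$filter"
# Sketch of the two-step variant:
#   ffmpeg -y -i narration.mp3 narration.wav
#   ffmpeg -i original.mp4 -i narration.wav -filter_complex "$filter" \
#          -map 0:v -map "[aout]" -c:v copy -c:a aac -y merged.mp4
```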

I am running Debian testing with ffmpeg version 7.1.3. I am writing
this in bash scripts and using ffprobe to get the length of the audio
segments. The audio files are mp3 files named for the point where I
want them inserted. For example, I want to introduce myself 28 seconds
into the video, so that file is named 0028000-intro.mp3. In the script,
I can extract the 28000 and use that in an adelay filter, or divide by
1000 for the -t option. I am pretty sure I am doing the math right.
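
As a sanity check on that arithmetic, here is the extraction sketched
in bash (file name from this post). One gotcha: a plain $((0028000))
is read as octal and fails, so the base has to be forced:

```shell
f=0028000-intro.mp3
ms=${f%%-*}            # "0028000": everything before the first dash
ms=$((10#$ms))         # force base 10; leading zeros otherwise mean octal
sec=$((ms / 1000))     # whole seconds, for -t / -ss style options
echo "$ms $sec"        # -> 28000 28
```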

Here is a link to the most successful effort I've had so far, via method 1:

https://people.math.wisc.edu/~jheim/Climbing/video.mp4

That works pretty well, but I figure the last approach has to be the
right one: make an audio file with the narration lined up against the
video, then add that extra audio track to the original video. Adding an
mp3 file to a segment of the video works fine, but it doesn't work when
I try to do the whole thing at once.

What the heck am I doing wrong?
