Hi everyone.

I have a question about the FFmpeg library, more precisely about *audio
handling* when video and audio frames do not line up one-to-one.

First, let me explain my situation.
I work on avTranscoder (https://github.com/avTranscoder/avTranscoder), an
open-source C++ wrapper around FFmpeg. Its goal is to provide a high-level
API to rewrap / transcode media easily.
We currently have an issue with one of our common use cases: take a MOV
file, transcode its video and audio at the same time, and write the output
media.

To do this, we decode, convert, and encode video and audio frames
sequentially. Here is what *ffprobe* reports for our input file:

*Stream #0:0(eng): Video: dnxhd (AVdn / 0x6E645641), yuv422p, 1920x1080, 121241 kb/s, 25 fps, 25 tbr, 25k tbn, 25k tbc*
*Stream #0:1(eng): Audio: pcm_s24le (in24 / 0x34326E69), 48000 Hz, 2 channels, s32, 2304 kb/s*

From these figures, we expected:

*Video packet = 1 video frame*
*Audio packet = 1920 audio samples (sample_rate / fps = 48000 / 25)*
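To make the mismatch concrete, here is a small sketch of the arithmetic (the helper names are hypothetical, not part of avTranscoder or FFmpeg). It assumes the sample rate is an exact multiple of the frame rate, which holds for 48 kHz / 25 fps but not, for example, for 29.97 fps. It also computes the smallest sample count that is a whole number of both video frames and 1024-sample audio packets, i.e. how often the two streams' packet boundaries line up again.

```cpp
#include <numeric>  // std::lcm (C++17)

// Samples covering exactly one video frame (48000 / 25 = 1920).
// Assumption: sampleRate is an exact multiple of fps (true for 48 kHz / 25 fps).
long samplesPerVideoFrame(long sampleRate, long fps) {
    return sampleRate / fps;
}

// Smallest sample count that is a whole number of video frames AND a whole
// number of audio packets: the period after which both streams line up again.
long alignmentCycle(long samplesPerFrame, long samplesPerPacket) {
    return std::lcm(samplesPerFrame, samplesPerPacket);
}
```

With 1920 samples per video frame and 1024 samples per audio packet, the cycle is 15360 samples, i.e. 15 audio packets for every 8 video frames (320 ms). Packet boundaries therefore only coincide with a video frame boundary once every 8 frames.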

Unfortunately, there are more audio packets than video packets: each audio
packet actually holds *1024 audio samples*, as *ffprobe -show_packets*
shows. Since our output is "frame wrapped" (one audio packet per video
frame, so as many audio packets as video packets), we need to manage
buffer(s) to regroup the audio samples before encoding them correctly.
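One way to do that buffering is a sample FIFO: append every decoded packet, whatever its size, and pop exactly one video frame's worth of samples at a time. The sketch below is a hypothetical, FFmpeg-free illustration (the `SampleFifo` class is not part of avTranscoder or FFmpeg). It assumes interleaved 32-bit samples, matching the `s32`, 2-channel decoded format reported by ffprobe. In real FFmpeg code, libavutil's `AVAudioFifo` serves the same purpose; this sketch does not claim to be how ffmpeg.c itself handles it.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical FIFO that re-chunks interleaved audio: it accepts decoded
// packets of any size (e.g. 1024 samples) and hands out fixed-size chunks
// (e.g. 1920 samples, one video frame's worth at 48 kHz / 25 fps).
class SampleFifo {
public:
    explicit SampleFifo(std::size_t channels) : channels_(channels) {}

    // Append one decoded packet of interleaved samples
    // (nbSamples * channels values).
    void write(const std::vector<int32_t>& interleaved) {
        data_.insert(data_.end(), interleaved.begin(), interleaved.end());
    }

    // Number of complete samples (per channel) currently buffered.
    std::size_t size() const { return data_.size() / channels_; }

    // Pop exactly nbSamples samples into `out` if enough are buffered.
    // Returns false and leaves the buffer untouched otherwise, so the caller
    // can write more packets first.
    bool read(std::size_t nbSamples, std::vector<int32_t>& out) {
        if (size() < nbSamples) return false;
        const auto count = static_cast<std::ptrdiff_t>(nbSamples * channels_);
        out.assign(data_.begin(), data_.begin() + count);
        data_.erase(data_.begin(), data_.begin() + count);
        return true;
    }

private:
    std::size_t channels_;
    std::deque<int32_t> data_;
};
```

Typical use: after writing each decoded 1024-sample packet, call `read(1920, chunk)` in a loop and encode one audio frame per successful read. After 15 packets, exactly 8 chunks have come out and the FIFO is empty. At end of stream, any leftover samples still need flushing (for example padded with silence or written as a shorter final frame), depending on what the output wrapper accepts.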

So my question is: how does ffmpeg buffer video and audio data in such a
case?
After looking at ffmpeg.c, we believe the relevant logic starts here:
https://github.com/FFmpeg/FFmpeg/blob/master/ffmpeg.c#L713 Is that right?

