[Libav-user] [help needed] how does libav handle s16le pcm ?

Tue Apr 2 23:02:58 EEST 2024

Hello,

This is my first post here, and I'm not fluent in English, so 
apologies if I'm not very precise; don't hesitate to ask for details.

I have been learning the ffmpeg / libav API for a long while, because 
I'd like to mux audio and video from any source (e.g. video + audio from 
a webcam as a first target).

My current code is originally based on Fabrice Bellard's muxing.c code. 
I modified it to record both video and audio. For portability reasons, 
I'm using SDL2 (+ALSA) on Linux to begin with, but I know how to 
cross-compile this software for Windows once I have made it work (as I 
do with miniDart).

I have been recording video for a very long time (using OpenCV), 
converting the cv::Mat into ffmpeg video frames. No problem with that: 
it has simply worked well for a long time.

See: https://framagit.org/ericb/audiorecord/-/tree/master/Sources/step8
(muxer.cpp contains most of the code)

With the dependencies satisfied, it should build with make without any 
issue on Linux, using the ALSA, SDL2, ffmpeg and OpenCV APIs.

The problematic part is the audio part, IMHO located in 
get_audio_frame(): using the ALSA driver, I can capture SDL audio as 
8-bit mono, using SDL_DequeueAudio (or an SDL callback, giving similar 
results), and mux both audio and video. This way, I can create .mp4, 
.mkv, .flv, .avi or .mov files.

mpv, vlc or ffplay can play these videos, and ffprobe seems to provide 
good information. The nice thing is that audio and video are extremely 
well synchronized: it simply works.

But I'm not satisfied, because of significant distortion (a 16-bit 
solution would be perfect), and I fear I'm missing something important 
or am plain wrong somewhere.

Last, the output context seems, most of the time, to request 
AV_SAMPLE_FMT_FLTP, and I have no clue how to send data to the muxer so 
that it does the job correctly.
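
To make the format mismatch concrete, here is a minimal sketch of what 
going from interleaved s16 to planar float means: one float array per 
channel, each sample scaled into roughly [-1.0, 1.0). The helper name 
s16_to_fltp and the 1/32768 scale factor are assumptions for 
illustration; in a libav program this conversion is normally delegated 
to libswresample (swr_convert), and each plane would correspond to one 
frame->data[ch] pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of libav): de-interleave signed 16-bit
// samples into one float plane per channel.
// "in" holds nb_samples * channels values laid out as L R L R ...
// The 1/32768 scale is an assumption; it maps int16 to [-1.0, 1.0).
std::vector<std::vector<float>> s16_to_fltp(const int16_t *in,
                                            std::size_t nb_samples,
                                            int channels)
{
    assert(in != nullptr && channels > 0);
    std::vector<std::vector<float>> planes(
        channels, std::vector<float>(nb_samples));
    for (std::size_t i = 0; i < nb_samples; ++i)
        for (int ch = 0; ch < channels; ++ch)
            planes[ch][i] = in[i * channels + ch] / 32768.0f;
    return planes;
}
```

Here nb_samples counts samples per channel, not bytes: a mono s16 
buffer of N bytes holds N / 2 samples, a stereo one N / 4.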

More precisely, SDL uses the ALSA driver (which is in fact an ALSA 
binding) and allows me to open a recording device. Then I can dequeue 
PCM data. This data is dequeued into an audioBuffer and stored in an 
audio frame (see write_audio_frame(), calling get_audio_frame() and so 
on) before being added to the video, with the right timestamp.
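
For reference, the timestamp bookkeeping can be sketched as a running 
sample counter, in the spirit of the next_pts field of the muxing.c 
example. This is only a sketch under assumptions: AudioClock is a 
hypothetical name, and it assumes the pts is expressed in units of 
1/sample_rate (in a real program it would still have to be rescaled to 
the encoder time base, e.g. with av_rescale_q).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical running clock: counts samples already handed to the encoder.
struct AudioClock {
    int sample_rate;   // e.g. 44100 (assumed capture rate)
    int64_t next_pts;  // in units of 1/sample_rate

    explicit AudioClock(int rate) : sample_rate(rate), next_pts(0) {}

    // Return the pts for a frame of nb_samples (per channel) and advance.
    int64_t stamp(int nb_samples) {
        assert(nb_samples > 0);
        const int64_t pts = next_pts;
        next_pts += nb_samples;
        return pts;
    }

    // Audio time elapsed so far, in seconds.
    double seconds() const {
        return static_cast<double>(next_pts) / sample_rate;
    }
};
```

The point of counting samples rather than bytes is that the pts stays 
valid when the sample size changes from 8 to 16 bits.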

I know something important is still missing, e.g. something like 
creating an input stream and probably a custom AVIOContext pointing to 
the audioBuffer, converting it and so on, but before starting this 
difficult work, I need help and advice: what exactly should I do to 
convert s16le PCM data and correctly "present" it to ffmpeg so that it 
is handled correctly?

The point is that I didn't find satisfying libav documentation in the 
ffmpeg source code, except in some complicated examples, explaining how 
to convert data without opening a file. But I prefer to ask ...

Schematically:

SDL opens a recording device and provides ALSA PCM audio data, dequeued 
using SDL_DequeueAudio(). The audio frame is filled from an (int8 *) 
audioBuffer containing s16le data.

... do I fill the audioBuffer correctly?
... and what should I do after that???
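
To illustrate the first question, here is a minimal sketch of how a raw 
byte buffer holding s16le data maps to 16-bit samples: two bytes per 
sample, low byte first. It assumes the capture device was opened with a 
16-bit little-endian format (such as SDL's AUDIO_S16LSB); decode_s16le 
is a hypothetical helper, not an SDL or libav function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interpret a raw byte buffer as s16le samples.
// Works regardless of host endianness because bytes are combined by hand.
std::vector<int16_t> decode_s16le(const uint8_t *bytes, std::size_t len)
{
    assert(len % 2 == 0);  // every sample is exactly two bytes
    std::vector<int16_t> samples(len / 2);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        // low byte first, then high byte shifted into the upper 8 bits
        const uint16_t u = static_cast<uint16_t>(
            bytes[2 * i] | (bytes[2 * i + 1] << 8));
        samples[i] = static_cast<int16_t>(u);
    }
    return samples;
}
```

If the buffer is instead read one byte per sample, as with 8-bit audio, 
every second value is the low byte of a 16-bit sample, which would be 
heard as noise.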

In the end, I know there is an output audio context converting the 
audio to the AV_SAMPLE_FMT_FLTP format, and everything is muxed => the 
final video. But with all the attempts I have made, either the sound is 
garbage or there is no sound at all.

(sorry if I'm imprecise, but I have been stuck for a long while, and 
after a lot of research I have no clue ...)

BTW: step 9 will contain the s16le implementation. It contains more 
debug information and is located here:
https://framagit.org/ericb/audiorecord/-/tree/master/Sources/step9

I hope I was clear. Thanks in advance for any help or suggestions.