[Libav-user] Encoding FLOAT PCM to OGG using libav

Thu Jul 31 09:24:38 CEST 2014

I am currently trying to convert a raw PCM float buffer to an OGG-encoded
file. I tried several libraries for the encoding process and finally
chose libavcodec.

What I want to do, precisely, is take the float buffer (values in [-1;1])
provided by my audio library and turn it into a char buffer of encoded OGG
data.

I managed to encode the float buffer into a buffer of encoded MP2 data with
this (proof of concept) code:

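#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <libavcodec/avcodec.h>

static AVCodec *codec;
static AVCodecContext *c;
static AVFrame *frame;
static AVPacket pkt;
static int16_t *samples;   /* interleaved S16 input for the encoder */
static int frameEncoded;   /* number of int16_t values stored in samples */

FILE *file;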

int main(int argc, char *argv[])
{
    /* binary mode, so the encoded bytes are written unchanged */
    file = fopen("file.ogg", "wb");

    long ret;

    avcodec_register_all();

    codec = avcodec_find_encoder(AV_CODEC_ID_MP2);
    if (!codec) {
        fprintf(stderr, "codec not found\n");
        exit(1);
    }

    c = avcodec_alloc_context3(codec);
    if (!c) {
        fprintf(stderr, "Could not allocate codec context\n");
        exit(1);
    }

    c->bit_rate = 256000;
    c->sample_rate = 44100;
    c->channels = 2;
    c->sample_fmt = AV_SAMPLE_FMT_S16;
    c->channel_layout = AV_CH_LAYOUT_STEREO;

    /* open it */
    if (avcodec_open2(c, codec, NULL) < 0) {
        fprintf(stderr, "Could not open codec\n");
        exit(1);
    }

    /* frame containing input raw audio */
    frame = av_frame_alloc();
    if (!frame) {
        fprintf(stderr, "Could not allocate audio frame\n");
        exit(1);
    }

    frame->nb_samples     = c->frame_size;
    frame->format         = c->sample_fmt;
    frame->channel_layout = c->channel_layout;

    /* the codec gives us the frame size, in samples,
     * we calculate the size of the samples buffer in bytes */
    int buffer_size = av_samples_get_buffer_size(NULL, c->channels,
                                                 c->frame_size,
                                                 c->sample_fmt, 0);
    if (buffer_size < 0) {
        fprintf(stderr, "Could not get sample buffer size\n");
        exit(1);
    }
    samples = av_malloc(buffer_size);
    if (!samples) {
        fprintf(stderr, "Could not allocate %d bytes for samples buffer\n",
                buffer_size);
        exit(1);
    }
    /* setup the data pointers in the AVFrame */
    ret = avcodec_fill_audio_frame(frame, c->channels, c->sample_fmt,
                                   (const uint8_t*)samples, buffer_size, 0);
    if (ret < 0) {
        fprintf(stderr, "Could not setup audio frame\n");
        exit(1);
    }
}

void  myLibraryCallback(float *inbuffer, unsigned int length)
{
    for (unsigned int j = 0; j < 2 * length; j++) {
        /* a full frame holds frame_size samples for each channel */
        if (frameEncoded >= c->frame_size * c->channels) {
            int avret, got_output;

            av_init_packet(&pkt);
            pkt.data = NULL; // packet data will be allocated by the encoder
            pkt.size = 0;

            avret = avcodec_encode_audio2(c, &pkt, frame, &got_output);
            if (avret < 0) {
                fprintf(stderr, "Error encoding audio frame\n");
                exit(1);
            }
            if (got_output) {
                fwrite(pkt.data, 1, pkt.size, file);
                av_free_packet(&pkt);
            }

            frameEncoded = 0;
        }

        /* scale a float in [-1;1] to the signed 16-bit range */
        samples[frameEncoded] = (int16_t)(inbuffer[j] * SHRT_MAX);
        frameEncoded++;
    }
}

The code is really simple: I initialize libavcodec the usual way, then my
audio library sends me processed PCM float samples in [-1;1], interleaved at
44.1 kHz, along with the number of floats per channel in inbuffer (usually
1024, with 2 channels for stereo). So inbuffer usually contains 2048 floats.

That was easy, since I only needed to convert my PCM from float to S16
(AV_SAMPLE_FMT_S16), both interleaved. Moreover, each S16 sample fits in a
single 16-bit element of the samples buffer.
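
As a side note on the float-to-S16 step, multiplying by SHRT_MAX is only safe
while the input really stays inside [-1;1]. Below is a minimal sketch of a
clamped conversion. The helper name float_to_s16 is hypothetical and is not
part of libavcodec.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not a libavcodec function): convert one float
 * sample to signed 16-bit, clamping values outside [-1;1] first so the
 * scaled result always fits in an int16_t. */
static int16_t float_to_s16(float x)
{
    if (x > 1.0f)
        x = 1.0f;
    if (x < -1.0f)
        x = -1.0f;
    return (int16_t)(x * SHRT_MAX);
}
```

In the callback above this would replace the bare multiplication, e.g.
samples[frameEncoded] = float_to_s16(inbuffer[j]);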

Now I would like to apply this to OGG (Vorbis), whose encoder needs a sample
format of AV_SAMPLE_FMT_FLTP. Since my native format is AV_SAMPLE_FMT_FLT, it
should only take some deinterleaving, which is really easy to do.
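
For reference, the deinterleaving I have in mind looks roughly like this. It
is a plain C sketch with no libavcodec calls, and the function name
deinterleave_stereo is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: split interleaved stereo floats (L R L R ...)
 * into two planar buffers, one per channel.
 * frames is the number of samples per channel. */
static void deinterleave_stereo(const float *in, size_t frames,
                                float *left, float *right)
{
    for (size_t i = 0; i < frames; i++) {
        left[i]  = in[2 * i];      /* even indices: left channel  */
        right[i] = in[2 * i + 1];  /* odd indices:  right channel */
    }
}
```

With the usual callback size, frames would be 1024 and in would hold 2048
floats.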

The points I don't get are:

   1. How can you send a float buffer through a char buffer? Do we treat it
   as-is (float *floatSamples = (float *) samples)? If so, what does the
   sample count that avcodec gives you mean? Is it the number of floats or
   the number of chars?
   2. How can you send data in two buffers (one for left, one for right)
   when avcodec_fill_audio_frame only takes a (uint8_t*) parameter and not a
   (uint8_t**) for multiple channels? Does it completely change the previous
   sample code?
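
To make the two questions concrete, here is a sketch in plain C that does not
call libavcodec at all. The struct PlanarStereo and its functions are
hypothetical stand-ins for the part of a frame that holds one uint8_t pointer
per channel. It assumes that, for planar formats, each pointer refers to a
separate block of floats and that the sample count is per channel, in samples
rather than bytes.

```c
#include <stdint.h>
#include <stdlib.h>

#define CHANNELS 2

/* Hypothetical stand-in for the plane pointers of a planar audio frame:
 * one byte pointer per channel, plus a per-channel sample count. */
typedef struct {
    uint8_t *data[CHANNELS];
    int nb_samples;            /* samples per channel, not bytes */
} PlanarStereo;

/* Allocate one plane per channel; returns 0 on success, -1 on failure. */
static int planar_alloc(PlanarStereo *p, int nb_samples)
{
    p->nb_samples = nb_samples;
    for (int ch = 0; ch < CHANNELS; ch++) {
        /* each plane needs nb_samples * sizeof(float) bytes */
        p->data[ch] = malloc((size_t)nb_samples * sizeof(float));
        if (!p->data[ch]) {
            while (ch-- > 0)
                free(p->data[ch]);
            return -1;
        }
    }
    return 0;
}

/* Copy interleaved floats into the planes, viewing each byte plane
 * as an array of float. */
static void planar_fill(PlanarStereo *p, const float *interleaved)
{
    for (int ch = 0; ch < CHANNELS; ch++) {
        float *plane = (float *)p->data[ch];
        for (int i = 0; i < p->nb_samples; i++)
            plane[i] = interleaved[CHANNELS * i + ch];
    }
}

/* Release both planes. */
static void planar_free(PlanarStereo *p)
{
    for (int ch = 0; ch < CHANNELS; ch++)
        free(p->data[ch]);
}
```

With a real AVFrame, the planes would presumably come from libavutil (for
example av_frame_get_buffer() after setting nb_samples, format and
channel_layout) rather than from malloc. That part is not shown here and
would need checking against the headers of the libavcodec version in use.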

I tried to find some answers myself and have made a LOT of experiments so
far, but I failed on these points. Since documentation on them is badly
lacking, I would be very grateful if you had answers.

Thank you!