<div dir="ltr">I was actually able to figure this one out.  swr_convert was not converting the entire frame of data, apparently because the output buffer was not big enough, so a subsequent call with null input parameters was needed to flush out the rest of the samples.  Also, the source and destination frame sizes were not the same: the source was 576 samples, while the destination was 1024, so I created a FIFO buffer to store the converted samples until there were enough to fill a packet.<div>
<br></div><div>Does anyone know how the memory management works under the hood when using av_interleaved_write_frame?  We were running tests and wondering whether interleaving the packets ourselves would reduce the buffering.  It sounds like everything is buffered until the end, but is it smart enough to know that once an audio packet exists there is no need to keep buffering anything before it?  I found that at times I could interleave by hand, but at other times the audio samples would not fit and had to be placed at the end (typically during the final flush).</div>
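<div><br></div><div>For anyone following along, the re-chunking described above (576-sample converted frames collected until a 1024-sample encoder frame is available) could be sketched roughly as follows. This is an illustration under assumptions, not the code actually used: SampleFifo, fifo_init, fifo_write, fifo_read_frame and FIFO_CAPACITY are hypothetical names, and the sketch handles a single channel of float samples only. For real multi-channel use, libavutil's AVAudioFifo (av_audio_fifo_write / av_audio_fifo_read) covers the same ground.</div>
<pre>
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-channel float FIFO; all names are illustrative only. */
#define FIFO_CAPACITY 8192

typedef struct {
    float  buf[FIFO_CAPACITY];
    size_t count;               /* number of samples currently stored */
} SampleFifo;

static void fifo_init(SampleFifo *f)
{
    assert(f);
    f->count = 0;
}

/* Append n converted samples. Returns 0 on success, -1 if they do not fit. */
static int fifo_write(SampleFifo *f, const float *src, size_t n)
{
    assert(f && src);
    if (n > FIFO_CAPACITY - f->count)
        return -1;
    memcpy(f->buf + f->count, src, n * sizeof *src);
    f->count += n;
    return 0;
}

/* Pop exactly frame_size samples once that many are buffered.
 * Returns 1 if a full frame was copied to dst, 0 if more input is needed. */
static int fifo_read_frame(SampleFifo *f, float *dst, size_t frame_size)
{
    assert(f && dst);
    if (f->count < frame_size)
        return 0;
    memcpy(dst, f->buf, frame_size * sizeof *dst);
    /* Shift the leftover samples to the front for the next frame. */
    memmove(f->buf, f->buf + frame_size,
            (f->count - frame_size) * sizeof *dst);
    f->count -= frame_size;
    return 1;
}
```
</pre>
<div>The idea is to call fifo_write after every conversion and then loop on fifo_read_frame with the encoder's frame size, encoding one frame per successful read. Whatever is left over at the end of the stream still has to be handled separately (for example padded or sent as a short final frame, depending on what the encoder accepts).</div>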
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, Jul 13, 2014 at 2:58 AM, Ryan Routon <span dir="ltr"><<a href="mailto:ryanrouton@gmail.com" target="_blank">ryanrouton@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><span style="font-family:arial,sans-serif;font-size:13px">Hello All,</span><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">First of all this is my first post to the mailing list so hello all =)</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">Recently I have been working with FFmpeg to render an mp4 from still images (supplied as RGBA values) and a user-selected audio file.  I have set up the video portion, which encodes perfectly.  I use the <a href="https://www.ffmpeg.org/doxygen/trunk/group__lavc__decoding.html#ga834bb1b062fbcc2de4cf7fb93f154a3e" style="color:rgb(70,101,162);font-weight:bold;font-family:Roboto,sans-serif;font-size:14px;line-height:19px" target="_blank">avcodec_decode_audio4</a> function to pull frames from the user-selected audio file.  The decoded samples are stored as <span style="font-family:Menlo;font-size:11px">AV_SAMPLE_FMT_FLTP, so at this point I either have to manually cast the values to short or use the resampler functionality.  I did the former as a proof of concept, but to make the conversion as flexible as possible I opted for the resampler route.  I set up my resampler as follows:</span></div>

<div style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:Menlo;font-size:11px"><br></span></div><div><p style="font-family:Menlo;font-size:11px;margin:0px;color:rgb(0,132,0)"><span style="color:rgb(0,0,0)">        </span>/* set options */</p>


<p style="font-family:Menlo;font-size:11px;margin:0px">        <span style="color:rgb(49,89,93)">av_opt_set_int</span>       (<span style="color:rgb(79,129,135)">swr_ctx</span>, <span style="color:rgb(209,47,27)">"in_channel_count"</span>,   frameIn-><span style="color:rgb(79,129,135)">channels</span>,                 <span style="color:rgb(39,42,216)">0</span>);</p>


<p style="font-family:Menlo;font-size:11px;margin:0px">        <span style="color:rgb(49,89,93)">av_opt_set_int</span>       (<span style="color:rgb(79,129,135)">swr_ctx</span>, <span style="color:rgb(209,47,27)">"in_sample_rate"</span>,     frameIn-><span style="color:rgb(79,129,135)">sample_rate</span>,              <span style="color:rgb(39,42,216)">0</span>);</p>


<p style="font-family:Menlo;font-size:11px;margin:0px">        <span style="color:rgb(49,89,93)">av_opt_set_sample_fmt</span>(<span style="color:rgb(79,129,135)">swr_ctx</span>, <span style="color:rgb(209,47,27)">"in_sample_fmt"</span>,      (<span style="color:rgb(79,129,135)">AVSampleFormat</span>)frameIn-><span style="color:rgb(79,129,135)">format</span>,   <span style="color:rgb(39,42,216)">0</span>);</p>


<p style="font-family:Menlo;font-size:11px;margin:0px">        <span style="color:rgb(49,89,93)">av_opt_set_int</span>       (<span style="color:rgb(79,129,135)">swr_ctx</span>, <span style="color:rgb(209,47,27)">"out_channel_count"</span>,  codecOut-><span style="color:rgb(79,129,135)">channels</span>,                <span style="color:rgb(39,42,216)">0</span>);</p>


<p style="font-family:Menlo;font-size:11px;margin:0px">        <span style="color:rgb(49,89,93)">av_opt_set_int</span>       (<span style="color:rgb(79,129,135)">swr_ctx</span>, <span style="color:rgb(209,47,27)">"out_sample_rate"</span>,    codecOut-><span style="color:rgb(79,129,135)">sample_rate</span>,             <span style="color:rgb(39,42,216)">0</span>);</p>


<p style="font-family:Menlo;font-size:11px;margin:0px">        <span style="color:rgb(49,89,93)">av_opt_set_sample_fmt</span>(<span style="color:rgb(79,129,135)">swr_ctx</span>, <span style="color:rgb(209,47,27)">"out_sample_fmt"</span>,     codecOut-><span style="color:rgb(79,129,135)">sample_fmt</span>,              <span style="color:rgb(39,42,216)">0</span>);</p>

<p style="font-family:Menlo;font-size:11px;margin:0px"><br></p><p style="font-family:Menlo;font-size:11px;margin:0px">My target audio profile is as follows:</p><p style="font-family:Menlo;font-size:11px;margin:0px"><br></p>

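<p style="font-family:Menlo;font-size:11px;margin:0px">            c-><span style="color:rgb(79,129,135)">sample_fmt</span>  = (*codec)-><span style="color:rgb(79,129,135)">sample_fmts</span> ? (*codec)-><span style="color:rgb(79,129,135)">sample_fmts</span>[<span style="color:rgb(39,42,216)">0</span>] : <span style="color:rgb(49,89,93)">AV_SAMPLE_FMT_FLTP</span>;</p>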

<p style="font-family:Menlo;font-size:11px;margin:0px">            c-><span style="color:rgb(79,129,135)">bit_rate</span>    = <span style="color:rgb(39,42,216)">64000</span>;</p><p style="font-family:Menlo;font-size:11px;margin:0px">

            c-><span style="color:rgb(79,129,135)">sample_rate</span> = <span style="color:rgb(39,42,216)">22050</span>;</p><p style="font-family:Menlo;font-size:11px;margin:0px">            c-><span style="color:rgb(79,129,135)">channels</span>    = <span style="color:rgb(39,42,216)">2</span>;</p>

<p style="font-family:Menlo;font-size:11px;margin:0px">



</p><p style="font-family:Menlo;font-size:11px;margin:0px;color:rgb(120,73,42)"><span style="color:rgb(0,0,0)">            c-></span><span style="color:rgb(79,129,135)">channel_layout</span><span style="color:rgb(0,0,0)"> = </span>AV_CH_LAYOUT_STEREO<span style="color:rgb(0,0,0)">;</span></p>

<p style="font-family:Menlo;font-size:11px;margin:0px;color:rgb(120,73,42)"><span style="color:rgb(0,0,0)"><br></span></p><p style="margin:0px"><font color="#000000" face="Menlo"><span style="font-size:11px">I had a hell of a time debugging random black-box crashes inside swr_convert, and it turned out that I was passing the uint8_t ** by value instead of by reference, so it would crash when there was more than one channel; changing my parameter from ->data to &->data[0] solved that.  So just in case that helps anyone having that particular problem...   </span></font></p>

<p style="margin:0px"><font color="#000000" face="Menlo"><span style="font-size:11px"><br></span></font></p><p style="margin:0px"><font color="#000000" face="Menlo"><span style="font-size:11px"><br></span></font></p><p style="font-family:Menlo;font-size:11px;margin:0px;color:rgb(120,73,42)">

<span style="color:rgb(0,0,0)">Anyway, the resampler now works great for a 1-channel file at the same sample rate.  When I change the input to a 2-channel file or change the sample rate, I still hear the audio, but it is very disjointed, as if there are gaps in the samples during the encoding process.  </span></p>

<p style="font-family:Menlo;font-size:11px;margin:0px;color:rgb(120,73,42)"><span style="color:rgb(0,0,0)"><br></span></p><p style="font-family:Menlo;font-size:11px;margin:0px;color:rgb(120,73,42)"><span style="color:rgb(0,0,0)">Here is the function; get_audio_frame_flip() uses avcodec_decode_audio4 internally:</span></p>

<p style="font-family:Menlo;font-size:11px;margin:0px;color:rgb(120,73,42)"><span style="color:rgb(0,0,0)"><br></span></p><p style="margin:0px;font-size:11px;font-family:Menlo"><span style="color:rgb(187,44,162)">static</span> <span style="color:rgb(187,44,162)">int</span> write_audio_frame_flip(<span style="color:rgb(79,129,135)">AVFormatContext</span> *oc, <span style="color:rgb(79,129,135)">OutputStream</span> *ost)</p>

<p style="margin:0px;font-size:11px;font-family:Menlo">{</p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(79,129,135)"><span style="color:rgb(0,0,0)">    </span>AVCodecContext<span style="color:rgb(0,0,0)"> *c;</span></p>

<p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(0,132,0)"><span style="color:rgb(0,0,0)">    </span><span style="color:rgb(79,129,135)">AVPacket</span><span style="color:rgb(0,0,0)"> pkt = { </span><span style="color:rgb(39,42,216)">0</span><span style="color:rgb(0,0,0)"> }; </span>// data and size must be 0;</p>

<p style="margin:0px;font-size:11px;font-family:Menlo">    <span style="color:rgb(79,129,135)">AVFrame</span> *frame;</p><p style="margin:0px;font-size:11px;font-family:Menlo">    <span style="color:rgb(187,44,162)">int</span> ret;</p>

<p style="margin:0px;font-size:11px;font-family:Menlo">    <span style="color:rgb(187,44,162)">int</span> got_packet;</p><p style="margin:0px;font-size:11px;font-family:Menlo">    <span style="color:rgb(187,44,162)">int</span> dst_nb_samples;</p>

<p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">    </p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(49,89,93)"><span style="color:rgb(0,0,0)">    </span>av_init_packet<span style="color:rgb(0,0,0)">(&pkt);</span></p>

<p style="margin:0px;font-size:11px;font-family:Menlo">    c = ost-><span style="color:rgb(79,129,135)">st</span>-><span style="color:rgb(79,129,135)">codec</span>;</p><p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">

    </p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(0,132,0)"><span style="color:rgb(0,0,0)">    </span>//Get decoded audio frame</p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(49,89,93)">

<span style="color:rgb(0,0,0)">    frame = </span>get_audio_frame_flip<span style="color:rgb(0,0,0)">(ost);</span></p><p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">    </p><p style="margin:0px;font-size:11px;font-family:Menlo">

    <span style="color:rgb(187,44,162)">if</span> (frame)</p><p style="margin:0px;font-size:11px;font-family:Menlo">    {</p><p style="margin:0px;font-size:11px;font-family:Menlo">        <span style="color:rgb(49,89,93)">SetSampler</span>(c, frame);</p>

<p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">     </p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(0,132,0)"><span style="color:rgb(0,0,0)">        </span>// convert samples from native format to destination codec format, using the resampler</p>

<p style="margin:0px;font-size:11px;font-family:Menlo">        <span style="color:rgb(187,44,162)">if</span> (<span style="color:rgb(79,129,135)">swr_ctx</span>)</p><p style="margin:0px;font-size:11px;font-family:Menlo">
        {</p>
<p style="margin:0px;font-size:11px;font-family:Menlo">            <span style="color:rgb(187,44,162)">if</span> (ret < <span style="color:rgb(39,42,216)">0</span>) {</p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(209,47,27)">

<span style="color:rgb(0,0,0)">                </span><span style="color:rgb(61,29,129)">fprintf</span><span style="color:rgb(0,0,0)">(</span><span style="color:rgb(120,73,42)">stderr</span><span style="color:rgb(0,0,0)">, </span>"Could not allocate destination samples\n"<span style="color:rgb(0,0,0)">);</span></p>

<p style="margin:0px;font-size:11px;font-family:Menlo">                <span style="color:rgb(187,44,162)">return</span> <span style="color:rgb(39,42,216)">0</span>;</p><p style="margin:0px;font-size:11px;font-family:Menlo">

            }</p><p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">            </p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(209,47,27)"><span style="color:rgb(0,0,0)">            </span><span style="color:rgb(120,73,42)">LOGD</span><span style="color:rgb(0,0,0)">(</span>"Scaling frame->sample_rate=%d frame->nb_samples=%d c->sample_rate=%d"<span style="color:rgb(0,0,0)">, frame-></span><span style="color:rgb(79,129,135)">sample_rate</span><span style="color:rgb(0,0,0)">, frame-></span><span style="color:rgb(79,129,135)">nb_samples</span><span style="color:rgb(0,0,0)">, c-></span><span style="color:rgb(79,129,135)">sample_rate</span><span style="color:rgb(0,0,0)">);</span></p>

<p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">            </p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(0,132,0)"><span style="color:rgb(0,0,0)">            </span>// compute destination number of samples</p>

<p style="margin:0px;font-size:11px;font-family:Menlo">            dst_nb_samples = <span style="color:rgb(49,89,93)">av_rescale_rnd</span>(<span style="color:rgb(49,89,93)">swr_get_delay</span>(<span style="color:rgb(79,129,135)">swr_ctx</span>, frame-><span style="color:rgb(79,129,135)">sample_rate</span>) + frame-><span style="color:rgb(79,129,135)">nb_samples</span>,</p>

<p style="margin:0px;font-size:11px;font-family:Menlo">                                            c-><span style="color:rgb(79,129,135)">sample_rate</span>, frame-><span style="color:rgb(79,129,135)">sample_rate</span>, <span style="color:rgb(49,89,93)">AV_ROUND_UP</span>);</p>

<p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">            </p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(0,132,0)"><span style="color:rgb(0,0,0)">            </span>//convert the samples</p>

<p style="margin:0px;font-size:11px;font-family:Menlo">            ret = <span style="color:rgb(49,89,93)">swr_convert</span>(<span style="color:rgb(79,129,135)">swr_ctx</span>, &ost-><span style="color:rgb(79,129,135)">tmp_frame</span>-><span style="color:rgb(79,129,135)">data</span>[<span style="color:rgb(39,42,216)">0</span>], dst_nb_samples, (<span style="color:rgb(187,44,162)">const</span> <span style="color:rgb(112,61,170)">uint8_t</span> **)&frame-><span style="color:rgb(79,129,135)">data</span>[<span style="color:rgb(39,42,216)">0</span>], dst_nb_samples);</p>

<p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">            </p><p style="margin:0px;font-size:11px;font-family:Menlo">            <span style="color:rgb(187,44,162)">if</span> (ret < <span style="color:rgb(39,42,216)">0</span>) {</p>

<p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(209,47,27)"><span style="color:rgb(0,0,0)">                </span><span style="color:rgb(120,73,42)">LOGD</span><span style="color:rgb(0,0,0)">(</span>"Error while converting\n"<span style="color:rgb(0,0,0)">);</span></p>

<p style="margin:0px;font-size:11px;font-family:Menlo">                <span style="color:rgb(187,44,162)">return</span> <span style="color:rgb(39,42,216)">0</span>;</p><p style="margin:0px;font-size:11px;font-family:Menlo">

            }</p><p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">          </p><p style="margin:0px;font-size:11px;font-family:Menlo">            frame = ost-><span style="color:rgb(79,129,135)">tmp_frame</span>;</p>

<p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">            </p><p style="margin:0px;font-size:11px;font-family:Menlo">        } <span style="color:rgb(187,44,162)">else</span> {</p><p style="margin:0px;font-size:11px;font-family:Menlo">

            dst_nb_samples = frame-><span style="color:rgb(79,129,135)">nb_samples</span>;</p><p style="margin:0px;font-size:11px;font-family:Menlo">        }</p><p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">

        </p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(79,129,135)"><span style="color:rgb(0,0,0)">        frame-></span>pts<span style="color:rgb(0,0,0)"> = </span><span style="color:rgb(49,89,93)">av_rescale_q</span><span style="color:rgb(0,0,0)">(</span>samples_count<span style="color:rgb(0,0,0)">, (</span>AVRational<span style="color:rgb(0,0,0)">){</span><span style="color:rgb(39,42,216)">1</span><span style="color:rgb(0,0,0)">, c-></span>sample_rate<span style="color:rgb(0,0,0)">}, c-></span>time_base<span style="color:rgb(0,0,0)">);</span></p>

<p style="margin:0px;font-size:11px;font-family:Menlo">        <span style="color:rgb(79,129,135)">samples_count</span> += dst_nb_samples;</p><p style="margin:0px;font-size:11px;font-family:Menlo">    }</p><p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">

    </p><p style="margin:0px;font-size:11px;font-family:Menlo">    ret = <span style="color:rgb(49,89,93)">avcodec_encode_audio2</span>(c, &pkt, frame, &got_packet);</p><p style="margin:0px;font-size:11px;font-family:Menlo">

    <span style="color:rgb(187,44,162)">if</span> (ret < <span style="color:rgb(39,42,216)">0</span>) {</p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(209,47,27)"><span style="color:rgb(0,0,0)">        </span><span style="color:rgb(61,29,129)">fprintf</span><span style="color:rgb(0,0,0)">(</span><span style="color:rgb(120,73,42)">stderr</span><span style="color:rgb(0,0,0)">, </span>"Error encoding audio frame: %s\n"<span style="color:rgb(0,0,0)">, </span><span style="color:rgb(120,73,42)">av_err2str</span><span style="color:rgb(0,0,0)">(ret));</span></p>

<p style="margin:0px;font-size:11px;font-family:Menlo">        <span style="color:rgb(187,44,162)">return</span> <span style="color:rgb(39,42,216)">0</span>;</p><p style="margin:0px;font-size:11px;font-family:Menlo">    }</p>

<p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">    </p><p style="margin:0px;font-size:11px;font-family:Menlo">    <span style="color:rgb(187,44,162)">if</span> (got_packet) {</p><p style="margin:0px;font-size:11px;font-family:Menlo">

        ret = <span style="color:rgb(49,89,93)">write_frame</span>(oc, &c-><span style="color:rgb(79,129,135)">time_base</span>, ost-><span style="color:rgb(79,129,135)">st</span>, &pkt);</p><p style="margin:0px;font-size:11px;font-family:Menlo">

        <span style="color:rgb(187,44,162)">if</span> (ret < <span style="color:rgb(39,42,216)">0</span>) {</p><p style="margin:0px;font-size:11px;font-family:Menlo;color:rgb(209,47,27)"><span style="color:rgb(0,0,0)">            </span><span style="color:rgb(61,29,129)">fprintf</span><span style="color:rgb(0,0,0)">(</span><span style="color:rgb(120,73,42)">stderr</span><span style="color:rgb(0,0,0)">, </span>"Error while writing audio frame: %s\n"<span style="color:rgb(0,0,0)">,</span></p>

<p style="margin:0px;font-size:11px;font-family:Menlo">                    <span style="color:rgb(120,73,42)">av_err2str</span>(ret));</p><p style="margin:0px;font-size:11px;font-family:Menlo">            <span style="color:rgb(187,44,162)">return</span> <span style="color:rgb(39,42,216)">0</span>;</p>

<p style="margin:0px;font-size:11px;font-family:Menlo">        }</p><p style="margin:0px;font-size:11px;font-family:Menlo">    }</p><p style="margin:0px;font-size:11px;font-family:Menlo;min-height:13px">    </p><p style="margin:0px;font-size:11px;font-family:Menlo">

    <span style="color:rgb(187,44,162)">return</span> (frame || got_packet) ? <span style="color:rgb(39,42,216)">0</span> : <span style="color:rgb(39,42,216)">1</span>;</p><p style="font-family:Menlo;font-size:11px;margin:0px;color:rgb(120,73,42)">
</p><p style="margin:0px;font-size:11px;font-family:Menlo">}</p></div><div><br></div><div><br></div><div>Thank you for any insight into what I might be doing wrong, this is all very new to me and I am sure that I am misunderstanding some call or concept.  </div>
<span class="HOEnZb"><font color="#888888">
<div><br></div><div>Ryan</div></font></span></div>
</blockquote></div><br></div>