<div dir="ltr"><font face="monospace, monospace">Hi.</font><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">I am capturing video+audio of an IP camera(RTSP), transcoding it and saving it to file.</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">The input looks like this:</font></div><div><font face="monospace, monospace"><br></font></div><div><div><font face="monospace, monospace">Input #0, rtsp, from 'rtsp://<a href="http://10.1.1.22/?line=1&enableaudio=1&audio_line=1">10.1.1.22/?line=1&enableaudio=1&audio_line=1</a>':</font></div><div><font face="monospace, monospace"> Metadata:</font></div><div><font face="monospace, monospace"> title : LIVE VIEW</font></div><div><font face="monospace, monospace"> Duration: N/A, start: -0.001000, bitrate: N/A</font></div><div><font face="monospace, monospace"> Stream #0:0: Video: h264 (Main), yuv420p(tv, bt470bg/bt470bg/bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], 30 tbr, 90k tbn, 180k tbc</font></div><div><font face="monospace, monospace"> Stream #0:1: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s</font></div></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">Analyzing the audio stream, once decoded, I get this:</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"><div>if (_frame->frame->height == 0) { // Easy way to check for audio frame.</div><div><span style="white-space:pre"> </span>std::cout << _frame->frame->pts << ", " << _frame->frame->pkt_duration << ", " << _frame->frame->nb_samples << std::endl;</div><div>}</div><div><br></div><div><br></div><div><div>4472, 640, 640 // Relative pts: unknown</div><div>5112, 640, 640 <span 
style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">// Relative pts: 640</span></div><div>5784, 640, 640 <span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">// Relative pts: 672</span></div><div>6424, 640, 640 <span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">// Relative pts: 640</span></div><div>7064, 640, 640 <span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">// Relative pts: 640</span></div><div>7704, 640, 640 <span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">// Relative pts: 640</span></div><div>8344, 640, 640 <span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">// Relative pts: 640</span></div><div>8952, 640, 640 <span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">// Relative pts: 608</span></div><div>9592, 640, 640 <span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">// Relative pts: 640</span></div><div>10232, 640, 640
<span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">// Relative pts: 640</span></div><div>10904, 640, 640
<span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">// Relative pts: 672</span>
</div></div><div><br></div><div><br></div><div>All frames have the same pkt_duration and nb_samples, but the relative pts (the difference from the previous frame's pts) is not constant.</div><div>Most frames have the expected relative pts of 640, but some have 672 (+32) and others have 608 (-32).</div><div><br></div><div>Is this behavior normal, or is it an issue with the IP camera? </div><div><br></div><div><br></div><div>When resampling the audio (swr_convert()), I check for lost audio frames and insert silence samples when I detect one. The code looks like this: </div><div><br></div><div><div>void AudioResampler::putFrame(Frame* frame) {</div><div><span style="white-space:pre"> </span>if (frame->frame->pts > expectedInputPts) {</div><div><span style="white-space:pre"> </span>this->fillSilenceSamples(frame);</div><div><span style="white-space:pre"> </span>}</div></div><div><br></div><div>
<span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;white-space:pre"> </span>// Rest of the function.....<br></div><div><br></div><div>
<span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;white-space:pre"> </span>this->expectedInputPts = frame->frame->pts + frame->frame->pkt_duration;<br></div><div><br></div><div>
<span style="font-size:small;text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255);white-space:pre"> </span>return;<br></div><div>}</div><div><br></div><div>This code works fine when the relative pts between consecutive frames is constant and equal to the <span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">pkt_duration, but that is not the case with the IP camera I am using: every +32 jump is treated as a lost frame.</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Is there a better approach for audio resampling?</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Thanks.</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">The full code:</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span 
style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><div>AudioResampler::AudioResampler(</div><div><span style="white-space:pre"> </span>Decoder* decoder,</div><div><span style="white-space:pre"> </span>EncoderAudio* encoder</div><div>) {</div><div><span style="white-space:pre"> </span>this->decoder_ctx = decoder->codec_ctx;</div><div><span style="white-space:pre"> </span>this->encoder_ctx = encoder->codec_ctx;</div><div><br></div><div><span style="white-space:pre"> </span>this->resample_context = swr_alloc_set_opts(NULL,</div><div><span style="white-space:pre"> </span>this->encoder_ctx->channel_layout,<span style="white-space:pre"> </span>// out_ch_layout</div><div><span style="white-space:pre"> </span>this->encoder_ctx->sample_fmt,<span style="white-space:pre"> </span>// out_sample_fmt</div><div><span style="white-space:pre"> </span>this->encoder_ctx->sample_rate,<span style="white-space:pre"> </span>// out_sample_rate</div><div><span style="white-space:pre"> </span>this->decoder_ctx->channel_layout,<span style="white-space:pre"> </span>// in_ch_layout</div><div><span style="white-space:pre"> </span>this->decoder_ctx->sample_fmt,<span style="white-space:pre"> </span>// in_sample_fmt</div><div><span style="white-space:pre"> </span>this->decoder_ctx->sample_rate,<span style="white-space:pre"> </span>// in_sample_rate</div><div><span style="white-space:pre"> </span>0,<span style="white-space:pre"> </span>// log_offset</div><div><span style="white-space:pre"> </span>NULL);<span style="white-space:pre"> </span>// log_ctx</div><div><span style="white-space:pre"> </span>if (this->resample_context == NULL)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error 
swr_alloc_set_opts().");</div><div><br></div><div><span style="white-space:pre"> </span>int ret;</div><div><span style="white-space:pre"> </span>ret = swr_init(this->resample_context);</div><div><span style="white-space:pre"> </span>if (ret < 0)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error swr_init().");</div><div><br></div><div><span style="white-space:pre"> </span>this->fifo = av_audio_fifo_alloc(</div><div><span style="white-space:pre"> </span>this->encoder_ctx->sample_fmt,</div><div><span style="white-space:pre"> </span>this->encoder_ctx->channels,</div><div><span style="white-space:pre"> </span>1);</div><div><span style="white-space:pre"> </span>if (this->fifo == NULL)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error av_audio_fifo_alloc().");</div><div><br></div><div><span style="white-space:pre"> </span>/* Others.. */</div><div><span style="white-space:pre"> </span>this->pts = 0;</div><div><span style="white-space:pre"> </span>this->expectedInputPts = 0;</div><div><span style="white-space:pre"> </span>this->flushed = false;</div><div>}</div><div><br></div><div>void AudioResampler::putFrame(Frame* frame) {</div><div><span style="white-space:pre"> </span>if (frame->frame->pts > expectedInputPts) {</div><div><span style="white-space:pre"> </span>this->fillSilenceSamples(frame);</div><div><span style="white-space:pre"> </span>}</div><div><br></div><div><span style="white-space:pre"> </span>uint8_t** converted_input_samples = (uint8_t**)calloc(</div><div><span style="white-space:pre"> </span>this->encoder_ctx->channels,</div><div><span style="white-space:pre"> </span>sizeof(*converted_input_samples)</div><div><span style="white-space:pre"> </span>);</div><div><span style="white-space:pre"> </span>if (converted_input_samples == NULL)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error calloc().");</div><div><br></div><div><span style="white-space:pre"> </span>int 
out_samples = av_rescale_rnd(</div><div><span style="white-space:pre"> </span>swr_get_delay(this->resample_context, this->decoder_ctx->sample_rate) + frame->frame->nb_samples,</div><div><span style="white-space:pre"> </span>this->encoder_ctx->sample_rate,</div><div><span style="white-space:pre"> </span>this->decoder_ctx->sample_rate,</div><div><span style="white-space:pre"> </span>AV_ROUND_UP);</div><div><br></div><div><span style="white-space:pre"> </span>int ret;</div><div><span style="white-space:pre"> </span>ret = av_samples_alloc(</div><div><span style="white-space:pre"> </span>converted_input_samples,</div><div><span style="white-space:pre"> </span>NULL,</div><div><span style="white-space:pre"> </span>this->encoder_ctx->channels,</div><div><span style="white-space:pre"> </span>out_samples,</div><div><span style="white-space:pre"> </span>this->encoder_ctx->sample_fmt,</div><div><span style="white-space:pre"> </span>0</div><div><span style="white-space:pre"> </span>);</div><div><span style="white-space:pre"> </span>if (ret < 0)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error av_samples_alloc().");</div><div><br></div><div><span style="white-space:pre"> </span>ret = swr_convert(</div><div><span style="white-space:pre"> </span>this->resample_context,</div><div><span style="white-space:pre"> </span>converted_input_samples,</div><div><span style="white-space:pre"> </span>out_samples,</div><div><span style="white-space:pre"> </span>(const uint8_t**)frame->frame->extended_data,</div><div><span style="white-space:pre"> </span>frame->frame->nb_samples</div><div><span style="white-space:pre"> </span>);</div><div><span style="white-space:pre"> </span>if (ret < 0)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error swr_convert().");</div><div><span style="white-space:pre"> </span>out_samples = ret;</div><div><br></div><div><span style="white-space:pre"> </span>ret = av_audio_fifo_realloc(</div><div><span 
style="white-space:pre"> </span>this->fifo,</div><div><span style="white-space:pre"> </span>av_audio_fifo_size(this->fifo) + out_samples</div><div><span style="white-space:pre"> </span>);</div><div><span style="white-space:pre"> </span>if (ret < 0)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error av_audio_fifo_realloc().");</div><div><br></div><div><span style="white-space:pre"> </span>ret = av_audio_fifo_write(</div><div><span style="white-space:pre"> </span>this->fifo,</div><div><span style="white-space:pre"> </span>(void**)converted_input_samples,</div><div><span style="white-space:pre"> </span>out_samples</div><div><span style="white-space:pre"> </span>);</div><div><span style="white-space:pre"> </span>if (ret < 0)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error av_audio_fifo_write().");</div><div><br></div><div><span style="white-space:pre"> </span>av_freep(&converted_input_samples[0]);</div><div><span style="white-space:pre"> </span>free(converted_input_samples);</div><div><br></div><div><span style="white-space:pre"> </span>this->expectedInputPts = frame->frame->pts + frame->frame->pkt_duration;</div><div>}</div><div><br></div><div><br></div><div>void AudioResampler::fillSilenceSamples(Frame* frame) {</div><div><span style="white-space:pre"> </span>uint64_t missingTime = frame->frame->pts - expectedInputPts;</div><div><br></div><div><span style="white-space:pre"> </span>uint64_t missingSamples = av_rescale(</div><div><span style="white-space:pre"> </span>missingTime,</div><div><span style="white-space:pre"> </span>frame->frame->sample_rate,</div><div><span style="white-space:pre"> </span>this->decoder_ctx->time_base.den</div><div><span style="white-space:pre"> </span>);</div><div><br></div><div><span style="white-space:pre"> </span>int out_missingSamples = av_rescale_rnd(</div><div><span style="white-space:pre"> </span>swr_get_delay(this->resample_context, this->decoder_ctx->sample_rate) + 
missingSamples,</div><div><span style="white-space:pre"> </span>this->encoder_ctx->sample_rate,</div><div><span style="white-space:pre"> </span>this->decoder_ctx->sample_rate,</div><div><span style="white-space:pre"> </span>AV_ROUND_NEAR_INF);</div><div><br></div><div><span style="white-space:pre"> </span>uint8_t** silence_samples = (uint8_t**)calloc(</div><div><span style="white-space:pre"> </span>this->encoder_ctx->channels,</div><div><span style="white-space:pre"> </span>sizeof(*silence_samples)</div><div><span style="white-space:pre"> </span>);</div><div><span style="white-space:pre"> </span>if (silence_samples == NULL)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error calloc().");</div><div><br></div><div><span style="white-space:pre"> </span>int ret;</div><div><span style="white-space:pre"> </span>ret = av_samples_alloc(</div><div><span style="white-space:pre"> </span>silence_samples,</div><div><span style="white-space:pre"> </span>NULL,</div><div><span style="white-space:pre"> </span>this->encoder_ctx->channels,</div><div><span style="white-space:pre"> </span>out_missingSamples,</div><div><span style="white-space:pre"> </span>this->encoder_ctx->sample_fmt,</div><div><span style="white-space:pre"> </span>0</div><div><span style="white-space:pre"> </span>);</div><div><span style="white-space:pre"> </span>if (ret < 0)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error av_samples_alloc().");</div><div><br></div><div><span style="white-space:pre"> </span>ret = av_audio_fifo_realloc(</div><div><span style="white-space:pre"> </span>this->fifo,</div><div><span style="white-space:pre"> </span>av_audio_fifo_size(this->fifo) + out_missingSamples</div><div><span style="white-space:pre"> </span>);</div><div><span style="white-space:pre"> </span>if (ret < 0)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error av_audio_fifo_realloc().");</div><div><br></div><div><span 
style="white-space:pre"> </span>ret = av_audio_fifo_write(</div><div><span style="white-space:pre"> </span>this->fifo,</div><div><span style="white-space:pre"> </span>(void**)silence_samples,</div><div><span style="white-space:pre"> </span>out_missingSamples</div><div><span style="white-space:pre"> </span>);</div><div><span style="white-space:pre"> </span>if (ret < 0)</div><div><span style="white-space:pre"> </span>throw new std::exception("Error av_audio_fifo_write().");</div><div><br></div><div><span style="white-space:pre"> </span>av_freep(&silence_samples[0]);</div><div><span style="white-space:pre"> </span>free(silence_samples);</div><div>}</div></span></div></font></div></div>
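<div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">For reference, one variation of the lost-frame check I am considering is to treat a pts jump as a lost frame only when it exceeds some tolerance, so the +32/-32 jitter is ignored. This is only a minimal sketch: AudioGapDetector, missingTicks and the tolerance of half a frame (320 ticks) are hypothetical names and values of mine, not part of FFmpeg. It also assumes that pts and pkt_duration are in the same time base, and it deliberately skips the check for the very first frame.</font></div>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an FFmpeg API): decides whether a jump in audio
// pts is a real gap or just timestamp jitter. All values are in ticks of the
// input stream's time base (assumed to be the same for pts and duration).
class AudioGapDetector {
public:
    explicit AudioGapDetector(int64_t toleranceTicks)
        : tolerance_(toleranceTicks) {
        assert(toleranceTicks >= 0);
    }

    // Returns the number of ticks to fill with silence before this frame,
    // or 0 if the frame is contiguous within the tolerance.
    int64_t missingTicks(int64_t pts, int64_t duration) {
        int64_t missing = 0;
        if (hasExpected_) {
            const int64_t gap = pts - expected_;
            if (gap > tolerance_) {
                missing = gap;  // large jump: treat as lost audio
            }
        }
        // Resynchronise on the frame's own pts so jitter does not accumulate.
        expected_ = pts + duration;
        hasExpected_ = true;
        return missing;
    }

private:
    int64_t tolerance_;
    int64_t expected_ = 0;
    bool hasExpected_ = false;
};
```

<div><font face="monospace, monospace">In putFrame() the idea would be to call missingTicks(frame->frame->pts, frame->frame->pkt_duration) and call fillSilenceSamples() only when the result is non-zero. With a tolerance of 320, the 672 step in the log above would not trigger any silence, while a step of 1280 (one whole frame missing) would report 640 missing ticks.</font></div>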