[Libav-user] Calculate spectrogram from the audio channel

Sat May 10 03:03:18 CEST 2014

On May 3, 2014, at 7:24 AM, wm4 <nfxjfg at googlemail.com> wrote:

> On Fri, 2 May 2014 17:48:37 -0700
> Ricky Huang <rhuang.work at gmail.com> wrote:
> 
>> Hello all,
>> 
>> I am trying to reproduce the Shazam algorithm as outlined in Avery Wang's paper "An Industrial-Strength Audio Search Algorithm" (http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf).  One of the steps is to convert the audio to a spectrogram and identify the spectrogram peaks.  I am wondering whether building a custom audio filter for ffmpeg would be the correct way to go.  If so, does anyone have any pointers on converting the audio data to a spectrogram?  (Algorithm to use, things to note, etc.)
>> 
>> 
>> Any help would be appreciated.  Thanks.
> 
> No idea about the algorithm, but if you want to see an example of how
> to integrate such a filter into libavfilter, have a look at
> libavfilter/avf_showspectrum.c. This filter visualizes the computed data.

Thanks for the pointer to the avf_showspectrum.c file.

I am wondering if anyone here knows what "spectrogram peaks" means, in terms of the FFT output, in the Shazam paper.  Are they the highest-amplitude (intensity) points in each of the frequencies at each point in time?

Thanks in advance.
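
For reference, here is a minimal sketch of one possible reading of "spectrogram peaks": a time-frequency point counts as a peak if its magnitude is the largest within a small neighbourhood of surrounding frames and frequency bins, rather than simply being the loudest bin of each frame. This is an illustration under assumptions, not the paper's actual implementation and not ffmpeg code. The names stft_magnitudes and find_peaks, and the parameters frame_size, hop, radius and min_mag, are all made up for this example, and a naive DFT stands in for a real FFT to keep it dependency-free.

```python
import cmath
import math


def stft_magnitudes(samples, frame_size=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_size//2) per frame."""
    # Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for clarity; a real implementation would use an FFT.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def find_peaks(spec, radius=2, min_mag=1.0):
    """Return (frame_index, bin_index) pairs that are local maxima."""
    peaks = []
    for t, row in enumerate(spec):
        for k, value in enumerate(row):
            if value < min_mag:
                continue  # assumed noise floor, chosen arbitrarily here
            t_lo, t_hi = max(0, t - radius), min(len(spec), t + radius + 1)
            k_lo, k_hi = max(0, k - radius), min(len(row), k + radius + 1)
            neighbourhood = [spec[i][j]
                             for i in range(t_lo, t_hi)
                             for j in range(k_lo, k_hi)]
            # A peak is at least as large as everything around it in both
            # time and frequency (ties are all reported).
            if value >= max(neighbourhood):
                peaks.append((t, k))
    return peaks
```

Feeding this a pure tone that falls exactly on bin 8 (for example sin(2*pi*8*n/64) with the default frame_size of 64) should report peaks on bin 8 only. The neighbourhood size and the threshold control how dense the resulting peak map is, and would need tuning for real audio.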

> If you actually want to export the filtered data instead of visualizing
> it audio-player style, you could do something like vf_cropdetect.c, and
> attach the filtered data to output AVFrames.
> 
> (If you just want to convert the data, my reply is probably not helpful
> at all.)
