[Ffmpeg-devel] [PATCH] Which Vorbis encoder by default

Michael Niedermayer michaelni
Tue Oct 3 22:12:41 CEST 2006


On Tue, Oct 03, 2006 at 09:08:14PM +0200, Oded Shimon wrote:
> On Tue, Oct 03, 2006 at 08:53:18PM +0200, Michael Niedermayer wrote:
> > On Tue, Oct 03, 2006 at 08:20:00PM +0200, Oded Shimon wrote:
> > > On Tue, Oct 03, 2006 at 03:30:18PM +0200, Diego Biurrun wrote:
> > > > That's perfectly OK, but it is independent of adding "native" to the name
> > > > of our encoder. :)
> > > 
> > > So what do you suggest for the case of ffmpeg compiled with libvorbis, yet
> > > still wanting to use my encoder?
> > > The idea is that what the codec named "vorbis" points to depends on whether
> > > ffmpeg was compiled with libvorbis or not, while some other name will always
> > > point to my encoder.
> > 
> > I'd suggest that we drop libvorbis support and that you hurry a little with
> > improving the quality of your encoder.
> I'm at a dead end :(
> > See Google/Wikipedia/CiteSeer/the source code of other encoders for
> > psychoacoustics, for example,
> > and support for both block sizes (short and long) is also a must.
> I've read lots of Wikipedia material on psy, and as much as I can of the
> libvorbis source, and reached the conclusion that this stuff is way out of
> my league. :(
> I understand some of the basic ideas there, but my knowledge of even basic
> things such as the FFT is so weak (I only know how to implement it, how to use
> it and what it's useful for, but I don't really understand it) that I can't
> implement the ideas given there in any usable way...

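You don't need to understand anything about FFTs; an FFT is just an optimized
Fourier transform. It computes exactly the same result as the direct discrete
Fourier transform (DFT), just with O(N log N) instead of O(N^2) operations.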
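To make "an FFT is just an optimized Fourier transform" concrete, here is a minimal Python sketch (not ffmpeg code; the function names `dft` and `fft` are illustrative). The direct DFT is written straight from its definition, and a textbook recursive radix-2 FFT, assuming a power-of-two length, produces the same numbers faster.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_len) for n in range(n_len))
            for k in range(n_len)]

def fft(x):
    """Recursive radix-2 FFT (len(x) must be a power of two): same result as dft()."""
    n_len = len(x)
    if n_len == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # transform of the even-indexed samples
    odd = fft(x[1::2])    # transform of the odd-indexed samples
    out = [0j] * n_len
    for k in range(n_len // 2):
        t = cmath.exp(-2j * cmath.pi * k / n_len) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n_len // 2] = even[k] - t
    return out

x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
print(max(abs(a - b) for a, b in zip(dft(x), fft(x))))  # only floating-point rounding
```

The only difference between the two is the amount of work; the recursion reuses the half-size transforms instead of recomputing every product.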

Let's limit ourselves to the FT with real-valued input (complex input is
meaningless for us).
Furthermore, let's ignore the output coefficients from the middle to the last;
they are just the complex conjugates of the first half (X[N-k] = conj(X[k]))
and are consequently useless, and they won't be calculated anyway by any sane
real-valued FFT implementation.
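A quick numerical check of that symmetry, as a minimal Python sketch (the helper names `dft` and `real_dft_half` are illustrative, not from any library): for real input, every coefficient above the middle is the conjugate of one below it, so only bins 0..N/2 need to be kept.

```python
import cmath

def dft(x):
    # Direct DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)
    n_len = len(x)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * n / n_len) for n, v in enumerate(x))
            for k in range(n_len)]

def real_dft_half(x):
    # For real input only bins 0..N/2 carry information; the rest mirror them.
    return dft(x)[: len(x) // 2 + 1]

x = [0.3, -1.2, 2.0, 0.7, -0.4, 1.1]
X = dft(x)
N = len(x)
# X[N-k] equals the complex conjugate of X[k] for every k = 1..N-1
print(all(abs(X[N - k] - X[k].conjugate()) < 1e-9 for k in range(1, N)))
```

This is why real-input FFT routines return only about N/2 + 1 complex values.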

Now, if the complex output scares you, just think of it as 2x as many real
values; if you do that, you can even think of it as
input vector * constant matrix if you like...
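The "input vector * constant matrix" view can be sketched in plain Python as follows. This is an illustration under the usual exp(-i...) sign convention; `real_dft_matrix` and `mat_vec` are hypothetical helper names. Each kept bin gets two real rows: a cosine row (real part) and a negated sine row (imaginary part).

```python
import math

def real_dft_matrix(n_len):
    # Two real rows per bin k = 0..N/2: cosine row (real part) and
    # negated sine row (imaginary part, with the exp(-i...) convention).
    rows = []
    for k in range(n_len // 2 + 1):
        rows.append([math.cos(2 * math.pi * k * n / n_len) for n in range(n_len)])
        rows.append([-math.sin(2 * math.pi * k * n / n_len) for n in range(n_len)])
    return rows

def mat_vec(m, v):
    # Plain matrix * vector product
    return [sum(a * b for a, b in zip(row, v)) for row in m]

x = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]  # cosine with 2 periods in 8 samples
out = mat_vec(real_dft_matrix(len(x)), x)
# out holds (re, im) pairs for bins 0..4; only the bin-2 real entry is
# (about) N/2 = 4, everything else is (about) 0
print([round(v, 6) for v in out])
```

An FFT computes the same product without ever building the matrix.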

What does our FT do now? Well, it's dead simple.
Feed it a constant signal and you get an output that is all 0 except
the first coefficient, which is the constant you fed in.
Now try constant*cosine wave with a frequency that fits an integer number of
periods into the block; the FT of that is just a vector of all 0 except one,
which is again the constant.
Now try constant*sine wave with such a frequency; the FT of that is just a
vector of all 0 except one, which is +-i*constant, the sign depending on the
transform's sign convention (or, if you dislike complex values and used a 2x
as large real array, it's just the constant; it's all the same, just a
different way to look at it).
All of this holds up to a scale factor that depends on how the transform is
normalized.
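The three test signals above can be checked with a minimal Python sketch. The normalization is an assumption chosen so that the numbers come out as "the constant": 1/N for the DC bin and 2/N for the other bins below N/2. With the exp(-i...) sign convention used here the sine lands on its bin as -i*A. `dft_bin` and `amplitude_bin` are illustrative names.

```python
import cmath
import math

def dft_bin(x, k):
    # One DFT coefficient: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)
    n_len = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_len) for n, v in enumerate(x))

def amplitude_bin(x, k):
    # Assumed normalization: 1/N for DC, 2/N for the other bins below N/2,
    # so a tone of amplitude A shows up with magnitude A.
    n_len = len(x)
    scale = 1.0 / n_len if k == 0 else 2.0 / n_len
    return scale * dft_bin(x, k)

N, A, k0 = 16, 0.75, 3   # k0 = number of whole periods inside the block
const = [A] * N
cosine = [A * math.cos(2 * math.pi * k0 * n / N) for n in range(N)]
sine = [A * math.sin(2 * math.pi * k0 * n / N) for n in range(N)]

for name, sig in (("const", const), ("cos", cosine), ("sin", sine)):
    bins = [amplitude_bin(sig, k) for k in range(N // 2)]
    # keep only the bins that are not numerically zero
    print(name, [(k, complex(round(b.real, 6), round(b.imag, 6)))
                 for k, b in enumerate(bins) if abs(b) > 1e-9])
```

Each signal leaves a single nonzero bin: 0.75 at bin 0 for the constant, 0.75 at bin 3 for the cosine, and about -0.75i at bin 3 for the sine.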

Now let's try a sinusoid that is shifted: A*cos(freq*(x+shift)).
You get all coeffs 0 except one, which is A*e^(i*freq*shift) (again up to
normalization), or, if you prefer 2 real values,
A*cos(freq*shift) and A*sin(freq*shift).
So if you used the 2 as x and y coordinates on a plane, then the distance
from the center would be the amplitude and the direction would be the phase;
for a pure sine or cosine it lies exactly on one of the coordinate axes...
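The amplitude/phase picture can be verified with a short Python sketch, assuming the same 2/N normalization as for the plain cosine. Here `phi` plays the role of freq*shift; `dft_bin` is an illustrative helper.

```python
import cmath
import math

def dft_bin(x, k):
    # One DFT coefficient with the exp(-i...) sign convention
    n_len = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_len) for n, v in enumerate(x))

N, A, k0, phi = 32, 1.5, 5, 0.9   # phi = phase offset (freq*shift) in radians
x = [A * math.cos(2 * math.pi * k0 * n / N + phi) for n in range(N)]
c = dft_bin(x, k0) * 2 / N        # assumed 2/N normalization
print(abs(c), cmath.phase(c))     # distance from origin ~ A, direction ~ phi
print(c.real, A * math.cos(phi))  # x coordinate: A*cos(phi)
print(c.imag, A * math.sin(phi))  # y coordinate: A*sin(phi)
```

`abs()` and `cmath.phase()` are exactly the "distance from the center" and "direction" of the point (re, im).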

So the Fourier transform just splits a signal into sine and cosine waves; the
outputs of the Fourier transform are the amplitudes of these waves. You could
think of an FT simply as a correlation of cosine waves with the input for the
even coefficients and a correlation of sine waves with the input for the odd
coefficients (in the 2x real array layout).

The inverse transform then just adds all these sines and cosines back up.
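Here is the correlate-then-add-back-up view as a minimal Python sketch (`analyze` and `synthesize` are illustrative names, not a real API). The forward step correlates the input with a cosine and a sine per bin; the inverse sums those waves weighted by the correlations. The 1/N and 2/N weights account for bins 1..N/2-1 also standing in for their mirrored conjugates.

```python
import math

def analyze(x):
    # Correlate the input with a cosine and a sine for each bin k = 0..N/2.
    n_len = len(x)
    coeffs = []
    for k in range(n_len // 2 + 1):
        c = sum(v * math.cos(2 * math.pi * k * n / n_len) for n, v in enumerate(x))
        s = sum(v * math.sin(2 * math.pi * k * n / n_len) for n, v in enumerate(x))
        coeffs.append((c, s))
    return coeffs

def synthesize(coeffs, n_len):
    # Inverse: add the cosines and sines back up, weighted by the correlations.
    out = []
    for n in range(n_len):
        total = 0.0
        for k, (c, s) in enumerate(coeffs):
            # DC (and Nyquist for even N) occur once in the full spectrum,
            # every other bin also represents its mirrored conjugate.
            w = 1.0 / n_len if k == 0 or 2 * k == n_len else 2.0 / n_len
            total += w * (c * math.cos(2 * math.pi * k * n / n_len)
                          + s * math.sin(2 * math.pi * k * n / n_len))
        out.append(total)
    return out

x = [0.2, -1.0, 3.0, 0.5, -0.7, 1.25, 0.0, 2.0]
y = synthesize(analyze(x), len(x))
print(max(abs(a - b) for a, b in zip(x, y)))  # only floating-point rounding
```

The round trip returns the original samples for both even and odd block lengths.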

I hope I didn't make too many lame errors in the above...

> Any volunteers to help me? :)
> > IMHO the encoder should have some sort of distortion measure
> > function (in reality that will be just a weighted sum of squares,
> > where the weights for different coefficients come from the psychoacoustic
> > model)
> > 
> > and then just minimize rate (= number of bits) + distortion * quality (= constant).
> I think this method doesn't apply as well to audio as it does to video...
> You can't play guessing games with the quantization because you can change
> it anywhere to anything, as opposed to it being a global constant (for a
> single frame)...

It can be changed for every macroblock in video, and every DCT coefficient can
be quantized differently by using custom quant matrices, so no, I don't agree
at all.

Also, it's not just quantization factors but also block sizes and the actual
values: minimizing the distortion (choosing the scalar or vector which matches
best) without considering how many bits that will need just isn't optimal.
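A toy Python sketch of the "weighted sum of squares" distortion and of minimizing rate + lambda * distortion when choosing quantized values. Everything here is an assumption for illustration: `bits_for` is a made-up code-length model (not Vorbis's codebooks), the weights merely stand in for a psychoacoustic model, and all function names are hypothetical.

```python
import math

def bits_for(q):
    # Hypothetical code length: 1 bit for zero, otherwise a magnitude code that
    # grows with |q| plus a sign bit (not Vorbis's real codebooks).
    return 1 if q == 0 else 2 * abs(q).bit_length() + 2

def weighted_distortion(orig, recon, weights):
    # Weighted sum of squared errors; weights stand in for a psychoacoustic model.
    return sum(w * (a - b) ** 2 for a, b, w in zip(orig, recon, weights))

def quantize_nearest(coeffs, step):
    # Plain quantization: closest level, ignoring its cost in bits.
    return [round(c / step) for c in coeffs]

def quantize_rd(coeffs, weights, step, lam):
    # Rate-distortion quantization: per coefficient, try both neighbouring
    # levels and zero, keep the one minimizing bits + lam * weighted error.
    out = []
    for c, w in zip(coeffs, weights):
        candidates = sorted({math.floor(c / step), math.ceil(c / step), 0}, key=abs)
        out.append(min(candidates,
                       key=lambda q: bits_for(q) + lam * w * (c - q * step) ** 2))
    return out

def total_cost(coeffs, weights, step, q, lam):
    # J = rate + lam * distortion for a whole block of quantized values.
    recon = [v * step for v in q]
    return sum(bits_for(v) for v in q) + lam * weighted_distortion(coeffs, recon, weights)

coeffs = [10.0, -4.0, 2.6, 0.6, 0.1]
weights = [1.0, 1.0, 0.5, 0.2, 0.2]   # made-up weights: later bins matter less
step, lam = 1.0, 2.0
q_near = quantize_nearest(coeffs, step)
q_rd = quantize_rd(coeffs, weights, step, lam)
print(q_near, total_cost(coeffs, weights, step, q_near, lam))
print(q_rd, total_cost(coeffs, weights, step, q_rd, lam))
```

With these numbers the RD choice rounds the weakly weighted 0.6 down to 0, paying a little distortion to save bits, and ends up with a lower total cost than nearest-level quantization. For vector quantization the same `min()` would run over codebook entries, using each entry's codeword length as its rate.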

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for merely telling someone where the library is
