[FFmpeg-devel] [RFC] AAC Encoder, now more optimal

Sat Sep 6 18:53:59 CEST 2008

On Sat, Sep 06, 2008 at 04:27:21PM +0300, Kostya wrote:
> On Sat, Sep 06, 2008 at 02:39:55PM +0200, Michael Niedermayer wrote:
> > On Sat, Sep 06, 2008 at 02:12:22PM +0200, Guillaume POIRIER wrote:
> > > Hello,
> > > 
> > > On Sat, Sep 6, 2008 at 1:43 PM, Guillaume POIRIER <poirierg at gmail.com> wrote:
> > > > Hello,
> > > >
> > > > On Sat, Sep 6, 2008 at 1:21 PM, Robert Swain <robert.swain at gmail.com> wrote:
> > > >> 2008/9/6 Guillaume POIRIER <poirierg at gmail.com>:
> > > >>> Hello,
> > > >>>
> > > >>> On Fri, Sep 5, 2008 at 3:13 PM, Kostya <kostya.shishkov at gmail.com> wrote:
> > > >>>> After some time (I'd like to have more free time to spend on it though),
> > > >>>> I want to expose my new AAC encoder.
> > > >>>
> > > >>> Is it possible to test your encoder with the files you sent?
> > > >>> I tried ~/Prgm/ffmpeg/ffmpeg -i flog.flac -acodec aac ~/Music/flog.m4a
> > > >>> but I'm getting:
> > > >>> FFmpeg version SVN-r15217, Copyright (c) 2000-2008 Fabrice Bellard, et al.
> > > >>>  configuration: --enable-gpl
> > > >>>  libavutil     49.10. 0 / 49.10. 0
> > > >>>  libavcodec    51.71. 0 / 51.71. 0
> > > >>>  libavformat   52.22. 0 / 52.22. 0
> > > >>>  libavdevice   52. 1. 0 / 52. 1. 0
> > > >>>  built on Sep  6 2008 13:04:25, gcc: 4.0.1 (Apple Inc. build 5465)
> > > >>> Input #0, flac, from 'met080814d1_01_Creeping_Death.flac':
> > > >>>  Duration: N/A, bitrate: N/A
> > > >>>    Stream #0.0: Audio: flac, 44100 Hz, stereo, s16
> > > >>> Unknown encoder 'aac'
> > > >>>
> > > >>> I copied all 4 files to ffmpeg/libavcodec/
> > > >>
> > > >> You would need to patch the build system too. As I recall I added a
> > > >> checkout script to the aacenc SoC dir a while ago. I'm not sure if it
> > > >> will still work though.
> > > >
> > > > Yep, found it. Thanks.
> > > > BTW, in order to compile, the AAC encoder also needs aac.h file from SOC.
> > > 
> > > Ok, I tested your encoder with some Rock music. I was a bit
> > > disappointed by the quality of the encoded audio file: it isn't good. I
> > > guess it's OK though since the deal is really more to have an LGPL
> > > encoder.
> > 
> > I'd just like to say that, to be accepted into SVN, the encoder MUST be better in
> > quality per bitrate than at least one other common encoder; just
> > implementing some half-buggy rate-distortion encoder is not good enough.
> 
> Sorry, I forgot to explicitly mention this is work in progress.
> Encoder is still in transitional state, so some features are missing
> compared to the previous version, especially M/S detection and rate control,
> so it is not possible to test it against anything else right now.
>  
> > The absolutely most important thing is that the encoder is regularly
> > tested to ensure that there are no quality regressions due to bugs or
> > misunderstandings of some paper/algo/...
> > In that light, can you confirm that the current code is at least better than
> > the last iteration of the encoder?
> 
> The issues mentioned above prevent testing. But to my ear it was better,
> especially on transitions.
> There are 2-3 things I have to deal with before it is suitable for SVN:
> * M/S detection - but how to incorporate it? Should it be performed during
> the quantizer search or after it, and how?
> * Speed optimization
> * Other tricks (pulse tool, TNS) - less important though

IMO inclusion in SVN requires producing equal or better quality per bitrate
than the encoder from that paper, and better than at least one common encoder
like faac. (Matching the paper's encoder should be trivial by just implementing
what the paper describes; any deviation from it has to be better, not worse,
quality-wise.)
The paper contains some graphs that compare it against the reference encoder,
and it should be possible to generate similar graphs for your encoder.
This is a good check to ensure that things are correctly implemented.

I also think we should apply much stricter tests in the future for SoC project
decoders, that is, the PSNR/RMS difference from the binary decoder, ideally
bit-identical output, to ensure that no bugs that are very hard to debug later
sneak in.
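A minimal sketch of such a PSNR/RMS comparison between two decoders' PCM
output. This is only an illustration: the function names, the 16-bit peak
value and the pass threshold are assumptions, not an existing FFmpeg test tool.

```python
import math

def rms_difference(ref, test):
    """Root-mean-square difference between two equal-length PCM sample lists."""
    if len(ref) != len(test):
        raise ValueError("sample counts differ")
    if not ref:
        return 0.0
    squared = sum((a - b) ** 2 for a, b in zip(ref, test))
    return math.sqrt(squared / len(ref))

def psnr_db(ref, test, peak=32767):
    """PSNR in dB relative to 16-bit full scale (assumed sample format).

    Returns infinity when the two outputs are bit-identical.
    """
    rms = rms_difference(ref, test)
    if rms == 0.0:
        return math.inf
    return 20.0 * math.log10(peak / rms)

def decoder_matches(ref, test, max_rms=1.0):
    """Hypothetical pass criterion: RMS difference at most max_rms."""
    return rms_difference(ref, test) <= max_rms
```

With bit-identical output the RMS difference is 0 and the PSNR is infinite, so
the strictest form of the check is simply requiring an RMS difference of 0.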

> 
> And about the quantizer search method (I document it here in the hope that it
> will be easier to understand and discuss):
> 
> * the code iterates over band groups (bands with the same number in different
> windows of a window group) for all window groups, since they are quantized
> with the same quantizer
> * for each band group all quantizers are tried (actually I determine the
> range of quantizers for which quantizing makes sense - i.e. outside it the
> distortion and the number of bits needed to code are the same as on the
> boundary - and search only in that range) to find out distortion and number of bits

> ** quantizing and bits estimation are not optimal, since optimal ones would
> slow down encoding even more

> ** distortion = sum of squared quantizing errors

Yes, but as the quantization is approximate, so is that.

> * then the cost function is calculated:
>   C_{q1,q2} = SUM_{w} (quanterror_w / threshold_w * lambda + bits_w) + TC(q1,q2)
> where quanterror - sum of squared quantisation errors for band in window w,
>       threshold  - band threshold (provided by psychoacoustic model)
>       lambda     - rate control parameter
>       bits       - number of bits needed to encode that quantized band
>       TC(a,b)    - number of bits needed to encode scalefactor difference (q2-q1)
> 
> and path is calculated where the total cost is minimal.
> 
> I use several tricks to reduce computations for zero bands and to ensure
> final quantizers will not differ by more than 60.
> 
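The cost function and minimal-cost path quoted above can be sketched as a small
trellis (Viterbi-style) search. This is an illustration only, not code from the
patch: the function names are hypothetical, sf_diff_bits() is a made-up
stand-in for TC(q1,q2) (a real encoder would use the scalefactor Huffman code
lengths), and the limit of 60 is applied here between consecutive band groups,
which is an assumption about how the constraint is meant.

```python
import math

MAX_SF_DIFF = 60  # limit on the quantizer difference (assumed per adjacent pair)

def sf_diff_bits(diff):
    """Placeholder for TC(q1,q2): made-up bit cost of coding q2 - q1."""
    return 1 + 2 * abs(diff)

def band_cost(quant_errors, thresholds, bits, lam):
    """SUM_w (quanterror_w / threshold_w * lambda + bits_w) for one band group."""
    return sum(e / t * lam + b for e, t, b in zip(quant_errors, thresholds, bits))

def search_quantizers(costs, tc=sf_diff_bits, max_diff=MAX_SF_DIFF):
    """Find the quantizer path with minimal total cost.

    costs[g] maps each candidate quantizer of band group g to its band cost.
    Returns (total_cost, [quantizer for each band group]).
    """
    if not costs:
        return 0.0, []
    # Each trellis node stores (best cost so far, previous quantizer).
    trellis = [{q: (c, None) for q, c in costs[0].items()}]
    for g in range(1, len(costs)):
        cur = {}
        for q2, c2 in costs[g].items():
            best_total, best_prev = math.inf, None
            for q1, (c1, _) in trellis[-1].items():
                if abs(q2 - q1) > max_diff:
                    continue  # transition not allowed
                total = c1 + tc(q2 - q1) + c2
                if total < best_total:
                    best_total, best_prev = total, q1
            if best_prev is not None:
                cur[q2] = (best_total, best_prev)
        if not cur:
            raise ValueError("no allowed quantizer transition for band group %d" % g)
        trellis.append(cur)
    # Backtrack from the cheapest final node.
    q = min(trellis[-1], key=lambda k: trellis[-1][k][0])
    total = trellis[-1][q][0]
    path = [q]
    for g in range(len(costs) - 1, 0, -1):
        q = trellis[g][q][1]
        path.append(q)
    path.reverse()
    return total, path
```

For example, search_quantizers([{100: 5.0, 110: 1.0}, {100: 2.0, 110: 4.0}])
keeps quantizer 110 for both band groups, because with this placeholder TC the
cost of switching quantizers outweighs the cheaper second band.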
> The most problematic steps are quantization and (less so) inverse quantization.
> By replacing the inverse quantization process (x*cbrt(x)*IQ) with a table lookup
> (with size 8192*256, so not for the final encoder), I've managed to reduce
> coding time from 72 seconds to a mere 59 seconds. Unfortunately, it's not easy
> to speed up quantization.
> 
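To illustrate the table-lookup idea for inverse quantization: the sketch below
does not build the full 8192*256 table mentioned above but uses two small
separable tables instead (one for x*cbrt(x), one for the scalefactor gain),
which is a different layout from the one described. The names are hypothetical,
and the gain formula 2^((sf - 100)/4) with an offset of 100 is an assumption
about what IQ stands for.

```python
MAX_QUANT = 8192   # number of quantized magnitudes covered by the table
NUM_SF = 256       # number of scalefactor values covered by the table
SF_OFFSET = 100    # assumed scalefactor offset in the gain formula

# POW43[i] = i^(4/3), i.e. x * cbrt(x) for every quantized magnitude.
POW43 = [i ** (4.0 / 3.0) for i in range(MAX_QUANT)]
# GAIN[sf] = 2^((sf - SF_OFFSET) / 4), the assumed "IQ" factor.
GAIN = [2.0 ** ((sf - SF_OFFSET) / 4.0) for sf in range(NUM_SF)]

def dequantize(q, sf):
    """Inverse quantization using only table lookups and one multiply."""
    mag = POW43[abs(q)] * GAIN[sf]
    return -mag if q < 0 else mag

def dequantize_direct(q, sf):
    """Reference version computing x * cbrt(x) * IQ without tables."""
    a = float(abs(q))
    mag = a * a ** (1.0 / 3.0) * 2.0 ** ((sf - SF_OFFSET) / 4.0)
    return -mag if q < 0 else mag
```

The two tables together hold 8192 + 256 entries instead of 8192*256, at the
price of one extra multiplication per coefficient.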
> But there's an idea: represent coefficients in the 'AAC domain', i.e. raise
> them to the power 3/4 and represent the result as A * 2^(B/4) with integers, so
> it will be easier to quantize. Do you think it's worth trying?

You have a table of vector quantizers; quantization is finding the one
with the lowest RD cost. As the table contains the unquantized vectors
as well, I have difficulty mapping your problems onto it.

And I honestly have no interest in optimizing an approximation for which
we know neither how much speed it gains nor how much quality it loses.
Or has the design you use here been compared in some paper against the
optimal one?

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many that live deserve death. And some that die deserve life. Can you give
it to them? Then do not be too eager to deal out death in judgement. For
even the very wise cannot see all ends. -- Gandalf