[FFmpeg-devel] [RFC] AAC Encoder, now more optimal

Sat Sep 6 15:27:21 CEST 2008

On Sat, Sep 06, 2008 at 02:39:55PM +0200, Michael Niedermayer wrote:
> On Sat, Sep 06, 2008 at 02:12:22PM +0200, Guillaume POIRIER wrote:
> > Hello,
> > 
> > On Sat, Sep 6, 2008 at 1:43 PM, Guillaume POIRIER <poirierg at gmail.com> wrote:
> > > Hello,
> > >
> > > On Sat, Sep 6, 2008 at 1:21 PM, Robert Swain <robert.swain at gmail.com> wrote:
> > >> 2008/9/6 Guillaume POIRIER <poirierg at gmail.com>:
> > >>> Hello,
> > >>>
> > >>> On Fri, Sep 5, 2008 at 3:13 PM, Kostya <kostya.shishkov at gmail.com> wrote:
> > >>>> After some time (I'd like to have more free time to spend on it though),
> > >>>> I want to expose my new AAC encoder.
> > >>>
> > >>> Is it possible to test your encoder with the files you sent?
> > >>> I tried ~/Prgm/ffmpeg/ffmpeg -i flog.flac -acodec aac ~/Music/flog.m4a
> > >>> but I'm getting:
> > >>> FFmpeg version SVN-r15217, Copyright (c) 2000-2008 Fabrice Bellard, et al.
> > >>>  configuration: --enable-gpl
> > >>>  libavutil     49.10. 0 / 49.10. 0
> > >>>  libavcodec    51.71. 0 / 51.71. 0
> > >>>  libavformat   52.22. 0 / 52.22. 0
> > >>>  libavdevice   52. 1. 0 / 52. 1. 0
> > >>>  built on Sep  6 2008 13:04:25, gcc: 4.0.1 (Apple Inc. build 5465)
> > >>> Input #0, flac, from 'met080814d1_01_Creeping_Death.flac':
> > >>>  Duration: N/A, bitrate: N/A
> > >>>    Stream #0.0: Audio: flac, 44100 Hz, stereo, s16
> > >>> Unknown encoder 'aac'
> > >>>
> > >>> I copied all 4 files to ffmpeg/libavcodec/
> > >>
> > >> You would need to patch the build system too. As I recall I added a
> > >> checkout script to the aacenc SoC dir a while ago. I'm not sure if it
> > >> will still work though.
> > >
> > > Yep, found it. Thanks.
> > > BTW, in order to compile, the AAC encoder also needs aac.h file from SOC.
> > 
> > Ok, I tested your encoder with some Rock music. I was a bit
> > disapointed by the quality of the encoded audio file: it isn't good. I
> > guess it's OK though since the deal is really more to have an LGPL
> > encoder.
> 
> Id just like to say that to be accepted in svn the encoder MUST be better in
> quality per bitrate compared to some other common encoder at least, just
> implementing some half buggy rate distortion encoder is not good enough.

Sorry, I forgot to explicitly mention that this is work in progress.
The encoder is still in a transitional state, so some features are missing
compared to the previous version, especially M/S detection and rate control,
and it is not possible to test it against anything else right now.

> The absolutely most important thing is that the encoder is regularely
> tested to ensure that there are no quality regressions due to bugs or
> misunderstandings of some paper/algo/... .
> In that light, can you confirm that the current code is at least better than
> the last iteration of the encoder?

The issues mentioned above prevent testing. But to my ear it was better,
especially on transitions.
There are 2-3 things I have to deal with before it is suitable for SVN:
* M/S detection - but how to incorporate it? Should it be performed during
the quantizer search or after it, and how?
* Speed optimization
* Other tricks (pulse tool, TNS) - less important though

As for the quantizer search method (I am documenting it here in the hope
that it will be easier to understand and discuss):

* the code iterates over band groups (bands with the same number in different
windows of a window group) for all window groups, since they are quantized
with the same quantizer
* for each band group all quantizers are tried (actually I determine the range
of quantizers for which quantizing makes sense - i.e. outside it the distortion
and number of bits needed to code are the same as on the boundary - and
search only in that range) to find the distortion and number of bits
** quantizing and bit estimation are not optimal, since making them optimal
would slow down encoding even more
** distortion = sum of squared quantizing errors
* then the cost function is calculated:
  C_{q1,q2} = SUM_{w} (quanterror_w / threshold_w * lambda + bits_w) + TC(q1,q2)
where quanterror - sum of squared quantisation errors for band in window w,
      threshold  - band threshold (provided by psychoacoustic model)
      lambda     - rate control parameter
      bits       - number of bits needed to encode that quantized band
      TC(a,b)    - number of bits needed to encode scalefactor difference (q2-q1)

and the path with the minimal total cost is then calculated.
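
To make the per-band term of the cost function concrete, here is a minimal
Python sketch for one band in one window and one candidate quantizer. It is
an illustration under assumptions, not the encoder's code: the function names
are hypothetical, the scalefactor offset used by AAC is left out, 0.4054 is
the rounding constant commonly used in AAC encoders, and estimate_bits() is a
made-up placeholder rather than the real Huffman codebooks.

```python
import math

def quantize(coefs, sf):
    """AAC-style power-law quantization (scalefactor offset omitted)."""
    gain = 2.0 ** (-3.0 * sf / 16.0)
    return [int(math.copysign(int(abs(c) ** 0.75 * gain + 0.4054), c))
            for c in coefs]

def dequantize(qcoefs, sf):
    """Inverse of quantize(): sign(q) * |q|^(4/3) * 2^(sf/4)."""
    scale = 2.0 ** (sf / 4.0)
    return [math.copysign(abs(q) ** (4.0 / 3.0) * scale, q) for q in qcoefs]

def estimate_bits(qcoefs):
    """Placeholder bit estimate (assumption), NOT the AAC Huffman tables."""
    return sum(1.0 + 2.0 * math.log2(1 + abs(q)) for q in qcoefs)

def band_cost(coefs, sf, threshold, lam):
    """quanterror / threshold * lambda + bits for one band in one window."""
    q = quantize(coefs, sf)
    rec = dequantize(q, sf)
    quanterror = sum((c - r) ** 2 for c, r in zip(coefs, rec))
    return quanterror / threshold * lam + estimate_bits(q)
```

Summing band_cost() over the windows w of a window group and adding the
scalefactor coding cost TC(q1,q2) gives the C_{q1,q2} defined above.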

I use several tricks to reduce computations for zero bands and to ensure
final quantizers will not differ by more than 60.
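
The path search can be sketched as a Viterbi-style dynamic programme over the
band groups. This is a rough Python illustration under assumptions, not the
actual implementation: band_costs[b][q] is assumed to hold the already summed
per-window cost of band group b with quantizer q, transition_bits() is a
hypothetical stand-in for TC(q1,q2), and the limit of 60 is applied here to
neighbouring quantizers only.

```python
import math

MAX_DELTA = 60  # assumed limit on the difference between neighbouring quantizers

def transition_bits(q1, q2):
    """Hypothetical stand-in for TC(q1, q2); not the real Huffman table."""
    d = abs(q2 - q1)
    if d > MAX_DELTA:
        return math.inf          # forbid transitions that cannot be coded
    return 1 + 2 * d             # placeholder: small differences are cheap

def search_quantizers(band_costs):
    """Return (total_cost, path) minimising band costs plus transition bits."""
    nq = len(band_costs[0])
    best = list(band_costs[0])   # best[q]: cheapest path ending in quantizer q
    back = []                    # back[b][q]: predecessor of q in band group b+1
    for costs in band_costs[1:]:
        new_best, prev = [], []
        for q2 in range(nq):
            p = min(range(nq), key=lambda q1: best[q1] + transition_bits(q1, q2))
            new_best.append(best[p] + transition_bits(p, q2) + costs[q2])
            prev.append(p)
        best = new_best
        back.append(prev)
    q = min(range(nq), key=lambda i: best[i])
    total = best[q]
    path = [q]
    for prev in reversed(back):  # walk the back-pointers to recover the path
        q = prev[q]
        path.append(q)
    path.reverse()
    return total, path
```

With N band groups and Q candidate quantizers this costs O(N*Q^2) on top of
the quantization itself, which is why restricting the quantizer range per
band group matters.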

The most problematic steps are quantization and (less so) inverse quantization.
By replacing the inverse quantization process (x*cbrt(x)*IQ) with a table lookup
(of size 8192*256, so not for the final encoder), I've managed to reduce
coding time from 72 seconds to a mere 59 seconds. Unfortunately, it's not easy
to speed up quantizing.
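
For illustration only, a smaller variant of the lookup idea can be sketched in
Python as follows. It is not the 8192*256 table described above: it assumes a
one-dimensional table of q^(4/3) = q*cbrt(q) for q in 0..8191 and multiplies
by the scalefactor gain separately. The names POW43 and dequantize_lut are
hypothetical and the scalefactor offset is omitted.

```python
MAX_Q = 8191  # largest quantized magnitude covered by the table

# POW43[q] = q^(4/3), computed once instead of calling cbrt() per coefficient
POW43 = [q ** (4.0 / 3.0) for q in range(MAX_Q + 1)]

def dequantize_lut(qcoefs, sf):
    """Inverse quantization via table lookup: sign(q) * POW43[|q|] * 2^(sf/4)."""
    scale = 2.0 ** (sf / 4.0)  # one gain per band instead of one table row per sf
    return [(POW43[q] if q >= 0 else -POW43[-q]) * scale for q in qcoefs]
```

This trades one multiplication per coefficient for a table that is 256 times
smaller; whether that is faster than the full two-dimensional table would
have to be measured.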

But there's an idea: represent coefficients in the 'AAC domain', i.e. raise
them to the power 3/4 and represent the result as A * 2^(B/4) with integers,
so it will be easier to quantize. Do you think it's worth trying?

> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> Everything should be made as simple as possible, but not simpler.
> -- Albert Einstein