[FFmpeg-devel] [RFC] AAC encoder optimizations

Sun Sep 14 15:57:23 CEST 2008

On Sun, Sep 14, 2008 at 06:19:23AM -0600, Loren Merritt wrote:
> On Sat, 13 Sep 2008, Kostya wrote:
> 
> > Now it's ~30 time slower than realtime on Core2.
> 
> I understand that michael wants RD before heuristics, but what exactly are 
> you doing that makes this 1000x slower than faac, libvorbis, lame, and 
> 3000x slower than ffac3?
> Is this the first RDOed lossy audio codec anyone has ever written? 
> Are you testing the tensor product of all possible choices while others do 
> gradient descent? Is it just that unoptimized?

My knowledge of this area is more limited than Michael's, since he
knows the techniques from video coding. Maybe audio coding needs more
people like you.

Here's a short review of what I know about existing AAC encoders.

FAAC just operates on a 'quality' parameter (range 50-300), which it
adjusts after encoding a frame if the actual and desired bitrates differ.
That quality parameter is used to calculate the maximum allowed distortion,
from which the quantization is derived (I suspect it's similar to the standard).
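
To make the per-frame adjustment concrete, here is a minimal sketch of such a
feedback step. The proportional update and the name adjust_quality() are my
own assumptions for illustration, not FAAC's actual formula; only the 50-300
range comes from the description above.

```c
/* Hypothetical rate-control step: scale the quality parameter by the ratio
 * of desired to actual frame size, then clamp it to the 50-300 range.
 * The proportional rule is an assumption, not taken from FAAC. */
static double adjust_quality(double quality, double actual_bits,
                             double target_bits)
{
    if (actual_bits > 0.0)
        quality *= target_bits / actual_bits;
    if (quality < 50.0)
        quality = 50.0;
    if (quality > 300.0)
        quality = 300.0;
    return quality;
}
```

A frame that came out twice as large as desired would halve the quality for
the next frame, subject to the clamp.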

The 3GPP encoder modifies thresholds by adding a constant in the loudness domain
    add(x,n) = pow(pow(x, 0.25) + n, 4)
to achieve the desired perceptual entropy. The quantizer is then derived from
the threshold and other band characteristics.

The reference encoder is reported to use the two-loop RD search described in
ISO/IEC 13818-7:2004 Annex C (Encoder), section C.7.4:
for all scalefactor bands {
    calculate quantized coefficients
    calculate number of bits needed to code it
    if number of bits is too high {
        common scalefactor++;
    }
    calculate quantization error
    if error > max allowed error {
        scalefactor--;
        repeat loop;
    }
}

> --Loren Merritt