[FFmpeg-devel] [PATCH] G.729 Fixed-codebook vector decoding

Sun Jun 28 07:32:03 CEST 2009

2009/6/28 Vladimir Voroshilov <voroshil at gmail.com>:
> 2009/6/28 Michael Niedermayer <michaelni at gmx.at>:
>> On Sat, Jun 27, 2009 at 09:27:23AM +0700, Vladimir Voroshilov wrote:
>>> 2009/6/27 Michael Niedermayer <michaelni at gmx.at>:
>>> > On Sat, Jun 27, 2009 at 02:05:27AM +0700, Vladimir Voroshilov wrote:
>>> >> Memset is required, since vector should contain only several non-zero pulses
>>> >
>>> > if the thing is sparse, memset could maybe be replaced by just setting
>>> > the non zero values to zero, in that sense many operationson on sparse
>>> > vectors can be done more efficiently than working on an array of zeros
>>> > with one non zero value, which if it applies here means you have to
>>> > rewrite the related code to be more efficient
>>>
>>> Hmm..
>>>
>>> Does you mean "invert" vector (replace zero with non-zero and vice versa) ?
>>> What values whould contain non-zero parts in your suggestion?
>>>
>>> Here is how i uderstand fixed-codebook vector generation.
>>>
>>> Fixed-codebook vector defines where four (or three, depends on mode)
>>> pulses of sound are placed in
>>> current frame (which is zero at this stage). Afaik, fixed-codebook
>>> vector defines signal which in "new"
>>> in current frame and is not continuation oprevious frame's sound.
>>> Routine sets 1 at this positions, setting all other to 0.
>>>
>>> "1" related to pulse of sound, "0" means silence.
>>>
>>> Later this vector will be weightly summed with excitation signal.
>>>
>>> Thus, either your suggestion is deep math or heavy optimization which
>>> i can't understand.
>>
>> adding a sparse vector with 4 non zero pulses to another vector is a O(1)
>> operation, your code needs O(n) that is not acceptable. (that is based on
>> your description above)
>>
>> Its simply that you add n-4 values with n-4 zeros and 4 values with
>> 4 non zero values, thats not deep math
>
> Ok. Now i see what is the battle for. O(1) is obviously better even for me.
> But, i'm afraid O(1) implementation is much complex task.
>
> Well. Fixed-vector (called fc and fc_v in severral places) decribes contribution
> of non-periodic signal introduced in current frame.
>
> Initially it contains few (fixed number of, depending on mode)
> non-zero pulses and is a sparse vector.
> This pulses generated ?fixed-codebook indexes and signs. The latter
> either are decoded from frame or
> initialized with random values in case of frame erasure.
>
> Working with sparse vector at this stage is clean for me (small array
> of non-zero values' indexes/signs only, as i see).
>
> But already applying simple harmonic filter:
>
> for(i=pitch_delay; i<subframe_size; i++)
> fc[i]+=gain_pitch * fc[i-pitch_delay]
>
> becames much complex task for such sparse vector.
>
> Later fixed-codebook vector is used for decoding gain_code.
> There is scalarproduct_int16, which is called with fc_v. Of course i
> can compute scalarproduct
> on sparse vector too, but mmx optiomization can not be used.
>
> According to above, i see only two troubles (for me):
> 1. Complex code with harmonic filter. Sparse case will require either
> "if"s or some sorting of indexes.
> This is because accumulation is proceeded in-place and each step of
> loop affects next elements of array.
> 2. Inpossible reusage of existing MMX optiomizations. Perhaps
> calculation of scalarproduct can be merged with harmonic filter.
>
> How to implement "1" i didn't yet know and will be thankful for advice.

I tried to make some complexity aproximations of sparse vector usage
in harmonic filter.
Assuming there are four pulses and pitch delay can be between 20 and 149.
subframe size is 80.

0. Simplest case. pitch_delay>=80. filter is not applied.
0.1 non-sparse vector.
Requires zero-initialized array of length 80.
80-pitch_delay (60 in worst case) number of "+" and "*" and one memset
of size 80.
Adding check for non-zero will lead to 60 if's and 12 "+" and "*" in worst case.

1. sparse vector. pitch_delay<80
Requires array of size 16 with indexes and values (total 32)

All pulses are inside first one chunk of size 80/pitch_delay.
Filter will do int(80/pitch_delay)-1 iterations (three in worst case)
with four "*" and "+" inside.
In worst case this will introduce 12 additional non-zero pulses.

Pulses are not intersects in this case.

2. Pulses can be in different chunks of size 80/pitch_delay.
In this code filter have to check whether non zero values are
intersects in different chunks or not.
If not, complexity is the same as for previous case.
if they are, early pulses should change values of the next which they
intersects with.
This will require either sorting of index/value array at each
iteration with next item's index check or
searching this arrays to find intersects.
Both cases will require (in worst case) 12-1 iterations of inner loop.
which leads to 11*12=132 "if"s and assignments.

According to above i don't see any advantage of introducing sparse vector.
If i'm wrong or missing something, please fix me.

P.S. Adding scalar product calculation to harmonic filter will remove
one extra loop, but will prevent using mmx optimization.

-- 
Regards,
Vladimir Voroshilov     mailto:voroshil at gmail.com
JID: voroshil at gmail.com, voroshil at jabber.ru
ICQ: 95587719