[FFmpeg-devel] [PATCH 2/2] libavcodec: Implementation of AC3 fixed point decoder

Fri Apr 12 19:11:48 CEST 2013

Hi,

I just wanted to mention that I'm not someone of charge of anything
for ffmpeg: I just wanted to look at a FP implementation.

2013/4/12 Nedeljko Babic <Nedeljko.Babic at imgtec.com>:
>>> +            ac3_downmix_c_fixed16(s->outptr, s->downmix_coeffs,
[...]
> I didn't want to clog dsp context with this function because it is used only here
> but ok, I'll add it to ac3dsp context in next patch.

I have no idea how much it takes in a runtime profile, so maybe it is
not worth having it.

>>> -                memcpy(((float*)frame->data[ch]) + AC3_BLOCK_SIZE*blk, output[ch], 1024);
>>> +                memcpy(((SHORTFLOAT*)frame->data[ch]) + AC3_BLOCK_SIZE*blk, output[ch], 1024);
>>
>>I thought SHORTFLOAT was int16_t. How come it is the same copy size?
>>I'm surprised this overwrite does not crash the decoder.
>>
>
> Depending on flag CONFIG_AC3_FIXED, SHORTFLOAT is either int16_t (for fixed point decoder) or float
> (for floating point decoder). Since output is also defined as SHORTFLOAT, memcpy is working
> correctly.

Yes but output[ch] is AC3_BLOCK_SIZE elements, ie 256. The 1024 is
fine for 256*sizeof(float), but not here: you're overreading and
overwriting the buffers. This 1024 should be changed to
AC3_BLOCK_SIZE*sizeof(SHORTFLOAT).

And answering myself, you didn't notice it because this happens only
on error recovery, and you are probably not using corrupted streams.

>>On a side note, I guess this memcpy is long enough to have little
>>overhead compared to an inline copy loop using eg AV_COPY128.
>>
>
> You are probably correct, but in original implementation (floating point decoder) here is memcpy and
> I left it here in fixed point implementation because it is easier to merge implementations for these
> two decoders (since this is template file for both decoders).
> I can change this if you wish, but the change should be made for original (floating point) decoder also.

No, don't bother, this is be a separate consideration.

>>> +static void scale_coefs (
>>> +    int32_t *dst,
>>> +    const int *src,
>>> +    int dynrng,
>>> +    int len)
>>
>>int* pointer? I think the float context has the same issue, but
>>really, this should be int32_t too
>
> OK. I'll change this.

Just for the record, this is because int is 64bits on some architectures.

>>> +static int ac3_fixed_sqrt(int x)
>>
>>This is entirely dependent on the FP format, but that could go into
>>the Fixed context header (not the struct itself) as a helper function.
>
> I will move it to header.

I don't know what naming convention we could have, but maybe naming it
fp12_fixed_sqrt(x) would be suited to match the format rather than
where it is used?

Someone else should comment on this FP DSP design.

>>Also, quite a lot less important, but the input is probably 32bits, so
>>there may be wasteful sign extension occurring here.
>
> I am not sure on which sign extension you are referring

Sorry, I was thinking of another architecture, that will not use a FP
version anyway.

>>> +        dst[i] = av_clip_int16_c(((((int64_t)s0*wj - (int64_t)s1*wi + 0x40000000) >> 31) + 0x80) >> 8);
>>> +        dst[j] = av_clip_int16_c(((((int64_t)s0*wi + (int64_t)s1*wj + 0x40000000) >> 31) + 0x80) >> 8);
>>
>>I'm stuck on this, not really seeing how useful my remark can be, but...
>>Sometimes, you have shuffle insns taking chuncks of 8, 16 or 32bits of
>>a SIMD register and reordering them. Had the shift been 32, this would
>>have been possible.
>>Would it be possible to have 1 more bit of precision on the window
>>coeffs (and still be usable and fit on 32bits)?
>
> It is not possible because value would be out of range. This is classical fixed point operation.

OK, that was just a thought. Some particular input could cause overflow indeed.

--
Christophe