[FFmpeg-devel] [PATCH 10/12] WMAPRO: use vector_clipf_interleave()

Tue Sep 29 22:52:29 CEST 2009

Sascha Sommer <saschasommer at freenet.de> writes:

> Hi,
>
> On Sonntag, 27. September 2009, Mans Rullgard wrote:
>> ---
>>  libavcodec/wmaprodec.c |   20 ++++++++------------
>>  1 files changed, 8 insertions(+), 12 deletions(-)
>>
>> diff --git a/libavcodec/wmaprodec.c b/libavcodec/wmaprodec.c
>> index a489047..ac559a4 100644
>> --- a/libavcodec/wmaprodec.c
>> +++ b/libavcodec/wmaprodec.c
>> @@ -221,6 +221,7 @@ typedef struct WMAProDecodeCtx {
>>      WMAProChannelGrp chgroup[WMAPRO_MAX_CHANNELS];  ///< channel group information
>>
>>      WMAProChannelCtx channel[WMAPRO_MAX_CHANNELS];  ///< per channel data
>> +    const float      *channel_ptr[WMAPRO_MAX_CHANNELS];
>
> In other places the star follows directly after the data type.
> const float*     channel_ptr.

True.  I don't like that style as it is highly misleading, but I'll
change it to maintain consistency.

> Also a doxygen comment could be added.

Yes, it could...

>>  } WMAProDecodeCtx;
>>
>>
>> @@ -443,6 +444,9 @@ static av_cold int decode_init(AVCodecContext *avctx)
>>      for (i = 0; i < 33; i++)
>>          sin64[i] = sin(i*M_PI / 64.0);
>>
>> +    for (i = 0; i < WMAPRO_MAX_CHANNELS; i++)
>> +        s->channel_ptr[i] = s->channel[i].out;
>> +
>>      if (avctx->debug & FF_DEBUG_BITSTREAM)
>>          dump_context(s);
>>
>> @@ -1331,19 +1335,11 @@ static int decode_frame(WMAProDecodeCtx *s)
>>      }
>>
>>      /** interleave samples and write them to the output buffer */
>> -    for (i = 0; i < s->num_channels; i++) {
>> -        float* ptr;
>> -        int incr = s->num_channels;
>> -        float* iptr = s->channel[i].out;
>> -        int x;
>> -
>> -        ptr = s->samples + i;
>> -
>> -        for (x = 0; x < s->samples_per_frame; x++) {
>> -            *ptr = av_clipf(*iptr++, -1.0, 32767.0 / 32768.0);
>> -            ptr += incr;
>> -        }
>> +    s->dsp.vector_clipf_interleave(s->samples, s->channel_ptr,
>> +                                   -1.0, 32767.0 / 32768.0,
>> +                                   s->samples_per_frame, s->num_channels);
>>
>> +    for (i = 0; i < s->num_channels; i++) {
>>          /** reuse second half of the IMDCT output for the next frame */
>>          memcpy(&s->channel[i].out[0],
>>                 &s->channel[i].out[s->samples_per_frame],
>
> Ok.

Assuming the addition of this function to dsputil is ok, that is.
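
For reference, the scalar loop removed above can be sketched as a
standalone function. This is only a sketch under assumptions: the
names clipf_interleave_ref and clipf are hypothetical, and the
argument order simply mirrors the call in the patch, not a confirmed
dsputil prototype.

```c
#include <stddef.h>

/* Clamp a single float to the range [min, max]. */
static float clipf(float v, float min, float max)
{
    if (v < min)
        return min;
    if (v > max)
        return max;
    return v;
}

/*
 * Hypothetical scalar reference: read `len` samples from each of
 * `channels` planar buffers, clip them, and store them interleaved in
 * dst, so that dst[i * channels + c] holds sample i of channel c.
 */
static void clipf_interleave_ref(float *dst, const float **src,
                                 float min, float max,
                                 int len, int channels)
{
    int c, i;

    for (c = 0; c < channels; c++) {
        const float *in  = src[c];
        float       *out = dst + c;

        for (i = 0; i < len; i++) {
            *out = clipf(in[i], min, max);
            out += channels;   /* step over the other channels */
        }
    }
}
```

The channel_ptr array in the patch exists so that the per-channel
output pointers can be handed over as one pointer table like src here.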

> P.S: If you want a faster decoder you can try to output int16_t again.
> I don't know if such a patch would be acceptable for ffmpeg, however.

The trend seems to be to have floating-point output directly.  Speed
could be improved by optimising audioconvert.c, which is presently
totally devoid of any SIMD.  Decoding wmapro to int16 on Cortex-A8
spends, after my patches, 40% of the time there.

> Also the decoder currently always copies the frame data to a tmp buffer to 
> avoid problems with damaged streams that might cause overreads.
> This copy should only be needed when a frame crosses a packet boundary.

This should probably be done, although the benefit will be smaller
since it's a simple copy of compressed data.  The huge speedups I've
been getting with these patches are due to gcc being exceptionally bad
at floating-point maths.

-- 
Måns Rullgård
mans at mansr.com