[FFmpeg-devel] [PATCH 04/12] Add vector_fmul_matrix to dsputil

Måns Rullgård mans
Sun Oct 18 22:17:48 CEST 2009


Michael Niedermayer <michaelni at gmx.at> writes:

> On Sun, Sep 27, 2009 at 11:49:20AM +0100, Mans Rullgard wrote:
>> ---
>>  libavcodec/dsputil.c |   28 ++++++++++++++++++++++++++++
>>  libavcodec/dsputil.h |    2 ++
>>  2 files changed, 30 insertions(+), 0 deletions(-)
>> 
>> diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
>> index ab916b7..b2cb0e3 100644
>> --- a/libavcodec/dsputil.c
>> +++ b/libavcodec/dsputil.c
>> @@ -4178,6 +4178,33 @@ static float scalarproduct_float_c(const float *v1, const float *v2, int len)
>>      return p;
>>  }
>>  
>> +void ff_vector_fmul_matrix_c(float **v, const float *mtx, int len, int w,
>> +                             float *restrict tmp)
>> +{
>> +    int i, j, k;
>> +
>> +    if (w == 2) {
>> +        for (i = 0; i < len; i++) {
>> +            float v0 = v[0][i]*mtx[0] + v[1][i]*mtx[1];
>> +            float v1 = v[0][i]*mtx[2] + v[1][i]*mtx[3];
>> +            v[0][i] = v0;
>> +            v[1][i] = v1;
>
> v1 is redundant, it can be written directly

Whatever you wish.
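
For illustration, the w == 2 branch with the second temporary dropped could look roughly like the sketch below. The standalone function name fmul_matrix_2x2 is hypothetical and not part of the patch. Only v0 has to stay in a temporary, because v[0][i] is still read when the second row is computed.

```c
/* Hypothetical standalone version of the w == 2 branch;
 * the name fmul_matrix_2x2 is an assumption, not from the patch. */
static void fmul_matrix_2x2(float **v, const float *mtx, int len)
{
    int i;

    for (i = 0; i < len; i++) {
        /* v0 must be held back: v[0][i] is still needed for the second row. */
        float v0 = v[0][i]*mtx[0] + v[1][i]*mtx[1];
        /* The second row can be stored directly; v[1][i] is not read again. */
        v[1][i]  = v[0][i]*mtx[2] + v[1][i]*mtx[3];
        v[0][i]  = v0;
    }
}
```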

>> +        }
>> +    } else {
>> +        for (i = 0; i < len; i++) {
>> +            const float *m = mtx;
>> +            for (j = 0; j < w; j++) {
>> +                float s = 0;
>
>> +                for (k = 0; k < w; k++)
>> +                    s += v[k][i] * *m++;
>
> this is quite inefficient because, inside the for(k) loop, v[k][i] needs
> 2 memory reads; a flat 2d array would be better

And how will the data magically transform itself into such a layout?
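
To make the trade-off concrete, here is a minimal sketch of the two access patterns under discussion. The names mix_planar, mix_flat and scratch are hypothetical. The scratch handling is an assumption made for this example only and is not meant to reproduce the part of the patch that is cut from the quote, nor the purpose of its tmp argument. The matrix is taken to be row-major, which matches the w == 2 branch quoted above.

```c
#include <stddef.h>

/* Planar layout, as in the patch: v[k] points to vector k, so v[k][i]
 * costs one load for the pointer and one for the sample.
 * scratch must hold w floats (assumption for this sketch). */
static void mix_planar(float **v, const float *mtx, int len, int w,
                       float *scratch)
{
    int i, j, k;

    for (i = 0; i < len; i++) {
        const float *m = mtx;
        for (j = 0; j < w; j++) {
            float s = 0;
            for (k = 0; k < w; k++)
                s += v[k][i] * *m++;
            scratch[j] = s;
        }
        /* Write back only after every output for sample i is computed. */
        for (j = 0; j < w; j++)
            v[j][i] = scratch[j];
    }
}

/* Flat layout: element i of vector k lives at flat[i*w + k],
 * so each operand is a single load. */
static void mix_flat(float *flat, const float *mtx, int len, int w,
                     float *scratch)
{
    int i, j, k;

    for (i = 0; i < len; i++) {
        float *x = flat + (size_t)i * w;
        const float *m = mtx;
        for (j = 0; j < w; j++) {
            float s = 0;
            for (k = 0; k < w; k++)
                s += x[k] * *m++;
            scratch[j] = s;
        }
        for (j = 0; j < w; j++)
            x[j] = scratch[j];
    }
}
```

Both functions compute the same products. The flat variant only helps if the samples already sit in that layout; when they arrive as separate per-vector buffers, an extra interleaving pass is needed first, and that cost is what the question above is about.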

>> @@ -445,6 +445,8 @@ typedef struct DSPContext {
>>       * @param len length of vectors, multiple of 4
>>       */
>>      float (*scalarproduct_float)(const float *v1, const float *v2, int len);
>> +    void (*vector_fmul_matrix)(float **v, const float *mtx, int len, int w,
>> +                               float *restrict tmp);
>>      /**
>>       * Calculate the sum and difference of two vectors of floats.
>>       * @param v1  first input vector, sum output, 16-byte aligned
>
> missing doxy explaining what this does
> (in case you wonder why I complain when you would add it before commit
>  anyway: it's because it makes review easier if one can check the API
>  without reverse engineering it from the code ...)

It seemed obvious enough in this case.

-- 
Måns Rullgård
mans at mansr.com


