[FFmpeg-devel] [PATCH] avcodec: use av_mod_intp2() where possible

Wed Apr 22 00:49:38 CEST 2015

On 21/04/15 7:12 PM, Michael Niedermayer wrote:
> On Tue, Apr 21, 2015 at 06:51:52PM -0300, James Almer wrote:
>> On 21/04/15 6:38 PM, Michael Niedermayer wrote:
>>> On Tue, Apr 21, 2015 at 05:45:25PM -0300, James Almer wrote:
>>>> Signed-off-by: James Almer <jamrial at gmail.com>
>>>> ---
>>>>  libavcodec/acelp_vectors.c     |  8 +++-----
>>>>  libavcodec/adpcm.c             |  2 +-
>>>>  libavcodec/alacenc.c           |  3 +--
>>>>  libavcodec/atrac3plus.c        |  4 ++--
>>>>  libavcodec/dnxhdenc.c          |  2 +-
>>>>  libavcodec/dvenc.c             |  2 +-
>>>>  libavcodec/ffv1.h              |  2 +-
>>>>  libavcodec/ffv1dec.c           |  3 +--
>>>>  libavcodec/g726.c              |  2 +-
>>>>  libavcodec/g729dec.c           |  2 +-
>>>>  libavcodec/golomb.h            |  2 +-
>>>>  libavcodec/h264.c              |  5 ++---
>>>>  libavcodec/h264_refs.c         |  2 +-
>>>>  libavcodec/hevc.c              | 26 ++++++++++++--------------
>>>>  libavcodec/hevc_cabac.c        |  8 ++++----
>>>>  libavcodec/hevc_filter.c       | 15 ++++++---------
>>>>  libavcodec/hevc_mvs.c          |  4 ++--
>>>>  libavcodec/hevc_ps.c           |  4 ++--
>>>>  libavcodec/hevc_refs.c         |  5 ++---
>>>>  libavcodec/hevcpred_template.c |  4 ++--
>>>>  libavcodec/mpeg12enc.c         |  8 ++++----
>>>>  libavcodec/opus.h              |  2 +-
>>>>  libavcodec/opus_celt.c         |  9 ++++-----
>>>>  libavcodec/proresenc_kostya.c  |  6 ++----
>>>>  libavcodec/put_bits.h          |  2 +-
>>>>  libavcodec/vorbisdec.c         |  5 ++---
>>>>  26 files changed, 61 insertions(+), 76 deletions(-)
>>>>
>>>> diff --git a/libavcodec/acelp_vectors.c b/libavcodec/acelp_vectors.c
>>>> index 86851a3..7aef8c7 100644
>>>> --- a/libavcodec/acelp_vectors.c
>>>> +++ b/libavcodec/acelp_vectors.c
>>>> @@ -133,12 +133,11 @@ void ff_acelp_fc_pulse_per_track(
>>>>          int pulse_count,
>>>>          int bits)
>>>>  {
>>>> -    int mask = (1 << bits) - 1;
>>>>      int i;
>>>>  
>>>>      for(i=0; i<pulse_count; i++)
>>>>      {
>>>> -        fc_v[i + tab1[pulse_indexes & mask]] +=
>>>> +        fc_v[i + tab1[av_mod_uintp2(pulse_indexes, bits)]] +=
>>>>                  (pulse_signs & 1) ? 8191 : -8192; // +/-1 in (2.13)
>>>>  
>>>>          pulse_indexes >>= bits;
>>>> @@ -154,13 +153,12 @@ void ff_decode_10_pulses_35bits(const int16_t *fixed_index,
>>>>                                  int half_pulse_count, int bits)
>>>>  {
>>>>      int i;
>>>> -    int mask = (1 << bits) - 1;
>>>>  
>>>>      fixed_sparse->no_repeat_mask = 0;
>>>>      fixed_sparse->n = 2 * half_pulse_count;
>>>>      for (i = 0; i < half_pulse_count; i++) {
>>>> -        const int pos1   = gray_decode[fixed_index[2*i+1] & mask] + i;
>>>> -        const int pos2   = gray_decode[fixed_index[2*i  ] & mask] + i;
>>>> +        const int pos1   = gray_decode[av_mod_uintp2(fixed_index[2*i+1], bits)] + i;
>>>> +        const int pos2   = gray_decode[av_mod_uintp2(fixed_index[2*i  ], bits)] + i;
>>>>          const float sign = (fixed_index[2*i+1] & (1 << bits)) ? -1.0 : 1.0;
>>>>          fixed_sparse->x[2*i+1] = pos1;
>>>>          fixed_sparse->x[2*i  ] = pos2;
>>> [...]
>>>> diff --git a/libavcodec/ffv1.h b/libavcodec/ffv1.h
>>>> index 5081397..bfc4d71 100644
>>>> --- a/libavcodec/ffv1.h
>>>> +++ b/libavcodec/ffv1.h
>>>> @@ -143,7 +143,7 @@ static av_always_inline int fold(int diff, int bits)
>>>>          diff = (int8_t)diff;
>>>>      else {
>>>>          diff +=  1 << (bits  - 1);
>>>> -        diff &= (1 <<  bits) - 1;
>>>> +        diff  = av_mod_uintp2(diff, bits);
>>>>          diff -=  1 << (bits  - 1);
>>>>      }
>>>>  
>>>
>>> iam not sure some of these changes are are a good idea
>>> for example above the mask has to be calculated anyway what can be
>>> gained from av_mod_uintp2() use here ?
>>> i think this should be bechmarked when its in performance critical code
>>
>> "1 << (bits  - 1)" is not the same as "(1 <<  bits) - 1". The latter is not going to be
>> calculated if the arch optimized version of av_mod_uintp2 is used.
> 
> yep, i should not review patches when i am in a (IRC) meeting
> 
> before the patch fold looks like this: (for the non 8bit case)
>         addl    100(%rsp), %ecx
>         andl    104(%rsp), %ecx
>         subl    100(%rsp), %ecx
> .L869:
> afterwards it looks like this:
> 
>         addl    104(%rsp), %ecx
>         andl    64(%rsp), %ecx
>         subl    104(%rsp), %ecx
> .L869:
> 
> so it seems ok
> 
> 
>>
>> The one file where i don't know if there's going to be any gain is in one of the opus_celt.c
>> changes, where the mask needs to be calculated even if arch optimized av_mod_uintp2 is used.
>> That one can be removed, but every other change is pretty straightforward.
> 
> ok, then iam fine with the patch
> 
> Thanks

I'm going to apply the changes that are straightforward (single line changes that didn't use
mask variables prior to this patch, or that created one but then used it once). av_mod_uintp2
should generate the same assembly with the non bmi2 optimized version.
For the ones where a mask variable was created and used more than once I'll try to bench and
check the resulting assembly, since it's apparently 100% dependent on the compiler being smart
enough to avoid doing the calculation more than once.