[FFmpeg-devel] [PATCH] swresample/resample: speed up build_filter

Michael Niedermayer michael at niedermayer.cc
Wed Nov 4 15:44:31 CET 2015


On Wed, Nov 04, 2015 at 01:11:45AM -0500, Ganesh Ajjanagadde wrote:
> This speeds up build_filter by ~ 50%. This gain should be pretty
> consistent across all architectures and platforms.
> 
> Essentially, this relies on a observation that the filters have some
> even/odd symmetry that may be exploited during the construction of the
> polyphase filter bank. In particular, phases (scaled to [0, 1]) in [0.5, 1] are
> easily derived from [0, 0.5] and expensive reevaluation of function
> points are unnecessary. This requires some rather annoying even/odd
> bookkeeping as can be seen from the patch.
> 
> I vaguely recall from signal processing theory more general symmetries allowing even greater
> optimization of the construction. At a high level, "even functions"
> correspond to 2, and one can imagine variations. Nevertheless, for the sake
> of some generality and because of existing filters, this is all that is
> being exploited.
> 
> Currently, this patch relies on phase_count being even or (trivially) 1,
> though this is not an inherent limitation to the approach. This
> assumption is safe as phase_count is 1 << phase_bits, and is hence a
> power of two. There is no way for user API to set it to a nontrivial odd
> number. This assumption has been placed as an assert in the code.
> 
> To repeat, this assumes even symmetry of the filters, which is the most common
> way to get generalized linear phase anyway and is true of all currently
> supported filters.
> 
> As a side note, accuracy should be identical or perhaps slightly better
> due to this "forcing" filter symmetries leading to a better phase
> characteristic. As before, I can't test this claim easily, though it may
> be of interest.
> 
> Patch tested with FATE.
> 
> Sample benchmark (x86-64, Haswell, GNU/Linux):
> 
> test: swr-resample-dblp-44100-2626
> 
> new:
> 527376779 decicycles in build_filter(loop 1000),     256 runs,      0 skips
> 524361765 decicycles in build_filter(loop 1000),     512 runs,      0 skips
> 516552574 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
> 
> old:
> 974178658 decicycles in build_filter(loop 1000),     256 runs,      0 skips
> 972794408 decicycles in build_filter(loop 1000),     512 runs,      0 skips
> 954350046 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
> 
> Note that lower level optimizations are entirely possible, I focussed on
> getting the high level semantics correct. In any case, this should
> provide a good foundation.
> 
> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
> ---
>  libswresample/resample.c | 56 +++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 48 insertions(+), 8 deletions(-)
> 
> diff --git a/libswresample/resample.c b/libswresample/resample.c
> index 235c9a9..d5fc5e7 100644
> --- a/libswresample/resample.c
> +++ b/libswresample/resample.c
> @@ -144,7 +144,7 @@ static int build_filter(ResampleContext *c, void *filter, double factor, int tap
>                          int filter_type, int kaiser_beta){
>      int ph, i;
>      double x, y, w;
> -    double *tab = av_malloc_array(tap_count,  sizeof(*tab));
> +    double *tab = av_malloc_array(tap_count+1,  sizeof(*tab));
>      const int center= (tap_count-1)/2;
>  
>      if (!tab)
> @@ -154,9 +154,10 @@ static int build_filter(ResampleContext *c, void *filter, double factor, int tap
>      if (factor > 1.0)
>          factor = 1.0;
>  
> -    for(ph=0;ph<phase_count;ph++) {
> +    av_assert0(phase_count == 1 || phase_count % 2 == 0);
> +    for(ph = 0; ph <= phase_count / 2; ph++) {
>          double norm = 0;
> -        for(i=0;i<tap_count;i++) {
> +        for(i=0;i<=tap_count;i++) {
>              x = M_PI * ((double)(i - center) - (double)ph / phase_count) * factor;
>              if (x == 0) y = 1.0;
>              else        y = sin(x) / x;
> @@ -180,26 +181,65 @@ static int build_filter(ResampleContext *c, void *filter, double factor, int tap
>              }
>  
>              tab[i] = y;
> -            norm += y;
> +            if (i < tap_count)
> +                norm += y;
>          }
>  
>          /* normalize so that an uniform color remains the same */
>          switch(c->format){
>          case AV_SAMPLE_FMT_S16P:
> -            for(i=0;i<tap_count;i++)
> +            for(i=0;i<tap_count;i++) {
>                  ((int16_t*)filter)[ph * alloc + i] = av_clip(lrintf(tab[i] * scale / norm), INT16_MIN, INT16_MAX);
> +            }
> +            if (tap_count % 2 == 0) {
> +                for (i = 0; i < tap_count; i++)
> +                    ((int16_t*)filter)[(phase_count-ph) * alloc + tap_count-1-i] = ((int16_t*)filter)[ph * alloc + i];
> +            }
> +            else {
> +                for (i = 1; i <= tap_count; i++)
> +                    ((int16_t*)filter)[(phase_count-ph) * alloc + tap_count-i] =
> +                        av_clip(lrintf(tab[i] * scale / (norm - tab[0] + tab[tap_count])), INT16_MIN, INT16_MAX);
> +            }
>              break;
>          case AV_SAMPLE_FMT_S32P:

> -            for(i=0;i<tap_count;i++)
> +            for(i=0;i<tap_count;i++) {
>                  ((int32_t*)filter)[ph * alloc + i] = av_clipl_int32(llrint(tab[i] * scale / norm));
> +            }

this and similar changes look uneeded

otherwise LGTM

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20151104/08c075b2/attachment.sig>


More information about the ffmpeg-devel mailing list