[FFmpeg-devel] [PATCH] yadif: restore speed of the C filtering code

Michael Niedermayer michaelni at gmx.at
Sat Mar 2 13:02:07 CET 2013


On Fri, Mar 01, 2013 at 06:20:19PM +0100, James Darnley wrote:
> Always use the special filter for the first and last 3 columns (only).
> 
> The changes made in 64ed397 slowed the filter to just under 3/4 of what
> it was.  This commit restores almost all of that speed while maintaining
> identical output.
> 
> For reference, on my Athlon64:
> 1733222 decicycles in old
> 2358563 decicycles in new
> 1740014 decicycles in this
> ---
>  libavfilter/vf_yadif.c          |   93 +++++++++++++++++++++++---------------
>  libavfilter/x86/vf_yadif_init.c |   12 +----
>  libavfilter/yadif.h             |    4 +-
>  3 files changed, 60 insertions(+), 49 deletions(-)
> 
> diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c
> index b7c2d80..3bd0d17 100644
> --- a/libavfilter/vf_yadif.c
> +++ b/libavfilter/vf_yadif.c
> @@ -34,9 +34,9 @@
>  #define PERM_RWP AV_PERM_WRITE | AV_PERM_PRESERVE | AV_PERM_REUSE
>  
>  #define CHECK(j)\
> -    {   int score = FFABS(cur[mrefs + off_left + (j)] - cur[prefs + off_left - (j)])\
> +    {   int score = FFABS(cur[mrefs - 1 + (j)] - cur[prefs - 1 - (j)])\
>                    + FFABS(cur[mrefs  +(j)] - cur[prefs  -(j)])\
> -                  + FFABS(cur[mrefs + off_right + (j)] - cur[prefs + off_right - (j)]);\
> +                  + FFABS(cur[mrefs + 1 + (j)] - cur[prefs + 1 - (j)]);\
>          if (score < spatial_score) {\
>              spatial_score= score;\
>              spatial_pred= (cur[mrefs  +(j)] + cur[prefs  -(j)])>>1;\
> @@ -51,15 +51,46 @@
>          int temporal_diff2 =(FFABS(next[mrefs] - c) + FFABS(next[prefs] - e) )>>1; \
>          int diff = FFMAX3(temporal_diff0 >> 1, temporal_diff1, temporal_diff2); \
>          int spatial_pred = (c+e) >> 1; \
> -        int off_right = (x < w - 1) ? 1 : -1;\
> -        int off_left  = x ? -1 : 1;\
> -        int spatial_score = FFABS(cur[mrefs + off_left]  - cur[prefs + off_left]) + FFABS(c-e) \
> -                          + FFABS(cur[mrefs + off_right] - cur[prefs + off_right]) - 1; \
> +        int spatial_score = FFABS(cur[mrefs - 1] - cur[prefs - 1]) + FFABS(c-e) \
> +                          + FFABS(cur[mrefs + 1] - cur[prefs + 1]) - 1; \
>   \
> -        if (x > 2 && x < w - 3) {\
> -            CHECK(-1) CHECK(-2) }} }} \
> -            CHECK( 1) CHECK( 2) }} }} \
> -        }\
> +        CHECK(-1) CHECK(-2) }} }} \
> +        CHECK( 1) CHECK( 2) }} }} \
> + \
> +        if (mode < 2) { \
> +            int b = (prev2[2 * mrefs] + next2[2 * mrefs])>>1; \
> +            int f = (prev2[2 * prefs] + next2[2 * prefs])>>1; \
> +            int max = FFMAX3(d - e, d - c, FFMIN(b - c, f - e)); \
> +            int min = FFMIN3(d - e, d - c, FFMAX(b - c, f - e)); \
> + \
> +            diff = FFMAX3(diff, min, -max); \
> +        } \
> + \
> +        if (spatial_pred > d + diff) \
> +           spatial_pred = d + diff; \
> +        else if (spatial_pred < d - diff) \
> +           spatial_pred = d - diff; \
> + \
> +        dst[0] = spatial_pred; \
> + \
> +        dst++; \
> +        cur++; \
> +        prev++; \
> +        next++; \
> +        prev2++; \
> +        next2++; \
> +    }
> +
> +#define FILTER_EDGES(start, end) \
> +    for (x = start;  x < end; x++) { \
> +        int c = cur[mrefs]; \
> +        int d = (prev2[0] + next2[0])>>1; \
> +        int e = cur[prefs]; \
> +        int temporal_diff0 = FFABS(prev2[0] - next2[0]); \
> +        int temporal_diff1 =(FFABS(prev[mrefs] - c) + FFABS(prev[prefs] - e) )>>1; \
> +        int temporal_diff2 =(FFABS(next[mrefs] - c) + FFABS(next[prefs] - e) )>>1; \
> +        int diff = FFMAX3(temporal_diff0 >> 1, temporal_diff1, temporal_diff2); \
> +        int spatial_pred = (c+e) >> 1; \

This duplicates the macro; I don't think that's necessary.
It should be enough to fix the implementation so that the compiler can
optimize things to constants in the main case.
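
One way to read this suggestion, as a rough sketch rather than the actual
yadif code: keep a single filter body and pass it a flag that is a literal
constant at each call site, so that in the interior loop the edge offsets
fold to -1 and +1 and the bounds test disappears. The names spatial_pred,
filter_line and is_not_edge are hypothetical, the temporal part of the
filter is left out, and the diagonal search is simplified compared to the
nested CHECK() macros.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the spatial part of the filter.
 * is_not_edge is meant to be a literal 0 or 1 at every call site. */
static inline int spatial_pred(const uint8_t *above, const uint8_t *below,
                               int x, int w, const int is_not_edge)
{
    /* With is_not_edge == 1 both conditions are constant, so the offsets
     * become plain -1 and +1 after constant propagation. */
    int off_left  = (is_not_edge || x > 0)     ? -1 :  1;
    int off_right = (is_not_edge || x < w - 1) ?  1 : -1;
    int pred  = (above[x] + below[x]) >> 1;
    int score = abs(above[x + off_left]  - below[x + off_left])
              + abs(above[x]             - below[x])
              + abs(above[x + off_right] - below[x + off_right]) - 1;

    if (is_not_edge) {
        /* The diagonal checks read up to 3 pixels to either side, which is
         * why the first and last 3 columns must not take this branch. */
        for (int j = -2; j <= 2; j++) {
            int s;
            if (!j)
                continue;
            s = abs(above[x - 1 + j] - below[x - 1 - j])
              + abs(above[x     + j] - below[x     - j])
              + abs(above[x + 1 + j] - below[x + 1 - j]);
            if (s < score) {
                score = s;
                pred  = (above[x + j] + below[x - j]) >> 1;
            }
        }
    }
    return pred;
}

/* Assumes w >= 2. One body, three call sites with constant flags. */
static void filter_line(uint8_t *dst, const uint8_t *above,
                        const uint8_t *below, int w)
{
    int lo = w < 3 ? w : 3;             /* end of the left edge columns    */
    int hi = w - 3 > lo ? w - 3 : lo;   /* start of the right edge columns */
    int x;

    for (x = 0; x < lo; x++)
        dst[x] = spatial_pred(above, below, x, w, 0);
    for (x = lo; x < hi; x++)
        dst[x] = spatial_pred(above, below, x, w, 1);
    for (x = hi; x < w; x++)
        dst[x] = spatial_pred(above, below, x, w, 0);
}
```

Whether the compiler really folds the flag depends on the helper being
inlined; inside FFmpeg that would presumably mean av_always_inline or an
extra macro argument instead of a plain static inline.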


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I have never wished to cater to the crowd; for what I know they do not
approve, and what they approve I do not know. -- Epicurus