[FFmpeg-devel] [PATCH] Implement PAFF in H.264

Mon Oct 15 14:11:02 CEST 2007

On Mon, 15 Oct 2007, Martin Zlomek wrote:

> I have discovered that h264_loop_filter_strength_mmx2(), used for
> 'bS' calculation in inter predicted macroblocks, gives wrong output
> when used in fields (compared with the reference decoder).
> The attached patch is a temporary, unoptimized solution that gives
> correct output.
>
> I would like to ask Loren Merritt for help: could you look at this,
> please? Learning MMX instructions and understanding your code would
> take me quite a while...

It's probably due to mvy_limit (the vertical motion vector threshold,
which is 4 for frame macroblocks but 2 for field macroblocks). I don't
know how to fix it without sacrificing some speed (not that I benchmarked
anything), so I would just leave filter_mb_fast for the common case that
needs to be fast (i.e. progressive), and let PAFF use the non-asm code.
But if you really want it, here is a patch. It is untested, and it doesn't
modify the prototype or the caller to match.

--Loren Merritt
-------------- next part --------------
Index: libavcodec/i386/h264dsp_mmx.c
===================================================================

--- libavcodec/i386/h264dsp_mmx.c	(revision 10691)
+++ libavcodec/i386/h264dsp_mmx.c	(working copy)
@@ -552,9 +552,7 @@
     asm volatile(
         "pxor %%mm7, %%mm7 \n\t"
         "movq %0, %%mm6 \n\t"
-        "movq %1, %%mm5 \n\t"
-        "movq %2, %%mm4 \n\t"
-        ::"m"(ff_pb_1), "m"(ff_pb_3), "m"(ff_pb_7)
+        ::"m"(ff_pb_1)
     );
     // could do a special case for dir==0 && edges==1, but it only reduces the
     // average filter time by 1.2%
@@ -563,6 +561,18 @@
         const int mask_mv = dir ? mask_mv1 : mask_mv0;
         DECLARE_ALIGNED_8(const uint64_t, mask_dir) = dir ? 0 : 0xffffffffffffffffULL;
         int b_idx, edge, l;
+        asm volatile(
+            "movq %0, %%mm5 \n\t"
+            "movq %1, %%mm4 \n\t"
+            ::"m"(ff_pb_3), "m"(ff_pb_7)
+        );
+        if(interlaced && dir==1) {
+            // halve mvy_limit
+            asm volatile(
+                "movq %%mm5, %%mm4 \n\t"
+                "movq %%mm6, %%mm5 \n\t"
+            :);
+        }
         for( b_idx=12, edge=0; edge<edges; edge+=step, b_idx+=8*step ) {
             asm volatile(
                 "pand %0, %%mm0 \n\t"