[FFmpeg-devel] [PATCH] Some IWMMXT functions for libavcodec #2

Dmitry Antipov dmantipov
Sat May 17 11:19:28 CEST 2008


Michael Niedermayer wrote:

> the iterations are always an even number IIRC, but don't hesitate to add an
> assert(!(h&1));

But both vsad_intra16_c() and vsse_intra16_c() have an outer loop
'for(y=1; y<h; y++)', so if H is always even, the number of iterations
is always odd. For a loop unrolled by 2, this means we either need an
additional check within the loop body or have to move the last iteration
out of the loop.
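For comparison, here is a plain C sketch of the first option (an extra check inside an unrolled-by-2 body), next to a reference with the same 'for(y=1; y<h; y++)' shape as vsad_intra16_c(). The names vsad16_ref() and vsad16_unroll2_check() are hypothetical and the code is not taken from FFmpeg; the asm in this mail takes the other route and peels the last iteration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reference: y runs from 1 to h - 1, so h - 1 row pairs are summed. */
static int vsad16_ref(const uint8_t *s, int stride, int h)
{
    int score = 0;

    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++)
            score += abs(s[x] - s[x + stride]);
        s += stride;
    }
    return score;
}

/* Hypothetical unrolled-by-2 version: an odd pair count is handled by a
 * check inside the loop body instead of a peeled last iteration. */
static int vsad16_unroll2_check(const uint8_t *s, int stride, int h)
{
    int score = 0;

    assert(h >= 1);                       /* at least one row */
    for (int n = h - 1; n > 0; n -= 2) {  /* n = row pairs still to do */
        for (int x = 0; x < 16; x++)      /* first pair of this pass */
            score += abs(s[x] - s[x + stride]);
        s += stride;
        if (n == 1)                       /* odd count: no second pair */
            break;
        for (int x = 0; x < 16; x++)      /* second pair of this pass */
            score += abs(s[x] - s[x + stride]);
        s += stride;
    }
    return score;
}
```

The price of this option is one compare and branch per pass, which is what peeling the last iteration avoids.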

For an always-even H, i.e. an odd number of iterations H - 1:

#define BODY(p,q,r,s) \
         "add %1, %1, %2                      \n\t" \
         "wldrd wr" #p ", [%1]                \n\t" \
         "wldrd wr" #q ", [%1, #8]            \n\t" \
         "wsadbz wr" #r ", wr" #r ", wr" #p " \n\t" \
         "wsadbz wr" #s ", wr" #s ", wr" #q " \n\t" \
         "waddw wr0, wr0, wr" #r "            \n\t" \
         "waddw wr0, wr0, wr" #s "            \n\t"

int vsad_intra16_iwmmxt(void *c, uint8_t *pix, uint8_t *dummy, int stride, int h)
{
     int s;

     assert(!(h&1));

     asm volatile("mov r1, %3                \n\t"
                  "wzero wr0                 \n\t"
                  "wldrd wr1, [%1]           \n\t"
                  "wldrd wr2, [%1, #8]       \n\t"
                  /* main loop */
                  "1:                        \n\t"
                  BODY(3, 4, 1, 2)
                  BODY(1, 2, 3, 4)
                  "subs r1, r1, #2           \n\t"
                  "bne 1b                    \n\t"
                  /* last step */
                  BODY(3, 4, 1, 2)
                  "textrmsw %0, wr0, #0      \n\t"
                  : "=r"(s), "+r"(pix)
                  : "r"(stride), "r"(h - 2)
                  : "r1", "memory");
     return s;
}

#undef BODY
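To check the register rotation of the asm above without IWMMXT hardware, a plain C model of it may help. This is a sketch under stated assumptions: each 64-bit wRn is modelled as a uint64_t, WSADBZ is assumed to zero its destination and store the sum of absolute byte differences, and the names wldrd(), wsadbz(), MODEL_BODY and vsad_intra16_model() are hypothetical. Like the asm, the do/while counter starts at h - 2 and steps by 2, so as far as I can tell it needs h >= 4: with h == 2 the counter would go from 0 to -2 and bne would not exit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* WLDRD model: load 8 bytes into a 64-bit "register". */
static uint64_t wldrd(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, 8);
    return v;
}

/* WSADBZ model (assumed semantics): sum of |a_i - b_i| over 8 bytes. */
static uint64_t wsadbz(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    for (int i = 0; i < 8; i++) {
        int x = (int)((a >> (8 * i)) & 0xff);
        int y = (int)((b >> (8 * i)) & 0xff);
        sum += (uint64_t)abs(x - y);
    }
    return sum;
}

/* One BODY(p, q, r, s): next row into (p, q), SADs replace (r, s). */
#define MODEL_BODY(p, q, r, s)                \
    do {                                      \
        pix += stride;                        \
        wr[p] = wldrd(pix);                   \
        wr[q] = wldrd(pix + 8);               \
        wr[r] = wsadbz(wr[r], wr[p]);         \
        wr[s] = wsadbz(wr[s], wr[q]);         \
        wr[0] += wr[r] + wr[s];  /* waddw */  \
    } while (0)

static int vsad_intra16_model(const uint8_t *pix, int stride, int h)
{
    uint64_t wr[5];
    int r1 = h - 2;                  /* mov r1, %3 */

    assert(!(h & 1) && h >= 4);
    wr[0] = 0;                       /* wzero wr0 */
    wr[1] = wldrd(pix);              /* first row, bytes 0..7 */
    wr[2] = wldrd(pix + 8);          /* first row, bytes 8..15 */
    do {                             /* 1: two row pairs per pass */
        MODEL_BODY(3, 4, 1, 2);
        MODEL_BODY(1, 2, 3, 4);
        r1 -= 2;                     /* subs r1, r1, #2 */
    } while (r1 != 0);               /* bne 1b */
    MODEL_BODY(3, 4, 1, 2);          /* last step: the odd (h - 1)th pair */
    return (int)(int32_t)wr[0];      /* textrmsw %0, wr0, #0 */
}

#undef MODEL_BODY
```

Comparing vsad_intra16_model() with the plain C loop on a few blocks confirms that the ping-pong between (wr1, wr2) and (wr3, wr4) plus the peeled step covers exactly h - 1 row pairs.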

If the number of iterations, H - 1 (not H), is always even, the last step
BODY(3, 4, 1, 2) is not needed (and the counter then starts at H - 1
instead of H - 2).
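A plain C sketch of that odd-H schedule, under the assumption that H is odd and at least 3 (vsad16_odd_h() and row_pair_sad16() are hypothetical names): the counter starts at h - 1 and every pass consumes exactly two row pairs, so nothing is left over for a trailing step.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* SAD of one row pair (16 pixels): row s against row s + stride. */
static int row_pair_sad16(const uint8_t *s, int stride)
{
    int sum = 0;
    for (int x = 0; x < 16; x++)
        sum += abs(s[x] - s[x + stride]);
    return sum;
}

/* Odd h: h - 1 row pairs is even, so a do/while that handles two pairs
 * per pass ends exactly at zero and needs no peeled last step. */
static int vsad16_odd_h(const uint8_t *s, int stride, int h)
{
    int score = 0;
    int n = h - 1;                    /* counter starts at h - 1 */

    assert((h & 1) && h >= 3);
    do {
        score += row_pair_sad16(s, stride);
        score += row_pair_sad16(s + stride, stride);
        s += 2 * stride;
        n -= 2;
    } while (n != 0);
    return score;
}
```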

As for the latencies, I don't see any (avoidable) ones here. If anyone
does, please point out the exact instruction sequence that causes the
stall; I'll check it against the manuals and try to redesign the code again.

Dmitry
