[FFmpeg-devel] [PATCH] Some IWMMXT functions for libavcodec #2
Siarhei Siamashka
siarhei.siamashka
Tue May 20 20:31:59 CEST 2008
On Tuesday 20 May 2008, Dmitry Antipov wrote:
> Yes, your test_cachemiss is ~13% slower on XScale too. So doing
> pre-increment if possible should be considered as a good idea (for pix_sum,
> it's at the cost of having 1 'sub').
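You can have pre-increment without any cost: load the first row without
writeback, then use a pre-indexed load with writeback for each of the
remaining 15 rows.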
> So, I'm voting for something like the following for pix_sum (it might be
> a bit unreadable without preprocessing :-):
[...]
> If 'pix' is fully cached, WMMX2 version is ~19% faster. Otherwise,
> it goes at the same speed as WMMX (it looks like loading uncached
> data is quite expensive, so an overhead introduced by 'add's is
> marginal).
Please add the following implementation of "pix_sum" function to your
benchmark set and post the results. I strongly suspect that it is a lot
faster than any of your variants.
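/* One row step: move to the next row with a pre-indexed load (writeback),
 * and accumulate the right half of the previous row, loaded one step earlier.
 * wr0 is zero, so wsadb adds the sum of the 8 bytes to wr3. */
#define SUM1() \
    "wldrd   wr1, [%1, %2]!  \n\t" \
    "wsadb   wr3, wr2, wr0   \n\t" \
    "wldrd   wr2, [%1, #8]   \n\t" \
    "wsadb   wr3, wr1, wr0   \n\t"

#define SUM4() \
    SUM1() \
    SUM1() \
    SUM1() \
    SUM1()

int pix_sum_iwmmxt2_pipelined(uint8_t *pix, int line_size)
{
    int s;
    asm volatile(
        "wldrd   wr1, [%1]       \n\t"  /* row 0, left 8 bytes */
        "wzero   wr0             \n\t"
        "wldrd   wr2, [%1, #8]   \n\t"  /* row 0, right 8 bytes */
        "wsadbz  wr3, wr1, wr0   \n\t"  /* wr3 = sum of row 0, left half */
        SUM1()                          /* rows 1..15: 3 + 3 * 4 steps */
        SUM1()
        SUM1()
        SUM4()
        SUM4()
        SUM4()
        "wsadb   wr3, wr2, wr0   \n\t"  /* row 15, right half */
        "textrmsw %0, wr3, #0    \n\t"
        : "=r"(s), "+r"(pix)
        : "r"(line_size)
        : "memory");                    /* the asm reads the pixel data */
    return s;
}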
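
To check the results while benchmarking, a plain C reference that walks the
16x16 block in the same order may be handy. This is only a sketch: the names
pix_sum_ref and sum8 are made up here and are not part of libavcodec, and it
assumes pix_sum means the sum of all 256 pixels of the block.

```c
#include <stdint.h>

/* Sum of 8 consecutive bytes, the scalar counterpart of wsadb against zero. */
static uint32_t sum8(const uint8_t *p)
{
    uint32_t s = 0;
    for (int i = 0; i < 8; i++)
        s += p[i];
    return s;
}

/* Hypothetical reference, structured like the pipelined asm:
 * prologue without pointer update, 15 pre-increment steps, epilogue. */
int pix_sum_ref(const uint8_t *pix, int line_size)
{
    uint32_t s = sum8(pix);            /* row 0, left half */
    const uint8_t *right = pix + 8;    /* pending right half (wr2 in the asm) */

    for (int row = 1; row < 16; row++) {
        pix += line_size;              /* pre-increment, like [%1, %2]! */
        s += sum8(right);              /* previous row, right half */
        right = pix + 8;
        s += sum8(pix);                /* current row, left half */
    }
    s += sum8(right);                  /* row 15, right half */
    return (int)s;
}
```

Comparing pix_sum_ref(pix, line_size) against the asm version on a few random
blocks, with line_size larger than 16, should catch addressing mistakes.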
--
Best regards,
Siarhei Siamashka