[FFmpeg-devel] [PATCH] modification of the MMX H264 MC chroma functions to support RV40

Mathieu Velten matmaul
Tue Dec 23 00:40:18 CET 2008


2008/12/22 Michael Niedermayer <michaelni at gmx.at>:
> how much does it slow h.264 down?
> please put START/STOP_TIMER surrounding the call to the MC code
> and tell us the results.

Sorry, I am not familiar with this method, so I will just paste the results.
Could you explain to me what the results mean?
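For reference, here is a rough C model of the bookkeeping the timer macros appear to do. It is modeled on the START_TIMER/STOP_TIMER macros in FFmpeg's libavutil, and the exact outlier rule may differ between versions. Each sample is a cycle count. It is kept ("runs") unless it is at least 8 times the current mean and at least 2000 cycles, in which case it is discarded ("skips"). A line is printed whenever runs + skips reaches a power of two, showing the mean of the kept samples in dezicycles (tenths of a cycle). `TimerStats` and `timer_add_sample` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the static counters that one STOP_TIMER(id)
 * call site keeps between calls. */
typedef struct TimerStats {
    uint64_t tsum;   /* sum of the kept samples, in CPU cycles */
    int tcount;      /* "runs": samples that were kept */
    int tskip_count; /* "skips": samples rejected as outliers */
} TimerStats;

/* Account one measured duration (in cycles). Returns 1 when a line would be
 * printed, i.e. when runs + skips is a power of two, and stores the running
 * average in dezicycles (tenths of a cycle) in *report. */
static int timer_add_sample(TimerStats *s, uint64_t cycles, uint64_t *report)
{
    /* Outlier limit: 8x the current mean, but never below 2000 cycles. */
    uint64_t limit = s->tcount ? 8 * s->tsum / (uint64_t)s->tcount : 0;
    if (limit < 2000)
        limit = 2000;

    /* The first two samples are always kept. */
    if (s->tcount < 2 || cycles < limit) {
        s->tsum += cycles;
        s->tcount++;
    } else {
        s->tskip_count++;
    }

    int total = s->tcount + s->tskip_count;
    if ((total & (total - 1)) == 0) {
        assert(s->tcount > 0);
        *report = s->tsum * 10 / (uint64_t)s->tcount;
        return 1;
    }
    return 0;
}
```

This matches the shape of the output below. For example, "937 dezicycles in mc8_rnd, 4095 runs, 1 skips" was printed because 4095 + 1 = 4096, and it means about 93.7 cycles per call on average.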

For mc8_rnd:
Without the patch:
57080 dezicycles in mc8_rnd, 1 runs, 0 skips
31905 dezicycles in mc8_rnd, 2 runs, 0 skips
16360 dezicycles in mc8_rnd, 4 runs, 0 skips
9010 dezicycles in mc8_rnd, 8 runs, 0 skips
4840 dezicycles in mc8_rnd, 16 runs, 0 skips
2752 dezicycles in mc8_rnd, 32 runs, 0 skips
1716 dezicycles in mc8_rnd, 64 runs, 0 skips
1299 dezicycles in mc8_rnd, 128 runs, 0 skips
1031 dezicycles in mc8_rnd, 256 runs, 0 skips
983 dezicycles in mc8_rnd, 512 runs, 0 skips
1092 dezicycles in mc8_rnd, 1024 runs, 0 skips
1002 dezicycles in mc8_rnd, 2048 runs, 0 skips
937 dezicycles in mc8_rnd, 4095 runs, 1 skips
915 dezicycles in mc8_rnd, 8191 runs, 1 skips
986 dezicycles in mc8_rnd, 16381 runs, 3 skips
990 dezicycles in mc8_rnd, 32765 runs, 3 skips
948 dezicycles in mc8_rnd, 65531 runs, 5 skips
989 dezicycles in mc8_rnd, 131049 runs, 23 skips
1366 dezicycles in mc8_rnd, 262012 runs, 132 skips
1578 dezicycles in mc8_rnd, 523947 runs, 341 skips
1720 dezicycles in mc8_rnd, 1047798 runs, 778 skips
1962 dezicycles in mc8_rnd, 2094856 runs, 2296 skips
2100 dezicycles in mc8_rnd, 4188963 runs, 5341 skips

With the patch:
11620 dezicycles in mc8_rnd, 1 runs, 0 skips
8350 dezicycles in mc8_rnd, 2 runs, 0 skips
5497 dezicycles in mc8_rnd, 4 runs, 0 skips
3081 dezicycles in mc8_rnd, 8 runs, 0 skips
1871 dezicycles in mc8_rnd, 16 runs, 0 skips
1266 dezicycles in mc8_rnd, 32 runs, 0 skips
961 dezicycles in mc8_rnd, 64 runs, 0 skips
895 dezicycles in mc8_rnd, 128 runs, 0 skips
857 dezicycles in mc8_rnd, 256 runs, 0 skips
831 dezicycles in mc8_rnd, 512 runs, 0 skips
965 dezicycles in mc8_rnd, 1024 runs, 0 skips
915 dezicycles in mc8_rnd, 2048 runs, 0 skips
887 dezicycles in mc8_rnd, 4095 runs, 1 skips
888 dezicycles in mc8_rnd, 8190 runs, 2 skips
958 dezicycles in mc8_rnd, 16382 runs, 2 skips
945 dezicycles in mc8_rnd, 32765 runs, 3 skips
927 dezicycles in mc8_rnd, 65533 runs, 3 skips
976 dezicycles in mc8_rnd, 131052 runs, 20 skips
1351 dezicycles in mc8_rnd, 262032 runs, 112 skips
1572 dezicycles in mc8_rnd, 523987 runs, 301 skips
1720 dezicycles in mc8_rnd, 1047854 runs, 722 skips
1967 dezicycles in mc8_rnd, 2095038 runs, 2114 skips
2105 dezicycles in mc8_rnd, 4189391 runs, 4913 skips

For mc4:
With the patch:
9360 dezicycles in mc4, 1 runs, 0 skips
6735 dezicycles in mc4, 2 runs, 0 skips
3962 dezicycles in mc4, 4 runs, 0 skips
3265 dezicycles in mc4, 8 runs, 0 skips
2655 dezicycles in mc4, 16 runs, 0 skips
2317 dezicycles in mc4, 32 runs, 0 skips
2374 dezicycles in mc4, 64 runs, 0 skips
2111 dezicycles in mc4, 128 runs, 0 skips
1831 dezicycles in mc4, 256 runs, 0 skips
1945 dezicycles in mc4, 512 runs, 0 skips
1882 dezicycles in mc4, 1024 runs, 0 skips
1850 dezicycles in mc4, 2048 runs, 0 skips
1738 dezicycles in mc4, 4096 runs, 0 skips
1658 dezicycles in mc4, 8192 runs, 0 skips
1605 dezicycles in mc4, 16384 runs, 0 skips
1597 dezicycles in mc4, 32768 runs, 0 skips
1564 dezicycles in mc4, 65534 runs, 2 skips
1562 dezicycles in mc4, 131069 runs, 3 skips
1551 dezicycles in mc4, 262138 runs, 6 skips
1575 dezicycles in mc4, 524280 runs, 8 skips
1583 dezicycles in mc4, 1048564 runs, 12 skips

Without the patch:
11350 dezicycles in mc4, 1 runs, 0 skips
6585 dezicycles in mc4, 2 runs, 0 skips
4682 dezicycles in mc4, 4 runs, 0 skips
5298 dezicycles in mc4, 8 runs, 0 skips
3505 dezicycles in mc4, 16 runs, 0 skips
2822 dezicycles in mc4, 32 runs, 0 skips
2672 dezicycles in mc4, 64 runs, 0 skips
2329 dezicycles in mc4, 128 runs, 0 skips
1953 dezicycles in mc4, 256 runs, 0 skips
1997 dezicycles in mc4, 512 runs, 0 skips
1941 dezicycles in mc4, 1024 runs, 0 skips
1906 dezicycles in mc4, 2048 runs, 0 skips
1763 dezicycles in mc4, 4095 runs, 1 skips
1675 dezicycles in mc4, 8191 runs, 1 skips
1603 dezicycles in mc4, 16382 runs, 2 skips
1589 dezicycles in mc4, 32766 runs, 2 skips
1554 dezicycles in mc4, 65532 runs, 4 skips
1554 dezicycles in mc4, 131058 runs, 14 skips
1539 dezicycles in mc4, 262117 runs, 27 skips
1564 dezicycles in mc4, 524227 runs, 61 skips
1571 dezicycles in mc4, 1048452 runs, 124 skips
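This compares the final averages (the largest run counts) above as a rough answer to "how much does it slow h.264 down?". One dezicycle is a tenth of a cycle.

```latex
\begin{aligned}
\text{mc8\_rnd:}\quad & \frac{2105 - 2100}{2100} \approx 0.24\,\%
  \quad (0.5 \text{ cycles per call}) \\
\text{mc4:}\quad & \frac{1583 - 1571}{1571} \approx 0.76\,\%
  \quad (1.2 \text{ cycles per call})
\end{aligned}
```

This is only a rough comparison. The two runs also discarded different numbers of outlier samples (without/with: 5341/4913 skips for mc8_rnd, 124/12 for mc4).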

>> @@ -45,17 +46,16 @@
>>          /* 1 dimensional filter only */
>>          const int dxy = x ? 1 : stride;
>>
>> -        rnd_reg = rnd ? &ff_pw_4 : &ff_pw_3;
>> -
>>          __asm__ volatile(
>>              "movd %0, %%mm5\n\t"
>>              "movq %1, %%mm4\n\t"
>> -            "movq %2, %%mm6\n\t"         /* mm6 = rnd */
>> +            "movq %2, %%mm6\n\t"
>
>> +            "psrlw $3, %%mm6\n\t"        /* mm6 = bias >> 3 */
>
> is this a useless instruction that can be merged into the table?
>

I could do it in C (bias_reg = ff_pw_tab[bias>>3]) instead of shifting the
MMX register itself, but I'm not sure we would gain any performance.
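The `psrlw $3` can be understood with a scalar model of the bilinear chroma filter. The 2D filter weights sum to 64 and it shifts by 6. The 1D filter weights sum to 8 and it shifts by 3, so its rounding constant is the 2D bias shifted right by 3. This sketch assumes the 2D constants are 32 (rounding) and 28 (no_rnd). Those values are consistent with the ff_pw_4/ff_pw_3 pair that the shift replaces, since 32 >> 3 = 4 and 28 >> 3 = 3. `chroma_mc_pixel` is a hypothetical helper, not FFmpeg's API.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of bilinear chroma MC for one output pixel at eighth-pel
 * offset (x, y), 0 <= x, y < 8. `bias` is the rounding constant of the 2D
 * filter, which shifts by 6 (assumed: 32 to round, 28 for no_rnd). */
static uint8_t chroma_mc_pixel(const uint8_t *src, int stride,
                               int x, int y, int bias)
{
    assert(x >= 0 && x < 8 && y >= 0 && y < 8);
    if (x && y) {
        const int A = (8 - x) * (8 - y), B = x * (8 - y);
        const int C = (8 - x) * y, D = x * y;   /* A + B + C + D == 64 */
        return (uint8_t)((A * src[0] + B * src[1] +
                          C * src[stride] + D * src[stride + 1] + bias) >> 6);
    }
    /* 1D filter: weights sum to 8, so shift by 3 and round with bias >> 3,
     * which is the value the "psrlw $3" leaves in mm6. */
    const int dxy = x ? 1 : stride;  /* horizontal or vertical neighbour */
    const int t = x | y;             /* the non-zero offset, or 0 */
    return (uint8_t)(((8 - t) * src[0] + t * src[dxy] + (bias >> 3)) >> 3);
}
```

Computing `bias >> 3` in C before the asm gives the same constant. The only difference is whether the shift costs one instruction per call or is folded into the table lookup.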

> [...]
>> Index: libavcodec/x86/h264dsp_mmx.c
>> ===================================================================
>> --- libavcodec/x86/h264dsp_mmx.c      (revision 16270)
>> +++ libavcodec/x86/h264dsp_mmx.c      (working copy)
>> @@ -19,6 +19,7 @@
>>   */
>>
>>  #include "dsputil_mmx.h"
>> +#include "libavcodec/rv40data.h"
>
> duplicating lots of tables ...
>

Then I could just copy the small rv40_bias table I need into dsputil_mmx.c,
or I could create an rv40data.c file and use extern, whichever you prefer.
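Here is a minimal single-file sketch of the second option: one definition plus an `extern` declaration. A header declares the table and exactly one .c file defines it, so every user links against the same object. By contrast, a `static const` table defined in a header gives every includer its own copy. The file names and `shared_rv40_bias` are hypothetical. The table values are an assumption reproduced from memory of the RV40 C code and should be checked against the decoder's own table.

```c
#include <assert.h>

/* Header part (hypothetical rv40data.h): a declaration only, so files
 * that include it do not each emit their own copy of the data. */
extern const int shared_rv40_bias[4][4];

/* Source part (hypothetical rv40data.c): the single definition that both
 * the RV40 decoder and the MMX code would link against.
 * Values are assumed for illustration; verify against the decoder. */
const int shared_rv40_bias[4][4] = {
    {  0, 16, 32, 16 },
    { 32, 28, 32, 28 },
    {  0, 32, 16, 32 },
    { 32, 28, 32, 28 },
};
```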

A cleaner patch is attached, with rv40_bias copied into dsputil_mmx.c.

Mathieu Velten
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rv40_mc_mmx_v2.diff
Type: text/x-diff
Size: 11805 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081223/a0859a86/attachment.diff>


