[FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86

Muhammad Faiz mfcc64 at gmail.com
Wed Jun 8 02:28:41 CEST 2016


On Tue, Jun 7, 2016 at 4:18 PM, Muhammad Faiz <mfcc64 at gmail.com> wrote:
> On Tue, Jun 7, 2016 at 10:36 AM, James Almer <jamrial at gmail.com> wrote:
>> On 6/4/2016 4:36 AM, Muhammad Faiz wrote:
>>> benchmark on x86_64
>>> cqt_time:
>>> plain = 3.292 s
>>> SSE   = 1.640 s
>>> SSE3  = 1.631 s
>>> AVX   = 1.395 s
>>> FMA3  = 1.271 s
>>> FMA4  = not available
>>
>> Try using the START_TIMER and STOP_TIMER macros to wrap the s->cqt_calc
>> call in libavfilter/avf_showcqt.c
>> It will potentially give more accurate results than the current
>> UPDATE_TIME(s->cqt_time) check.
>>
> OK, but I will probably check it privately (without sending a patch).
>

plain:
2339760 decicycles in cqt_calc,       1 runs,      0 skips
2305160 decicycles in cqt_calc,       2 runs,      0 skips
2248260 decicycles in cqt_calc,       4 runs,      0 skips
2211985 decicycles in cqt_calc,       8 runs,      0 skips
2195152 decicycles in cqt_calc,      16 runs,      0 skips
2188133 decicycles in cqt_calc,      32 runs,      0 skips
2182856 decicycles in cqt_calc,      64 runs,      0 skips
2182876 decicycles in cqt_calc,     128 runs,      0 skips
2178021 decicycles in cqt_calc,     256 runs,      0 skips
2178197 decicycles in cqt_calc,     512 runs,      0 skips
2173667 decicycles in cqt_calc,    1024 runs,      0 skips
2175272 decicycles in cqt_calc,    2048 runs,      0 skips
2171456 decicycles in cqt_calc,    4096 runs,      0 skips
2169706 decicycles in cqt_calc,    8192 runs,      0 skips
2169493 decicycles in cqt_calc,   16384 runs,      0 skips

sse:
1432400 decicycles in cqt_calc,       1 runs,      0 skips
1413420 decicycles in cqt_calc,       2 runs,      0 skips
1340840 decicycles in cqt_calc,       4 runs,      0 skips
1240880 decicycles in cqt_calc,       8 runs,      0 skips
1175592 decicycles in cqt_calc,      16 runs,      0 skips
1155657 decicycles in cqt_calc,      32 runs,      0 skips
1157220 decicycles in cqt_calc,      64 runs,      0 skips
1132563 decicycles in cqt_calc,     128 runs,      0 skips
1121175 decicycles in cqt_calc,     256 runs,      0 skips
1112374 decicycles in cqt_calc,     512 runs,      0 skips
1109323 decicycles in cqt_calc,    1024 runs,      0 skips
1102490 decicycles in cqt_calc,    2048 runs,      0 skips
1098801 decicycles in cqt_calc,    4096 runs,      0 skips
1100257 decicycles in cqt_calc,    8192 runs,      0 skips
1101172 decicycles in cqt_calc,   16384 runs,      0 skips

sse3:
1612720 decicycles in cqt_calc,       1 runs,      0 skips
1539780 decicycles in cqt_calc,       2 runs,      0 skips
1398232 decicycles in cqt_calc,       4 runs,      0 skips
1331866 decicycles in cqt_calc,       8 runs,      0 skips
1262878 decicycles in cqt_calc,      16 runs,      0 skips
1538833 decicycles in cqt_calc,      32 runs,      0 skips
1384517 decicycles in cqt_calc,      64 runs,      0 skips
1246595 decicycles in cqt_calc,     128 runs,      0 skips
1178879 decicycles in cqt_calc,     256 runs,      0 skips
1120117 decicycles in cqt_calc,     512 runs,      0 skips
1092902 decicycles in cqt_calc,    1024 runs,      0 skips
1077479 decicycles in cqt_calc,    2048 runs,      0 skips
1069110 decicycles in cqt_calc,    4096 runs,      0 skips
1067095 decicycles in cqt_calc,    8192 runs,      0 skips
1066812 decicycles in cqt_calc,   16383 runs,      1 skips

avx:
1333000 decicycles in cqt_calc,       1 runs,      0 skips
1261940 decicycles in cqt_calc,       2 runs,      0 skips
1082250 decicycles in cqt_calc,       4 runs,      0 skips
1036575 decicycles in cqt_calc,       8 runs,      0 skips
 977935 decicycles in cqt_calc,      16 runs,      0 skips
 950680 decicycles in cqt_calc,      32 runs,      0 skips
 950307 decicycles in cqt_calc,      64 runs,      0 skips
 959265 decicycles in cqt_calc,     128 runs,      0 skips
 943070 decicycles in cqt_calc,     256 runs,      0 skips
 931758 decicycles in cqt_calc,     512 runs,      0 skips
 929080 decicycles in cqt_calc,    1023 runs,      1 skips
 923407 decicycles in cqt_calc,    2046 runs,      2 skips
 918616 decicycles in cqt_calc,    4094 runs,      2 skips
 917359 decicycles in cqt_calc,    8189 runs,      3 skips
 916981 decicycles in cqt_calc,   16379 runs,      5 skips

fma3:
1050200 decicycles in cqt_calc,       1 runs,      0 skips
1019680 decicycles in cqt_calc,       2 runs,      0 skips
 969420 decicycles in cqt_calc,       4 runs,      0 skips
 945985 decicycles in cqt_calc,       8 runs,      0 skips
 905312 decicycles in cqt_calc,      16 runs,      0 skips
 964126 decicycles in cqt_calc,      32 runs,      0 skips
1041993 decicycles in cqt_calc,      64 runs,      0 skips
 969205 decicycles in cqt_calc,     128 runs,      0 skips
 917490 decicycles in cqt_calc,     256 runs,      0 skips
 885880 decicycles in cqt_calc,     512 runs,      0 skips
 867781 decicycles in cqt_calc,    1024 runs,      0 skips
 852242 decicycles in cqt_calc,    2048 runs,      0 skips
 844318 decicycles in cqt_calc,    4096 runs,      0 skips
 839100 decicycles in cqt_calc,    8191 runs,      1 skips
 836639 decicycles in cqt_calc,   16383 runs,      1 skips

Thanks
