[FFmpeg-devel] [PATCH]v5 Opus Pyramid Vector Quantization Search in x86 SIMD asm
Ivan Kalvachev
ikalvachev at gmail.com
Sat Jul 22 14:18:30 EEST 2017
This patch is ready for review and inclusion.
An explanation of what it does and how it works
can be found in the previous WIP threads:
[v1] http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212146.html
[v2] http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212816.html
[v3] http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213030.html
[v4] http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213436.html
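
As a rough orientation for readers who have not followed those threads, the idea behind a PVQ search can be sketched in plain scalar C. This is a generic greedy search written for illustration only. It is not the code from the patch, nor FFmpeg's C version, and the name pvq_search_sketch and its signature are made up here. It assumes the usual PVQ formulation: find an integer vector y with sum(|y[i]|) == K that maximizes (X.y)^2 / (y.y).

```c
#include <stddef.h>

/* Absolute value helper, so the sketch does not depend on libm. */
static float abs_f(float v)
{
    return v < 0.0f ? -v : v;
}

/*
 * Hypothetical illustration of a greedy PVQ search.
 * Places k unit pulses one at a time. Each pulse goes to the position
 * that maximizes (x.y)^2 / (y.y) for the updated vector y.
 * Works on |x| and restores the signs at the end.
 */
void pvq_search_sketch(const float *x, int *y, int n, int k)
{
    float xy = 0.0f; /* running sum of |x[i]| * y[i] */
    float yy = 0.0f; /* running sum of y[i] * y[i]   */

    for (int i = 0; i < n; i++)
        y[i] = 0;

    for (int pulse = 0; pulse < k; pulse++) {
        int   best     = 0;
        float best_num = -1.0f; /* guarantees the first candidate wins */
        float best_den = 1.0f;

        for (int i = 0; i < n; i++) {
            /* Correlation and energy if one more pulse lands on i. */
            float num = xy + abs_f(x[i]);
            float den = yy + 2.0f * (float)y[i] + 1.0f;
            num *= num;
            /* num/den > best_num/best_den, compared without dividing. */
            if (num * best_den > best_num * den) {
                best     = i;
                best_num = num;
                best_den = den;
            }
        }
        xy += abs_f(x[best]);
        yy += 2.0f * (float)y[best] + 1.0f;
        y[best]++;
    }

    /* Give each pulse count the sign of the matching input sample. */
    for (int i = 0; i < n; i++)
        if (x[i] < 0.0f)
            y[i] = -y[i];
}
```

The inner loop over i is the part that lends itself to SIMD, since every candidate position is evaluated independently before the best one is picked.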
The changes compared to WIP v4 are small:
- Use r4d ops to clear the high bits of int32 arguments.
- Correctly map the cglobal register usage.
- Use SSE4 instead of SSE42, since blend is only SSE4.1.
- Fix building with --disable-x86asm.
- Remove the testing defines.
  Loading constants in registers is (now) always the same speed or faster.
  Avoiding stall forwarding is faster on all CPUs except Ryzen.
  On Ryzen the alternative is about 7 cycles faster, which is why
  I've left that code in, disabled, but without a define.
  I've also left the two other defines, as they are useful
  for debugging and for producing binary-identical results
  to other algorithms.
- Disable the use of the 256-bit AVX2 variant.
  I'm leaving the code in the assembly, disabled,
  in case it is useful in the future.
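
To illustrate the first item in the list: on x86-64 an int32 argument arrives in a 64-bit register whose upper 32 bits are not guaranteed to be zero, and any write to the 32-bit view of a register (such as r4d) zero-extends into the full register. The C sketch below only models that effect. The function name clear_high_bits is made up for illustration and nothing here is taken from the patch.

```c
#include <stdint.h>

/*
 * Models the effect of an instruction like `mov r4d, r4d`:
 * the low 32 bits are kept and the high 32 bits become zero.
 * `reg` stands for the full 64-bit register holding an int32 argument.
 */
uint64_t clear_high_bits(uint64_t reg)
{
    return (uint32_t)reg; /* truncate, then zero-extend back to 64 bits */
}
```

For example, a register holding 0xDEADBEEF00000005 comes out as 5. Note that this is zero extension, so it only gives the intended value for non-negative arguments; a negative int32 would need sign extension instead.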
---
I'm including some of the benchmarks.
Some data has been removed, since it was used to test different methods.
Benchmarks were run at the default settings (96kbps),
but with different samples. All samples are over 1h long.
In summary, the function is about 2-3x faster
than the improved FFmpeg C version.
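
For anyone who wants to derive a speedup figure from the numbers below, here is a small C sketch. It takes the median of the repeated runs and divides the C time by the SIMD time. The helper names are made up, and treating the "NULL" row as fixed measurement overhead that can be subtracted is an assumption of this sketch, not something stated in the results.

```c
#include <stdlib.h>

/* qsort comparator for ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Median of n samples (n >= 1). Sorts the array in place. */
double median_of(int *v, size_t n)
{
    qsort(v, n, sizeof(*v), cmp_int);
    return (n % 2) ? (double)v[n / 2]
                   : 0.5 * ((double)v[n / 2 - 1] + (double)v[n / 2]);
}

/*
 * Speedup of the SIMD version over the C version.
 * Pass overhead = 0 for the raw ratio, or the "NULL" figure to
 * compare the times with the assumed overhead removed.
 */
double speedup(double c_time, double simd_time, double overhead)
{
    return (c_time - overhead) / (simd_time - overhead);
}
```

With the K10 rows below (C median 13846, SSE2 median 4184, NULL 706) this gives a raw ratio of roughly 3.3 and roughly 3.8 once the NULL figure is subtracted.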
===========================================================
K10 AMD Phenom(tm) II X4 945 Processor
//v4
706 706 706 706 706 // NULL
4146 4161 4169 4184 4188 4328 4379 // SSE2
4988 5015 5016 5030 5185 // USE_APPROXIMATION 0
13860 13828 13846 13846 13831 // C
===========================================================
Pentium Dual Core E5800
//v4
3006 3012 3019 3023 3025 // SSE2
9066 9071 9074 9077 9081 // C
//===========================================================
Ryzen 1800X
//v3
357 // NULL
1999 2001 2004 // AVX1 GCC
2010 2029 // SSE4 MSVC
2012 2026 2027 // AVX1 MSVC
2166 2170 2171 // AVX2 & STALL_WRITE_FORWARDING 1
2176 2179 2180 2180 2189 // AVX2
2226 2230 2234 // AVX2 & USE_APPROXIMATION 0
6216 6162 6162 // C only GCC
61909 61545 // C only MSVC
//v4
1931 1933 1935 // v4 AVX1
2096 2097 2098 // v4 AVX2 & STALL_WRITE_FORWARDING 1
2103 2110 2112 // v4 AVX2
//===========================================================
Intel(R) Core(TM) i7-3930K CPU
//v3
272 // NULL
1755 1756 1764 // AVX1
1847 1855 1866 // SSE4
2003 2009 // USE_APPROXIMATION 0
2103 2110 2112 // AVX2
4855 4856 // C only
//===========================================================
SkyLake i7 6700HQ
//v2
264 // NULL
1764 1765 1772 1773 1780 // SSE4
1782 1782 1787 1795 1796 // AVX1
1805 1807 1807 1811 1815 // AVX1 & USE_APPROXIMATION 0
1826 1827 1828 1833 1833 // SSE2
1850 1853 1857 1857 1868 // AVX2
6878 6934 6879 6921 6899 // C
-b:a   48kbps  96kbps  510kbps
sse4:  2049    1826    955
sse2:  2065    1874    943
avx:   2106    1868    950
c:     9202    7080    1392
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-SIMD-opus-pvq_search-implementation.patch
Type: text/x-patch
Size: 24414 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20170722/c9ec5d19/attachment.bin>