[FFmpeg-devel] [PATCH] lavc/aacenc_utils: unroll abs_pow34_v loop

Ganesh Ajjanagadde gajjanag at gmail.com
Sat Mar 19 06:12:14 CET 2016


As far as I can tell, size is a multiple of 4 at every call site. This
requirement is now documented with an av_assert2().

This speeds up the function by roughly 20% (about 1385 -> 1107 decicycles)
and gives a small (about 1%) speedup for AAC encoding overall.

Sample benchmark (Haswell, -march=native + GCC):
old:
   [...]
   1390 decicycles in abs_pow34_v,  127138 runs,   3934 skips63.1x
   1385 decicycles in abs_pow34_v,  254191 runs,   7953 skips64.4x
   1383 decicycles in abs_pow34_v,  508305 runs,  15983 skips65.3x

new:
   [...]
   1109 decicycles in abs_pow34_v,  127122 runs,   3950 skips61.2x
   1107 decicycles in abs_pow34_v,  254177 runs,   7967 skips63.5x
   1106 decicycles in abs_pow34_v,  508292 runs,  15996 skips65.3x

Overall AAC encoding time, old:
ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.55s user 0.03s system 99% cpu 4.581 total
new:
ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.50s user 0.04s system 99% cpu 4.537 total

Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
---
 libavcodec/aacenc_utils.h | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
index 0203b6e..800b78f 100644
--- a/libavcodec/aacenc_utils.h
+++ b/libavcodec/aacenc_utils.h
@@ -37,20 +37,26 @@
 #define ROUND_TO_ZERO 0.1054f
 #define C_QUANT 0.4054f
 
-static inline void abs_pow34_v(float *av_restrict out, const float *av_restrict in, const int size)
-{
-    int i;
-    for (i = 0; i < size; i++) {
-        float a = fabsf(in[i]);
-        out[i] = sqrtf(a * sqrtf(a));
-    }
-}
-
 static inline float pos_pow34(float a)
 {
     return sqrtf(a * sqrtf(a));
 }
 
+static inline void abs_pow34_v(float *av_restrict out, const float *av_restrict in, const int size)
+{
+    av_assert2(!(size % 4));
+    for (int i = 0; i < size; i+=4) {
+        float a0 = fabsf(in[i]);
+        float a1 = fabsf(in[i+1]);
+        float a2 = fabsf(in[i+2]);
+        float a3 = fabsf(in[i+3]);
+        out[i  ] = pos_pow34(a0);
+        out[i+1] = pos_pow34(a1);
+        out[i+2] = pos_pow34(a2);
+        out[i+3] = pos_pow34(a3);
+    }
+}
+
 /**
  * Quantize one coefficient.
  * @return absolute value of the quantized coefficient
-- 
2.7.3
