[FFmpeg-devel] [PATCH 0/3] cbs: Improve performance of writing slices

Andreas Rheinhardt andreas.rheinhardt at googlemail.com
Sun Nov 4 06:48:39 EET 2018

When assembling slices in cbs_mpeg2, cbs_h264 and cbs_h265, a
combination of a bitreader and bitwriter is used to copy the data (after
the slice header has been assembled). They copy in blocks of 16 bits.

This is inefficient. For example, the bitreader can be eliminated by
first copying bits until the input is byte-aligned. If the bitwriter is
then byte-aligned as well, one can use memcpy to improve performance
(I got a more than 20x speed increase for copying the slice data if the
slices are big enough and properly aligned). If it is not byte-aligned,
one has nevertheless eliminated the shifting done in the bitreader.
Writing 32 bits at once instead of 16 also proved advantageous.
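
To illustrate the idea, here is a minimal standalone sketch. It is not
the code from the patches: BitWriter, bw_put and copy_slice_data are
hypothetical names, and the bit-by-bit writer stands in for FFmpeg's
real bitwriter only to keep the example short. It assumes an MSB-first
bitstream and a zero-initialised output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal MSB-first bit writer (not FFmpeg's PutBitContext). */
typedef struct BitWriter {
    uint8_t *buf;     /* output buffer, must be zero-initialised */
    size_t   size;    /* capacity in bytes */
    size_t   bit_pos; /* number of bits written so far */
} BitWriter;

/* Append the low n bits (0 <= n <= 32) of value, most significant first. */
static void bw_put(BitWriter *bw, uint32_t value, int n)
{
    assert(n >= 0 && n <= 32);
    assert(bw->bit_pos + (size_t)n <= bw->size * 8);
    for (int i = n - 1; i >= 0; i--) {
        if ((value >> i) & 1)
            bw->buf[bw->bit_pos >> 3] |= (uint8_t)(0x80 >> (bw->bit_pos & 7));
        bw->bit_pos++;
    }
}

/* Copy nbits bits from src, starting bit_off bits (0..7) into src[0]. */
static void copy_slice_data(BitWriter *bw, const uint8_t *src,
                            int bit_off, size_t nbits)
{
    assert(bit_off >= 0 && bit_off < 8);

    /* Step 1: copy bits until the input is byte-aligned. */
    if (bit_off && nbits) {
        int head = 8 - bit_off;
        if ((size_t)head > nbits)
            head = (int)nbits;
        bw_put(bw, (src[0] >> (8 - bit_off - head)) & ((1u << head) - 1), head);
        nbits -= (size_t)head;
        src++;
    }

    size_t nbytes = nbits / 8;
    if ((bw->bit_pos & 7) == 0) {
        /* Step 2a: the writer is aligned too, so whole bytes go via memcpy. */
        assert(bw->bit_pos / 8 + nbytes <= bw->size);
        memcpy(bw->buf + bw->bit_pos / 8, src, nbytes);
        bw->bit_pos += nbytes * 8;
    } else {
        /* Step 2b: the writer is misaligned; read whole input bytes
         * directly (no bitreader) and hand them over 32 bits at a time. */
        size_t i = 0;
        for (; i + 4 <= nbytes; i += 4) {
            uint32_t v = (uint32_t)src[i]     << 24 | (uint32_t)src[i + 1] << 16 |
                         (uint32_t)src[i + 2] <<  8 | (uint32_t)src[i + 3];
            bw_put(bw, v, 32);
        }
        for (; i < nbytes; i++)
            bw_put(bw, src[i], 8);
    }
    src += nbytes;

    /* Step 3: copy the remaining bits of the last, partial byte. */
    int tail = (int)(nbits & 7);
    if (tail)
        bw_put(bw, (uint32_t)(src[0] >> (8 - tail)), tail);
}
```

In this sketch the memcpy branch is taken whenever the writer happens
to be byte-aligned once the input has been aligned; otherwise only the
writer has to shift.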

The aligned case is very common:
For MPEG2, the slice header doesn't contain lots of interesting fields
to modify (e.g. the extra_information_slice is reserved), so that there
is not really a point in changing the slice at all. (One could actually
speed up the mpeg2_metadata filter further by not decomposing slices at
all.)
For H.264 CABAC mode and H.265, the slice header is always byte-aligned,
so that one would have to intentionally produce misaligned data to have

My patch aims to produce the same output as the current version,
with one exception: Currently, cbs_h264 and cbs_h265 assert that the
last few bits of input aren't zero as they are supposed to contain the
rbsp_stop_one_bit. This is probably done because the behaviour of ff_ctz
is undefined when its argument is zero. My version doesn't check for this
in the aligned mode, as ff_ctz isn't required here at all. And in the
unaligned mode, my version only checks the last 8 bits, whereas the
current version checks between 8 and 23 bits.
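
As an illustration of why a zero argument has to be ruled out, here is
a small sketch of locating the rbsp_stop_one_bit in the final byte.
The function name trailing_zero_bits is hypothetical, and a portable
loop is used in place of ff_ctz so that the zero case is well defined;
this is not the code from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Return how many zero bits follow the last set bit of last_byte (0..7),
 * or -1 if the byte is zero, i.e. it contains no stop bit at all.
 * A count-trailing-zeros primitive would be undefined for that input,
 * so the zero case is handled explicitly before counting. */
static int trailing_zero_bits(uint8_t last_byte)
{
    if (last_byte == 0)
        return -1;

    int n = 0;
    while (!(last_byte & 1)) {
        last_byte >>= 1;
        n++;
    }
    return n;
}
```

A caller could treat a return value of -1 as invalid input rather than
asserting.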

This is my first contribution on this mailing list and I tried my best
to follow your patch submission checklist. But I could not check fate,
although I tested my patch with several files and they created the same
output as the current version.

Andreas Rheinhardt (3):
  cbs_mpeg2: Improve performance of writing slices
  cbs_h264: Improve performance of writing slices
  cbs_h265: Improve performance of writing slices

 libavcodec/cbs_h2645.c | 139 ++++++++++++++++++++++++++++-------------
 libavcodec/cbs_mpeg2.c |  39 ++++++++----
 2 files changed, 122 insertions(+), 56 deletions(-)

