[FFmpeg-devel] [PATCH 0/4] cbs_h2645: Avoid memcpy when reading

Andreas Rheinhardt andreas.rheinhardt at googlemail.com
Tue Nov 20 13:38:35 EET 2018

The aim of this patchset is to avoid the memcpy that currently happens in
cbs_h2645_fragment_add_nals during the reading stage of cbs_h2645. This is
done by taking advantage of the way ff_h2645_packet_split (and internally
ff_h2645_extract_rbsp) works: If the NAL initially didn't contain any 0x03
escapes, then no copying is performed and the data and raw_data pointers of
the returned NAL unit coincide; otherwise the data is part of the H2645Packet's
rbsp_buffer. (A few trailing zeros in between NAL units can also make
ff_h2645_extract_rbsp copy. This happens with some transport streams.)
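
The pointer behaviour described above can be illustrated with a minimal sketch. This is not the code of ff_h2645_extract_rbsp; the function name unescape_rbsp and its signature are made up for illustration, and the trailing-zero case mentioned above is not modelled. It only shows the idea: when no 00 00 03 sequence is present, the input pointer itself is returned and nothing is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (NOT FFmpeg's ff_h2645_extract_rbsp).
 * Returns a pointer to the unescaped payload and stores its length in
 * *out_len. If src contains no 00 00 03 sequence, src itself is returned
 * and nothing is copied; otherwise the unescaped bytes are written to dst
 * (which must hold at least len bytes) and dst is returned. */
static const uint8_t *unescape_rbsp(const uint8_t *src, size_t len,
                                    uint8_t *dst, size_t *out_len)
{
    size_t i, si, di;

    assert(src && dst && out_len);

    /* Look for the first emulation prevention sequence 00 00 03. */
    for (i = 0; i + 2 < len; i++)
        if (src[i] == 0 && src[i + 1] == 0 && src[i + 2] == 3)
            break;

    if (i + 2 >= len) {
        /* No escape found: data and raw data coincide, no copy. */
        *out_len = len;
        return src;
    }

    /* Copy everything before the first escape, then unescape the rest. */
    memcpy(dst, src, i);
    si = di = i;
    while (si < len) {
        if (si + 2 < len &&
            src[si] == 0 && src[si + 1] == 0 && src[si + 2] == 3) {
            dst[di++] = 0;
            dst[di++] = 0;
            si += 3;            /* drop the 0x03 escape byte */
        } else {
            dst[di++] = src[si++];
        }
    }
    *out_len = di;
    return dst;
}
```

A caller can tell the two cases apart simply by comparing the returned pointer with src.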

So my first patch simply tests whether a NAL unit's data and raw_data pointers
agree; if so, then the data is part of the fragment's data and therefore we can
use the fragment's data_ref. Given that 0x03 escape bytes are rare, this already
gives a noticeable speed boost. Notice that the NAL->data pointer is a pointer
to const uint8_t, whereas a CBS unit's data is not const-qualified, but this
doesn't pose any problems: Even if one wanted to modify the data (if I am not
mistaken, then this doesn't happen right now, because all modifications are done
on the content), one would have to ensure that the data is writable before one
does so.
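
As a rough sketch of the check in the first patch: the Toy* types and functions below are hypothetical stand-ins for the fragment's data_ref, the NAL and the CBS unit, not the actual FFmpeg structures or the actual patch. If data and raw_data agree, the unit takes a new reference to the fragment's buffer; otherwise it falls back to a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All Toy* names are hypothetical stand-ins, not FFmpeg types. */
typedef struct ToyBuf  { uint8_t *data; size_t size; int refcount; } ToyBuf;
typedef struct ToyNal  { const uint8_t *data, *raw_data; size_t size; } ToyNal;
typedef struct ToyUnit { uint8_t *data; size_t size; ToyBuf *ref; } ToyUnit;

static ToyBuf *toy_buf_alloc(size_t size)
{
    ToyBuf *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->data = malloc(size ? size : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->size     = size;
    b->refcount = 1;
    return b;
}

/* Drop one reference; free the buffer when the last one is gone. */
static void toy_buf_unref(ToyBuf **pb)
{
    if (*pb && --(*pb)->refcount == 0) {
        free((*pb)->data);
        free(*pb);
    }
    *pb = NULL;
}

/* Returns 0 on success, -1 on allocation failure. */
static int toy_unit_from_nal(ToyUnit *unit, const ToyNal *nal, ToyBuf *frag_ref)
{
    assert(unit && nal && frag_ref);

    if (nal->data == nal->raw_data) {
        /* Nothing was unescaped, so the NAL lies inside the fragment's
         * buffer: share that buffer instead of copying. */
        frag_ref->refcount++;
        unit->ref  = frag_ref;
        unit->data = (uint8_t *)nal->data; /* const cast; a writer must
                                              check writability first */
    } else {
        /* The NAL was unescaped into a separate buffer: copy as before. */
        ToyBuf *copy = toy_buf_alloc(nal->size);
        if (!copy)
            return -1;
        memcpy(copy->data, nal->data, nal->size);
        unit->ref  = copy;
        unit->data = copy->data;
    }
    unit->size = nal->size;
    return 0;
}
```

The const cast is the only place where the qualifier mismatch mentioned above shows up; since the buffer is shared, anything that wants to modify unit->data would first have to make sure it holds the only reference.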

In order to avoid copying the other NAL units, too, I had to write an analogue
of av_fast_malloc for buffers: avpriv_buffer_fast_alloc. I chose avpriv,
because it is easier to add a function to the public API than to remove it. And
of course I also had to modify ff_h2645_packet_split slightly to work with it.
Notice that the cbs-filters all uninitialize the fragment when they are done
processing a packet, so the rbsp_buffer is writable again when the next packet
is decomposed and the number of reallocations for rbsp_buffer is not higher than
now. (The only exception to this is H.264/HEVC content in mp4/Matroska where the
SPS (or VPS in case of HEVC) in the extradata contains 0x03 escapes. This often
happens with typical PAL framerates if they are written in the VUI.)
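
To make the idea of an av_fast_malloc analogue for reference-counted buffers more concrete, here is a minimal sketch. It is not the avpriv_buffer_fast_alloc from the patch: RcBuf, rc_unref and rc_fast_alloc are hypothetical names, and the over-allocation formula is an assumption chosen for illustration. The buffer is reused only if nobody else holds a reference to it and it is large enough; otherwise our reference is dropped and a fresh buffer is allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer, not FFmpeg's AVBufferRef. */
typedef struct RcBuf { uint8_t *data; size_t size; int refcount; } RcBuf;

/* Drop one reference; free the buffer when the last one is gone. */
static void rc_unref(RcBuf **pb)
{
    if (*pb && --(*pb)->refcount == 0) {
        free((*pb)->data);
        free(*pb);
    }
    *pb = NULL;
}

/* Make sure *pbuf is a buffer of at least min_size bytes that only the
 * caller references. Returns 0 on success, -1 on allocation failure
 * (in which case *pbuf is NULL). */
static int rc_fast_alloc(RcBuf **pbuf, size_t min_size)
{
    RcBuf *b;

    assert(pbuf);
    b = *pbuf;

    /* Sole owner and big enough: reuse without reallocating. */
    if (b && b->refcount == 1 && b->size >= min_size)
        return 0;

    /* Either too small or still referenced elsewhere (e.g. by a unit):
     * give up our reference and allocate a new buffer. */
    rc_unref(pbuf);

    b = malloc(sizeof(*b));
    if (!b)
        return -1;
    /* Over-allocate a bit so small growth does not force a realloc
     * (the exact formula is an assumption of this sketch). */
    b->size = min_size + min_size / 16 + 32;
    b->data = malloc(b->size);
    if (!b->data) {
        free(b);
        return -1;
    }
    b->refcount = 1;
    *pbuf = b;
    return 0;
}
```

In this model, a fragment that has been uninitialized has released all unit references, so the next call takes the cheap reuse path; a reference that is still held elsewhere forces a new allocation.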

For lots of content, the gain that this last change yields is negligible, but
there is one kind of material that really benefits from it: Content with
hardcoded black bars. See the benchmarks.

In both situations, there is padding at the end of the new data, but the padding
isn't zeroed. I don't see a problem with this and anyway, this is the same as in
cbs_mpeg2, where the padding at the end of a unit is actually the beginning of
the next unit (except for the last unit of a packet, of course).

I have also modified the documentation of ff_h2645_packet_split to document the
behaviour that cbs_h2645 now relies upon.

Here are benchmarks where the timer includes both the calls to
ff_h2645_packet_split as well as cbs_h2645_fragment_add_nals in
cbs_h2645_split_fragment. Due to the change in ff_h2645_packet_split this is
the only admissible way of comparing when all patches are applied:

A 5.1 Mb/s file with 50p, no hardcoded black bars, one slice per frame. 8 runs
of 262144 runs each.
Current version:     107737 Decicycles
First patch applied:  76169 Decicycles
All patches applied:  75837 Decicycles

A 7.8 Mb/s file with 50p, hardcoded black bars, one slice per frame. 8 runs of
131072 runs each.
Current version:     379114 Decicycles
First patch applied: 369410 Decicycles
All patches applied: 327677 Decicycles

If one only measures the call to cbs_h2645_fragment_add_nals, the difference
gets bigger, of course. Because of the modifications to ff_h2645_packet_split
no benchmarks for the whole patchset are given.

First file again:
Current version:      36940 Decicycles
First patch applied:   6364 Decicycles

Second file again:
Current version:      60532 Decicycles
First patch applied:  48801 Decicycles

Andreas Rheinhardt (4):
  cbs_h2645: Avoid memcpy when splitting fragment
  avutil/buffer: Add av_fast_malloc equivalent
  h2645_parse: Make ff_h2645_packet_split reference-compatible
  cbs_h2645: Avoid memcpy when splitting fragment #2

 libavcodec/cbs_h2645.c             | 45 +++++++++++++++---------------
 libavcodec/cbs_h2645.h             |  2 ++
 libavcodec/extract_extradata_bsf.c |  4 +--
 libavcodec/h2645_parse.c           | 28 +++++++++++++++----
 libavcodec/h2645_parse.h           | 14 ++++++++--
 libavcodec/h264_parse.c            |  4 +--
 libavcodec/h264dec.c               |  6 ++--
 libavcodec/hevc_parse.c            |  5 ++--
 libavcodec/hevc_parser.c           |  4 +--
 libavcodec/hevcdec.c               |  4 +--
 libavutil/buffer.c                 | 37 ++++++++++++++++++++++++
 libavutil/buffer.h                 | 19 +++++++++++++
 12 files changed, 128 insertions(+), 44 deletions(-)

