[FFmpeg-user] key frame

Richard Bartczak richard.bartczak at gmx.de
Fri Jun 28 00:55:06 EEST 2024


On 27.06.24 at 22:39, Mark Filipak wrote:
> Hello All,
>
> I'm considering buying professional video software to evaluate and
> analyze FFmpeg trims and splices and for troubleshooting. My objective
> is to improve my edits, and to improve FFmpeg. I'm retired, I have
> plenty of time, I have plenty of money.
>
> From here:
> https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/uc_system/design/guides/videodg/vidguide/basics.html
>
> "I-frames are also known as key frames because their content is
> independent of any other frames and they can be used as a reference
> for other frames."
>
> If "key frame" is simply another name for an I-frame, why are there
> two names? pdr0 & Balling at trac.ffmpeg.org hint that key frames are
> specific I-frames with specific methods but they don't elaborate and I
> don't want to burden them.
>
> I'd appreciate an explanation of 'key frame', or a link to an existing
> explanation of course. All I can find is keyframe animation, which of
> course is a technique, not an MPEG method.
>
> Thank you,
> Mark.
>
The basics, straight from the ITU terminology:

e.g.: ITU-T Rec. H.262 (2000 E),

"ISO/IEC 13818-2 : 2000 (E)
3  Definitions
For the purposes of this Recommendation | International Standard, the
following definitions apply.
3.1
AC coefficient: Any DCT coefficient for which the frequency in one or
both dimensions is non-zero.
3.2
big picture: A coded picture that would cause VBV buffer underflow as
defined in C.7. Big pictures can only
occur in sequences where low_delay is equal to 1. "Skipped picture" is a
term that is sometimes used to describe the
same concept.
3.3 B-field picture: A field structure B-Picture.
3.4 B-frame picture: A frame structure B-Picture.
3.5
B-picture; bidirectionally predictive-coded picture: A picture that is
coded using motion compensated
prediction from past and/or future reference fields or frames.
3.6
backward compatibility: A newer coding standard is backward compatible
with an older coding standard if
decoders designed to operate with the older coding standard are able to
continue to operate by decoding all or part of a
bitstream produced according to the newer coding standard.
3.7
backward motion vector: A motion vector that is used for motion
compensation from a reference frame or
reference field at a later time in display order.
3.8 backward prediction: Prediction from the future reference frame (field).
3.9 base layer: First, independently decodable layer of a scalable hierarchy.
3.10 bitstream; stream: An ordered series of bits that forms the coded
representation of the data.
3.11 bitrate: The rate at which the coded bitstream is delivered from the
storage medium to the input of a decoder.
3.12 block: An 8-row by 8-column matrix of samples, or 64 DCT
coefficients (source, quantised or dequantised).
3.13
bottom field: One of two fields that comprise a frame. Each line of a
bottom field is spatially located
immediately below the corresponding line of the top field.
3.14
byte aligned: A bit in a coded bitstream is byte-aligned if its position
is a multiple of 8 bits from the first bit in
the stream.
3.15
byte: Sequence of 8 bits.
3.16
channel: A digital medium that stores or transports a bitstream
constructed according to ITU-T Rec. H.262 |
ISO/IEC 13818-2.
3.17
chrominance format: Defines the number of chrominance blocks in a
macroblock.
3.18
chroma simulcast: A type of scalability (which is a subset of SNR
scalability) where the enhancement layer(s)
contain only coded refinement data for the DC coefficients, and all the
data for the AC coefficients, of the chrominance
components.
3.19
chrominance component: A matrix, block or single sample representing one
of the two colour difference
signals related to the primary colours in the manner defined in the
bitstream. The symbols used for the chrominance
signals are Cr and Cb.
3.20 coded B-frame: A B-frame picture or a pair of B-field pictures.
3.21 coded frame: A coded frame is a coded I-frame, a coded P-frame or a
coded B-frame.
3.22
coded I-frame: An I-frame picture or a pair of field pictures, where the
first field picture is an I-picture and the
second field picture is an I-picture or a P-picture.
3.23
coded P-frame: A P-frame picture or a pair of P-field pictures.
3.24
coded picture: A coded picture is made of a picture header, the optional
extensions immediately following it,
and the following picture data. A coded picture may be a coded frame or
a coded field.
3.25
coded video bitstream: A coded representation of a series of one or more
pictures as defined in ITU-T
Rec. H.262 | ISO/IEC 13818-2.
3.26
coded order: The order in which the pictures are transmitted and
decoded. This order is not necessarily the
same as the display order.
3.27
coded representation: A data element as represented in its encoded form.
3.28
coding parameters: The set of user-definable parameters that
characterise a coded video bitstream. Bitstreams
are characterised by coding parameters. Decoders are characterised by
the bitstreams that they are capable of decoding.
3.29
component: A matrix, block or single sample from one of the three
matrices (luminance and two chrominance)
that make up a picture.
3.30 compression: Reduction in the number of bits used to represent an
item of data.
3.31 constant bitrate coded video: A coded video bitstream with a
constant bitrate.
3.32 constant bitrate: Operation where the bitrate is constant from start
to finish of the coded bitstream.
3.33 data element: An item of data as represented before encoding and
after decoding.
3.34
data partitioning: A method for dividing a bitstream into two separate
bitstreams for error resilience purposes.
The two bitstreams have to be recombined before decoding.
3.35 D-Picture: A type of picture that shall not be used except in
ISO/IEC 11172-2.
3.36 DC coefficient: The DCT coefficient for which the frequency is zero
in both dimensions.
3.37 DCT coefficient: The amplitude of a specific cosine basis function.
3.38 decoder input buffer: The First-In First-Out (FIFO) buffer specified
in the video buffering verifier.
3.39 decoder: An embodiment of a decoding process.
3.40
decoding (process): The process defined in ITU-T Rec. H.262 | ISO/IEC
13818-2 that reads an input coded
bitstream and produces decoded pictures.
3.41
dequantisation: The process of rescaling the quantised DCT coefficients
after their representation in the
bitstream has been decoded and before they are presented to the inverse DCT.
3.42
digital storage media (DSM): A digital storage or transmission device or
system.
3.43
discrete cosine transform (DCT): Either the forward discrete cosine
transform or the inverse discrete
cosine transform. The DCT is an invertible, discrete orthogonal
transformation. The inverse DCT is defined in Annex A
of ITU-T Rec. H.262 | ISO/IEC 13818-2.
3.44
display aspect ratio: The ratio height/width (in spatial measurement
units such as centimeters) of the intended
display.
3.45
display order: The order in which the decoded pictures are displayed.
Normally this is the same order in
which they were presented at the input of the encoder.
3.46
display process: The (non-normative) process by which reconstructed
frames are displayed.
3.47
dual-prime prediction: A prediction mode in which two forward
field-based predictions are averaged. The
predicted block size is 16 × 16 luminance samples.
3.48
editing: The process by which one or more coded bitstreams are
manipulated to produce a
new coded bitstream. Conforming edited bitstreams must meet the
requirements defined in ITU-T Rec. H.262 |
ISO/IEC 13818-2.
3.49
encoder: An embodiment of an encoding process.
3.50
encoding (process): A process, not specified in ITU-T Rec. H.262 |
ISO/IEC 13818-2, that reads a
stream of input pictures and produces a valid coded bitstream as defined
in ITU-T Rec. H.262 | ISO/IEC 13818-2.
3.51
enhancement layer: A relative reference to a layer (above the base
layer) in a scalable hierarchy. For all forms
of scalability, its decoding process can be described by reference to
the lower layer decoding process and the appropriate
additional decoding process for the enhancement layer itself.
3.52
fast forward playback: The process of displaying a sequence, or parts of
a sequence, of pictures in display-
order faster than real-time.
3.53
fast reverse playback: The process of displaying the picture sequence in
the reverse of display order faster
than real-time.
3.54
field: For an interlaced video signal, a "field" is the assembly of
alternate lines of a frame. Therefore an
interlaced frame is composed of two fields, a top field and a bottom field.
3.55
field-based prediction: A prediction mode using only one field of the
reference frame. The predicted block
size is 16 × 16 luminance samples.
3.56
field period: The reciprocal of twice the frame rate.
3.57
field picture; field structure picture: A field structure picture is a
coded picture with picture_structure equal to "Top field" or "Bottom field".
3.58
flag: A one bit integer variable which may take one of only two values
(zero and one).
3.59
forbidden: The term "forbidden" when used in the clauses defining the
coded bitstream indicates that the value
shall never be used. This is usually to avoid emulation of start codes.
3.60
forced updating: The process by which macroblocks are intra-coded from
time-to-time to ensure that
mismatch errors between the inverse DCT processes in encoders and
decoders cannot build up excessively.
3.61
forward compatibility: A newer coding standard is forward compatible
with an older coding standard if
decoders designed to operate with the newer coding standard are able to
decode bitstreams of the older coding standard.
3.62
forward motion vector: A motion vector that is used for motion
compensation from a reference frame or
reference field at an earlier time in display order.
3.63
forward prediction: Prediction from the past reference frame (field).
3.64
frame: A frame contains lines of spatial information of a video signal.
For progressive video, these lines
contain samples starting from one time instant and continuing through
successive lines to the bottom of the frame. For
interlaced video, a frame consists of two fields, a top field and a
bottom field. One of these fields will commence one
field period later than the other.
3.65 frame-based prediction: A prediction mode using both fields of the
reference frame.
3.66 frame period: The reciprocal of the frame rate.
3.67
frame picture; frame structure picture: A frame structure picture is a
coded picture with picture_structure equal to "Frame".
3.68
frame rate: The rate at which frames are output from the decoding process.
3.69
future reference frame (field): A future reference frame (field) is a
reference frame (field) that occurs at a
later time than the current picture in display order.
3.70
frame re-ordering: The process of re-ordering the reconstructed frames
when the coded order is different
from the display order. Frame re-ordering occurs when B-frames are
present in a bitstream. There is no frame re-ordering
when decoding low delay bitstreams.
3.71
group of pictures: A notion defined only in ISO/IEC 11172-2 (MPEG-1
Video). In ITU-T Rec. H.262 |
ISO/IEC 13818-2, a similar functionality can be achieved by the mean of
inserting group of pictures headers.
3.72
header: A block of data in the coded bitstream containing the coded
representation of a number of data
elements pertaining to the coded data that follow the header in the
bitstream.
3.73
hybrid scalability: Hybrid scalability is the combination of two (or
more) types of scalability.
3.74
interlace: The property of conventional television frames where
alternating lines of the frame represent
different instances in time. In an interlaced frame, one of the field is
meant to be displayed first. This field is called the
first field. The first field can be the top field or the bottom field of
the frame.
3.75 I-field picture: A field structure I-Picture.
3.76 I-frame picture: A frame structure I-Picture.
3.77 I-picture; intra-coded picture: A picture coded using information
only from itself.
3.78 intra coding: Coding of a macroblock or picture that uses
information only from that macroblock or picture.
3.78.1 Inverse DCT, IDCT: Inverse discrete cosine transform, as defined
in Annex A.
3.79
level: A defined set of constraints on the values which may be taken by
the parameters of ITU-T Rec. H.262 |
ISO/IEC 13818-2 within a particular profile. A profile may contain one
or more levels. In a different context, level is the
absolute value of a non-zero coefficient (see "run").
3.80
layer: In a scalable hierarchy denotes one out of the ordered set of
bitstreams and (the result of) its associated
decoding process (implicitly including decoding of all layers below this
layer).
3.81
layer bitstream: A single bitstream associated to a specific layer
(always used in conjunction with layer
qualifiers, e. g. "enhancement layer bitstream").
3.82
lower layer: A relative reference to the layer immediately below a given
enhancement layer (implicitly
including decoding of all layers below this enhancement layer).
3.83
luminance component: A matrix, block or single sample representing a
monochrome representation of the
signal and related to the primary colours in the manner defined in the
bitstream. The symbol used for luminance is Y.
3.84
Mbit: 1 000 000 bits.
3.85
macroblock: The four 8 by 8 blocks of luminance data and the two (for
4:2:0 chrominance format), four
(for 4:2:2 chrominance format) or eight (for 4:4:4 chrominance format)
corresponding 8 by 8 blocks of chrominance data
coming from a 16 by 16 section of the luminance component of the
picture. Macroblock is sometimes used to refer to the
sample data and sometimes to the coded representation of the sample
values and other data elements defined in the
macroblock header of the syntax defined in ITU-T Rec. H.262 | ISO/IEC
13818-2. The usage is clear from the context.
3.86
motion compensation: The use of motion vectors to improve the efficiency
of the prediction of sample values.
The prediction uses motion vectors to provide offsets into the past
and/or future reference frames or reference fields
containing previously decoded sample values that are used to form the
prediction error.
3.87
motion estimation: The process of estimating motion vectors during the
encoding process.
3.88
motion vector: A two-dimensional vector used for motion compensation
that provides an offset from the
coordinate position in the current picture or field to the coordinates
in a reference frame or reference field.
3.89
non-intra coding: Coding of a macroblock or picture that uses
information both from itself and from
macroblocks and pictures occurring at other times.
3.90 opposite parity: The opposite parity of top is bottom, and vice versa.
3.91 P-field picture: A field structure P-Picture.
3.92 P-frame picture: A frame structure P-Picture.
3.93
P-picture; predictive-coded picture: A picture that is coded using
motion compensated prediction from past
reference fields or frame.
3.94
parameter: A variable within the syntax of ITU-T Rec. H.262 | ISO/IEC
13818-2 which may take one of a
range of values. A variable which can take one of only two values is
called a flag.
3.95
parity (of field): The parity of a field can be top or bottom.
3.96
past reference frame (field): A past reference frame (field) is a
reference frame (field) that occurs at an earlier
time than the current picture in display order.
3.97
picture: Source, coded or reconstructed image data. A source or
reconstructed picture consists of three
rectangular matrices of 8-bit numbers representing the luminance and two
chrominance signals. A "coded picture" is
defined in 3.21 of ITU-T Rec. H.262 | ISO/IEC 13818-2. For progressive
video, a picture is identical to a frame, while
for interlaced video, a picture can refer to a frame, or the top field
or the bottom field of the frame depending on the
context.
3.98
picture data: In the VBV operations, picture data is defined as all the
bits of the coded picture, all the
header(s) and user data immediately preceding it if any (including any
stuffing between them) and all the stuffing
following it, up to (but not including) the next start code, except in
the case where the next start code is an end of
sequence code, in which case it is included in the picture data.
3.99
prediction: The use of a predictor to provide an estimate of the sample
value or data element currently being
decoded.
3.100 prediction error: The difference between the actual value of a
sample or data element and its predictor.
3.101 predictor: A linear combination of previously decoded sample values
or data elements.
3.102 profile: A defined subset of the syntax of ITU-T Rec. H.262 |
ISO/IEC 13818-2.
NOTE – In ITU-T Rec. H.262 | ISO/IEC 13818-2, the word "profile" is used
as defined above. It should not be confused with
other definitions of "profile" and in particular it does not have the
meaning that is defined by JTC1/SGFS.
3.103
progressive: The property of film frames where all the samples of the
frame represent the same instances in time.
3.104 quantisation matrix: A set of sixty-four 8-bit values used by the
dequantiser.
3.105
quantised DCT coefficients: DCT coefficients before dequantisation. A
variable length coded representation
of quantised DCT coefficients is transmitted as part of the coded video
bitstream.
3.106
quantiser scale: A scale factor coded in the bitstream and used by the
decoding process to scale the
dequantisation.
3.107
random access: The process of beginning to read and decode the coded
bitstream at an arbitrary point.
3.108
reconstructed frame: A reconstructed frame consists of three rectangular
matrices of 8-bit numbers
representing the luminance and two chrominance signals. A reconstructed
frame is obtained by decoding a coded frame.
3.109
reconstructed picture: A reconstructed picture is obtained by decoding a
coded picture. A reconstructed
picture is either a reconstructed frame (when decoding a frame picture),
or one field of a reconstructed frame (when
decoding a field picture). If the coded picture is a field picture, then
the reconstructed picture is the top field or the
bottom field of the reconstructed frame.
3.110
reference field: A reference field is one field of a reconstructed
frame. Reference fields are used for forward
and backward prediction when P-pictures and B-pictures are decoded. Note
that when field P-pictures are decoded,
prediction of the second field P-picture of a coded frame uses the first
reconstructed field of the same coded frame as a
reference field.
3.111
reference frame: A reference frame is a reconstructed frame that was
coded in the form of a coded I-frame or
a coded P-frame. Reference frames are used for forward and backward
prediction when P-pictures and B-pictures are
decoded.
3.112
re-ordering delay: A delay in the decoding process that is caused by
frame re-ordering.
3.113
reserved: The term "reserved" when used in the clauses defining the
coded bitstream, indicates that the value
may be used in the future for ITU-T | ISO/IEC defined extensions.
3.114
sample aspect ratio (SAR): This specifies the relative distance between
samples. It is defined (for the
purposes of ITU-T Rec. H.262 | ISO/IEC 13818-2), as the vertical
displacement of the lines of luminance samples in a
frame divided by the horizontal displacement of the luminance samples.
Thus, its units are (metres per line) ÷ (metres per
sample).
3.115
scalable hierarchy: Coded video data consisting of an ordered set of
more than one video bitstream.
3.116
scalability: Scalability is the ability of a decoder to decode an
ordered set of bitstreams to produce a
reconstructed sequence. Moreover, useful video is output when subsets
are decoded. The minimum subset that can thus
be decoded is the first bitstream in the set which is called the base
layer. Each of the other bitstreams in the set is called
an enhancement layer. When addressing a specific enhancement layer,
"lower layer" refer to the bitstream which
precedes the enhancement layer.
3.117
side information: Information in the bitstream necessary for controlling
the decoder.
3.118
16 × 8 prediction: A prediction mode similar to field-based prediction
but where the predicted block size is
16 × 8 luminance samples.
3.119
run: The number of zero coefficients preceding a non-zero coefficient,
in the scan order. The absolute value of
the non-zero coefficient is called "level".
3.120
saturation: Limiting a value that exceeds a defined range by setting its
value to the maximum or minimum of
the range as appropriate.
3.121 skipped macroblock: A macroblock for which no data is encoded.
3.122 slice: A consecutive series of macroblocks which are all located in
the same horizontal row of macroblocks.
3.123
SNR scalability: A type of scalability where the enhancement layer(s)
contain only coded refinement data for
the DCT coefficients of the lower layer.
3.124 source; input: Term used to describe the video material or some of
its attributes before encoding.
3.125
spatial prediction: Prediction derived from a decoded frame of the lower
layer decoder used in spatial
scalability.
3.126
spatial scalability: A type of scalability where an enhancement layer
also uses predictions from sample data
derived from a lower layer without using motion vectors. The layers can
have different frame sizes, frame rates or
chrominance formats.
3.127
start codes (system and video): 32-bit codes embedded in the coded
bitstream that are unique. They are used
for several purposes including identifying some of the structures in the
coding syntax.
3.128
stuffing (bits); stuffing (bytes): Code-words that may be inserted into
the coded bitstream that are discarded in
the decoding process. Their purpose is to increase the bitrate of the
stream which would otherwise be lower than the
desired bitrate.
3.129
temporal prediction: Prediction derived from reference frames or fields
other than those defined as spatial
prediction.
3.130
temporal scalability: A type of scalability where an enhancement layer
also uses predictions from sample data
derived from a lower layer using motion vectors. The layers have
identical frame size, and chrominance formats, but can
have different frame rates.
3.131
top field: One of two fields that comprise a frame. Each line of a top
field is spatially located immediately
above the corresponding line of the bottom field.
3.132 top layer: The topmost layer (with the highest layer_id) of a
scalable hierarchy.
3.133 variable bitrate: Operation where the bitrate varies with time
during the decoding of a coded bitstream.
3.134
variable length coding (VLC): A reversible procedure for coding that
assigns shorter code-words to frequent
events and longer code-words to less frequent events.
3.135
video buffering verifier (VBV): A hypothetical decoder that is
conceptually connected to the output of the
encoder. Its purpose is to provide a constraint on the variability of
the data rate that an encoder or editing process may
produce.
3.136
video sequence: The highest syntactic structure of coded video
bitstreams. It contains a series of one or more
coded frames.
3.137
xxx profile decoder: Decoder able to decode one or a scalable hierarchy
of bitstreams of which the top layer
conforms to the specifications of the xxx profile (with xxx being any of
the defined Profile names).
3.138
xxx profile scalable hierarchy: Set of bitstreams of which the top layer
conforms to the specifications of the
xxx profile.
3.139
xxx profile bitstream: A bitstream of a scalable hierarchy with a
profile indication corresponding to xxx. Note
that this bitstream is only decodable together with all its lower layer
bitstreams (unless it is a base layer bitstream).
3.140
zigzag scanning order: A specific sequential ordering of the DCT
coefficients from (approximately) the
lowest spatial frequency to the highest."
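
Coming back to the original question about key frames vs. I-frames: in
FFmpeg, each decoded frame carries a key_frame flag alongside its picture
type (I/P/B), so you can check on a real file which frames the decoder
actually marks as key frames. A minimal sketch in Python (assuming
ffprobe is installed and on the PATH; "input.mkv" is only a placeholder
file name):

#!/usr/bin/env python3
# Sketch: print picture type (I/P/B) and key_frame flag for every video
# frame, using ffprobe's JSON output. Assumes ffprobe is on the PATH;
# "input.mkv" below is a placeholder file name.
import json
import subprocess
import sys

def list_frames(path):
    out = subprocess.run(
        ["ffprobe", "-v", "error",
         "-select_streams", "v:0",
         "-show_frames",
         "-show_entries", "frame=pict_type,key_frame",
         "-of", "json", path],
        capture_output=True, text=True, check=True).stdout
    for i, frame in enumerate(json.loads(out).get("frames", [])):
        print(f"{i:6d}  pict_type={frame.get('pict_type')}  "
              f"key_frame={frame.get('key_frame')}")

if __name__ == "__main__":
    list_frames(sys.argv[1] if len(sys.argv) > 1 else "input.mkv")

For MPEG-2 material the key_frame flag should line up with the coded
I-frames defined above; with other codecs (H.264, for instance) an
I-frame is not necessarily flagged as a key frame, which is probably the
distinction pdr0 and Balling were hinting at.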

Gloster


