

24' is a contraction of "24/1.001" (aka 23.976). Tip: If used as a parameter, some tools require "24000/1001" (i.e. "24/1.001" as a rational number).

30' is a contraction of "30/1.001" (aka 29.970). Tip: If used as a parameter, some tools require "30000/1001" (i.e. "30/1.001" as a rational number).

60' is a contraction of "60/1.001" (aka 59.940). Tip: If used as a parameter, some tools require "60000/1001" (i.e. "60/1.001" as a rational number).
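
The contractions above can be held as exact rational numbers. A minimal sketch in Python using the standard fractions module; the names RATES and as_parameter are illustrative only and do not come from any video tool.

```python
from fractions import Fraction

# Exact values of the primed rates: n' means n/1.001, i.e. n*1000/1001.
RATES = {
    "24'": Fraction(24000, 1001),
    "30'": Fraction(30000, 1001),
    "60'": Fraction(60000, 1001),
}

def as_parameter(rate: Fraction) -> str:
    """Format a rate as the "numerator/denominator" string some tools expect."""
    return f"{rate.numerator}/{rate.denominator}"

for name, rate in RATES.items():
    # Prints the contraction, the rational form, and the 3-decimal approximation.
    print(name, as_parameter(rate), f"{float(rate):.3f}")
```

Keeping the rate as a rational avoids the rounding hidden in decimal forms such as 23.976.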

x/1.001 is 1: A video slowdown x-factor for 24fps to 24'fps, or 30fps to 30'fps conversions, or 2: An audio downsampling x-factor for 24'fps to 24fps, or 30'fps to 30fps conversions.

x/0.96 is 1: A video speedup x-factor for 24fps to 25fps conversions, or 2: An audio upsampling x-factor for 25fps to 24fps conversions.

x1.001 is 1: A video speedup x-factor for 24'fps to 24fps, or 30'fps to 30fps conversions, or 2: An audio upsampling x-factor for 24fps to 24'fps, or 30fps to 30'fps conversions.

x0.96 is 1: A video slowdown x-factor for 25fps to 24fps conversions, or 2: An audio downsampling x-factor for 24fps to 25fps conversions.

[note 1] 24'fps & 30'fps are forced -- imposed by metadata -- frame rates, but the actual (original) pictures are 24pps & 30pps respectively. However, 60'sps is the actual (original) field rate of NTSC TV cameras.

Cosmetics are tools that treat combing and judder as picture defects to be repaired by altering pixels. Note: The presented notation does not address cosmetics.

Mechanics are tools that treat combing and judder as structure defects to be repaired by adding, dropping, or re-sequencing pictures and/or half-pictures. Note: The presented notation solely addresses mechanics.
Tip: When remuxing or transcoding, the most fruitful approach is to, as best as possible, 1: Convert a source stream to the original (shooting) camera stream, then 2: Convert the camera stream to the desired target stream. The objective of the notation is to illustrate or document how to do that.
Tip: When repairing or transcoding video, a winning strategy is to correct its mechanics prior to applying cosmetics. Usually, once mechanical corrections have been made, flaws that appeared to have been cosmetic are revealed to have been mechanical so that, if needed at all, fewer and milder cosmetics can be applied. Mechanical repairs can sometimes be accomplished without transcoding whereas cosmetic repairs mandate transcoding.

Pictures, Half Pictures, and Scans.

picture: A full height, full aspect image in which all samples have been shot at a single moment. By convention, successive pictures are denoted A B etc. Pictures have no timing information except by context. Note that frames are containers for encoded pictures, not the pictures themselves, therefore, pictures should not be called "frames".

quasi picture: A picture formed by interlacing scans. By convention, consecutive quasi pictures are denoted Ab Cd etc., and depicted (A+b)(C+d) etc. (in order to convey their scan origins). Quasi pictures have no timing information except by context.

picture stream, 1: Any sequence of raw pictures shot (or decoded) at intervals of 1/pps seconds; 2: The depiction of a picture stream in a diagram.

cinema: A picture stream shot at 24pps (i.e. with 1/24 second intervals). Commercial DVDs & Blurays never contain cinema. Instead, commercial DVDs & Blurays contain one of the following.
1: cinema-at-24'fps          [note 1], NOTATION: [24pps]24'fps                                           & 48KHz[x1.001]48KHz [note 6]
2: cinema-at-24'fps-soft     [note 2], NOTATION: [24pps]24'fps                                           & 48KHz[x1.001]48KHz [note 6]
3: cinema-at-25fps-forced    [note 3], NOTATION: [24pps]25fps                                            & 48KHz[x0.96]48KHz
4: cinema-at-25fps-telecined [note 4], NOTATION: [24pps__48hps__(Aa-Xx)(Aa-LlLm-WxXx)=50hps__25pps]25fps
5: cinema-at-30'fps          [note 5], NOTATION: [24pps__48hps__(Aa-Dd)(AaBbBcCdDd)=60hps__30pps]30'fps  & 48KHz[x1.001]48KHz
[note 1] Cinema-at-24'fps is cinema that is framed with metadata that forces play at 24'fps (with 1001/24000 second intervals). Forcing 24pps to 24'fps stretches running times by 3.6 seconds per hour of cinema [note 7].

[note 2] Cinema-at-24'fps-soft is cinema that is framed with metadata that forces both telecine via 2-3-2-3 pulldown -- also called soft telecine -- and play at 30'fps (with 1001/30000 second intervals). Some players force them to 24'pps and some players telecine them via 2-3-2-3 pulldown and then force them to 30'fps. As with cinema-at-24'fps, either forcing method stretches running times by 3.6 seconds per hour of cinema [note 7].

[note 3] Cinema-at-25fps-forced is cinema that is framed with metadata that forces play at 25fps (with 1/25 second intervals) -- also called PAL speedup. The forcing shrinks running times by 2 minutes 24 seconds per hour of cinema [note 8].

[note 4] Cinema-at-25fps-telecined is cinema that is telecined via Euro pulldown and plays (unforced) at 25fps (with 1/25 second intervals). Running times and audio are unaffected and unaltered. Players could easily detelecine and play them at 24pps but generally don't bother.

[note 5] Cinema-at-30'fps-hard is cinema that is telecined via 2-3-2-3 pulldown -- also called hard telecine -- then framed with metadata that forces play at 30'fps (with 1001/30000 second intervals). As with cinema-at-24'fps, forcing also stretches running times by 3.6 seconds per hour of cinema [note 7].

[note 6] Yes, cinema-at-24'fps and cinema-at-24'fps-soft have the same notation. They have the same notation because mechanically, they have the same picture & frame & audio structures and factors. The difference is: 24'fps-soft requests decoders to also 2-3-2-3 pulldown pictures to display them at 30'pps.

[note 7] No players play cinema-at-24'fps or cinema-at-24'fps-soft or cinema-at-30'fps-hard at 24pps (i.e. unforced) because the included audio has been upsampled by x1.001 in advance in order to synchronize the audio to the forced video, so unforcing the video without also downsampling the audio would produce unsynchronized audio. Players could unforce them while also correcting the audio -- a somewhat time consuming process -- but they don't bother, probably because correcting the audio would affect only running time and audio tonality but wouldn't otherwise be noticed.

[note 8] No players play cinema-at-25fps-forced at 24pps (i.e. unforced) because the included audio has been downsampled by x0.96 in advance in order to synchronize the audio to the forced video, so unforcing the video without also upsampling the audio would produce unsynchronized audio. Players could unforce the video while also correcting the audio -- a somewhat time consuming process -- but they don't bother, probably because correcting the audio would affect only running time and audio tonality but wouldn't otherwise be noticed.
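
The 2-3-2-3 pulldown named in notes 2 and 5 can be sketched in Python. This is only an illustration, assuming the diagram (Aa-Dd)(AaBbBcCdDd) is read as: each group of 4 pictures A B C D becomes 5 weaved halfpic pairs Aa Bb Bc Cd Dd, with the top halfpic in upper case and the bottom halfpic in lower case. The function names pulldown_2323 and label are hypothetical.

```python
def pulldown_2323(pictures):
    """Map picture labels, in groups of 4, to 5 (top, bottom) halfpic pairs.

    For the group A B C D the result is Aa Bb Bc Cd Dd. The pairs Bc and Cd
    mix halfpics from differing pictures, so they are combed during motion.
    """
    if len(pictures) % 4 != 0:
        raise ValueError("this sketch only handles whole groups of 4 pictures")
    frames = []
    for i in range(0, len(pictures), 4):
        a, b, c, d = pictures[i:i + 4]
        frames += [(a, a), (b, b), (b, c), (c, d), (d, d)]
    return frames

def label(frames):
    """Render pairs in the compendium's style: top upper case, bottom lower case."""
    return "".join(top.upper() + bottom.lower() for top, bottom in frames)

print(label(pulldown_2323(list("ABCD"))))  # AaBbBcCdDd
```

Four pictures in and five pairs out gives the 24pps to 30pps ratio (48hps to 60hps) shown in the notation.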

cinema running time (aka theatrical running time) is the running time at 24fps. Once a video's picture & frame structure has been identified as cinema (see "cinema"), cinema running time can be calculated.
For cinema-at-24'fps,          cinema running time = player running time / 1.001
For cinema-at-24'fps-soft,     cinema running time = player running time / 1.001
For cinema-at-25fps-forced,    cinema running time = player running time / 0.96
For cinema-at-25fps-telecined, cinema running time = player running time
For cinema-at-30'fps,          cinema running time = player running time / 1.001
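
A minimal sketch of the running-time conversion, derived from notes 1 to 5: forcing to 24'fps or 30'fps makes player running time 1.001 times the cinema running time, forcing to 25fps makes it 0.96 times, and Euro pulldown leaves it unchanged. The names PLAYER_FACTOR and cinema_running_time are illustrative assumptions.

```python
from fractions import Fraction

# player running time = cinema running time x PLAYER_FACTOR[kind]
PLAYER_FACTOR = {
    "cinema-at-24'fps": Fraction(1001, 1000),      # stretched 3.6 s per hour
    "cinema-at-24'fps-soft": Fraction(1001, 1000),
    "cinema-at-25fps-forced": Fraction(24, 25),    # shrunk 2 min 24 s per hour
    "cinema-at-25fps-telecined": Fraction(1),      # unaffected
    "cinema-at-30'fps": Fraction(1001, 1000),
}

def cinema_running_time(player_seconds, kind):
    """Return the running time at 24pps, in seconds, as an exact Fraction."""
    return Fraction(player_seconds) / PLAYER_FACTOR[kind]

# One hour of cinema plays in 57:36 (3456 s) when forced to 25fps.
print(cinema_running_time(3456, "cinema-at-25fps-forced"))  # 3600
```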

pps (pictures per second): Picture rate. "pps" can explicitly appear at any point in a notation but will usually appear solely as the 1st element. Thereafter, "pps" should not appear in a notation unless new pictures are added (for example, "24pps__120pps", which is an interpolate-x5 conversion) and/or existing pictures are dropped. If pps is immediately followed by a differing fps, the pictures are displayed with speedup (for example, "[24pps]25fps") or with slowdown (for example, "[24pps]24'fps").

speedup is a video rate x-factor: fps/pps, that is greater than one. Unfortunately, speedup is sometimes erroneously expressed as a differential: (fps-pps)/fps, or as a differential percentage. The contrasting forms of speedup and contrasting interpretations mean that any expression that claims a speedup should be carefully examined. Only one form of speedup: cinema-at-25fps-forced, is found on commercial DVDs & Blurays.
For example, for cinema-at-25fps-forced (notation: [24pps]25fps) a camera shoots 24pps which is subsequently framed at 25fps. The x-factor is x1.041[6..] but can be expressed as x25/24 or x/0.96 -- all 3 are equally valid expressions. This compendium uses "x/0.96". Unfortunately, 24pps to 25fps speedup is sometimes expressed as "4% speedup". Many people are misled by "4%" because neither (24pps)x(4%) nor (24pps)x(104%) is correct -- if you think about it, "4% speedup" (which implies that the frames are sped up) doesn't make sense because it's really the pictures in the frames that are actually sped up.
Tip: Don't confuse the x/0.96 video speedup x-factor with the x0.96 audio downsample x-factor -- See "About Audio", "48KHz[x0.96]48KHz".

slowdown is a video rate x-factor: fps/pps, that is less than one. Unfortunately, slowdown is sometimes erroneously expressed as a differential: (fps-pps)/fps, or as a differential percentage. The contrasting forms of slowdown and contrasting interpretations mean that any expression that claims a slowdown should be carefully examined. Only two forms of slowdown: cinema-at-24'fps & cinema-at-30'fps, are found on commercial DVDs & Blurays.
Example: For cinema-at-24'fps (notation: [24pps]24'fps), a camera shoots 24pps which is subsequently framed at 24'fps. The x-factor is x0.999[000999..] but can be expressed as x1000/1001 or x/1.001 -- all 3 are equally valid expressions. This compendium uses "x/1.001". Unfortunately, 24pps to 24'fps slowdown is sometimes expressed as "0.1% slowdown". Many people are misled by "0.1%" because neither (24pps)x(0.1%) nor (24pps)x(100.1%) is correct -- if you think about it, "0.1% slowdown" (which implies that the frames are slowed down) doesn't make sense because it's really the pictures in the frames that are actually slowed down.
Tip: Don't confuse the x/1.001 video slowdown x-factor with the x1.001 audio upsample x-factor -- See "About Audio", "48KHz[x1.001]48KHz".
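
The speedup and slowdown x-factors can be checked with exact arithmetic. A minimal sketch in Python; x_factor is a hypothetical helper name, and the last line only illustrates why the percentage readings fail.

```python
from fractions import Fraction

def x_factor(pps, fps):
    """Video rate x-factor fps/pps: above 1 is speedup, below 1 is slowdown."""
    return Fraction(fps) / Fraction(pps)

speedup = x_factor(24, 25)                      # [24pps]25fps
slowdown = x_factor(24, Fraction(24000, 1001))  # [24pps]24'fps

print(speedup)    # 25/24, i.e. x/0.96
print(slowdown)   # 1000/1001, i.e. x/1.001

# Reading "4% speedup" as 24pps x 104% does not reach 25fps:
print(float(24 * Fraction(104, 100)))  # 24.96
```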

picture diagram: A string of symbols in parentheses that depict the contents of a pps stream. For example, the first 10 pictures of a pps stream are diagrammed as pictures, thusly: (A)(B)(C)(D)(E)(F)(G)(H)(I)(J), or as weaved halfpic pairs, thusly: (A+a)(B+b)(C+c)(D+d)(E+e)(F+f)(G+g)(H+h)(I+i)(J+j), or, with the understanding that the resulting pictures are combed, as quasi pictures that contain halfpic pairs that, in turn, are made from scan pairs, thusly: (A+b)(C+d)(E+f)(G+h)(I+j)(K+l)(M+n)(O+p)(Q+r)(S+t).

halfpic (half-picture): A half height, 2x aspect image unweaved from a picture or quasi picture, or decoded from a scan. Halfpics have no timing information except by context. The word "Halfpic" is a useful fiction -- a fiction because no such word is found in MPEG specifications, and useful because it concisely differentiates it from pictures (which are found in MPEG specifications). By convention, halfpics unweaved from pictures are denoted A a B b etc. and depicted (A)(a)(B)(b) etc. while halfpics unweaved from quasi pictures or scans are denoted A b C d etc. and depicted (A)(b)(C)(d) etc. Halfpics unweaved from quasi pictures provide a way to access the original scan images, but halfpics are not fields and should not be called "fields".

top halfpic: A halfpic copied from picture or quasi picture lines 1 3 5 etc. By convention, top halfpics unweaved from pictures A B etc. are also denoted A B etc., and top halfpics unweaved from quasi pictures A C etc. are also denoted A C etc. To resolve confusion between pictures and top halfpics, consider this: If a stream is denoted "pps", then A denotes the 1st picture, but if a stream is denoted "hps", then A denotes the 1st halfpic.

bottom halfpic: A halfpic copied from picture or quasi picture lines 2 4 6 etc. By convention, bottom halfpics unweaved from pictures A B etc. are denoted a b etc., and bottom halfpics unweaved from quasi pictures A C etc. are denoted b d etc.

halfpic stream, 1: Any sequence of raw halfpics; 2: The depiction of a halfpic stream in a diagram.

unweave, 1: To copy halfpics from pictures as discrete images; 2: To import a pps stream and export a hps stream. Unweaving should not be called "deinterlacing".

weave, 1: To copy-combine two halfpics to create a picture; 2: To import an hps stream and export a pps stream. Weaving halfpics that originate from differing pictures of a motion sequence (for example, during pulldown) creates a combed picture. Weaving should not be called "interlacing".
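
Unweave and weave can be sketched on a picture modeled as a list of rows. This is an illustration only; the row labels and the function names are assumptions, and real pictures would be arrays of samples.

```python
def unweave(picture):
    """Split a picture (list of rows) into (top, bottom) halfpics.

    The text numbers lines from 1, so lines 1 3 5 .. are list indices
    0 2 4 .. and lines 2 4 6 .. are list indices 1 3 5 ..
    """
    return picture[0::2], picture[1::2]

def weave(top, bottom):
    """Interleave two equal-height halfpics into one picture."""
    if len(top) != len(bottom):
        raise ValueError("halfpics must have the same height")
    picture = []
    for top_row, bottom_row in zip(top, bottom):
        picture.append(top_row)
        picture.append(bottom_row)
    return picture

A = ["A1", "A2", "A3", "A4"]
B = ["B1", "B2", "B3", "B4"]
top_a, bottom_a = unweave(A)
_, bottom_b = unweave(B)

restored = weave(top_a, bottom_a)  # (A+a): identical to picture A
combed = weave(top_a, bottom_b)    # (A+b): halfpics from differing pictures
print(combed)                      # ['A1', 'B2', 'A3', 'B4']
```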

halfpic diagram: A string of symbols in parentheses that depict the contents of a halfpic stream. By convention, an hps stream derived from pictures (in which halfpic pairs (A) & (a) for example were shot at the same moment) is diagrammed, for its 1st 10 members, thusly: (A)(a)(B)(b)(C)(c)(D)(d)(E)(e), whereas an hps stream derived from scan fields (in which halfpic pairs (A) & (b) for example were shot at consecutive moments) is diagrammed, for its 1st 10 members, thusly: (A)(b)(C)(d)(E)(f)(G)(h)(I)(j).

scan: A half height, 2x aspect image that is part of a scan field stream. Note that once they are decoded, scans become halfpics. By convention, successive scan fields are denoted [A] [b] [C] [d] etc. Scans have no timing information except by context. Note that fields are containers for encoded scans, not the scans themselves, therefore, referring to decoded scans as "fields" is misleading.

1st scan: The scan appearing 1st in each pair of scans in strides and diagrams. A 1st scan can be a top scan or a bottom scan. Ignoring horizontal & vertical retrace times, suppose the 1st sample of the 1st line of a 1st scan is shot at time zero. If so, then the last sample of the last line of a 1st scan is shot at time 1/sps seconds. Thus, the lines of a 1st scan are shot at incremental times, not all at the same time. Sample-to-sample incremental shooting times can be (and usually are) ignored, and 1st scans are encoded and decoded as though they all occur at time zero.

2nd scan: The scan appearing 2nd in each pair of scans in strides and diagrams. A 2nd scan can be a top scan or a bottom scan. Ignoring horizontal & vertical retrace times, the 1st sample of the 1st line of a 2nd scan is shot at time 1/sps seconds, and the last sample of the last line of a 2nd scan is shot at time 2/sps seconds. In other words, the lines of a 2nd scan are shot at incremental times, not all at the same time. Sample-to-sample incremental shooting times can be (and usually are) ignored, and 2nd scans are encoded and decoded as though they all occur at time 1/sps seconds.

top scan: A scan that contributes sample lines 1 3 5 etc. to the quasi picture created when scans are decoded to halfpics and those halfpics are weaved. That a scan is a top scan is stored as 'top_field_first' metadata. If 'top_field_first' has the value '1', it claims that the 1st scan is a top scan. If that claim is wrong, the encoding is field-swapped (which is an encoding error).

bottom scan: A scan that contributes sample lines 2 4 6 etc. to the quasi picture created when scans are decoded to halfpics and those halfpics are weaved. That a scan is a bottom scan is stored as 'top_field_first' metadata. If 'top_field_first' has the value '0', it claims that the 1st scan is a bottom scan. If that claim is wrong, the encoding is field-swapped (which is an encoding error).

scan stream, 1: Any sequence of scans shot (or decoded) at intervals of 1/sps seconds; 2: The depiction of a scan stream in a diagram.

sps (scans per second): Scan rate.

sps stream: A scan stream.

scan diagram: A string of symbols in brackets that depict the contents of a scan stream. For example, the first 10 scans of an sps stream are diagrammed as scans, thusly: [A][b][C][d][E][f][G][h][I][j].

Frames and Fields.

frame: An elementary stream container containing a coded picture, a coded quasi picture, or a pair of coded fields. Frames add timing information: PTSs (presentation time stamps) and DTSs (decoding time stamps), as decoder metadata. Unlike VFR (variable frame rate) video (in which frames per second can vary within a video), DVD & Bluray videos are always CFR (constant frame rate).

frame # (frame number): Because there is no such thing as a zeroth frame, when tools (and programmers) refer to frame 0, they always mean the frame at frame index 0 (i.e. frame #1). For example, when FFprobe refers to "frames.frame.0", ".0" means index 0. When a tool refers to a frame number that is non-zero, and if frame number is important, then proceed with caution. Look for clues that can resolve whether the tool means frame # or frame index. Tip: PTS provides a more reliable reference to frames.

TB (time base): A video transport stream's time base, calculated as 1/(system clock). All MPEG2-TSs (MPEG2 transport streams) found on all commercial DVDs & Blurays have a 90000Hz system clock. All commercial DVDs & Blurays therefore have TB = 1/90000 seconds per tick (i.e. 11.[1..] microseconds per tick). Unfortunately, literature is often found that calls "90000Hz" a time base instead of a system clock, so proceed with caution in calculations that depend on the value of TB.

PTS (presentation time stamp). Each frame in a frame stream is assigned a PTS in its metadata [note 1]. A frame's PTS is its time of occurrence relative to the system clock. All commercial DVDs & Blurays employ an MPEG2-TS (transport stream) that utilizes a 90000Hz system clock. So, for MPEG2, for a particular video's fps, PTS interval (aka deltaPTS) is computed:
PTS interval, ticks per frame = (system clock, ticks per second) / (video FPS, frames per second) = 1 / TB / FPS
Individual frame PTSs are then computed as follows:
PTS-of-N = N x (PTS interval), where 'N' is frame index (0 1 2 ..).
If the PTS of a source video's 1st frame (frame index 0) is non-zero, the video has a mastering flaw; it was probably clipped improperly from a longer video.

[note 1] Recommendation: When encoding a target video, always set the target's PTS-of-frame-1 to zero prior to populating the target's frames.
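
The PTS interval and PTS-of-N formulas can be sketched in Python. This assumes the 90000Hz system clock stated above and a 1st frame at PTS zero; the function names are illustrative, and Fraction is used so that primed rates stay exact.

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 90000  # ticks per second, so TB = 1/90000

def pts_interval(fps, system_clock_hz=SYSTEM_CLOCK_HZ):
    """Ticks per frame = (ticks per second) / (frames per second)."""
    return Fraction(system_clock_hz) / Fraction(fps)

def pts_of(n, fps, system_clock_hz=SYSTEM_CLOCK_HZ):
    """PTS of the frame at index n (0 1 2 ..), with frame index 0 at PTS zero."""
    return n * pts_interval(fps, system_clock_hz)

print(pts_interval(25))                      # 3600
print(pts_interval(Fraction(30000, 1001)))   # 3003
print(pts_of(2, Fraction(30000, 1001)))      # 6006
```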

PTS resolution. Ideally, PTSs are always integers -- if not, they are truncated to integers. To avoid truncation during video processing, TB (time base) can be temporarily forced to provide higher resolution. For example, forcing TB from 1/90000 to 1/360000 will maximize resolution [note 1] for a wide range of frame & picture rates, including all PTSs found on DVDs & Blurays (and even for pictures at 120'pps). The following table lists key information about PTSs and values of TB.
                      PTS intervals for various TBs    ...[note 1]
              TB:    1/90000    1/180000    1/360000
          24'fps:    3753.75     7507.5      15015
           24fps:    3750        7500        15000
           25fps:    3600        7200        14400
          30'fps:    3003        6006        12012
           30fps:    3000        6000        12000
          60'fps:    1501.5      3003         6006
           60fps:    1500        3000         6000
         120'fps:     750.75     1501.5       3003
          120fps:     750        1500         3000
max running time:   13:15:21     6:37:40     3:18:50   ...[note 2]
[note 1] The values of PTS have units: ticks. In most tools, PTS values are limited to 32 bits -- max value: 4294967295 ticks. The values of PTS intervals have units: ticks-per-frame. Ticks and ticks-per-frame and frames-per-second can be combined to compute max running time as follows:
Max running time, seconds = (4294967295 ticks) / (PTS interval, ticks per frame) / (FPS, frames per second).
Actual PTSs on commercial DVD & Bluray discs (i.e. in MPEG2-TSs) are 33 bits -- max value: 8589934591 ticks -- that, for 24'fps video, can support up to 2288360 frames (running times up to 26:30:43). Since MPEG-2 videos can be longer than 32 bits support, it is possible to encounter videos that can't be handled by existing video tools. What saves the situation is that such videos do not physically fit on DVD or Bluray discs.
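
The max running times quoted above can be reproduced with a short Python sketch. It assumes PTS is an unsigned counter of the given width that starts at zero, and it truncates to whole seconds; max_running_time is a hypothetical helper name.

```python
def max_running_time(ticks_per_second, pts_bits=32):
    """Return (hours, minutes, seconds) before a pts_bits-wide PTS overflows."""
    max_ticks = 2**pts_bits - 1
    total_seconds = max_ticks // ticks_per_second  # whole seconds only
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return hours, minutes, seconds

print(max_running_time(90000))               # (13, 15, 21)
print(max_running_time(360000))              # (3, 18, 50)
print(max_running_time(90000, pts_bits=33))  # (26, 30, 43)
```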

[note 2] At 1/360000, TB can resolve 2.[7..] microseconds. In all known video tools, TB is a 32-bit integer, so it has a tick limit of 4294967295. TB = 1/360000 can therefore support running times up to 3:18:50.
Max running time, seconds = (4294967295 ticks) / (360000 ticks per second) = 11930.464708[3..] seconds = 3:18:50.464708[3..].
For videos longer than 3:18:50 but less than 6:37:40, cut TB to 1/180000 and avoid making 24'fps or 120'fps videos.
Tip: Before changing TB, check whether the change will produce integer PTSs as follows:
PTS interval check = 1 / TB / FPS.
For example:
1 / (1/180000) / 24'fps = 180000 / (24/1.001) = 180000 x 1.001 / 24 = 7507.5 fails the check whereas
1 / (1/360000) / 24'fps = 360000 / (24/1.001) = 360000 x 1.001 / 24 = 15015 passes the check.
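
The PTS interval check can be automated. A minimal sketch in Python; pts_interval_check is a hypothetical name, and the rate is passed as an exact Fraction so that the test for an integer is not disturbed by floating-point rounding.

```python
from fractions import Fraction

def pts_interval_check(ticks_per_second, fps):
    """Return (passes, interval) where interval = 1 / TB / FPS in ticks per frame."""
    interval = Fraction(ticks_per_second) / Fraction(fps)
    return interval.denominator == 1, interval

RATE_24_PRIME = Fraction(24000, 1001)  # 24'fps

print(pts_interval_check(180000, RATE_24_PRIME))  # fails: interval is 15015/2
print(pts_interval_check(360000, RATE_24_PRIME))  # passes: interval is 15015
```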

interlace: "The property of conventional television frames where alternating lines of the frame represent different instances in time." -- source: ITU-T H.262, 3.74. Author's comment: The ITU (MPEG) definition assumes readers know what "conventional television frames" means. Also, it is incomplete and somewhat misleading because halfpics can also originate from television frames -- scan fields, actually -- and different instances in time can originate in otherwise progressive video due to pulldown.

progressive: "The property of film frames where all the samples of the frame represent the same instances in time." -- source: ITU-T H.262, 3.103. Author's comment: The ITU (MPEG) definition implies that the combed pictures in pulldown are not progressive but in other sections of H.262, it clearly indicates that they are. The ITU (MPEG) seems to have a problem delineating individual frames from frame streams as though progressive streams can have frames that are not progressive.

frame stream, 1: A sequence of frames in an elementary stream; 2: The depiction of a frame stream in a diagram.

fps (frames per second): Frame rate.

fps stream: A frame stream.

frame diagram: A string of symbols in brackets that depict the contents of a frame stream. For example, the first 10 frames of an fps stream are diagrammed as framed pictures, thusly: [A][B][C][D][E][F][G][H][I][J], or as framed halfpic pairs, thusly: [A+a][B+b][C+c][D+d][E+e][F+f][G+g][H+h][I+i][J+j], or as framed scan pairs, thusly: [A+b][C+d][E+f][G+h][I+j][K+l][M+n][O+p][Q+r][S+t]. Technically, the contents of the frames found on commercial DVDs & Blurays are always pictures. Showing them as halfpic pairs or scan pairs is intended to further define the contained images in terms of their origins and to aid the visualization of how they look when played.

field: An encoded scan. The 1st field in a frame must be simultaneous with its frame, otherwise, the encoding is field-swapped (which is an encoding error).

top_field_first: A metadata bit that, if '1', claims that the 1st field contains a top scan, or if '0', claims that the 1st field contains a bottom scan. If the claim is wrong, the encoding is field-swapped (which is an encoding error), and implementing the following halfpic notation will correct it: (bA)(Ab).