[FFmpeg-user] Advice on optimizing bitstream filter usage - unexplained frame count differences

Gia Ferrari g at giferrari.net
Sat May 11 00:55:44 EEST 2024


I feel I've come up against the limits of my ability to narrow down the cause of some surprising behavior involving bitstream filters, and could use a pointer or two on where to focus my learning. A general description of the system challenge I'm trying to solve is at the bottom of this message, if you're curious; my primary interest is in understanding why my solution yields odd results.

The source file is HEVC and contains 1557 keyframes (and many more non-keyframes). I'd like to select only the keyframes before decoding (CPU is scarce), rescale the selected frames, and write them to a new MP4:

$ ffmpeg -i test_input.mp4 -bsf:v "noise=drop=not(key)" -vf "scale=320:-1" -an -fps_mode vfr -enc_time_base -1 -c:v libx265 -crf 32 -preset ultrafast test_output_combined_command.mp4

This takes 5 minutes and is CPU-bound. It also generates far fewer frames than I expect, resulting in choppy video. If I split the operation into two separate commands, it completes in 20 seconds and produces a visually correct video:

$ ffmpeg -i test_input.mp4 -bsf:v "noise=drop=not(key)" -an -fps_mode vfr -enc_time_base -1 -c:v copy test_intermediate.mp4
$ ffmpeg -i test_intermediate.mp4 -vf "scale=320:-1" -fps_mode vfr -c:v libx265 -crf 32 -preset ultrafast test_output_split_command.mp4

Checking the three output files with ffprobe reveals some surprising results:

$ ffprobe -v error -select_streams v:0 -count_frames -count_packets -show_entries stream=nb_read_frames,nb_read_packets -of flat test_output_combined_command.mp4

$ ffprobe -v error -select_streams v:0 -count_frames -count_packets -show_entries stream=nb_read_frames,nb_read_packets -of flat test_intermediate.mp4
$ ffprobe -v error -select_streams v:0 -count_frames -count_packets -show_entries stream=nb_read_frames,nb_read_packets -of flat test_output_split_command.mp4

I'm at a bit of a loss to explain these differences. Note that the second ffprobe command (on test_intermediate.mp4) took several minutes to run, while the others were nearly instantaneous. test_intermediate.mp4 is also almost the same file size as the input, despite containing one thirtieth of the frames.

I feel like I'm missing a fundamental concept here. Why does one approach generate more frames than the other? Is noise=drop causing empty packets to be written somewhere? Thanks for any and all hints.
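To check the empty-packet hypothesis myself, I've been sketching something like the following: parse the JSON that `ffprobe -show_packets -show_entries packet=size,flags -of json` emits and count zero-size packets versus keyframe-flagged packets. The packet data below is fabricated for illustration; only the JSON shape follows ffprobe's output.

```python
import json

# Fabricated sample of what ffprobe's JSON writer emits for
# `-show_packets -show_entries packet=size,flags` (sizes are strings,
# flags contain 'K' for keyframe packets).
ffprobe_output = """
{
    "packets": [
        {"size": "48231", "flags": "K_"},
        {"size": "0",     "flags": "__"},
        {"size": "51790", "flags": "K_"}
    ]
}
"""

packets = json.loads(ffprobe_output)["packets"]
keyframes = sum(1 for p in packets if "K" in p["flags"])
empties = sum(1 for p in packets if int(p["size"]) == 0)
print(f"{len(packets)} packets, {keyframes} keyframes, {empties} zero-size")
```

If the combined command were writing empty packets, I'd expect a count like `empties` above to be nonzero on the real output file.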

Application context:
A system I'm working on generates large HEVC MP4s on disk. These are aged out over time to conserve space. When that happens, I'd like to generate low-fps (anything above 0.25 fps is fine), low-resolution (320:180 is enough, believe it or not) versions of these files for use elsewhere. I have meager CPU resources to dedicate to this. I've made the following observations and mitigations:

- Decoding the full stream takes significant CPU, too much for me to spare. To get around this, I drop all non-keyframes before decoding via -bsf:v "noise=drop=not(key)". This brings down the cost to acceptable levels. The source of the video files reliably includes a keyframe every 2 seconds. Hardware decoding options are not available in my environment.
- Encoding at the resolution I need (320:180) using the veryfast or ultrafast preset is very much fast enough for my needs, provided the input framerate is suitably reduced as per above.
- Scaling the video down to 320:180 is also fast enough, again provided input framerate reduction.
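For reference, here is the back-of-envelope arithmetic behind the numbers above. The source frame rate isn't something I stated directly, so it's left as a parameter; the 2-second keyframe interval and the 1557-keyframe count are from my test input.

```python
# Back-of-envelope check of the figures in this message.
KEYFRAME_INTERVAL_S = 2.0   # the source reliably keyframes every 2 s
KEYFRAMES = 1557            # counted in the test input
MIN_FPS = 0.25              # anything above this is fine for my use case

duration_s = KEYFRAMES * KEYFRAME_INTERVAL_S   # ~3114 s of video
keyframe_fps = 1.0 / KEYFRAME_INTERVAL_S       # 0.5 fps after dropping
assert keyframe_fps > MIN_FPS                  # comfortably above the floor

def kept_fraction(source_fps: float) -> float:
    """Fraction of frames that survive when only keyframes are kept."""
    return keyframe_fps / source_fps

print(f"duration ~ {duration_s:.0f} s, output rate {keyframe_fps} fps")
print(f"at 15 fps source: 1/{1 / kept_fraction(15):.0f} of frames kept")
```

Keeping only keyframes yields 0.5 fps, which is why the reduced rate is acceptable; a 15 fps source would be consistent with the intermediate file holding one thirtieth of the input's frames.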

Gia Ferrari (she/they)

Sent with [Proton Mail](https://proton.me/) secure email.
