[FFmpeg-user] Reducing image2pipe png decoder latency

Maxim Khitrov max at mxcrypt.com
Mon Dec 16 18:30:36 EET 2019

On Mon, Dec 16, 2019 at 11:04 AM Ted Park <kumowoon1025 at gmail.com> wrote:
> > Losing the alpha channel is not ideal. I can do that by encoding both
> > images to JPG and blending the two together, but the top image is
> > mostly blank, so that just results in a dark background. Sending the
> > background, overlay, and a separate grayscale alpha mask for the
> > overlay, all as JPGs, is another option, but that increases image
> > encoding time and bandwidth requirements.
> I can’t say for sure since I don’t know what your source is, but there probably was no alpha channel to begin with. If the top image is blank maybe you want to switch the order and/or use a different blend mode? As you say, generating a mask and applying it will be too complicated if 10 frames of delay is unacceptable.

The background JPG is a video capture, the overlay is a HUD-like UI,
which is mostly transparent except for the actual UI elements.
Anything I do with color-keying or blend modes is still going to be
suboptimal relative to preserving the original alpha channel, which is
definitely there in the source.

> > Also, it looks like I made a mistake when testing the PNG-only stream. I'm
> > now seeing the same 10+ frame latency with a single PNG input as with
> > JPG background and PNG overlay, which actually makes me feel a little
> > better since that eliminates the filter graph as the source of the
> > delay (unless it's the RGB -> YUV conversion?). I think it has to be
> > the PNG decoder.
> What exactly is this latency being measured as, by the way? I thought it was between the two streams of images, but I guess not? Are you sure it’s not the network or the image generator taking longer?
> As long as you have a png in there the slower speed is unavoidable because of the conversion to rgb, but also the png stream is probably a lot bigger, so if you have to transfer it over a network to process (back into yuv it sounds like) you should probably try to keep it in yuv. Anything you do in rgb should be doable in yuv too with a little math, as long as you’re not rotoscoping things out or something.

My frame generator is writing two synchronized streams of images to
ffmpeg. Each pair of images is combined into one video frame. The
timecode of each frame is written both to stdout (just before the
images are submitted to ffmpeg) and burned into each image. I then
play the ffmpeg output video stream with ffplay. By taking a screenshot
of my desktop, I can see the current frame that was just written to
ffmpeg (from stdout) and the current frame that's being played in
ffplay. The difference is the total latency of the encoder pipeline.
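
As a rough sketch of the arithmetic behind that measurement: the
screenshot gives two frame numbers, the one most recently written to
ffmpeg and the one currently visible in ffplay, and their difference
divided by the frame rate is the pipeline latency. The function name
and the sample frame numbers below are made up for illustration and
are not taken from the actual generator.

```python
def pipeline_latency_ms(written_frame: int, displayed_frame: int, fps: float) -> float:
    """Latency implied by one screenshot of the generator output and ffplay.

    written_frame   -- frame number last printed to stdout by the generator
    displayed_frame -- frame number burned into the image ffplay is showing
    fps             -- frame rate of the stream
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if displayed_frame > written_frame:
        # The player cannot show a frame that has not been submitted yet.
        raise ValueError("displayed frame is ahead of written frame")
    frames_behind = written_frame - displayed_frame
    return frames_behind * 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical screenshot: generator at frame 110, ffplay showing frame 100.
    print(f"{pipeline_latency_ms(110, 100, 30.0):.1f} ms")
```

Because both numbers come from the same screenshot, the result covers
the whole chain (decode, filter, encode, playback) and cannot by itself
say which stage is holding the frames.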

The only variable I'm changing is whether the overlay image is encoded
as JPG vs PNG. Everything else is staying the same, and this is all
done locally right now, so no network latency. RGB to YUV conversion
has to happen somewhere because the source of the overlay is an RGB
OpenGL texture. Does it seem likely that this conversion alone would
account for 6+ additional frames of latency (200ms at 30 fps)?

The fact that the latency does not seem to depend on the size of the
image, with even 300x200 PNGs adding the same latency as 1920x1080,
leads me to believe that this is caused by buffering happening
somewhere in the png decoder.
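
To illustrate why size-independent latency points at buffering rather
than per-pixel work, here is a toy model, assuming a stage that holds a
fixed number of frames before it releases the oldest one. It is only a
model of the hypothesis, not a description of how ffmpeg's PNG decoder
actually works, and the depth of 6 is an arbitrary value picked to
match the "6+ frames" figure above.

```python
from collections import deque


def simulate_fixed_buffer(frame_sizes, depth):
    """Return the delay, in frames, of every frame that leaves the buffer.

    frame_sizes -- byte size of each input frame (carried along, but never
                   consulted when deciding when to release a frame)
    depth       -- number of frames the stage holds before emitting (assumed)
    """
    queue = deque()
    delays = []
    for index, size in enumerate(frame_sizes):
        queue.append((index, size))
        # A frame is released only once `depth` newer frames are queued behind it.
        if len(queue) > depth:
            out_index, _ = queue.popleft()
            delays.append(index - out_index)
    return delays


if __name__ == "__main__":
    small = simulate_fixed_buffer([300 * 200 * 4] * 20, depth=6)
    large = simulate_fixed_buffer([1920 * 1080 * 4] * 20, depth=6)
    print(small == large, set(small))
```

In this model the delay equals the buffer depth for every frame,
whatever the frame size. A cost that scaled with pixel count, such as
the RGB to YUV conversion, would be expected to show a difference
between 300x200 and 1920x1080 input.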
