[FFmpeg-trac] #10510(undetermined:new): FFMPEG decode 608 caption to webvtt with big cues. How to generate with smaller cues (i.e. gop size)?

Tue Aug 8 11:10:52 EEST 2023

-------------------------------------+-------------------------------------
             Reporter:  Qi Cao       |                    Owner:  (none)
                 Type:  enhancement  |                   Status:  new
             Priority:  normal       |                Component:
                                     |  undetermined
              Version:  unspecified  |               Resolution:
             Keywords:  caption      |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
Description changed by Qi Cao:

New description:

 Summary of the bug: We are using FFmpeg to decode CEA-608 captions to
 WebVTT. The resulting WebVTT files contain cues with long timespans, for
 example a single cue covering two minutes of music. Because such a cue is
 not emitted until the caption ends, it arrives too late to be displayed.
 We would like to decode the captions into smaller cues, for example one
 per GOP. We have been experimenting with the segment muxer options with no
 luck. Could you shed some light on this?

 How to reproduce:
 ffmpeg -data_field first -y -t 10 -f lavfi \
   -i "movie='fallbeatcaptiontest.mp4':streams='v+a'[out0+subcc][out1]" \
   -c:v rawvideo -f rawvideo -pix_fmt nv12 \
   -c:a pcm_s16le -f s16le -ar 16000 -ac 1 \
   -map a -map v -f nut base.nut \
   -map 0:s -c:s webvtt -f webvtt cap-base.vtt

 Tried:
 ffmpeg -data_field first -y -t 10 -f lavfi \
   -i "movie='fallbeatcaptiontest.mp4':streams='v+a'[out0+subcc][out1]" \
   -c:v rawvideo -f rawvideo -pix_fmt nv12 \
   -c:a pcm_s16le -f s16le -ar 16000 -ac 1 \
   -map a -map v -f nut base.nut \
   -map 0:s -f segment -segment_time 2 \
   -segment_format webvtt output_cue_%03d.vtt

 It generated multiple .vtt files, but did not split the existing large cue
 into 2-second chunks as we expected.

 ChatGPT said that the segment muxer only applies to video and audio
 streams, not to captions. Is that true?
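
 A minimal sketch of the 2-second chunking we expected, done as a
 post-processing step on the WebVTT output rather than inside FFmpeg.
 `split_vtt_cues` is a hypothetical shell/awk helper written for
 illustration, not an FFmpeg option. It assumes LF line endings, cues
 separated by empty lines, and timestamps in `MM:SS.mmm` or `HH:MM:SS.mmm`
 form; the 2000 ms default mirrors the `-segment_time 2` we tried.

```sh
# split_vtt_cues: hypothetical helper, not part of FFmpeg.
# Reads WebVTT on stdin and writes it to stdout with every cue longer than
# MAX_MS milliseconds (first argument, default 2000) split into consecutive
# cues of at most MAX_MS that repeat the same text.
# Assumes LF line endings and cues terminated by an empty line.
split_vtt_cues() {
    max_ms=${1:-2000}
    case $max_ms in
        *[!0-9]*) echo "split_vtt_cues: max_ms must be a positive integer" >&2; return 1 ;;
    esac
    [ "$max_ms" -gt 0 ] || { echo "split_vtt_cues: max_ms must be > 0" >&2; return 1; }

    awk -v max_ms="$max_ms" '
    # "HH:MM:SS.mmm" or "MM:SS.mmm" -> integer milliseconds
    function to_ms(ts,   n, p, s) {
        n = split(ts, p, ":")
        s = (n == 3) ? p[1] * 3600 + p[2] * 60 + p[3] : p[1] * 60 + p[2]
        return int(s * 1000 + 0.5)
    }
    # integer milliseconds -> timestamp, hours written only when non-zero
    function to_ts(ms,   h, m, s) {
        h = int(ms / 3600000); ms -= h * 3600000
        m = int(ms / 60000);   ms -= m * 60000
        s = int(ms / 1000);    ms -= s * 1000
        return (h > 0 ? sprintf("%02d:", h) : "") sprintf("%02d:%02d.%03d", m, s, ms)
    }
    # print the buffered cue as one or more cues of at most max_ms each
    function flush(   t, e, first) {
        if (!in_cue) return
        t = cue_start; first = 1
        do {
            e = (t + max_ms < cue_end) ? t + max_ms : cue_end
            if (!first) print ""
            print to_ts(t) " --> " to_ts(e) settings
            printf "%s", text
            first = 0
            t = e
        } while (t < cue_end)
        in_cue = 0
    }
    # cue timing line: remember start, end and any cue settings
    /^([0-9]+:)?[0-9][0-9]:[0-9][0-9]\.[0-9][0-9][0-9][ \t]+-->/ {
        flush()
        n = split($0, f)
        cue_start = to_ms(f[1]); cue_end = to_ms(f[3])
        settings = ""
        for (i = 4; i <= n; i++) settings = settings " " f[i]
        text = ""; in_cue = 1
        next
    }
    in_cue && $0 == "" { flush(); print; next }   # empty line ends the cue
    in_cue { text = text $0 "\n"; next }          # cue payload line
    { print }                                     # header, NOTE blocks, ids
    END { flush() }
    '
}
```

 Example: `split_vtt_cues 2000 < cap-base.vtt > cap-base-2s.vtt`. This only
 rewrites cue boundaries after the file has been written; it does not make
 the caption text available any earlier than FFmpeg emits the cue, so on its
 own it does not solve the display latency described in the summary.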

-- 
Ticket URL: <https://trac.ffmpeg.org/ticket/10510#comment:3>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker