[FFmpeg-devel] AVCHD/H.264 decoder: further development/corrections

Sun Feb 1 00:09:47 CET 2009

On Sun, Jan 25, 2009 at 08:08:06PM +0100, Ivan Schreter wrote:
[...]
> I identified the following problems and potential solutions:
> 
>    1. Inconsistency between packets returned via av_read_frame() and
>       actually delivered full frames from avcodec_decode_video()
>    2. Key frame calculation and seeking
>    3. Reporting frame type to libavformat
> 
> Now the details:
> 
> *1. Inconsistency between packets and decoded frames*
> 
> H.264 decoder returns AVPackets via av_read_frame(), which contain 
> either a full frame or just a field (half frame). The former case is not 
> problematic, since decoded frames are 1:1 to returned packets. It is 
> problematic, though, when the decoder returns packets which DO NOT 
> correspond to a full frame. This is the case for interlaced AVCHD video 
> as produced by various full-HD camcorders (at least Panasonic, Sony and 
> Canon). The H.264 standard allows coding by field, so one picture in 
> H.264 terms (as currently returned as an AVPacket from av_read_frame()) 
> can contain either a single field, two fields (a frame) or even repeated 
> fields (so in total 1-3 fields per AVPacket).
> 
> I'd concentrate first on H.264 pictures having 1 to 2 fields only, since 
> the other case (3 fields per picture) is probably not that interesting 
> now (it is used to quasi stretch FPS from original cinema material to 
> television frame rates).
> 

> Although the decoder itself takes this into account, the interface in 
> libavformat doesn't. Thus, currently only video having full frames per 
> packet decodes really correctly (and this also only with not-yet-applied 
> patch concerning frame types). Reason: av_read_frame() doesn't return 
> whole frames, although it is documented so.

"decoding" of fields and even field/frame mixes works perfectly, and bitexact
you can try the reference bitstreams ...
what doesn't work is the timestamps, and these cause user apps to drop and
duplicate frames "randomly"
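
To make the 1-3 fields per picture concrete, here is a minimal sketch that
maps the pic_struct value of the H.264 picture timing SEI to the number of
field periods a picture occupies. The numeric values follow the H.264
specification; the enum names, field_periods() and picture_duration() are
made up for illustration and are not FFmpeg API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. The numeric values follow the pic_struct syntax
 * element of the H.264 picture timing SEI. */
enum PicStruct {
    PS_FRAME             = 0,
    PS_TOP_FIELD         = 1,
    PS_BOTTOM_FIELD      = 2,
    PS_TOP_BOTTOM        = 3,
    PS_BOTTOM_TOP        = 4,
    PS_TOP_BOTTOM_TOP    = 5,
    PS_BOTTOM_TOP_BOTTOM = 6,
    PS_FRAME_DOUBLING    = 7,
    PS_FRAME_TRIPLING    = 8
};

/* Number of field periods the picture is displayed for, or -1 if the
 * value is not a known pic_struct. */
static int field_periods(int pic_struct)
{
    switch (pic_struct) {
    case PS_TOP_FIELD:
    case PS_BOTTOM_FIELD:      return 1;
    case PS_FRAME:
    case PS_TOP_BOTTOM:
    case PS_BOTTOM_TOP:        return 2;
    case PS_TOP_BOTTOM_TOP:
    case PS_BOTTOM_TOP_BOTTOM: return 3;
    case PS_FRAME_DOUBLING:    return 4;
    case PS_FRAME_TRIPLING:    return 6;
    default:                   return -1;
    }
}

/* Duration of one picture in time-base units, given the duration of a
 * single field in the same units. */
static int64_t picture_duration(int pic_struct, int64_t field_duration)
{
    int n = field_periods(pic_struct);
    assert(n > 0);
    return n * field_duration;
}
```

With a 90 kHz time base and 50 fields per second, field_duration would be
1800, so a single-field packet lasts 1800 ticks and a frame packet 3600.
An application that assumes every packet is a whole frame gets these
durations wrong.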

> 
> *Potential solution:* For field pictures, delay returning a packet from 
> h264_parse(), until the second field picture is also read. The decoder 
> should take then care of decoding both fields correctly and returning a 
> full frame for each packet.

as mentioned in another mail this has its problems sadly

> 
> *Alternative solution:* Return field packet from h264_parse() 
> immediately, but somehow tell libavformat that the packet does not 
> represent a full frame and second field has to be read as well. Read it 
> in libavformat, extending the existing packet. Thus, av_read_frame() 
> returns then full frame.

you might want to look at
svn di -r12162:12161

[...]
> Now the question: Which solution is the "right" one? I'd go for the 
> first one or possibly for the alternative. The first proposed solution 
> seems to be most "compatible", since we don't need to extend AVPacket to 
> address the issue.
> 
> Your opinions? Or possibly a different idea?

The avparser for h264 should take the input timestamps from the demuxer,
decode all the relevant SEIs and headers, and return the correctly
"interpolated" timestamps.

[...]
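
The interpolation idea for the parser could look roughly like the sketch
below. It is only an illustration under stated assumptions: FieldClock,
interpolate_ts() and NO_TS are hypothetical names and not the h264 parser
code. It also assumes timestamps are handled in the order the pictures
arrive, which fits DTS. PTS with frame reordering would need more than
this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker for "no timestamp from the demuxer". */
#define NO_TS INT64_MIN

/* Hypothetical per-stream state; not an FFmpeg struct. */
typedef struct FieldClock {
    int64_t last_ts;        /* timestamp given to the previous picture */
    int     last_fields;    /* field periods the previous picture lasted */
    int64_t field_duration; /* one field period in time-base units */
} FieldClock;

/* Return the timestamp for the current picture. A timestamp from the
 * demuxer is trusted as-is; a missing one is derived from the previous
 * picture's timestamp plus the number of field periods it covered. */
static int64_t interpolate_ts(FieldClock *c, int64_t demuxer_ts, int fields)
{
    int64_t ts;

    assert(c && fields >= 1 && fields <= 3);

    if (demuxer_ts != NO_TS)
        ts = demuxer_ts;
    else if (c->last_ts != NO_TS)
        ts = c->last_ts + c->last_fields * c->field_duration;
    else
        ts = NO_TS;         /* nothing to interpolate from yet */

    c->last_ts     = ts;
    c->last_fields = fields;
    return ts;
}
```

The fields argument would come from the SEI and slice headers the parser
decodes. For example, if only the first field of a pair carries a
timestamp of 90000 and a field lasts 1800 ticks, the second field gets
91800.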
> Further, I'd propose keeping a small cache of (PTS, position, 
> convergence_duration) triples for frames containing SEI recovery point 
> message, so the seeking around "current" location would be faster. 
> Reason: video editing software, where we often need to seek one frame 
> forward/backward.

see AVIndexEntry
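
As an illustration of how an index covers the proposed cache, here is a
small sketch of looking up the nearest earlier recovery point in a sorted
index. SeekEntry is a hypothetical, reduced stand-in written for this
example. The real AVIndexEntry in libavformat has more fields, and
find_seek_entry() is not a libavformat function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced index entry for illustration only. */
typedef struct SeekEntry {
    int64_t pos;        /* byte position in the file */
    int64_t timestamp;  /* timestamp of the picture at pos */
    int     recovery;   /* nonzero if decoding can start here */
} SeekEntry;

/* Return the index of the last recovery entry with timestamp <= target,
 * or -1 if there is none. Entries must be sorted by timestamp. */
static int find_seek_entry(const SeekEntry *e, int n, int64_t target)
{
    int lo = 0, hi = n;             /* search window is [lo, hi) */

    assert(e || n == 0);

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (e[mid].timestamp <= target)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo now counts the entries with timestamp <= target; walk back to
     * the nearest one where decoding can start. */
    for (int i = lo - 1; i >= 0; i--)
        if (e[i].recovery)
            return i;
    return -1;
}
```

Stepping one frame backward in an editor would then mean seeking to the
returned entry's pos and decoding forward to the wanted frame.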

> 
> Your opinions/suggestions?
> 
> *3. Reporting frame type to libavformat*
> 
> This is a minor thing, but still important for correct computation of 
> PTS/DTS and key frame flags. compute_pkt_fields() relies on having the 
> information about picture type (I/P/B-frame). However, H.264 doesn't 
> have strict I/P/B frames, there is even a possibility to have mixed-type 
> slices inside of one frame. Indeed, my camcorder produces in interlaced 
> mode top field as I-slice and bottom field as P-slice referring to the 
> top field.
> 

> So my suggestion is: report picture type I-frame for key frames (which 
> frames count as key frames is discussed above) and report P-frame for 
> all frames containing only P- and I-slices. Other frames, which also 
> contain B-slices, will be reported as B-frames.

this is technically correct, i agree, but because it takes time and the
information is effectively useless (there is no relation between pict_type
and timestamps ...)
we can take a shortcut and just use the type of the first slice
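
The two rules can be compared with a small sketch. The enums and function
names here are hypothetical and written only for this comparison; they
are not the decoder's code.

```c
#include <assert.h>

/* Hypothetical types for illustration. */
enum SliceType { SLICE_I, SLICE_P, SLICE_B };
enum PictType  { PICT_I, PICT_P, PICT_B };

/* Rule from the quoted proposal: key frames are reported as I, other
 * pictures with only I- and P-slices as P, anything with a B-slice as B.
 * Needs the type of every slice in the picture. */
static enum PictType pict_type_all_slices(const enum SliceType *slices,
                                          int n, int is_key)
{
    assert(slices && n > 0);

    if (is_key)
        return PICT_I;
    for (int i = 0; i < n; i++)
        if (slices[i] == SLICE_B)
            return PICT_B;
    return PICT_P;
}

/* Shortcut: report the type of the first slice only. */
static enum PictType pict_type_first_slice(const enum SliceType *slices,
                                           int n)
{
    assert(slices && n > 0);

    switch (slices[0]) {
    case SLICE_I: return PICT_I;
    case SLICE_B: return PICT_B;
    default:      return PICT_P;
    }
}
```

For the camcorder case from the quote (top field I-slice, bottom field
P-slice) both rules report I when the picture is a key frame. They differ
when the same slice mix appears in a non-key picture: the full rule says
P, the shortcut says I.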

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even know they had done so. -- Xenophanes