Ticket #1854 (new enhancement)
libswscale has unreasonable alignment constraints
| Reported by: | gjdfgh | Owned by: | |
|---|---|---|---|
| Priority: | wish | Component: | swscale |
| Version: | git-master | Keywords: | |
| Cc: | | Blocked By: | |
| Blocking: | | Reproduced by developer: | no |
| Analyzed by developer: | no | | |
Description
First off, there's a difference between the alignment of single pixels and the alignment of the image start pointer. It's reasonable to demand strict alignment constraints on single pixels, e.g. that a pixel of size 4 bytes should be aligned to 4 bytes.
But requiring 16-byte alignment of the image start pointer is not reasonable. For example, it prevents passing cropped images to libswscale (unless the crop happens to satisfy the alignment constraints again).
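To illustrate the cropping case, here is a minimal sketch in C. It assumes a packed format with 4 bytes per pixel and a buffer whose first byte is 16-byte aligned. `crop_offset` and `misalignment` are hypothetical helpers written for this ticket; they are not part of libswscale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of pixel (x, y) in a packed image
 * with the given stride (bytes per row) and pixel size. */
static size_t crop_offset(size_t x, size_t y, size_t stride,
                          size_t bytes_per_pixel)
{
    return y * stride + x * bytes_per_pixel;
}

/* Hypothetical helper: how many bytes ptr lies past the previous
 * `align`-byte boundary. `align` must be a power of two. */
static size_t misalignment(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size_t)((uintptr_t)ptr & (align - 1));
}
```

With a stride that is a multiple of 16, a crop starting at x = 1 yields a pointer that is still 4-byte aligned (fine for the pixel itself) but sits 4 bytes past a 16-byte boundary. Only crops whose x is a multiple of 4 keep the 16-byte alignment.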
Rather than fully reverting to the slow path in this case, libswscale should use the slow path only for the first N unaligned pixels, until an aligned pixel is reached, and then continue with an accelerated SIMD code path.
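The idea can be sketched roughly as follows, for the simplest possible case: a single input buffer and a byte-sum operation. This is not libswscale code. `sum_block_aligned` is a hypothetical stand-in for an aligned SIMD kernel (written as a plain loop so the example compiles anywhere), and the 16-byte block size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16 /* assumed SIMD width in bytes */

/* Hypothetical stand-in for an aligned SIMD kernel: sums one block.
 * A real kernel would use aligned vector loads here. */
static uint32_t sum_block_aligned(const uint8_t *p)
{
    assert(((uintptr_t)p & (BLOCK - 1)) == 0);
    uint32_t s = 0;
    for (int i = 0; i < BLOCK; i++)
        s += p[i];
    return s;
}

static uint32_t sum_bytes(const uint8_t *p, size_t n)
{
    uint32_t s = 0;

    /* Scalar head: consume bytes until p reaches a BLOCK boundary. */
    while (n > 0 && ((uintptr_t)p & (BLOCK - 1)) != 0) {
        s += *p++;
        n--;
    }
    /* Aligned body: whole blocks go through the fast kernel. */
    while (n >= BLOCK) {
        s += sum_block_aligned(p);
        p += BLOCK;
        n -= BLOCK;
    }
    /* Scalar tail: whatever is left after the last full block. */
    while (n > 0) {
        s += *p++;
        n--;
    }
    return s;
}
```

The head loop runs at most 15 times, so the scalar cost stays bounded no matter how the start pointer is aligned. Note that this only covers one pointer. An operation that reads one buffer and writes another has two pointers to align.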
Change History
comment:1 Changed 7 months ago by cehoyos
- Priority changed from normal to wish
- Version changed from unspecified to git-master
comment:2 Changed 7 months ago by compn
Do you have a benchmark, with your code or ffmpeg, that shows the difference in time between the slow path and the SIMD path?
Bonus points for a sample file that shows this problem.
comment:3 Changed 2 months ago by michael
swscale does not revert to "the slow path" on misaligned data. It should just be slower, because some operations are slower on misaligned data and some code paths are not possible on misaligned data because the CPU instructions require aligned data.
Please provide some specific examples where there are problems, like unreasonable slowdown, crashes (which would be a bug) or other. This feature request as such is hard to implement, as it is too vague.
The suggestion about mixing unaligned and aligned processing is generally not possible, as source and destination can have different alignment, and even if they match, the data tables would then likely not.
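The source/destination point can be made concrete with a small sketch. It uses a simplified model that is an assumption for illustration, not how swscale works internally: a 1:1 byte operation where both pointers advance in lockstep. `bytes_to_boundary` and `same_head` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes to skip before ptr reaches the
 * next `align`-byte boundary (0 if already aligned).
 * `align` must be a power of two. */
static size_t bytes_to_boundary(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - ((uintptr_t)ptr & (align - 1))) & (align - 1);
}

/* In this lockstep model, a scalar head of k bytes aligns both pointers
 * only if both need the same k. */
static bool same_head(const void *src, const void *dst, size_t align)
{
    return bytes_to_boundary(src, align) == bytes_to_boundary(dst, align);
}
```

If the source is 4 bytes past a boundary and the destination is 8 bytes past one, no head length aligns both, so at least one side still needs unaligned accesses. When scaling, source and destination additionally advance at different rates, which this model does not capture.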



If this is reproducible with ffmpeg, please provide an example command line together with complete, uncut console output to make this a valid ticket.