[Libav-user] What data type should frame byte stream in m4a file be converted to?
gaobo_work at yeah.net
Wed Jul 28 04:06:28 EEST 2021
I am new to audio processing research and am using the pydub module in Python 3 to manipulate audio files in “m4a” format. Reading the original m4a files with pydub works fine, but after a few processing steps (such as VAD and data augmentation), I can no longer read the frames of the resulting m4a files as a numpy.ndarray, and I get the error shown below:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/pydub/audio_segment.py", line 272, in get_array_of_samples
    array_type_override = self.array_type
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/pydub/audio_segment.py", line 277, in array_type
    return get_array_type(self.sample_width * 8)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/pydub/utils.py", line 43, in get_array_type
    t = ARRAY_TYPES[bit_depth]
It is weird that all the m4a files, whether original inputs or final outputs, open successfully in audio applications and sound reasonable through speakers. On further investigation, I noticed that each frame in the final outputs is 8 bytes, while each frame in the original inputs is 2 bytes.
When the original input and the final output are opened in Audacity, both are displayed as “mono 16000Hz 32-bit float”. Since 2-byte frames cannot be interpreted as 32-bit floats, I guess the 32-bit float is the result of a normalization operation in Audacity.
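To make the guess about normalization concrete, here is a minimal sketch of one common convention for turning 16-bit integer samples into 32-bit floats: dividing by 32768 so the values fall in [-1.0, 1.0). Whether Audacity does exactly this is an assumption, not something confirmed here, and the function name int16_to_float32 is hypothetical.

```python
import numpy as np

def int16_to_float32(samples: np.ndarray) -> np.ndarray:
    """Scale signed 16-bit PCM samples to float32 in [-1.0, 1.0).

    Assumption: plain division by 2**15; other tools may scale differently.
    """
    if samples.dtype != np.int16:
        raise TypeError("expected an int16 array")
    return samples.astype(np.float32) / 32768.0

# Example: full-scale negative, silence, and half-scale positive samples.
pcm = np.array([-32768, 0, 16384], dtype=np.int16)
floats = int16_to_float32(pcm)
```

Under this convention a 2-byte integer sample and a 4-byte float sample carry the same audio, which would be consistent with a 2-byte file being shown as 32-bit float after import.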
My question is: for frames of 2, 4, or 8 bytes, what data type (in numpy) should they be converted to?
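For illustration, a minimal sketch of how raw frame bytes could be interpreted with numpy, assuming the samples are signed little-endian integer PCM. That assumption matters most for the 8-byte case: such data could just as well be 64-bit float, or two channels of 4 bytes each, so the file's actual sample format should be checked first. The helper name raw_to_array is hypothetical. Note that in pydub the frame width is the channel count times the sample width, so the per-sample width has to be derived from both.

```python
import numpy as np

def raw_to_array(raw: bytes, frame_width: int, channels: int = 1) -> np.ndarray:
    """Interpret raw PCM bytes as an array of shape (n_frames, channels).

    Assumption: signed little-endian integers of 2, 4, or 8 bytes per sample.
    """
    if frame_width % channels != 0:
        raise ValueError("frame_width must be a multiple of channels")
    sample_width = frame_width // channels  # bytes per single sample
    if sample_width not in (2, 4, 8):
        raise ValueError(f"unsupported sample width: {sample_width} bytes")
    dtype = np.dtype(f"<i{sample_width}")  # '<i2' int16, '<i4' int32, '<i8' int64
    return np.frombuffer(raw, dtype=dtype).reshape(-1, channels)

# Example: four mono frames of 2 bytes each.
raw_2 = np.array([0, 1, -1, 32767], dtype="<i2").tobytes()
mono_16 = raw_to_array(raw_2, frame_width=2)

# Example: two mono frames of 8 bytes each, read as int64 under the assumption above.
raw_8 = np.array([1, -2], dtype="<i8").tobytes()
mono_64 = raw_to_array(raw_8, frame_width=8)
```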
And does anyone know what normalization operation Audacity applies?
Thanks a lot!