mlx.data.features.mfsc
- mlx.data.features.mfsc(n_filterbank, sampling_freq, frame_size_ms=25, frame_stride_ms=10, pre_emphasis_coeff=0.97, window_type=WindowType.Hamming, low_freq=0, high_freq=-1, mel_floor=1.0, freq_scale=FrequencyScale.MEL, post_process=None)
Returns a function that computes spectrogram features from audio, specifically mel-frequency spectral coefficients (MFSCs).
Note
This feature extractor operates on mono audio provided as a 1D array, meaning the input should have shape (N,) and not (N, 1). You may want to look into squeeze() to transform arrays of shape (N, 1) to (N,).
The featurization function
- computes a sliding window of the input audio
- applies a pre-emphasis filter
- applies a windowing function
- computes the power spectrum
- computes triangular filterbank features
- computes the log of the features
- applies any post-processing function that may be provided
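The steps above can be sketched in plain Python. This is an illustrative sketch only, not the mlx.data implementation: the name mfsc_sketch, the mel formula, the handling of the first sample in pre-emphasis, and the choice to keep only full windows without padding are all assumptions. It fixes the window to Hamming, spans 0 Hz to the Nyquist frequency, and leaves out the optional post-processing step. It uses a naive DFT from the standard library so that it runs without dependencies; a real implementation would use an FFT.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    # A widely used mel formula; assumed here, not taken from mlx.data.
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def triangle(freq, low, center, high):
    # Weight of one triangular filter at frequency `freq`.
    if low < freq <= center:
        return (freq - low) / (center - low)
    if center < freq < high:
        return (high - freq) / (high - center)
    return 0.0


def mfsc_sketch(audio, n_filterbank, sampling_freq, frame_size_ms=25,
                frame_stride_ms=10, pre_emphasis_coeff=0.97, mel_floor=1.0):
    size = sampling_freq * frame_size_ms // 1000
    stride = sampling_freq * frame_stride_ms // 1000
    n_bins = size // 2 + 1
    bin_hz = [k * sampling_freq / size for k in range(n_bins)]

    # Filter edges evenly spaced on the mel scale from 0 Hz to Nyquist.
    top_mel = hz_to_mel(sampling_freq / 2)
    edges = [mel_to_hz(top_mel * i / (n_filterbank + 1))
             for i in range(n_filterbank + 2)]
    filters = [[triangle(f, edges[i], edges[i + 1], edges[i + 2])
                for f in bin_hz]
               for i in range(n_filterbank)]

    features = []
    # Step 1: sliding window (assumption: only full windows, no padding).
    for start in range(0, len(audio) - size + 1, stride):
        frame = audio[start:start + size]
        # Step 2: pre-emphasis, y[t] = x[t] - coeff * x[t - 1].
        frame = [frame[0]] + [frame[t] - pre_emphasis_coeff * frame[t - 1]
                              for t in range(1, size)]
        # Step 3: Hamming window.
        frame = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (size - 1)))
                 for t, x in enumerate(frame)]
        # Step 4: power spectrum via a naive DFT.
        power = [abs(sum(x * cmath.exp(-2j * math.pi * k * t / size)
                         for t, x in enumerate(frame))) ** 2
                 for k in range(n_bins)]
        # Steps 5 and 6: filterbank energies, floored at mel_floor, then log.
        features.append([
            math.log(max(mel_floor, sum(w * p for w, p in zip(filt, power))))
            for filt in filters
        ])
    return features


# A short 100 Hz tone sampled at 1 kHz keeps the naive DFT cheap.
tone = [math.sin(2 * math.pi * 100 * t / 1000) for t in range(65)]
feats = mfsc_sketch(tone, n_filterbank=4, sampling_freq=1000)
print(len(feats), len(feats[0]))  # 5 frames, 4 features each
```

Because the filterbank energies are floored at mel_floor before the log, the default mel_floor=1.0 means that no feature in this sketch falls below 0.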
The following example loads the librispeech dataset and computes MFSC features:
    from mlx.data.datasets import load_librispeech
    from mlx.data.features import mfsc

    dset = (
        load_librispeech()
        .squeeze("audio")
        .key_transform("audio", mfsc(80, 16000))
        .to_stream()
        .prefetch(16, 8)
        .batch(16)
        .prefetch(2, 1)
    )
- Parameters:
n_filterbank (int) – How many frequency bands to use. This number will be the dimensionality of the resulting features.
sampling_freq (int) – The sampling frequency of the input audio in Hz.
frame_size_ms (int) – The length, in milliseconds, of the input audio window that each output feature is computed from. (default: 25)
frame_stride_ms (int) – The offset, in milliseconds, between the audio windows of two consecutive features. (default: 10)
pre_emphasis_coeff (float) – Defines the free parameter of the FIR filter that does the pre-emphasis. (default: 0.97)
window_type (WindowType) – Defines the windowing function to use before computing the power spectrum. (default: WindowType.Hamming)
low_freq (int) – The lowest frequency to use when creating the frequency bands. Simply put, signal power in lower frequencies is ignored. (default: 0)
high_freq (int) – The highest frequency to use when creating the frequency bands. Simply put, signal power in higher frequencies is ignored. If set to -1 then sampling_freq // 2 is used. (default: -1)
mel_floor (float) – The minimum power collected in the filterbanks. Even though the name is mel_floor, this applies even when using different frequency scales. (default: 1.0)
freq_scale (FrequencyScale) – The frequency scale to use when computing the frequency bands for the filterbanks. (default: FrequencyScale.MEL)
post_process (callable, optional) – An optional callable to post process the MFSC features. (default: None)
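To see how the timing and frequency parameters translate into concrete sizes, the following sketch converts them to sample counts. The helper frame_layout is hypothetical and not part of mlx.data. The frame count assumes that only full windows are kept and that no padding is applied, which may differ from what the library does.

```python
def frame_layout(n_samples, sampling_freq, frame_size_ms=25,
                 frame_stride_ms=10, high_freq=-1):
    # Convert the millisecond parameters to sample counts.
    frame_size = sampling_freq * frame_size_ms // 1000
    frame_stride = sampling_freq * frame_stride_ms // 1000
    # high_freq=-1 resolves to the Nyquist frequency, sampling_freq // 2.
    if high_freq == -1:
        high_freq = sampling_freq // 2
    # Assumption: only full windows are counted, with no padding.
    if n_samples < frame_size:
        n_frames = 0
    else:
        n_frames = 1 + (n_samples - frame_size) // frame_stride
    return frame_size, frame_stride, high_freq, n_frames


# One second of 16 kHz audio with the default frame size and stride.
print(frame_layout(16000, 16000))  # (400, 160, 8000, 98)
```

Under these assumptions, one second of 16 kHz audio passed through mfsc(80, 16000) would give roughly 98 feature vectors of dimension 80, since n_filterbank sets the dimensionality of each feature.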