hezar.models.speech_recognition.whisper.whisper_feature_extractor module

class hezar.models.speech_recognition.whisper.whisper_feature_extractor.WhisperFeatureExtractor(config: WhisperFeatureExtractorConfig, **kwargs)[source]

Bases: AudioFeatureExtractor

A feature extractor for the Whisper model.

This feature extractor inherits from AudioFeatureExtractor, which implements most of the main methods.

This class extracts mel filter bank features from raw speech using a custom NumPy implementation of the short-time Fourier transform (STFT), which should match the output of PyTorch's torch.stft.
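The STFT step described above can be illustrated with a minimal NumPy sketch. It is not the class's actual implementation: the function name stft_power_sketch is hypothetical, and the sketch assumes the defaults of torch.stft (center=True with reflect padding) together with a periodic Hann window, using n_fft=400 and hop_length=160 from the config defaults. The mel filter bank projection and log scaling are omitted.

```python
import numpy as np


def stft_power_sketch(waveform, n_fft=400, hop_length=160):
    """Hypothetical helper: power spectrogram of a 1-D waveform via NumPy."""
    # Periodic Hann window (assumed; this is what torch.hann_window returns by default).
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_fft) / n_fft)
    # Center the frames by reflect-padding n_fft // 2 samples on each side
    # (assumed to mirror torch.stft with center=True, pad_mode="reflect").
    padded = np.pad(waveform, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    # Slice the padded signal into overlapping frames of n_fft samples.
    frames = np.stack(
        [padded[i * hop_length : i * hop_length + n_fft] for i in range(n_frames)]
    )
    # Real FFT of each windowed frame gives n_fft // 2 + 1 frequency bins.
    spectrum = np.fft.rfft(frames * window, axis=-1)
    return np.abs(spectrum) ** 2


# One second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
power = stft_power_sketch(np.sin(2.0 * np.pi * 440.0 * t))
print(power.shape)  # (101, 201)
```

With these settings each frequency bin spans 16000 / 400 = 40 Hz, so the 440 Hz tone peaks in bin 11.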

static zero_mean_unit_var_norm(input_values: List[ndarray], attention_mask: List[ndarray], padding_value: float = 0.0) → List[ndarray][source]

Every array in the list is normalized to have zero mean and unit variance.
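A minimal sketch of this kind of normalization follows. It is an illustration under stated assumptions, not the library's code: the name normalize_sketch and the eps parameter are hypothetical, the attention mask is assumed to hold 1 for valid samples and 0 for padding, and padding is assumed to be on the right. Statistics are computed over the valid samples only, and the padded tail is reset to padding_value.

```python
import numpy as np


def normalize_sketch(input_values, attention_mask, padding_value=0.0, eps=1e-7):
    """Hypothetical helper: per-array zero-mean, unit-variance normalization."""
    normed = []
    for values, mask in zip(input_values, attention_mask):
        values = np.asarray(values, dtype=np.float64)
        # Number of valid (non-padded) samples, assuming right padding.
        length = int(np.sum(mask))
        valid = values[:length]
        # eps guards against division by zero for constant inputs (assumed value).
        out = (values - valid.mean()) / np.sqrt(valid.var() + eps)
        # Restore the padding value in the padded region.
        out[length:] = padding_value
        normed.append(out)
    return normed


x = np.array([1.0, 2.0, 3.0, 0.0, 0.0])
mask = np.array([1, 1, 1, 0, 0])
result = normalize_sketch([x], [mask])
print(result[0])
```

Excluding the padded samples from the mean and variance keeps the statistics independent of how much padding a batch happens to need.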

class hezar.models.speech_recognition.whisper.whisper_feature_extractor.WhisperFeatureExtractorConfig(feature_size: 'int' = 80, sampling_rate: 'int' = 16000, padding: 'str' = 'longest', padding_value: 'float' = 0.0, padding_side: 'str' = 'right', hop_length: 'int' = 160, chunk_length: 'int' = 30, n_fft: 'int' = 400, return_attention_mask: 'bool' = False)[source]

Bases: AudioFeatureExtractorConfig

chunk_length: int = 30
feature_size: int = 80
hop_length: int = 160
n_fft: int = 400
name: str = 'whisper_feature_extractor'
padding: str = 'longest'
padding_side: str = 'right'
padding_value: float = 0.0
return_attention_mask: bool = False
sampling_rate: int = 16000
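The arithmetic implied by these defaults can be sketched as follows. ConfigSketch is a hypothetical stand-in that only copies the default values listed above; it is not the real WhisperFeatureExtractorConfig. The sketch also assumes that chunk_length is expressed in seconds, which the listing itself does not state.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical stand-in holding the documented default values."""
    feature_size: int = 80
    sampling_rate: int = 16000
    hop_length: int = 160
    chunk_length: int = 30  # assumed to be in seconds
    n_fft: int = 400


cfg = ConfigSketch()

# Samples in one chunk: 30 s * 16000 samples/s.
n_samples = cfg.chunk_length * cfg.sampling_rate
# Frames per chunk when the window advances hop_length samples at a time.
n_frames = n_samples // cfg.hop_length
# Frequency bins produced by a real FFT of size n_fft.
n_freq_bins = 1 + cfg.n_fft // 2

print(n_samples, n_frames, n_freq_bins)  # 480000 3000 201
```

Under these assumptions, one padded chunk would yield a feature matrix of feature_size by n_frames, that is 80 by 3000.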