hezar.models.speech_recognition.whisper.whisper_feature_extractor module

class hezar.models.speech_recognition.whisper.whisper_feature_extractor.WhisperFeatureExtractor(config: WhisperFeatureExtractorConfig, **kwargs)

Bases: AudioFeatureExtractor

A feature extractor for the Whisper model.

This feature extractor inherits from AudioFeatureExtractor, which implements most of the main methods.

This class extracts mel filter bank features from raw speech using a custom NumPy implementation of the short-time Fourier transform (STFT), which is intended to match the output of PyTorch's torch.stft.
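As a rough illustration of the STFT step described above, the following is a minimal NumPy sketch and not the code used by this class. The function name stft_power is hypothetical. It assumes a periodic Hann window and reflect padding on both sides, which are the defaults of torch.hann_window and torch.stft, and it uses the n_fft and hop_length defaults listed in the config below. The mel filter bank and log scaling are omitted.

```python
import numpy as np


def stft_power(waveform, n_fft=400, hop_length=160):
    """Power spectrogram via a framed real FFT (hypothetical helper, illustration only)."""
    # Periodic Hann window, the default form returned by torch.hann_window(n_fft)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    # Mimic center=True: reflect-pad n_fft // 2 samples on each side
    padded = np.pad(waveform, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    # Index matrix of shape (n_frames, n_fft) selecting overlapping frames
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = padded[idx] * window
    spectrum = np.fft.rfft(frames, axis=-1)
    # Transpose to (n_fft // 2 + 1, n_frames)
    return (np.abs(spectrum) ** 2).T


sampling_rate = 16000
t = np.arange(sampling_rate) / sampling_rate
tone = np.sin(2 * np.pi * 1000 * t)  # one second of a 1 kHz sine
power = stft_power(tone)
```

With these defaults each frequency bin is 16000 / 400 = 40 Hz wide, so the 1 kHz tone peaks in bin 25.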

static zero_mean_unit_var_norm(input_values: List[ndarray], attention_mask: List[ndarray], padding_value: float = 0.0) → List[ndarray]

Normalizes every array in the list to zero mean and unit variance.
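The normalization can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function name normalize_padded and the eps term are hypothetical, the attention mask is assumed to mark valid samples with 1 and padding with 0, and padding is assumed to be on the right.

```python
import numpy as np


def normalize_padded(input_values, attention_mask, padding_value=0.0, eps=1e-7):
    """Zero-mean, unit-variance normalization over the unpadded part of each array."""
    normed = []
    for vector, mask in zip(input_values, attention_mask):
        length = int(np.sum(mask))  # number of valid (unpadded) samples
        valid = vector[:length]
        # Statistics come from the valid samples only; eps guards against zero variance
        out = (vector - valid.mean()) / np.sqrt(valid.var() + eps)
        # Reset the padded tail so it does not carry shifted values
        out[length:] = padding_value
        normed.append(out)
    return normed


values = [np.array([1.0, 2.0, 3.0, 0.0, 0.0])]
mask = [np.array([1, 1, 1, 0, 0])]
result = normalize_padded(values, mask)
```

Computing the statistics over the valid samples only keeps the padding from skewing the mean and variance.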

class hezar.models.speech_recognition.whisper.whisper_feature_extractor.WhisperFeatureExtractorConfig(feature_size: int = 80, sampling_rate: int = 16000, padding: str = 'longest', padding_value: float = 0.0, padding_side: str = 'right', hop_length: int = 160, chunk_length: int = 30, n_fft: int = 400, return_attention_mask: bool = False)

Bases: AudioFeatureExtractorConfig

chunk_length: int = 30

feature_size: int = 80

hop_length: int = 160

n_fft: int = 400

name: str = 'whisper_feature_extractor'

padding: str = 'longest'

padding_side: str = 'right'

padding_value: float = 0.0

return_attention_mask: bool = False

sampling_rate: int = 16000
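The defaults above imply the following sizes. This is a small worked calculation, not library code. It assumes that chunk_length is given in seconds and that hop_length and n_fft are given in samples, as is conventional for Whisper; the variable names are illustrative only.

```python
# Default values copied from the config above
sampling_rate = 16000  # samples per second
chunk_length = 30      # assumed to be seconds
hop_length = 160       # assumed to be samples between successive frames
n_fft = 400            # assumed to be samples per FFT window
feature_size = 80      # number of mel bins

n_samples = chunk_length * sampling_rate      # samples in one chunk
n_frames = n_samples // hop_length            # frames per chunk
n_freq_bins = n_fft // 2 + 1                  # bins in a real FFT of size n_fft
hop_ms = 1000 * hop_length / sampling_rate    # frame shift in milliseconds
window_ms = 1000 * n_fft / sampling_rate      # window length in milliseconds

print(n_samples, n_frames, n_freq_bins, hop_ms, window_ms)
```

Under these assumptions, one chunk holds 480000 samples and yields 3000 frames of 80 mel features, with a 25 ms window shifted by 10 ms.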