hezar.models.speech_recognition.whisper.whisper_feature_extractor module
- class hezar.models.speech_recognition.whisper.whisper_feature_extractor.WhisperFeatureExtractor(config: WhisperFeatureExtractorConfig, **kwargs)
Bases: AudioFeatureExtractor
A feature extractor for the Whisper model.
This feature extractor inherits from AudioFeatureExtractor, which implements most of the main methods.
This class extracts mel filter bank features from raw speech using a custom NumPy implementation of the Short-Time Fourier Transform, which is intended to match PyTorch's torch.stft.
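To make the STFT step concrete, here is a minimal NumPy sketch of a power spectrogram computed with the documented defaults (n_fft=400, hop_length=160). It is not hezar's implementation: the function name stft_power is hypothetical, and the periodic Hann window, centering, and reflect padding are assumptions chosen to mirror how torch.stft is typically called for Whisper. The mel filter bank projection and log scaling that would follow are omitted.

```python
import numpy as np


def stft_power(waveform, n_fft=400, hop_length=160):
    """Hypothetical helper: power spectrogram from a framed real FFT."""
    # Periodic Hann window (assumed, as produced by torch.hann_window by default).
    window = np.hanning(n_fft + 1)[:-1]
    # Center the frames by reflect-padding n_fft // 2 samples on each side.
    padded = np.pad(waveform, n_fft // 2, mode="reflect")
    # Every window of n_fft samples, then keep one window per hop.
    frames = np.lib.stride_tricks.sliding_window_view(padded, n_fft)[::hop_length]
    # Real FFT of each windowed frame gives n_fft // 2 + 1 frequency bins.
    spectrum = np.fft.rfft(frames * window, axis=-1)
    # Squared magnitude, transposed to (frequency bins, frames).
    return (np.abs(spectrum) ** 2).T
```

With these defaults each frequency bin spans 16000 / 400 = 40 Hz, so a pure 1000 Hz tone should concentrate its energy around bin 25.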
- class hezar.models.speech_recognition.whisper.whisper_feature_extractor.WhisperFeatureExtractorConfig(feature_size: int = 80, sampling_rate: int = 16000, padding: str = 'longest', padding_value: float = 0.0, padding_side: str = 'right', hop_length: int = 160, chunk_length: int = 30, n_fft: int = 400, return_attention_mask: bool = False)
Bases: AudioFeatureExtractorConfig
- chunk_length: int = 30
- feature_size: int = 80
- hop_length: int = 160
- n_fft: int = 400
- name: str = 'whisper_feature_extractor'
- padding: str = 'longest'
- padding_side: str = 'right'
- padding_value: float = 0.0
- return_attention_mask: bool = False
- sampling_rate: int = 16000
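As a rough illustration of how these defaults relate to one another, the sketch below derives a few sizes from them. ExampleConfig and derived_sizes are hypothetical stand-ins, not part of hezar, and the interpretation of chunk_length as a duration in seconds is an assumption based on how Whisper feature extractors are commonly configured.

```python
from dataclasses import dataclass


@dataclass
class ExampleConfig:
    """Hypothetical stand-in that copies the documented default values."""
    sampling_rate: int = 16000
    hop_length: int = 160
    chunk_length: int = 30
    n_fft: int = 400


def derived_sizes(cfg):
    # Assumption: chunk_length is a duration in seconds.
    n_samples = cfg.chunk_length * cfg.sampling_rate
    return {
        "n_samples": n_samples,                     # samples per padded chunk
        "n_frames": n_samples // cfg.hop_length,    # hops per chunk
        "n_freq_bins": cfg.n_fft // 2 + 1,          # real-FFT bins before the mel projection
        "hop_ms": 1000 * cfg.hop_length / cfg.sampling_rate,
        "window_ms": 1000 * cfg.n_fft / cfg.sampling_rate,
    }


sizes = derived_sizes(ExampleConfig())
```

Under these assumptions, a 30-second chunk holds 480000 samples and 3000 hops, with a 25 ms analysis window advanced every 10 ms.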