hezar.data.datasets.speech_recognition_dataset module
- class hezar.data.datasets.speech_recognition_dataset.SpeechRecognitionDataset(config: SpeechRecognitionDatasetConfig, split=None, **kwargs)
Bases: Dataset
- class hezar.data.datasets.speech_recognition_dataset.SpeechRecognitionDatasetConfig(task: TaskType | List[TaskType] = None, path: str = None, feature_extractor_path: str = None, tokenizer_path: str = None, sampling_rate: int = 16000, audio_array_padding_type: bool | str | PaddingType = 'longest', max_audio_array_length: int = None, labels_padding_type: bool | str | PaddingType = 'longest', labels_max_length: int = None, audio_file_path_column: str = 'path', audio_column: str = 'audio', audio_array_column: str = 'array', transcript_column: str = 'sentence')
Bases: DatasetConfig
- audio_array_column: str = 'array'
- audio_array_padding_type: bool | str | PaddingType = 'longest'
- audio_column: str = 'audio'
- audio_file_path_column: str = 'path'
- feature_extractor_path: str = None
- labels_max_length: int = None
- labels_padding_type: bool | str | PaddingType = 'longest'
- max_audio_array_length: int = None
- name: str = 'speech_recognition'
- path: str = None
- sampling_rate: int = 16000
- tokenizer_path: str = None
- transcript_column: str = 'sentence'
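
The column-name fields above are easier to read against a concrete row. The following is a minimal sketch, not Hezar code: `SpeechConfigSketch` is a hypothetical stand-in dataclass that copies the documented field names and defaults, `read_row` is an illustrative helper, and the nested row layout (an audio column holding a dict with the waveform under `array`) is an assumption suggested by the default column names rather than something this page states.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass
class SpeechConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults; not the Hezar class."""
    path: Optional[str] = None
    feature_extractor_path: Optional[str] = None
    tokenizer_path: Optional[str] = None
    sampling_rate: int = 16000
    audio_array_padding_type: Union[bool, str] = "longest"
    max_audio_array_length: Optional[int] = None
    labels_padding_type: Union[bool, str] = "longest"
    labels_max_length: Optional[int] = None
    audio_file_path_column: str = "path"
    audio_column: str = "audio"
    audio_array_column: str = "array"
    transcript_column: str = "sentence"


def read_row(row: dict, config: SpeechConfigSketch) -> tuple:
    """Assumed lookup: the audio column holds a dict that contains the waveform."""
    audio = row[config.audio_column]
    return audio[config.audio_array_column], row[config.transcript_column]


# Made-up row in the nested layout that the default column names suggest.
row = {
    "path": "clip_0001.wav",
    "audio": {"path": "clip_0001.wav", "array": [0.0, 0.1, -0.1], "sampling_rate": 16000},
    "sentence": "hello",
}

default_config = SpeechConfigSketch()
array, transcript = read_row(row, default_config)

# Override only the field that differs, here for a dataset whose transcripts live under "text".
custom_config = replace(default_config, transcript_column="text")
custom_row = {"audio": row["audio"], "text": "hello"}
custom_array, custom_transcript = read_row(custom_row, custom_config)
```

With the real class, the same idea would presumably be expressed by passing `transcript_column='text'` (or another column name) to `SpeechRecognitionDatasetConfig`, leaving the remaining defaults untouched.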