hezar.data.datasets package
Submodules
- hezar.data.datasets.dataset module
- hezar.data.datasets.image_captioning_dataset module
  - ImageCaptioningDataset
  - ImageCaptioningDatasetConfig
    - ImageCaptioningDatasetConfig.image_processor_config
    - ImageCaptioningDatasetConfig.images_paths_column
    - ImageCaptioningDatasetConfig.max_length
    - ImageCaptioningDatasetConfig.name
    - ImageCaptioningDatasetConfig.path
    - ImageCaptioningDatasetConfig.task
    - ImageCaptioningDatasetConfig.test_split_size
    - ImageCaptioningDatasetConfig.text_column
    - ImageCaptioningDatasetConfig.tokenizer_path
- hezar.data.datasets.ocr_dataset module
  - OCRDataset
  - OCRDatasetConfig
    - OCRDatasetConfig.id2label
    - OCRDatasetConfig.image_processor_config
    - OCRDatasetConfig.images_paths_column
    - OCRDatasetConfig.invalid_characters
    - OCRDatasetConfig.max_length
    - OCRDatasetConfig.name
    - OCRDatasetConfig.path
    - OCRDatasetConfig.reverse_digits
    - OCRDatasetConfig.reverse_text
    - OCRDatasetConfig.task
    - OCRDatasetConfig.text_column
    - OCRDatasetConfig.text_split_type
    - OCRDatasetConfig.tokenizer_path
  - TextSplitType
- hezar.data.datasets.sequence_labeling_dataset module
  - SequenceLabelingDataset
  - SequenceLabelingDatasetConfig
    - SequenceLabelingDatasetConfig.ignore_index
    - SequenceLabelingDatasetConfig.is_iob_schema
    - SequenceLabelingDatasetConfig.label_all_tokens
    - SequenceLabelingDatasetConfig.max_length
    - SequenceLabelingDatasetConfig.name
    - SequenceLabelingDatasetConfig.path
    - SequenceLabelingDatasetConfig.tags_field
    - SequenceLabelingDatasetConfig.task
    - SequenceLabelingDatasetConfig.tokenizer_path
    - SequenceLabelingDatasetConfig.tokens_field
- hezar.data.datasets.speech_recognition_dataset module
  - SpeechRecognitionDataset
  - SpeechRecognitionDatasetConfig
    - SpeechRecognitionDatasetConfig.audio_array_column
    - SpeechRecognitionDatasetConfig.audio_array_padding_type
    - SpeechRecognitionDatasetConfig.audio_column
    - SpeechRecognitionDatasetConfig.audio_file_path_column
    - SpeechRecognitionDatasetConfig.feature_extractor_path
    - SpeechRecognitionDatasetConfig.labels_max_length
    - SpeechRecognitionDatasetConfig.labels_padding_type
    - SpeechRecognitionDatasetConfig.max_audio_array_length
    - SpeechRecognitionDatasetConfig.name
    - SpeechRecognitionDatasetConfig.path
    - SpeechRecognitionDatasetConfig.sampling_rate
    - SpeechRecognitionDatasetConfig.task
    - SpeechRecognitionDatasetConfig.tokenizer_path
    - SpeechRecognitionDatasetConfig.transcript_column
- hezar.data.datasets.text_classification_dataset module
- hezar.data.datasets.text_summarization_dataset module
  - TextSummarizationDataset
  - TextSummarizationDatasetConfig
    - TextSummarizationDatasetConfig.max_length
    - TextSummarizationDatasetConfig.max_target_length
    - TextSummarizationDatasetConfig.name
    - TextSummarizationDatasetConfig.path
    - TextSummarizationDatasetConfig.prefix
    - TextSummarizationDatasetConfig.summary_field
    - TextSummarizationDatasetConfig.task
    - TextSummarizationDatasetConfig.text_field
    - TextSummarizationDatasetConfig.title_field
    - TextSummarizationDatasetConfig.tokenizer_path