hezar.data.datasets.image_captioning_dataset module

class hezar.data.datasets.image_captioning_dataset.ImageCaptioningDataset(config: ImageCaptioningDatasetConfig, split=None, **kwargs)

   Bases: Dataset

   required_backends: List[str | Backends] = [Backends.SCIKIT]

class hezar.data.datasets.image_captioning_dataset.ImageCaptioningDatasetConfig(task: TaskType = TaskType.IMAGE2TEXT, path: str | None = None, tokenizer_path: str | None = None, text_column: str = 'label', max_length: int | None = None, test_split_size: float = 0.2, image_processor_config: ImageProcessorConfig | None = None)

   Bases: DatasetConfig

   Configuration class for image captioning datasets.

   Parameters:
      * path (str) – Path to the dataset.
      * tokenizer_path (str) – Path to the tokenizer file.
      * text_column (str) – Column name for text in the dataset.
      * images_paths_column (str) – Column name for image paths in the dataset.
      * max_length (int) – Maximum length of text.
      * test_split_size (float) – Size of the test split.
      * image_processor_config (ImageProcessorConfig) – Configuration for image processing.

   image_processor_config: ImageProcessorConfig = None

   images_paths_column = 'image_path'

   max_length: int = None

   name: str = 'image_captioning'

   path: str = None

   task: TaskType = 'image2text'

   test_split_size: float = 0.2

   text_column: str = 'label'

   tokenizer_path: str = None
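
To make the roles of the configuration fields concrete, here is a minimal standalone sketch. It does not import hezar: `CaptioningConfigSketch` is a hypothetical stand-in dataclass that only mirrors the documented field names and defaults, and `split_rows` and `read_row` are hypothetical helpers showing one plausible reading of `test_split_size`, `text_column`, `images_paths_column`, and `max_length`. The library's actual splitting and truncation logic may differ (for example, truncation is presumably applied by the tokenizer, not on whitespace-separated words).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptioningConfigSketch:
    """Hypothetical stand-in mirroring the documented fields; not the hezar class."""
    path: Optional[str] = None
    tokenizer_path: Optional[str] = None
    text_column: str = "label"
    images_paths_column: str = "image_path"
    max_length: Optional[int] = None
    test_split_size: float = 0.2


def split_rows(rows, config):
    """Hypothetical helper: hold out the last `test_split_size` fraction of rows."""
    n_test = int(len(rows) * config.test_split_size)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


def read_row(row, config):
    """Hypothetical helper: select the image path and caption by column name."""
    caption = row[config.text_column]
    if config.max_length is not None:
        # Word-level truncation is an assumption standing in for tokenizer truncation.
        caption = " ".join(caption.split()[: config.max_length])
    return row[config.images_paths_column], caption


# Placeholder path and toy rows, used only for illustration.
config = CaptioningConfigSketch(path="my/dataset", max_length=3)
rows = [{"image_path": f"img_{i}.jpg", "label": f"a photo of item {i}"} for i in range(10)]

train_rows, test_rows = split_rows(rows, config)
print(len(train_rows), len(test_rows))
print(read_row(train_rows[0], config))
```

With the default `test_split_size` of 0.2, the ten toy rows are divided into eight training rows and two test rows, and the column names in the config decide which keys of each row are read.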