hezar.data.datasets.text_summarization_dataset module
- class hezar.data.datasets.text_summarization_dataset.TextSummarizationDataset(config: TextSummarizationDatasetConfig, split=None, preprocessor=None, **kwargs)
Bases: Dataset
A text summarization dataset class. For now, this class is intended only for datasets hosted on the Hub.
- Parameters:
config (TextSummarizationDatasetConfig) – Dataset config object.
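split – Which split of the dataset to use. Defaults to None.
preprocessor – Preprocessor to use with the dataset. Defaults to None.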
**kwargs – Extra config parameters to assign to the original config.
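The `**kwargs` behavior described above can be pictured with a minimal sketch. `SketchConfig` and `apply_overrides` are hypothetical stand-ins written for this illustration, not part of hezar, and the attribute-assignment logic is an assumption about what "assign to the original config" means.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a dataset config; not hezar's actual class."""
    path: Optional[str] = None
    max_length: Optional[int] = None
    labels_max_length: Optional[int] = None


def apply_overrides(config: SketchConfig, **kwargs) -> SketchConfig:
    """Assign each keyword argument onto the config as an attribute (assumed behavior)."""
    for key, value in kwargs.items():
        setattr(config, key, value)
    return config


# Extra keyword arguments replace the matching config fields; others stay untouched.
config = apply_overrides(SketchConfig(path="some/dataset"), max_length=256)
```

Under this assumption, passing `max_length=256` alongside a config is equivalent to setting that field on the config before constructing the dataset.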
- class hezar.data.datasets.text_summarization_dataset.TextSummarizationDatasetConfig(path: str | None = None, task: TaskType = TaskType.TEXT_GENERATION, max_size: int | float | None = None, hf_load_kwargs: dict | None = None, prefix: str | None = None, text_field: str | None = None, summary_field: str | None = None, title_field: str | None = None, max_length: int | None = None, labels_max_length: int | None = None)
Bases: DatasetConfig
Configuration class for text summarization datasets.
- Parameters:
path (str) – Path to the dataset.
prefix (str) – Prefix for conditional generation.
text_field (str) – Field name for text in the dataset.
summary_field (str) – Field name for summary in the dataset.
title_field (str) – Field name for title in the dataset.
max_length (int) – Maximum length of the input text.
labels_max_length (int) – Maximum length of the target summary.
- labels_max_length: int = None
- max_length: int = None
- name: str = 'text_summarization'
- path: str = None
- prefix: str = None
- summary_field: str = None
- text_field: str = None
- title_field: str = None
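As a rough illustration of how the field-name and prefix options relate to a dataset row, here is a minimal sketch. `SummarizationFields` and `build_example` are hypothetical names introduced for this example, and prepending the prefix to the source text is an assumption about conditional generation, not a statement of how hezar processes rows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummarizationFields:
    """Hypothetical stand-in mirroring a few documented config fields."""
    prefix: Optional[str] = None
    text_field: Optional[str] = None
    summary_field: Optional[str] = None


def build_example(row: dict, fields: SummarizationFields) -> dict:
    """Pick the source and target columns from a row by their configured names."""
    source = row[fields.text_field]
    if fields.prefix:
        # Assumed behavior: the prefix is prepended to the input text.
        source = fields.prefix + source
    return {"source": source, "target": row[fields.summary_field]}


fields = SummarizationFields(
    prefix="summarize: ", text_field="text", summary_field="summary"
)
row = {"text": "A long article body.", "summary": "Short summary.", "title": "Headline"}
example = build_example(row, fields)
```

The point of the field-name options is that the same dataset class can read datasets whose columns are named differently, as long as the config names the right columns.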