hezar.data.datasets.text_summarization_dataset module

class hezar.data.datasets.text_summarization_dataset.TextSummarizationDataset(config: TextSummarizationDatasetConfig, split=None, preprocessor=None, **kwargs)

Bases: Dataset

A text summarization dataset class. Currently, this class is intended for datasets hosted on the Hub.

Parameters:

config (TextSummarizationDatasetConfig) – Dataset config object.
split – The dataset split to use.
**kwargs – Extra config parameters to assign to the original config.
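The way extra keyword arguments are merged into a config can be illustrated with a minimal sketch. It does not use hezar itself: `SketchSummarizationConfig` and `apply_overrides` are hypothetical stand-ins, the path value is a placeholder, and the merge is done with the standard library's `dataclasses.replace`, which may differ from how the library assigns `**kwargs` internally.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class SketchSummarizationConfig:
    """Hypothetical stand-in that mirrors a few of the documented config fields."""

    path: Optional[str] = None
    prefix: Optional[str] = None
    max_length: Optional[int] = None
    labels_max_length: Optional[int] = None


def apply_overrides(config: SketchSummarizationConfig, **kwargs) -> SketchSummarizationConfig:
    """Return a copy of `config` with the given keyword overrides applied.

    Rejecting unknown keys (dataclasses.replace raises TypeError) is a choice
    made for this sketch, not documented library behavior.
    """
    return replace(config, **kwargs)


# "some/dataset" is a placeholder path, not a real dataset.
base = SketchSummarizationConfig(path="some/dataset", max_length=512)
updated = apply_overrides(base, max_length=256, prefix="summarize: ")

print(updated)
```

Returning a copy keeps the original config unchanged, so the same base config can be reused for several splits with different overrides.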

class hezar.data.datasets.text_summarization_dataset.TextSummarizationDatasetConfig

Bases: DatasetConfig

Configuration class for text summarization datasets.

Parameters:

path (str) – Path to the dataset.
prefix (str) – Prefix for conditional generation.
text_field (str) – Field name for text in the dataset.
summary_field (str) – Field name for summary in the dataset.
title_field (str) – Field name for title in the dataset.
max_length (int) – Maximum length of the input text.
labels_max_length (int) – Maximum length of the target summary.
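To make the role of these fields concrete, here is a rough sketch of how they might shape a single example. It is an illustration under stated assumptions, not the library's implementation: `SketchConfig` and `build_example` are hypothetical names, whitespace splitting stands in for a real tokenizer, and lengths are counted in whitespace-separated words.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in holding the fields used in this sketch."""

    prefix: Optional[str] = None
    text_field: Optional[str] = None
    summary_field: Optional[str] = None
    max_length: Optional[int] = None
    labels_max_length: Optional[int] = None


def build_example(row: Dict[str, str], config: SketchConfig) -> Dict[str, List[str]]:
    """Select the text and summary columns, prepend the prefix, and truncate."""
    text = row[config.text_field]
    if config.prefix:
        # The prefix is prepended to the source text for conditional generation.
        text = config.prefix + text

    # Assumption: whitespace splitting replaces a real tokenizer here.
    inputs = text.split()
    labels = row[config.summary_field].split()

    if config.max_length is not None:
        inputs = inputs[: config.max_length]
    if config.labels_max_length is not None:
        labels = labels[: config.labels_max_length]
    return {"inputs": inputs, "labels": labels}


config = SketchConfig(
    prefix="summarize: ",
    text_field="text",
    summary_field="summary",
    max_length=5,
    labels_max_length=3,
)
row = {
    "text": "the quick brown fox jumps over the lazy dog",
    "summary": "a fox jumps over a dog",
}
example = build_example(row, config)
print(example)
```

In this sketch the prefix counts toward `max_length`, because truncation happens after the prefix is prepended.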

labels_max_length: int = None

max_length: int = None

name: str = 'text_summarization'

path: str = None

prefix: str = None

summary_field: str = None

task: TaskType = 'text_generation'

text_field: str = None

title_field: str = None