hezar.data.datasets.text_classification_dataset module

class hezar.data.datasets.text_classification_dataset.TextClassificationDataset(config: TextClassificationDatasetConfig, split=None, **kwargs)

Bases: Dataset

A text classification dataset class. Currently, this class is intended only for datasets hosted on the Hub.

Parameters:
  • config (TextClassificationDatasetConfig) – Dataset config object.

  • split – Name of the dataset split to load (e.g., "train" or "test").

  • **kwargs – Extra config parameters that are assigned to the given config, overriding its corresponding fields.
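The idea behind **kwargs (extra parameters assigned onto an existing config) can be pictured with a minimal sketch, assuming a plain dataclass stands in for the config. ToyConfig and apply_overrides are hypothetical names, not part of Hezar, and rejecting unknown keys is a choice made for this sketch only; Hezar's actual handling may differ.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToyConfig:
    # Hypothetical stand-in for TextClassificationDatasetConfig (not Hezar's class).
    path: Optional[str] = None
    text_field: Optional[str] = None
    label_field: Optional[str] = None
    max_length: Optional[int] = None


def apply_overrides(config: ToyConfig, **kwargs) -> ToyConfig:
    """Assign each keyword argument onto the config in place (hypothetical helper)."""
    valid = {f.name for f in fields(config)}
    for key, value in kwargs.items():
        # This sketch rejects keys that are not config fields.
        if key not in valid:
            raise ValueError(f"Unknown config field: {key!r}")
        setattr(config, key, value)
    return config


# "some-user/some-dataset" is a placeholder, not a real dataset ID.
config = ToyConfig(path="some-user/some-dataset", text_field="text", label_field="label")
config = apply_overrides(config, max_length=128)
print(config.max_length)  # 128
```

Mutating the config in place keeps a single source of truth for the dataset's settings; a copy-based variant could use dataclasses.replace instead.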

class hezar.data.datasets.text_classification_dataset.TextClassificationDatasetConfig(task: TaskType = TaskType.TEXT_CLASSIFICATION, path: str | None = None, tokenizer_path: str | None = None, label_field: str | None = None, text_field: str | None = None, max_length: int | None = None)

Bases: DatasetConfig

Configuration class for text classification datasets.

Parameters:
  • task (TaskType) – Task type of the dataset. Defaults to TaskType.TEXT_CLASSIFICATION.

  • path (str) – Path to the dataset.

  • tokenizer_path (str) – Path to the tokenizer file.

  • label_field (str) – Field name for labels in the dataset.

  • text_field (str) – Field name for text in the dataset.

  • max_length (int) – Maximum length of the text after tokenization.
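To illustrate how text_field, label_field, and max_length relate to a raw dataset row, here is a minimal sketch, assuming rows are plain dicts and using whitespace splitting in place of the real tokenizer loaded from tokenizer_path. build_label_map and preprocess_row are hypothetical helpers, not Hezar functions, and Hezar's actual preprocessing may differ.

```python
from typing import Dict, List, Tuple


def build_label_map(rows: List[dict], label_field: str) -> Dict[str, int]:
    """Map each distinct label to an integer id, in order of first appearance."""
    label2id: Dict[str, int] = {}
    for row in rows:
        label = row[label_field]
        if label not in label2id:
            label2id[label] = len(label2id)
    return label2id


def preprocess_row(
    row: dict,
    text_field: str,
    label_field: str,
    label2id: Dict[str, int],
    max_length: int,
) -> Tuple[List[str], int]:
    """Read the text and label columns, truncate the text, and encode the label."""
    # Whitespace split stands in for a real subword tokenizer.
    tokens = row[text_field].split()[:max_length]
    return tokens, label2id[row[label_field]]


# Column names "sentence" and "sentiment" are made up for this example.
rows = [
    {"sentence": "the service was great", "sentiment": "positive"},
    {"sentence": "never coming back here again", "sentiment": "negative"},
]
label2id = build_label_map(rows, "sentiment")
print(preprocess_row(rows[1], "sentence", "sentiment", label2id, max_length=3))
# (['never', 'coming', 'back'], 1)
```

Because the column names are passed in rather than hard-coded, the same code works for any dataset once text_field and label_field point at the right columns.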

label_field: str = None
max_length: int = None
name: str = 'text_classification'
path: str = None
task: TaskType = 'text_classification'
text_field: str = None
tokenizer_path: str = None
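The attribute listing shows task typed as TaskType yet displayed as the string 'text_classification'. One way this can happen is if the enum mixes in str, so members compare equal to their string values. The sketch below assumes that design; ToyTaskType and ToyDatasetConfig are hypothetical stand-ins, not Hezar's own definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ToyTaskType(str, Enum):
    # Hypothetical stand-in for TaskType; the str mixin makes members equal to their values.
    TEXT_CLASSIFICATION = "text_classification"


@dataclass
class ToyDatasetConfig:
    # Field names and defaults mirror the attribute listing for the config class.
    name: str = "text_classification"
    task: ToyTaskType = ToyTaskType.TEXT_CLASSIFICATION
    path: Optional[str] = None
    tokenizer_path: Optional[str] = None
    label_field: Optional[str] = None
    text_field: Optional[str] = None
    max_length: Optional[int] = None


config = ToyDatasetConfig()
print(config.task == "text_classification")  # True
print(config.path is None)  # True
```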