hezar.data.datasets.text_classification_dataset module
- class hezar.data.datasets.text_classification_dataset.TextClassificationDataset(config: TextClassificationDatasetConfig, split=None, preprocessor=None, **kwargs)
Bases: Dataset
A dataset class for text classification. This class currently supports only datasets hosted on the Hub.
- Parameters:
config (TextClassificationDatasetConfig) – Dataset config object.
split – The dataset split to use.
preprocessor – The dataset’s preprocessor.
**kwargs – Extra config parameters to assign to the original config.
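The `**kwargs` behaviour described above, where extra keyword arguments are assigned onto the config, can be sketched without hezar installed. This is a stand-in under stated assumptions, not hezar's implementation: `SketchConfig` and `apply_overrides` are hypothetical names, the dataset path is a placeholder, and only the field names come from the signature documented on this page.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in mirroring a few of the documented config fields."""
    path: Optional[str] = None
    text_field: Optional[str] = None
    label_field: Optional[str] = None
    max_length: Optional[int] = None


def apply_overrides(config: SketchConfig, **kwargs) -> SketchConfig:
    """Return a copy of `config` with the given keyword arguments assigned to it."""
    # dataclasses.replace raises TypeError for unknown field names;
    # the real library may treat unknown keys differently (assumption).
    return replace(config, **kwargs)


# Placeholder path and field names, chosen only for illustration.
base = SketchConfig(path="some-user/some-dataset", text_field="text", label_field="label")
updated = apply_overrides(base, max_length=128)
```

Returning a copy leaves the original config untouched, which keeps the override easy to reason about; whether hezar mutates the config in place is not stated here.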
- class hezar.data.datasets.text_classification_dataset.TextClassificationDatasetConfig(path: str | None = None, task: TaskType = TaskType.TEXT_CLASSIFICATION, max_size: int | float | None = None, hf_load_kwargs: dict | None = None, label_field: str | None = None, text_field: str | None = None, max_length: int | None = None)
Bases: DatasetConfig
Configuration class for text classification datasets.
- Parameters:
path (str) – Path to the dataset.
label_field (str) – Field name for labels in the dataset.
text_field (str) – Field name for text in the dataset.
max_length (int) – Maximum length of text.
- label_field: str = None
- max_length: int = None
- name: str = 'text_classification'
- path: str = None
- text_field: str = None
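As a rough illustration of what `text_field`, `label_field`, and `max_length` describe, the sketch below picks the text and label out of a single record by field name and caps the text length. It makes two assumptions that this page does not confirm: that `max_length` limits the number of tokens, and that whitespace splitting is an acceptable stand-in for a real tokenizer. `select_example` is a hypothetical helper, not part of hezar.

```python
from typing import Any, Dict, List, Optional, Tuple


def select_example(
    row: Dict[str, Any],
    text_field: str,
    label_field: str,
    max_length: Optional[int] = None,
) -> Tuple[List[str], Any]:
    """Hypothetical helper: pick the text and label out of a row by field name."""
    # Whitespace splitting stands in for a real tokenizer (assumption).
    tokens = row[text_field].split()
    if max_length is not None:
        # Assumed meaning of max_length: a cap on the number of tokens.
        tokens = tokens[:max_length]
    return tokens, row[label_field]


# A made-up record; the column names are illustrative only.
row = {"review": "the plot was slow but the acting was great", "sentiment": "positive"}
tokens, label = select_example(
    row, text_field="review", label_field="sentiment", max_length=4
)
```

Passing the field names explicitly is what lets one class serve datasets whose columns are named differently.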