hezar.data.datasets.dataset module
- class hezar.data.datasets.dataset.Dataset(config: DatasetConfig, split=None, **kwargs)
Bases:
Dataset
Base class for all datasets in Hezar.
- Parameters:
config (DatasetConfig) – The configuration object for the dataset.
split – The dataset split to use.
**kwargs – Additional keyword arguments.
- config_filename
Default dataset config file name.
- Type:
str
- cache_dir
Default cache directory for the dataset.
- Type:
str
- cache_dir = '/home/runner/.cache/hezar/datasets'
- config_filename = 'dataset_config.yaml'
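The two class attributes above act as defaults, and the load classmethod of this class documents that config_filename, cache_dir and split fall back to default values when they are not given. The following is a minimal sketch of that fallback pattern, not Hezar's actual implementation: SketchDataset and resolve_defaults are hypothetical names introduced only for illustration, and the home-relative cache path is an assumption.

```python
import os
from typing import Optional


class SketchDataset:
    """Hypothetical stand-in used for illustration; not hezar's Dataset class."""

    # Class-level defaults, mirroring the documented attributes.
    config_filename = "dataset_config.yaml"
    # Assumption: the default cache lives under the current user's home directory.
    cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "hezar", "datasets")

    @classmethod
    def resolve_defaults(
        cls,
        config_filename: Optional[str] = None,
        cache_dir: Optional[str] = None,
        split: Optional[str] = None,
    ) -> dict:
        # Any argument left as None (or empty) falls back to its default.
        return {
            "config_filename": config_filename or cls.config_filename,
            "cache_dir": cache_dir or cls.cache_dir,
            "split": split or "train",
        }


resolved = SketchDataset.resolve_defaults(split="test")
print(resolved["config_filename"], resolved["split"])
```

Keeping the defaults on the class rather than hard-coding them in the method lets a subclass change them by overriding a single attribute.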
- classmethod load(hub_path: str | os.PathLike, config: DatasetConfig = None, config_filename: str | None = None, split: str | SplitType | None = None, cache_dir: str = None, **kwargs) Dataset
Load the dataset from a hub path.
- Parameters:
hub_path (str | os.PathLike) – Path to the dataset, either on the Hub or on the local filesystem.
config (DatasetConfig) – A config object to use instead of the config in the repo, or when the repo has no dataset_config.yaml file.
config_filename (Optional[str]) – Dataset config file name. Falls back to dataset_config.yaml if not given.
split (Optional[str | SplitType]) – Dataset split, defaults to “train”.
cache_dir (str) – Path to the cache directory. Defaults to Hezar’s cache directory.
**kwargs – Config parameters as keyword arguments.
- Returns:
An instance of the loaded dataset.
- Return type: