hezar.preprocessors.preprocessor module
- class hezar.preprocessors.preprocessor.Preprocessor(config: PreprocessorConfig, **kwargs)
Bases:
object
Base class for all data preprocessors.
- Parameters:
config – Preprocessor properties
- classmethod load(hub_or_local_path, subfolder: str | None = None, force_return_dict: bool = False, cache_dir: str | None = None, **kwargs)
Load a preprocessor or a pipeline of preprocessors from a local or Hub path. This method automatically detects every preprocessor in the path. If only one is found, it is returned directly; if more than one is found, a dictionary of preprocessors is returned.
Subclasses must also override this method, because it is called internally for every preprocessor found in the repo.
- Parameters:
hub_or_local_path – Path to a Hub or local repo
subfolder – Subfolder for the preprocessor
force_return_dict – Whether to return a dict even if there’s only one preprocessor available in the repo
cache_dir – Path to cache directory
**kwargs – Extra kwargs
- Returns:
A Preprocessor subclass instance, or a dict of Preprocessor subclass instances
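The return-shape rule described above (a single object when one preprocessor is found, a dictionary otherwise, and always a dictionary when force_return_dict is set) can be illustrated with a small toy model. This is only a sketch and not hezar's implementation: the function select_return_value, the names in the input mapping, and the string placeholders standing in for loaded preprocessors are all hypothetical. The behavior when nothing is found is not specified in the text above, so it is left out of the example.

```python
from collections import OrderedDict


def select_return_value(found, force_return_dict=False):
    """Toy model of the return-shape rule described for `load`.

    `found` maps hypothetical preprocessor names to already-loaded objects.
    Illustration only; this is not hezar's implementation.
    """
    container = OrderedDict(found)
    if len(container) == 1 and not force_return_dict:
        # Exactly one preprocessor: hand it back directly.
        return next(iter(container.values()))
    # Several preprocessors (or a dict was forced): return the mapping.
    return container


# String placeholders stand in for real preprocessor instances.
single = select_return_value({"tokenizer": "tok"})
forced = select_return_value({"tokenizer": "tok"}, force_return_dict=True)
several = select_return_value({"text_normalizer": "norm", "tokenizer": "tok"})
```

Callers that want one code path regardless of how many preprocessors a repo holds would pass force_return_dict=True and always index the result by name.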
- preprocessor_subfolder = 'preprocessor'
- class hezar.preprocessors.preprocessor.PreprocessorsContainer
Bases:
OrderedDict
A container class that holds preprocessors keyed by name
- property audio_feature_extractor
Return audio feature extractor if available
- property image_processor
Return image processor if available
- push_to_hub(repo_id, subfolder=None, commit_message=None, private=None)
Push every preprocessor in the container to the Hub
- property text_normalizer
Return text normalizer if available
- property tokenizer
Return tokenizer if available
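As a rough illustration of how an OrderedDict subclass can expose "return X if available" properties, here is a minimal sketch. It is not hezar's implementation: the class ToyPreprocessorsContainer, the stand-in classes Tokenizer and TextNormalizer, the isinstance-based lookup, and the choice to return None when nothing matches are all assumptions made for the example.

```python
from collections import OrderedDict


class Tokenizer:
    """Hypothetical stand-in class; not hezar's Tokenizer."""


class TextNormalizer:
    """Hypothetical stand-in class; not hezar's text normalizer."""


class ToyPreprocessorsContainer(OrderedDict):
    """Name-keyed container with typed accessors (illustration only)."""

    def _first_instance_of(self, kind):
        # Scan values in insertion order and return the first match.
        for item in self.values():
            if isinstance(item, kind):
                return item
        return None  # assumption: "if available" means None when absent

    @property
    def tokenizer(self):
        return self._first_instance_of(Tokenizer)

    @property
    def text_normalizer(self):
        return self._first_instance_of(TextNormalizer)


container = ToyPreprocessorsContainer()
container["my_tokenizer"] = Tokenizer()
```

With this layout, items stay addressable by name like any dictionary entry, while the properties give a shortcut that does not depend on the key a preprocessor was stored under.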