hezar.preprocessors.preprocessor module
- class hezar.preprocessors.preprocessor.Preprocessor(config: PreprocessorConfig, **kwargs)
- Bases: object
- Base class for all data preprocessors.
- Parameters:
- config – The preprocessor's configuration (a PreprocessorConfig instance)
 - classmethod load(hub_or_local_path, subfolder: str | None = None, force_return_dict: bool = False, cache_dir: str | None = None, **kwargs)
- Load a preprocessor or a pipeline of preprocessors from a local or Hub path. This method automatically detects every preprocessor in the path: if there is only one, it returns that preprocessor; if there are more, it returns a dictionary of preprocessors.
- Subclasses must also override this method, because the base implementation calls `load` for every preprocessor it finds in the repo.
- Parameters:
- hub_or_local_path – Path to a Hub or local repo
- subfolder – Subfolder for the preprocessor
- force_return_dict – Whether to return a dict even if only one preprocessor is available in the repo
- cache_dir – Path to the cache directory
- **kwargs – Extra keyword arguments
 
- Returns:
- A Preprocessor subclass instance, or a dict of Preprocessor subclass instances
 
 - preprocessor_subfolder = 'preprocessor'
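The detect-and-delegate behavior described for `load` can be modeled in plain Python. The sketch below is hypothetical and runs without the Hub: a repo is represented as a nested dict, and `registry`, `register`, the `name` attribute, and `_SinglePreprocessor` are illustrative assumptions, not Hezar's API. It shows the base `load` dispatching to each subclass's override and returning a single instance unless `force_return_dict=True`.

```python
from collections import OrderedDict
from typing import Dict, Optional

# Hypothetical registry: config name -> Preprocessor subclass.
registry: Dict[str, type] = {}


def register(cls: type) -> type:
    """Record a subclass in the registry under its (hypothetical) `name` attribute."""
    registry[cls.name] = cls
    return cls


class Preprocessor:
    name: Optional[str] = None
    preprocessor_subfolder = "preprocessor"

    def __init__(self, config: dict):
        self.config = config

    @classmethod
    def load(cls, repo: dict, subfolder: Optional[str] = None, force_return_dict: bool = False):
        # Assumption: fall back to the class-level default subfolder.
        subfolder = subfolder or cls.preprocessor_subfolder
        found = OrderedDict()
        # Detect every preprocessor config stored under the subfolder ...
        for name in repo.get(subfolder, {}):
            # ... and delegate to the matching subclass's overridden `load`.
            found[name] = registry[name].load(repo, subfolder=subfolder)
        if len(found) == 1 and not force_return_dict:
            return next(iter(found.values()))
        return found


class _SinglePreprocessor(Preprocessor):
    @classmethod
    def load(cls, repo: dict, subfolder: Optional[str] = None, force_return_dict: bool = False):
        # Override: build only this preprocessor from its own config.
        subfolder = subfolder or cls.preprocessor_subfolder
        return cls(repo[subfolder][cls.name])


@register
class Tokenizer(_SinglePreprocessor):
    name = "tokenizer"


@register
class TextNormalizer(_SinglePreprocessor):
    name = "text_normalizer"


repo = {"preprocessor": {"tokenizer": {"vocab_size": 100}}}
print(type(Preprocessor.load(repo)).__name__)  # Tokenizer
print(list(Preprocessor.load(repo, force_return_dict=True)))  # ['tokenizer']
```

With two configs under the subfolder, for example `{"preprocessor": {"tokenizer": {}, "text_normalizer": {}}}`, the same call returns a dict holding both instances.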
 
- class hezar.preprocessors.preprocessor.PreprocessorsContainer
- Bases: OrderedDict
- A container that holds preprocessors keyed by their name.
 - property audio_feature_extractor
- Return the audio feature extractor, if available.
 - property image_processor
- Return the image processor, if available.
 - push_to_hub(repo_id, subfolder=None, commit_message=None, private=None)
- Push every preprocessor in the container to the Hub.
 - property text_normalizer
- Return the text normalizer, if available.
 - property tokenizer
- Return the tokenizer, if available.
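The container's typed properties and `push_to_hub` can be sketched as an `OrderedDict` subclass. This is a hypothetical model, not Hezar's code: identifying items with `isinstance` checks against stub classes, returning `None` when an item is missing, and forwarding identical arguments to each item's `push_to_hub` are all assumptions. `StubPreprocessor` records calls instead of uploading, so the example runs offline.

```python
from collections import OrderedDict
from typing import Optional


class StubPreprocessor:
    """Hypothetical stand-in that records push_to_hub calls instead of uploading."""

    def __init__(self):
        self.pushes = []

    def push_to_hub(self, repo_id, subfolder=None, commit_message=None, private=None):
        self.pushes.append((repo_id, subfolder, commit_message, private))


class Tokenizer(StubPreprocessor):
    pass


class TextNormalizer(StubPreprocessor):
    pass


class ImageProcessor(StubPreprocessor):
    pass


class AudioFeatureExtractor(StubPreprocessor):
    pass


class PreprocessorsContainer(OrderedDict):
    """Sketch: preprocessors keyed by name, with typed accessors."""

    def _first_of(self, kind: type) -> Optional[StubPreprocessor]:
        # Assumption: return the first item of the requested type, else None.
        for preprocessor in self.values():
            if isinstance(preprocessor, kind):
                return preprocessor
        return None

    @property
    def tokenizer(self):
        return self._first_of(Tokenizer)

    @property
    def text_normalizer(self):
        return self._first_of(TextNormalizer)

    @property
    def image_processor(self):
        return self._first_of(ImageProcessor)

    @property
    def audio_feature_extractor(self):
        return self._first_of(AudioFeatureExtractor)

    def push_to_hub(self, repo_id, subfolder=None, commit_message=None, private=None):
        # Push every item, forwarding the same arguments to each one.
        for preprocessor in self.values():
            preprocessor.push_to_hub(
                repo_id, subfolder=subfolder, commit_message=commit_message, private=private
            )


container = PreprocessorsContainer(tokenizer=Tokenizer())
print(container.tokenizer is container["tokenizer"])  # True
print(container.image_processor)  # None
```

Subclassing `OrderedDict` keeps the items in insertion order and leaves ordinary dict access (`container["tokenizer"]`) available alongside the typed properties.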