hezar.utils.hub_utils module

hezar.utils.hub_utils.clean_cache(cache_dir: str | None = None, delay: int = 10)

Delete the entire Hezar cache directory.

Parameters:
  • cache_dir – Optional path to the cache directory. If not provided, the default cache directory is used.

  • delay – Number of seconds to wait before deleting the directory
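The behavior described above (wait, then delete the whole directory) can be sketched with the standard library. This is a hypothetical re-implementation, not Hezar's own code: the name `clean_cache_sketch` is illustrative, and the default location `~/.hezar` is an assumption based on the example path given under `get_local_cache_path`.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path

# Assumption: the default Hezar cache lives under ~/.hezar
DEFAULT_CACHE_DIR = Path.home() / ".hezar"


def clean_cache_sketch(cache_dir: str | None = None, delay: int = 10) -> None:
    """Hypothetical sketch: wait `delay` seconds, then delete the cache dir."""
    target = Path(cache_dir) if cache_dir is not None else DEFAULT_CACHE_DIR
    if not target.exists():
        return  # nothing to clean
    # The delay gives the caller a window to abort (e.g. Ctrl+C) before deletion
    time.sleep(delay)
    shutil.rmtree(target)
```

Passing `delay=0` deletes immediately; the non-zero default is a safety margin, since the deletion is recursive and cannot be undone.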

hezar.utils.hub_utils.clone_repo(repo_id: str, save_path: str, **kwargs)

Clone a repo from the Hub to a local directory.

Parameters:
  • repo_id – Repo name or ID

  • save_path – Path to clone the repo to

Returns:

The local path to the cloned repo
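Model and dataset repos on the Hugging Face Hub are Git repositories, so one way to clone them is a plain `git clone`. The sketch below is hypothetical and may differ from how Hezar's `clone_repo` works internally: the helper names and the extra `repo_type` parameter are illustrative, and it ignores `**kwargs`, authentication, and Git LFS setup (large weight files are only fetched if `git-lfs` is installed).

```python
from __future__ import annotations

import subprocess
from pathlib import Path

HUB_URL = "https://huggingface.co"


def build_clone_command(repo_id: str, save_path: str, repo_type: str = "model") -> list[str]:
    """Build the `git clone` command for a Hub repo (hypothetical helper)."""
    # Model repos live at the URL root; datasets and spaces under a prefix
    prefix = "" if repo_type == "model" else f"{repo_type}s/"
    return ["git", "clone", f"{HUB_URL}/{prefix}{repo_id}", save_path]


def clone_repo_sketch(repo_id: str, save_path: str, repo_type: str = "model") -> str:
    """Clone a Hub repo with git and return the local path (requires git)."""
    subprocess.run(build_clone_command(repo_id, save_path, repo_type), check=True)
    return str(Path(save_path).resolve())
```

Separating command construction from execution keeps the URL logic testable without network access.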

hezar.utils.hub_utils.exists_in_cache(hub_path, repo_type='model')
hezar.utils.hub_utils.exists_on_hub(hub_path: str, repo_type='model')

Check whether a repo exists on the Hub.

Parameters:
  • hub_path – Repo name or ID

  • repo_type – Repo type, e.g., model, dataset, etc.

Returns:

True if the repo exists on the Hub, False otherwise
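One dependency-free way to perform this check is to query the Hub's public REST API (`https://huggingface.co/api/models/<repo_id>`, `https://huggingface.co/api/datasets/<repo_id>`) and treat a successful response as "exists". This is a hypothetical sketch, not Hezar's implementation, and the function names are illustrative. Because the Hub may answer with an error status other than 404 for private or missing repos, any HTTP error is treated as "does not exist" here.

```python
from __future__ import annotations

import urllib.error
import urllib.request

HUB_API = "https://huggingface.co/api"


def repo_api_url(hub_path: str, repo_type: str = "model") -> str:
    """Build the Hub API URL for a repo, e.g. .../api/models/<repo_id>."""
    return f"{HUB_API}/{repo_type}s/{hub_path}"


def exists_on_hub_sketch(hub_path: str, repo_type: str = "model", timeout: float = 10.0) -> bool:
    """Hypothetical check: True if the Hub API answers 200 for the repo.

    Network failures (urllib.error.URLError without an HTTP status) are
    deliberately not caught, so "offline" is not confused with "missing".
    """
    try:
        with urllib.request.urlopen(repo_api_url(hub_path, repo_type), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # 401/404 etc.: repo is missing or not accessible without a token
        return False
```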

hezar.utils.hub_utils.get_local_cache_path(repo_id, repo_type)

Given the repo ID and repo type, build the local path where the repo's files are saved, e.g., ~/.hezar/models/<repo_name>.

Parameters:
  • repo_id – Repo name or ID

  • repo_type – Repo type, e.g., model, dataset, etc.

Returns:

Path to the local cache directory
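Following the example path above, the mapping can be sketched as joining a cache root, a pluralized repo-type folder, and the repo ID. The exact layout is an assumption: this hypothetical `get_local_cache_path_sketch` keeps the full repo ID (including a namespace such as `hezarai/`) as nested folders and takes an extra illustrative `cache_dir` parameter, both of which Hezar may handle differently.

```python
from __future__ import annotations

from pathlib import Path


def get_local_cache_path_sketch(repo_id: str, repo_type: str, cache_dir: str | Path = "~/.hezar") -> str:
    """Hypothetical: map (repo_id, repo_type) to <cache_dir>/<repo_type>s/<repo_id>."""
    root = Path(cache_dir).expanduser()
    # e.g. repo_type="model" -> "models", repo_type="dataset" -> "datasets"
    return str(root / f"{repo_type}s" / repo_id)
```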

hezar.utils.hub_utils.get_state_dict_from_hub(hub_id, filename, subfolder=None)

Load a state dict from a repo on the Hugging Face Hub. Works with any repo, regardless of the library it was created with.

Parameters:
  • hub_id – Repo ID on the Hub

  • filename – Weights file name

  • subfolder – Optional subfolder in the repo

Returns:

A PyTorch state dict object

hezar.utils.hub_utils.list_repo_files(hub_or_local_path: str, subfolder: str | None = None, repo_type: str | RepoType = RepoType.MODEL)

List all files in a repo, either on the Hub or in a local directory.

Parameters:
  • hub_or_local_path – Hub repo ID or path to a local repo

  • subfolder – Optional subfolder path

  • repo_type – Repo type, either model or dataset

Returns:

A list of all file names
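For the local case, the behavior can be sketched by walking the directory and returning paths relative to the repo root, optionally restricted to a subfolder. This hypothetical `list_local_repo_files` covers only local paths; listing a Hub repo would instead go through the Hub API (for example, `huggingface_hub.HfApi.list_repo_files`). Returning names relative to the repo root and skipping the `.git` folder are assumptions about the desired output.

```python
from __future__ import annotations

import os
from pathlib import Path


def list_local_repo_files(local_path: str, subfolder: str | None = None) -> list[str]:
    """Hypothetical: list files under a local repo, relative to the repo root."""
    root = Path(local_path)
    start = root / subfolder if subfolder else root
    files = []
    for dirpath, dirnames, filenames in os.walk(start):
        # Prune git metadata in place so os.walk does not descend into it
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            # POSIX-style separators so results look like Hub file paths
            files.append((Path(dirpath) / name).relative_to(root).as_posix())
    return sorted(files)
```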