hezar.trainer.trainer module

class hezar.trainer.trainer.Trainer(model: Model, config: TrainerConfig, train_dataset: Dataset, eval_dataset: Dataset | None = None, data_collator: Callable | None = None, preprocessor: Preprocessor | PreprocessorsContainer | None = None, metrics_handler: MetricsHandler | None = None, optimizer: Optimizer | None = None, lr_scheduler=None, accelerator: Accelerator | None = None)[source]

Bases: object

Base trainer class for training all Hezar models and all tasks. Usually you can use this class as-is, but for special cases you can also override any of the core methods in your own custom Trainer.

Parameters:
  • model (Model | torch.nn.Module) – The main model to train and evaluate

  • config (TrainerConfig) – Training configuration and parameters

  • train_dataset (Dataset) – Train dataset

  • eval_dataset (Dataset) – Evaluation dataset

  • data_collator – Collate function, usually included in the dataset object itself

  • preprocessor – Preprocessor object(s)

  • metrics_handler – Optional metrics handler

  • optimizer (optim.Optimizer) – Model optimizer

  • lr_scheduler – Optional learning-rate scheduler

  • accelerator (Accelerator) – Accelerator object for a customized distributed environment
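
A minimal construction sketch, assuming a Hezar dataset that ships its own data collator; the dataset repo, model class, and TrainerConfig fields used here are illustrative and may differ slightly across Hezar versions:

```python
from hezar.data import Dataset
from hezar.models import BertTextClassification, BertTextClassificationConfig
from hezar.preprocessors import Preprocessor
from hezar.trainer import Trainer, TrainerConfig

base_model_path = "hezarai/bert-base-fa"       # example base model on the Hub
dataset_path = "hezarai/sentiment-dksf"        # example dataset repo on the Hub

train_dataset = Dataset.load(dataset_path, split="train", tokenizer_path=base_model_path)
eval_dataset = Dataset.load(dataset_path, split="test", tokenizer_path=base_model_path)

model = BertTextClassification(BertTextClassificationConfig(id2label=train_dataset.config.id2label))
preprocessor = Preprocessor.load(base_model_path)

train_config = TrainerConfig(
    task="text_classification",        # assumed TrainerConfig fields; see TrainerConfig for the full list
    init_weights_from=base_model_path,
    batch_size=8,
    num_epochs=3,
    metrics=["f1"],
    output_dir="bert-fa-sentiment",
)

trainer = Trainer(
    config=train_config,
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=train_dataset.data_collator,
    preprocessor=preprocessor,
)
```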

compute_loss(model_outputs: Dict, labels: Tensor, **kwargs) Tensor[source]

Compute loss from model outputs

Parameters:
  • model_outputs – Model outputs dictionary containing the logits

  • labels – Ground truth labels

Returns:

The loss tensor
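
As a sketch of the override pattern, a custom Trainer could reweight the loss for an imbalanced classification task; this assumes the model outputs dictionary exposes a "logits" key:

```python
import torch
from hezar.trainer import Trainer


class WeightedLossTrainer(Trainer):
    def compute_loss(self, model_outputs, labels, **kwargs):
        # assumes the model returns its raw scores under a "logits" key
        logits = model_outputs["logits"]
        # hypothetical per-class weights for a two-class, imbalanced dataset
        class_weights = torch.tensor([1.0, 2.0], device=logits.device)
        return torch.nn.functional.cross_entropy(logits, labels, weight=class_weights)
```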

create_eval_dataloader(dataset) DataLoader[source]

Create eval data loader using a ranged sampler that can handle slicing data, shuffling, etc.

create_train_dataloader(dataset) DataLoader[source]

Create train data loader using a ranged sampler that can handle slicing data, shuffling, etc.

dataset_config_file = 'dataset_config.yaml'
default_lr_scheduler = None
default_optimizer = 'adam'
evaluate(eval_dataset: Dataset | None = None)[source]

Evaluate the model on the whole eval dataset and display live metric values in the progress bar

Parameters:

eval_dataset – Any sized iterable like a Hezar Dataset, HuggingFace Dataset, Torch Dataset, etc.

Returns:

A dictionary of evaluation results computed by the metrics tracker
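
Typical usage on a constructed trainer (result keys depend on the configured metrics handler):

```python
# evaluate on the eval_dataset passed to the Trainer constructor
results = trainer.evaluate()

# or on any other sized iterable dataset (`other_dataset` is a placeholder)
results = trainer.evaluate(eval_dataset=other_dataset)
print(results)  # dictionary of metric values computed by the metrics tracker
```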

evaluation_step(input_batch: Dict[str, Tensor]) Dict[str, Any][source]

Evaluate one batch of data and return loss and model outputs

Parameters:

input_batch – A batch of inputs to evaluate

Returns:

Evaluation step outputs including loss, logits, etc.

forward(input_batch)[source]

Perform model forward on the input batch

For special cases, this method can be overridden in a custom Trainer.

Parameters:

input_batch – Input batch

Returns:

Model outputs
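
When overriding, the default behavior can be reused via super(); the dropped key below is purely hypothetical and assumes dict-like model outputs:

```python
from hezar.trainer import Trainer


class SlimOutputsTrainer(Trainer):
    def forward(self, input_batch):
        outputs = super().forward(input_batch)
        # drop a heavy output that is not needed for loss/metrics (hypothetical key)
        outputs.pop("attentions", None)
        return outputs
```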

inner_training_loop(epoch_num: int)[source]

Train the model for one epoch on the whole train dataset and display live metric values in the progress bar

Parameters:

epoch_num – Number of the current epoch

Returns:

Averaged metric values over the full iteration

load_csv_logs(logs_dir=None)[source]

Load the CSV log file

Parameters:

logs_dir – Path to the logs directory, defaults to self.config.logs_dir

Returns:

Logs dictionary
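
For example, to inspect the logged values after training (column names depend on the tracked metrics):

```python
logs = trainer.load_csv_logs()   # reads from self.config.logs_dir by default
print(list(logs.keys()))         # available log columns, e.g. losses and metric values
```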

log(logs: Dict[str, Any], step: int)[source]

Log metric results

Parameters:
  • logs – Dictionary of metric names and values to log

  • step – The step number to log the metrics at

lr_scheduler_file = 'lr_scheduler.pt'
optimization_step()[source]

Perform optimization step

optimizer_file = 'optimizer.pt'
prepare_input_batch(input_batch) Dict[str, Tensor][source]

Perform every operation required to prepare the inputs for the model forward, such as moving them to the proper device, applying permutations, etc.

Parameters:

input_batch – Raw input batch from the dataloader

Returns:

The proper input batch required by model forward
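
A custom Trainer can extend this hook while keeping the default device placement; the removed key below is hypothetical:

```python
from hezar.trainer import Trainer


class CleanBatchTrainer(Trainer):
    def prepare_input_batch(self, input_batch):
        input_batch = super().prepare_input_batch(input_batch)
        # drop a field the model's forward does not accept (hypothetical key)
        input_batch.pop("raw_text", None)
        return input_batch
```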

print_info()[source]

Print training info

push_to_hub(repo_id: str, config_filename: str | None = None, push_model: bool = True, push_optimizer: bool = True, push_logs: bool = True, model_filename: str | None = None, model_config_filename: str | None = None, optimizer_filename: str | None = None, subfolder: str | None = None, dataset_config_filename: str | None = None, commit_message: str | None = None, private: bool = False)[source]

Push everything to the Hub

Parameters:
  • repo_id – The repo ID on the Hub to push to

  • config_filename – Trainer config file name

  • push_model – Whether to push the model

  • push_optimizer – Whether to push the optimizer

  • push_logs – Whether to push training logs

  • model_filename – Model file name

  • optimizer_filename – Optimizer file name

  • model_config_filename – Model config file name

  • subfolder – Path to Trainer files

  • dataset_config_filename – Dataset config file name

  • commit_message – Commit message for the push

  • private – Whether to create a private repo if it doesn’t exist already
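
A minimal call with a placeholder repo ID; file names fall back to the Trainer defaults when not given:

```python
trainer.push_to_hub(
    "my-username/my-trained-model",   # placeholder repo ID
    commit_message="Upload model trained with the Hezar Trainer",
    private=True,
)
```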

save(path: str, config_filename: str | None = None, model_filename: str | None = None, model_config_filename: str | None = None, subfolder: str | None = None, dataset_config_file: str | None = None, optimizer_file: str | None = None, lr_scheduler_file: str | None = None)[source]

Save the trainer and relevant files to a path.

Files to save are train config, model weights, model config, preprocessor files and preprocessor config.

Parameters:
  • path – A directory to save everything

  • config_filename – Config file name

  • model_filename – Model file name

  • model_config_filename – Model config file name

  • subfolder – Optional sub-folder

  • dataset_config_file – Dataset config file name

  • optimizer_file – Optimizer state file name

  • lr_scheduler_file – LR scheduler file name
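
For example, with an arbitrary output directory:

```python
# writes the train config, model weights and config, and preprocessor files to the given path
trainer.save("checkpoints/final")
```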

train()[source]

Run the full training process, including training, evaluation, logging, and saving model checkpoints.

The following steps are run for self.config.num_epochs epochs:
  • Run the training loop on the train dataset

  • Save checkpoints

  • Run evaluation on the evaluation dataset

  • Apply LR scheduling if an LR scheduler is available

  • Gather all metrics outputs

  • Save the trainer state

  • Write logs to tensorboard, csv, etc.
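
Kicking off the whole procedure is a single call on a constructed trainer:

```python
trainer.train()   # repeats the steps above for self.config.num_epochs epochs
```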

trainer_config_file = 'train_config.yaml'
trainer_csv_log_file = 'training_logs.csv'
trainer_state_file = 'trainer_state.yaml'
trainer_subfolder = 'train'
training_step(input_batch: Dict[str, Tensor]) Dict[str, Any][source]

Train one batch of data and return loss and model outputs

Parameters:

input_batch – A batch of inputs to train

Returns:

Train step outputs including loss, logits, etc.