hezar.trainer.trainer module¶
- class hezar.trainer.trainer.Trainer(model: Model, config: TrainerConfig, train_dataset: Dataset, eval_dataset: Dataset | None = None, data_collator: Callable | None = None, preprocessor: Preprocessor | PreprocessorsContainer | None = None, metrics_handler: MetricsHandler | None = None, optimizer: Optimizer | None = None, lr_scheduler=None, accelerator: Accelerator | None = None)[source]¶
Bases:
object
Base trainer class for training all Hezar models and all tasks. Usually you can use this class as-is, but for special cases you can also override any of the core methods in your own custom Trainer.
- Parameters:
model (Model | torch.nn.Module) – The main model to train and evaluate
config (TrainerConfig) – Training configuration and parameters
train_dataset (Dataset) – Train dataset
eval_dataset (Dataset) – Evaluation dataset
data_collator – Collate function, usually included in the dataset object itself
preprocessor – Preprocessor object(s)
metrics_handler – Optional metrics handler
optimizer (optim.Optimizer) – Model optimizer
lr_scheduler – Optional learning-rate scheduler
accelerator (Accelerator) – Accelerator object for a customized distributed environment
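A minimal end-to-end sketch of constructing and running a Trainer. The hub IDs, dataset path, TrainerConfig fields, and Dataset.load keyword arguments below are illustrative assumptions and may differ across Hezar versions and tasks:

```python
from hezar.data import Dataset
from hezar.models import BertSequenceLabeling, BertSequenceLabelingConfig
from hezar.preprocessors import Preprocessor
from hezar.trainer import Trainer, TrainerConfig

base_model_path = "hezarai/bert-base-fa"   # illustrative hub ID
dataset_path = "hezarai/lscp-pos-500k"     # illustrative hub ID

# Load the datasets (the `tokenizer_path` kwarg is an assumption; check your Hezar version)
train_dataset = Dataset.load(dataset_path, split="train", tokenizer_path=base_model_path)
eval_dataset = Dataset.load(dataset_path, split="test", tokenizer_path=base_model_path)

# Build a task-specific model and load its preprocessor
model = BertSequenceLabeling(BertSequenceLabelingConfig(id2label=train_dataset.config.id2label))
preprocessor = Preprocessor.load(base_model_path)

train_config = TrainerConfig(
    output_dir="bert-fa-pos",        # the config fields shown are assumptions; see TrainerConfig
    task="sequence_labeling",
    init_weights_from=base_model_path,
    batch_size=8,
    num_epochs=5,
    metrics=["seqeval"],
)

trainer = Trainer(
    config=train_config,
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=train_dataset.data_collator,
    preprocessor=preprocessor,
)
trainer.train()
```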
- compute_loss(model_outputs: Dict, labels: Tensor, **kwargs) Tensor [source]¶
Compute loss from model outputs
- Parameters:
model_outputs – Model outputs dictionary containing the logits
labels – Ground truth labels
- Returns:
The loss tensor
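Since the Trainer is designed to be subclassed for special cases, compute_loss() can be overridden; a hedged sketch using a class-weighted cross-entropy (the class count, the weights, and the "logits" key are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

from hezar.trainer import Trainer


class WeightedLossTrainer(Trainer):
    """Hypothetical Trainer that swaps the default loss for a weighted cross-entropy."""

    def compute_loss(self, model_outputs, labels, **kwargs):
        logits = model_outputs["logits"]  # assumes the outputs expose logits under "logits"
        class_weights = torch.tensor([1.0, 2.0], device=logits.device)  # illustrative weights
        return F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            labels.view(-1),
            weight=class_weights,
        )
```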
- create_eval_dataloader(dataset) DataLoader [source]¶
Create eval data loader using a ranged sampler that can handle slicing data, shuffling, etc.
- create_train_dataloader(dataset) DataLoader [source]¶
Create train data loader using a ranged sampler that can handle slicing data, shuffling, etc.
- dataset_config_file = 'dataset_config.yaml'¶
- default_lr_scheduler = None¶
- default_optimizer = 'adam'¶
- evaluate(eval_dataset: Dataset | None = None)[source]¶
Evaluate the model on the whole eval dataset and display live metric values in the progress bar
- Parameters:
eval_dataset – Any sized iterable like a Hezar Dataset, HuggingFace Dataset, Torch Dataset, etc.
- Returns:
A dictionary of evaluation results computed by the metrics tracker
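For example, assuming trainer is an already-constructed Trainer (as in the class-level sketch above):

```python
# Evaluate on the eval_dataset passed at construction time
results = trainer.evaluate()
print(results)  # dict of metric values; the exact keys depend on the configured metrics

# Or evaluate on any other sized iterable dataset
# results = trainer.evaluate(eval_dataset=another_eval_dataset)
```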
- evaluation_step(input_batch: Dict[str, Tensor]) Dict[str, Any] [source]¶
Evaluate one batch of data and return loss and model outputs
- Parameters:
input_batch – A batch of inputs to evaluate
- Returns:
Evaluation step outputs including loss, logits, etc.
- forward(input_batch)[source]¶
Perform the model forward pass on the input batch
In special cases, one can override this method in their desired trainer.
- Parameters:
input_batch – Input batch
- Returns:
Model outputs
- inner_training_loop(epoch_num: int)[source]¶
Train the model for one epoch on the whole train dataset and display live metric values in the progress bar
- Parameters:
epoch_num – Number of the current epoch
- Returns:
Metric averages over the full iteration
- load_csv_logs(logs_dir=None)[source]¶
Load the CSV log file
- Parameters:
logs_dir – Path to the logs directory; defaults to self.config.logs_dir
- Returns:
Logs dictionary
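A small sketch of reading the logs back, assuming training has already written the CSV file:

```python
logs = trainer.load_csv_logs()   # reads from self.config.logs_dir by default
print(list(logs.keys()))         # the logged columns; names depend on the tracked metrics
```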
- lr_scheduler_file = 'lr_scheduler.pt'¶
- optimizer_file = 'optimizer.pt'¶
- prepare_input_batch(input_batch) Dict[str, Tensor] [source]¶
Perform every operation required to prepare the inputs for the model forward pass, like moving to the device, permutations, etc.
- Parameters:
input_batch – Raw input batch from the dataloader
- Returns:
The proper input batch required by model forward
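Like forward() and compute_loss(), this can be overridden in a custom Trainer; a hedged sketch that reuses the base preparation and then drops a field the model does not accept (the key name is an illustrative assumption):

```python
from hezar.trainer import Trainer


class MyTrainer(Trainer):
    """Hypothetical Trainer with customized input preparation."""

    def prepare_input_batch(self, input_batch):
        # Base preparation: device placement, permutations, etc.
        input_batch = super().prepare_input_batch(input_batch)
        # Drop a key the model's forward does not expect (illustrative name)
        input_batch.pop("extra_metadata", None)
        return input_batch
```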
- push_to_hub(repo_id: str, config_filename: str | None = None, push_model: bool = True, push_optimizer: bool = True, push_logs: bool = True, model_filename: str | None = None, model_config_filename: str | None = None, optimizer_filename: str | None = None, subfolder: str | None = None, dataset_config_filename: str | None = None, commit_message: str | None = None, private: bool = False)[source]¶
Push everything to the Hub
- Parameters:
repo_id – ID of the Hub repo to push to
config_filename – Trainer config file name
push_model – Whether to push the model
push_optimizer – Whether to push the optimizer
push_logs – Whether to push training logs
model_filename – Model file name
optimizer_filename – Optimizer file name
model_config_filename – Model config file name
subfolder – Path to Trainer files
dataset_config_filename – Dataset config file name
commit_message – Commit message for the push
private – Whether to create a private repo if it doesn’t exist already
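For example (the repo ID is a placeholder):

```python
trainer.push_to_hub(
    "my-username/my-trained-model",   # placeholder repo ID
    commit_message="Upload trained model, optimizer state and logs",
    private=True,                     # create the repo as private if it doesn't exist yet
)
```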
- save(path: str, config_filename: str | None = None, model_filename: str | None = None, model_config_filename: str | None = None, subfolder: str | None = None, dataset_config_file: str | None = None, optimizer_file: str | None = None, lr_scheduler_file: str | None = None)[source]¶
Save the trainer and relevant files to a path.
Files to save are the train config, model weights, model config, preprocessor files, and preprocessor config.
- Parameters:
path – A directory to save everything
config_filename – Config file name
model_filename – Model file name
model_config_filename – Model config file name
subfolder – Optional sub-folder
dataset_config_file – Dataset config file name
optimizer_file – Optimizer state file name
lr_scheduler_file – LR scheduler file name
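For example (the path is a placeholder):

```python
# Save the trainer config, model weights and config, preprocessor files, etc. locally
trainer.save("checkpoints/final")

# File names can be customized via the keyword arguments; the class-level defaults
# (e.g. trainer_config_file = "train_config.yaml", optimizer_file = "optimizer.pt")
# are listed on this page.
```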
- train()[source]¶
Run the full training process, including training, evaluation, logging, and saving model checkpoints.
- The following steps are run self.config.num_epochs times:
Run the training loop on the train dataset
Save checkpoints
Run evaluation on the evaluation dataset
Apply LR scheduling if an LR scheduler is available
Gather all metrics outputs
Save the trainer state
Write logs to tensorboard, csv, etc.
- trainer_config_file = 'train_config.yaml'¶
- trainer_csv_log_file = 'training_logs.csv'¶
- trainer_state_file = 'trainer_state.yaml'¶
- trainer_subfolder = 'train'¶