10.1.6.3. engine

Engine of the main training loop in HAT.

10.1.6.3.1. engine

trainer.Trainer

Trainer is a training tool that includes the whole training pipeline.

ddp_trainer.DistributedDataParallelTrainer

DistributedDataParallelTrainer tool.

calibrator.Calibrator

Calibrator is a tool for calibration.

dp_trainer.DataParallelTrainer

DataParallelTrainer is a tool that creates a Trainer instance.

predictor.Predictor

Predictor is a tool for prediction.

10.1.6.3.1.1. processors

processor.BasicBatchProcessor

Processor for an (inputs, target) batch where the model output is a (losses, preds) pair.

processor.MultiBatchProcessor

Processor that can forward and backward multiple batches within a training step (before optimizer.step()).

processor.MultiStageBatchProcessor

Supports multi-stage backward.

10.1.6.3.2. API Reference

class hat.engine.trainer.Trainer(model: torch.nn.modules.module.Module, data_loader: Iterable, optimizer: torch.optim.optimizer.Optimizer, batch_processor, device: Optional[int], model_convert_pipeline: Optional[Union[Dict, List]] = None, resume_optimizer: bool = False, resume_epoch_or_step: bool = False, resume_dataloader: bool = False, stop_by: Optional[str] = 'epoch', num_epochs: Optional[int] = None, start_epoch: Optional[int] = 0, num_steps: Optional[int] = None, start_step: Optional[int] = 0, callbacks: Optional[Sequence[Union[dict, hat.callbacks.callbacks.CallbackMixin]]] = None, train_metrics: Optional[dict] = None, val_metrics: Optional[dict] = None, profiler: Optional[dict] = None, log_interval: int = 0, compiler: Optional[Dict] = None)

Trainer is a training tool that includes the whole training pipeline.

Parameters
  • model – Model config or a nn.Module instance.

  • data_loader – Training data loader config or an instantiated data loader.

  • optimizer – Optimizer config or an optimizer instance.

  • batch_processor – Batch processor config or a BatchProcessorMixin instance.

  • device – GPU id (int) or None. If int, model.cuda(device) and data.cuda(device) are called. If None, this is a no-op.

  • model_convert_pipeline – Defines the model conversion process, e.g. converting a float model to a QAT model, or a QAT model to a quantized model.

  • resume_optimizer – Whether to load the optimizer state dict when resuming from a checkpoint.

  • resume_epoch_or_step – Whether to resume epoch_or_step when resuming from a checkpoint.

  • resume_dataloader – Whether to resume the dataloader index. Only effective when stop_by == 'step'.

  • stop_by – Stop training by counting epoch or step. If equal to ‘epoch’, stop training when epoch_id == num_epochs - 1. If equal to ‘step’, stop training when global_step_id == num_steps - 1. Default ‘epoch’.

  • num_epochs – Number of training epochs; should be a non-negative integer. If stop_by != 'epoch', this is a no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_epoch – Training start epoch; should be a non-negative integer.

  • num_steps – Number of training steps; should be a non-negative integer. If stop_by != 'step', this is a no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_step – Training start step; should be a non-negative integer.

  • callbacks – Callback configs or instances.

  • train_metrics – Metrics on training data.

  • val_metrics – Metrics on validation data.

  • profiler – To profile individual steps during training and assist in identifying bottlenecks.

  • log_interval – Logging output frequency.

  • compiler – Converter of torch.compile.
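
A minimal, hypothetical sketch of constructing a Trainer directly from instantiated objects. It is not taken from the HAT examples: the toy model, the way the batch processor invokes the model, the loss_collector argument, and the final fit() call are assumptions based only on the signatures documented in this section.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    from hat.engine.trainer import Trainer
    from hat.engine.processors.processor import BasicBatchProcessor


    class ToyModel(nn.Module):
        # Returns the (losses, preds) pair expected by BasicBatchProcessor,
        # assuming the processor calls model(inputs, target).
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(8, 2)
            self.criterion = nn.CrossEntropyLoss()

        def forward(self, inputs, target):
            preds = self.fc(inputs)
            losses = {"ce": self.criterion(preds, target)}
            return losses, preds


    model = ToyModel()
    loader = DataLoader(
        TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,))),
        batch_size=16,
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # loss_collector is assumed to receive the model outputs and return the loss tensors.
    processor = BasicBatchProcessor(
        need_grad_update=True,
        loss_collector=lambda outputs: list(outputs[0].values()),
    )

    trainer = Trainer(
        model=model,
        data_loader=loader,
        optimizer=optimizer,
        batch_processor=processor,
        device=None,          # None keeps everything on CPU; pass a GPU id to use CUDA
        stop_by="epoch",
        num_epochs=1,
        log_interval=10,
    )
    trainer.fit()             # run the full training loop (see fit() below)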

class hat.engine.ddp_trainer.DistributedDataParallelTrainer(model: torch.nn.modules.module.Module, data_loader: Iterable, optimizer: torch.optim.optimizer.Optimizer, batch_processor: hat.engine.processors.processor.BatchProcessorMixin, device: int, stop_by: Optional[str] = 'epoch', num_epochs: Optional[int] = None, start_epoch: Optional[int] = 0, num_steps: Optional[int] = None, start_step: Optional[int] = 0, callbacks: Optional[Sequence[hat.callbacks.callbacks.CallbackMixin]] = None, sync_bn: Optional[bool] = False, sync_bn_by_host: Optional[bool] = False, train_metrics: Optional[dict] = None, val_metrics: Optional[dict] = None, profiler: Optional[dict] = None, task_sampler: Optional[hat.core.task_sampler.TaskSampler] = None, convert_submodule_list: Optional[List[str]] = None, find_unused_parameters: Optional[bool] = True, assign_module_buffers: Optional[bool] = True, compiler: Optional[Dict] = None, **kwargs)

DistributedDataParallelTrainer tool.

DistributedDataParallelTrainer is a tool that creates a Trainer instance which trains with the DistributedDataParallel method, running on one of the GPU devices.

It can be launched by the launch function below, which spawns multiple processes, each of which owns an independent Trainer.

By setting stop_by, you are able to stop training by counting epoch (default) or step.

Parameters
  • model – Model config or a nn.Module instance.

  • data_loader – Training data loader config or an instantiated data loader.

  • optimizer – Optimizer config or an optimizer instance.

  • batch_processor – Batch processor config or a BatchProcessorMixin instance.

  • device – GPU id.

  • stop_by – Stop training by counting epoch or step. If equal to ‘epoch’, stop training when epoch_id == num_epochs - 1. If equal to ‘step’, stop training when global_step_id == num_steps - 1. Default ‘epoch’.

  • num_epochs – Number of training epochs; should be a non-negative integer. If stop_by != 'epoch', this is a no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_epoch – Training start epoch; should be a non-negative integer.

  • num_steps – Number of training steps; should be a non-negative integer. If stop_by != 'step', this is a no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_step – Training start step; should be a non-negative integer.

  • callbacks – Callback configs or instances.

  • sync_bn – Whether to convert bn to sync_bn.

  • sync_bn_by_host – Whether to sync BN within the host node.

  • train_metrics – Metrics on training data.

  • val_metrics – Metrics on validation data.

  • profiler – To profile individual steps during training and assist in identifying bottlenecks.

  • task_sampler – TaskSampler config for multitask training.

  • convert_submodule_list – List of submodules to convert to DDP.

  • assign_module_buffers – Whether to reassign module buffers. For details, see: https://horizonrobotics.feishu.cn/wiki/wikcn5j6CW9MVemYwnl1n349JMf#GAyJdS

  • find_unused_parameters – Argument passed to the DistributedDataParallel module.

  • compiler – Converter of torch.compile.
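
A hypothetical per-process sketch, not taken from the HAT examples: each worker spawned by the launch utility owns an independent trainer bound to one GPU. It assumes the process group has already been initialized by the launcher, that model, loader, optimizer and processor are built as in the Trainer sketch above, and that the loop is started with fit().

    from hat.engine.ddp_trainer import DistributedDataParallelTrainer

    def worker(local_rank):
        # One process per GPU; local_rank selects the device this process trains on.
        trainer = DistributedDataParallelTrainer(
            model=model,
            data_loader=loader,
            optimizer=optimizer,
            batch_processor=processor,
            device=local_rank,
            stop_by="step",
            num_steps=10000,
            sync_bn=True,      # convert BatchNorm layers to synchronized BN across ranks
        )
        trainer.fit()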

class hat.engine.calibrator.Calibrator(model: torch.nn.modules.module.Module, data_loader: Iterable, batch_processor: hat.engine.processors.processor.BatchProcessorMixin, device: Optional[int] = None, model_convert_pipeline: Optional[Union[Dict, List]] = None, num_steps: Optional[int] = None, callbacks: Optional[Sequence[Union[dict, hat.callbacks.callbacks.CallbackMixin]]] = None, val_metrics: Optional[dict] = None, profiler: Optional[dict] = None, log_interval: int = 0, auto_calibration: bool = False, auto_calibration_config: Optional[dict] = None, weight_reconstruction: bool = False, weight_reconstruction_config: Optional[dict] = None, compiler: Optional[Dict] = None)

Calibrator is a tool for calibration.

The abundant callbacks of the trainer are also supported.
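
A hypothetical usage sketch, not taken from the HAT examples: calib_model, calib_loader and processor are placeholders, and the final fit() call is assumed to start the calibration loop.

    from hat.engine.calibrator import Calibrator

    calibrator = Calibrator(
        model=calib_model,                # a calibration-ready nn.Module
        data_loader=calib_loader,         # validation/calibration data loader
        batch_processor=processor,        # a validation-mode batch processor
        device=0,
        num_steps=200,                    # number of calibration batches
        auto_calibration=True,            # search the optimal observer automatically
        auto_calibration_config={
            "observer_list": ["percentile", "mse", "kl", "min_max"],
            "percentile_list": [99.995],
            "preload_data": False,
        },
        log_interval=20,
    )
    calibrator.fit()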

Parameters
  • model – nn.Module instance.

  • data_loader – Validation data loader.

  • batch_processor – Batch processor config.

  • device – GPU id (int) or None.

  • model_convert_pipeline – Defines the model conversion process, e.g. converting a float model to a QAT model, or a QAT model to a quantized model.

  • num_steps – Number of calibration steps; should be a non-negative integer.

  • callbacks – Callbacks.

  • val_metrics – Metrics on validation data.

  • profiler – To profile individual steps during training and assist in identifying bottlenecks.

  • log_interval – Logging output frequency.

  • auto_calibration – Whether to enable auto calibration to search optimal observer automatically.

  • auto_calibration_config

    Custom config for auto calibration. Default keys are:

    auto_calibration_config = {
        # candidate observers to search
        "observer_list": ["percentile", "mse", "kl", "min_max"],
        # candidate parameters for the percentile observer
        "percentile_list": [99.995],
        # whether to load data to the device in advance;
        # greatly boosts performance at the cost of
        # higher memory occupation
        "preload_data": False,
    }

  • weight_reconstruction – Whether to enable weight reconstruction after calibration.

  • weight_reconstruction_config

    Custom config for weight reconstruction. Default keys are:

    weight_reconstruction_config = {
        # whether to load data to the device in advance;
        # greatly boosts performance at the cost of
        # higher memory occupation
        "preload_data": False,
    }

class hat.engine.dp_trainer.DataParallelTrainer(model: torch.nn.modules.module.Module, data_loader: Iterable, optimizer: torch.optim.optimizer.Optimizer, batch_processor: hat.engine.processors.processor.BatchProcessorMixin, device: Union[int, Sequence[int]], stop_by: Optional[str] = 'epoch', num_epochs: Optional[int] = None, start_epoch: Optional[int] = 0, num_steps: Optional[int] = None, start_step: Optional[int] = 0, callbacks: Optional[Sequence[Union[dict, hat.callbacks.callbacks.CallbackMixin]]] = None, train_metrics: Optional[dict] = None, val_metrics: Optional[dict] = None, profiler: Optional[dict] = None, compiler: Optional[dict] = None, **kwargs)

DataParallelTrainer is a tool that creates a Trainer instance which trains with the DataParallel method, running on multiple GPU devices.

It can be launched by the launch function below.

By setting stop_by, you are able to stop training by counting epoch (default) or step.

Parameters
  • model – Model config or a nn.Module instance.

  • data_loader – Training data loader config or an instantiated data loader.

  • optimizer – Optimizer config or an optimizer instance.

  • batch_processor – Batch processor config or a BatchProcessorMixin instance.

  • device – GPU ids.

  • stop_by – Stop training by counting epoch or step. If equal to ‘epoch’, stop training when epoch_id == num_epochs - 1. If equal to ‘step’, stop training when global_step_id == num_steps - 1. Default ‘epoch’.

  • num_epochs – Number of training epochs; should be a non-negative integer. If stop_by != 'epoch', this is a no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_epoch – Training start epoch; should be a non-negative integer.

  • num_steps – Number of training steps; should be a non-negative integer. If stop_by != 'step', this is a no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_step – Training start step; should be a non-negative integer.

  • callbacks – Callback configs or instances.

  • train_metrics – Metrics on training data.

  • val_metrics – Metrics on validation data.

  • profiler – To profile individual steps during training and assist in identifying bottlenecks.

  • compiler – Converter of torch.compile.
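
A hypothetical sketch reusing the objects from the Trainer example above; the substantive difference is that device takes a sequence of GPU ids. The fit() call is assumed to start the loop.

    from hat.engine.dp_trainer import DataParallelTrainer

    dp_trainer = DataParallelTrainer(
        model=model,
        data_loader=loader,
        optimizer=optimizer,
        batch_processor=processor,
        device=[0, 1, 2, 3],      # replicate the model across these GPUs
        stop_by="epoch",
        num_epochs=10,
    )
    dp_trainer.fit()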

class hat.engine.predictor.Predictor(model: torch.nn.modules.module.Module, data_loader: Iterable, batch_processor: hat.engine.processors.processor.BatchProcessorMixin, device: Optional[int] = None, num_epochs: int = 1, model_convert_pipeline: Optional[Union[Dict, List]] = None, callbacks: Optional[Sequence[Union[dict, hat.callbacks.callbacks.CallbackMixin]]] = None, metrics: Optional[dict] = None, profiler: Optional[dict] = None, log_interval: int = 0, share_callbacks: bool = True, compiler: Optional[dict] = None, **kwargs)

Predictor is a tool for prediction.

The abundant callbacks of the trainer are also supported.

Predictor supports launching multiple processes on a single GPU, and supports multiple data loaders.

Parameters
  • model – nn.Module instance.

  • data_loader – Validation data loader.

  • batch_processor – Batch processor config.

  • model_convert_pipeline – Defines the model conversion process, e.g. converting a float model to a QAT model, or a QAT model to a quantized model.

  • callbacks – Callbacks.

  • metrics – Metrics on predict data.

  • profiler – To profile individual steps during predicting and assist in identifying bottlenecks.

  • log_interval – Logging output frequency.

  • share_callbacks – Whether to share callbacks on different dataloader.

  • compiler – Converter of torch.compile.

fit()

Do model fitting on data from data_loader.

self.batch_processor is responsible for the model forward, loss backward, and parameter update.

self.callbacks are responsible for metric updates, checkpointing, logging, and so on.
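
A hypothetical Predictor sketch, not taken from the HAT examples: eval_loader and model are placeholders, the validation-mode processor setup is an assumption, and fit() is assumed to start the prediction loop as described above.

    from hat.engine.predictor import Predictor
    from hat.engine.processors.processor import BasicBatchProcessor

    eval_processor = BasicBatchProcessor(
        need_grad_update=False,       # validation mode: no backward pass, no optimizer step
    )

    predictor = Predictor(
        model=model,
        data_loader=eval_loader,      # or a list of data loaders
        batch_processor=eval_processor,
        device=0,
        metrics=None,                 # optionally, metric configs evaluated on predictions
        log_interval=50,
    )
    predictor.fit()                   # iterate the data and drive the callbacks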

class hat.engine.processors.processor.BasicBatchProcessor(need_grad_update: bool, batch_transforms: Optional[List] = None, inverse_transforms: Optional[List] = None, loss_collector: Optional[Callable] = None, enable_amp: bool = False, enable_amp_dtype: torch.dtype = torch.float16, enable_apex: bool = False, enable_channels_last: bool = False, channels_last_keys: Optional[Sequence[str]] = None, grad_scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler] = None, grad_accumulation_step: int = 1)

Processor for an (inputs, target) batch where the model output is a (losses, preds) pair.

It is suitable for training (need_grad_update) or validation (not need_grad_update).

Parameters
  • need_grad_update – Whether a gradient update is needed: True for training, False for validation.

  • batch_transforms – Config of batch transforms.

  • inverse_transforms – Config of transforms used to inverse-transform inference results.

  • loss_collector – A callable object used to collect loss Tensors in model outputs.

  • enable_amp – Whether training with Automatic Mixed Precision.

  • enable_amp_dtype – The dtype of amp, float16 or bfloat16.

  • enable_apex – Whether to train with Apex.

  • enable_channels_last – Whether to use the channels_last memory format.

  • channels_last_keys – Keys in the batch that need to be converted to channels_last. If None, all 4D tensors in the batch will be converted to channels_last.

  • grad_scaler – A GradScaler instance that helps perform the steps of gradient scaling conveniently.

  • grad_accumulation_step – The number of steps over which gradients are accumulated. Gradient accumulation means that multiple backward passes are performed before the parameters are updated, so that the update is based on several batches instead of a single batch; see the conceptual sketch below.
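
A conceptual illustration of gradient accumulation in plain PyTorch. This is not the HAT implementation; model, loader, and optimizer are placeholders following the Trainer sketch earlier in this section.

    grad_accumulation_step = 4

    for step, (inputs, target) in enumerate(loader):
        losses, _ = model(inputs, target)
        # Scale the loss so the accumulated update matches a large-batch update.
        loss = sum(losses.values()) / grad_accumulation_step
        loss.backward()                                  # gradients accumulate in .grad
        if (step + 1) % grad_accumulation_step == 0:
            optimizer.step()                             # update once every N batches
            optimizer.zero_grad()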

class hat.engine.processors.processor.MultiBatchProcessor(need_grad_update: bool, batch_transforms: Optional[List] = None, inverse_transforms: Optional[List] = None, loss_collector: Optional[Callable] = None, enable_amp: bool = False, enable_amp_dtype: torch.dtype = torch.float16, enable_apex: bool = False, enable_channels_last: bool = False, channels_last_keys: Optional[Sequence[str]] = None, delay_sync: bool = False, empty_cache: bool = False, grad_scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler] = None, grad_accumulation_step: Union[int, str] = 1)

Processor that can forward and backward multiple batches within a training step (before optimizer.step()).

It is useful for:

(1) Training a multitask model on single-task annotated samples, where each task forwards and backwards its own batch sequentially within a multitask training step.

(2) Training on a GPU with limited memory while wanting a larger batch size: you can forward and backward multiple batches within a training step.

Note

Example multitask: vehicle, person, and traffic light detection. Single-task annotation means that only vehicle bounding boxes are annotated on an image containing vehicle, person, and traffic light objects.

Note

Multiple batches should be organized in tuple format, e.g.

  • batch = (batch1, batch2, …)

If not, it will be treated as a single batch, e.g.

  • batch = dict(inputs=xx, target=xx)

  • batch = [inputs, target]

See code below for extra explanation.

It is more general in usage than BasicBatchProcessor: the batch and model outputs can be in any format, but note that a tuple batch means it contains multiple batches.

It is hardware independent and runs on CPU (device is None) or GPU (device is a GPU id).

It is suitable for training (need_grad_update) and validation (not need_grad_update).
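
An illustration of the batch formats described above, using synthetic data. This only shows the structure; the shapes and keys are arbitrary.

    import torch

    # Treated as a single batch (not a tuple):
    single_batch = dict(
        inputs=torch.randn(8, 3, 32, 32),
        target=torch.randint(0, 10, (8,)),
    )

    # Treated as multiple batches: each element is forwarded and backwarded
    # sequentially within one training step, then optimizer.step() runs once.
    multi_batch = (
        dict(inputs=torch.randn(8, 3, 32, 32), target=torch.randint(0, 10, (8,))),  # task 1
        dict(inputs=torch.randn(8, 3, 32, 32), target=torch.randint(0, 10, (8,))),  # task 2
    )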

Parameters
  • need_grad_update – Whether a gradient update is needed: True for training, False for validation.

  • batch_transforms – Config of batch transforms.

  • inverse_transforms – Config of transforms used to inverse-transform inference results.

  • loss_collector – A callable object used to collect loss Tensors in model outputs.

  • enable_amp – Whether training with Automatic Mixed Precision.

  • enable_amp_dtype – The dtype of amp, float16 or bfloat16.

  • enable_apex – Whether to train with Apex.

  • enable_channels_last – Whether to train with channels_last.

  • channels_last_keys – Keys in the batch that need to be converted to channels_last. If None, all 4D tensors in the batch will be converted to channels_last.

  • delay_sync – Whether to delay gradient sync when training with DDP. Refer to the DDP.no_sync() API.

  • empty_cache – Whether to execute torch.cuda.empty_cache() after each forward and backward run.

  • grad_scaler – A GradScaler instance that helps perform the steps of gradient scaling conveniently.

  • grad_accumulation_step – The number of steps over which gradients are accumulated. Gradient accumulation means that multiple backward passes are performed before the parameters are updated, so that the update is based on several batches instead of a single batch.

class hat.engine.processors.processor.MultiStageBatchProcessor(need_grad_update: bool, batch_transforms: Optional[List] = None, inverse_transforms: Optional[List] = None, loss_collector: Optional[Callable] = None, enable_amp: bool = False, enable_amp_dtype: torch.dtype = torch.float16, enable_apex: bool = False, enable_channels_last: bool = False, channels_last_keys: Optional[Sequence[str]] = None, delay_sync: bool = False, empty_cache: bool = False, grad_scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler] = None, split_node_name: Optional[str] = None, grad_accumulation_step: Union[int, str] = 1)

Supports multi-stage backward.

It is a memory-saving processor that forwards and backwards each split task individually when more than one task is trained in a single step.

Parameters
  • need_grad_update – Whether a gradient update is needed: True for training, False for validation.

  • batch_transforms – Config of batch transforms.

  • inverse_transforms – Config of transforms used to inverse-transform inference results.

  • loss_collector – A callable object used to collect loss Tensors in model outputs.

  • enable_amp – Whether training with Automatic Mixed Precision.

  • enable_apex – Whether to train with Apex.

  • enable_channels_last – Whether to train with channels_last.

  • channels_last_keys – Keys in the batch that need to be converted to channels_last. If None, all 4D tensors in the batch will be converted to channels_last.

  • delay_sync – Whether to delay gradient sync when training with DDP. Refer to the DDP.no_sync() API.

  • grad_scaler – A GradScaler instance that helps perform the steps of gradient scaling conveniently.

  • grad_accumulation_step – The number of steps over which gradients are accumulated. Gradient accumulation means that multiple backward passes are performed before the parameters are updated, so that the update is based on several batches instead of a single batch.
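
A conceptual illustration of stage-wise backward in plain PyTorch. It shows one common way to save memory by backwarding each task individually against a detached split point; it is not necessarily how MultiStageBatchProcessor is implemented, and trunk, task_heads, task_targets, inputs, and optimizer are placeholders.

    import torch

    feat = trunk(inputs)                          # shared forward pass, run once
    split = feat.detach().requires_grad_(True)    # acts as the "split node"

    grad_at_split = torch.zeros_like(split)
    for head, target in zip(task_heads, task_targets):
        loss = head(split, target)                # per-task forward from the split node
        loss.backward()                           # frees this task's graph immediately
        grad_at_split += split.grad
        split.grad = None

    feat.backward(grad_at_split)                  # single backward through the trunk
    optimizer.step()
    optimizer.zero_grad()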