6.1.5.3. hat.engine

Engine of the main training loop in HAT.

6.1.5.3.1. Engine

build_launcher

LoopBase

LoopBase controls the data flow from data_loader to model, including model forward, loss backward and parameters update.

Predictor

Predictor is a tool for prediction.

Calibrator

Calibrator is a tool for calibration.

Trainer

Trainer is a tool for training, which includes the whole training pipeline.

DistributedDataParallelTrainer

DistributedDataParallelTrainer tool.

DataParallelTrainer

DataParallelTrainer is a tool function that creates a new Trainer instance.

6.1.5.3.1.1. processors

BatchProcessorMixin

Batch Processor Interface.

BasicBatchProcessor

Processor dealing with (inputs, target) batch, and the model output is a (losses, preds) pair.

MultiBatchProcessor

Processor that can forward and backward multiple batches within a training step (before optimizer.step()).

collect_loss_by_index

Collect loss by specific indexes of loss Tensors in model outputs like: (losses, preds), (...loss1, ...loss2, ...) and so on.

collect_loss_by_regex

Flatten model outputs into an OrderedDict, then use an re regex to match the keys of loss Tensors.

6.1.5.3.2. API Reference

class hat.engine.Calibrator(model: torch.nn.modules.module.Module, data_loader: Iterable, batch_processor: hat.engine.processors.processor.BatchProcessorMixin, device: Optional[int] = None, model_convert_pipeline: Optional[Union[Dict, List]] = None, num_steps: Optional[int] = None, callbacks: Optional[Sequence[Union[dict, hat.callbacks.callbacks.CallbackMixin]]] = None, val_metrics: Optional[dict] = None, profiler: Optional[dict] = None, log_interval: int = 0, **kwargs)

Calibrator is a tool for calibration.

The abundant callbacks available in the trainer are also supported.

Parameters
  • model – nn.Module instance.

  • data_loader – Validation data loader.

  • batch_processor – Batch processor config.

  • device – Int gpu id or None.

  • model_convert_pipeline – Defines the model conversion process, e.g. converting a float model to a QAT model, or a QAT model to a quantized model.

  • num_steps – Num of calibration steps, should be non-negative integer.

  • callbacks – Callbacks.

  • val_metrics – Metrics on validation data.

  • profiler – To profile individual steps during training and assist in identifying bottlenecks.

  • log_interval – Logging output frequency.
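
For illustration, a minimal construction sketch (all names below are toy placeholders, not part of HAT; it is assumed that instantiated objects are accepted as the signature suggests, that the processor calls model(inputs, target), and that Calibrator exposes LoopBase.fit(); in practice model_convert_pipeline would first convert the float model into a calibration model):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from hat.engine import Calibrator
from hat.engine.processors import BasicBatchProcessor

# Toy model whose forward returns a (losses, preds) pair, the convention
# BasicBatchProcessor expects.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, inputs, target):
        preds = self.fc(inputs)
        losses = {"mse_loss": nn.functional.mse_loss(preds, target)}
        return losses, preds

loader = DataLoader(
    TensorDataset(torch.randn(64, 4), torch.randn(64, 1)), batch_size=8
)

calibrator = Calibrator(
    model=ToyModel(),
    data_loader=loader,
    batch_processor=BasicBatchProcessor(need_grad_update=False),
    device=None,      # run on CPU; pass an int GPU id to use CUDA
    num_steps=4,      # feed only 4 batches for calibration
    log_interval=2,
)
calibrator.fit()      # assuming Calibrator inherits LoopBase.fit()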

class hat.engine.DataParallelTrainer(model: torch.nn.modules.module.Module, data_loader: Iterable, optimizer: torch.optim.optimizer.Optimizer, batch_processor: hat.engine.processors.processor.BatchProcessorMixin, device: Union[int, Sequence[int]], stop_by: Optional[str] = 'epoch', num_epochs: Optional[int] = None, start_epoch: Optional[int] = 0, num_steps: Optional[int] = None, start_step: Optional[int] = 0, callbacks: Optional[Sequence[Union[dict, hat.callbacks.callbacks.CallbackMixin]]] = None, train_metrics: Optional[dict] = None, val_metrics: Optional[dict] = None, profiler: Optional[dict] = None, **kwargs)

DataParallelTrainer is a tool function that creates a new Trainer instance, which trains with the DataParallel method and runs on multiple GPU devices.

It can be launched by the launch function below.

By setting stop_by, you are able to stop training by counting epoch (default) or step.

Parameters
  • model – Model config or a nn.Module instance.

  • data_loader – Training data loader config or an instantiated data loader.

  • optimizer – Optimizer config or an optimizer instance.

  • batch_processor – Batch processor config or a BatchProcessorMixin instance.

  • device – GPU ids.

  • stop_by – Stop training by counting epoch or step. If equal to ‘epoch’, stop training when epoch_id == num_epochs - 1. If equal to ‘step’, stop training when global_step_id == num_steps - 1. Default ‘epoch’.

  • num_epochs – Num of training epochs, should be non-negative integer. If stop_by != ‘epoch’, no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_epoch – Training start epoch, should be non-negative integer.

  • num_steps – Num of training steps, should be non-negative integer. If stop_by != ‘step’, no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_step – Training start step, should be non-negative integer.

  • callbacks – Callback configs or instances.

  • train_metrics – Metrics on training data.

  • val_metrics – Metrics on validation data.

  • profiler – To profile individual steps during training and assist in identifying bottlenecks.
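
A brief sketch reusing the toy ToyModel and loader from the Calibrator example above; only the multi-GPU device list and the step-based stopping condition are specific to DataParallelTrainer, and all names remain illustrative:

import torch
from hat.engine import DataParallelTrainer
from hat.engine.processors import BasicBatchProcessor, collect_loss_by_regex

model = ToyModel()        # toy model from the Calibrator example above
trainer = DataParallelTrainer(
    model=model,
    data_loader=loader,   # toy loader from the Calibrator example above
    optimizer=torch.optim.SGD(model.parameters(), lr=0.01),
    batch_processor=BasicBatchProcessor(
        need_grad_update=True,
        loss_collector=collect_loss_by_regex("^.*loss.*"),
    ),
    device=[0, 1],        # replicate the model over GPU 0 and GPU 1
    stop_by="step",       # stop by counting steps instead of epochs
    num_steps=100,
)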

class hat.engine.DistributedDataParallelTrainer(model: torch.nn.modules.module.Module, data_loader: Iterable, optimizer: torch.optim.optimizer.Optimizer, batch_processor: hat.engine.processors.processor.BatchProcessorMixin, device: int, stop_by: Optional[str] = 'epoch', num_epochs: Optional[int] = None, start_epoch: Optional[int] = 0, num_steps: Optional[int] = None, start_step: Optional[int] = 0, callbacks: Optional[Sequence[hat.callbacks.callbacks.CallbackMixin]] = None, sync_bn: Optional[bool] = False, sync_bn_by_host: Optional[bool] = False, train_metrics: Optional[dict] = None, val_metrics: Optional[dict] = None, profiler: Optional[dict] = None, task_sampler: Optional[hat.core.task_sampler.TaskSampler] = None, convert_submodule_list: Optional[List[str]] = None, find_unused_parameters: Optional[bool] = True, assign_module_buffers: Optional[bool] = True, **kwargs)

DistributedDataParallelTrainer tool.

DistributedDataParallelTrainer is a tool function that creates a new Trainer instance, which trains with the DistributedDataParallel method and runs on one of the GPU devices.

It can be launched by the launch function below, which spawns multiple processes, each of which owns an independent Trainer.

By setting stop_by, you are able to stop training by counting epoch (default) or step.

Parameters
  • model – Model config or a nn.Module instance.

  • data_loader – Training data loader config or an instantiated data loader.

  • optimizer – Optimizer config or an optimizer instance.

  • batch_processor – Batch processor config or a BatchProcessorMixin instance.

  • device – GPU id.

  • stop_by – Stop training by counting epoch or step. If equal to ‘epoch’, stop training when epoch_id == num_epochs - 1. If equal to ‘step’, stop training when global_step_id == num_steps - 1. Default ‘epoch’.

  • num_epochs – Num of training epochs, should be non-negative integer. If stop_by != ‘epoch’, no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_epoch – Training start epoch, should be non-negative integer.

  • num_steps – Num of training steps, should be non-negative integer. If stop_by != ‘step’, no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_step – Training start step, should be non-negative integer.

  • callbacks – Callback configs or instances.

  • sync_bn – Whether to convert bn to sync_bn.

  • sync_bn_by_host – Whether to sync BN within the host node.

  • train_metrics – Metrics on training data.

  • val_metrics – Metrics on validation data.

  • profiler – To profile individual steps during training and assist in identifying bottlenecks.

  • task_sampler – TaskSampler config for multitask training.

  • convert_submodule_list – List of submodules for DDP conversion.

  • assign_module_buffers – Whether to reassign module buffers. For details, see: https://horizonrobotics.feishu.cn/wiki/wikcn5j6CW9MVemYwnl1n349JMf#GAyJdS

  • find_unused_parameters – Argument passed to the DistributedDataParallel module.
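
A hedged per-process sketch: in practice the trainer is built inside each worker spawned by the launch function mentioned above, after the process group has been initialized; local_rank and the reuse of the toy objects from the previous examples are assumptions for illustration:

import torch
from hat.engine import DistributedDataParallelTrainer
from hat.engine.processors import BasicBatchProcessor, collect_loss_by_regex

def build_trainer(local_rank: int) -> DistributedDataParallelTrainer:
    # Called once per spawned process; torch.distributed is assumed to have
    # been initialized by the launcher already.
    model = ToyModel()
    return DistributedDataParallelTrainer(
        model=model,
        data_loader=loader,
        optimizer=torch.optim.SGD(model.parameters(), lr=0.01),
        batch_processor=BasicBatchProcessor(
            need_grad_update=True,
            loss_collector=collect_loss_by_regex("^.*loss.*"),
        ),
        device=local_rank,   # each process owns exactly one GPU
        sync_bn=True,        # convert BN layers to synchronized BN
        stop_by="epoch",
        num_epochs=2,
    )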

class hat.engine.LoopBase(model: torch.nn.modules.module.Module, data_loader: Iterable, optimizer: torch.optim.optimizer.Optimizer, batch_processor: hat.engine.processors.processor.BatchProcessorMixin, device: Optional[int], model_convert_pipeline: Optional[Union[Dict, List]] = None, resume_optimizer: bool = False, resume_epoch_or_step: bool = False, stop_by: Optional[str] = 'epoch', num_epochs: Optional[int] = None, start_epoch: Optional[int] = 0, num_steps: Optional[int] = None, start_step: Optional[int] = 0, callbacks: Optional[Sequence[Union[dict, hat.callbacks.callbacks.CallbackMixin]]] = None, train_metrics: Optional[dict] = None, val_metrics: Optional[dict] = None, profiler: Optional[dict] = None, log_interval: int = 0)

LoopBase controls the data flow from data_loader to model, including model forward, loss backward and parameters update.

It is hardware independent and runs on CPU (device is None) or GPU (device is an int GPU id).

By setting stop_by, you are able to stop loop by counting epoch (default) or step.

Parameters
  • model – Model config or a nn.Module instance.

  • data_loader – Training data loader config or an instantiated data loader.

  • optimizer – Optimizer config or an optimizer instance.

  • batch_processor – Batch processor config or a BatchProcessorMixin instance.

  • device – Int gpu id or None. If int, do model.cuda(device) and data.cuda(device). If None, no-op.

  • model_convert_pipeline – Defines the model conversion process, e.g. converting a float model to a QAT model, or a QAT model to a quantized model.

  • resume_optimizer – Whether to load the optimizer state dict when resuming from a checkpoint.

  • resume_epoch_or_step – Whether to resume epoch_or_step when resuming from a checkpoint.

  • stop_by – Stop loop by counting epoch or step. If equal to ‘epoch’, stop loop when epoch_id == num_epochs - 1. If equal to ‘step’, stop loop when global_step_id == num_steps - 1. Default ‘epoch’.

  • num_epochs – Num of loop epochs, should be non-negative integer. If stop_by != ‘epoch’, no-op. Set 0 to skip loop epochs and run self.on_*_loop_begin/end only.

  • start_epoch – Training start epoch, should be non-negative integer.

  • num_steps – Num of loop steps, should be non-negative integer. If stop_by != ‘step’, no-op. Set 0 to skip loop steps and run self.on_*_loop_begin/end only.

  • start_step – Training start step, should be non-negative integer.

  • callbacks – Callback configs or instances.

  • train_metrics – Metrics on training data.

  • val_metrics – Metrics on validation data.

  • profiler – To profile individual steps during loop and assist in identifying bottlenecks.

  • log_interval – Logging output frequency.

fit()

Do model fitting on data from data_loader.

self.batch_processor is responsible for model forward, loss backward and parameter update.

self.callbacks is responsible for metric update, checkpointing, logging and so on.
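
Conceptually, the control flow is roughly the pseudocode below; this is a simplified illustration of the description above, not the actual implementation, and the attribute names, hook comments and processor call signature are assumptions:

# Pseudocode sketch of LoopBase.fit(): the batch processor owns forward,
# backward and parameter update, while callbacks observe each step and epoch.
def fit_sketch(loop):
    # callbacks: loop begins (resume, metric reset, ...)
    for epoch_id in range(loop.start_epoch, loop.num_epochs):  # or count steps when stop_by == "step"
        # callbacks: epoch begins
        for batch in loop.data_loader:
            # callbacks: step begins
            loop.batch_processor(batch, loop.model, loop.device, loop.optimizer)
            # callbacks: step ends (metric update, logging, ...)
        # callbacks: epoch ends (checkpoint, validation, ...)
    # callbacks: loop ends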

class hat.engine.Predictor(model: torch.nn.modules.module.Module, data_loader: Iterable, batch_processor: hat.engine.processors.processor.BatchProcessorMixin, device: Optional[int] = None, num_epochs: int = 1, model_convert_pipeline: Optional[Union[Dict, List]] = None, callbacks: Optional[Sequence[Union[dict, hat.callbacks.callbacks.CallbackMixin]]] = None, metrics: Optional[dict] = None, profiler: Optional[dict] = None, log_interval: int = 0, share_callbacks: bool = True)

Predictor is a tool for prediction.

The abundant callbacks available in the trainer are also supported.

Predictor supports launching multiple processes on a single GPU, and supports multiple data loaders.

Parameters
  • model – nn.Module instance.

  • data_loader – Validation data loader.

  • batch_processor – Batch processor config.

  • model_convert_pipeline – Defines the model conversion process, e.g. converting a float model to a QAT model, or a QAT model to a quantized model.

  • callbacks – Callbacks.

  • metrics – Metrics on predict data.

  • profiler – To profile individual steps during predicting and assist in identifying bottlenecks.

  • log_interval – Logging output frequency.

  • share_callbacks – Whether to share callbacks across different data loaders.

fit()

Do model fitting on data from data_loader.

self.batch_processor is responsible for model forward, loss backward and parameter update.

self.callbacks is responsible for metric update, checkpointing, logging and so on.
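
A minimal sketch reusing the toy ToyModel and loader from the Calibrator example above; metrics and callbacks are omitted for brevity, and need_grad_update=False marks the run as pure inference:

from hat.engine import Predictor
from hat.engine.processors import BasicBatchProcessor

predictor = Predictor(
    model=ToyModel(),      # toy model from the Calibrator example above
    data_loader=loader,    # a single loader; multiple data loaders are also supported (see above)
    batch_processor=BasicBatchProcessor(need_grad_update=False),
    device=None,           # CPU; pass an int GPU id to use CUDA
    num_epochs=1,
    log_interval=10,
    share_callbacks=True,
)
predictor.fit()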

class hat.engine.Trainer(model: torch.nn.modules.module.Module, data_loader: Iterable, optimizer: torch.optim.optimizer.Optimizer, batch_processor, device: Optional[int], model_convert_pipeline: Optional[Union[Dict, List]] = None, resume_optimizer: bool = False, resume_epoch_or_step: bool = False, stop_by: Optional[str] = 'epoch', num_epochs: Optional[int] = None, start_epoch: Optional[int] = 0, num_steps: Optional[int] = None, start_step: Optional[int] = 0, callbacks: Optional[Sequence[Union[dict, hat.callbacks.callbacks.CallbackMixin]]] = None, train_metrics: Optional[dict] = None, val_metrics: Optional[dict] = None, profiler: Optional[dict] = None, log_interval: int = 0)

Trainer is a tool for training, which includes the whole training pipeline.

Parameters
  • model – Model config or a nn.Module instance.

  • data_loader – Training data loader config or an instantiated data loader.

  • optimizer – Optimizer config or an optimizer instance.

  • batch_processor – Batch processor config or a BatchProcessorMixin instance.

  • device – Int gpu id or None. If int, do model.cuda(device) and data.cuda(device). If None, no-op.

  • model_convert_pipeline – Defines the model conversion process, e.g. converting a float model to a QAT model, or a QAT model to a quantized model.

  • resume_optimizer – Whether to load the optimizer state dict when resuming from a checkpoint.

  • resume_epoch_or_step – Whether to resume epoch_or_step when resuming from a checkpoint.

  • stop_by – Stop training by counting epoch or step. If equal to ‘epoch’, stop training when epoch_id == num_epochs - 1. If equal to ‘step’, stop training when global_step_id == num_steps - 1. Default ‘epoch’.

  • num_epochs – Num of training epochs, should be non-negative integer. If stop_by != ‘epoch’, no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_epoch – Training start epoch, should be non-negative integer.

  • num_steps – Num of training steps, should be non-negative integer. If stop_by != ‘step’, no-op. Set 0 to skip training and run self.on_loop_begin/end only.

  • start_step – Training start step, should be non-negative integer.

  • callbacks – Callback configs or instances.

  • train_metrics – Metrics on training data.

  • val_metrics – Metrics on validation data.

  • profiler – To profile individual steps during training and assist in identifying bottlenecks.

  • log_interval – Logging output frequency.
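
A minimal end-to-end sketch reusing the toy (inputs, target) setup from the Calibrator example above; training stops after two epochs, and it is assumed that Trainer exposes LoopBase.fit():

import torch
from hat.engine import Trainer
from hat.engine.processors import BasicBatchProcessor, collect_loss_by_regex

model = ToyModel()        # toy model from the Calibrator example above
trainer = Trainer(
    model=model,
    data_loader=loader,   # toy loader from the Calibrator example above
    optimizer=torch.optim.SGD(model.parameters(), lr=0.01),
    batch_processor=BasicBatchProcessor(
        need_grad_update=True,
        loss_collector=collect_loss_by_regex("^.*loss.*"),  # pick loss tensors out of (losses, preds)
    ),
    device=None,          # CPU; pass an int GPU id to train on CUDA
    stop_by="epoch",
    num_epochs=2,
    log_interval=10,
)
trainer.fit()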

class hat.engine.processors.BasicBatchProcessor(need_grad_update: bool, batch_transforms: Optional[List] = None, inverse_transforms: Optional[List] = None, loss_collector: Optional[Callable] = None, enable_amp: bool = False, enable_amp_dtype: torch.dtype = torch.float16, enable_apex: bool = False, enable_channels_last: bool = False, channels_last_keys: Optional[Sequence[str]] = None, grad_scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler] = None, grad_accumulation_step: int = 1)

Processor dealing with (inputs, target) batch, and the model output is a (losses, preds) pair.

It is suitable for training (need_grad_update) or validation (not need_grad_update).

Parameters
  • need_grad_update – Whether gradient update is needed: True for training, False for validation.

  • batch_transforms – Config of batch transforms.

  • inverse_transforms – Config of transforms used to inverse-transform inference results.

  • loss_collector – A callable object used to collect loss Tensors in model outputs.

  • enable_amp – Whether training with Automatic Mixed Precision.

  • enable_amp_dtype – The dtype of amp, float16 or bfloat16.

  • enable_apex – Whether to train with Apex.

  • enable_channels_last – Whether to use the channels_last memory format.

  • channels_last_keys – Keys in the batch that need to be converted to channels_last. If None, all 4d tensors in the batch data will be converted to channels_last.

  • grad_scaler – An instance of GradScaler that helps perform the steps of gradient scaling conveniently.

  • grad_accumulation_step – The number of steps of gradient accumulation. Gradient accumulation means that multiple backward passes are performed before updating the parameters, so the model's parameters are updated based on several accumulated steps instead of after every single batch.
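
A short sketch of the gradient-accumulation arithmetic: with a data loader batch size of 8 and grad_accumulation_step=4, parameters are updated once every 4 backward passes, i.e. an effective batch size of 32. The collect_loss_by_index(0) usage assumes the model returns losses first in its (losses, preds) pair:

from hat.engine.processors import BasicBatchProcessor, collect_loss_by_index

processor = BasicBatchProcessor(
    need_grad_update=True,
    loss_collector=collect_loss_by_index(0),  # losses sit at index 0 of (losses, preds)
    enable_amp=True,                          # run forward/backward in mixed precision
    grad_accumulation_step=4,                 # optimizer.step() once every 4 backward passes
)
# With a data loader batch size of 8, the effective batch size is 8 * 4 = 32.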

class hat.engine.processors.BatchProcessorMixin

Batch Processor Interface.

class hat.engine.processors.MultiBatchProcessor(need_grad_update: bool, batch_transforms: Optional[List] = None, inverse_transforms: Optional[List] = None, loss_collector: Optional[Callable] = None, enable_amp: bool = False, enable_amp_dtype: torch.dtype = torch.float16, enable_apex: bool = False, enable_channels_last: bool = False, channels_last_keys: Optional[Sequence[str]] = None, delay_sync: bool = False, empty_cache: bool = False, grad_scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler] = None, grad_accumulation_step: Union[int, str] = 1)

Processor that can forward and backward multiple batches within a training step (before optimizer.step()).

It is useful for:

(1) Training a multitask model on single-task annotation samples, where each task forwards and backwards its own batch sequentially within a multitask training step.

(2) Training on a GPU with limited memory while wanting a larger effective batch size: you can forward and backward multiple batches within a training step.

Note

Example multitask: vehicle, person and traffic light detection. Single-task annotation means that only vehicle bounding boxes are annotated on an image containing vehicle, person, and traffic light objects.

Note

Multiple batches should be organized in tuple format, e.g.

  • batch = (batch1, batch2, …)

If not, it will be treated as a single batch, e.g.

  • batch = dict(inputs=xx, target=xx)

  • batch = [inputs, target]

See code below for extra explanation.

It is more general in usage than BasicBatchProcessor: the batch and model outputs can be in any format, but note that if the batch is a tuple, it is treated as containing multiple batches.

It is hardware independent and runs on CPU (device is None) or GPU (device is a GPU id).

It is suitable for training (need_grad_update) and validation (not need_grad_update).

Parameters
  • need_grad_update – Whether gradient update is needed: True for training, False for validation.

  • batch_transforms – Config of batch transforms.

  • inverse_transforms – Config of transforms used to inverse-transform inference results.

  • loss_collector – A callable object used to collect loss Tensors in model outputs.

  • enable_amp – Whether training with Automatic Mixed Precision.

  • enable_amp_dtype – The dtype of amp, float16 or bfloat16.

  • enable_apex – Whether to train with Apex.

  • enable_channels_last – Whether to train with the channels_last memory format.

  • channels_last_keys – Keys in the batch that need to be converted to channels_last. If None, all 4d tensors in the batch data will be converted to channels_last.

  • delay_sync – Whether to delay gradient sync when training with DDP. Refer to the DDP.no_sync() API.

  • empty_cache – Whether to execute torch.cuda.empty_cache() after each forward and backward run.

  • grad_scaler – An instance of GradScaler that helps perform the steps of gradient scaling conveniently.

  • grad_accumulation_step – The number of steps of gradient accumulation. Gradient accumulation means that multiple backward passes are performed before updating the parameters, so the model's parameters are updated based on several accumulated steps instead of after every single batch.
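
A sketch of the multi-batch convention described above: a tuple groups several per-task batches that are forwarded and backwarded sequentially before a single optimizer step; the batch contents and task names are purely illustrative:

import torch
from hat.engine.processors import MultiBatchProcessor, collect_loss_by_regex

processor = MultiBatchProcessor(
    need_grad_update=True,
    loss_collector=collect_loss_by_regex("^.*loss.*"),
    delay_sync=True,   # under DDP, sync gradients only once per training step
)

# A tuple means "multiple batches within one training step":
vehicle_batch = dict(inputs=torch.randn(8, 3, 32, 32), target=torch.randn(8, 4))
person_batch = dict(inputs=torch.randn(8, 3, 32, 32), target=torch.randn(8, 4))
multi_batch = (vehicle_batch, person_batch)

# Anything that is not a tuple is treated as a single batch:
single_batch = dict(inputs=torch.randn(8, 3, 32, 32), target=torch.randn(8, 4))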

hat.engine.processors.collect_loss_by_index(indexes: Union[int, Sequence[int]]) → Callable

Collect loss by specific indexes of loss Tensors in model outputs like: (losses, preds), (…loss1, …loss2, …) and so on.

Parameters

indexes – Indexes of loss Tensors in model outputs.

Returns

A function with model outputs as input, return loss Tensors collected by indexes.

Examples:

>>> model_outs = [
...     [torch.tensor(1.0), torch.tensor(2.0)],  # losses
...     [torch.tensor(3.0), torch.tensor(4.0)]   # preds
... ]
>>> collector = collect_loss_by_index(0)
>>> collector(model_outs)
[tensor(1.), tensor(2.)]

hat.engine.processors.collect_loss_by_regex(loss_name_pattern: str) → Callable

Flatten model outputs into an OrderedDict, then use an re regex to match the keys of loss Tensors.

Parameters

loss_name_pattern – re regex, e.g. ‘^.*loss.*’.

Returns

A function with model outputs as input, return loss Tensors matched by loss_name_pattern.

Example:

>>> model_outs = dict(
...     toy_loss_1=torch.tensor(1.0),
...     toy_predict=torch.tensor(2.0),
...     toy_loss_2=torch.tensor(3.0),
... )
>>> collector = collect_loss_by_regex('^.*loss.*')
>>> collector(model_outs)
[tensor(1.), tensor(3.)]