6.1.5.2. hat.callbacks

Callbacks widely used while training in HAT.

6.1.5.2.1. Callbacks

CallbackMixin

Callback interface class.

Checkpoint

Checkpoint Callback is used for saving the model after training and for resuming from a saved model before training.

CosLrUpdater

Lr Updater Callback for adjusting lr with warmup and cos decay.

CyclicLrUpdater

Lr Updater Callback for adjusting lr with a cyclic (OneCycle-style) schedule.

PolyLrUpdater

Reduce the learning rate according to a polynomial of given power.

StepDecayLrUpdater

Lr Updater Callback for adjusting lr with warmup and decay.

OneCycleUpdater

Lr Updater Callback for adjusting lr with OneCycle Updater.

NoamLrUpdater

Noam LR Updater.

CosineAnnealingLrUpdater

Lr Updater Callback for adjusting lr with warmup and cos decay by epoch.

SaveTraced

SaveTraced is used to trace a model and save it to a file.

MetricUpdater

Callback used to reset, update and log metrics.

StatsMonitor

StatsMonitor Callback is used to monitor training statistics such as epoch time and batch time.

FreezeModule

Freeze module parameter while training.

FuseBN

Fuse batchnorm layer in float training.

TensorBoard

TensorBoard Callback is used for recording information during training, such as losses, images and other visualizations.

Validation

Callbacks of validation.

MeanTeacherValidation

ExponentialMovingAverage

GradScale

Set gradient scales for different modules.

FreezeBNStatistics

Freeze BatchNorm module every epoch while training.

6.1.5.2.2. API Reference

class hat.callbacks.CallbackMixin

Callback interface class.

on_batch_begin(**kwargs)

There may be multiple batches in a multitask training step.

on_batch_end(**kwargs)

There may be multiple batches in a multitask training step.
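
A minimal sketch of a custom callback (hypothetical subclass; the hook names follow the two methods documented above, and the contents of kwargs depend on the trainer):

>>> class PrintStepCallback(CallbackMixin):
...     def on_batch_end(self, **kwargs):
...         # kwargs typically carry objects such as batch and model_outs;
...         # printing here is purely illustrative.
...         print("one batch finished")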

class hat.callbacks.Checkpoint(save_dir: str, name_prefix: Optional[str] = '', save_interval: Optional[int] = 1, interval_by: Optional[str] = 'epoch', save_on_train_end: Optional[bool] = True, strict_match: Optional[bool] = False, mode: Optional[str] = None, monitor_metric_key: Optional[str] = None, best_refer_metric: Optional[Union[dict, hat.metrics.metric.EvalMetric]] = None, task_sampler=None, save_hash: bool = True)

Checkpoint Callback is used for saving the model after training and for resuming from a saved model before training.

Parameters
  • save_dir – Directory to save checkpoints.

  • name_prefix – Checkpoint name prefix.

  • save_interval – Save checkpoint every save_interval epoch or step.

  • interval_by – Set save_interval unit to step or epoch. Default is epoch.

  • save_on_train_end – Whether save checkpoint when on_loop_end is triggered.

  • strict_match – Whether to strictly enforce that the keys in model.state_dict() (train model) match the keys in test_model.state_dict(). Default: False

  • mode – State of monitor for saving model.

  • monitor_metric_key – Monitor metric for saving best checkpoint.

  • best_refer_metric – Metric that evaluate which epoch is the best.

  • save_hash – Whether to save the hash value to the name of the Checkpoint file. Default is True.
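
Example (a minimal, illustrative sketch; the argument values are assumptions, only the argument names come from the signature above):

>>> checkpoint_callback = Checkpoint(
...     save_dir="./checkpoints",
...     name_prefix="model-",
...     save_interval=1,
...     interval_by="epoch",
...     save_on_train_end=True,
... )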

class hat.callbacks.CosLrUpdater(max_steps: int = - 1, stop_lr: float = 0.0, warmup_by: Optional[str] = 'step', warmup_len: Optional[int] = 0, warmup_mode: Optional[str] = 'linear', warmup_begin_lr: Optional[float] = 0.0, warmup_lr_ratio: Optional[float] = 1.0, step_log_interval: Optional[int] = 1)

Lr Updater Callback for adjusting lr with warmup and cos decay.

Parameters
  • max_steps – The number of formal training steps. If it is None, max_steps = num_epochs * self.step_per_epoch - self.warmup_steps.

  • stop_lr – The lr of the last epoch/step.
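
Example (a minimal, illustrative sketch; the values are assumptions):

>>> cos_lr_callback = CosLrUpdater(
...     max_steps=100000,
...     stop_lr=1e-6,
...     warmup_by="step",
...     warmup_len=1000,
... )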

get_lr(begin_lr: float, num_update: int)

Calculate formal training lr for each step or epoch.

Parameters
  • begin_lr – Beginning lr of formal training, commonly equal to optimizer’s initial lr.

  • num_update – Current num of lr updates.

on_loop_begin(optimizer, data_loader, num_epochs, **kwargs)

Prepare some vars for lr updater.

class hat.callbacks.CosineAnnealingLrUpdater(stop_lr: float = 0.0, warmup_by: Optional[str] = 'step', warmup_len: Optional[int] = 0, warmup_mode: Optional[str] = 'linear', warmup_begin_lr: Optional[float] = 0.0, warmup_lr_ratio: Optional[float] = 1.0, step_log_interval: Optional[int] = 1)

Lr Updater Callback for adjusting lr with warmup and cos decay by epoch.

Parameters
  • stop_lr – The lr of the last epoch.

  • warmup_by – During warmup training, update lr at 'step' begin or at 'epoch' begin, similar to update_by. Default 'step'.

  • warmup_len – Num of warmup steps or epochs. If warmup_by=='step', it means warmup steps. If warmup_by=='epoch', it means warmup epochs.

  • warmup_mode – Type of warmup used. It can be 'constant' or 'linear'.

  • warmup_begin_lr – Beginning lr used to calculate warmup lr. If warmup_mode is 'constant', no-op.

  • warmup_lr_ratio – Used to calculate the warmup ending lr through two steps: (1) Get the beginning lr (init_lr) of formal training from the optimizer. (2) warmup_end_lr = init_lr * warmup_lr_ratio.

  • step_log_interval – lr logging interval at step begin; only works when warmup_by == 'step'. If warmup_by == 'epoch', lr is logged at the beginning of each epoch.
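
Example (a minimal, illustrative sketch; the values are assumptions):

>>> cosine_lr_callback = CosineAnnealingLrUpdater(
...     stop_lr=0.0,
...     warmup_by="step",
...     warmup_len=500,
...     warmup_mode="linear",
... )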

get_lr(begin_lr: float, num_update: int)

Calculate new lr after warmup according to the decay epoch.

Parameters
  • begin_lr – Beginning lr of formal training, commonly equal to optimizer’s initial lr.

  • num_update – Current epochs of lr updates.

Returns

learning rate.

Return type

lr

on_loop_begin(optimizer, data_loader, num_epochs, **kwargs)

Prepare some vars for lr updater.

class hat.callbacks.CyclicLrUpdater(target_ratio: Tuple[float] = (10, 0.0001), cyclic_times: Optional[int] = 1, step_ratio_up: Optional[float] = 0.4, step_log_interval: Optional[int] = 1)

Lr Updater Callback for adjusting lr with a cyclic (OneCycle-style) schedule.

Parameters
  • target_ratio – Relative ratio of the highest LR and the lowest LR to the initial LR.

  • cyclic_times – Number of cycles during training.

  • step_ratio_up – The ratio of the increasing process of LR in the total cycle.
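
Example (a minimal, illustrative sketch; the values are assumptions):

>>> cyclic_lr_callback = CyclicLrUpdater(
...     target_ratio=(10, 1e-4),
...     cyclic_times=1,
...     step_ratio_up=0.4,
... )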

get_lr(begin_lr: float, num_update: int)

Calculate formal training lr for each step or epoch.

Parameters
  • begin_lr – Beginning lr of formal training, commonly equal to optimizer’s initial lr.

  • num_update – Current num of lr updates.

on_loop_begin(optimizer, data_loader, num_epochs, **kwargs)

Prepare some vars for lr updater.

class hat.callbacks.ExponentialMovingAverage(decay: Optional[float] = 0.9999, decay_base: Optional[float] = 2000)
class hat.callbacks.FreezeBNStatistics

Freeze BatchNorm module every epoch while training.

It differs from FreezeModule with step_or_epoch=[0] and only_batchnorm=True when resuming training.

class hat.callbacks.FreezeModule(modules: List[List[str]], step_or_epoch: List[int], update_by: str, only_batchnorm: bool = False)

Freeze module parameter while training. Useful in finetune case.

Parameters
  • modules – Sub-model names.

  • step_or_epoch – When to freeze each module; same length as modules.

  • update_by – Freeze by step or by epoch.

  • only_batchnorm – Only freeze batchnorm, with valid gradient. Default is False.

Example

>>> freeze_module_callback = FreezeModule(
...    modules=[['backbone'], ['neck']],
...    step_or_epoch=[10000, 15000],
...    update_by='step',
...    only_batchnorm=True,
... )
process(model, name)

Freeze module inplace.

class hat.callbacks.FuseBN(modules: List[List[str]], step_or_epoch: List[int], update_by: str, inplace: bool = False)

Fuse batchnorm layer in float training.

Usually batchnorm is fused in QAT, but sometimes you can do it in float training.

Parameters
  • modules – Sub-model names in which to fuse bn.

  • step_or_epoch – When to fuse bn; same length as modules.

  • update_by – Fuse by step or by epoch.

  • inplace – Whether to fuse bn inplace.

Note

Only Conv+BN inside nn.Sequential or nn.ModuleList can be merged.

Example

>>> fuse_bn_callback = FuseBN(
...    modules=[['backbone'], ['neck']],
...    step_or_epoch=[10000, 15000],
...    update_by='step',
... )
on_loop_begin(model, **kwargs)

Check module name.

process(model, name, inplace=False)

Fuse bn inplace.

class hat.callbacks.GradScale(module_and_scale: List, clip_grad_norm: Optional[float] = None, clip_norm_type: Optional[int] = 2)

Set gradient scales for different modules.

When training multiple tasks, the gradient of each task may differ. Compared with changing loss weights, adjusting gradients is another, more efficient approach.

Example

>>> grad_scale_callback = dict(
...    type="GradScale",
...    module_and_scale=[
...        ("backbone", 0.1, "real3d_fov120"),
...        ("bifpn", 0.1, "real3d_fov120"),
...        ("real3d_fov120", 1.0, "real3d_fov120"),
...    ],
...    clip_grad_norm=None,
...)
Parameters
  • module_and_scale – Module name, gradient scale and task name. The task name can be omitted if not needed.

  • clip_grad_norm – Max norm for torch.nn.utils.clip_grad_norm_.

  • clip_norm_type – Norm type for torch.nn.utils.clip_grad_norm_.

on_backward_end(model, batch, optimizer, **kwargs)

Task-wise backward_end.

class hat.callbacks.MeanTeacherValidation(inference_on='student', **kwargs)
class hat.callbacks.MetricUpdater(metric_update_func: Callable, metrics: Optional[Sequence] = None, filter_condition: Optional[Callable] = None, step_log_freq: Optional[int] = 1, reset_metrics_by: Optional[str] = 'epoch', epoch_log_freq: Optional[int] = 1, log_prefix: Optional[str] = '', step_storage_freq: Optional[int] = - 1, epoch_storage_freq: Optional[int] = - 1, storage_key: Optional[str] = 'monitor_obj')

Callback used to reset, update and log metrics.

Parameters
  • metric_update_func – Function taking metrics, batch and model_outs as inputs; it filters out labels and predictions, then updates the corresponding metrics.

  • metrics – Metric configs or metric instances, for multi-task.

  • filter_condition – Function to filter current task metric inputs on batch end, including model_outs and batch. Useful in multitask training.

  • step_log_freq – Logging every step_log_freq steps. If < 1, disable step log output.

  • reset_metrics_by – When metrics are reset during training; can be one of 'step', 'log' or 'epoch'.

  • epoch_log_freq – Logging every epoch_log_freq epochs. This argument works only when reset_metrics_by == 'epoch'.

  • log_prefix – Logging info prefix.

  • step_storage_freq – Storage every step_storage_freq steps. If < 1, disable step storage.

  • epoch_storage_freq – Storage every epoch_storage_freq epochs. If < 1, disable epoch storage.

  • storage_key – Key name used by the monitor for storage. Default: monitor_obj.
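
Example (a minimal, illustrative sketch; update_loss and the metric config are hypothetical, only the MetricUpdater arguments come from the signature above):

>>> def update_loss(metrics, batch, model_outs):
...     # hypothetical update function: feed model outputs into every metric
...     for metric in metrics:
...         metric.update(model_outs)
>>> metric_updater = MetricUpdater(
...     metric_update_func=update_loss,
...     metrics=[dict(type="LossShow")],  # hypothetical metric config
...     step_log_freq=50,
...     reset_metrics_by="epoch",
...     log_prefix="train",
... )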

on_batch_end(batch, model_outs, train_metrics, **kwargs)

There may be multiple batches in a multitask training step.

class hat.callbacks.NoamLrUpdater(d_model: Union[int, float] = 256, warmup_step: Union[int, float] = 4000, update_by: Optional[str] = 'step', step_log_interval=1)

Noam LR Updater.

NoamLrUpdater is the learning rate scheduler commonly used in Transformer training, introduced in Attention Is All You Need (https://arxiv.org/pdf/1706.03762.pdf). The learning rate is updated as follows:

\[lr = lr_{base} \cdot d_{model}^{-0.5} \cdot \min\left(num\_update^{-0.5},\ num\_update \cdot warmup\_step^{-1.5}\right)\]

Note

The warmup_step of NoamLrUpdater is used slightly differently from warmup_by, so a different parameter name is used to avoid confusion.

Parameters
  • d_model – Dimension of the model, as used in the formula above. Default is 256.

  • warmup_step – Number of learning rate warmup steps. Default is 4000.

  • update_by – Whether to update the learning rate by 'step' or by 'epoch'. Default is 'step'.

  • step_log_interval – Log the learning rate every step_log_interval updates of num_update. Default is 1.
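
Example (a minimal, illustrative sketch; the values are assumptions):

>>> noam_lr_callback = NoamLrUpdater(
...     d_model=512,
...     warmup_step=4000,
...     update_by="step",
... )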

get_lr(begin_lr: float, num_update: int)

Calculate formal training lr for each step or epoch.

Parameters
  • begin_lr – Beginning lr of formal training, commonly equal to optimizer’s initial lr.

  • num_update – Current num of lr updates.

get_warmup_lr(warmup_end_lr: float, num_update: int)

get_warmup_lr.

In NoamLrUpdater, get_warmup_lr has been absorbed into the get_lr computation; it is implemented here only to avoid exceptions from certain calls.

class hat.callbacks.OneCycleUpdater(lr_max: float = 0.001, div_factor: float = 10.0, pct_start: float = 0.4, stop_lr: float = 0.0, warmup_by: Optional[str] = 'step', step_log_interval: Optional[int] = 1)

Lr Updater Callback for adjusting lr with OneCycle Updater.

Parameters
  • max_steps – The number of formal training steps. If it is None, max_steps = num_epochs * self.step_per_epoch - self.warmup_steps.

  • stop_lr – The lr of the last epoch/step.
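
Example (a minimal, illustrative sketch; the values are assumptions and the argument names come from the class signature above):

>>> one_cycle_callback = OneCycleUpdater(
...     lr_max=1e-3,
...     div_factor=10.0,
...     pct_start=0.4,
...     stop_lr=0.0,
... )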

static annealing_cos(start, end, pct)

Cosine anneal from start to end as pct goes from 0.0 to 1.0.

get_lr(begin_lr: float, num_update: int)

Calculate formal training lr for each step or epoch.

Parameters
  • begin_lr – Beginning lr of formal training, commonly equal to optimizer’s initial lr.

  • num_update – Current num of lr updates.

get_warmup_lr(warmup_end_lr: float, num_update: int)

Calculate warmup training lr for each step or epoch.

Parameters
  • warmup_end_lr – lr when warmup ending.

  • num_update – Current num of lr updates.

on_loop_begin(optimizer, data_loader, num_epochs, **kwargs)

Prepare some vars for lr updater.

set_warmup_training_lr(optimizer: torch.optim.optimizer.Optimizer, num_update: int)

Calculate current lr then assign to optimizer among warmup training.

Parameters
  • optimizer – Optimizer instance.

  • num_update – Current num of lr updates.

class hat.callbacks.PolyLrUpdater(max_update: int, update_by: Optional[str] = 'step', power: Optional[float] = 1.0, final_lr: Optional[float] = 0.0, warmup_by: Optional[str] = 'step', warmup_mode: Optional[str] = 'linear', warmup_len: Optional[int] = 0, warmup_begin_lr: Optional[float] = 0.0, warmup_lr_ratio: Optional[float] = 1.0, step_log_interval: Optional[int] = 1)

Reduce the learning rate according to a polynomial of given power.

Calculate the new learning rate after warmup by:

if (num_update - self.warmup_steps) < max_update:
    new_lr = final_lr + (begin_lr - final_lr) * (
        1 - (num_update - self.warmup_steps) / max_update
    ) ** power
else:
    new_lr = final_lr
Parameters
  • max_update

    Number of lr updates during formal training. Used to calculate the formal training lr.

    Note

    max_update should not include warmup steps or epochs; they can be specified independently by warmup_len.

  • update_by – During formal training, update lr at 'step' begin or at 'epoch' begin. If equal to 'step', update lr according to 'global_step_id'. If equal to 'epoch', update lr according to 'epoch_id'. Default 'step'.

  • power – Power of the decay term as a function of the current number of updates.

  • final_lr – Final learning rate after all steps.

  • warmup_by – During warmup training, update lr at 'step' begin or at 'epoch' begin, similar to update_by. Default 'step'.

  • warmup_len – Num of warmup steps or epochs. If warmup_by=='step', it means warmup steps. If warmup_by=='epoch', it means warmup epochs.

  • warmup_mode – Type of warmup used. It can be 'constant' or 'linear'.

  • warmup_begin_lr – Beginning lr used to calculate warmup lr. If warmup_mode is 'constant', no-op.

  • warmup_lr_ratio – Used to calculate the warmup ending lr through two steps: (1) Get the beginning lr (init_lr) of formal training from the optimizer. (2) warmup_end_lr = init_lr * warmup_lr_ratio.

  • step_log_interval – lr logging interval at step begin; only works when warmup_by == 'step'. If warmup_by == 'epoch', lr is logged at the beginning of each epoch.
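
Example (a minimal, illustrative sketch; the values are assumptions):

>>> poly_lr_callback = PolyLrUpdater(
...     max_update=60000,
...     update_by="step",
...     power=1.0,
...     final_lr=0.0,
...     warmup_by="step",
...     warmup_len=1000,
... )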

get_lr(begin_lr: float, num_update: int)

Calculate new lr after warmup according to a polynomial.

Parameters
  • begin_lr – Beginning lr of formal training, commonly equal to optimizer’s initial lr.

  • num_update – Current num of lr updates.

class hat.callbacks.SaveTraced(save_dir: str, trace_inputs: Union[Tuple, Dict], name_prefix: str = '', allow_anno_miss: bool = True, save_hash: bool = True, forward_before_trace: bool = False)

SaveTraced is used to trace a model and save it to a file.

Parameters
  • save_dir – Directory to save traced model.

  • trace_inputs – Example inputs for tracing.

  • name_prefix – name prefix of saved model.

  • allow_anno_miss – Whether to allow annotation attributes to be missing in the outputs of the traced model.

  • save_hash – Whether to save the hash value to the name of the pt file. Default is True.

  • forward_before_trace – Whether to run a forward pass before tracing. It can check the model, trigger some processes and so on.
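
Example (a minimal, illustrative sketch; the input shape is an assumption):

>>> import torch
>>> save_traced_callback = SaveTraced(
...     save_dir="./traced_models",
...     trace_inputs=(torch.randn(1, 3, 512, 512),),
...     name_prefix="model-",
... )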

class hat.callbacks.StatsMonitor(log_freq=200, batch_size=None)

StatsMonitor Callback is used to monitor training statistics such as epoch time and batch time.

Parameters
  • log_freq – Frequency at which the monitor outputs to the log.

  • batch_size – Manually set the batch size when the dataloader's batch size is not the actual one.
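
Example (a minimal, illustrative sketch; the values are assumptions):

>>> stats_monitor = StatsMonitor(
...     log_freq=200,
...     batch_size=32,  # set only when the dataloader's batch size is not the real one
... )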

class hat.callbacks.StepDecayLrUpdater(update_by: Optional[str] = 'epoch', warmup_by: Optional[str] = 'epoch', warmup_len: Optional[int] = 0, warmup_mode: Optional[str] = 'linear', warmup_begin_lr: Optional[float] = 0.0, warmup_lr_ratio: Optional[float] = 1.0, step_log_interval: Optional[int] = 1, lr_decay_id: Optional[list] = None, lr_decay_factor: float = 0.1)

Lr Updater Callback for adjusting lr with warmup and decay.

Parameters
  • lr_decay_id (List(int)) – The list of epoch (or step) ids at which to decay the lr after warmup.

  • lr_decay_factor (float) – Factor for lr decay.
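
Example (a minimal, illustrative sketch; the values are assumptions):

>>> step_decay_callback = StepDecayLrUpdater(
...     update_by="epoch",
...     warmup_by="epoch",
...     warmup_len=1,
...     lr_decay_id=[20, 30],
...     lr_decay_factor=0.1,
... )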

get_lr(begin_lr: float, num_update: int)

Calculate new lr after warmup according to the decay epoch list lr_decay_id.

Parameters
  • begin_lr – Beginning lr of formal training, commonly equal to optimizer’s initial lr.

  • num_update – Current epochs or steps of lr updates.

Returns

learning rate

Return type

lr

class hat.callbacks.TensorBoard(save_dir: str, overwrite: bool = False, loss_name_reg: str = '^.*loss.*', update_freq: int = 1, update_by: str = 'step', tb_update_funcs: Optional[Sequence[Callable]] = None, **tb_writer_kwargs)

TensorBoard Callback is used for recording information during training, such as losses, images and other visualizations.

Parameters
  • save_dir (str) – Directory to save tensorboard.

  • overwrite (bool) – Whether to overwrite an existing save_dir.

  • loss_name_reg (str) – Specific loss pattern.

  • update_freq (int) – Frequency at which tensorboard outputs to file.

  • update_by – Set update_freq unit to step or epoch. Default is step.

  • tb_update_funcs (list(callable)) – List of functions that take writer, model_outs and step_id as inputs and update tensorboard.
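
Example (a minimal, illustrative sketch; the values are assumptions):

>>> tensorboard_callback = TensorBoard(
...     save_dir="./tb_logs",
...     overwrite=False,
...     update_freq=1,
...     update_by="step",
... )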

on_batch_end(model_outs=None, global_step_id=None, **kwargs)

There may be multiple batches in a multitask training step.

class hat.callbacks.Validation(data_loader: Union[Sequence[Iterable], Iterable], batch_processor: Callable, callbacks: Sequence[Union[dict, hat.callbacks.callbacks.CallbackMixin]], val_interval: Optional[int] = 1, interval_by: Optional[str] = 'epoch', val_model: Optional[Union[Dict, torch.nn.modules.module.Module]] = None, model_convert_pipeline: Optional[Union[Dict, List]] = None, init_with_train_model: Optional[bool] = True, strict_match: Optional[bool] = False, val_on_train_end: Optional[bool] = True, profiler: Optional[dict] = None, share_callbacks: bool = True, log_interval: Optional[int] = 10)

Callbacks of validation.

Validation Callback does the following: (1) uses the train model to forward batches from the validation data loader; (2) uses callbacks like MetricUpdater to record validation results. Before setting the arguments below, consider two questions: (1) whether to use the train model as the val model or provide one via val_model; (2) whether to init the val model with the train model or init it yourself.

Parameters
  • data_loader – Validation data loader.

  • batch_processor – Validation batch processor (not need_grad_update).

  • callbacks – Callbacks to run during validation; commonly it should contain callbacks such as MetricUpdater, which is used to reset, update and log validation metrics. If it is an empty list, no-op.

  • val_interval – Validation interval.

  • interval_by – Set val_interval unit to step or epoch. Default is epoch.

  • val_model – Model used for validation. If None, use train model as val model.

  • model_convert_pipeline – Defines the model conversion process, e.g. converting a float model to a qat model, or a qat model to a quantized model.

  • init_with_train_model – Whether init val model with train model. If val_model is None, no-op.

  • strict_match – Whether to strictly enforce that the keys in model.state_dict() (train model) match the keys in val_model.state_dict(). Default: False

  • val_on_train_end – Whether do validation when on_loop_end is triggered. Default is True.

  • profiler – To profile individual steps during validation and assist in identifying bottlenecks.

  • share_callbacks – Whether to share callbacks across different dataloaders.

  • log_interval – Logging output frequency.
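
Example (a minimal, illustrative sketch; val_data_loader, val_batch_processor and val_metric_updater are hypothetical objects built elsewhere in the config):

>>> validation_callback = Validation(
...     data_loader=val_data_loader,
...     batch_processor=val_batch_processor,
...     callbacks=[val_metric_updater],
...     val_interval=1,
...     interval_by="epoch",
...     val_on_train_end=True,
... )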