10.1.6.6. profiler

Profilers are widely used for performance analysis in HAT.

10.1.6.6.1. profiler

profilers.PassThroughProfiler

This class should be used when you don't want the (small) overhead of profiling.

profilers.SimpleProfiler

This profiler simply records the duration of actions (in seconds) and reports the mean duration of each action and the total time spent over the entire training run.

memory_profiler.GPUMemoryProfiler

memory_profiler.CPUMemoryProfiler

memory_profiler.StageCPUMemoryProfiler

model_profiler.BaseModelProfiler

Base class for defining the process of model analysis.

model_profiler.FeaturemapSimilarity

Compute the similarity of two models.

model_profiler.ProfileFeaturemap

Profile featuremap value with log or tensorboard.

model_profiler.CheckShared

Check whether the model has shared ops.

model_profiler.CheckFused

Check whether the model has unfused ops.

model_profiler.CompareWeights

Compare weights of float/qat/quantized models.

model_profiler.CheckDeployDevice

Check the deploy device (BPU or CPU) of a hybrid model.

model_profiler.ModelProfilerv2

Run the model and save info for each op.

model_profiler.HbirModelProfiler

Run an hbir model and save info for each op.

10.1.6.6.2. API Reference

This file is modified from pytorch-lightning. Profilers help you check whether there are any bottlenecks in your code.

class hat.profiler.profilers.PassThroughProfiler(dirpath: Optional[Union[str, pathlib.Path]] = None, filename: Optional[str] = None, auto_discribe: bool = False)

This class should be used when you don’t want the (small) overhead of profiling. The Trainer uses this class by default.

start(action_name: str) → None

Define how to start recording an action.

stop(action_name: str) → None

Define how to record the duration once an action is complete.

summary() → str

Create profiler summary in text format.

class hat.profiler.profilers.SimpleProfiler(dirpath: Optional[Union[str, pathlib.Path]] = None, filename: Optional[str] = None, warmup_step: int = 1, use_real_duration: bool = False, auto_discribe: bool = True)

This profiler simply records the duration of actions (in seconds) and reports the mean duration of each action and the total time spent over the entire training run.

start(action_name: str) → None

Define how to start recording an action.

stop(action_name: str) → None

Define how to record the duration once an action is complete.

summary() → str

Create profiler summary in text format.
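The start/stop/summary contract documented above can be illustrated with a minimal, stdlib-only stand-in. MiniSimpleProfiler below is hypothetical and not part of hat; the real SimpleProfiler additionally supports warmup_step, file output, and other options:

```python
import time
from collections import defaultdict


class MiniSimpleProfiler:
    """Hypothetical stand-in that records action durations in seconds."""

    def __init__(self):
        self._starts = {}
        self._durations = defaultdict(list)

    def start(self, action_name: str) -> None:
        # Remember when the action began.
        self._starts[action_name] = time.perf_counter()

    def stop(self, action_name: str) -> None:
        # Record the elapsed time once the action is complete.
        begin = self._starts.pop(action_name)
        self._durations[action_name].append(time.perf_counter() - begin)

    def summary(self) -> str:
        # Report mean duration per action and total time, as SimpleProfiler does.
        lines = []
        for name, durs in self._durations.items():
            lines.append(
                f"{name}: mean={sum(durs) / len(durs):.6f}s total={sum(durs):.6f}s"
            )
        return "\n".join(lines)


profiler = MiniSimpleProfiler()
profiler.start("train_step")
time.sleep(0.01)  # stand-in for the real training-step code
profiler.stop("train_step")
print(profiler.summary())
```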

Memory Profiling.

Helps profile GPU or CPU memory bottlenecks during model training.

class hat.profiler.memory_profiler.CPUMemoryProfiler(dirpath: Optional[Union[str, pathlib.Path]] = None, filename: Optional[str] = None, auto_discribe: bool = False)
start(action_name: str) → None

Define how to start recording an action.

stop(action_name: str) → None

Define how to record the duration once an action is complete.

summary()

Create profiler summary in text format.

class hat.profiler.memory_profiler.GPUMemoryProfiler(dirpath: Optional[Union[str, pathlib.Path]] = None, filename: Optional[str] = None, record_snapshot: bool = False, snapshot_interval: int = 1, record_functions: Optional[Set[str]] = None, auto_discribe: bool = False)
save_snapshots()

Dump all snapshots.

snapshots format (dict of dicts):

{
    step_id: {
        action_name: […],
    },
}

start(action_name: str) → None

Define how to start recording an action.

stop(action_name: str) → None

Define how to record the duration once an action is complete.

summary()

Create profiler summary in text format.

class hat.profiler.memory_profiler.StageCPUMemoryProfiler(**kwargs)
describe() → None

Log a profile report after the run concludes.

profile(action_name: str)

Yield a context manager to encapsulate the scope of a profiled action.

Example:

with self.profile('load training data'):
    # load training data code

The profiler will start once you’ve entered the context and will automatically stop once you exit the code block.
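The enter/exit behavior described above can be sketched with contextlib. SketchProfiler below is a hypothetical toy, not the hat implementation; it only shows how a profile() context manager can wire up the documented start/stop semantics:

```python
import time
from contextlib import contextmanager


class SketchProfiler:
    """Hypothetical profiler: profile() starts on enter, stops on exit."""

    def __init__(self):
        self.records = {}

    def start(self, action_name: str) -> None:
        self.records[action_name] = time.perf_counter()

    def stop(self, action_name: str) -> None:
        # Replace the start timestamp with the elapsed duration.
        self.records[action_name] = time.perf_counter() - self.records[action_name]

    @contextmanager
    def profile(self, action_name: str):
        # The profiler starts once you enter the context...
        self.start(action_name)
        try:
            yield
        finally:
            # ...and stops automatically once you exit the code block.
            self.stop(action_name)


prof = SketchProfiler()
with prof.profile("load training data"):
    time.sleep(0.005)  # stand-in for the real data-loading code
print(prof.records["load training data"])
```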

class hat.profiler.model_profiler.BaseModelProfiler(**kwargs)

Base class for defining the process of model analysis.

class hat.profiler.model_profiler.CheckDeployDevice(print_tabulate: bool = True, out_dir: Optional[str] = None)

Check the deploy device (BPU or CPU) of a hybrid model.

Parameters
  • print_tabulate (bool, optional) – Whether print the result as tabulate. Defaults to True.

  • out_dir – path to save the result txt ‘deploy_device.txt’. If None, will save in the current directory. Default: None

Returns

A dict of model deploy infos with schema
  • KEY (str): module name

  • VALUE (Tuple): (deploy device(BPU or CPU), module type)

class hat.profiler.model_profiler.CheckFused(print_tabulate: bool = True)

Check whether the model has unfused ops.

Check unfused modules in a model. NOTE: this function can only detect unfused modules. To verify the correctness of a fusion, use featuremap_similarity to compare features between the fused and unfused models.

Parameters

print_tabulate (bool) – Whether print the result as tabulate. Default: True.

Returns

The qualified names of modules that can be fused.

Return type

List[List[str]]

class hat.profiler.model_profiler.CheckShared(check_leaf_module: Optional[callable] = None, print_tabulate: bool = True)

Check whether the model has shared ops.

Count called times for all leaf modules in a model.

Parameters
  • check_leaf_module (callable, optional) – A function to check if a module is leaf. Pass None to use pre-defined is_leaf_module. Default: None.

  • print_tabulate (bool, optional) – Whether print the result as tabulate. Default: True.

Returns

The qualified name and call count of each leaf module.

Return type

Dict[str, int]
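The call-counting idea behind CheckShared can be illustrated with a stdlib-only sketch. In hat the counting runs over a model's leaf modules; CallCounter below is a hypothetical analogue that wraps plain callables instead:

```python
from collections import Counter


class CallCounter:
    """Hypothetical sketch: count how often each named 'leaf module'
    (here a plain callable) is invoked, mimicking CheckShared's
    Dict[str, int] result."""

    def __init__(self):
        self.counts = Counter()

    def wrap(self, qualified_name, fn):
        def wrapped(*args, **kwargs):
            # Increment the per-module call count on every invocation.
            self.counts[qualified_name] += 1
            return fn(*args, **kwargs)
        return wrapped


counter = CallCounter()
conv = counter.wrap("backbone.conv1", lambda x: x * 2)

# Calling the same wrapped module twice flags it as shared (count > 1).
conv(1)
conv(2)
shared = {name: n for name, n in counter.counts.items() if n > 1}
print(shared)
```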

class hat.profiler.model_profiler.CompareWeights(similarity_func='Cosine', with_tensorboard: bool = False, tensorboard_dir: Optional[str] = None, out_dir: Optional[str] = None)

Compare weights of float/qat/quantized models.

This function compares the weights of each layer based on torch.quantization._numeric_suite.compare_weights. The weight similarity and atol are printed on the screen and saved in “weight_comparison.txt”. If you want to see histograms of the weights, set with_tensorboard=True.

Parameters
  • similarity_func – similarity computation function. Support “Cosine”, “MSE”, “L1”, “KL”, “SQNR” or any user-defined Callable object. If it is a user-defined object, it should return a scalar or tensor with only one number. Otherwise the result shown may be unexpected. Default: “Cosine”

  • with_tensorboard – whether to use tensorboard. Default: False

  • tensorboard_dir – tensorboard log file path. Default: None

  • out_dir – path to save the result txt and picture. If None, will save in the current directory. Default: None

Returns

A weight comparison dict with schema
  • KEY (str): module name (e.g. layer1.0.conv.weight)

  • VALUE (dict): a dict of the corresponding weights in the two models: “float” is the weight value in the float model, “quantized” is the weight value in the qat/quantized model

Also returns a list of lists, where each inner list is one layer's weight similarity in the format [module name, similarity, atol (N scale)]

class hat.profiler.model_profiler.FeaturemapSimilarity(similarity_func: Union[str, callable] = 'Cosine', threshold: Optional[numbers.Real] = None, devices: Optional[Union[torch.device, tuple]] = None, out_dir: Optional[str] = None)

Compute the similarity of two models.

Compute the similarity of feature maps. The input models can be float/ fused/calibration/qat/quantized model.

Parameters
  • similarity_func – similarity computation function. Support “Cosine”, “MSE”, “L1”, “KL”, “SQNR”, or any user-defined Callable object. If it is a user-defined object, it should return a scalar or tensor with only one number. Otherwise the result shown may be unexpected. Default: “Cosine”

  • threshold – if the similarity value exceeds or falls below this threshold, the featuremap name will be shown in red. If threshold is None, it is set to a different value depending on the similarity function. Default: None

  • devices – the devices (CPU, GPU) on which to run the models. It can be: None, in which case the models run with the given inputs unchanged; a torch.device, in which case both models and the given inputs are moved to that device; or a tuple of two torch.devices, in which case the two models are moved to the specified devices separately, e.g. to compare the difference between CPU and GPU results.

  • out_dir – path to save the result txt and picture. If None, will save in the current directory. Default: None

Returns

A list of lists. Each inner list is one layer's similarity info in the format [index, module name, module type, similarity, scale, atol, atol(N scale), single op error(N scale)]
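Assuming the default “Cosine” similarity_func behaves like a standard cosine similarity over flattened feature maps (a sketch of the metric, not hat's exact implementation):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened feature maps.
    A value near 1.0 means the feature maps are highly similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# A quantized feature map that differs only by small rounding errors
# stays close to 1.0.
float_feat = [0.10, -0.52, 0.33, 0.87]
quant_feat = [0.09, -0.50, 0.34, 0.88]
print(cosine_similarity(float_feat, quant_feat))
```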

class hat.profiler.model_profiler.HbirModelProfiler(show_table: bool = True, show_tensorboard: bool = False, prefixes: Optional[Tuple[str, ...]] = None, types: Optional[Tuple[Type, ...]] = None, with_stack: bool = False, force_per_channel: bool = False, out_dir: Optional[str] = None)

Run an hbir model and save info for each op.

This function runs an hbir model and saves each op's info to disk; the info can be shown in a table or in tensorboard.

Parameters
  • show_table – whether to show each op's info in a table; the table is also saved in statistic.txt

  • show_tensorboard – whether to show each op's histogram in tensorboard.

  • prefixes – only show ops whose qualified names start with the given prefixes

  • types – only show ops of the given types

  • with_stack – whether to show each op's location in the code

  • force_per_channel – whether to show data per channel in tensorboard

  • out_dir – directory in which to save op infos and result files

class hat.profiler.model_profiler.ModelProfilerv2(show_table: bool = True, show_tensorboard: bool = False, prefixes: Optional[Tuple[str, ...]] = None, types: Optional[Tuple[Type, ...]] = None, with_stack: bool = False, force_per_channel: bool = False, out_dir: Optional[str] = None)

Run the model and save info for each op.

This function runs the model and saves each op's info to disk; the info can be shown in a table or in tensorboard.

Parameters
  • show_table – whether to show each op's info in a table; the table is also saved in statistic.txt

  • show_tensorboard – whether to show each op's histogram in tensorboard.

  • prefixes – only show ops whose qualified names start with the given prefixes

  • types – only show ops of the given types

  • with_stack – whether to show each op's location in the code

  • force_per_channel – whether to show data per channel in tensorboard

  • out_dir – directory in which to save op infos and result files

class hat.profiler.model_profiler.ProfileFeaturemap(prefixes: Tuple = (), types: Tuple = (), device: Optional[torch.device] = None, preserve_int: bool = False, use_class_name: bool = False, skip_identity: bool = False, with_tensorboard: bool = False, tensorboard_dir: Optional[str] = None, print_per_channel_scale: bool = False, show_per_channel: bool = False, out_dir: Optional[str] = None, file_name: Optional[str] = None, profile_func: Optional[callable] = None)

Profile featuremap value with log or tensorboard.

Print min/max/mean/var/scale of each feature profiled by get_raw_features by default. If with_tensorboard is set to True, the histogram of each feature is shown in tensorboard, which is useful for inspecting the data distribution.

If you want to get more info about features, you can define your custom profile functions to process the results of get_raw_features.

Parameters
  • prefixes – get feature info by the prefix of the qualified name. Default: tuple().

  • types – get features info by module type. Default: tuple().

  • device – model run on which device. Default: None

  • preserve_int – if True, record each op result in int type. Default: False

  • use_class_name – if True, record class name not class type. Default: False

  • skip_identity – if True, will not record the result of Identity module. Default: False

  • with_tensorboard – whether to use tensorboard. Default: False

  • tensorboard_dir – tensorboard log file path. Default: None

  • print_per_channel_scale – whether to print per channel scales. Default: False

  • show_per_channel – show each featuremap in per channel ways in tensorboard. Default: False

  • out_dir – path to save the result txt and picture. If None, will save in the current directory. Default: None

  • file_name – result file name. If None, will save result and fig with name ‘statistic’.(statistic.txt and statistic.html). Default: None

  • profile_func (callable, None) – your custom featuremap profiler function. Default: None

Returns

A list of lists. Each inner list is one layer's statistics in the format [index, module name, module type, attr, min, max, mean, var, scale]
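The per-layer statistics row can be sketched with the stdlib. feature_statistics below is a hypothetical helper that only mirrors the documented row format; the real profiler derives attr and scale from the module itself:

```python
from statistics import mean, pvariance


def feature_statistics(index, module_name, module_type, values, scale=None):
    """Hypothetical sketch: build one statistics row in the documented
    format [index, module name, module type, attr, min, max, mean, var,
    scale]. 'attr' is fixed to "output" here purely for illustration."""
    return [
        index,
        module_name,
        module_type,
        "output",
        min(values),
        max(values),
        mean(values),
        pvariance(values),  # population variance of the feature values
        scale,
    ]


row = feature_statistics(0, "backbone.conv1", "Conv2d", [0.1, 0.4, -0.2, 0.3], scale=0.05)
print(row)
```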