7.7.2. Supported Public Operators

import horizon_plugin_pytorch as horizon

7.7.2.1. General Notes

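Unless otherwise noted, the Bernoulli2 architecture restricts operator inputs and outputs to 4 dimensions.
In eager mode, some operators must be replaced manually; fx mode requires no manual operator replacement.
The operators listed below are described without operator fusion by default. For operators that can be fused (such as ((conv, bn), relu)), see the operator fusion section.
Pass-through operators at inference time (for example Identity and Dropout) are optimized away at deployment.
select is not supported.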
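
As an illustration of the manual replacement that eager mode calls for, the sketch below routes an addition, a scalar addition, and a concatenation through PyTorch's own `FloatFunctional`. The module name `AddCatBlock` and the tensor shapes are hypothetical, and the sketch runs in plain float mode only; it does not exercise the Horizon plugin or any QAT flow. Using one `FloatFunctional` instance per call site is a common PyTorch convention and is assumed here rather than taken from this document.

```python
import torch
from torch import nn

try:
    # Path named in the operator table below.
    from torch.nn.quantized import FloatFunctional
except ImportError:
    # Newer PyTorch releases host the same class here.
    from torch.ao.nn.quantized import FloatFunctional


class AddCatBlock(nn.Module):
    """Hypothetical block: eager-mode stand-ins for torch.add and torch.cat."""

    def __init__(self):
        super().__init__()
        # Assumption: one FloatFunctional per call site, so each operation
        # can later carry its own quantization parameters.
        self.add = FloatFunctional()
        self.add_scalar = FloatFunctional()
        self.cat = FloatFunctional()

    def forward(self, x, y):
        s = self.add.add(x, y)                  # instead of torch.add(x, y)
        s = self.add_scalar.add_scalar(s, 1.0)  # scalar operand goes through add_scalar
        return self.cat.cat([s, y], dim=1)      # instead of torch.cat([s, y], dim=1)


block = AddCatBlock()
x = torch.zeros(1, 3, 4, 4)  # 4-D [N, C, H, W] input
y = torch.ones(1, 3, 4, 4)
out = block(x, y)
print(tuple(out.shape))
```

In fx mode this rewrite would not be needed, since the notes above state that fx mode requires no manual replacement.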

7.7.2.2. torch Functions

Operator	Eager-mode replacement	Bernoulli2			Bayes/Bayes-e
		Input	Output	Other constraints	Input	Output	Other constraints
torch.abs		Not supported			qint8, qint16	Same as input
torch.acos	horizon.nn.Acos	Not supported			qint8, qint16	qint8, qint16	Implemented with lookup tables; accuracy risk
torch.acosh	horizon.nn.Acosh	Not supported			See torch.acos
torch.add	torch.nn.quantized.FloatFunctional or horizon.nn.quantized.FloatFunctional	qint8, qint16	qint8, qint16	in_channel<=2048; constant operands are not supported	qint8, qint16	qint8, qint16	Broadcasting is supported on every dimension except N, and only one input may be broadcast. If one operand is a scalar, call add_scalar
torch.argmax		See torch.max			See torch.max
torch.argmin		See torch.max			See torch.max
torch.asin	horizon.nn.Asin	Not supported			See torch.acos
torch.asinh	horizon.nn.Asinh	Not supported			See torch.acos
torch.atan	horizon.nn.Atan	Not supported			See torch.acos
torch.atanh	horizon.nn.Atanh	Not supported			See torch.acos
torch.cat	torch.nn.quantized.FloatFunctional or horizon.nn.quantized.FloatFunctional	qint8, qint16	qint8, qint16		qint8, qint16	qint8, qint16	input shape: [N, C, H, W], N<=4096, HWC<=65536, 2<=input number<=1024
torch.ceil	horizon.nn.Ceil	Not supported			qint8, qint16	Same as input	With int8, the input magnitude should not exceed 1e6; with int16, it should not exceed 1e8.
torch.clamp		Not supported			qint8, qint16	Same as input	min and max may each be a Tensor, a constant Tensor, a scalar, or None. When they are constant Tensors, their value range should ideally match that of input; otherwise there is accuracy risk. min and max cannot both be None
torch.clip		Not supported			See torch.clamp
torch.cos	horizon.nn.Cos	Not supported			See torch.acos
torch.cosh	horizon.nn.Cosh	Not supported			See torch.acos
torch.div	horizon.nn.Div	Not supported			qint16	qint16
torch.eq		Not supported			qint8, qint16
torch.erf	horizon.nn.Erf	Not supported			See torch.acos
torch.exp	horizon.nn.Exp	qint8	qint8	Pieced together from lookup tables; accuracy risk	See torch.acos
torch.floor	horizon.nn.Floor	Not supported			qint8, qint16	Same as input	With int8, the input magnitude should not exceed 1e6; with int16, it should not exceed 1e8.
torch.gather		Not supported			qint8, qint16, qint32	Same as input
torch.ge		Not supported			See torch.eq
torch.greater		Not supported			See torch.eq
torch.greater_equal		Not supported			See torch.eq
torch.gt		Not supported			See torch.eq
torch.le		Not supported			See torch.eq
torch.less		Not supported			See torch.eq
torch.less_equal		Not supported			See torch.eq
torch.log	horizon.nn.HardLog	Not supported			See torch.acos
torch.lt		Not supported			See torch.eq
torch.matmul	horizon.nn.quantized.FloatFunctional	qint8	qint8, qint32		input: qint8; other: qint8	qint8, qint16, qint32	input shape: [N, C, H, W], input_size<1 GB, N<=4096, C, H, W<=8192.
torch.max		qint8	Same as input		qint8, qint16	out: qint8, qint16; index: int32	index may only be used as a model output. input_shape: [N, C, H, W], 1<=N<=4096, 1<=H, W, C<=65535. min and max may each be a Tensor, a constant Tensor, a scalar, or None. When they are constant Tensors, their value range should ideally match that of input; otherwise there is accuracy risk. The index output has the same precision as the input and cannot be lower than the input precision; only int8 and int16 are supported
torch.maximum	horizon.nn.quantized.FloatFunctional	Not supported			input: qint8, qint16; other: qint8, qint16	qint8, qint16
torch.mean	horizon.nn.quantized.FloatFunctional	qint8, qint16	qint8, qint16	Only mean along the channel dimension is supported. The QAT operator has trainable parameters; do not use it on its own at inference.	qint8, qint16	qint8, qint16	mean over C, H, and W is supported. The QAT operator has quantization parameters
torch.min		Not supported			See torch.max
torch.minimum	horizon.nn.quantized.FloatFunctional	Not supported			See torch.maximum
torch.mul	torch.nn.quantized.FloatFunctional or horizon.nn.quantized.FloatFunctional	See torch.add			See torch.add
torch.pow	horizon.nn.Pow	Not supported			See torch.acos
torch.reciprocal	horizon.nn.Reciprocal	Not supported			See torch.acos
torch.selu	horizon.nn.SeLU	Not supported			See torch.acos
torch.sin	horizon.nn.Sin	Not supported			See torch.acos
torch.sinh	horizon.nn.Sinh	Not supported			See torch.acos
torch.split		qint8, qint16	Same as input		qint8, qint16	Same as input
torch.sqrt	horizon.nn.Sqrt	Not supported			See torch.acos
torch.sub	horizon.nn.quantized.FloatFunctional	qint8, qint16	qint8, qint16	in_channel<=2048	qint8, qint16	qint8, qint16	Broadcasting is supported on every dimension except N, and only one input may be broadcast. An int second input is not supported
torch.sum	horizon.nn.quantized.FloatFunctional	qint8	qint8, qint32	Only sum along the batch and channel dimensions is supported.	qint8, qint16	qint8, qint16	Only sum over the H, W, and C dimensions is supported
torch.tan	horizon.nn.Tan	Not supported			See torch.acos
torch.topk		Not supported			qint8, qint16, qint32	Same as input
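
Shape limits such as those in the torch.cat row can be checked before compilation. The following is a minimal sketch of such a pre-check for the Bayes/Bayes-e limits on torch.cat (N<=4096, HWC<=65536, 2<=input number<=1024). The function name `check_cat_inputs_bayes` is hypothetical, and reading "HWC" as the product H*W*C is an assumption, since the table does not spell it out.

```python
def check_cat_inputs_bayes(shapes):
    """Return a list of violated torch.cat limits (empty if none).

    `shapes` is a list of (N, C, H, W) tuples, one per input tensor.
    Assumption: "HWC<=65536" in the table means H * W * C <= 65536.
    """
    problems = []
    if not 2 <= len(shapes) <= 1024:
        problems.append(f"input number {len(shapes)} is outside [2, 1024]")
    for i, shape in enumerate(shapes):
        if len(shape) != 4:
            problems.append(f"input {i}: expected [N, C, H, W], got {len(shape)} dims")
            continue
        n, c, h, w = shape
        if n > 4096:
            problems.append(f"input {i}: N={n} exceeds 4096")
        if h * w * c > 65536:
            problems.append(f"input {i}: H*W*C={h * w * c} exceeds 65536")
    return problems


ok = check_cat_inputs_bayes([(1, 16, 32, 32), (1, 16, 32, 32)])
bad = check_cat_inputs_bayes([(1, 128, 32, 32)])
print(ok)
print(bad)
```

The same pattern extends to other rows by swapping in their limits; the compiler's own diagnostics remain the authority on what is accepted.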

7.7.2.3. torch.nn.functional Functions

Operator	Eager-mode replacement	Bernoulli2			Bayes/Bayes-e
		Input	Output	Other constraints	Input	Output	Other constraints
torch.nn.functional.grid_sample		Not supported			input: qint8; grid: qint8, qint16	qint8	Input shape: [N, C, H, W], 1<=H, W<=1024 and H*W<=720*1024; grid supports qint8 and qint16; only bilinear and nearest interpolation are supported; padding mode supports only zeros and border
torch.nn.functional.interpolate		qint8	qint8	Supports nearest and bilinear interpolation modes. 1/256<scale factor<=256	qint8	qint8	Only nearest and bilinear interpolation modes are supported. input_shape: [N, C, H, W], 1<=C, H, W<=8192; align_corners supports False and None; when scale=[], recompute_scale_factor must be True
torch.nn.functional.pad		Not supported			qint8, qint16	Same as input	reflect mode is not supported
torch.nn.functional.relu	torch.nn.ReLU	qint8	qint8		qint8	Same as input	The Conv2d+BN+ReLU pattern is fused automatically
torch.nn.functional.relu6(fused)	torch.nn.ReLU6				qint8	Same as input

7.7.2.4. torch.nn Modules

Operator	Bernoulli2			Bayes/Bayes-e
	Input	Output	Other constraints	Input	Output	Other constraints
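torch.nn.AdaptiveAvgPool2d	Not supported			qint8	Same as input	Approximated with AvgPool2d, which is not equivalent; accuracy issues
torch.nn.AvgPool2d	qint8	Same as input	1<=kernel<=7, 1<=stride<=185			1<=kernel, stride, padding<=256
torch.nn.BatchNorm2d			BatchNorm2d is absorbed during QAT and does not appear in the inference model. Due to compiler limitations, a standalone BatchNorm2d is implemented with BpuConvolution under the hood	qint8	qint8	BatchNorm2d is absorbed during QAT and therefore does not appear in the model. For standalone use, see the Conv2d constraints
torch.nn.BatchNorm3d			BatchNorm3d is absorbed during QAT and does not appear in the inference model. Due to compiler limitations, a standalone BatchNorm3d is implemented with BpuConvolution under the hood	qint8	qint8	BatchNorm3d is absorbed during QAT and therefore does not appear in the model. For standalone use, see the Conv2d constraints
torch.nn.ChannelShuffle	qint8	Same as input		qint8, qint16	Same as input	Values in shuffle_index must not repeat
torch.nn.ConstantPad2d	See torch.nn.ZeroPad2d			See torch.nn.ZeroPad2d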
torch.nn.Conv2d	qint8	qint8, qint32	kernel<=7. channel (one group)<=2048. dilation=(1, 1)/(2, 2)/(4, 4); when dilation!=(1, 1), stride must be (1, 1). HxWxC<=32768	input: qint8, qint16; weight: qint8; bias: qint32	qint8, qint16, qint32	out_channel<=8192; when used as a model output, out_channel<=16384. Input channel<=8192, kernel<32, dilation<=16; when dilation!=1, stride must be 1. sumin is supported; a conv with sumin supports only stride (1, 1) or (2, 2). weight_shape: [N, C, H, W], N, C<=8192, H, W<=31; as a model output, C<=16384; weight_size<65535. padding<=256. With qint16 input, the accumulated sum must not exceed the int32 range
torch.nn.Conv3d	Not supported			input: qint8; weight: qint8; bias: qint32	qint8, qint16, qint32	input: [N, C, D, H, W] int8, N<=128; H, W, D, C<=65536; weight: [C_o, C_i, D, H, W] int8, N, C<=65536, D, H<=9, W<=8191; bias: int32; output: [N, C, D, H, W] int8, int16, int32; stride: [D, H, W], with D, H, W equal to 1 or 2 and all three identical; padding: [D, H, W], D<=kernel_d/2, H<=kernel_h/2, W<=kernel_w/2 (kernel_w is the size of the weight's W dimension); group, dilation: not yet supported
torch.nn.ConvTranspose2d	qint8	qint8	2<=kernel<=14. channel<=2048. padding HW=[0, (kernel_h-1)/2] [0, (kernel_w-1)/2]; 2<=stride<=4; dilation=(1, 1)	qint8	qint8	Input shape: [N, C, H, W], 1<=N<=128, 1<=channel<=2048; weight_shape: [N, C, H, W], 1<=N, C<=2048, 2<=H, W<=14, weight_size<=65535; kernel>=stride, 1<=stride<=14, 1<=out_channel<=2048, in_channel<=2048; pad<=kernel/stride, 0<=out_pad<=1; bias type is int32; sumin is supported, with sumin input type int8; 0<=output_padding<=1; group is supported, provided weight_n and the input channel are both divisible by group; dilation=1
torch.nn.Dropout	qint8, qint16, qint32	Same as input		qint8, qint16, qint32	Same as input
torch.nn.Dropout2d	qint8, qint16, qint32	Same as input		qint8, qint16, qint32	Same as input
torch.nn.ELU	Not supported			See torch.acos
torch.nn.GELU	See torch.exp			See torch.acos
torch.nn.GLU	Not supported			See torch.acos
torch.nn.Hardsigmoid	Not supported			See torch.acos
torch.nn.Identity	qint8, qint16, qint32	Same as input		qint8, qint16, qint32	Same as input
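torch.nn.LayerNorm	Not supported			qint8	qint8, qint16	Pieced together from multiple table lookups under the hood; relatively high accuracy risk. The rsqrt_kwargs attribute controls the parameters of the internal rsqrt lookup table. If accuracy drops after convert, try layernorm_op.rsqrt_kwargs = {"auto_divide_strategy": "curvature"}. H * W <= 16384, normalized_shape H * W < 16384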
torch.nn.LeakyReLU	Not supported			See torch.acos
torch.nn.Linear	Not supported			input: qint8; weight: qint8; bias: qint32	qint8	in_features<=8192, out_features<=8192.
torch.nn.LSTMCell	Not supported			qint8, qint16	qint8, qint16	The input is 2-dimensional
torch.nn.MaxPool2d	qint8	Same as input	1<=kernel<=64, 1<=stride<=256, padding>=0	qint8	Same as input	input_shape: [N, C, H, W], 1<=H, W, C<=8192; 1<=kernel, stride<=256; 0<=padding<=255
torch.nn.MultiheadAttention	Not supported			qint8, qint16	qint8, qint16	add_bias_kv, add_zero_attn, and mismatched q/k/v embed_dim are not supported. int8/int16 inputs and outputs are supported. The underlying lookup-table operators and mask quantization may introduce accuracy risk
torch.nn.PixelShuffle	qint8	Same as input		qint8	Same as input
torch.nn.PixelUnshuffle	qint8	Same as input		qint8	Same as input
torch.nn.PReLU	Not supported			See torch.acos
torch.nn.ReLU	qint8	Same as input		qint8, qint16	Same as input
torch.nn.ReLU6	qint8	Same as input		qint8, qint16	Same as input
torch.nn.ReplicationPad2d	See torch.nn.ZeroPad2d			See torch.nn.ZeroPad2d
torch.nn.Sigmoid	See torch.exp			See torch.acos
torch.nn.SiLU	See torch.exp			See torch.acos
torch.nn.Softmax	Not supported			qint8	qint8, qint16	Pieced together from multiple table lookups, sums, and other operators; relatively high accuracy risk
torch.nn.Softplus	Not supported			See torch.acos
torch.nn.SyncBatchNorm	qint8	qint8	Implemented with torch.nn.Conv2d	qint8	qint8	Implemented with torch.nn.Conv2d
torch.nn.Tanh	See torch.exp			See torch.acos
torch.nn.Upsample	See torch.nn.functional.interpolate			See torch.nn.functional.interpolate
torch.nn.UpsamplingBilinear2d	See torch.nn.functional.interpolate			See torch.nn.functional.interpolate
torch.nn.UpsamplingNearest2d	See torch.nn.functional.interpolate			See torch.nn.functional.interpolate
torch.nn.ZeroPad2d	qint8	Same as input		qint8, qint16	Same as input

7.7.2.5. torch.quantization Modules

Operator	Eager-mode replacement	Bernoulli2			Bayes/Bayes-e
		Input	Output	Other constraints	Input	Output	Other constraints
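torch.quantization.DeQuantStub		qint8, qint16, qint32	float32	Typical use: when the network is split into segments and data must move from the BPU to the CPU, dequantize on the CPU so the data can be processed there	qint8, qint16, qint32	float32	Typical use: when the network is split into segments and data must move from the BPU to the CPU, dequantize on the CPU so the data can be processed there
torch.quantization.QuantStub	horizon.quantization.QuantStub	float32	qint8, qint16	Typical uses: the input of the whole network, and, when the model is split into segments, quantizing data before it is sent from the CPU to the BPU. Setting scale: the right scale depends on the specific input. The goal is to quantize the float input to int8 as precisely as possible, which imposes two requirements: the range must cover all (or at least the vast majority) of the input data, and the quantization precision must be high. For example, if the float input range is (-1, 1), scale = 1 / 128 is a suitable setting. Float pretrained models: because a pretrained model is already trained, it may not follow the scale guidance above; in that case, inserting a special conv can resolve the issue. The data fed to QuantStub is required to be uniformly distributed	float32	qint8, qint16	Typical uses: the input of the whole network, and, when the model is split into segments, quantizing data before it is sent from the CPU to the BPU. Setting scale: the right scale depends on the specific input. The goal is to quantize the float input to int8 as precisely as possible, which imposes two requirements: the range must cover all (or at least the vast majority) of the input data, and the quantization precision must be high. For example, if the float input range is (-1, 1), scale = 1 / 128 is a suitable setting. Float pretrained models: because a pretrained model is already trained, it may not follow the scale guidance above; in that case, inserting a special conv can resolve the issue. The data fed to QuantStub is required to be uniformly distributed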
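
The scale guidance in the QuantStub row can be made concrete with a small numeric sketch. It assumes symmetric int8 quantization by round-and-clamp, which is a common scheme but is not confirmed by this document as what QuantStub does internally; the helper names `quantize_int8` and `dequantize` are hypothetical. With inputs in (-1, 1) and scale = 1 / 128, the whole range maps onto the int8 codes and the round-trip error stays within half a step.

```python
def quantize_int8(x, scale):
    """Map a float to an int8 code: round(x / scale), clamped to [-128, 127].

    Assumption: symmetric quantization with zero point 0. Python's round()
    rounds exact halves to the nearest even integer.
    """
    q = round(x / scale)
    return max(-128, min(127, q))


def dequantize(q, scale):
    """Map an int8 code back to a float."""
    return q * scale


scale = 1 / 128  # the setting suggested for inputs in (-1, 1)
for x in (-1.0, -0.5, 0.0, 0.3, 0.999):
    q = quantize_int8(x, scale)
    print(x, q, dequantize(q, scale))
```

A larger scale would still cover the range but waste codes and lose precision, while a smaller one would clamp values near the ends of the range; this is the coverage versus precision trade-off the row describes.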

7.7.2.6. torch.Tensor Methods

Operator	Bernoulli2			Bayes/Bayes-e
	Tensor type	Output	Other constraints	Tensor type	Output	Other constraints
torch.Tensor.__getitem__	qint8, qint16, qint32	Same as input
torch.Tensor.transpose	Not supported			qint8, qint16, qint32	Tensor.dtype	transpose involving the N dimension is not supported
torch.Tensor.argmax	See torch.max			See torch.max
torch.Tensor.argmin	See torch.max			See torch.max
torch.Tensor.clamp	Not supported			qint8, qint16	Tensor.dtype	dim <= 10, 1 <= each_dim_size < 65536
torch.Tensor.clip	Not supported			See torch.Tensor.clamp
torch.Tensor.eq	Not supported			See torch.eq
torch.Tensor.expand	Not supported			qint8, qint16	Tensor.dtype
torch.Tensor.ge	Not supported			See torch.eq
torch.Tensor.greater	Not supported			See torch.eq
torch.Tensor.greater_equal	Not supported			See torch.eq
torch.Tensor.gt	Not supported			See torch.eq
torch.Tensor.le	Not supported			See torch.eq
torch.Tensor.less	Not supported			See torch.eq
torch.Tensor.less_equal	Not supported			See torch.eq
torch.Tensor.max	Not supported			See torch.max
torch.Tensor.min	Not supported			See torch.max
torch.Tensor.repeat	Not supported			qint8, qint16	Tensor.dtype
torch.Tensor.reshape	Not supported				Tensor.dtype
torch.Tensor.tile	Not supported			qint8, qint16	Tensor.dtype
torch.Tensor.abs	Not supported			qint8, qint16	Tensor.dtype

7.7.2.7. torchvision Operators

Operator	Eager-mode replacement	Bernoulli2			Bayes/Bayes-e
		Input	Output	Other constraints	Input	Output	Other constraints
torchvision.models.detection.rpn.AnchorGenerator	horizon.nn.AnchorGenerator	qint8, qint16, qint32, float32	float32	Supported only when Tensor.shape can be determined offline	qint8, qint16, qint32, float32	float32	Supports int8/int16/int32/float32 input and float32 output
torchvision.ops.MultiScaleRoIAlign	horizon.nn.MultiScaleRoIAlign	See torchvision.ops.RoIAlign			See torchvision.ops.RoIAlign
torchvision.ops.RoIAlign		qint8	qint8		qint8	qint8	1<=feature number<=5; bbox supports only the List[Tensor] format with shape [1, box_num, 4]; the 4 values in the last dimension of bbox are [left, top, right, bottom]