10.1.5. Modelzoo
10.1.5.1. Classification
network | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|
MobileNetV1 | 74.12 | 73.92 | 73.61 | ImageNet | 1x3x224x224 | 0.77 | 3547.90 |
MobileNetV2 | 72.65 | 72.51 | 72.11 | ImageNet | 1x3x224x224 | 0.69 | 4191.89 |
ResNet 18 | 72.04 | 72.03 | 72.03 | ImageNet | 1x3x224x224 | 1.53 | 1496.13 |
ResNet 50 | 77.37 | 76.99 | 76.94 | ImageNet | 1x3x224x224 | 3.06 | 682.83 |
VargNetV2 | 73.94 | 73.56 | 73.64 | ImageNet | 1x3x224x224 | 0.78 | 3513.86 |
EfficientNet-B0 | 74.31 | 74.23 | 74.18 | ImageNet | 1x3x224x224 | 0.91 | 2854.65 |
SwinTransformer | 80.24 | 80.15 | 80.05 | ImageNet | 1x3x224x224 | 14.50 | 140.48 |
MixVarGENet | 71.33 | 71.23 | 71.04 | ImageNet | 1x3x224x224 | 0.56 | 5772.66 |
VargConvert | 78.98 | 78.92 | 78.89 | ImageNet | 1x3x224x224 | 1.51 | 3513.86 |
EfficieNasNetm | 80.24 | 79.99 | 79.94 | ImageNet | 1x3x300x300 | 2.06 | 1080.72 |
EfficieNasNets | 76.63 | 76.23 | 76.03 | ImageNet | 1x3x280x280 | 1.00 | 2568.50 |
Torchvision (the floating-point models come from the community):
network | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|
ResNet 18 | 69.76 | 69.71 | 69.73 | ImageNet | 1x3x224x224 | 1.59 | 1492.23 |
ResNet 50 | 76.13 | 76.07 | 76.06 | ImageNet | 1x3x224x224 | 3.18 | 665.43 |
MobileNetV2 | 71.88 | 71.27 | 71.27 | ImageNet | 1x3x224x224 | 0.85 | 3494.91 |
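One way to read the accuracy columns is to look at how much is lost between the float model and the final quantized model. The following is a minimal sketch of that arithmetic using the three Torchvision rows above. The names `ROWS`, `accuracy_drop`, and `summarize` are illustrative and not part of any toolkit API, and the sketch assumes the float and quantization columns are on the same scale with higher being better.

```python
# Rows copied from the Torchvision table: (network, float, qat, quantization).
ROWS = [
    ("ResNet 18", 69.76, 69.71, 69.73),
    ("ResNet 50", 76.13, 76.07, 76.06),
    ("MobileNetV2", 71.88, 71.27, 71.27),
]


def accuracy_drop(float_acc, quantized_acc):
    """Return float accuracy minus quantized accuracy, rounded to 2 decimals."""
    return round(float_acc - quantized_acc, 2)


def summarize(rows):
    """Map each network name to its float-to-quantized accuracy drop."""
    return {name: accuracy_drop(f, q) for name, f, _qat, q in rows}


if __name__ == "__main__":
    for name, drop in summarize(ROWS).items():
        print(f"{name}: {drop:.2f}")
```

The same helper can be applied to any other table on this page whose metric is higher-is-better; for error-style metrics the sign of the difference would need to be interpreted the other way round.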
10.1.5.2. Detection
RetinaNet
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
Retinanet-vargnetv2 | vargnetv2 | 31.51 | 31.21 | 31.20 | MS COCO | 1x3x1024x1024 | 24.34 | 81.88 |
YOLOv3
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
YOLOv3-MobileNetv1 | mobilenetv1 | 76.57 | 75.62 | 75.61 | VOC | 1x3x416x416 | 4.21 | 493.33 |
YOLOv3-VarGDarknet | VarGDarknet | 33.90 | 33.60 | 33.36 | COCO | 1x3x416x416 | 6.51 | 307.22 |
FCOS
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
FCOS-efficientnet | efficientnetb0 | 36.26 | 35.79 | 35.59 | MS COCO | 1x3x512x512 | 1.35 | 1734.48 |
FCOS-efficientnet | efficientnetb1 | 41.37 | 41.21 | 40.71 | MS COCO | 1x3x640x640 | 2.78 | 786.97 |
FCOS-efficientnet | efficientnetb2 | 45.35 | 45.10 | 45.00 | MS COCO | 1x3x768x768 | 4.32 | 478.41 |
FCOS-efficientnet | efficientnetb3 | 48.03 | 47.65 | 47.58 | MS COCO | 1x3x896x896 | 7.14 | 282.20 |
DETR
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
DETR-resnet50 | resnet50 | 35.70 | 31.42 | 31.31 | MS COCO | 1x3x800x1333 | 41.36 | 47.31 |
DETR-efficientnetb3 | efficientnetb3 | 37.21 | 35.95 | 35.99 | MS COCO | 1x3x800x1333 | 32.33 | 62.14 |
FCOS3D
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
FCOS3D-efficientnetb0 | efficientnetb0 | 30.62 | 30.27 | 30.38 | nuscenes | 1x3x512x896 | 3.51 | 630.70 |
10.1.5.3. Segmentation
UNet
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
UNet | MobileNetV1 | 68.02 | 67.56 | 67.53 | Cityscapes | 1x3x1024x2048 | 2.10 | 1014.32 |
Deeplab
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
Deeplab | EfficientNet-M0 | 76.30 | 76.22 | 76.12 | Cityscapes | 1x3x1024x2048 | 4.78 | 434.36 |
Deeplab | EfficientNet-M1 | 77.94 | 77.64 | 77.65 | Cityscapes | 1x3x1024x2048 | 11.23 | 179.65 |
Deeplab | EfficientNet-M2 | 78.82 | 78.65 | 78.63 | Cityscapes | 1x3x1024x2048 | 17.37 | 114.19 |
FastScnn
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
FastScnn | EfficientNet-B0lite | 69.97 | 69.90 | 69.88 | Cityscapes | 1x3x1024x2048 | 1.92 | 1173.71 |
10.1.5.4. OpticalFlow
PwcNet
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
PwcNet-lg | PwcNet | 1.4117 | 1.4112 | 1.4075 | FlyingChairs | 1x6x384x512 | 12.62 | 161.35 |
10.1.5.5. Lidar
PointPillars
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
PointPillars | SequentialBottleNeck | 77.31 | 76.86 | 76.76 | KITTI3D | 150000x4 | 32.55 | 111.83 |
CenterPoint
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
CenterPoint | SequentialBottleNeck | 58.32 | 58.11 | 58.14 | nuscenes | 1x5x20x40000, 40000x4 | 23.38 | 101.10 |
LidarMultiTask
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
LidarMultiTask | MixVarGENet | 58.09 | 57.72 | 57.62 | nuscenes | 1x5x20x40000, 40000x4 | 22.07 | 107.76 |
Note
The metric reported for PointPillars is the Box3d Moderate item.
10.1.5.6. Lane Detection
GaNet
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
GaNet | MixVarGENet | 79.49 | 78.72 | 78.72 | CuLane | 1x3x320x800 | 1.053 | 2420.32 |
10.1.5.7. Multiple Object Tracking
Motr
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
Motr | efficientnetb3 | 58.02 | 57.62 | 57.76 | Mot17 | 1x3x800x1422, 1x256x2x128, 1x1x1x256, 1x4x2x128 | 26.40 | 75.76 |
10.1.5.8. Binocular Depth Estimation
StereoNet
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
StereoNet | StereoNeck | 1.1270 | 1.1677 | 1.1685 | SceneFlow | 1x6x540x960 | 28.33 | 69.56 |
StereoNetPlus | MixVarGENet | 1.1270 | 1.1329 | 1.1351 | SceneFlow | 2x3x544x960 | 6.50 | 325.68 |
10.1.5.9. Bev
BevIPM
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
BevIPM | efficientnetb0 | 30.59 | 30.80 | 30.41 | nuscenes det | 6x3x512x960, 6x128x128x2 | 9.76 | 205.27 |
BevIPM | efficientnetb0 | 51.47 | 51.41 | 50.98 | nuscenes seg | 6x3x512x960, 6x128x128x2 | 9.76 | 205.27 |
BevLSS | efficientnetb0 | 30.09 | 30.05 | 30.01 | nuscenes det | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 7.35 | 286.00 |
BevLSS | efficientnetb0 | 51.78 | 51.47 | 51.46 | nuscenes seg | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 7.35 | 286.00 |
BevGKT | MixVarGENet | 28.11 | 28.12 | 27.90 | nuscenes det | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 23.07 | 86.49 |
BevGKT | MixVarGENet | 48.53 | 48.02 | 48.37 | nuscenes seg | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 23.07 | 86.49 |
BevIPM4D | efficientnetb0 | 37.24 | 37.19 | 37.31 | nuscenes det | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 10.05 | 191.32 |
BevIPM4D | efficientnetb0 | 52.90 | 53.80 | 53.86 | nuscenes seg | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 10.05 | 191.32 |
Detr3d | efficientnetb3 | 33.04 | 32.78 | 32.83 | nuscenes det | 6x3x512x1408, 6x2x4x256, 6x2x4x256, 6x2x4x256, 6x2x4x256, 1x24x4x256 | 68.54 | 27.00 |
PETR | efficientnetb3 | 37.60 | 37.32 | 37.31 | nuscenes det | 6x3x512x1408, 1x256x96x44 | 228.50 | 8.46 |
BevCFT | efficientnetb3 | 32.93 | 32.68 | 32.63 | nuscenes det | 6x3x512x1408 | 53.91 | 37.10 |
10.1.5.10. Keypoint Detection
HeatmapKeypointModel
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
HeatmapKeypointModel | efficientnetb0 | 94.33 | 94.30 | 94.31 | carfusion | 1x3x128x128 | 0.83 | 3260.03 |
10.1.5.11. Trajectory Prediction
DenseTNT
network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
---|---|---|---|---|---|---|---|---|
DenseTNT | vectornet | 1.2974 | 1.2989 | 1.3038 | argoverse 1 | 30x9x19x32, 30x11x9x64, 30x1x1x96, 30x2x1x2048, 30x1x1x2048 | 26.45 | 86.50 |
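Several rows on this page list more than one tensor in the input shape column, separated by commas (the DenseTNT row, for example, lists five). A minimal sketch of turning such a cell into per-input dimension tuples might look like the following. The helper name `parse_input_shapes` and the constant `DENSETNT_CELL` are hypothetical and introduced only for illustration, and the sketch assumes every dimension is a plain integer joined by `x`.

```python
# Input shape cell copied from the DenseTNT row of the table.
DENSETNT_CELL = "30x9x19x32, 30x11x9x64, 30x1x1x96, 30x2x1x2048, 30x1x1x2048"


def parse_input_shapes(cell):
    """Split an 'input shape' cell into one tuple of ints per input tensor."""
    shapes = []
    for part in cell.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma or stray whitespace
        shapes.append(tuple(int(dim) for dim in part.split("x")))
    return shapes


if __name__ == "__main__":
    for index, shape in enumerate(parse_input_shapes(DENSETNT_CELL)):
        print(f"input {index}: {shape}")
```

Single-input cells such as `1x3x224x224` go through the same helper and come back as a one-element list.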