10.1.5. Modelzoo

10.1.5.1. Classification

| network | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MobileNetV1 | 74.12 | 73.92 | 73.61 | ImageNet | 1x3x224x224 | 0.77 | 3547.90 |
| MobileNetV2 | 72.65 | 72.51 | 72.11 | ImageNet | 1x3x224x224 | 0.69 | 4191.89 |
| ResNet 18 | 72.04 | 72.03 | 72.03 | ImageNet | 1x3x224x224 | 1.53 | 1496.13 |
| ResNet 50 | 77.37 | 76.99 | 76.94 | ImageNet | 1x3x224x224 | 3.06 | 682.83 |
| VargNetV2 | 73.94 | 73.56 | 73.64 | ImageNet | 1x3x224x224 | 0.78 | 3513.86 |
| EfficientNet-B0 | 74.31 | 74.23 | 74.18 | ImageNet | 1x3x224x224 | 0.91 | 2854.65 |
| SwinTransformer | 80.24 | 80.15 | 80.05 | ImageNet | 1x3x224x224 | 14.50 | 140.48 |
| MixVarGENet | 71.33 | 71.23 | 71.04 | ImageNet | 1x3x224x224 | 0.56 | 5772.66 |
| VargConvert | 78.98 | 78.92 | 78.89 | ImageNet | 1x3x224x224 | 1.51 | 3513.86 |
| EfficieNasNetm | 80.24 | 79.99 | 79.94 | ImageNet | 1x3x300x300 | 2.06 | 1080.72 |
| EfficieNasNets | 76.63 | 76.23 | 76.03 | ImageNet | 1x3x280x280 | 1.00 | 2568.50 |
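As a rough illustration of how the three score columns can be compared, the sketch below computes the gap between the float and quantization scores for a few rows copied from the classification table above. It assumes these columns are accuracy-style scores where higher is better, which the source does not state explicitly; the helper name `accuracy_drop` is hypothetical and not part of any toolkit.

```python
# Rows copied from the classification table: (network, float, qat, quantization).
rows = [
    ("MobileNetV1", 74.12, 73.92, 73.61),
    ("ResNet 50", 77.37, 76.99, 76.94),
    ("SwinTransformer", 80.24, 80.15, 80.05),
]


def accuracy_drop(float_score, quantized_score):
    """Return float minus quantized score, rounded to two decimals.

    Assumption: higher scores are better, so a positive result is a loss.
    """
    return round(float_score - quantized_score, 2)


# Map each network to its float-to-quantized gap (the qat column is unused here).
drops = {name: accuracy_drop(f, q) for name, f, _qat, q in rows}

for name, drop in drops.items():
    print(f"{name}: {drop:.2f}")
```

For tables whose scores are error measures (lower is better), the sign of the result would need to be read the other way round.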

Torchvision (the floating-point models come from the community):

| network | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet 18 | 69.76 | 69.71 | 69.73 | ImageNet | 1x3x224x224 | 1.59 | 1492.23 |
| ResNet 50 | 76.13 | 76.07 | 76.06 | ImageNet | 1x3x224x224 | 3.18 | 665.43 |
| MobileNetV2 | 71.88 | 71.27 | 71.27 | ImageNet | 1x3x224x224 | 0.85 | 3494.91 |

10.1.5.2. Detection

RetinaNet

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Retinanet-vargnetv2 | vargnetv2 | 31.51 | 31.21 | 31.20 | MS COCO | 1x3x1024x1024 | 24.34 | 81.88 |

YOLOv3

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv3-MobileNetv1 | mobilenetv1 | 76.57 | 75.62 | 75.61 | VOC | 1x3x416x416 | 4.21 | 493.33 |
| YOLOv3-VarGDarknet | VarGDarknet | 33.90 | 33.60 | 33.36 | COCO | 1x3x416x416 | 6.51 | 307.22 |

FCOS

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FCOS-efficientnet | efficientnetb0 | 36.26 | 35.79 | 35.59 | MS COCO | 1x3x512x512 | 1.35 | 1734.48 |
| FCOS-efficientnet | efficientnetb1 | 41.37 | 41.21 | 40.71 | MS COCO | 1x3x640x640 | 2.78 | 786.97 |
| FCOS-efficientnet | efficientnetb2 | 45.35 | 45.10 | 45.00 | MS COCO | 1x3x768x768 | 4.32 | 478.41 |
| FCOS-efficientnet | efficientnetb3 | 48.03 | 47.65 | 47.58 | MS COCO | 1x3x896x896 | 7.14 | 282.20 |

DETR

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DETR-resnet50 | resnet50 | 35.70 | 31.42 | 31.31 | MS COCO | 1x3x800x1333 | 41.36 | 47.31 |
| DETR-efficientnetb3 | efficientnetb3 | 37.21 | 35.95 | 35.99 | MS COCO | 1x3x800x1333 | 32.33 | 62.14 |

FCOS3D

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FCOS3D-efficientnetb0 | efficientnetb0 | 30.62 | 30.27 | 30.38 | nuscenes | 1x3x512x896 | 3.51 | 630.70 |

10.1.5.3. Segmentation

UNet

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UNet | MobileNetV1 | 68.02 | 67.56 | 67.53 | Cityscapes | 1x3x1024x2048 | 2.10 | 1014.32 |

Deeplab

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Deeplab | EfficientNet-M0 | 76.30 | 76.22 | 76.12 | Cityscapes | 1x3x1024x2048 | 4.78 | 434.36 |
| Deeplab | EfficientNet-M1 | 77.94 | 77.64 | 77.65 | Cityscapes | 1x3x1024x2048 | 11.23 | 179.65 |
| Deeplab | EfficientNet-M2 | 78.82 | 78.65 | 78.63 | Cityscapes | 1x3x1024x2048 | 17.37 | 114.19 |

FastScnn

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FastScnn | EfficientNet-B0lite | 69.97 | 69.90 | 69.88 | Cityscapes | 1x3x1024x2048 | 1.92 | 1173.71 |

10.1.5.4. OpticalFlow

PwcNet

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PwcNet-lg | PwcNet | 1.4117 | 1.4112 | 1.4075 | FlyingChairs | 1x6x384x512 | 12.62 | 161.35 |

10.1.5.5. Lidar

PointPillars

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PointPillars | SequentialBottleNeck | 77.31 | 76.86 | 76.76 | KITTI3D | 150000x4 | 32.55 | 111.83 |

CenterPoint

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CenterPoint | SequentialBottleNeck | 58.32 | 58.11 | 58.14 | nuscenes | 1x5x20x40000, 40000x4 | 23.38 | 101.10 |

LidarMultiTask

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LidarMultiTask | MixVarGENet | 58.09 | 57.72 | 57.62 | nuscenes | 1x5x20x40000, 40000x4 | 22.07 | 107.76 |

Note

The metric reported for PointPillars is Box3d Moderate.

10.1.5.6. Lane Detection

GaNet

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GaNet | MixVarGENet | 79.49 | 78.72 | 78.72 | CuLane | 1x3x320x800 | 1.053 | 2420.32 |

10.1.5.7. Multiple Object Tracking

Motr

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Motr | efficientnetb3 | 58.02 | 57.62 | 57.76 | Mot17 | 1x3x800x1422, 1x256x2x128, 1x1x1x256, 1x4x2x128 | 26.40 | 75.76 |

10.1.5.8. Binocular depth estimation

StereoNet

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| StereoNet | StereoNeck | 1.1270 | 1.1677 | 1.1685 | SceneFlow | 1x6x540x960 | 28.33 | 69.56 |
| StereoNetPlus | MixVarGENet | 1.1270 | 1.1329 | 1.1351 | SceneFlow | 2x3x544x960 | 6.50 | 325.68 |

10.1.5.9. Bev

BevIPM

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BevIPM | efficientnetb0 | 30.59 | 30.80 | 30.41 | nuscenes det | 6x3x512x960, 6x128x128x2 | 9.76 | 205.27 |
| BevIPM | efficientnetb0 | 51.47 | 51.41 | 50.98 | nuscenes seg | 6x3x512x960, 6x128x128x2 | 9.76 | 205.27 |
| BevLSS | efficientnetb0 | 30.09 | 30.05 | 30.01 | nuscenes det | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 7.35 | 286.00 |
| BevLSS | efficientnetb0 | 51.78 | 51.47 | 51.46 | nuscenes seg | 6x3x256x704, 10x128x128x2, 10x128x128x2 | 7.35 | 286.00 |
| BevGKT | MixVarGENet | 28.11 | 28.12 | 27.90 | nuscenes det | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 23.07 | 86.49 |
| BevGKT | MixVarGENet | 48.53 | 48.02 | 48.37 | nuscenes seg | 6x3x512x960, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2, 6x64x64x2 | 23.07 | 86.49 |
| BevIPM4D | efficientnetb0 | 37.24 | 37.19 | 37.31 | nuscenes det | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 10.05 | 191.32 |
| BevIPM4D | efficientnetb0 | 52.90 | 53.80 | 53.86 | nuscenes seg | 6x3x512x960, 6x128x128x2, 1x64x128x128, 1x128x128x2 | 10.05 | 191.32 |
| Detr3d | efficientnetb3 | 33.04 | 32.78 | 32.83 | nuscenes det | 6x3x512x1408, 6x2x4x256, 6x2x4x256, 6x2x4x256, 6x2x4x256, 1x24x4x256 | 68.54 | 27.00 |
| PETR | efficientnetb3 | 37.60 | 37.32 | 37.31 | nuscenes det | 6x3x512x1408, 1x256x96x44 | 228.50 | 8.46 |
| BevCFT | efficientnetb3 | 32.93 | 32.68 | 32.63 | nuscenes det | 6x3x512x1408 | 53.91 | 37.10 |
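Several models in this section take more than one input tensor, so their "input shape" entry is a comma-separated list. The following is a minimal sketch of how such an entry could be split into per-tensor shapes; the function name `parse_input_shapes` is hypothetical, and the `NxCx...` text format is assumed from the entries in this section rather than from any documented API.

```python
import math


def parse_input_shapes(cell):
    """Split an 'input shape' entry into one tuple of ints per input tensor.

    Assumption: tensors are separated by commas and dimensions by 'x'.
    """
    return [
        tuple(int(dim) for dim in part.strip().split("x"))
        for part in cell.split(",")
    ]


# The BevIPM entry from this section: camera images plus one extra input.
shapes = parse_input_shapes("6x3x512x960, 6x128x128x2")

# Number of elements per input tensor, useful for sizing buffers.
element_counts = [math.prod(shape) for shape in shapes]

for shape, count in zip(shapes, element_counts):
    print(shape, count)
```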

10.1.5.10. Keypoint Detection

HeatmapKeypointModel

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HeatmapKeypointModel | efficientnetb0 | 94.33 | 94.30 | 94.31 | carfusion | 1x3x128x128 | 0.83 | 3260.03 |

10.1.5.11. Trajectory Prediction

DenseTNT

| network | backbone | float | qat | quantization | dataset | input shape | bpu latency (ms) | FPS (dual core) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DenseTNT | vectornet | 1.2974 | 1.2989 | 1.3038 | argoverse 1 | 30x9x19x32, 30x11x9x64, 30x1x1x96, 30x2x1x2048, 30x1x1x2048 | 26.45 | 86.50 |