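In short, model quantization converts a floating-point model into a fixed-point model.
Models trained with tools such as TensorFlow, PyTorch, or Caffe usually store their weights as float32. Such models are accurate but large, and memory-constrained scenarios call for the smallest model possible. Quantization converts the floating-point model into a fixed-point one: going from float32 to int8, for example, cuts weight storage to roughly a quarter, and inference usually gets faster, at the cost of a small accuracy loss. It can be thought of as lossy compression: a little accuracy is traded for a much smaller and more efficient model.
The table below summarizes the trade-off: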
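| | Parameters | Compute speed | Memory usage | Accuracy |
|---|---|---|---|---|
| Before quantization | Large | Slow (heavy computation) | High | High |
| After quantization | Compressed | Faster | Reduced | Some loss |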
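Embedded AI scenarios place tight limits on memory and compute time but can tolerate some accuracy loss, so quantization is a natural way to reduce model complexity there.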
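To make the float-to-fixed-point conversion described above concrete, here is a minimal NumPy sketch of per-tensor affine quantization. The `quantize` and `dequantize` helpers are hypothetical names used only for illustration, and this is one common scheme rather than necessarily the exact one the TensorFlow converter applies.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Map a float32 array onto unsigned integers using a (scale, zero_point) pair."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must contain 0 so that 0.0 maps to an exact integer.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values: x is roughly scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = float(np.abs(weights - restored).max())

print(weights.nbytes, q.nbytes)  # 4000 1000
print(max_error, scale)          # the error stays on the order of one quantization step
```

The storage drops by 4x because each value takes one byte instead of four, and the reconstruction error is bounded by the step size `scale`, which is where the small accuracy loss comes from.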
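TensorFlow provides two quantization approaches: post-training quantization and quantization during training (quantization-aware training).
They differ as follows:
Post-training quantization iterates quickly and is easy to use, but loses more model accuracy;
Quantization during training is harder to use and requires retraining the model, but preserves accuracy better.
You can weigh ease of use, iteration time, and model accuracy against each other and pick the approach that fits.
Post-training quantization is the simpler of the two, so this document walks through it using MNIST as the example (quantization during training may get its own document later).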
Use the following script to train the model and produce mnist_train.h5:
# mnist_train.py
import tensorflow as tf
from tensorflow import keras
print("TensorFlow version {}".format(tf.__version__))
(train_images, train_labels),(test_images, test_labels) = tf.keras.datasets.mnist.load_data()
class_num = 10
# Build the model
model = tf.keras.models.Sequential([
    keras.layers.Flatten(input_shape=(train_images.shape[1], train_images.shape[2])),  # input_shape=(28, 28)
    keras.layers.Dense(512, activation=tf.nn.relu),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(class_num, activation=tf.nn.softmax)
])
model.compile(optimizer='Adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
print("model stucture:")
model.summary()
# train
model.fit(train_images, train_labels, epochs=5, batch_size=64)
# evaluate accuracy
loss, acc = model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}% loss: {:5.2f}".format(100*acc, loss))
# Save the trained model in HDF5 format
model.save('mnist_train.h5')
del model  # delete the in-memory model
The log output is as follows:
TensorFlow version 2.12.0
model stucture:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
dense (Dense) (None, 512) 401920
dense_1 (Dense) (None, 64) 32832
dense_2 (Dense) (None, 10) 650
=================================================================
Total params: 435,402
Trainable params: 435,402
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
938/938 [==============================] - 3s 3ms/step - loss: 1.4964 - accuracy: 0.7708
Epoch 2/5
938/938 [==============================] - 3s 3ms/step - loss: 0.5160 - accuracy: 0.8773
Epoch 3/5
938/938 [==============================] - 3s 3ms/step - loss: 0.2994 - accuracy: 0.9254
Epoch 4/5
938/938 [==============================] - 3s 3ms/step - loss: 0.1852 - accuracy: 0.9519
Epoch 5/5
938/938 [==============================] - 3s 3ms/step - loss: 0.1400 - accuracy: 0.9615
313/313 [==============================] - 1s 2ms/step - loss: 0.1657 - accuracy: 0.9590
Restored model, accuracy: 95.90% loss: 0.17
This produces the mnist_train.h5 model.
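The parameter counts in the summary above can be checked by hand, which is useful later when judging the quantized file sizes. The short sketch below assumes only the layer sizes shown in the log (784 inputs, then Dense layers of 512, 64, and 10 units).

```python
# A Dense layer has inputs * units weights plus one bias per unit.
layer_sizes = [28 * 28, 512, 64, 10]

params = [n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(params)

print(params)     # [401920, 32832, 650]
print(total)      # 435402
print(total * 4)  # 1741608 bytes if every parameter is stored as float32
```

So the float32 weights alone take about 1.7 MB, and storing each parameter in a single byte would take about 0.44 MB.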
1. Hybrid quantization (also called dynamic range quantization):
# mnist_quant_hybrid.py
import numpy as np
import tensorflow as tf
from tensorflow import keras
# Load the h5 model and evaluate its accuracy
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
model = keras.models.load_model('mnist_train.h5')  # load the HDF5 file 'mnist_train.h5'
# evaluate accuracy
loss, acc = model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}% loss: {:5.2f}".format(100*acc, loss))
# Convert to TFLite with the default optimizations (no representative dataset)
tflite_mnist_model = 'mnist_quant_hybrid.tflite'
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
with open(tflite_mnist_model, "wb") as f:
    flatbuffer_size = f.write(tflite_model)
print('hybrid: The size of the converted flatbuffer is: %d bytes' % flatbuffer_size)
# Evaluate the accuracy of the quantized model
# (tests the TFLite model with the Python interpreter on a PC)
def evaluate(interpreter_path):
    # Load the model and allocate tensors
    interpreter = tf.lite.Interpreter(model_path=interpreter_path)
    interpreter.allocate_tensors()
    # Get the input and output tensor details
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    index = input_details[0]['index']
    shape = input_details[0]['shape']
    acc_count = 0
    image_count = test_images.shape[0]
    for i in range(image_count):
        # The hybrid model still takes float32 input
        interpreter.set_tensor(index, test_images[i].reshape(shape).astype("float32"))
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])
        label = np.argmax(output_data)
        if label == test_labels[i]:
            acc_count += 1
    print("test_images accuracy is {:.2%}".format(acc_count/(image_count)))
evaluate(tflite_mnist_model)
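The script above keeps float32 inputs and outputs while the stored weights shrink to 8 bits. The NumPy sketch below illustrates that idea on a single dense layer. It is a simplified model of the behavior, not TFLite's actual kernel: the random weights, the symmetric per-tensor scheme, and the variable names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(784, 512)).astype(np.float32)  # stand-in for Dense weights
x = rng.uniform(0, 255, size=(1, 784)).astype(np.float32)       # one float32 input image

# Offline step: symmetric per-tensor int8 quantization of the weights.
w_scale = float(np.abs(w).max()) / 127.0
w_q = np.clip(np.round(w / w_scale), -127, 127).astype(np.int8)

# Runtime step: input and output stay float32; weights are dequantized on the fly.
y_float = x @ w
y_hybrid = x @ (w_q.astype(np.float32) * w_scale)

rel_error = float(np.abs(y_float - y_hybrid).max() / np.abs(y_float).max())
print(w.nbytes // w_q.nbytes)  # 4
print(rel_error)               # small relative to the layer's output range
```

Only the weight storage changes here, which is why the evaluation loop can keep feeding float32 images without any extra conversion.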
2. Full integer quantization:
# mnist_quant_full_integer.py
import numpy as np
import tensorflow as tf
from tensorflow import keras
print("TensorFlow version {}".format(tf.__version__))
# Load the h5 model and evaluate its accuracy
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
model = keras.models.load_model('mnist_train.h5')  # load the HDF5 file 'mnist_train.h5'
# evaluate accuracy
loss, acc = model.evaluate(test_images, test_labels)
print("Restored model, accuracy: {:5.2f}% loss: {:5.2f}".format(100*acc, loss))
tflite_mnist_model = 'mnist_quant_full_integer.tflite'
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Yield a few training samples so the converter can calibrate activation ranges
def representative_data_gen():
    for image in train_images[0:100, :, :]:
        yield [image.reshape(-1, train_images.shape[1], train_images.shape[2]).astype("float32")]
# Set the representative dataset
converter.representative_dataset = representative_data_gen
# Restrict the converter to int8 builtin ops
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the data types of the model input and output
converter.inference_input_type = tf.uint8  # or tf.int8
converter.inference_output_type = tf.uint8  # or tf.int8
tflite_model = converter.convert()
with open(tflite_mnist_model, "wb") as f:
    flatbuffer_size = f.write(tflite_model)
print('full_integer: The size of the converted flatbuffer is: %d bytes' % flatbuffer_size)
# Evaluate the accuracy of the quantized model
# (tests the TFLite model with the Python interpreter on a PC)
def evaluate(interpreter_path):
    # Load the model and allocate tensors
    interpreter = tf.lite.Interpreter(model_path=interpreter_path)
    interpreter.allocate_tensors()
    # Get the input and output tensor details
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    index = input_details[0]['index']
    shape = input_details[0]['shape']
    acc_count = 0
    image_count = test_images.shape[0]
    for i in range(image_count):
        # The full integer model takes uint8 input
        interpreter.set_tensor(index, test_images[i].reshape(shape).astype("uint8"))
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])
        label = np.argmax(output_data)
        if label == test_labels[i]:
            acc_count += 1
    print("test_images accuracy is {:.2%}".format(acc_count/(image_count)))
evaluate(tflite_mnist_model)
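With uint8 input and output, the tensors the interpreter exchanges are integer codes rather than real values. The sketch below shows how such codes relate to real values through a (scale, zero_point) pair, which the interpreter reports under the `'quantization'` key of each entry in `get_input_details()` and `get_output_details()`. The helper names and the numeric parameters are hypothetical, chosen only for illustration; read the real values from your own model.

```python
import numpy as np

def quantize_input(x, scale, zero_point):
    """Convert real-valued inputs to uint8 codes: q = round(x / scale) + zero_point."""
    q = np.round(np.asarray(x, dtype=np.float32) / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize_output(q, scale, zero_point):
    """Convert uint8 codes back to real values: x = scale * (q - zero_point)."""
    return scale * (np.asarray(q).astype(np.float32) - zero_point)

# Hypothetical quantization parameters, not taken from the converted model.
in_scale, in_zero_point = 1.0, 0
out_scale, out_zero_point = 1.0 / 256.0, 0

pixels = np.array([0.0, 127.0, 255.0], dtype=np.float32)
q_pixels = quantize_input(pixels, in_scale, in_zero_point)
print(q_pixels.tolist())  # [0, 127, 255]

raw_output = np.array([[0, 3, 1, 0, 0, 250, 0, 2, 0, 0]], dtype=np.uint8)
probs = dequantize_output(raw_output, out_scale, out_zero_point)
print(int(np.argmax(raw_output)), int(np.argmax(probs)))  # 5 5
print(float(probs[0, 5]))  # 0.9765625
```

Because the mapping is monotonic for a positive scale, `argmax` on the raw uint8 output picks the same class as on the dequantized values, which is why the evaluation loop above can skip dequantization. Feeding raw pixels as uint8 is reasonable when the input scale is close to 1 and the zero point is 0. A model trained on normalized inputs would need its inputs quantized with the reported parameters first.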
The results are:
h5: 5258320 bytes accuracy: 95.80%
hybrid: The size of the converted flatbuffer is: 441576 bytes accuracy is 95.78%
full_integer: The size of the converted flatbuffer is: 440312 bytes accuracy is 95.69%
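A quick calculation puts these sizes in perspective. The sketch below uses only the numbers reported above plus the parameter count from the model summary. One assumption to flag: the h5 file was saved from a compiled model, so it probably also contains the Adam optimizer state, which would make the raw file-size ratio overstate the compression of the weights themselves.

```python
h5_bytes = 5258320
hybrid_bytes = 441576
full_integer_bytes = 440312
param_count = 435402  # total params from model.summary()

print(round(h5_bytes / hybrid_bytes, 1))          # 11.9
print(round(h5_bytes / full_integer_bytes, 1))    # 11.9
print(round(hybrid_bytes / param_count, 2))       # 1.01 bytes per parameter
print(round(param_count * 4 / hybrid_bytes, 1))   # 3.9, relative to float32 weights alone
```

Both quantized files come out at about one byte per parameter, which is consistent with 8-bit weights, and both stay within roughly 0.1 to 0.2 percentage points of the h5 accuracy listed above.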