TensorFlow 2.x Version (vai_q_tensorflow2)

Installing vai_q_tensorflow2

You can install vai_q_tensorflow2 in one of the following ways:

Install Using Docker Container

Vitis AI provides a Docker container for quantization tools, including vai_q_tensorflow2. After starting the container, activate the Conda environment vitis-ai-tensorflow2.

conda activate vitis-ai-tensorflow2

If there is a patch package, install the vitis-ai-tensorflow2 patch package inside the Docker container.

# [optional]
$ sudo env CONDA_PREFIX=/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/ PATH=/opt/vitis_ai/conda/bin:$PATH conda install patch_package.tar.bz2

Install from Source Code with the Wheel Package

vai_q_tensorflow2 is a fork of the TensorFlow Model Optimization Toolkit and is open sourced in the Vitis_AI_Quantizer repository. To build and install vai_q_tensorflow2, run the following commands:

$ sh build.sh
$ pip install pkgs/*.whl

Install from Source Code with the Conda Package

IMPORTANT: This requires Anaconda.

# CPU-only version
$ conda build vai_q_tensorflow2_cpu_feedstock --output-folder ./conda_pkg/
# GPU version
$ conda build vai_q_tensorflow2_gpu_feedstock --output-folder ./conda_pkg/
# Install conda package on your machine
$ conda install --use-local ./conda_pkg/linux-64/*.tar.bz2
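
Regardless of the installation method, you can verify that the quantizer is importable in the Python environment where it was installed. A minimal sanity check:

# Importing the vitis_quantize module confirms vai_q_tensorflow2 is installed correctly.
from tensorflow_model_optimization.quantization.keras import vitis_quantize
print(vitis_quantize.VitisQuantizer)   # should print the class without raising ImportError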

Running vai_q_tensorflow2

The TensorFlow2 quantizer supports two different approaches to quantize a deep learning model:

Post-training quantization (PTQ)
PTQ is a technique to convert a pre-trained float model into a quantized model with little degradation in model accuracy. A representative dataset is needed to run a few batches of inference on the float model to obtain the distributions of the activations. This is also called quantize calibration.
Quantization aware training (QAT)
QAT models the quantization errors in both the forward and backward passes during model quantization. For QAT, starting from a floating-point pre-trained model with good accuracy is recommended over starting from scratch.

Preparing the Float Model and Calibration Set

Before running vai_q_tensorflow2, prepare the float model and calibration set, including the files listed in the following table.

Table 1. Input Files for vai_q_tensorflow2
No. Name Description
1 float model Floating-point TensorFlow 2 models, either in h5 format or saved model format.
2 calibration dataset A subset of the training or validation dataset to represent the input data distribution; 100 to 1000 images are usually enough (see the sketch after this table).
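
As an illustration only, the following sketch builds a calibration dataset from a subset of preprocessed validation images. The x_val array, its file name, and the subset size are assumptions; any preprocessing must match what the float model was trained with.

import numpy as np
import tensorflow as tf

# Hypothetical array of preprocessed validation images, shape (N, H, W, C).
x_val = np.load('x_val.npy')

# Take a few hundred samples as the calibration set; labels are not required.
calib_dataset = tf.data.Dataset.from_tensor_slices(x_val[:1000]).batch(10)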

Quantizing Using the vai_q_tensorflow2 API

The following code shows how to do post-training quantization with the vai_q_tensorflow2 API. You can find a full example here.


float_model = tf.keras.models.load_model('float_model.h5')
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantizer = vitis_quantize.VitisQuantizer(float_model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset, calib_steps=100, calib_batch_size=10)
  
calib_dataset
"calib_dataset" is used as a representative calibration dataset for calibration. You can use full or part of the eval_dataset, train_dataset, or other datasets.
calib_steps
calib_steps is the total number of steps for calibration. It has a default value of None. If "calib_dataset" is a tf.data dataset, generator, or keras.utils.Sequence instance and steps is None, calibration will run until the dataset is exhausted. This argument is not supported with array inputs.
calib_batch_size
calib_batch_size is the number of samples per batch for calibration. If the "calib_dataset" is in the form of a dataset, generator, or keras.utils.Sequence instances, the batch size is controlled by the dataset itself. If the "calib_dataset" is in the form of a numpy.array object, the default batch size is 32.

vai_q_tensorflow2 Fast Finetuning

Generally, there is a small accuracy loss after quantization, but for some networks, such as MobileNets, the accuracy loss can be large. Fast finetuning uses the AdaQuant algorithm to adjust the weights and quantize parameters layer by layer with the unlabeled calibration dataset, which can improve accuracy for some models. It takes longer than normal PTQ but is still much shorter than QAT because the calib_dataset is smaller than the training dataset. Fast finetuning is disabled by default; turn it on if you encounter accuracy issues. A recommended workflow is to first try PTQ without fast finetuning and then try quantization with fast finetuning if the accuracy is not acceptable. QAT is another method to improve the accuracy, but it takes more time and needs the training dataset. You can activate fast finetuning by setting include_fast_ft=True during post-training quantization.

quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset, calib_steps=None, calib_batch_size=None, include_fast_ft=True, fast_ft_epochs=10)

Here,

  • include_fast_ft indicates whether to do fast finetuning or not.
  • fast_ft_epochs indicates the number of finetuning epochs for each layer.

Saving the Quantized Model

The quantized model object is a standard tf.keras model object. You can save it by running the following command:
quantized_model.save('quantized_model.h5')

The generated quantized_model.h5 file can be fed to the vai_c_tensorflow compiler and then deployed on the DPU.

(Optional) Evaluating the Quantized Model

If you have scripts to evaluate float models, like the models in the Xilinx Model Zoo, you can replace the float model file with the quantized model file for evaluation. To support the custom quantize layers, import the vitis_quantize module before loading the model, for example:

from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantized_model = tf.keras.models.load_model('quantized_model.h5')

After that, evaluate the quantized model just as the float model, for example:

quantized_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
	metrics=[tf.keras.metrics.SparseTopKCategoricalAccuracy()])
quantized_model.evaluate(eval_dataset)

(Optional) Dumping the Simulation Results

Sometimes after deploying the quantized model, it is necessary to compare the simulation results on the CPU/GPU and the output values on the DPU. You can use the VitisQuantizer.dump_model API of vai_q_tensorflow2 to dump the simulation results with the quantized model.
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantized_model = tf.keras.models.load_model('./quantized_model.h5')
vitis_quantize.VitisQuantizer.dump_model(quantized_model,
                                         dump_dataset,
                                         output_dir='./dump_results')
Note: The batch_size of the dump_dataset should be set to 1 for DPU debugging.
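
For example, a dump dataset with batch size 1 can be built from a single preprocessed sample. In this sketch a random tensor stands in for a real input image; the shape and dtype are assumptions.

import numpy as np
import tensorflow as tf

# One sample only; batch_size must be 1 for DPU debugging.
sample = np.random.rand(1, 224, 224, 3).astype(np.float32)
dump_dataset = tf.data.Dataset.from_tensor_slices(sample).batch(1)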

Dump results are generated in the output directory after the command executes successfully. Results for the weights and activations of each layer are saved separately in the folder. For each quantized layer, results are saved in *.bin and *.txt formats. If the output of a layer is not quantized (such as for the softmax layer), the float activation results are saved in *_float.bin and *_float.txt files. The / symbol in layer names is replaced by _ for simplicity. Examples of dumped results are shown in the following table.

Table 2. Example of Dumping Results
Batch No.  Quantized  Layer Name            Saved Files
1          Yes        resnet_v1_50/conv1
                      Weights:    {output_dir}/dump_results_weights/quant_resnet_v1_50_conv1_kernel.bin
                                  {output_dir}/dump_results_weights/quant_resnet_v1_50_conv1_kernel.txt
                      Biases:     {output_dir}/dump_results_weights/quant_resnet_v1_50_conv1_bias.bin
                                  {output_dir}/dump_results_weights/quant_resnet_v1_50_conv1_bias.txt
                      Activation: {output_dir}/dump_results_0/quant_resnet_v1_50_conv1.bin
                                  {output_dir}/dump_results_0/quant_resnet_v1_50_conv1.txt
2          No         resnet_v1_50/softmax
                      Weights:    N/A
                      Biases:     N/A
                      Activation: {output_dir}/dump_results_0/quant_resnet_v1_50_softmax_float.bin
                                  {output_dir}/dump_results_0/quant_resnet_v1_50_softmax_float.txt
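
To compare the dumped values against DPU output offline, the *.bin files can be read back with NumPy. This is a sketch only: the file names follow Table 2, and the dtypes (int8 for quantized dumps, float32 for float dumps) are assumptions to verify against your own results.

import numpy as np

# Read a quantized activation dump and a float activation dump (file names from Table 2).
quant_act = np.fromfile(
    './dump_results/dump_results_0/quant_resnet_v1_50_conv1.bin', dtype=np.int8)
float_act = np.fromfile(
    './dump_results/dump_results_0/quant_resnet_v1_50_softmax_float.bin', dtype=np.float32)
print(quant_act.shape, float_act.shape)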

vai_q_tensorflow2 Quantization Aware Training

Generally, there is a small accuracy loss after quantization but for some networks such as MobileNets, the accuracy loss can be large. In this situation, quantization aware training (QAT) can be used to further improve the accuracy of quantized models.

QAT is similar to the float model training/finetuning except that vai_q_tensorflow2 rewrites the float graph to convert it to a quantized model before the training starts. The typical workflow is as follows. You can find a complete example here.

  1. Prepare the float model, dataset, and training scripts:

    Before QAT, prepare the following files:

    Table 3. Input Files for vai_q_tensorflow2 QAT
    No. Name Description
    1 Float model Floating-point model files to start from. Can be omitted if training from scratch.
    2 Dataset The training dataset with labels.
    3 Training Scripts The Python scripts to run float train/finetuning of the model.
  2. (Optional) Evaluate the float model.

    Evaluate the float model first before QAT to check the correctness of the scripts and dataset. The accuracy and loss values of the float checkpoint can also be a baseline for QAT.

  3. Modify the training scripts and run QAT.

    Use the vai_q_tensorflow2 API, VitisQuantizer.get_qat_model, to convert the model to a quantized model and then proceed to training/finetuning with it. The following is an example:

    
    model = tf.keras.models.load_model('float_model.h5')

    # Call the vai_q_tensorflow2 API to create the quantize training model
    from tensorflow_model_optimization.quantization.keras import vitis_quantize
    quantizer = vitis_quantize.VitisQuantizer(model, quantize_strategy='8bit_tqt')
    qat_model = quantizer.get_qat_model(
        init_quant=True,  # Running an initial PTQ quantization gives a better initial state for the quantizers, especially for the `8bit_tqt` strategy. Must be used together with calib_dataset.
        calib_dataset=calib_dataset)

    # Then run the training process with this qat_model to get the quantize finetuned model.
    # Compile the model
    qat_model.compile(
            optimizer=tf.keras.optimizers.RMSprop(learning_rate=lr_schedule),
            loss=tf.keras.losses.SparseCategoricalCrossentropy(),
            metrics=[tf.keras.metrics.SparseTopKCategoricalAccuracy()])

    # Start the training/finetuning
    qat_model.fit(train_dataset)
    
    
    Note: Vitis AI 1.4 introduces the 8bit_tqt strategy, which uses trained thresholds in the quantizers and may give better results for QAT. By default, the Straight-Through-Estimator is used. Use the 8bit_tqt strategy only in QAT and together with init_quant=True to get the best performance. Initializing with PTQ quantization generates a better initial state for the quantizer parameters, especially for 8bit_tqt; otherwise, the training may not converge.
  4. Save the model.

    Call model.save() to save the trained model or use callbacks in model.fit() to save the model periodically. For example:

    # save the model manually
    qat_model.save('trained_model.h5')

    # save the model periodically during fit using callbacks
    qat_model.fit(
        train_dataset,
        callbacks=[
            tf.keras.callbacks.ModelCheckpoint(
                filepath='./quantize_train/',
                save_best_only=True,
                monitor="sparse_categorical_accuracy",
                verbose=1,
            )])
    
  5. Convert to deployable quantized model.

    Modify the trained/finetuned model to meet the compiler requirements. For example, if train_with_bn is set to True, the batch normalization layers and the dropout layers are not folded or removed during training and must be folded or removed before deployment. Some of the quantizer parameters may vary during training and exceed the ranges permitted by the compiler. These must be corrected before deployment.

    A get_deploy_model() function is provided to perform these conversions and generate a deployable model as shown in the following example.

    quantized_model = quantizer.get_deploy_model(qat_model)
    quantized_model.save('quantized_model.h5')
  6. (Optional) Evaluate the quantized model

    Call model.evaluate() on the eval_dataset to evaluate the quantized model, just like evaluation of the float model.

    
    from tensorflow_model_optimization.quantization.keras import vitis_quantize
    quantized_model = tf.keras.models.load_model('quantized_model.h5')

    quantized_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
            metrics=[tf.keras.metrics.SparseTopKCategoricalAccuracy()])
    quantized_model.evaluate(eval_dataset)
    
    
    Note: Train or finetune the float model to good accuracy before proceeding to QAT.

Quantizing with Custom Layers

Some models contain custom layers. vai_q_tensorflow2 provides interfaces to load and quantize them. For example:


class MyCustomLayer(keras.layers.Layer):

    def __init__(self, units=32, **kwargs):
        super(MyCustomLayer, self).__init__(**kwargs)
        self.units = units


    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
            name='w')
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True, name='b')


    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


    def get_config(self):
        base_config = super(MyCustomLayer, self).get_config()
        config = {"units": self.units}
        return dict(list(base_config.items()) + list(config.items()))


# Here is a float model with custom layer "MyCustomLayer", use custom_objects argument in tf.keras.models.load_model to load it.
float_model = tf.keras.models.load_model('float_model.h5', custom_objects={'MyCustomLayer': MyCustomLayer})

Here, a float model contains a custom layer named "MyCustomLayer". To load it into memory, use the custom_objects argument in the tf.keras.models.load_model API. Similarly, the VitisQuantizer class provides the custom_objects argument to handle the custom layers. The following code is an example.


from tensorflow_model_optimization.quantization.keras import vitis_quantize
# Register the custom layer to VitisQuantizer by custom_objects argument.
quantizer = vitis_quantize.VitisQuantizer(float_model, custom_objects={'MyCustomLayer': MyCustomLayer})
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset, calib_steps=100, calib_batch_size=10)

You can find a complete example here.

With the default quantize strategy, the custom layers are not quantized and remain float during quantization because they are not in the list of APIs supported by vai_q_tensorflow2. An interface named custom_quantize_strategy is provided for advanced users to build custom quantize strategies and run quantize experiments. The custom quantize strategy is a Dict object containing the quantize strategy items, or a JSON file serializing that Dict.

The default quantize strategy provides an example of the format; a custom quantize strategy follows the same format. Items in the custom quantize strategy override the matching items in the default strategy, and new items are added to the quantize strategy.

With this feature, you can quantize the 'MyCustomLayer' layer from the previous example:


# Define quantizer with custom quantize strategy, which quantizes the w and b weights and output 0 of MyCustomLayer objects.
my_quantize_strategy = {
    "quantize_registry_config": {
        "layer_quantize_config": [{
            "layer_type": "__main__.MyCustomLayer",
            "quantizable_weights": ["w", "b"],
            "weight_quantizers": [
                {"quantizer_type": "LastValueQuantPosQuantizer", "quantizer_params": {"bit_width": 8, "method": 1, "round_mode": 0}},
                {"quantizer_type": "LastValueQuantPosQuantizer", "quantizer_params": {"bit_width": 8, "method": 1, "round_mode": 0}}
            ],
            "quantizable_outputs": ["0"],
            "output_quantizers": [
                {"quantizer_type": "LastValueQuantPosQuantizer", "quantizer_params": {"bit_width": 8, "method": 1, "round_mode": 1}}
            ]
        }]
    }
}
quantizer = vitis_quantize.VitisQuantizer(model, custom_objects={'MyCustomLayer': MyCustomLayer}, custom_quantize_strategy=my_quantize_strategy)


# The rest of the quantization process is the same as before; here we do normal PTQ as an example
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset, calib_steps=100, calib_batch_size=10)
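
Because custom_quantize_strategy also accepts a file path (see the API reference below), the same strategy can be written out as a JSON file and passed by path. A minimal sketch, reusing my_quantize_strategy and MyCustomLayer from the example above:

import json
from tensorflow_model_optimization.quantization.keras import vitis_quantize

# Serialize the strategy Dict to a JSON file and pass its path instead of the Dict.
with open('my_quantize_strategy.json', 'w') as f:
    json.dump(my_quantize_strategy, f, indent=2)

quantizer = vitis_quantize.VitisQuantizer(
    model,
    custom_objects={'MyCustomLayer': MyCustomLayer},
    custom_quantize_strategy='my_quantize_strategy.json')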

vai_q_tensorflow2 Supported Operations and APIs

The following table lists the supported operations and APIs for vai_q_tensorflow2.

Table 4. vai_q_tensorflow2 Supported Layers
Layer Types    Supported Layers                        Description
Core           tf.keras.layers.InputLayer
Core           tf.keras.layers.Dense
Core           tf.keras.layers.Activation              If 'activation' is 'relu' or 'linear', the layer is quantized. If 'activation' is 'sigmoid' or 'swish', it is converted to hard-sigmoid or hard-swish and then quantized. Otherwise, the layer is not quantized.
Convolution    tf.keras.layers.Conv2D
Convolution    tf.keras.layers.DepthwiseConv2D
Convolution    tf.keras.layers.Conv2DTranspose
Pooling        tf.keras.layers.AveragePooling2D
Pooling        tf.keras.layers.MaxPooling2D
Pooling        tf.keras.layers.GlobalAveragePooling2D
Normalization  tf.keras.layers.BatchNormalization      By default, BatchNormalization layers are fused with the previous convolution layers. If they cannot be fused, they are converted to depthwise convolutions. In the QAT mode, BatchNormalization layers are pseudo fused if train_with_bn is set to True; they are fused when the get_deploy_model function is called.
Regularization tf.keras.layers.Dropout                 By default, the dropout layers are removed. In the QAT mode, dropout layers are retained if remove_dropout is set to False; they are removed when the get_deploy_model function is called.
Reshaping      tf.keras.layers.Reshape
Reshaping      tf.keras.layers.Flatten
Reshaping      tf.keras.layers.UpSampling2D
Reshaping      tf.keras.layers.ZeroPadding2D
Merging        tf.keras.layers.Concatenate
Merging        tf.keras.layers.Add
Merging        tf.keras.layers.Multiply
Activation     tf.keras.layers.ReLU
Activation     tf.keras.layers.Softmax                 The input of the Softmax layer is quantized. It can run on the standalone Softmax IP for acceleration.
Activation     tf.keras.layers.LeakyReLU               Only alpha=0.1 is supported on the DPU. For other values, the layer is not quantized and is mapped to the CPU.
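
For reference, the following sketch builds a small model using only layers from Table 4, so the whole graph can be quantized and mapped to the DPU; the layer sizes are arbitrary and for illustration only.

import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(16, 3, padding='same')(inputs)
x = tf.keras.layers.BatchNormalization()(x)   # folded into the previous Conv2D
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(10)(x)
outputs = tf.keras.layers.Softmax()(x)        # the Softmax input is quantized
model = tf.keras.Model(inputs, outputs)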

vai_q_tensorflow2 Usage

vitis_quantize.VitisQuantizer

The constructor of the VitisQuantizer class.


vitis_quantize.VitisQuantizer(
    float_model, 
    quantize_strategy='8bit', 
    custom_quantize_strategy=None, 
    custom_objects={})

Arguments

float_model
A tf.keras.Model object, the float model to be quantized.
quantize_strategy
A string object specifying the quantize strategy type. Available values are 8bit and 8bit_tqt. 8bit is the default strategy and uses the Straight-Through-Estimator. 8bit_tqt is a new strategy introduced in Vitis AI 1.4 that uses trained thresholds in the quantizers and may yield better results for QAT.
Note: 8bit_tqt strategy should only be used in QAT and be used together with init_quant=True to get the best performance.
custom_quantize_strategy
A string object, the file path of custom quantize strategy JSON file.
custom_objects
A Dict object, mapping names (strings) to custom classes or functions.
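
For example, a quantizer for QAT with a custom layer might be constructed as follows; float_model and MyCustomLayer are assumed to be defined as in the earlier examples.

from tensorflow_model_optimization.quantization.keras import vitis_quantize

quantizer = vitis_quantize.VitisQuantizer(
    float_model,
    quantize_strategy='8bit_tqt',
    custom_objects={'MyCustomLayer': MyCustomLayer})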

vitis_quantize.VitisQuantizer.quantize_model

This function performs the post-training quantization (PTQ) of the float model, including model optimization, weights quantization, and activation quantize calibration.


vitis_quantize.VitisQuantizer.quantize_model(
    calib_dataset=None,
    calib_batch_size=None,
    calib_steps=None,
    verbose=0,
    fold_conv_bn=True,
    fold_bn=True,
    replace_sigmoid=True,
    replace_relu6=True,
    include_cle=True,
    cle_steps=10,
    forced_cle=False,
    include_fast_ft=False,
    fast_ft_epochs=10)

Arguments

calib_dataset
A tf.data.Dataset, keras.utils.Sequence, or numpy array object, the representative dataset for calibration. You can use all or part of eval_dataset, train_dataset, or other datasets as calib_dataset.
calib_steps
An int object, the total number of steps for calibration. Ignored with the default value of None. If "calib_dataset" is a tf.data dataset, generator, or keras.utils.Sequence instance and steps is None, calibration will run until the dataset is exhausted. This argument is not supported with array inputs.
calib_batch_size
An int object, the number of samples per batch for calibration. If calib_dataset is in the form of a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If calib_dataset is a numpy array, the default batch size is 32.
fold_conv_bn
A bool object, whether to fold the batch norm layers into previous Conv2D/DepthwiseConv2D/TransposeConv2D/Dense layers.
fold_bn
A bool object, whether to convert the standalone batch norm layers into DepthwiseConv2D layers.
replace_sigmoid
A bool object, whether to replace the Activation(activation='sigmoid') layers with hard sigmoid layers and quantize them. If not, the sigmoid layers are left unquantized and scheduled on the CPU.
replace_relu6
A bool object, whether to replace the ReLU6 layers with ReLU layers.
include_cle
A bool object, whether to do Cross-Layer Equalization before quantization.
cle_steps
An int object, the iteration steps to do Cross-Layer Equalization.
forced_cle
A bool object, whether to do forced Cross-Layer Equalization for ReLU6 layers.
include_fast_ft
A bool object, whether to do fast fine-tuning. Fast fine-tuning adjusts the weights layer by layer with the calibration dataset and may get better accuracy for some models. It is disabled by default. It takes longer than normal PTQ but is still much shorter than QAT because the calib_dataset is much smaller than the training dataset. Turn it on if you encounter accuracy issues (see the sketch after this list).
fast_ft_epochs
An int object, the iteration epochs to do fast fine-tuning for each layer.
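
A sketch of a PTQ call that exercises the optional arguments; quantizer and calib_dataset are assumed from the earlier examples, and the values are illustrative.

quantized_model = quantizer.quantize_model(
    calib_dataset=calib_dataset,
    calib_steps=100,
    calib_batch_size=10,
    include_cle=True,
    cle_steps=10,
    include_fast_ft=True,
    fast_ft_epochs=10)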

vitis_quantize.VitisQuantizer.dump_model

This function dumps the simulation results of the quantized model, including weights and activation results.


vitis_quantize.VitisQuantizer.dump_model(
    model,
    dataset=None,
    output_dir='./dump_results',
    dump_float=False,
    weights_only=False)

Arguments

model
A tf.keras.Model object, the quantized model to dump.
dataset
A tf.data.Dataset, keras.utils.Sequence, or numpy array object, the dataset used for dumping. It is not needed if weights_only is set to True.
output_dir
A string object, the directory to save the dump results.
weights_only
A bool object. Set to True to dump only the weights; set to False to also dump the activation results.
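
For example, to dump only the weights, no dataset is required; the vitis_quantize import and quantized_model are assumed from the earlier examples.

vitis_quantize.VitisQuantizer.dump_model(
    quantized_model,
    output_dir='./dump_results',
    weights_only=True)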

vitis_quantize.VitisQuantizer.get_qat_model

This function converts the float model into a quantize-aware model ready for QAT.


vitis_quantize.VitisQuantizer.get_qat_model(
    init_quant=False,
    calib_dataset=None,
    calib_batch_size=None,
    calib_steps=None,
    train_with_bn=False,
    freeze_bn_delay=-1,
    replace_sigmoid=True,
    replace_relu6=True,
    include_cle=True,
    cle_steps=10,
    forced_cle=False)

Arguments

init_quant
A bool object, whether to run an initial quantization before QAT. Running an initial PTQ quantization yields an improved initial state for the quantizer parameters, especially for the 8bit_tqt strategy; otherwise, the training may not converge.
calib_dataset
A tf.data.Dataset, keras.utils.Sequence, or numpy array object, the representative dataset for calibration. It must be set when init_quant is set to True. You can use all or part of eval_dataset, train_dataset, or other datasets as calib_dataset.
calib_steps
An int object, the total number of steps for initial PTQ. Ignored with the default value of None. If "calib_dataset" is a tf.data dataset, generator or keras.utils.Sequence instance and steps is None, calibration will run until the dataset is exhausted. This argument is not supported with array inputs.
calib_batch_size
An int object, the number of samples per batch for initial PTQ. If calib_dataset is in the form of a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If calib_dataset is a numpy array, the default batch size is 32.
train_with_bn
A bool object, whether to keep bn layers during QAT.
freeze_bn_delay
An int object, the train steps before freezing the bn parameters. Default value is -1, which means never do bn freezing.
replace_sigmoid
A bool object, whether to replace the Activation(activation='sigmoid') layers with hard sigmoid layers and quantize them. If not, the sigmoid layers are left unquantized and scheduled on the CPU.
replace_relu6
A bool object, whether to replace the ReLU6 layers with ReLU layers.
include_cle
A bool object, whether to do Cross-Layer Equalization before quantization.
cle_steps
An int object, the iteration steps to do Cross-Layer Equalization.
forced_cle
A bool object, whether to do forced Cross-Layer Equalization for ReLU6 layers.
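
A sketch of a get_qat_model call with initial quantization and batch norm kept during training; quantizer and calib_dataset are assumed from the earlier examples, and the values are illustrative.

qat_model = quantizer.get_qat_model(
    init_quant=True,
    calib_dataset=calib_dataset,
    calib_steps=100,
    calib_batch_size=10,
    train_with_bn=True,
    freeze_bn_delay=1000)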

vitis_quantize.VitisQuantizer.get_deploy_model

This function converts the QAT model and generates the deployable model. The results can be fed into the vai_c_tensorflow compiler.

vitis_quantize.VitisQuantizer.get_deploy_model(model)

Arguments

model
A tf.keras.Model object, the QAT model to deploy.
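
For example, after QAT finishes, the deployable model can be generated and saved as follows; quantizer and qat_model are assumed from the QAT workflow above.

quantized_model = quantizer.get_deploy_model(qat_model)
quantized_model.save('quantized_model.h5')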

Examples

Quantize

from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantizer = vitis_quantize.VitisQuantizer(model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset)

Evaluate the Quantized Model

quantized_model.compile(loss=your_loss, metrics=your_metrics)
quantized_model.evaluate(eval_dataset)

Load the Quantized Model

from tensorflow_model_optimization.quantization.keras import vitis_quantize
with vitis_quantize.quantize_scope():
    model = keras.models.load_model('./quantized_model.h5')

Dump the Quantized Model

from tensorflow_model_optimization.quantization.keras import vitis_quantize
with vitis_quantize.quantize_scope():
    quantized_model = keras.models.load_model('./quantized_model.h5')
    vitis_quantize.VitisQuantizer.dump_model(quantized_model, dump_dataset)