Caffe Version - vai_p_caffe

Creating a Configuration File

Most vai_p_caffe tasks require a configuration file as an input argument. A typical configuration file is shown below:

workspace: "examples/decent_p/"
gpu: "0,1,2,3"
test_iter: 100
acc_name: "top-1"
 
model: "examples/decent_p/float.prototxt"
weights: "examples/decent_p/float.caffemodel"
solver: "examples/decent_p/solver.prototxt"
 
rate: 0.1
 
pruner {
  method: REGULAR
}

The terms used in the configuration file are defined as follows:

workspace
Directory for saving temporary and output files.
gpu
Use the given GPU device IDs, separated by ',', for acceleration.
test_iter
The number of iterations to use in the test phase. A larger value improves the analysis results but increases the run time. The maximum useful value is the size of the validation dataset divided by the batch size, at which point all data in the validation dataset is used for testing.
acc_name
The accuracy measure used to determine the "goodness" of the model.
model
The model definition protocol buffer text file. If there are two separate model definition files used in training and testing, merge them into a single file.
weights
The model weights to be pruned.
solver
The solver definition protocol buffer text file used for finetuning.
rate
The weight reduction parameter sets the amount by which the number of computations is reduced relative to the baseline model. For example, with a setting of "0.1," the tool attempts to reduce the number of multiply-add operations by 10% relative to the baseline model.
method
The pruning method to use. Currently, REGULAR is the only valid value.
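To make the semantics of the rate parameter concrete, the target operation count can be computed with a small sketch (the function name and figures here are illustrative, not part of vai_p_caffe):

```python
def target_operations(baseline_ops: int, rate: float) -> int:
    """Number of multiply-add operations the tool aims for after
    pruning at the given weight-reduction rate."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    return int(baseline_ops * (1.0 - rate))

# A baseline model with 2 GMACs pruned at rate 0.1 targets 1.8 GMACs:
print(target_operations(2_000_000_000, 0.1))  # 1800000000
```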

Performing Model Analysis

This is the first stage of the pruning process. The analysis task attempts to find a suitable pruning strategy. Create a configuration file named config.prototxt, as described in the previous section, and execute the following command:

$ ./vai_p_caffe ana -config config.prototxt
Figure 1: Model Analysis

Starting Pruning Loop

Pruning can begin after the analysis task has completed. The prune command uses the same configuration file:

$ ./vai_p_caffe prune -config config.prototxt

vai_p_caffe prunes the model using the rate parameter specified in the configuration file. Upon completion, the tool generates a report that includes the accuracy, the number of weights, and the required number of operations before and after pruning. The following figure shows a sample report.

Figure 2: Pruning Report

A file named final.prototxt, which describes the pruned network, is generated in the workspace.

Finetuning the Pruned Model

Run the following command to recover the accuracy loss from pruning:

$ ./vai_p_caffe finetune -config config.prototxt

Finetuning a pruned model is essentially the same as training the model from scratch, although solver parameters such as the initial learning rate and the learning rate decay policy may differ. A pruning iteration consists of the prune and finetune tasks executed sequentially. In general, several pruning iterations are required to achieve a greater weight reduction without significant accuracy loss.

The configuration file needs to be modified after every pruning iteration:

  1. Increase the rate parameter. The rate is always specified relative to the baseline model, so raising it from 0.1 to 0.2 targets a further 10% reduction.
  2. Point the weights parameter at the best model obtained in the previous finetuning step.

A modified configuration file is shown below:

workspace: "examples/decent_p/"
gpu: "0,1,2,3"
test_iter: 100
acc_name: "top-1"
 
model: "examples/decent_p/float.prototxt"
 
#weights: "examples/decent_p/float.caffemodel"
weights: "examples/decent_p/regular_rate_0.1/_iter_10000.caffemodel"
 
solver: "examples/decent_p/solver.prototxt"
 
# change rate from 0.1 to 0.2
#rate: 0.1
rate: 0.2
pruner {
  method: REGULAR
}
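The per-iteration edits above can be automated. The sketch below generates one configuration file per pruning iteration, raising rate and chaining weights to the previous iteration's finetuned snapshot. The snapshot path pattern is an assumption based on the example above; adjust it to wherever your finetuning step actually saves models.

```python
# Template mirroring the configuration file shown above; {weights} and
# {rate} are filled in per iteration.
TEMPLATE = """workspace: "examples/decent_p/"
gpu: "0,1,2,3"
test_iter: 100
acc_name: "top-1"
model: "examples/decent_p/float.prototxt"
weights: "{weights}"
solver: "examples/decent_p/solver.prototxt"
rate: {rate}
pruner {{
  method: REGULAR
}}
"""

def make_configs(rates, initial_weights):
    """Return (filename, contents) pairs for each pruning iteration."""
    configs = []
    weights = initial_weights
    for i, rate in enumerate(rates, start=1):
        configs.append((f"config_iter{i}.prototxt",
                        TEMPLATE.format(weights=weights, rate=rate)))
        # Assumed location of the best finetuned snapshot (illustrative):
        weights = f"examples/decent_p/regular_rate_{rate}/_iter_10000.caffemodel"
    return configs

for name, _ in make_configs([0.1, 0.2, 0.3],
                            "examples/decent_p/float.caffemodel"):
    print(name)
```

Each generated file can then be passed to `./vai_p_caffe prune -config ...` followed by `./vai_p_caffe finetune -config ...`.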

Generating the Final Model

After a few pruning iterations, a model with fewer weights is generated. The following transformation step is required to finalize the model:

$ ./vai_p_caffe transform -model float.prototxt -weights finetuned_model.caffemodel

If no output file name is specified, a default file named transformed.caffemodel is generated. The corresponding model definition is the final.prototxt file generated by the prune command.

To get the FLOPs of a model, you can use the stat command:

$ ./vai_p_caffe stat -model final.prototxt
IMPORTANT: The transformation should only be executed after all pruning iterations have been completed.
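An explicit output name can also be given via the optional output argument listed in Table 1. The flag form below is assumed to follow the same style as -model and -weights, and the file names are illustrative:

```shell
# Write the transformed weights to a chosen file instead of the
# default transformed.caffemodel (file names are illustrative):
./vai_p_caffe transform -model float.prototxt \
    -weights finetuned_model.caffemodel \
    -output pruned.caffemodel
```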

vai_p_caffe Usage

The following arguments are available when running vai_p_caffe:

Table 1. vai_p_caffe Arguments

Command    Argument  Attribute  Default  Description
ana        config    required   ""       The configuration file path.
prune      config    required   ""       The configuration file path.
finetune   config    required   ""       The configuration file path.
transform  model     required   ""       Baseline model definition protocol buffer text file.
transform  weights   required   ""       Model weights file path.
transform  output    optional   ""       The output transformed weights file path.
Table 2. vai_p_caffe Configuration File Parameters

workspace (string, required)
  Directory for saving output files.
gpu (string, optional, default "0")
  GPU device IDs used for compression and fine-tuning, separated by ','.
test_iter (int, optional, default 100)
  The number of iterations to run in the test phase.
acc_name (string, required)
  The accuracy measure of interest. This parameter is the layer_top of the layer used to evaluate network performance. If the network has multiple evaluation metrics, choose the one that you consider most important. For classification tasks, this is typically top-1 or top-5 accuracy; for detection tasks, it is generally mAP; for segmentation tasks, it is typically the layer that calculates mIOU.
model (string, required)
  The model definition protocol buffer text file. If there are two different model definition files for training and testing, it is recommended to merge them into a single file.
weights (string, required)
  The trained weights to compress.
solver (string, required)
  The solver definition protocol buffer text file.
rate (float, optional)
  The expected model pruning ratio.
method (enum, optional, default REGULAR)
  The pruning method to use. Currently, REGULAR is the only valid value.
ssd_ap_version (string, optional)
  The ap_version setting for SSD network compression. Must be one of 11point, MaxIntegral, or Integral.
exclude (repeated, optional)
  Excludes the specified layers from pruning. Use this parameter to prevent specified convolutional layers from being pruned.
kernel_batch (int, optional, default 2)
  After pruning, the number of output channels of each pruned layer is a multiple of this value.
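A configuration exercising the optional parameters from Table 2 might look like the fragment below. The placement of these fields and the exact syntax of exclude are assumptions inferred from the table; consult the documentation for your release for the exact schema.

```prototxt
# Illustrative configuration using optional parameters from Table 2.
# Field placement and the exclude syntax are assumptions.
workspace: "examples/decent_p/"
gpu: "0,1"
test_iter: 100
acc_name: "top-1"
model: "examples/decent_p/float.prototxt"
weights: "examples/decent_p/float.caffemodel"
solver: "examples/decent_p/solver.prototxt"
rate: 0.1
pruner {
  method: REGULAR
}
# Keep the first convolution layer untouched (layer name is illustrative):
exclude: "conv1"
# Pruned output-channel counts stay multiples of this value:
kernel_batch: 2
# ssd_ap_version: "11point"  # only needed for SSD network compression
```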