Caffe Version - vai_p_caffe

Creating a Configuration File

Most vai_p_caffe tasks require a configuration file as an input argument. A typical configuration file is shown below:

workspace: "examples/decent_p/"
gpu: "0,1,2,3"
test_iter: 100
acc_name: "top-1"
 
model: "examples/decent_p/float.prototxt"
weights: "examples/decent_p/float.caffemodel"
solver: "examples/decent_p/solver.prototxt"
 
rate: 0.1
 
pruner {
  method: REGULAR
}

The terms used in the configuration file are defined as follows:

workspace
Directory for saving temporary and output files.
gpu
Use the given GPU device IDs, separated by ',', for acceleration.
test_iter
The number of iterations to use in the test phase. A larger value improves the analysis results but increases the run time. The maximum useful value is the size of the validation dataset divided by the batch size, at which point all data in the validation dataset is used for testing.
acc_name
The accuracy measure used to determine the "goodness" of the model.
model
The model definition protocol buffer text file. If there are two separate model definition files used in training and testing, merge them into a single file.
weights
The model weights to be pruned.
solver
The solver definition protocol buffer text file used for finetuning.
rate
The weight reduction parameter sets the amount by which the number of computations is reduced relative to the baseline model. For example, with a setting of "0.1," the tool attempts to reduce the number of multiply-add operations by 10% relative to the baseline model.
method
The pruning method to use. Currently, REGULAR is the only valid value.
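To make the semantics of the rate parameter concrete, the target operation count can be computed with a small sketch (the function name and figures here are illustrative, not part of vai_p_caffe):

```python
def target_operations(baseline_ops: int, rate: float) -> int:
    """Number of multiply-add operations the tool aims for after
    pruning at the given weight-reduction rate."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    return int(baseline_ops * (1.0 - rate))

# A baseline model with 2 GMACs pruned at rate 0.1 targets 1.8 GMACs:
print(target_operations(2_000_000_000, 0.1))  # 1800000000
```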

Performing Model Analysis

This is the first stage of the pruning process. The analysis task attempts to find a suitable pruning strategy. Create a configuration file named config.prototxt, as described in the previous section, and execute the following command:

$ ./vai_p_caffe ana -config config.prototxt
Figure 1: Model Analysis

Starting Pruning Loop

Pruning can begin after the analysis task has completed. The prune command uses the same configuration file:

$ ./vai_p_caffe prune -config config.prototxt

vai_p_caffe prunes the model using the rate parameter specified in the configuration file. Upon completion, the tool generates a report that includes the accuracy, the number of weights, and the required number of operations before and after pruning. The following figure shows a sample report.

Figure 2: Pruning Report

A file named final.prototxt, which describes the pruned network, is generated in the workspace.

Finetuning the Pruned Model

Run the following command to recover the accuracy loss from pruning:

$ ./vai_p_caffe finetune -config config.prototxt

Finetuning a pruned model is essentially the same as training the model from scratch, although solver parameters such as the initial learning rate and the learning rate decay policy may differ. A pruning iteration consists of the prune and finetune tasks executed sequentially. In general, several pruning iterations are required to achieve a greater weight reduction without significant accuracy loss.

The configuration file needs to be modified after every pruning iteration:

  1. Increase the rate parameter. The rate is always specified relative to the baseline model, so raising it from 0.1 to 0.2 targets a further 10% reduction.
  2. Point the weights parameter at the best model obtained in the previous finetuning step.

A modified configuration file is shown below:

workspace: "examples/decent_p/"
gpu: "0,1,2,3"
test_iter: 100
acc_name: "top-1"
 
model: "examples/decent_p/float.prototxt"
 
#weights: "examples/decent_p/float.caffemodel"
weights: "examples/decent_p/regular_rate_0.1/_iter_10000.caffemodel"
 
solver: "examples/decent_p/solver.prototxt"
 
# change rate from 0.1 to 0.2
#rate: 0.1
rate: 0.2
pruner {
  method: REGULAR
}
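The per-iteration edits above can be automated. The sketch below generates one configuration file per pruning iteration, raising rate and chaining weights to the previous iteration's finetuned snapshot. The snapshot path pattern is an assumption based on the example above; adjust it to wherever your finetuning step actually saves models.

```python
# Template mirroring the configuration file shown above; {weights} and
# {rate} are filled in per iteration.
TEMPLATE = """workspace: "examples/decent_p/"
gpu: "0,1,2,3"
test_iter: 100
acc_name: "top-1"
model: "examples/decent_p/float.prototxt"
weights: "{weights}"
solver: "examples/decent_p/solver.prototxt"
rate: {rate}
pruner {{
  method: REGULAR
}}
"""

def make_configs(rates, initial_weights):
    """Return (filename, contents) pairs for each pruning iteration."""
    configs = []
    weights = initial_weights
    for i, rate in enumerate(rates, start=1):
        configs.append((f"config_iter{i}.prototxt",
                        TEMPLATE.format(weights=weights, rate=rate)))
        # Assumed location of the best finetuned snapshot (illustrative):
        weights = f"examples/decent_p/regular_rate_{rate}/_iter_10000.caffemodel"
    return configs

for name, _ in make_configs([0.1, 0.2, 0.3],
                            "examples/decent_p/float.caffemodel"):
    print(name)
```

Each generated file can then be passed to `./vai_p_caffe prune -config ...` followed by `./vai_p_caffe finetune -config ...`.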

Generating the Final Model

After a few pruning iterations, a model with fewer weights is generated. The following transformation step is required to finalize the model:

$ ./vai_p_caffe transform -model float.prototxt -weights finetuned_model.caffemodel

If no output file name is specified, a default file named transformed.caffemodel is generated. The corresponding model definition is the final.prototxt file generated by the prune command.

To get the FLOPs of a model, you can use the stat command:

$ ./vai_p_caffe stat -model final.prototxt
IMPORTANT: The transformation should only be executed after all pruning iterations have been completed.
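An explicit output name can also be given via the optional output argument listed in Table 1. The flag form below is assumed to follow the same style as -model and -weights, and the file names are illustrative:

```shell
# Write the transformed weights to a chosen file instead of the
# default transformed.caffemodel (file names are illustrative):
./vai_p_caffe transform -model float.prototxt \
    -weights finetuned_model.caffemodel \
    -output pruned.caffemodel
```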

vai_p_caffe Usage

The following arguments are available when running vai_p_caffe:

Table 1. vai_p_caffe Arguments

Command    Argument  Attribute  Default  Description
ana        config    required   ""       The configuration file path.
prune      config    required   ""       The configuration file path.
finetune   config    required   ""       The configuration file path.
transform  model     required   ""       Baseline model definition protocol buffer text file.
transform  weights   required   ""       Model weights file path.
transform  output    optional   ""       The output transformed weights file path.
Table 2. vai_p_caffe Configuration File Parameters

workspace (string, required)
  Directory for saving output files.
gpu (string, optional, default "0")
  GPU device IDs used for compression and fine-tuning, separated by ','.
test_iter (int, optional, default 100)
  The number of iterations to run in the test phase.
acc_name (string, required)
  The accuracy measure of interest. This parameter is the layer_top of the layer used to evaluate network performance. If the network has multiple evaluation metrics, choose the one that you consider most important. For classification tasks, this is typically top-1 or top-5 accuracy; for detection tasks, it is generally mAP; for segmentation tasks, it is typically the layer that calculates mIOU.
model (string, required)
  The model definition protocol buffer text file. If there are two different model definition files for training and testing, it is recommended to merge them into a single file.
weights (string, required)
  The trained weights to compress.
solver (string, required)
  The solver definition protocol buffer text file.
rate (float, optional)
  The expected model pruning ratio.
method (enum, optional, default REGULAR)
  The pruning method to use. Currently, REGULAR is the only valid value.
ssd_ap_version (string, optional)
  The ap_version setting for SSD network compression. Must be one of 11point, MaxIntegral, or Integral.
exclude (repeated, optional)
  Excludes the specified layers from pruning. Use this parameter to prevent specified convolutional layers from being pruned.
kernel_batch (int, optional, default 2)
  After pruning, the number of output channels of each pruned layer is a multiple of this value.
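A configuration exercising the optional parameters from Table 2 might look like the fragment below. The placement of these fields and the exact syntax of exclude are assumptions inferred from the table; consult the documentation for your release for the exact schema.

```prototxt
# Illustrative configuration using optional parameters from Table 2.
# Field placement and the exclude syntax are assumptions.
workspace: "examples/decent_p/"
gpu: "0,1"
test_iter: 100
acc_name: "top-1"
model: "examples/decent_p/float.prototxt"
weights: "examples/decent_p/float.caffemodel"
solver: "examples/decent_p/solver.prototxt"
rate: 0.1
pruner {
  method: REGULAR
}
# Keep the first convolution layer untouched (layer name is illustrative):
exclude: "conv1"
# Pruned output-channel counts stay multiples of this value:
kernel_batch: 2
# ssd_ap_version: "11point"  # only needed for SSD network compression
```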