Caffe Version - vai_p_caffe
Creating a Configuration File
Most vai_p_caffe
tasks require a configuration file as an input
argument. A typical configuration file is shown below:
workspace: "examples/decent_p/"
gpu: "0,1,2,3"
test_iter: 100
acc_name: "top-1"
model: "examples/decent_p/float.prototxt"
weights: "examples/decent_p/float.caffemodel"
solver: "examples/decent_p/solver.prototxt"
rate: 0.1
pruner {
method: REGULAR
}
The definitions of the terms used are:
- workspace
- Directory for saving temporary and output files.
- gpu
- IDs of the GPU devices to use for acceleration, separated by ','.
- test_iter
- The number of iterations to use in a test phase. A larger value improves the analysis results but increases the run time. The maximum useful value is the size of the validation dataset divided by the batch size; at that value, all data in the validation dataset is used for testing.
- acc_name
- The accuracy measure used to determine the "goodness" of the model.
- model
- The model definition protocol buffer text file. If there are two separate model definition files used in training and testing, merge them into a single file.
- weights
- The model weights to be pruned.
- solver
- The solver definition protocol buffer text file used for finetuning.
- rate
- Sets the amount by which the number of computations is reduced relative to the baseline model. For example, with a setting of 0.1, the tool attempts to reduce the number of multiply-add operations by 10% relative to the baseline model.
- method
- The pruning method to use. Currently, REGULAR is the only valid value.
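As a sanity check on these values, the relationship between test_iter, the validation set, and the rate target can be sketched in Python (the function names and example numbers here are illustrative, not part of the tool):

```python
import math

def max_test_iter(val_set_size, batch_size):
    """Largest useful test_iter: enough test iterations to cover the
    whole validation set once (size / batch_size, rounded up)."""
    return math.ceil(val_set_size / batch_size)

def target_macs(baseline_macs, rate):
    """Operations the pruner aims to keep: rate is the fraction of
    multiply-adds removed relative to the baseline model."""
    return baseline_macs * (1.0 - rate)

# e.g. 50,000 validation images with the test batch size set to 50
print(max_test_iter(50_000, 50))        # 1000
# rate: 0.1 keeps ~90% of the baseline multiply-adds
print(target_macs(1_000_000_000, 0.1))  # 900000000.0
```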
Performing Model Analysis
This is the first stage of the pruning process. This task attempts to find a suitable pruning strategy. Create a suitable configuration file named config.prototxt, as described in the previous section, and execute the following command:
$ ./vai_p_caffe ana -config config.prototxt
Starting Pruning Loop
Pruning can begin after the analysis task has completed. The prune command uses the same configuration file:
$ ./vai_p_caffe prune -config config.prototxt
vai_p_caffe prunes the model using the rate parameter specified in the configuration file. Upon completion, the tool generates a report that includes the accuracy, the number of weights, and the required number of operations before and after pruning. The following figure shows a sample report.
A file named final.prototxt, which describes the pruned network, is generated in the workspace.
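The before/after counts in the report can be turned into reduction percentages with a small sketch (the field names and numbers below are hypothetical, not the tool's actual report format):

```python
def pruning_summary(before, after):
    """Percentage reduction in weights and operations, computed from
    before/after counts like those in the pruning report."""
    return {
        key: 100.0 * (before[key] - after[key]) / before[key]
        for key in ("weights", "ops")
    }

# A hypothetical run at rate 0.1: ~10% fewer multiply-adds
print(pruning_summary({"weights": 6_000_000, "ops": 1_000_000_000},
                      {"weights": 4_800_000, "ops": 900_000_000}))
# {'weights': 20.0, 'ops': 10.0}
```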
Finetuning the Pruned Model
Run the following command to recover the accuracy loss from pruning:
$ ./vai_p_caffe finetune -config config.prototxt
Finetuning a pruned model is essentially the same as training the model from scratch, although solver parameters such as the initial learning rate and the learning rate decay type may differ. A pruning iteration consists of the prune and finetune tasks executed sequentially. In general, several pruning iterations are needed to achieve a large weight reduction without significant accuracy loss.
The configuration file needs to be modified after every pruning iteration:
- Increase the rate parameter relative to the baseline model.
- Modify the weights parameter to the best model obtained in the previous finetuning step.
A modified configuration file is shown below:
workspace: "examples/decent_p/"
gpu: "0,1,2,3"
test_iter: 100
acc_name: "top-1"
model: "examples/decent_p/float.prototxt"
#weights: "examples/decent_p/float.caffemodel"
weights: "examples/decent_p/regular_rate_0.1/_iter_10000.caffemodel"
solver: "examples/decent_p/solver.prototxt"
# change rate from 0.1 to 0.2
#rate: 0.1
rate: 0.2
pruner {
method: REGULAR
}
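Rather than editing the file by hand before each iteration, the rate and weights fields can be rewritten programmatically. A minimal Python sketch, assuming the flat key/value layout of the example above (the helper name is ours, not part of vai_p_caffe):

```python
import re

def update_config(text, new_rate, new_weights):
    """Rewrite the rate and weights fields of a vai_p_caffe config.
    Hypothetical helper: assumes the flat `key: value` layout shown
    above; commented-out lines (starting with '#') are left alone."""
    text = re.sub(r'^weights:\s*".*"', f'weights: "{new_weights}"',
                  text, flags=re.M)
    text = re.sub(r'^rate:\s*\S+', f'rate: {new_rate}', text, flags=re.M)
    return text

cfg = 'rate: 0.1\nweights: "examples/decent_p/float.caffemodel"\n'
print(update_config(
    cfg, 0.2,
    "examples/decent_p/regular_rate_0.1/_iter_10000.caffemodel"))
```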
Generating the Final Model
After a few pruning iterations, a model with fewer weights is generated. The following transformation step is required to finalize the model:
$ ./vai_p_caffe transform -model float.prototxt -weights finetuned_model.caffemodel
If you do not specify an output file name, a default file named transformed.caffemodel is generated. The corresponding model file is the final.prototxt generated by the prune command.
To get the FLOPs of a model, you can use the stat command:
$ ./vai_p_caffe stat -model final.prototxt
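As a rough, independent cross-check of such a statistic, the FLOPs of a single standard convolution layer can be computed by hand. A sketch assuming one multiply-add counts as two FLOPs (a common convention; the tool's exact counting rule may differ, and the layer sizes are illustrative):

```python
def conv_flops(h_out, w_out, c_in, c_out, k_h, k_w):
    """FLOPs of one standard convolution layer: each output element
    needs c_in * k_h * k_w multiply-adds, counted here as 2 FLOPs."""
    macs = h_out * w_out * c_out * c_in * k_h * k_w
    return 2 * macs

# Hypothetical layer: 3x3 kernel, 64 -> 128 channels, 56x56 output map
print(conv_flops(56, 56, 64, 128, 3, 3))  # 462422016
```

Pruning output channels reduces c_out for this layer and c_in for the next one, which is why FLOP savings compound across consecutive layers.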
vai_p_caffe Usage
The following arguments are available when running vai_p_caffe:
Argument | Attribute | Default | Description |
---|---|---|---|
ana | | | |
config | required | “” | The configuration file path. |
prune | | | |
config | required | “” | The configuration file path. |
finetune | | | |
config | required | “” | The configuration file path. |
transform | | | |
model | required | “” | Baseline model definition protocol buffer text file. |
weights | required | “” | Model weights file path. |
output | optional | “” | Path for the output transformed weights. |
The following parameters can be set in the configuration file:

Argument | Type | Attribute | Default | Description |
---|---|---|---|---|
workspace | string | required | None | Directory for saving output files. |
gpu | string | optional | “0” | GPU device IDs used for compression and fine-tuning, separated by ‘,’. |
test_iter | int | optional | 100 | The number of iterations to run in test phase. |
acc_name | string | required | None | The accuracy measure of interest. This parameter is the layer_top of the layer used to evaluate network performance. If the network has multiple evaluation metrics, choose the one that you think is most important. For classification tasks, this parameter may be top-1 or top-5 accuracy; for detection tasks, this parameter is generally mAP; for segmentation tasks, typically the layer for calculating mIOU is set here. |
model | string | required | None | The model definition protocol buffer text file. If there are two different model definition files for training and testing, it is recommended to merge them into a single file. |
weights | string | required | None | The trained weights to compress. |
solver | string | required | None | The solver definition protocol buffer text file. |
rate | float | optional | None | The expected model pruning ratio. |
method | enum | optional | REGULAR | Pruning method to be used. Currently REGULAR is the only valid value. |
ssd_ap_version | string | optional | None | The ap_version setting for SSD network compression. Must be one of 11point, MaxIntegral, or Integral. |
exclude | repeated | optional | None | Used to exclude some layers from pruning. You can use this parameter to prevent specified convolutional layers from being pruned. |
kernel_batch | int | optional | 2 | The number of output channels is a multiple of this value after pruning. |
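The effect of kernel_batch can be illustrated by constraining a pruned channel count to a multiple of that value. A sketch assuming round-up (the exact rounding rule the tool applies is not documented here; the function name is ours):

```python
def round_channels(pruned_channels, kernel_batch=2):
    """Round a pruned output-channel count up to the nearest multiple
    of kernel_batch (illustrative; the tool's rule may differ)."""
    return ((pruned_channels + kernel_batch - 1) // kernel_batch) * kernel_batch

print(round_channels(37))     # 38 (default kernel_batch of 2)
print(round_channels(37, 8))  # 40
```

Keeping channel counts at a multiple of kernel_batch helps the pruned layers map efficiently onto hardware that processes channels in fixed-size groups.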