TensorFlow Version - vai_p_tensorflow
Exporting an Inference Graph
TensorFlow Model
First, build a TensorFlow graph for training and evaluation. Each part must be written in a separate script. If you have trained a baseline model before and already have the training code, you only need to prepare the code for evaluation.
The evaluation script must contain a function named model_fn that creates all the needed nodes from input to output. The function should return a dictionary that maps the names of output nodes to their operations, or a tf.estimator.Estimator. For example, if your network is an image classifier, the returned dictionary usually includes operations to calculate top-1 and top-5 accuracy, as shown in the following snippet:
def model_fn():
  # Graph definition code here.
  # ...
  return {
      'top-1': slim.metrics.streaming_accuracy(predictions, labels),
      'top-5': slim.metrics.streaming_recall_at_k(logits, org_labels, 5)
  }
Alternatively, if you use the TensorFlow Estimator API to train and evaluate your network, your model_fn must return an instance of tf.estimator.Estimator. In this case, you also need to provide a function called eval_input_fn, which the Estimator uses to get the data used in the evaluation.
def cnn_model_fn(features, labels, mode):
  # Code for building the graph here.
  # ...
  eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

def model_fn():
  return tf.estimator.Estimator(
      model_fn=cnn_model_fn, model_dir="./models/train/")

mnist = tf.contrib.learn.datasets.load_dataset("mnist")
train_data = mnist.train.images  # Returns np.array
train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
eval_data = mnist.test.images  # Returns np.array
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

def eval_input_fn():
  return tf.estimator.inputs.numpy_input_fn(
      x={"x": eval_data},
      y=eval_labels,
      num_epochs=1,
      shuffle=False)
The evaluation code is used to export an inference GraphDef file and to evaluate network performance during pruning.
To export a GraphDef proto file, use the following code:
import tensorflow as tf
from google.protobuf import text_format
from tensorflow.python.platform import gfile

with tf.Graph().as_default() as graph:
  # Your graph definition here.
  # ...
  graph_def = graph.as_graph_def()
  with gfile.GFile('inference_graph.pbtxt', 'w') as f:
    f.write(text_format.MessageToString(graph_def))
Keras Model
For a Keras model, there is no explicit graph definition. You must get a GraphDef object first and then export it. An example using the tf.keras pre-defined ResNet50 is given here:
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.python.framework import graph_util

tf.keras.backend.set_learning_phase(0)
model = tf.keras.applications.ResNet50(weights=None,
                                       include_top=True,
                                       input_tensor=None,
                                       input_shape=None,
                                       pooling=None,
                                       classes=1000)
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy())
graph_def = K.get_session().graph.as_graph_def()
# "probs/Softmax" is the output node of the ResNet50 graph.
graph_def = graph_util.extract_sub_graph(graph_def, ["probs/Softmax"])
tf.train.write_graph(graph_def,
                     "./",
                     "inference_graph.pbtxt",
                     as_text=True)
Preparing a Baseline Model
TensorFlow Model
TensorFlow saves variables in binary checkpoint files that map variable names to tensor values. vai_p_tensorflow takes a checkpoint file as input to load the trained weights. The tf.train.Saver class provides methods to specify the paths of the checkpoint files to write to or read from. The following snippet calls the tf.train.Saver.save method to save variables to checkpoint files:
with tf.Session() as sess:
  # Your graph building and training code here.
  # ...
  saver = tf.train.Saver()  # Create a Saver after the graph is built.
  sess.run(train_op)
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print("Model saved in path: %s" % save_path)
The saved checkpoint files look like this:
model.ckpt.data-00000-of-00001
model.ckpt.index
model.ckpt.meta
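The same class can also read weights back. A minimal restore sketch, assuming the same graph has been rebuilt in the current session:

with tf.Session() as sess:
  # Rebuild the same graph here before restoring.
  # ...
  saver = tf.train.Saver()
  # Restore the trained variables from the checkpoint prefix.
  saver.restore(sess, "/tmp/model.ckpt")
  print("Model restored from: /tmp/model.ckpt")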
Keras Model
tf.keras allows model weights to be saved in two formats: HDF5 and the TensorFlow checkpoint format. Currently, only the TensorFlow format is supported by the tool. If the model weights have been saved in HDF5, you have to convert them to the TensorFlow format.
import tensorflow as tf

tf.keras.backend.set_learning_phase(0)
model = tf.keras.applications.ResNet50(weights="imagenet",
                                       include_top=True,
                                       input_tensor=None,
                                       input_shape=None,
                                       pooling=None,
                                       classes=1000)
model.save_weights("model.ckpt", save_format='tf')
The converted checkpoint files look like this:
model.ckpt.data-00000-of-00002
model.ckpt.data-00001-of-00002
model.ckpt.index
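If your weights were instead saved to an HDF5 file, a minimal conversion sketch follows; the filename model.h5 is an assumption for illustration:

import tensorflow as tf

tf.keras.backend.set_learning_phase(0)
model = tf.keras.applications.ResNet50(weights=None,
                                       include_top=True,
                                       classes=1000)
# "model.h5" is a hypothetical HDF5 weights file saved earlier.
model.load_weights("model.h5")
# Re-save the same weights in the TensorFlow checkpoint format.
model.save_weights("model.ckpt", save_format='tf')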
Performing Model Analysis
Before pruning, you need to analyze the model. The main purpose of this analysis is to find a suitable pruning strategy for the model.
To run model analysis, you need to provide a Python script containing the functions that evaluate model performance. Assuming that your script is named eval_model.py, you must provide the required functions in one of three ways:
- A function named model_fn() that returns a Python dict of metric ops:

  def model_fn():
      tf.logging.set_verbosity(tf.logging.INFO)
      img, labels = get_one_shot_test_data(TEST_BATCH)
      logits = net_fn(img, is_training=False)
      predictions = tf.argmax(logits, 1)
      labels = tf.argmax(labels, 1)
      eval_metric_ops = {
          'accuracy': tf.metrics.accuracy(labels, predictions),
          'recall_5': tf.metrics.recall_at_k(labels, logits, 5)
      }
      return eval_metric_ops
- A function named model_fn() that returns an instance of tf.estimator.Estimator, plus a function named eval_input_fn() that feeds test data to the estimator:

  def model_fn():
      return tf.estimator.Estimator(
          model_fn=cnn_model_fn, model_dir="./models/train/")

  def eval_input_fn():
      return tf.estimator.inputs.numpy_input_fn(
          x={"x": eval_data},
          y=eval_labels,
          num_epochs=1,
          shuffle=False)
- A function named evaluate() that takes a checkpoint path as its single argument and returns the metric score:

  def evaluate(checkpoint_path):
      with tf.Graph().as_default():
          net = ConvNet(False)
          net.build(test_only=True)
          score = net.evaluate(checkpoint_path)
          return score
If you are using the tf.keras API, this is the recommended way:
import tensorflow as tf

def evaluate(checkpoint_path):
  net = tf.keras.applications.ResNet50(weights=None,
                                       include_top=True,
                                       input_tensor=None,
                                       input_shape=None,
                                       pooling=None,
                                       classes=1000)
  net.load_weights(checkpoint_path)
  metric_top_5 = tf.keras.metrics.SparseTopKCategoricalAccuracy()
  accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
  loss = tf.keras.losses.SparseCategoricalCrossentropy()
  # Compile so that the metrics defined above are reported by evaluate().
  net.compile(loss=loss, metrics=[accuracy, metric_top_5])
  # eval_data: validation dataset. Refer to the 'tf.keras.Model.evaluate'
  # method for how to prepare your validation dataset.
  # EVAL_NUM: the number of samples in the validation dataset.
  res = net.evaluate(eval_data,
                     steps=EVAL_NUM // batch_size,
                     workers=16,
                     verbose=1)
  eval_metric_ops = {'Recall_5': res[-1]}
  return eval_metric_ops
If you write the script in the first way, the following snippet shows how to call vai_p_tensorflow to perform model analysis:
vai_p_tensorflow \
  --action=ana \
  --input_graph=inference_graph.pbtxt \
  --input_ckpt=model.ckpt \
  --eval_fn_path=eval_model.py \
  --target="recall_5" \
  --max_num_batches=500 \
  --workspace=/tmp \
  --exclude="conv node names to be excluded from pruning" \
  --output_nodes="output node names of the network"
Following are the arguments in this command. See vai_p_tensorflow Usage for a full list of options.
- --action: The action to perform.
- --input_graph: A GraphDef proto file that represents the inference graph of the network.
- --input_ckpt: The path to a checkpoint to use for pruning.
- --eval_fn_path: The path to a Python script defining an evaluation graph.
- --target: The target score that evaluates the performance of the network. If the network produces more than one score, choose the one that is most important.
- --max_num_batches: The number of batches to run in the evaluation phase. This parameter affects the time taken to analyze the model: the larger the value, the more time the analysis takes and the more accurate it is. The maximum useful value is the size of the validation set divided by the batch_size, at which point all the data in the validation set is used for evaluation.
- --workspace: Directory for saving output files.
- --exclude: Convolution nodes excluded from pruning.
- --output_nodes: Output nodes of the inference graph.
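As a concrete illustration, for the tf.keras ResNet50 exported earlier, the placeholders might be filled in as follows. The conv1/Conv2D node name is an illustrative assumption; actual node names depend on your graph, while probs/Softmax matches the output node used in the export example above:

vai_p_tensorflow \
  --action=ana \
  --input_graph=inference_graph.pbtxt \
  --input_ckpt=model.ckpt \
  --eval_fn_path=eval_model.py \
  --target="Recall_5" \
  --max_num_batches=500 \
  --workspace=/tmp \
  --exclude="conv1/Conv2D" \
  --output_nodes="probs/Softmax"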
Starting Pruning Loop
Once the ana command has finished, you can start pruning the model. The prune command is very similar to the ana command and takes mostly the same arguments:
vai_p_tensorflow \
  --action=prune \
  --input_graph=inference_graph.pbtxt \
  --input_ckpt=model.ckpt \
  --output_graph=sparse_graph.pbtxt \
  --output_ckpt=sparse.ckpt \
  --workspace=/home/deephi/tf_models/research/slim \
  --sparsity=0.1 \
  --exclude="conv node names to be excluded from pruning" \
  --output_nodes="output node names of the network"
There is one new argument in this command:
- --sparsity: The sparsity of the network after pruning. It is a value between 0 and 1: the larger the value, the sparser the model after pruning.
When the prune command finishes, vai_p_tensorflow outputs the FLOPs of the network before and after pruning.
Finetuning the Pruned Model
The performance of the pruned model declines to a certain degree, so you need to fine-tune it to recover accuracy. Fine-tuning a pruned model is basically the same as training a model from scratch, except that the hyper-parameters, such as the initial learning rate and the learning rate decay type, are different.
When pruning and fine-tuning are done, one iteration of pruning is complete. In general, to achieve a higher pruning rate without significant loss of performance, the model needs to be pruned several times. After every "prune-finetune" iteration, you need to make two changes to the commands before running the next pruning, as shown in the example after this list:

- Modify the --input_ckpt flag to point to a checkpoint file generated in the previous fine-tuning process.
- Increase the value of the --sparsity flag to prune more in the next iteration.
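For example, a second pruning iteration might look like the following sketch; the checkpoint name model.ckpt-20000 and the sparsity value 0.2 are illustrative assumptions:

vai_p_tensorflow \
  --action=prune \
  --input_graph=inference_graph.pbtxt \
  --input_ckpt=model.ckpt-20000 \
  --output_graph=sparse_graph.pbtxt \
  --output_ckpt=sparse.ckpt \
  --workspace=/tmp \
  --sparsity=0.2 \
  --exclude="conv node names to be excluded from pruning" \
  --output_nodes="output node names of the network"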
Generating Dense Checkpoints
After a few iterations of pruning, you have a model that is smaller than its original size. To get the final model, perform a transformation that generates a dense checkpoint:
vai_p_tensorflow \
  --action=transform \
  --input_ckpt=model.ckpt-10000 \
  --output_ckpt=dense.ckpt
Transformation is only required after all iterations of pruning are completed. Do not run the transform command between each iteration of pruning.
Freezing the Graph
Now you have a GraphDef file containing the architecture of the pruned model and a checkpoint file containing the trained weights. For prediction or quantization, merge these two files into a single pb file. Freeze the graph using the following command:
freeze_graph \
  --input_graph=sparse_graph.pbtxt \
  --input_checkpoint=dense.ckpt \
  --input_binary=false \
  --output_graph=frozen.pb \
  --output_node_names="vgg_16/fc8/squeezed"
You can then use the flops action to count the FLOPs of the frozen model:

vai_p_tensorflow \
  --action=flops \
  --input_graph=frozen.pb \
  --input_nodes=input \
  --input_node_shapes=1,224,224,3 \
  --output_nodes=vgg_16/fc8/squeezed
vai_p_tensorflow Usage
The following arguments are available when running vai_p_tensorflow:
| Argument | Type | Action | Default | Description |
|---|---|---|---|---|
| action | string | - | "" | Which action to run. Valid actions include 'ana', 'prune', 'transform', and 'flops'. |
| workspace | string | ['ana', 'prune'] | "" | Directory for saving output files. |
| input_graph | string | ['ana', 'prune', 'flops'] | "" | Path of a GraphDef protobuf file that defines the network's architecture. |
| input_ckpt | string | ['ana', 'prune', 'transform'] | "" | Path of a checkpoint file. It is the prefix of filenames created for the checkpoint. |
| eval_fn_path | string | ['ana'] | "" | Path of a Python file used for model evaluation. |
| target | string | ['ana'] | "" | The output node name that indicates the performance of the model. |
| max_num_batches | int | ['ana'] | None | Maximum number of batches to evaluate. By default, all batches are used. |
| output_graph | string | ['prune'] | "" | Path of a GraphDef protobuf file for saving the pruned network. |
| output_ckpt | string | ['prune', 'transform'] | "" | Path of a checkpoint file for saving weights. |
| gpu | string | ['ana'] | "" | GPU device IDs to use, separated by ','. |
| sparsity | float | ['prune'] | None | The desired sparsity of the network after pruning. |
| exclude | repeated | ['ana', 'prune'] | None | Convolution nodes excluded from pruning. |
| input_nodes | repeated | ['flops'] | None | Input nodes of the inference graph. |
| input_node_shapes | repeated | ['flops'] | None | Shapes of the input nodes. |
| output_nodes | repeated | ['ana', 'prune', 'flops'] | None | Output nodes of the inference graph. |
| channel_batch | int | ['prune'] | 2 | The number of output channels is a multiple of this value after pruning. |