Neural Networks

Fastinference offers limited support for Deep Learning and Neural Network architectures. The current focus is on feed-forward MLPs and ConvNets in the context of small, embedded systems and FPGAs, but we are always open to enhancing our support for new Deep Learning architectures.

Important: ONNX is the open standard for machine learning interoperability and is supported by all major Deep Learning frameworks. However, the ONNX format is still under development and a given deep architecture can often be represented by various computational graphs, so the standard is sometimes ambiguous. This implementation has been tested with PyTorch, and the resulting files have been visualized with Netron. For exporting a neural net we usually use

dummy_x = torch.randn(1, x_train.shape[1], requires_grad=False)
torch.onnx.export(
    model, dummy_x, os.path.join(out_path, name),
    training=torch.onnx.TrainingMode.PRESERVE,
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
)

Some notes on Binarized Neural Networks

Binarized Neural Networks (BNNs) are Neural Networks with weights constrained to {-1,+1} so that the forward pass of the entire network can be executed via Boolean operations (usually XNOR + popcount). A typical structure of these networks is as follows:

Input -> Linear / Conv -> BatchNorm -> Step -> … -> Linear / Conv -> BatchNorm -> Step -> Output

where the Linear / Conv layers only have “binary” weights and biases in {-1,+1} and the step function is the Heaviside function. BNNs are usually not supported by the major frameworks out of the box, but require additional libraries as well as some tweaks to the ONNX format. For example, Larq offers binarization for Keras / TensorFlow and Brevitas enables binarization for PyTorch. Alternatively, we can directly implement binarization as shown in the example below. Unfortunately, ONNX does not support the custom operators from these libraries, so we have to sanitize the model before exporting. In fastinference we simply replace each binary layer, e.g. BinaryLinear, with its regular counterpart, e.g. torch.nn.Linear. Moreover, PyTorch cannot yet export the Heaviside function to an ONNX file, hence we mimic this function with a series of “Constant -> Greater -> Constant -> Constant -> Where” operations, which fastinference then parses and merges back into a Step layer. For a complete example check out train_mlp.py or train_cnn.py.
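
To make the Boolean forward pass concrete, here is a minimal sketch (not part of fastinference) of a binary dot product via XNOR and popcount. It assumes the {-1,+1} entries are packed into integer bit masks where a set bit encodes +1: for x, w in {-1,+1}^d the dot product equals d - 2 * popcount(x XOR w), since matching bits contribute +1 and differing bits contribute -1.

def binary_dot(x_bits, w_bits, d):
    # Popcount of the XOR mask = number of positions where x and w differ;
    # matching positions (the XNOR) each contribute +1, differing ones -1.
    differing = bin((x_bits ^ w_bits) & ((1 << d) - 1)).count("1")
    return d - 2 * differing

# Example: x = (+1, -1, +1) and w = (+1, +1, +1) gives +1 - 1 + 1 = 1
print(binary_dot(0b101, 0b111, 3))  # prints 1

The binarization layers used in the complete example can be implemented directly in PyTorch as follows: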

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Function

class BinarizeF(Function):
    @staticmethod
    def forward(ctx, input):
        # Sign function mapping inputs to {-1, +1}, e.g.
        # binarize(torch.tensor([0.3, -0.7, 0.0])) -> tensor([ 1., -1., -1.])
        output = input.new(input.size())
        output[input > 0] = 1
        output[input <= 0] = -1
        return output

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient through unchanged,
        # since the sign function itself has zero gradient almost everywhere.
        grad_input = grad_output.clone()
        return grad_input

binarize = BinarizeF.apply

class BinaryLinear(nn.Linear):
    def forward(self, input):
        # Binarize the full-precision weights (and bias, if present) on the fly.
        binary_weight = binarize(self.weight)
        if self.bias is None:
            return F.linear(input, binary_weight)
        binary_bias = binarize(self.bias)
        return F.linear(input, binary_weight, binary_bias)

class BinaryTanh(nn.Module):
    def __init__(self, *args, **kwargs):
        super(BinaryTanh, self).__init__()
        self.hardtanh = nn.Hardtanh(*args, **kwargs)

    def forward(self, input):
        # Hardtanh clips to [-1, +1]; binarize then snaps the result to exactly
        # {-1, +1}, so together they act as a step function with a usable gradient.
        output = self.hardtanh(input)
        output = binarize(output)
        return output


class SimpleMLP(nn.Module):

    def __init__(self, input_dim, n_classes):
        super().__init__()
        self.layer_1 = BinaryLinear(input_dim, 128)
        self.bn_1 = nn.BatchNorm1d(128)
        self.activation_1 = BinaryTanh()
        self.layer_2 = BinaryLinear(128, 256)
        self.bn_2 = nn.BatchNorm1d(256)
        self.activation_2 = BinaryTanh()
        self.layer_3 = BinaryLinear(256, n_classes)

    def forward(self, x):
        x = self.layer_1(x)
        x = self.bn_1(x)
        x = self.activation_1(x)
        x = self.layer_2(x)
        x = self.bn_2(x)
        x = self.activation_2(x)
        x = self.layer_3(x)
        x = torch.log_softmax(x, dim=1)

        return x

def sanatize_onnx(model):
    # Replace the custom binary layers with their regular counterparts so that
    # the model can be exported to ONNX without custom operators.

    # Usually we would use https://pytorch.org/docs/stable/generated/torch.heaviside.html
    # for exporting here, but this is not yet supported for ONNX export. The Where
    # construct below is exported as "Constant -> Greater -> Constant -> Constant -> Where",
    # which fastinference merges back into a Step layer.
    class Sign(nn.Module):
        def forward(self, input):
            return torch.where(input > 0, torch.tensor([1.0]), torch.tensor([-1.0]))

    for name, m in reversed(model._modules.items()):
        print("Checking {}".format(name))

        if isinstance(m, BinaryLinear):
            print("Replacing {}".format(name))
            # Note: nn.Linear always has a bias attribute (possibly None), so we
            # check for None instead of using hasattr.
            layer_new = nn.Linear(m.in_features, m.out_features, m.bias is not None)
            if m.bias is not None:
                layer_new.bias.data = binarize(m.bias.data)
            layer_new.weight.data = binarize(m.weight.data)
            model._modules[name] = layer_new

        if isinstance(m, BinaryTanh):
            model._modules[name] = Sign()

    return model

model = SimpleMLP(input_dim, n_classes)
# ... train the model ...
model = sanatize_onnx(model)
dummy_x = torch.randn(1, input_dim, requires_grad=False)
torch.onnx.export(
    model, dummy_x, os.path.join(out_path, name),
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
)
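
After exporting, it can be useful to sanity-check the ONNX file by running it directly through onnxruntime, which is also what fastinference uses internally for predictions. A small sketch of such a check, reusing the variables from the snippet above:

import numpy as np
import onnxruntime as ort

# Run the exported model on a random input and check the output shape.
session = ort.InferenceSession(os.path.join(out_path, name))
x = np.random.randn(1, input_dim).astype(np.float32)
pred = session.run(['output'], {'input': x})
print(pred[0].shape)  # expected: (1, n_classes)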

Available optimizations

fastinference.optimizers.neuralnet.merge_nodes.optimize(model, **kwargs)

Merges subsequent BatchNorm and Step layers into a new Step layer with adapted thresholds in a single pass. Currently there is no recursive merging applied.

TODO: Perform merging recursively.
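
To see why this merge is possible: BatchNorm computes y = gamma * (x - mu) / sqrt(var + eps) + beta, so a subsequent step on y > 0 is equivalent to comparing x against a fixed threshold (with the comparison direction flipping when gamma is negative). A minimal sketch of the threshold computation (the function name is illustrative, not the optimizer's actual code):

import numpy as np

def merged_threshold(gamma, beta, mu, var, eps=1e-5):
    # BatchNorm: y = gamma * (x - mu) / sqrt(var + eps) + beta
    # Step(y > 0) becomes x > t for gamma > 0 (and x < t for gamma < 0)
    return mu - beta * np.sqrt(var + eps) / gamma

print(merged_threshold(gamma=2.0, beta=1.0, mu=0.5, var=1.0))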

Parameters

model (NeuralNet) – The NeuralNet model.

Returns

The NeuralNet model with merged layers.

Return type

NeuralNet

fastinference.optimizers.neuralnet.remove_nodes.optimize(model, **kwargs)

Removes LogSoftmax and positive scaling (Mul) layers from the network because they do not change the prediction.
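
This is safe because both operations are monotone and therefore preserve the argmax over the outputs; a quick illustration:

import numpy as np

logits = np.array([0.2, 1.5, -0.3])
log_softmax = logits - np.log(np.exp(logits).sum())     # LogSoftmax
assert np.argmax(log_softmax) == np.argmax(logits)      # same predicted class
assert np.argmax(3.0 * logits) == np.argmax(logits)     # positive Mul: same class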

Parameters

model (NeuralNet) – The NeuralNet model.

Returns

The NeuralNet model with removed layers.

Return type

NeuralNet

The NeuralNet object

class fastinference.models.nn.NeuralNet.NeuralNet(path_to_onnx, accuracy=None, name='model')

A (simplified) neural network model. This class currently supports feed-forward multi-layer perceptrons as well as feed-forward ConvNets. In detail, the following operations are supported:

  • Linear Layer

  • Convolutional Layer

  • Sigmoid Activation

  • ReLU Activation

  • LeakyRelu Activation

  • MaxPool

  • AveragePool

  • LogSoftmax

  • Multiplication with a constant (Mul)

  • Reshape

  • BatchNormalization

All layers are stored in self.layer, which is already ordered for execution. Additionally, the original onnx_model is stored in self.onnx_model.

This class loads ONNX files to build the internal computation graph. This can sometimes become a little tricky since the ONNX exporters work differently for each framework / version. In PyTorch we usually use

dummy_x = torch.randn(1, x_train.shape[1], requires_grad=False)
torch.onnx.export(
    model, dummy_x, os.path.join(out_path, name),
    training=torch.onnx.TrainingMode.PRESERVE,
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
)

Important: This class automatically merges “Constant -> Greater -> Constant -> Constant -> Where” operations into a single Step layer. This is specifically designed to parse Binarized Neural Networks, but might be wrong for some types of networks.

__init__(path_to_onnx, accuracy=None, name='model')

Constructor of NeuralNet.

Parameters
  • path_to_onnx (str) – Path to the ONNX file.

  • accuracy (float, optional) – The accuracy of this model on some test data. Can be used to verify the correctness of the implementation. Defaults to None.

  • name (str, optional) – The name of this model. Defaults to “model”.

predict_proba(X)

Applies this NeuralNet to the given data and provides the predicted probabilities for each example in X. This function internally calls onnxruntime.InferenceSession for inference.

Parameters

X (numpy.array) – A (N,d) matrix where N is the number of data points and d is the feature dimension. If X has only one dimension then a single example is assumed and X is reshaped via X = X.reshape(1,X.shape[0])

Returns

A (N, c) prediction matrix where N is the number of data points and c is the number of classes.

Return type

numpy.array
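
A small usage sketch (the file name, accuracy value, and data below are placeholders):

import numpy as np
from fastinference.models.nn.NeuralNet import NeuralNet

net = NeuralNet("model.onnx", accuracy=0.92, name="simple_mlp")
x_test = np.random.randn(8, 16).astype(np.float32)  # (N, d) dummy data
proba = net.predict_proba(x_test)                   # (N, c) probabilities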

fastinference.models.nn.NeuralNet.layer_from_node(graph, node, input_shape)

Constructs the appropriate layer from the given graph and node.

Parameters
  • graph – The onnx graph.

  • node – The current node.

  • input_shape (tuple) – The input shape of the current node.

Raises

NotImplementedError – Raised if there is no implementation available for the current node.

Returns

The newly constructed layer.

Return type

Layer
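
The dispatch is driven by the node's ONNX op type. The following sketch only illustrates the contract described above; the set of op types and the helper function are illustrative, not the actual implementation:

def check_supported(node):
    # node.op_type is the standard field on an ONNX NodeProto
    supported = {"Gemm", "Conv", "Sigmoid", "Relu", "LeakyRelu", "MaxPool",
                 "AveragePool", "LogSoftmax", "Mul", "Reshape", "BatchNormalization"}
    if node.op_type not in supported:
        raise NotImplementedError(
            "No implementation available for node of type {}".format(node.op_type))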