Two Ways TensorRT Optimizes the Neural Network Computation Graph
Note: GIE (GPU Inference Engine) is now TensorRT: https://devblogs.nvidia.com/deploying-deep-learning-nvidia-tensorrt/
GIE performs several important transformations and optimizations on the neural network graph. First, layers with unused outputs are eliminated to avoid unnecessary computation. Next, where possible, convolution, bias, and ReLU layers are fused into a single layer. Figure 4 shows the result of this vertical layer fusion on the original network from Figure 3 (fused layers are labeled CBR in Figure 4). Layer fusion improves the efficiency of running GIE-optimized networks on the GPU because the fused layer makes one pass over the data instead of three, avoiding the intermediate tensors and extra kernel launches.
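To make the vertical-fusion idea concrete, here is a minimal NumPy sketch, not the TensorRT API: it treats a 1×1 convolution over a CHW tensor as a per-pixel matrix multiply and checks that a single fused CBR pass produces the same result as running convolution, bias, and ReLU as three separate layers. All shapes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_out, H, W = 8, 16, 4, 4
x = rng.standard_normal((C_in, H * W))       # input feature map, spatial dims flattened
weight = rng.standard_normal((C_out, C_in))  # a 1x1 conv is a matmul over channels
bias = rng.standard_normal((C_out, 1))

# Unfused: three separate "layers", each reading and writing a full tensor.
conv_out = weight @ x
bias_out = conv_out + bias
relu_out = np.maximum(bias_out, 0.0)

# Fused CBR: one pass; on a GPU the intermediates stay in registers
# instead of round-tripping through memory.
fused_out = np.maximum(weight @ x + bias, 0.0)

assert np.allclose(relu_out, fused_out)
```

The fused form is mathematically identical; the win is purely in memory traffic and kernel-launch overhead, which is why the optimizer can apply it without changing the network's outputs.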
Another transformation is horizontal layer fusion, or layer aggregation, along with the required division of the aggregated layer's output back to its respective consumers, as Figure 5 shows. Horizontal layer fusion improves performance by combining layers that take the same source tensor and apply the same operations with similar parameters into a single larger layer for higher computational efficiency. The example in Figure 5 combines three 1×1 CBR layers from Figure 4 that take the same input into a single larger 1×1 CBR layer. Note that the output of this layer must be disaggregated to feed the different subsequent layers from the original graph.
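A minimal NumPy sketch of the same idea, again illustrative rather than the TensorRT API: three 1×1 CBR layers that read the same input are stacked along the output-channel dimension into one larger layer, executed once, and the result is split to feed the original consumers.

```python
import numpy as np

rng = np.random.default_rng(1)
C_in, H, W = 8, 4, 4
x = rng.standard_normal((C_in, H * W))  # the shared source tensor

# Three separate 1x1 CBR layers, each launched on the same input.
out_channels = (4, 8, 16)
weights = [rng.standard_normal((c, C_in)) for c in out_channels]
biases = [rng.standard_normal((c, 1)) for c in out_channels]
separate = [np.maximum(w @ x + b, 0.0) for w, b in zip(weights, biases)]

# Horizontally fused: stack the weights into one larger layer and launch once.
W_big = np.vstack(weights)  # shape (4 + 8 + 16, C_in)
b_big = np.vstack(biases)
fused = np.maximum(W_big @ x + b_big, 0.0)

# Disaggregate: split the fused output back to the original consumers.
split_points = np.cumsum(out_channels)[:-1]  # [4, 12]
fused_parts = np.split(fused, split_points, axis=0)

for sep, fus in zip(separate, fused_parts):
    assert np.allclose(sep, fus)
```

One large matrix multiply typically utilizes the GPU better than three small ones, which is the efficiency the aggregation buys; the final split is cheap bookkeeping compared to the saved launches.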