- `GeLU`
  - Add logic to replace a set of primitive OPs with `GeLU` to improve overall model processing efficiency.
    - https://www.tensorflow.org/api_docs/python/tf/keras/activations/gelu
    - https://www.tensorflow.org/api_docs/python/tf/nn/gelu
  - Automatically detect the range over which the primitive OP combination forming `GeLU` is valid, and replace it with a single `GeLU` OP.
  - Add `GeLU` as a supported value of the `-rtpo`, `--replace_to_pseudo_operators` option.
  - If the `-rtpo`, `--replace_to_pseudo_operators` option is set to `GeLU`, the internal processing of `GeLU` is replaced by an `approximate` calculation, which speeds up the operation in exchange for a small loss of accuracy.
  - `GeLU` by itself now infers 28 times faster than before.
    - [gelu_11_float32_primitive.tflite.zip](https://github.com/PINTO0309/onnx2tf/files/12466479/gelu_11_float32_primitive.tflite.zip)
    - [gelu_11_float32_approximate_false.tflite.zip](https://github.com/PINTO0309/onnx2tf/files/12466481/gelu_11_float32_approximate_false.tflite.zip)
    - [gelu_11_float32_approximate_true.tflite.zip](https://github.com/PINTO0309/onnx2tf/files/12466482/gelu_11_float32_approximate_true.tflite.zip)
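The detect-and-replace step described above can be pictured with a toy pattern matcher. This is a hypothetical sketch, not onnx2tf's implementation: it assumes the primitive combination is the `Div(sqrt(2)) -> Erf -> Add(1) -> Mul(x) -> Mul(0.5)` chain that exporters commonly emit for exact GeLU, and it represents the graph as a flat list of op-type strings. The names `GELU_PATTERN` and `fuse_gelu` are illustrative only.

```python
from typing import List

# Hypothetical op-type sequence assumed to form an exact GeLU:
# y = (x * (Erf(x / sqrt(2)) + 1)) * 0.5
GELU_PATTERN: List[str] = ["Div", "Erf", "Add", "Mul", "Mul"]


def fuse_gelu(ops: List[str]) -> List[str]:
    """Replace every non-overlapping run of GELU_PATTERN with a single "Gelu".

    Simplification: only op types are compared. A real converter would also
    check the constants (sqrt(2), 1, 0.5) and that Div and the first Mul
    consume the same input tensor x.
    """
    fused: List[str] = []
    n = len(GELU_PATTERN)
    i = 0
    while i < len(ops):
        if ops[i:i + n] == GELU_PATTERN:
            fused.append("Gelu")
            i += n  # skip all matched primitive ops
        else:
            fused.append(ops[i])
            i += 1
    return fused


if __name__ == "__main__":
    graph = ["Conv", "Div", "Erf", "Add", "Mul", "Mul", "MatMul"]
    print(fuse_gelu(graph))  # ['Conv', 'Gelu', 'MatMul']
```

Collapsing five kernels into one is where the efficiency gain comes from: fewer OPs to dispatch and fewer intermediate tensors to materialize.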
```python
import time

import numpy as np
import tensorflow as tf

np.random.seed(0)
data = np.random.randn(1,512,512,3).astype(np.float32)
loop = 10

# Primitive OPs (no GeLU fusion)
interpreter = tf.lite.Interpreter(
    model_path="gelu_11_float32_primitive.tflite",
    num_threads=20
)
tf_lite_model = interpreter.get_signature_runner()
inputs = {
    'input': data,
}
# warmup
_ = tf_lite_model(**inputs)
# test
total = 0.0
for i in range(loop):
    start_time = time.perf_counter()
    _ = tf_lite_model(**inputs)
    elapsed_time = time.perf_counter() - start_time
    total += elapsed_time
print(f"[TFLite] Primitive inf time: {total / loop}")

# Fused GeLU, approximate=False
interpreter = tf.lite.Interpreter(
    model_path="gelu_11_float32_approximate_false.tflite",
    num_threads=20
)
tf_lite_model = interpreter.get_signature_runner()
inputs = {
    'input': data,
}
# warmup
_ = tf_lite_model(**inputs)
# test
total = 0.0
for i in range(loop):
    start_time = time.perf_counter()
    _ = tf_lite_model(**inputs)
    elapsed_time = time.perf_counter() - start_time
    total += elapsed_time
print(f"[TFLite] Disable approximate inf time: {total / loop}")

# Fused GeLU, approximate=True
interpreter = tf.lite.Interpreter(
    model_path="gelu_11_float32_approximate_true.tflite",
    num_threads=20
)
tf_lite_model = interpreter.get_signature_runner()
inputs = {
    'input': data,
}
# warmup
_ = tf_lite_model(**inputs)
# test
total = 0.0
for i in range(loop):
    start_time = time.perf_counter()
    _ = tf_lite_model(**inputs)
    elapsed_time = time.perf_counter() - start_time
    total += elapsed_time
print(f"[TFLite] Enable approximate inf time: {total / loop}")
```

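The benchmark above measures the speed side of the `approximate` trade-off; the accuracy side can be estimated numerically. The sketch below assumes (not confirmed by this release note) that the `approximate` mode corresponds to the tanh formula used by `tf.nn.gelu(approximate=True)`, and that the primitive chain is the `Div -> Erf -> Add -> Mul -> Mul` form. The function names are illustrative, and only the Python standard library is used so it runs without TensorFlow.

```python
import math

SQRT_2 = math.sqrt(2.0)
TANH_COEF = math.sqrt(2.0 / math.pi)


def gelu_primitive(x: float) -> float:
    # Assumed primitive chain: Div(sqrt(2)) -> Erf -> Add(1) -> Mul(x) -> Mul(0.5)
    return (x * (math.erf(x / SQRT_2) + 1.0)) * 0.5


def gelu_exact(x: float) -> float:
    # Exact GeLU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / SQRT_2))


def gelu_tanh(x: float) -> float:
    # tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(TANH_COEF * (x + 0.044715 * x ** 3)))


def max_abs_diff(f, g, lo=-6.0, hi=6.0, steps=1200):
    # Largest |f(x) - g(x)| over an evenly spaced grid on [lo, hi]
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(f(x) - g(x)) for x in xs)


if __name__ == "__main__":
    print(f"primitive chain vs exact: {max_abs_diff(gelu_primitive, gelu_exact):.2e}")
    print(f"tanh approximation vs exact: {max_abs_diff(gelu_tanh, gelu_exact):.2e}")
```

The primitive chain and the closed form agree up to floating-point rounding, so fusing them into a single `GeLU` does not by itself change results; the tanh form deviates by a small nonzero amount. The real deviation in a converted model also depends on the float32 kernels, so it is worth checking against the `.tflite` outputs directly.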
- Results
[gelu.onnx.zip](https://github.com/PINTO0309/onnx2tf/files/12450575/gelu.onnx.zip)

[Figure: `onnx` vs. `tflite` model comparison]
- [[TODO] Implemented forced replacement of GeLU processing with standard OP. #465](https://github.com/PINTO0309/onnx2tf/issues/465)
## What's Changed
* Add logic to replace a set of primitive OPs with `GeLU`s to improve overall model processing efficiency by PINTO0309 in https://github.com/PINTO0309/onnx2tf/pull/466
**Full Changelog**: https://github.com/PINTO0309/onnx2tf/compare/1.15.18...1.16.0