- `GeLU`
- Add logic that replaces a combination of primitive OPs with a single `GeLU` OP to improve overall model processing efficiency.
https://www.tensorflow.org/api_docs/python/tf/keras/activations/gelu
https://www.tensorflow.org/api_docs/python/tf/nn/gelu
- Automatically detect the range of OPs that matches the following combination and replace it with a single `GeLU` OP.
- Add `GeLU` as a supported value of the `-rtpo`, `--replace_to_pseudo_operators` option.
- If the `-rtpo`, `--replace_to_pseudo_operators` option is set to `GeLU`, the internal processing of `GeLU` is replaced by an `approximate` calculation, which speeds up inference at the cost of a small loss of accuracy.
- As a standalone OP, `GeLU` can now run inference 28 times faster than before.
[gelu_11_float32_primitive.tflite.zip](https://github.com/PINTO0309/onnx2tf/files/12466479/gelu_11_float32_primitive.tflite.zip)
[gelu_11_float32_approximate_false.tflite.zip](https://github.com/PINTO0309/onnx2tf/files/12466481/gelu_11_float32_approximate_false.tflite.zip)
[gelu_11_float32_approximate_true.tflite.zip](https://github.com/PINTO0309/onnx2tf/files/12466482/gelu_11_float32_approximate_true.tflite.zip)
```python
import time

import numpy as np
import tensorflow as tf

np.random.seed(0)
data = np.random.randn(1, 512, 512, 3).astype(np.float32)
loop = 10


def benchmark(model_path: str, label: str) -> None:
    """Print the average TFLite inference time of one model over `loop` runs."""
    interpreter = tf.lite.Interpreter(
        model_path=model_path,
        num_threads=20,
    )
    tf_lite_model = interpreter.get_signature_runner()
    inputs = {
        'input': data,
    }
    # warmup
    _ = tf_lite_model(**inputs)
    # test
    total = 0.0
    for _ in range(loop):
        start_time = time.perf_counter()
        _ = tf_lite_model(**inputs)
        total += time.perf_counter() - start_time
    print(f"[TFLite] {label} inf time: {total / loop}")


benchmark("gelu_11_float32_primitive.tflite", "Primitive")
benchmark("gelu_11_float32_approximate_false.tflite", "Disable approximate")
benchmark("gelu_11_float32_approximate_true.tflite", "Enable approximate")
```
![image](https://github.com/PINTO0309/onnx2tf/assets/33194443/79d7fe8e-7b54-4952-b785-46d3f129aba0)
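The accuracy side of the `approximate` trade-off can be checked without TFLite. The sketch below assumes that the approximate path uses the tanh formula the TensorFlow documentation gives for `tf.nn.gelu(approximate=True)`, and compares it with the exact erf-based GeLU on a grid of inputs using only the Python standard library. `gelu_exact` and `gelu_tanh` are illustrative names, not onnx2tf functions.

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GeLU: x * Phi(x), with the standard normal CDF written via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Assumed approximate form: the tanh formula documented for
    # tf.nn.gelu(approximate=True).
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


xs = [i / 100.0 for i in range(-600, 601)]  # grid over [-6, 6]
max_err = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
print(f"max |exact - approximate| on [-6, 6]: {max_err:.2e}")
```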
- Results
[gelu.onnx.zip](https://github.com/PINTO0309/onnx2tf/files/12450575/gelu.onnx.zip)
![image](https://github.com/PINTO0309/onnx2tf/assets/33194443/755cf787-9150-4572-a458-88354ac295f9)
|onnx|tflite|
|:-:|:-:|
|![image](https://github.com/PINTO0309/onnx2tf/assets/33194443/9b2f7a0d-ff76-4e7f-a7e9-60e87cce26b8)|![image](https://github.com/PINTO0309/onnx2tf/assets/33194443/0944f075-6b0c-462a-a5cb-fe0c986f6f8d)|
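Detecting the primitive-OP range and fusing it can be pictured as a pattern match over a node list. This is a toy sketch, not onnx2tf's implementation. `Node`, `find_gelu`, and `replace_with_gelu` are hypothetical names. The sketch assumes the erf-based decomposition that PyTorch's ONNX exporter commonly emits (`Div` by sqrt(2), `Erf`, `Add` 1, `Mul` by x, `Mul` by 0.5). It also assumes the five nodes are adjacent in the list and skips checking the constant values that a real matcher would also verify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical, minimal stand-in for an ONNX node.
    op_type: str
    inputs: List[str]
    output: str


# Assumed op sequence for erf-based GeLU: ((1 + erf(x / sqrt(2))) * x) * 0.5
GELU_PATTERN = ["Div", "Erf", "Add", "Mul", "Mul"]


def find_gelu(nodes: List[Node]) -> Optional[int]:
    """Return the index where a GeLU chain starts, or None if there is none."""
    n = len(GELU_PATTERN)
    for start in range(len(nodes) - n + 1):
        window = nodes[start:start + n]
        if [node.op_type for node in window] != GELU_PATTERN:
            continue
        # Each node must consume the output of the node before it.
        chained = all(window[i - 1].output in window[i].inputs for i in range(1, n))
        # The first Mul must reuse x, the tensor that feeds the Div.
        x = window[0].inputs[0]
        if chained and x in window[3].inputs:
            return start
    return None


def replace_with_gelu(nodes: List[Node]) -> List[Node]:
    """Fuse the first matching chain into a single (hypothetical) "Gelu" node."""
    start = find_gelu(nodes)
    if start is None:
        return nodes
    end = start + len(GELU_PATTERN)
    fused = Node("Gelu", [nodes[start].inputs[0]], nodes[end - 1].output)
    return nodes[:start] + [fused] + nodes[end:]


graph = [
    Node("Div", ["x", "sqrt2"], "d"),
    Node("Erf", ["d"], "e"),
    Node("Add", ["e", "one"], "a"),
    Node("Mul", ["x", "a"], "m"),
    Node("Mul", ["m", "half"], "y"),
]
fused_graph = replace_with_gelu(graph)
print([(node.op_type, node.inputs, node.output) for node in fused_graph])
# [('Gelu', ['x'], 'y')]
```

A real rewrite also has to confirm that intermediate tensors such as `d`, `e`, `a`, and `m` have no other consumers before removing them from the graph.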
- [[TODO] Implemented forced replacement of GeLU processing with standard OP. #465](https://github.com/PINTO0309/onnx2tf/issues/465)
## What's Changed
* Add logic to replace a set of primitive OPs with `GeLU`s to improve overall model processing efficiency by PINTO0309 in https://github.com/PINTO0309/onnx2tf/pull/466
**Full Changelog**: https://github.com/PINTO0309/onnx2tf/compare/1.15.18...1.16.0