* 79: update to onnxruntime v1.16.0
* 77: add helpers to benchmark a model
* 74: add a function to enumerate all intermediate results with onnxruntime (a sketch of the idea follows this list)
* 71, 72, 73: add a function to analyse a profile produced by onnxruntime (a sketch of the aggregation follows this list)
* 68, 69, 70: add CPU implementation for CustomGemmFloat8
* 67: add a function to extract a subgraph of a model
* 59, 60, 61, 62, 63, 65, 66, 68, 69, 70: add local functions to quantize into float 8 and float 16
* 57: add a C implementation for DynamicQuantizeLinear (for experimentation; a numpy sketch follows this list)
* 56: add a C implementation to cast a float into float 8 (a Python sketch follows this list)
* 55, 58: add basic functionality to transform a graph, starting with basic quantization
* 51: fix the optimized TreeEnsembleRegressor and add TreeEnsembleClassifier as custom ops
* 50: add command line store to save intermediate outputs
* 49: add option to save intermediate results in CReferenceEvaluator
* 45: add option cuda-link to setup.py to specify how to link with the CUDA library
* 41: implement a custom kernel for RandomForestRegressor that is easier to optimize
* 34: update to onnxruntime v1.15.1
* 31: implement a custom CUDA kernel (gemm)
* 32: update to onnxruntime v1.15.0
* 27: add a custom kernel with parameters to onnxruntime
* 26: add a custom kernel to onnxruntime (a registration sketch follows this list)
* 24: use Eigen to implement the Conv operator
* 23: make pip wheel . work
* 22: rename cmake into _cmake to avoid warnings related to the cmake package
* 19: minimal settings to use onnxruntime
* 14: minimal settings to use CUDA
* 8: support for C++ unit tests
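
The function added in 74 is not reproduced here, but the general idea of enumerating intermediate results with onnxruntime can be sketched by promoting every intermediate tensor to a graph output before creating the session. The model path, input name and shape below are placeholders, and some onnxruntime versions may require full type information on the added outputs.

```python
import numpy as np
import onnx
import onnxruntime as ort

model = onnx.load("model.onnx")  # placeholder model path
existing = {o.name for o in model.graph.output}
for node in model.graph.node:
    for name in node.output:
        if name and name not in existing:
            # Promote the intermediate tensor to a graph output so run() returns it.
            model.graph.output.append(onnx.ValueInfoProto(name=name))
            existing.add(name)

sess = ort.InferenceSession(
    model.SerializeToString(), providers=["CPUExecutionProvider"]
)
feeds = {"X": np.random.rand(1, 4).astype(np.float32)}  # placeholder input name and shape
names = [o.name for o in sess.get_outputs()]
for name, value in zip(names, sess.run(None, feeds)):
    print(name, getattr(value, "shape", type(value)))
```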
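The profile produced by onnxruntime is a JSON file following the chrome-tracing format. Independently of the functions added in 71, 72, 73, it can be aggregated with a few lines of Python; the model path and the input feed are placeholders, and the event keys may vary across onnxruntime versions.

```python
import json
from collections import Counter

import numpy as np
import onnxruntime as ort

so = ort.SessionOptions()
so.enable_profiling = True
sess = ort.InferenceSession("model.onnx", so, providers=["CPUExecutionProvider"])
feeds = {"X": np.random.rand(1, 4).astype(np.float32)}  # placeholder input name and shape
for _ in range(10):
    sess.run(None, feeds)
profile_file = sess.end_profiling()  # writes the JSON trace and returns its name

# Sum the time spent per operator type (durations are in microseconds).
durations = Counter()
with open(profile_file, "r") as f:
    for event in json.load(f):
        if event.get("cat") == "Node":
            durations[event.get("args", {}).get("op_name", "?")] += event.get("dur", 0)
for op_name, duration in durations.most_common():
    print(f"{op_name}: {duration} us")
```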
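The C implementation added in 57 is not shown here, but the formulas defined by the ONNX specification for DynamicQuantizeLinear fit in a few lines of numpy. The fallback used for an all-zero input is an assumption of this sketch.

```python
import numpy as np

def dynamic_quantize_linear(x: np.ndarray):
    """Quantize a float32 tensor to uint8, computing the scale and zero point
    from the data range extended to include zero (ONNX DynamicQuantizeLinear)."""
    qmin, qmax = 0.0, 255.0
    rmin = min(float(x.min()), 0.0)
    rmax = max(float(x.max()), 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # assumption: degenerate all-zero input
    zero_point = np.clip(np.rint(qmin - rmin / scale), qmin, qmax).astype(np.uint8)
    y = np.clip(np.rint(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return y, np.float32(scale), zero_point

y, y_scale, y_zero_point = dynamic_quantize_linear(
    np.array([0.0, 0.5, 1.0, -1.0], dtype=np.float32)
)
```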
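The cast to float 8 added in 56 (E4M3FN: 1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, no infinity) can be illustrated by decoding the 256 possible bytes and picking the nearest finite one. This is a readable sketch, not the C implementation: it saturates out-of-range values and does not reproduce the exact tie-breaking of round-to-nearest-even.

```python
import numpy as np

def e4m3fn_to_float(byte: int) -> float:
    """Decode one E4M3FN byte: 1 sign, 4 exponent, 3 mantissa bits, bias 7, no infinity."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    mant = byte & 0x07
    if exp == 0x0F and mant == 0x07:
        return float("nan")  # 0x7F and 0xFF encode NaN
    if exp == 0:
        return sign * (mant / 8.0) * 2.0 ** -6  # subnormal
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)

# Decode the whole 256-value grid once, then encode by nearest neighbour.
_GRID = np.array([e4m3fn_to_float(b) for b in range(256)], dtype=np.float32)
_FINITE = np.where(np.isnan(_GRID), np.inf, _GRID)  # exclude the NaN codes from the search

def float_to_e4m3fn(x: float) -> int:
    """Encode a float as the closest finite E4M3FN value (saturating, NaN -> 0x7F)."""
    if np.isnan(x):
        return 0x7F
    if np.isinf(x):
        return 0x7E if x > 0 else 0xFE  # saturate to +/-448, the largest finite values
    return int(np.argmin(np.abs(_FINITE - np.float32(x))))

assert e4m3fn_to_float(float_to_e4m3fn(1.0)) == 1.0
assert e4m3fn_to_float(float_to_e4m3fn(1000.0)) == 448.0  # saturation
```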
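Once a custom kernel is compiled into a shared library (26, 27, 31), it can be used from Python through onnxruntime's SessionOptions.register_custom_ops_library. The library and model paths below are placeholders.

```python
import onnxruntime as ort

# Placeholder paths: the shared library holding the compiled custom kernels,
# and a model whose graph uses one of those custom operators.
custom_ops_library = "build/libcustom_ops.so"        # placeholder
model_with_custom_op = "model_with_custom_op.onnx"   # placeholder

so = ort.SessionOptions()
so.register_custom_ops_library(custom_ops_library)
sess = ort.InferenceSession(
    model_with_custom_op, so, providers=["CPUExecutionProvider"]
)
```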