Features
- Added support for Qwen2-VL.
- Introduced support for GTE and split embedding layers for BGE/GTE.
- Added `imitate_quant` for use during testing.
- Enabled use of the C++-compiled MNNConvert.
Refactors
- Refactored the implementation of the VL model.
- Updated model path handling for ONNX models.
Bug Fixes
- Resolved issues with `stop_ids` and quantization.
- Fixed a bug related to `block_size = 0`.