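ncnn model files converted by pnnx are saved in fp16
pnnx links pthread on linux; fix the windows minmax build issue
pnnx adds a cmake option for the static msvc crt
Fix the ncnn conversion of pnnx hardtanh parameters
Fix the pnnx macos dynamic library load path issue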
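
As a rough illustration of what saving weights in fp16 implies for file size and precision, here is a minimal sketch using Python's standard `struct` module. It does not reproduce the actual ncnn model file layout; the helper names `to_fp16_bytes` and `from_fp16_bytes` and the sample weights are hypothetical.

```python
import struct

def to_fp16_bytes(weights):
    # Pack each value as a little-endian IEEE 754 half (2 bytes per value).
    return struct.pack(f"<{len(weights)}e", *weights)

def from_fp16_bytes(buf):
    # Unpack 2-byte halves back into Python floats.
    count = len(buf) // 2
    return list(struct.unpack(f"<{count}e", buf))

# Hypothetical weights: the first three are exactly representable in fp16,
# the last one is not and gets rounded.
weights = [0.5, -1.25, 3.140625, 1e-3]

fp32_blob = struct.pack(f"<{len(weights)}f", *weights)
fp16_blob = to_fp16_bytes(weights)
restored = from_fp16_bytes(fp16_blob)

print(len(fp32_blob), len(fp16_blob))
print(restored)
```

The fp16 blob is half the size of the fp32 one, at the cost of a small rounding error on values that need more than 10 mantissa bits.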
New Contributors
* MouriNaruto made their first contribution in https://github.com/Tencent/ncnn/pull/3591
* YoungSx made their first contribution in https://github.com/Tencent/ncnn/pull/3655
* hariag made their first contribution in https://github.com/Tencent/ncnn/pull/3656
* EdVince made their first contribution in https://github.com/Tencent/ncnn/pull/3667
* mirrorsysu made their first contribution in https://github.com/Tencent/ncnn/pull/3696
* jasonZhang892 made their first contribution in https://github.com/Tencent/ncnn/pull/3710
* UNeedCryDear made their first contribution in https://github.com/Tencent/ncnn/pull/3649
**Full Changelog**: https://github.com/Tencent/ncnn/compare/20220216...20220420
20220216
Build versions, default configuration: android-ndk-r21d, xcode 12.4, ubuntu-18.04, ubuntu-20.04, vs2015, vs2017, vs2019, emscripten-2.0.8
| file | content | arch |
|---|---|---|
|ncnn-full-source.zip | full source code including all submodules | |
|ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
|ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
|ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
|ncnn-ios-vulkan.zip | ios static library, GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
|ncnn-macos.zip | macos static library | x86_64 + arm64 |
|ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
|ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
|ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 |
|ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |
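conv sgemm pack4/pack1to4/pack4to1 x86 sse2/avx optimization
conv3x3s1 winograd pack4/pack4to1 x86 sse2/avx optimization
conv int8 gemm pack8to4/pack8to1/pack1to8 x86 xop/avx2/avx512-vnni/avx-vnni optimization
conv3x3s1 int8 winograd pack8to4/pack8to1 x86 xop/avx2/avx512-vnni/avx-vnni optimization
scale x86 avx optimization (Yoh-Z)
interp x86 avx optimization (Yoh-Z)
conv pack arm neon optimization
x86 avx512 infrastructure
Enable x86 avx512 compilation and runtime detection by default
Decouple x86 fma from avx2
x86 cpu instruction set detection that does not depend on libgcc
Support convolution with dynamic weights
Fix possible illegal instruction errors caused by Mat member functions not being inlined
Fix possible illegal instruction errors caused by function object instances not being inlined
Fix an error in the unit test comparison function (yyuzhong)
binaryop/unaryop/reduction support 4-dimensional input
Add the Tile layer and conversion of torch.repeat
Add the MatMul layer and conversion of torch.matmul
armv8.2 dot is compiled as a runtime-selectable path
Support the sw_64 platform (wzyforgit)
Add a cmake switch for c-api
Add a default mat constructor to c-api (tpoisonooo)
Simplify the binaryop function object code (tpoisonooo)
Fix interp nearest producing wrong results with unconventional scale_factor parameters
Simplify the forward_n parameter types for c-api custom layers
Remove the warning about falling back to sse2 when building without avx2 (kagurazakakotori)
Use _mm_cvtsi128_si64 in 64-bit builds to reduce memory access (kagurazakakotori)
Fix errors in the low-level op api documentation (FeiGeChuanShu)
Fix the missing doffset parameter in the crop test (xh-liu-tech)
Fix int8 weight reordering in arm convolution pack1to4 (cmdbug)
Simplify the platform-specific macros in get_current_time (cmdbug)
Fix wrong results when building for armv7 without neon
Add the c906 v223 toolchain (zchrissirhcz)
Add the answer for the second qq technical discussion group (LJoson)
python ci disables building tools and examples
ci disables LTO for shared library builds
ci updates swiftshader-20220211
Remove travis ci and the related readme entries (proydakov)
Add the yolo-fastest model benchmark (dog-qiuqiu)
Update benchmark data for raspberry pi/jetson-nano and others from Q-engineering
Add zynq-7020/z8350/n5105 to the benchmark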
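pnnx supports converting torch dequantize/quantize_per_tensor/quantized.linearrelu/argmax/argmin/clone/normal/expand/var/amax/amin/logsumexp/prod/sum/arange/matmul/zeros_like/expand_like/deformconv2d/roialign/norm/stack/repeat/zeros/roll/remainder
pnnx automatically removes dropout operators
pnnx automatically removes pad with no pads and noop arithmetic expressions
pnnx constant folding
pnnx converts 4-dimensional constant data
pnnx supports models exported with the half data type
pnnx removes trailing reshape/permute when converting to ncnn
pnnx fuses conv1d-bn and convtranspose1d-bn
pnnx merges a full select over a single dimension into unbind
pnnx ensures operator names are unique
Fix a crash when pnnx-to-ncnn conversion encounters an expression that cannot be expanded
pnnx-to-ncnn conversion supports F.pad with negative pads
pnnx-to-ncnn conversion fuses transpose-matmul
pnnx-to-ncnn conversion adds dimension-raising and dimension-lowering reshape around pooling123d to emulate how nn.MaxPool123d handles data without a batch dimension
The shape command-line argument of pnnx can specify the input type
pnnx automatically locates the pytorch installation directory (Yutyrannus)
pnnx ci automatically copies dll files (Yutyrannus)
Add usage instructions for the pnnx command-line tool (ling0322)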
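
The conv1d-bn fusion listed among the pnnx items above rests on a standard identity: a batch norm applied after a convolution can be folded into the convolution's weights and bias. The following is a minimal pure-Python sketch of that identity, not pnnx's implementation; the function names, the `eps` default, and the sample tensors are all hypothetical.

```python
import math

def conv1d(x, weight, bias):
    # x: [in_ch][length], weight: [out_ch][in_ch][k]; stride 1, no padding.
    k = len(weight[0][0])
    out_len = len(x[0]) - k + 1
    out = []
    for oc in range(len(weight)):
        row = []
        for i in range(out_len):
            acc = bias[oc]
            for ic in range(len(x)):
                for j in range(k):
                    acc += weight[oc][ic][j] * x[ic][i + j]
            row.append(acc)
        out.append(row)
    return out

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    # Per-channel inference-mode batch norm.
    return [[(v - mean[c]) / math.sqrt(var[c] + eps) * gamma[c] + beta[c]
             for v in y[c]] for c in range(len(y))]

def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    # Scale each output channel's weights by gamma / sqrt(var + eps)
    # and fold the mean/beta shift into the bias.
    fused_w, fused_b = [], []
    for oc in range(len(weight)):
        s = gamma[oc] / math.sqrt(var[oc] + eps)
        fused_w.append([[w * s for w in taps] for taps in weight[oc]])
        fused_b.append((bias[oc] - mean[oc]) * s + beta[oc])
    return fused_w, fused_b

# Hypothetical data: 2 input channels, 2 output channels, kernel size 2.
x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
weight = [[[0.2, -0.1], [0.3, 0.4]], [[-0.5, 0.6], [0.1, 0.0]]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.7]

reference = batchnorm(conv1d(x, weight, bias), gamma, beta, mean, var)
fused_w, fused_b = fuse_conv_bn(weight, bias, gamma, beta, mean, var)
fused = conv1d(x, fused_w, fused_b)
max_diff = max(abs(a - b) for ra, rb in zip(reference, fused) for a, b in zip(ra, rb))
print(max_diff)
```

The fused convolution matches the two-step result up to floating-point rounding, which is why a converter can drop the separate batch norm operator at inference time.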
New Contributors
* wzyforgit made their first contribution in https://github.com/Tencent/ncnn/pull/3421
* dog-qiuqiu made their first contribution in https://github.com/Tencent/ncnn/pull/3470
* xh-liu-tech made their first contribution in https://github.com/Tencent/ncnn/pull/3475
* ling0322 made their first contribution in https://github.com/Tencent/ncnn/pull/3487
* kagurazakakotori made their first contribution in https://github.com/Tencent/ncnn/pull/3527
* LJoson made their first contribution in https://github.com/Tencent/ncnn/pull/3532
* Yoh-Z made their first contribution in https://github.com/Tencent/ncnn/pull/3540
* yyuzhong made their first contribution in https://github.com/Tencent/ncnn/pull/3556
**Full Changelog**: https://github.com/Tencent/ncnn/compare/20211208...20220216
20211208
Build versions, default configuration: android-ndk-r21d, xcode 12.4, ubuntu-18.04, ubuntu-20.04, vs2015, vs2017, vs2019, emscripten-2.0.8
| file | content | arch |
|---|---|---|
|ncnn-full-source.zip | full source code including all submodules | |
|ncnn-android.zip | android static/shared library | armeabi-v7a + arm64-v8a + x86 + x86_64 |
|ncnn-android-vulkan.zip | android static/shared library, GPU support | armeabi-v7a + arm64-v8a + x86 + x86_64 |
|ncnn-ios.zip | ios static library, with and w/o bitcode | armv7 + arm64 + arm64e + i386 + x86_64 |
|ncnn-ios-vulkan.zip | ios static library, GPU support, with and w/o bitcode | arm64 + arm64e + x86_64 |
|ncnn-macos.zip | macos static library | x86_64 + arm64 |
|ncnn-macos-vulkan.zip | macos static library, GPU support | x86_64 + arm64 |
|ncnn-ubuntu.zip | ubuntu linux static/shared library, GPU support, model conversion tools | x86_64 |
|ncnn-windows.zip | windows static/shared library, GPU support, model conversion tools | x86 + x86_64 |
|ncnn-webassembly.zip | webassembly static library | wasm32 + simd + threads + simd-threads |
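The Mat data structure supports 4 dimensions
Add Convolution3D and Pooling3D, along with the corresponding pnnx operator conversion
The following operators support 4-dimensional input and output (Cast, Packing, ReLU, BatchNorm, Reshape, Flatten, Permute, Crop), along with the corresponding pnnx operator conversion
C api adds 4-dimensional mat
Regular simd optimization for Convolution1D (sse/avx/neon/rvv/msa)
Reduce cpu usage during gpu inference
Reduce cpu usage of unit tests
Improve batch axis recognition in pnnx-to-ncnn conversion
Update the operators documentation
Fix the system opencv still being searched for when simpleocv is enabled (zchrissirhcz)
Fix a drawing bug in the p2pnet example (FeiGeChuanShu)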
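
To make the 4-dimensional Mat support above more concrete, here is a small sketch of how a flat element offset could be computed for a tensor addressed by width, height, depth and channel. It assumes a channel-major layout in which each channel's block is padded to a 16-byte boundary; that padding rule, the function names, and the example sizes are assumptions for illustration, so check the ncnn Mat source before relying on them.

```python
def align_size(n, alignment):
    # Round n up to the next multiple of alignment.
    return (n + alignment - 1) // alignment * alignment

def channel_step(w, h, d, elemsize=4, alignment=16):
    # Elements per channel after padding the channel block (assumed rule).
    return align_size(w * h * d * elemsize, alignment) // elemsize

def mat4d_offset(x, y, z, q, w, h, d, elemsize=4):
    # Flat element offset of (x, y, z) inside channel q.
    cstep = channel_step(w, h, d, elemsize)
    return q * cstep + z * h * w + y * w + x

# Hypothetical shape: w=5, h=3, d=2 with 4-byte elements.
# One channel holds 30 elements (120 bytes), padded to 128 bytes = 32 elements.
cstep = channel_step(5, 3, 2)
offset = mat4d_offset(x=1, y=2, z=1, q=1, w=5, h=3, d=2)
print(cstep, offset)
```

With this layout, walking along `x` touches adjacent memory, while stepping to the next channel jumps by the padded channel step rather than by `w * h * d`.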