What's Changed
--------------
- Merged PR 3101: [build] install pkg-config for macos buddy builds.
[Lisa Ong]
Fixes macos packaging build failure:
https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=47235&view=results
- Merged PR 3098: [nfc] Move vectorization code to separate files.
[Mason Remy]
Moves the vectorization code out of ExecutionPlanToAffineLoweringPass, in
preparation for splitting out a dedicated vectorization pass that can
run later than vectorization currently does.
- Merged PR 3100: Adds CMake dependencies to acc-translate to ensure
correct build. [Kern Handa]
- Merged PR 3095: Remove duplicate SubArray class. [Mason Remy]
- Merged PR 3073: vectorize masked load store. [JUBI TANEJA]
This PR vectorizes a masked buffer fill, where the output is larger than the input. The scalar loop performs a conditional load; the vectorized form uses a masked load followed by a vector store.
Given the nest:
    @nest.iteration_logic
    def _nest():
        def store_value():
            Output[i] = Input[i]
        def store_zero():
            Output[i] = 0
        _If(i < N_input, store_value).Else(store_zero)
The unoptimized MLIR is as follows:
    %c0_i32 = arith.constant 0 : i32
    %c5 = arith.constant 5 : index
    "accv.lambda"() ({
      affine.for %arg2 = 0 to 8 {
        %0 = "accv.cmp"(%arg2, %c5) {predicate = 2 : i64} : (index, index) -> i1
        scf.if %0 {
          %1 = affine.load %arg0[%arg2] : memref<5xi32>
          affine.store %1, %arg1[%arg2] : memref<8xi32>
        } else {
          affine.store %c0_i32, %arg1[%arg2] : memref<8xi32>
        }
      }
On vectorizing this for loop, we get the vectorized MLIR (simplified version) as follows:
    %c5 = arith.constant 5 : index
    %cst = arith.constant dense<false> : vector<8xi1>
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %c3 = arith.constant 3 : index
    %c4 = arith.constant 4 : index
    %c6 = arith.constant 6 : index
    %c7 = arith.constant 7 : index
    %c0_i32 = arith.constant 0 : i32
    "accv.lambda"() ({
      affine.for %arg2 = 0 to 8 step 8 {
        %7 = "accv.cmp"(%arg2, %c5) {predicate = 2 : i64} : (index, index) -> i1
        %9 = "accv.cmp"(%0, %c5) {predicate = 2 : i64} : (index, index) -> i1
        %11 = "accv.cmp"(%1, %c5) {predicate = 2 : i64} : (index, index) -> i1
        %13 = "accv.cmp"(%2, %c5) {predicate = 2 : i64} : (index, index) -> i1
        %15 = "accv.cmp"(%3, %c5) {predicate = 2 : i64} : (index, index) -> i1
        %17 = "accv.cmp"(%4, %c5) {predicate = 2 : i64} : (index, index) -> i1
        %19 = "accv.cmp"(%5, %c5) {predicate = 2 : i64} : (index, index) -> i1
        %21 = "accv.cmp"(%6, %c5) {predicate = 2 : i64} : (index, index) -> i1
        %23 = memref.reinterpret_cast %arg0 to offset: [0], sizes: [5], strides: [1] : memref<5xi32> to memref<5xi32>
        %24 = vector.transfer_read %23[%arg2], %c0_i32, %22 : memref<5xi32>, vector<8xi32>
        %25 = memref.reinterpret_cast %arg1 to offset: [0], sizes: [8], strides: [1] : memref<8xi32> to memref<8xi32>
        vector.store %24, %25[%arg2] : memref<8xi32>, vector<8xi32>
      }
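The effect of this transformation can be modeled in plain Python. The sketch below is illustrative only and is not Accera code; the function names `scalar_fill` and `masked_fill` are hypothetical. `scalar_fill` mirrors the `scf.if` loop in the unoptimized MLIR, while `masked_fill` mirrors the vectorized form, where a per-lane mask (`i < N_input`) selects between a loaded element and the padding value, much as a masked `vector.transfer_read` substitutes its padding operand for masked-off lanes.

```python
def scalar_fill(inp, n_out):
    """Element-by-element fill: copy while in bounds, otherwise store zero."""
    out = [None] * n_out
    for i in range(n_out):
        if i < len(inp):
            out[i] = inp[i]   # conditional load + store
        else:
            out[i] = 0        # zero fill past the end of the input
    return out


def masked_fill(inp, n_out, padding=0):
    """Whole-vector fill: build a lane mask, do one masked read, one store."""
    # One comparison per lane, like the eight accv.cmp ops in the vectorized MLIR.
    mask = [lane < len(inp) for lane in range(n_out)]
    # Masked read: lanes whose mask is False take the padding value and never
    # index into the input, so there is no out-of-bounds access.
    vec = [inp[lane] if mask[lane] else padding for lane in range(n_out)]
    # A single unconditional store of the full vector.
    return list(vec)


data = [10, 20, 30, 40, 50]          # 5 input elements, as in memref<5xi32>
assert scalar_fill(data, 8) == masked_fill(data, 8)
```

Both functions produce the five input values followed by three zeros; the difference is that the masked form has no branch inside the loop body.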
- Merged PR 3093: Add meaningful error messages for C++ exceptions.
[Captain Jack Sparrow]
- Merged PR 3092: Add type size getter utility. [Captain Jack Sparrow]
- Merged PR 3074: Add rudimentary pass to fix redundant load/store
issue. [Chuck Jacobs]
This PR adds a simple pattern to `ValueSimplifyPass` that looks for the redundant load/store pairs we often see at the end of kernels and removes them.
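As a rough illustration of this kind of rewrite, the sketch below works on a toy, hypothetical IR (tuples, not MLIR) and assumes the redundant pattern is a value loaded from a location and then stored back to that same location unchanged. It is not the `ValueSimplifyPass` implementation, and `remove_redundant_stores` is an invented name.

```python
def remove_redundant_stores(ops):
    """Drop stores that write back a value just loaded from the same slot.

    `ops` is a list of tuples in a toy IR:
      ("load",  result_name, buffer, index)
      ("store", value_name,  buffer, index)
    Any other tuple is treated as an unknown op.
    """
    loaded_from = {}  # value name -> (buffer, index) it was loaded from
    kept = []
    for op in ops:
        kind = op[0]
        if kind == "load":
            _, result, buffer, index = op
            loaded_from[result] = (buffer, index)
            kept.append(op)
        elif kind == "store":
            _, value, buffer, index = op
            if loaded_from.get(value) == (buffer, index):
                # Writing the unchanged loaded value back is a no-op: drop it.
                continue
            # A real store may change this slot, so earlier loads from it
            # no longer describe its contents.
            loaded_from = {v: loc for v, loc in loaded_from.items()
                           if loc != (buffer, index)}
            kept.append(op)
        else:
            # Be conservative: an unknown op might write memory.
            loaded_from.clear()
            kept.append(op)
    return kept


kernel_tail = [("load", "%1", "A", 0), ("store", "%1", "A", 0)]
cleaned = remove_redundant_stores(kernel_tail)
assert cleaned == [("load", "%1", "A", 0)]
```

The sketch removes only the store; the load, now unused, would be left for an ordinary dead-code elimination step.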
- Merged PR 3075: Enable `fast_exp` operation. [Chuck Jacobs]
This PR makes a few changes to enable the `fast_exp` operation:
- Adds `fast_exp` to the Python DSL
- Enables vectorization of `abs` instruction (which is used by `fast_exp`)
It also makes a couple of other minor changes:
- Improves auto-naming of nest indices
- Better support for using custom LLVM builds with Accera
- Merged PR 3088: Support dynamic sub_array shape, split_dim size.
[Mason Remy]
The sizes must still be static before lowering, but dynamic sizes are
supported temporarily, until the function is inlined into an outer static
function.
- Merged PR 3078: Adds reinterpret_cast functionality to Array. [Kern
Handa]
- Merged PR 3070: Fixes for sub_array and _split_dimension. [Mason Remy]
This fixes the sub_array and split-dimension ops so that they work with the
Accera codebase, which has changed around them. Some MemoryLayout assumptions
were getting in the way and have been disabled for the short term. In the
long term, our memory layout behavior should more closely match what MLIR
affine maps can represent, for more general dynamic support.
- Merged PR 3063: Refactor Dimension with C++ backend container class
and a few other fixes. [Captain Jack Sparrow]
- Refactor Dimension with C++ backend container (ScalarDimension)
- Enable output scalar variables
- Fix dynamic sized TEMP arrays
- Merged PR 3072: Bump hatlib version to 0.0.34, skip unsupported test
on arm64 macOS, minor targets doc update. [Lisa Ong]
Update hatlib version since there is no incompatibility
**Full Changelog**: https://github.com/microsoft/Accera/compare/v1.2.20...v1.2.21