Dace

Latest version: v1.0.1

Safety actively analyzes 701298 Python packages for vulnerabilities to keep your Python projects secure.

Page 2 of 5

0.15

What's Changed

Work-Depth / Average Parallelism Analysis by hodelcl in 1363 and 1327

A new analysis engine allows SDFGs to be statically analyzed for work and depth / average parallelism. The analysis allows specifying a series of assumptions about symbolic program parameters that can help simplify and improve the analysis results. For an example on how to use the analysis, see the following example:

Python
from dace.sdfg.work_depth_analysis import work_depth

A dictionary mapping each SDFG element to a tuple (work, depth)
work_depth_map = {}
Assumptions about symbolic parameters
assumptions = ['N>5', 'M<200', 'K>N']
work_depth.analyze_sdfg(mysdfg, work_depth_map, work_depth.get_tasklet_work_depth, assumptions)

A dictionary mapping each SDFG element to its average parallelism
average_parallelism_map = {}
work_depth.analyze_sdfg(mysdfg, average_parallelism_map, work_depth.get_tasklet_avg_par, assumptions)

Symbol parameter reduction in generated code (1338, 1344)
To improve our integration with external codes, we limit the symbolic parameters generated by DaCe to only the used symbols. Take the following code for example:
python
dace
def addone(a: dace.float64[N]):
for i in dace.map[0:10]:
a[i] += 1

Since the internal code does not actually need `N` to process the array, it will not appear in the generated code. Before this release the signature of the generated code would be:
cpp
DACE_EXPORTED void __program_addone(addone_t *__state, double * __restrict__ a, int N);

After this release it is:
cpp
DACE_EXPORTED void __program_addone(addone_t *__state, double * __restrict__ a);

Note that this is a major, breaking change that requires users who manually interact with the generated .so files to adapt to.

Externally-allocated memory (workspace) support (1294)
A new allocation lifetime, `dace.AllocationLifetime.External`, has been introduced into DaCe. Now you can use your DaCe code with external memory allocators (such as PyTorch) and ask DaCe for: (a) how much transient memory it will need; and (b) to use a specific pre-allocated pointer. Example:

python
dace
def some_workspace(a: dace.float64[N]):
workspace = dace.ndarray([N], dace.float64, lifetime=dace.AllocationLifetime.External)
workspace[:] = a
workspace += 1
a[:] = workspace

csdfg = some_workspace.to_sdfg().compile()

sizes = csdfg.get_workspace_sizes() Returns {dace.StorageType.CPU_Heap: N*8}
wsp = ...Allocate externally...
csdfg.set_workspace(dace.StorageType.CPU_Heap, wsp)

The same interface is available in the generated code:
cpp
size_t __dace_get_external_memory_size_CPU_Heap(programname_t *__state, int N);
void __dace_set_external_memory_CPU_Heap(programname_t *__state, char *ptr, int N);
// or GPU_Global...

Schedule Trees (EXPERIMENTAL, 1145)
An experimental feature that allows you to analyze your SDFGs in a schedule-oriented format. It takes in SDFGs (even after applying transformations) and outputs a tree of elements that can be printed out in a Python-like syntax. For example:
python
dace.program
def matmul(A: dace.float32[10, 10], B: dace.float32[10, 10], C: dace.float32[10, 10]):
for i in range(10):
for j in dace.map[0:10]:
atile = dace.define_local([10], dace.float32)
atile[:] = A[i]
for k in range(10):
with dace.tasklet:
...
sdfg = matmul.to_sdfg()

from dace.sdfg.analysis.schedule_tree.sdfg_to_tree import as_schedule_tree
stree = as_schedule_tree(sdfg)
print(stree.as_string())

will print:
python
for i = 0; (i < 10); i = i + 1:
map j in [0:10]:
atile = copy A[i, 0:10]
for k = 0; (k < 10); k = (k + 1):
C[i, j] = tasklet(atile[k], B(10) [k, j], C[i, j])

There are some new transformation classes and passes in `dace.sdfg.analysis.schedule_tree.passes`, for example, to remove empty control flow scopes:
python
class RemoveEmptyScopes(tn.ScheduleNodeTransformer):
def visit_scope(self, node: tn.ScheduleTreeScope):
if len(node.children) == 0:
return None
return self.generic_visit(node)

We hope you find new ways to analyze and optimize DaCe programs with this feature!

Other Major Changes

* Support for tensor linear algebra (transpose, dot products) by alexnick83 in 1309
* (Experimental) support for nested data containers and structures by alexnick83 in 1324
* (Experimental) basic support for mpi4py syntax by alexnick83 and Com1t in 1070 and 1288
* (Experimental) Added support for a subset of F77 and F90 language features by acalotoiu and mcopik 1275, 1293, 1349 and 1367

Minor Changes

* Support for Python 3.12 by alexnick83 in 1386
* Support attributes in symbolic expressions by tbennun in 1369
* GPU User Experience Improvements by tbennun in 1283
* State Fusion Extension with happens before dependency edge by acalotoiu in 1268
* Add `CPU_Persistent` map schedule (OpenMP parallel regions) by tbennun in 1330

Fixes and Smaller Changes:

* Fix transient bug in test with `array_equal` of empty arrays by tbennun in 1374
* Fixes GPUTransform bug when data are already in GPU memory by alexnick83 in 1291
* Fixed erroneous parsing of data slices when the data are defined inside a nested scope by alexnick83 in 1287
* Disable OpenMP sections by default by tbennun in 1282
* Make SDFG.name a proper property by phschaad in 1289
* Refactor and fix performance regression with GPU runtime checks by tbennun in 1292
* Fixed RW dependency violation when accessing data attributes by alexnick83 in 1296
* Externally-managed memory lifetime by tbennun in 1294
* External interaction fixes by tbennun in 1301
* Improvements to RefineNestedAccess by alexnick83 and Sajohn-CH in 1310
* Fixed erroneous parsing of while-loop conditions by alexnick83 in 1313
* Improvements to MapFusion when the Map bodies contain NestedSDFGs by alexnick83 in 1312
* Fixed erroneous code generation of indirected accesses by alexnick83 in 1302
* RefineNestedAccess take indices into account when checking for missing free symbols by Sajohn-CH in 1317
* Fixed SubgraphFusion erroneously removing/merging intermediate data nodes by alexnick83 in 1307
* Fixed SDFG DFS traversal missing InterstateEdges by alexnick83 in 1320
* Frontend now uses the AST nodes' context to infer read/write accesses by alexnick83 in 1297
* Added capability for non-strict shape validation by alexnick83 in 1321
* Fixes for persistent schedule and GPUPersistentFusion transformation by tbennun in 1322
* Relax test for inter-state edges in default schedules by tbennun in 1326
* Improvements to inference of an SDFGState's read and write sets by Sajohn-CH in 1325 and 1329
* Fixed ArrayElimination pass trying to eliminate data that were already removed in 1314
* Bump certifi from 2023.5.7 to 2023.7.22 by dependabot in 1332
* Fix some underlying issues with tensor core sample by computablee in 1336
* Updated hlslib to support Xilinx Vitis >=2022.2 by carljohnsen in 1340
* Docs: mention FPGA backend tested with Intel Quartus PRO by TizianoDeMatteis in 1335
* Improved validation of NestedSDFG connectors by alexnick83 in 1333
* Remove unused global data descriptor shapes from arguments by tbennun in 1338
* Fixed Scalar data validation in NestedSDFGs by alexnick83 in 1341
* Fix for None set properties by tbennun in 1345
* Add Object to defined types in code generation and some documentation by tbennun in 1343
* Fix symbolic parsing for ternary operators by tbennun in 1346
* Fortran fix memlet indices by Sajohn-CH in 1342
* Have memory type as argument for fpga auto interleave by TizianoDeMatteis in 1352
* Eliminate extraneous branch-end gotos in code generation by tbennun in 1355
* TaskletFusion: Fix additional edges in case of none-connectors by lukastruemper in 1360
* Fix dynamic memlet propagation condition by tbennun in 1364
* Configurable GPU thread/block index types, minor fixes to integer code generation and GPU runtimes by tbennun in 1357

New Contributors
* computablee made their first contribution in 1290
* Com1t made their first contribution in 1288
* mcopik made their first contribution in 1349

**Full Changelog**: https://github.com/spcl/dace/compare/v0.14.4...v0.15

0.14.4

Minor release; adds support for Python 3.11.

0.14.3

What's Changed

Scope Schedules
The schedule type of a scope (e.g., a Map) is now also determined by the surrounding storage. If the surrounding storage is ambiguous, dace will fail with a nice exception. This means that codes such as the one below:

Python
dace.program
def add(a: dace.float32[10, 10] dace.StorageType.GPU_Global,
b: dace.float32[10, 10] dace.StorageType.GPU_Global):
return a + b b

will now automatically run the `+` and `` operators on the GPU.

(1262 by tbennun)

DaCe Profiler
Easier interface for profiling applications: `dace.profile` and `dace.instrument` can now be used within Python with a simple API:

Python
with dace.profile(repetitions=100) as profiler:
some_program(...)
...
other_program(...)

Print all execution times of the last called program (other_program)
print(profiler.times[-1])

Where instrumentation is applied can be controlled with filters in the form of strings and wildcards, or with a function:

Python
with dace.instrument(dace.InstrumentationType.GPU_Events,
filter='*add??') as profiler:
some_program(...)
...
other_program(...)

Print instrumentation report for last call
print(profiler.reports[-1])

With `dace.builtin_hooks.instrument_data`, the same technique can be applied to instrument data containers.

(1197 by tbennun)

Improved Data Instrumentation
Data container instrumentation can further now be used conditionally, allowing saving and restoring of data container contents only if certain conditions are met. In addition to this, data instrumentation now saves the SDFG's symbol values at the time of dumping data, allowing an entire SDFG's state / context to be restored from data reports.

(1202, 1208 by phschaad)

Restricted SSA for Scalars and Symbols
Two new passes (`ScalarFission` and `StrictSymbolSSA`) allow fissioning of scalar data containers (or arrays of size 1) and symbols into separate containers and symbols respectively, based on the scope or reach of writes to them. This is a form of restricted SSA, which performs SSA wherever possible without introducing Phi-nodes. This change is made possible by a set of new analysis passes that provide the scope or reach of each write to scalars or symbols.

(1198, 1214 by phschaad)

Extending Cutout Capabilities
SDFG Cutouts can now be taken from more than one state.

Additionally, taking cutouts that only access a subset of a data containre (e.g., `A[2:5]` from a data container `A` of size `N`) results in the cutout receiving an "Alibi Node" to represent only that subset of the data (`A_cutout[0:3] -> A[2:5]`, where `A_cutout` is of size 4). This allows cutouts to be significantly smaller and have a smaller memory footprint, simplifying debugging and localized optimization.

Finally, cutouts now contain an exact description of their input and output configuration. The input configuration is anything that may influence a cutout's behavior and may contain data _before_ the cutout is executed in the context of the original SDFG. Similarly, the output configuration is anything that a cutout writes to, that may be read externally or may influence the behavior of the remaining SDFG. This allows isolating all side effects of changes to a particular cutout, allowing transformations to be tested and verified in isolation and simplifying debugging.

(1201 by phschaad)

Bug Fixes, Compatability Improvements, and Other Changes
* SymPy 1.12 Compatibility by alexnick83 in https://github.com/spcl/dace/pull/1256
* GPU Grid-Strided Tiling by C-TC in https://github.com/spcl/dace/pull/1249
* Fix MapInterchange for Maps with dynamic inputs by alexnick83 in https://github.com/spcl/dace/pull/1244
* Assortment of fixes for dynamic Maps on GPU (dynamic thread blocks) by alexnick83 in https://github.com/spcl/dace/pull/1246
* Tuning Compatibility Fixes by lukastruemper in https://github.com/spcl/dace/pull/1234
* Inline preprocessor command by tbennun in https://github.com/spcl/dace/pull/1242
* `unsqueeze_memlet` fixes by alexnick83 in https://github.com/spcl/dace/pull/1203
* Fix-intermediate-nodes by alexnick83 in https://github.com/spcl/dace/pull/1212
* Fix for LoopToMap when applied on multi-nested loops by alexnick83 in https://github.com/spcl/dace/pull/1207
* Fix-nested-sdfg-deepcopy by alexnick83 in https://github.com/spcl/dace/pull/1221
* Fix integer division in Python frontend by tbennun in https://github.com/spcl/dace/pull/1196
* Fix augmented assignment on scalar in condition by tbennun in https://github.com/spcl/dace/pull/1225
* Fix internal subscript access if already existed by tbennun in https://github.com/spcl/dace/pull/1228
* Fix atomic operation detection for exactly-overlapping ranges by tbennun in https://github.com/spcl/dace/pull/1230
* Fix-gpu-transform-copy-out by alexnick83 in https://github.com/spcl/dace/pull/1231
* Fix-interstate-free-symbols by alexnick83 in https://github.com/spcl/dace/pull/1238
* Fix nested access with nested symbol dependency by alexnick83 in https://github.com/spcl/dace/pull/1239
* Fix import in the transformations tutorial. by lamyiowce in https://github.com/spcl/dace/pull/1210
* LoopToMap detects shared transients by alexnick83 in https://github.com/spcl/dace/pull/1200
* Faster CI and reachability checks for codecov.io by tbennun in https://github.com/spcl/dace/pull/1213
* Map-fission-single-data-multi-connectors by alexnick83 in https://github.com/spcl/dace/pull/1216
* Add library path to HIP CMake by tbennun in https://github.com/spcl/dace/pull/1219
* BatchedMatMul: MKL gemm_batch support by lukastruemper in https://github.com/spcl/dace/pull/1181

**Full Changelog**: https://github.com/spcl/dace/compare/v0.14.2...v0.14.3

Please [let us know](https://github.com/spcl/dace/issues) if there are any regressions with this new release.

0.14.2

What's Changed
* GPU instrumentation support with LIKWID by lukastruemper
* New [GPU expansion](https://github.com/spcl/dace/blob/4ee67f0fa86ed52ee91e8a607e7d04ffd624cb8b/dace/libraries/standard/nodes/reduce.py#L1087) for the [Reduce](https://github.com/spcl/dace/blob/4ee67f0fa86ed52ee91e8a607e7d04ffd624cb8b/dace/libraries/standard/nodes/reduce.py#L1538) Library Node by hodelcl
* [CSRMM](https://github.com/spcl/dace/blob/4ee67f0fa86ed52ee91e8a607e7d04ffd624cb8b/dace/libraries/sparse/nodes/csrmm.py#L543) and [CSRMV](https://github.com/spcl/dace/blob/4ee67f0fa86ed52ee91e8a607e7d04ffd624cb8b/dace/libraries/sparse/nodes/csrmv.py#L498) Library Nodes by alexnick83, lukastruemper, and C-TC
* New transformations ([Temporal Vectorization](https://github.com/spcl/dace/blob/860a13dc821d276559d16c0b0101560bb97ea645/dace/transformation/subgraph/temporal_vectorization.py#L13), [HBM Transform](https://github.com/spcl/dace/blob/4ee67f0fa86ed52ee91e8a607e7d04ffd624cb8b/dace/transformation/dataflow/hbm_transform.py#L128)) and other FPGA improvements by carljohnsen, jnice-81, sarahtr, and TizianoDeMatteis
* AMD GPU-related fixes and rocBLAS GEMM by tbennun

**Full Changelog**: https://github.com/spcl/dace/compare/v0.14.1...v0.14.2

0.14.1

This release of DaCe offers mostly stability fixes for the Python frontend, transformations, and callbacks.

**Full Changelog**: https://github.com/spcl/dace/compare/v0.14...v0.14.1

0.14

What's Changed

This release brings forth a major change to how SDFGs are simplified in DaCe, using the [Simplify pass pipeline](dace/transformation/passes/simplify.py). This both improves the performance of DaCe's transformations and introduces new types of simplification, such as [dead dataflow elimination](dace/transformation/passes/dead_dataflow_elimination.py).

Please [let us know](https://github.com/spcl/dace/issues) if there are any regressions with this new release.

Features
* **Breaking change**: The experimental `dace.constant` type hint has now achieved stable status and was renamed to `dace.compiletime`
* **Major change**: Only modified configuration entries are now stored in `~/.dace.conf`. The SDFG build folders still include the full configuration file. Old `.dace.conf` files are detected and migrated automatically.
* Detailed, multi-platform performance counters are now available via native [LIKWID](https://github.com/RRZE-HPC/likwid) instrumentation (by lukastruemper in https://github.com/spcl/dace/pull/1063). To use, set `.instrument` to `dace.InstrumentationType.LIKWID_Counters`
* GPU Memory Pools are now supported through CUDA's `mallocAsync` API. To enable, set `desc.pool = True` on any GPU data descriptor.
* Map schedule and array storage types can now be annotated directly in Python code (by orausch in https://github.com/spcl/dace/pull/1088). For example:
python
import dace
from dace.dtypes import StorageType, ScheduleType

N = dace.symbol('N')

dace
def add_on_gpu(a: dace.float64[N] StorageType.GPU_Global,
b: dace.float64[N] StorageType.GPU_Global):
This map will become a GPU kernel
for i in dace.map[0:N] ScheduleType.GPU_Device:
b[i] = a[i] + 1.0

* Customizing GPU block dimension and OpenMP threading properties per map is now supported
* Optional arrays (i.e., arrays that can be None) can now be annotated in the code. The simplification pipeline also infers non-optional arrays from their use and can optimize code by eliminating branches. For example:
python
dace
def optional(maybe: Optional[dace.float64[20]], always: dace.float64[20]):
always += 1 "always" is always used, so it will not be optional
if maybe is None: This condition will stay in the code
return 1
if always is None: This condition will be eliminated in simplify
return 2
return 3

Minor changes
* Miscellaneous fixes to transformations and passes
* Fixes for string literal (`"string"`) use in the Python frontend
* `einsum` is now a library node
* If CMake is already installed, it is now detected and will not be installed through `pip`
* Add kernel detection flag by TizianoDeMatteis in https://github.com/spcl/dace/pull/1061
* Better support for `__array_interface__` objects by gronerl in https://github.com/spcl/dace/pull/1071
* Replacements look up base classes by tbennun in https://github.com/spcl/dace/pull/1080

**Full Changelog**: https://github.com/spcl/dace/compare/v0.13.3...v0.14

Page 2 of 5

Releases

Has known vulnerabilities

Previous Next

Dace

Page 2 of 5

0.15

0.14.4

0.14.3

0.14.2

0.14.1

0.14

Page 2 of 5

Links

Releases