Skip to content
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 95 additions & 0 deletions docs/development/dace_codegen_reproducability.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# Debugging indeterministic behavior of dace transformations

- Enable printing each transformation step, e.g. using
```
dace.Config.set("progress", value=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only give you like ~95% of the cases, as transformation can also run through other means that the patter matcher or can be simple functions that do things.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to avoid editing the code and adding this line, you can export an environment variable:
export DACE_progress=1

(note the the upper/lower case unfortunately matters)

```
TODO: introduce new config var that prints the hash instead of hard-coding it.
- Execute the program in question twice and compare the output.
- Set a conditonal breakpoint in beginning of the `apply` method of the first pass where the SDFG

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply() is something that is specific to the PatternTransformation.
Not all transformations use them (but the majority does).
The most general interface is given by Pass, whcih defines the apply_pass() method.
But as mentioned above, also functions can be transformations.

hash changes with condition `sdfg.hash_sdfg() == <last equal hash>`.
Note: In case running the previous passes takes a long time it makes sense to serialize the SDFG
to json (`sdfg.to_json("sdfg(1|2).json")`) and loading it again (see debug script below) to
ease debugging. In rare cases the serializing and deserializing the sdfg changes the hash. In such
cases this trick doesn't work and the first location where the hash changes might not be the exact
location where the indeterministic behavior is. It helps to use a different hash, e.g.
`content_hash`, but this should be solved in general.
Note: It makes sense to also place a breakpoint after `DaceTranslator.generate_sdfg` to recognize
when all executions finished.
- When the location is found it is usually easy to spot the origin of the indeterminism. Often
there is a set operation or a symbol is named in an indeterministic way. Use ordered sets and
deterministic symbol names.

## Appendix

__Debugging sdfg autooptimize__

Usage `python debug_auto_optimize_sdfg.py sdfg1.json`

```python
import pickle
import sys

import dace
import json

from dace import SDFG

from gt4py.next.program_processors.runners.dace import (
lowering as gtx_dace_lowering,
sdfg_args as gtx_dace_args,
transformations as gtx_transformations,
)
from dace.utils import print_sdfg_hash

file = sys.argv[1]

with open(file) as f:
data = json.load(f)
sdfg = dace.SDFG.from_json(data)
print_sdfg_hash(sdfg)
Comment on lines +47 to +50

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also dace.SDFG.from_file().


gtx_transformations.gt_auto_optimize(
sdfg,
gpu=False,
constant_symbols={},
unit_strides_kind=None,
)
```

__Debugging single sdfg transform__

Usage `python debug_single_sdfg_transform.py sdfg1.json`

```python
import pickle
import sys

import dace
import json

from dace import SDFG

from gt4py.next.program_processors.runners.dace import (
lowering as gtx_dace_lowering,
sdfg_args as gtx_dace_args,
transformations as gtx_transformations,
)
from dace.utils import print_sdfg_hash

transformation = gtx_transformations.MoveDataflowIntoIfBody
file = sys.argv[1]

with open(file) as f:
data = json.load(f)
sdfg = dace.SDFG.from_json(data)
print_sdfg_hash(sdfg)

sdfg.apply_transformations_repeated(
transformation(
ignore_upstream_blocks=False,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be True for the correct working.

),
validate=False,
validate_all=True,
)
```
3 changes: 2 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,8 @@ dependencies = [
'toolz>=0.12.1',
'typing-extensions>=4.12.0',
'versioningit>=3.1.1',
'xxhash>=3.5.0'
'xxhash>=3.5.0',
'ordered-set>=4.0.0'
]
description = 'Python library for generating high-performance implementations of stencil kernels for weather and climate modeling from a domain-specific language (DSL)'
dynamic = ['version']
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
import dace
from dace import subsets as dace_subsets
from dace.sdfg import nodes as dace_nodes
from ordered_set import OrderedSet

from gt4py.next.program_processors.runners.dace.transformations import (
splitting_tools as gtx_dace_split,
Expand Down Expand Up @@ -78,8 +79,11 @@ def _new_name(old_name: str) -> str:
elif isinstance(node, dace_nodes.NestedSDFG):
node_ = graph.add_nested_sdfg(
sdfg=copy.deepcopy(node.sdfg),
inputs=set(node.in_connectors.keys()),
outputs=set(node.out_connectors.keys()),
# TODO(tehrengruber): What is the performance optimization from Philip about?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with my performance optimization?
What cold be a problem is the copying node.sdfg might also copy the surrounding SDFG, because nested SDFGs have a reference to their parent SDFG.

# In any case this here leads to an sdfg in which the order in graph.nodes
# is indeterministic, but to_json, then from_json restores it again.
inputs={k: None for k in node.in_connectors.keys()},
outputs={k: None for k in node.out_connectors.keys()},
Comment on lines +85 to +86

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be equivelent since the data types should not change.

Suggested change
inputs={k: None for k in node.in_connectors.keys()},
outputs={k: None for k in node.out_connectors.keys()},
inputs=node.in_connectors.copy().
outputs=node.out_connectors.copy(),

It you want to play it safe ignore this suggestion.

symbol_mapping=node.symbol_mapping.copy(),
debuginfo=copy.copy(node.debuginfo),
)
Expand Down Expand Up @@ -202,8 +206,10 @@ def split_overlapping_map_range(
Two lists, each containing the ranges corresponding to the splitted range
for the first and the second map, respectively.
"""
first_map_params = set(first_map.params)
second_map_params = set(second_map.params)
# TODO(tehrengruber): The structure here looks a little funky. We just use an ordered set for
# now, but likely no sets are needed at all.
first_map_params = OrderedSet(first_map.params)
second_map_params = OrderedSet(second_map.params)
if first_map_params != second_map_params:
return None
Comment on lines -205 to 214

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sets are needed here, at least for the check that follows because a Map with parameters ["i", "j"] is the same as one with parameters ["j", "i"].
However, you can do something like:

first_map_params = sorted(first_map.params)
second_map_params = sorted(second_map.params)

then you can also remove the sorted() calls bellow as well.


Expand Down

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this file and it looks okay, although the O(N) deletion is not nice.
One could rework the algorithm to get rid of them, however I would not do this.
There is PR#2531 which updates this file and also tries to avoid in determinism.
While it still uses sets it sorts the nodes ones in a deterministic way (depending on the node order upon input).

Original file line number Diff line number Diff line change
Expand Up @@ -24,10 +24,10 @@
type_inference as dace_type_inference,
utils as dace_sutils,
)
from ordered_set import OrderedSet

from gt4py.next.program_processors.runners.dace import transformations as gtx_transformations


@dace_properties.make_properties
class MoveDataflowIntoIfBody(dace_transformation.SingleStateTransformation):
"""The transformation moves dataflow into the if branches.
Expand Down Expand Up @@ -494,8 +494,8 @@ def _remove_outside_dataflow(

The function will also remove data containers that are no longer in use.
"""
all_relocatable_dataflow: set[dace_nodes.Node] = functools.reduce(
lambda s1, s2: s1.union(s2), relocatable_dataflow.values(), set()
all_relocatable_dataflow: OrderedSet[dace_nodes.Node] = functools.reduce(
lambda s1, s2: s1.union(s2), relocatable_dataflow.values(), OrderedSet()
)

# Before we can clean the original nodes, we must clean the dataflow. If a
Expand Down Expand Up @@ -525,7 +525,7 @@ def _update_symbol_mapping(
are available in the parent SDFG.
"""
symbol_mapping = if_block.symbol_mapping
missing_symbols = [ms for ms in if_block.sdfg.free_symbols if ms not in symbol_mapping]
missing_symbols = sorted([ms for ms in if_block.sdfg.free_symbols if ms not in symbol_mapping])
symbol_mapping.update({s: s for s in missing_symbols})
if_block.symbol_mapping = symbol_mapping # Performs conversion.

Expand Down Expand Up @@ -706,10 +706,10 @@ def _filter_relocatable_dataflow(
sdfg: dace.SDFG,
state: dace.SDFGState,
if_block: dace_nodes.NestedSDFG,
raw_relocatable_dataflow: dict[str, set[dace_nodes.Node]],
non_relocatable_dataflow: dict[str, set[dace_nodes.Node]],
raw_relocatable_dataflow: dict[str, OrderedSet[dace_nodes.Node]],
non_relocatable_dataflow: dict[str, OrderedSet[dace_nodes.Node]],
enclosing_map: dace_nodes.MapEntry,
) -> dict[str, set[dace_nodes.Node]]:
) -> dict[str, OrderedSet[dace_nodes.Node]]:
"""Partition the dependencies.

The function expects the dataflow that is upstream of every connector
Expand All @@ -734,23 +734,23 @@ def _filter_relocatable_dataflow(
"""

# Remove the parts of the dataflow that is unrelocatable.
all_non_relocatable_dataflow: set[dace_nodes.Node] = functools.reduce(
lambda s1, s2: s1.union(s2), non_relocatable_dataflow.values(), set()
all_non_relocatable_dataflow: OrderedSet[dace_nodes.Node] = functools.reduce(
lambda s1, s2: s1.union(s2), non_relocatable_dataflow.values(), OrderedSet()
)
relocatable_dataflow = {
conn_name: rel_df.difference(all_non_relocatable_dataflow)
for conn_name, rel_df in raw_relocatable_dataflow.items()
}

# Find the known_nodes for each branch
known_nodes: dict[dace.SDFGState, set[dace_nodes.Node]] = dict()
known_nodes: dict[dace.SDFGState, OrderedSet[dace_nodes.Node]] = dict()
for conn_name, rel_df in relocatable_dataflow.items():
branch_state, _ = self._find_branch_for(if_block=if_block, connector=conn_name)
if branch_state not in known_nodes:
known_nodes[branch_state] = set()
known_nodes[branch_state] = OrderedSet()
known_nodes[branch_state].update(rel_df)

multiple_df_nodes: set[dace_nodes.Node] = set()
multiple_df_nodes: OrderedSet[dace_nodes.Node] = OrderedSet()
# Find intersect of all known_nodes sets which are the nodes that are in the dataflow
# of multiple branches and thus doesn't make sense to relocate
for branch_state, known_nodes_set in known_nodes.items():
Expand All @@ -770,8 +770,8 @@ def _filter_relocatable_dataflow(
# the data is single use data, is not an AccessNode that refers to global
# memory nor is a source AccessNode.
def filter_nodes(
nodes_proposed_for_reloc: set[dace_nodes.Node],
) -> set[dace_nodes.Node]:
nodes_proposed_for_reloc: OrderedSet[dace_nodes.Node],
) -> OrderedSet[dace_nodes.Node]:
has_been_updated = True
while has_been_updated:
has_been_updated = False
Expand All @@ -793,7 +793,7 @@ def filter_nodes(
for oedge in state.out_edges(reloc_node)
if oedge.dst is not if_block
):
nodes_proposed_for_reloc.remove(reloc_node)
nodes_proposed_for_reloc.remove(reloc_node) # TODO(tehrengruber): this is O(N)
has_been_updated = True
continue

Expand All @@ -804,14 +804,14 @@ def filter_nodes(
# the `if` body or be mapped (remain outside but made accessible
# inside), thus their relocation state is of no concern for
# `reloc_node`.
non_mappable_incoming_nodes: set[dace_nodes.Node] = {
non_mappable_incoming_nodes: OrderedSet[dace_nodes.Node] = OrderedSet(
iedge.src
for iedge in state.in_edges(reloc_node)
if not (
(iedge.src is enclosing_map)
or isinstance(iedge.src, dace_nodes.AccessNode)
)
}
)
if non_mappable_incoming_nodes.issubset(nodes_proposed_for_reloc):
# All nodes that can not be mapped into the `if` body are
# currently scheduled to be relocated, thus there is not
Expand All @@ -826,7 +826,7 @@ def filter_nodes(
# that none of its input can. Thus we remove them from
# `nodes_proposed_for_reloc`.
nodes_proposed_for_reloc.difference_update(non_mappable_incoming_nodes)
nodes_proposed_for_reloc.remove(reloc_node)
nodes_proposed_for_reloc.remove(reloc_node) # TODO(tehrengruber): this is O(N)
has_been_updated = True

return nodes_proposed_for_reloc
Expand All @@ -838,7 +838,7 @@ def filter_nodes(
def _partition_if_block(
self,
if_block: dace_nodes.NestedSDFG,
) -> Optional[tuple[set[str], set[str]]]:
) -> Optional[tuple[OrderedSet[str], OrderedSet[str]]]:
"""Check if `if_block` can be processed and partition the input connectors.

The function will check if `if_block` has the right structure, i.e. if it is
Expand All @@ -858,7 +858,7 @@ def _partition_if_block(
return None

# These are all the output names.
output_names: set[str] = set(if_block.out_connectors.keys())
output_names: OrderedSet[str] = OrderedSet(if_block.out_connectors.keys())

# We require that the nested SDFG contains a single node, which is a
# `ConditionalBlock` containing two branches.
Expand Down Expand Up @@ -899,14 +899,14 @@ def _partition_if_block(
# So the ones that can be relocated were found exactly once. Zero would
# mean they can not be relocated and more than one means that we do not
# support it yet.
relocatable_connectors = {
relocatable_connectors = OrderedSet(
conn_name for conn_name, conn_count in reference_count.items() if conn_count == 1
}
non_relocatable_connectors = {
)
non_relocatable_connectors = OrderedSet(
conn_name
for conn_name in reference_count.keys()
if conn_name not in relocatable_connectors
}
)

if len(non_relocatable_connectors) == 0:
return None
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -94,10 +94,10 @@ def modifies(self) -> dace_ppl.Modifies:
def should_reapply(self, modified: dace_ppl.Modifies) -> bool:
return modified & (dace_ppl.Modifies.Memlets | dace_ppl.Modifies.AccessNodes)

def depends_on(self) -> set[type[dace_transformation.Pass]]:
return {
def depends_on(self) -> list[type[dace_transformation.Pass]]:
return [
dace_transformation.passes.FindAccessStates,
}
]

def apply_pass(
self, sdfg: dace.SDFG, pipeline_results: dict[str, Any]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -551,11 +551,11 @@ def modifies(self) -> dace_ppl.Modifies:
def should_reapply(self, modified: dace_ppl.Modifies) -> bool:
return modified & (dace_ppl.Modifies.Memlets | dace_ppl.Modifies.AccessNodes)

def depends_on(self) -> set[type[dace_transformation.Pass]]:
return {
def depends_on(self) -> list[type[dace_transformation.Pass]]:
return [
dace_transformation.passes.StateReachability,
dace_transformation.passes.FindAccessStates,
}
]

def apply_pass(
self, sdfg: dace.SDFG, pipeline_results: dict[str, Any]
Expand Down Expand Up @@ -1008,7 +1008,7 @@ def apply(
# This is the tasklet that we will put inside the map, note we have to do it
# this way to avoid some name clash stuff.
inner_tasklet: dace_nodes.Tasklet = graph.add_tasklet(
name=f"{tasklet.label}__clone_{str(uuid.uuid1()).replace('-', '_')}",
name=f"{tasklet.label}__clone_{dace_nodes._get_next_node_id()}", # TODO: use another global counter
outputs=tasklet.out_connectors.keys(),
inputs=set(),
code=tasklet.code,
Expand All @@ -1033,7 +1033,7 @@ def apply(

# Now we will reroute the edges went through the inner map, through the
# inner access node instead.
for old_inner_edge in list(
for old_inner_edge in list( # TODO(tehrengruber): Why all these list comprehensions everywhere?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed because you replace the edges (removing the old and adding a new one).
Furthermore, the by_connector() gives you back an iterator thus modyfing the edges will change your iteration source, which leads to a race condition.

graph.out_edges_by_connector(map_entry, "OUT_" + connector_name)
):
# We now modify the downstream data. This is because we no longer refer
Expand Down Expand Up @@ -1157,8 +1157,8 @@ def __init__(
def expressions(cls) -> Any:
return [dace.sdfg.utils.node_path_graph(cls.map_exit, cls.tmp_ac, cls.glob_ac)]

def depends_on(self) -> set[type[dace_transformation.Pass]]:
return {dace_transformation.passes.ConsolidateEdges}
def depends_on(self) -> list[type[dace_transformation.Pass]]:
return [dace_transformation.passes.ConsolidateEdges]

def can_be_applied(
self,
Expand Down
Loading