[RFC][Sim] Add triggered simulation procedures by fzi-hielscher · Pull Request #7676 · llvm/circt

fzi-hielscher · 2024-10-08T00:48:58Z

Continuing the series of #7314 and #7335 (and hoping to finally get to lower the sim.proc.print operation) this PR adds trigger-related types and operations to the Sim Dialect. The primary point is to be able to express the execution order of side-effecting ops and procedures without having to rely on operation order within a HWModule's graph region. As added benefits, triggers allow us to:

Call procedures on simulation start without using register initializers
Have explicitly concurrent procedures within a single HWModule
Impose an execution order on operations in different modules and instances
Efficiently look up the order, if any, of two given triggers

Triggers span virtual clock trees. Their root node is either an edge event of a "real" clock (sim.on_edge) or the start of simulation (sim.on_init). When the root event occurs, all leaf operations of the given tree are triggered. In contrast to normal clock trees, trigger trees impose a partial order on their leaf nodes from which we can derive their execution order. Two leaf nodes are unordered (incomparable) if they are not part of the same trigger tree. They are concurrent (equal) if their lowest common ancestor operation is not a TriggerSequenceOp. If the lowest common ancestor is a TriggerSequenceOp the order depends on the result indices of the sequence op.

So... in practice:

%init = sim.on_init
%sequenced:2 = sim.trigger_sequence %init, 2 : !sim.trigger.init
sim.triggered on (%sequenced#0 : !sim.trigger.init) {
  // This first
}
sim.triggered on (%sequenced#1 : !sim.trigger.init) {
  // This second (concurrent with below)
}
sim.triggered on (%sequenced#1 : !sim.trigger.init) {
  // This also second (concurrent with above)
}
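The ordering rule described above (unordered across trees, concurrent unless the lowest common ancestor is a sequence, otherwise ordered by result index) can be sketched on a toy tree. This is only an illustrative model, not code from the PR: `Node`, `Order`, `compareLeaves` and `exampleForest` are hypothetical names, and the sketch assumes both arguments are distinct leaf procedures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of a trigger forest; none of these names exist in CIRCT.
enum class Kind { Root, Sequence, Leaf };
enum class Order { Unordered, Concurrent, Before, After };

struct Node {
  Kind kind;
  int parent;      // index of the parent node, -1 for a root event
  int resultIndex; // result of the parent that this node is attached to
};

// Collects the node indices from the root event down to `node`.
static std::vector<int> pathFromRoot(const std::vector<Node> &tree, int node) {
  std::vector<int> path;
  for (int n = node; n != -1; n = tree[n].parent)
    path.push_back(n);
  std::reverse(path.begin(), path.end());
  return path;
}

// Relates two distinct leaves `a` and `b` of the forest.
Order compareLeaves(const std::vector<Node> &tree, int a, int b) {
  std::vector<int> pa = pathFromRoot(tree, a);
  std::vector<int> pb = pathFromRoot(tree, b);
  // Different root events: the leaves are incomparable.
  if (pa.front() != pb.front())
    return Order::Unordered;
  // Skip the common prefix; the last shared node is the lowest common ancestor.
  std::size_t i = 1;
  while (i < pa.size() && i < pb.size() && pa[i] == pb[i])
    ++i;
  if (i >= pa.size() || i >= pb.size())
    return Order::Concurrent; // same node; not expected for distinct leaves
  const Node &lca = tree[pa[i - 1]];
  int ra = tree[pa[i]].resultIndex;
  int rb = tree[pb[i]].resultIndex;
  // Only a sequence op orders its children, and only across different results.
  if (lca.kind != Kind::Sequence || ra == rb)
    return Order::Concurrent;
  return ra < rb ? Order::Before : Order::After;
}

// The example above plus an unrelated clock-edge tree.
std::vector<Node> exampleForest() {
  return {
      {Kind::Root, -1, 0},    // 0: sim.on_init
      {Kind::Sequence, 0, 0}, // 1: sim.trigger_sequence with two results
      {Kind::Leaf, 1, 0},     // 2: "This first"
      {Kind::Leaf, 1, 1},     // 3: "This second"
      {Kind::Leaf, 1, 1},     // 4: "This also second"
      {Kind::Root, -1, 0},    // 5: a separate sim.on_edge root
      {Kind::Leaf, 5, 0},     // 6: procedure on that clock edge
  };
}
```

In this model, leaves 3 and 4 share result #1 of the sequence and therefore come out as concurrent, while leaf 6 is unordered with respect to everything under the init tree.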

sim.triggered provides a region in which we can place procedural operations. These operations can have side-effects. However, they are required to make forward progress and eventually terminate independently of all other procedures and simulation events. This means that concurrent procedures are not actually required to be run in parallel. Any chosen serialization should be legal / deadlock-free. Note that during lowering, previously unordered procedures can become concurrent, e.g., by CSEing their root triggers.

TriggeredOps can also produce results via the sim.yield_seq terminator. The "seq" is to indicate an implicit register at the output. I.e., results are produced in a clock synchronous fashion. At some point we'll probably need an asynchronous sim.yield_comb. But this can create all sorts of complex interactions, so I try to put it off as long as I can 😅.
All results of sim.triggered must have an explicit tie-off constant specified. These are used both as results outside of simulation contexts (i.e., synthesis), and as (pre)initial value of the implicit register.

I have a very much proof-of-conceptish implementation of an arcilator lowering in my GitHub fork. It can compile this little gadget, showing how to do sequenced calls to a side-effecting procedure during initialization across module instances (and print stuff).

uenoku

Interesting, thank you for working on this! Introducing triggered to core dialects might be controversial since it essentially represents behavioral constructs. I have a couple of questions:

How does this relate to LLHD? I think LLHD is really good at this kind of representation and is more flexible. Is it possible to promote LLHD to a core dialect and use it for behavioral constructs?
on_init as an operation seems a bit weird to me. Also, when on_init is provided to TriggerSequenceOp it must be the first element, correct? It may be more reasonable to put on_init as an attribute on TriggeredOp.
TriggeredOp could capture values from outside; I think it's fairly easy to cause race conditions. If two triggered ops are triggered at the same edge and one triggered op depends on another triggered op's results, what is the expected behavior? Also, I think there is the same problem as what we talked about for seq.to_immutable. If a triggered op operand is a port, there is an initialization ordering problem.

fabianschuiki · 2024-10-10T16:21:15Z

Really cool 😎!

I'm wondering how this relates to seq.initial and its !seq.immutable<T> wrapper type. Might that already do what you need? You could return a dummy value to have multiple seq.initials get ordered in a predefined way. We might even want to introduce something like a void type for that purpose. Or use i0? If seq.initial also works, we could extend hw.triggered with a similar !hw.triggered<T> wrapper and ability to return results, and thus allow you to schedule side-effecting op execution on a clock edge. What do you think?

fzi-hielscher · 2024-10-14T16:53:33Z

Thank you both for your feedback, yet another time. Let's see if I can defend my design decisions - apologies if it is getting a bit longer:

Introducing triggered to core dialects might be controversial since it essentially represents behavioral constructs.

I would argue that sim is the place in the core dialects to put behavioral constructs in, so we can keep them out of the other dialects. Initializers are a difficult corner case, but more on that later. I also would not necessarily call it "behavioral", I'd plainly call it "software". The body of sim.triggered can be used to describe software which becomes part of the simulator and which is very explicitly not meant for synthesis. That's why there is a mandatory tieoff constant for each result, so we have well-defined behavior for both simulation and synthesis. We have to live with the fact that there will be differences in behavior. My goal is to make them obvious and explicit rather than hidden and subtle, as they often tend to be in Verilog. Ideally, if you only use hw, comb and seq ops, you should have a guarantee that simulated behavior matches synthesized behavior.
Using sim.triggered you can then do something like this:

%init = sim.on_init
%isSimulation = sim.triggered () on (%init : !sim.trigger.init)  tieoff [0 : i1] {
  %true = hw.constant true
  sim.yield_seq %true : i1
} : () -> i1

In SV this would become:

logic isSimulation = 1'b0;
`ifndef SYNTHESIS
initial isSimulation <= 1'b1;
`endif

I'm not saying that you should do that, but at least the difference clearly originates from a sim operation.

How does this relate to LLHD? I think LLHD is really good at this kind of representation and is more flexible. Is it possible to promote LLHD to a core dialect and use it for behavioral constructs?

I have to shamefully admit that I only have superficial knowledge of LLHD. But from what I have picked up so far, it is mostly aimed at event queue and time based simulation. It's great that we can do that if we must. But for frontends like FIRRTL, which don't really have a concept of time, it seems like overkill to me. For sim I'm aiming for a mechanism, that can cover most basic use-cases, but is restrictive enough to remain easy to analyze and to lower with different backends. Let me try to outline it:

Every trigger tree has an event as its root. If, and only if, the event occurs, the entire trigger tree is executed.
The exact conditions and time at which an event 'occurs' are determined by the simulation environment. The only requirement is that it is the same mechanism which is used to trigger the sampling and updating of registers.
During the execution of a trigger tree, simulation time (if it exists) is suspended.
For any two operations contained in the same tree, we can determine whether they are executed before, after or concurrently to each other.
Operations in a trigger tree must make forward progress independently of all other operations in the model. Notably, they must not wait for any event originating from within the model.
Side-effects of operations in a triggered operation must not be observable by operations outside of the same operation, unless they are passed as a result.

So, I guess the body of a TriggeredOp is pretty much the same as a "function" in LLHD. Thinking of arcilator, I reckon the difference between using Sim vs. LLHD would be like the difference between using --no-timing vs. --timing for verilator. But I'll happily let @fabianschuiki have the last word here. 😅

on_init as an operation seems a bit weird to me. Also, when on_init is provided to TriggerSequenceOp it must be the first element, correct? It may be more reasonable to put on_init as an attribute on TriggeredOp.

on_init is kind of like a clock that pulses exactly once before all other clocks. I'm not sure I get what you mean by "the first element". TriggerSequenceOp takes precisely one trigger argument that defines its parent trigger. This can be an on_init or an on_edge root, or the result of another sequence. Having an on_init attribute on TriggeredOp would break the single root event concept.

TriggeredOp could capture values from outside; I think it's fairly easy to cause race conditions. If two triggered ops are triggered at the same edge and one triggered op depends on another triggered op's results, what is the expected behavior?

TriggeredOps simultaneously capture their argument at the occurrence of their root event. A chain of TriggeredOps on the same clock/event would behave like a shift register or a clocked pipeline. This is meant to avoid race conditions. If I ended up creating them, I did something wrong. 😬
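To illustrate the shift-register behavior, here is a minimal sketch of "capture everything first, then commit". It is a toy model under stated assumptions, not the arcilator lowering: `Stage`, `fireRootEvent` and `makeChain` are hypothetical names, and each stage's `value` stands in for the implicit output register that starts at the tie-off.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a TriggeredOp with one argument and one result.
struct Stage {
  int value;                    // implicit output register, starts at the tie-off
  int source;                   // stage feeding the argument, -1 for an external input
  std::function<int(int)> body; // stand-in for the procedure body
};

// Models one occurrence of the shared root event.
void fireRootEvent(std::vector<Stage> &stages, int externalInput) {
  // Phase 1: every stage captures its argument from the pre-event state.
  std::vector<int> captured;
  captured.reserve(stages.size());
  for (const Stage &s : stages)
    captured.push_back(s.source < 0 ? externalInput : stages[s.source].value);
  // Phase 2: run the bodies and commit the results. The order of this loop
  // does not matter because no body reads another stage's new value.
  for (std::size_t i = 0; i < stages.size(); ++i)
    stages[i].value = stages[i].body(captured[i]);
}

// Two chained stages on the same event, both with tie-off 0.
std::vector<Stage> makeChain() {
  return {
      {0, -1, [](int x) { return x + 1; }}, // stage 0: external input + 1
      {0, 0, [](int x) { return x * 2; }},  // stage 1: stage 0 result * 2
  };
}
```

With this model, the second stage sees the first stage's result one event late: after two events with inputs 5 and 7, the stages hold 8 and 12, where 12 is derived from the first event's input.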
Generally, I'd expect the majority of trigger user ops to not carry results. My primary motivation for adding this option was to be able to model the current behavior of clocked DPI calls with a procedural call operation inside a TriggeredOp. I'd like to have a unified mechanism to deal with clock synchronous function calls, independently of them being defined inline, at the top-level or externally.

Also, I think there is the same problem as what we talked about for seq.to_immutable. If a triggered op operand is a port, there is an initialization ordering problem.

Yes. It is frustrating but at least for SV I'm afraid we cannot avoid it. As I mentioned in the other PR, I think our best option here is some sort of interface contract, either encoded by type or by an attribute, promising that any initialization of the port has occurred before the initial processes are started. For the Arc backend I don't see this as much of a problem. We should be able to insert a hook between state allocation and invocation of the initializer function that allows user code to specify the "pre-initial" value of input ports.

I'm wondering how this relates to seq.initial and its !seq.immutable wrapper type. Might that already do what you need? You could return a dummy value to have multiple seq.initials get ordered in a predefined way.

There is definitely a functional overlap with seq.initial, but I would argue that conceptually a register initializer and a procedure called at the start of simulation are different things. The former is tied to a physical register while the latter is purely a simulation artifact. Initializers occupy this weird space where they may or may not be synthesizable depending on whether you are targeting ASICs or FPGAs. Since we do not specify a tie-off value for seq.initial my interpretation of it is that it provides a (potentially) synthesizable "elaboration-time constant". On the other hand sim.triggered provides a "simulation runtime constant" with an explicit constant tie-off for synthesis. A not-too-accurate analogy could be constexpr vs. const in C++.

We might even want to introduce something like a void type for that purpose. Or use i0? If seq.initial also works, we could extend hw.triggered with a similar !hw.triggered wrapper and ability to return results, and thus allow you to schedule side-effecting op execution on a clock edge. What do you think?

If I understand you correctly you are suggesting to schedule operations via the topological order of their arguments and results, like @uenoku did for seq.initial, right? If so, my primary concern here is that we would need to handle !hw.triggered<T> operands differently depending on whether they are produced on the same or a different event as the TriggeredOp consuming them. If we were to wait on a result produced on a different clock edge, we would deadlock. If we fail to wait for a result on our "own" clock edge, we mess up execution order. This becomes especially a problem when we start passing them through module boundaries, as this dependency might change depending on how the IOs are connected in the parent module. In contrast, the tree structure guarantees that we only depend on a single, well-defined parent trigger.
Note that it is fairly easy to convert the tree structure to a topological dependency graph, while the other way around is generally not possible. I implemented this "unraveling" in my arc lowering prototype. But it happens after module inlining, when we can see the entire picture. I would have talked more about this, but given your recent wave of PRs, I need to first check how much of it has become obsolete. 😉
I would also argue that, for a middle-end representation, a tree has some minor "ergonomic" benefits, like not having to deal with unit value results for print and similar operations, and more efficient look-up of order between two arbitrary operations (assuming trigger trees will be mostly flat).
The only real drawback I can see so far is the inability to pass values between TriggeredOps "immediately", i.e. on the same clock edge. There are ways around it, but they would violate the "side-effects are not observable outside of the current op" property. I think in many real-world cases where this becomes necessary we should also be able to just merge the TriggeredOps.

TL;DR: I like trees. 🌲

fzi-hielscher · 2024-11-04T22:26:02Z

After letting this settle for a month I still think it is a viable approach (which, for me, is somewhat unusual 😅). So, let me nudge it out of draft mode. My condensed argument in favor of trigger trees instead of a token-flow approach would be that they structurally guarantee deadlock freedom even through opaque interfaces, while clinging to the familiar concept of clock trees. The only new concept that is added is sim.trigger_sequence.

I think it is worth highlighting that sim.triggered cannot be used to initialize (hardware) registers and was never meant to do that. I've grown increasingly convinced that it would make sense to add a sim.initial op that can work in tandem with seq.initial to have a clean separation between non-synthesizable and synthesizable register initialization. But that's for another PR.

The problem of non-deterministic initialization order in SV remains a pain point. I don't know how much of that should bleed into the core dialects. At the moment my gut-feeling is to avoid being overly restrictive in the middle-end and then have a legalization/sanitizer pass try to convert it into deterministic behavior during SV lowering.

I haven't gotten around to writing a lowering for the new arc passes yet. I've been eyeing the discussion on arc.task in #7650. At first glance it looks to me like a good fit to lower sim.triggered to, even if we don't actually want to do multithreading.

darthscsi · 2024-11-19T15:45:54Z

Starting to go through this. Is it simpler to say that the ordering of tasks is the in-order traversal of the tree from the root of trigger-sequences with multiple leaves on the same edge being unordered?

fzi-hielscher · 2024-11-19T17:18:42Z

Starting to go through this. Is it simpler to say that the ordering of tasks is the in-order traversal of the tree from the root of trigger-sequences with multiple leaves on the same edge being unordered?

Going by my nomenclature above leaves on the same edge would be "concurrent". "Unordered" would imply that they are not part of the same tree. But other than that, yes. In-order traversal always provides a legal order. Iff no trigger value has more than one user, it is the only legal order.
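A minimal sketch of such an in-order serialization, using a hypothetical flat encoding of the tree (`Op`, `visit` and `serialize` are illustrative names, not part of the PR). Users of the same result are concurrent, so the sketch simply keeps them in their listed order, which is one of several legal choices.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flat encoding of a trigger tree.
struct Op {
  std::string name;
  int parent;       // index of the parent op, -1 for the root event
  int resultIndex;  // parent result this op is attached to
  bool isProcedure; // true for leaf procedures, false for root/sequence ops
};

// Appends the procedures below `node` in in-order traversal order.
static void visit(const std::vector<Op> &ops, int node,
                  std::vector<std::string> &out) {
  if (ops[node].isProcedure) {
    out.push_back(ops[node].name);
    return;
  }
  // Gather the users of this op and order them by the result they hang off.
  std::vector<int> users;
  for (int i = 0; i < static_cast<int>(ops.size()); ++i)
    if (ops[i].parent == node)
      users.push_back(i);
  // stable_sort keeps concurrent users (same result) in their listed order.
  std::stable_sort(users.begin(), users.end(), [&](int a, int b) {
    return ops[a].resultIndex < ops[b].resultIndex;
  });
  for (int user : users)
    visit(ops, user, out);
}

// Returns one legal serialization of all procedures below `root`.
std::vector<std::string> serialize(const std::vector<Op> &ops, int root) {
  std::vector<std::string> out;
  visit(ops, root, out);
  return out;
}

// The init example from the PR description, listed out of order on purpose.
std::vector<Op> exampleOps() {
  return {
      {"on_init", -1, 0, false},
      {"sequence", 0, 0, false},
      {"secondA", 1, 1, true},
      {"first", 1, 0, true},
      {"secondB", 1, 1, true},
  };
}
```

The quadratic user lookup is only there to keep the sketch short; a real implementation would walk the SSA use lists instead.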

darthscsi · 2024-12-11T16:28:25Z

+  Pure,
+  DeclareOpInterfaceMethods<InferTypeOpInterface, ["inferReturnTypes"]>
+]> {
+  let summary = "Invoke a trigger on a clock edge event.";


nit: invoke -> create ?

darthscsi · 2024-12-11T16:29:44Z

+def YieldSeqOp : SimOp<"yield_seq",[
+  Terminator, HasParent<"circt::sim::TriggeredOp">
+]> {
+  let summary = [{Yield results form a triggerd region with 'seq'


nit form -> from

darthscsi · 2024-12-11T16:32:24Z

+   (i.e. register-like) semantics."}];
+  let description = [{
+    Terminates a triggered region and produces the given list of values.
+    The results only become visible after all triggers and register updates


What does register updates mean here? Registers in seq are not tied to these triggers.

This relates to what I've tried to formulate in my comment above:

The exact conditions and time at which an event 'occurs' are determined by the simulation environment. The only requirement is that it is the same mechanism which is used to trigger the sampling and updating of registers.

That point is crucial to ensure we don't get any race conditions when mixing registers and the results of TriggeredOps. In practice this means that, for a legal SV lowering, the results must be produced via a non-blocking assignment that is evaluated at the same timestep and in the same scheduling region as registers updated on the clock that the given trigger is sensitive to.

darthscsi · 2024-12-11T16:41:14Z

+      ^bb0(%arg0: i8):
+      %cst1 = hw.constant 1 : i8
+      %inc = comb.add bin %arg0, %cst1 : i8
+      sim.yield_seq %inc : i8


yield significantly overlaps with registers. It makes sense to be able to get values out of triggered regions, but do we want trigger regions to have implicit storage?

%r = reg %v
%v = sim.triggered() on ... {
  %inc = comb.add bin %r, const(1)
  sim.yield %inc
}

performs the same thing without introducing redundant ways to store values.

True, it would make for tidier semantics. But I'm afraid the register-like behavior of yield is required to not turn sim.triggered into a massive footgun. My points of concern are:

Since TriggeredOp only provides a defined value at the point(s) in time when the trigger's root event occurs, what would be the observed result values outside of these points? I.e., what happens if the clock of the trigger does not match the clock of the register? It would be possible to use the tieoff values here, but I'm not sure that's a good idea.

What would be the appropriate clock for a register at the output of an initially triggered procedure?

Without an appropriately formed register at the output, a valid SV lowering may become next to impossible. We'd have to explicitly sequence/synchronize every single use of the result value with the producing procedure. How would we possibly do this with e.g., non-procedural continuous assignments?

darthscsi · 2024-12-11T16:48:32Z

+    For non-simulation flows the results are replaced by their tie-off values.
+  }];
+  let arguments = (ins AnyTriggerType:$trigger,
+                       Optional<I1>:$condition,


Isn't the condition redundant with an if inside the body?

Yes. But without the condition on the outside, we'd have to loop back the results of the TriggeredOp to its arguments, so the if on the inside can switch between the old and the new value. It's basically the same discussion as having an explicit enable on a register vs. muxing the current value into the input. At this level of abstraction I'd prefer to avoid back-edges when possible.
Anyhow - I've removed the condition for now. But I intend to replace it with something else. See my comment below.

darthscsi · 2024-12-11T16:49:03Z

+    root event occurs.
+    The body region must complete without 'consuming' simulation time. It
+    may not run indefinitely or wait for any simulation event. It is allowed to
+    have side-effects and produce results.


what does side-effect mean here?

That's the million-dollar question. 😬
To quote myself from above:

Side-effects of operations in a triggered operation must not be observable by operations outside of the same operation, unless they are passed as a result.

That requirement is both confusingly phrased and overly conservative. Maybe a better way of thinking about this is to separate between the hardware model (i.e., anything written in hw, comb, seq) and the environment (everything else, including the body of TriggeredOps, the simulator, the OS, etc.). The requirement would then be that all information flow between the model and the environment has to be relayed either through the top-level IOs, or the results and arguments of sim operations (maybe also inner symbols ?!?). At that point the body regions of TriggeredOps would be pretty much free to do whatever they want, unless it causes an effect that is observable from within the model.

But I cannot say I've nailed this down, yet.

darthscsi · 2024-12-11T16:53:29Z

+    may not run indefinitely or wait for any simulation event. It is allowed to
+    have side-effects and produce results.
+    For every result a 'tieoff' constant must be provided. It specifies the
+    respective result's value before the body is first invoked.


how is a tie off different from an on_init trigger? This also seems to be a mechanism to make these sequences encode registers.

Can we have a clean separation of state and triggers or do we really need to go down the verilog style inferred registers path?

Tie-offs provide a pre-initial value, i.e., the value an on_init triggered procedure observes when looking at its own results. They are closely coupled to the "initialization at declaration" in SV (Section 10.5 of the spec). The tie-off and implied state make sure we have a defined value for the results at any point of execution. IMHO that's the lesser of evils.

darthscsi · 2024-12-11T16:54:51Z

                                           ::mlir::Type type,
                                           ::mlir::Location loc) {

+  // Delegate non 'sim' types to the HW dialect materializer.


I'm curious why this is necessary? Are hw constant requests really winding up here? Why?

This is necessary to materialize the tie-off integer attributes of TriggeredOps as hw.constant when they get folded.

I've just removed the fold method for the moment, as it relied on the condition argument. But I'd keep this section for future use, if you don't mind.

darthscsi · 2024-12-11T17:25:44Z

+  // leave it to the parent's canonicalization.
+  if (auto parentSeq = op.getParent().getDefiningOp<TriggerSequenceOp>()) {
+    if (parentSeq == op) {
+      op.emitWarning("Recursive trigger sequence.");


Isn't this a failure? Shouldn't it be covered by the verifier? What about multi-op sequences which are circular?

In practice a cyclic trigger graph is almost certainly a bug. But theoretically it is still well-formed: It has no root event, thus it is never invoked. There is no good way of handling this right now, but that should change with the soon to be added sim.never op, which just creates a constant "dead" trigger.

Multi-op circular graphs will be opportunistically handled by the flattening DFS. I'm not sure it is worth spending any effort on making sure all cycles are detected and replaced.

darthscsi · 2024-12-11T17:27:58Z

+  if (!canBeChanged)
+    return failure();
+
+  // DFS for inlinable values.


Can this be done instead on an op-by-op canonicalizer which doesn't walk the IR? having DFS in a canonicalizer is a good way to have n^2 behavior.

This is how I've implemented it originally, but I think the current variant is more efficient. Note that:

The first thing the canonicalizer does is to check whether the current op can be inlined by the parent. If so, it bails out immediately. That way we can guarantee that the DFS only is performed from the root of a collapsible sub-tree and the entire sub-tree is collapsed with a single op rewrite.

The DFS will only enter a child node if it will be inlined. Thus, it should not do any more traversal than an op-by-op canonicalizer would also need to do.

The DFS itself is pretty cheap. Since all nodes are guaranteed to have a single predecessor we do not need to maintain a set of visited nodes to prevent walking into a cycle.

My motivation to replace the original implementation was to minimize the number of op rewrites. Since a TriggerSequenceOp can have a lot of results and users, I didn't want to substitute them repeatedly. The DFS makes sure this happens in one shot. Given all this, I do not see how this would cause a performance regression. But maybe I'm missing something?
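The shape of that argument can be sketched on a toy model. This is not the canonicalizer from the PR: `SeqOp`, `isInlinedByParent`, `collect` and `flatten` are hypothetical names, and the inlining condition used here (a nested sequence is the sole user of its parent's result) is an assumption made for illustration. The sketch also assumes an acyclic graph.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a sequence op. users[i] lists the users of result i:
// values >= 0 are indices of nested SeqOps, negative values are opaque leaves.
struct SeqOp {
  std::vector<std::vector<int>> users;
};

// Assumed inlining condition: `seq` is the only user of some parent result.
bool isInlinedByParent(const std::vector<SeqOp> &seqs, int seq) {
  for (const SeqOp &parent : seqs)
    for (const std::vector<int> &resultUsers : parent.users)
      if (resultUsers.size() == 1 && resultUsers[0] == seq)
        return true;
  return false;
}

// DFS that only descends into children that will be inlined.
static void collect(const std::vector<SeqOp> &seqs, int seq,
                    std::vector<std::vector<int>> &flat) {
  for (const std::vector<int> &resultUsers : seqs[seq].users) {
    bool soleNestedSeq = resultUsers.size() == 1 && resultUsers[0] >= 0;
    if (soleNestedSeq)
      collect(seqs, resultUsers[0], flat); // splice the child's results in place
    else
      flat.push_back(resultUsers); // keep this result and its users as-is
  }
}

// Computes the flattened result list for `seq` in one shot. Returns false
// (no rewrite) if the parent will absorb `seq` anyway.
bool flatten(const std::vector<SeqOp> &seqs, int seq,
             std::vector<std::vector<int>> &flat) {
  if (isInlinedByParent(seqs, seq))
    return false;
  flat.clear();
  collect(seqs, seq, flat);
  return true;
}

// Sequence 0 nests sequence 1 (sole user of result 1) and sequence 2
// (shares result 2 with leaf -2, so it is not inlined).
std::vector<SeqOp> exampleSeqs() {
  return {
      SeqOp{{{-1}, {1}, {-2, 2}}},
      SeqOp{{{-3}, {-4}}},
      SeqOp{{{-5}}},
  };
}
```

In this model each sequence is entered by the DFS at most once per collapsible sub-tree, and only the sub-tree root produces a rewrite; whether that matches the cost profile of the real pattern driver is not something the sketch can show.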

darthscsi

I think this is on a good track. My biggest concern is the implicit register representation being redundant with seq.

Minor questions about n^2 behavior in canonicalization.

fzi-hielscher · 2024-12-12T17:53:41Z

Thanks a lot for your review. I hope, I could provide a somewhat convincing motivation for my decisions in the comments.

I have removed the condition argument for TriggeredOps for the moment. My intent is to replace it with the sim.trigger_gate op, which can be inserted into the tree to selectively disable the entire sub-tree below. To me, this has turned out to be the more useful mechanism when writing the lowerings. It can be seen in action in #7973. I'll add a more helpful description to that PR tomorrow. It is meant to illustrate what a full lowering pipeline from FIRRTL to SV using triggers can look like.

maerhart · 2024-12-22T10:52:06Z

+include "mlir/Interfaces/SideEffectInterfaces.td"
+include "mlir/Interfaces/FunctionInterfaces.td"
+include "circt/Dialect/Sim/SimDialect.td"
+include "circt/Dialect/Sim/SimTypes.td"
+include "circt/Dialect/Seq/SeqTypes.td"


maerhart · 2024-12-22T10:55:10Z

+  let summary = [{Yield results from a triggerd region with 'seq'
+   (i.e. register-like) semantics."}];


Extra " at the end. Summary should probably also be short enough to fit in one 80 col line. Everything else can go in 'description'.

maerhart · 2024-12-22T10:59:24Z

+def EdgeTriggerType : SimTypeDef<"EdgeTrigger"> {
+  let summary = "Trigger derived from an edge event.";
+  let parameters = (ins "::circt::hw::EventControl":$edgeEvent);
+  let mnemonic = "trigger.edge";
+  let assemblyFormat = "`<` $edgeEvent `>`";
+}
+
+def InitTriggerType : SimTypeDef<"InitTrigger"> {
+  let summary = "Trigger derived from the simulation start event.";
+  let mnemonic = "trigger.init";
+}
+
+def AnyTriggerType : AnyTypeOf<[EdgeTriggerType, InitTriggerType]>;


Why do we need separate types for init and edge and why does edge need an event control attr? Why would a procedural region care by which trigger it was invoked?

That was a provision for verilog lowering. The idea was to be able to determine whether we need to produce an always @(posedge ... ) or an always @(negedge ... ) etc. purely based on the trigger's type, in case the root trigger op is out of scope. I've since changed how verilog lowering works. It should no longer be necessary and I'll remove the event control attribute.
I'm more hesitant though to merge !trigger.edge and !trigger.init. I think !trigger.init could come in handy for side-effecting non-synthesizable register/memory initializers. E.g., it could be used to sequence sim.initial ops, should we decide to go that way.

maerhart · 2024-12-22T11:04:56Z

+}
+
+def TriggeredOp : SimOp<"triggered", [
+  IsolatedFromAbove,


Just wondering about why you decided to isolate it from above and pass through inputs explicitly?

Good point. I don't see a hard technical reason to have it IsolatedFromAbove, but conceptually it seemed to me like the obvious thing to do. I'm thinking of triggered ops as inline-defined procedures, so giving them arguments simply felt natural. It should also drive home the point that all arguments of a trigger tree are captured at the same time and can never change during execution of that tree. Whether we can actually keep that promise for verilog lowering is (sadly) a different story...

Flatten nested sequences in a single rewrite rather than iteratively.

fzi-hielscher · 2026-05-28T14:13:19Z

Let's do some PR necromancy! We've discussed this again in yesterday's ODM (CC @uenoku @seldridge @darthscsi ). In hindsight I should have done a proposal on the general shape of the sequenced IR first before diving into all the implementation details. My excuse is that I wanted to make sure that what I propose can actually be implemented. So, let's focus on the overall concept for now:

I had based the trigger tree design on two premises:
First, it should not have suspend/resume semantics. It should always be possible to serialize a triggered group of operations into a simple function to make lowering to non-event based simulations easier. And it guarantees that all triggered ops observe the same state for passed-in values during one execution run.
Second, it should allow modeling arbitrary fork-join execution flows within the body of a HWModule (and optionally even across module boundaries). If we decide that we don't need this, I would argue that we should simply rely on textual order. Just because we are in a graph region, we don't have to pretend that textual order doesn't exist.

Realistically, to achieve this, we need some kind of token SSA values to represent the dependencies in the graph. Note that trigger trees, as proposed here, are effectively token graphs with bidirectional flow on the SSA edges. When a trigger user op receives a token, it guarantees to return that token on the same edge within the same time step. Forks are encoded implicitly where a trigger SSA value has more than one user. Joins are also present implicitly in between all results of a TriggerSequenceOp.

Fundamentally, when modeling token flows, I see two problems we have to consider:

Problem 1: Since we have ruled out suspending execution at a join until all tokens have arrived, what do we do if we join tokens originating from different root triggers (e.g., clock edges)? Some of them may never arrive until we progress the simulation time.

Problem 2: How can we safely pass results between TriggeredOps within the same time step if the determined execution order is not a topological order of the result value dependencies? We could be using values before they are defined.

IMHO, trigger trees are a fairly elegant solution to Problem 1. They allow us to model joins without having an operation that receives more than one trigger/token value. Thus, the DAG cannot depend on more than one root trigger. However, they do not solve Problem 2, and I think that is their most notable drawback. But consider that in FIRRTL (as far as I know) it is currently not even possible to pass results between synchronous side-effecting operations. Until that changes, I don't see this as a significant issue. And even if it changes, I would assume that we usually could simply lower dependent operations into the same TriggeredOp body.

But let's take a step back and consider "generic" token DAGs. The obvious solution to Problem 2 then would be to consider all returned values as tokens (either implicitly or explicitly by wrapping them in a parameterized TokenType), and execute the TriggeredOps in a topological order. This effectively means, that a TriggeredOp has to do a join on all the token values flowing into it. So we're now facing Problem 1.

We could try to fix that problem purely through semantics: If we know that a given root trigger is inactive during a time step, we can specify that all TriggeredOps that directly or indirectly depend on it won't be executed at this step. That would basically shift the problem into the runtime.
I am not a fan of that idea. I doubt that this would even be possible to do in SystemVerilog.

A structural fix would be to encode the root trigger (domain) in the type of the token values and only allow joins on matching types.
While that would be conceptually sound, I don't think it is a practical solution. Trying to encode this kind of information in an MLIR type has always turned out to be a disaster for me. You try to do seemingly simple local changes to the IR and end up having to rewrite the entire downstream DAG to adjust it to the changed type. And all hope is lost when you try to pass it through a call or instance-like operation.

So, in the end, I think we cannot avoid both problems by design and the choice boils down to which one is more important to dodge. Either way, we can of course always try to check and reject problematic IR during lowering.
I'd be happy to hear your ideas and opinions on it.

nanjo712 · 2026-05-29T04:58:51Z

I think it would be helpful to clarify the exact requirement here.

At least for the current use case, my understanding is that we want to preserve the order of side-effecting sim operations that appear in graph regions, probably within a single module.

Maybe I am missing some context, but for this narrower problem, it seems that we could also solve it by putting the relevant operations into a single sim.triggered block, either with a dedicated SquashSimTriggered pass or with a helper similar to the addToSimTriggered function I used in #10545.

I am not saying that the design in this PR would not solve the problem. My concern is more about whether it may be too heavy-weight for this particular requirement.

From my reading, this PR seems to spend a significant amount of effort on modeling ordering between operations in different modules and instances. I do not yet understand the use cases that require that level of cross-module ordering, and it seems to go beyond the narrower requirement above.

If we temporarily restrict the discussion to a single module, then the main advantage of this approach seems to be that it allows multiple sim.triggered blocks to have an explicit order between them. I am still not sure whether that extra expressiveness is necessary for the current use case, compared to simply combining the ordered operations into one sim.triggered block.

I would be very grateful if you could clarify my confusion! 😘

seldridge · 2026-05-30T01:41:34Z

@fzi-hielscher wrote:

If we decide that we don't need this, I would argue that we should simply rely on textual order. Just because we are in a graph region, we don't have to pretend that textual order doesn't exist.

I think this is a reasonable way to handle things. However, the graph region should provide no ordering guarantees. I.e., if things are in a certain order in the graph region, then it is "nice" to lower this to Verilog in the same order. However, it's also legal for us to completely shuffle these. Similarly, most simulators will schedule always blocks in the same module with the same trigger sequentially (it's logical and it's likely the easiest way to implement it).

@fzi-hielscher wrote:

But consider that in FIRRTL (as far as I know) it is currently not even possible to pass results between synchronous side-effecting operations.

Yeah, the traditional side-effecting ops (assert, stop, etc.) don't have results, so there's no way to pass anything around. Also, their side effects can't "interact". It may be possible to have interactions through DPI, but if a user is doing this, I think they're hitting undefined behavior. (I don't think there's a problem here. I agree with your comment.)

@fzi-hielscher wrote:

[...]

There's no problem with needing to pass information within a domain, correct? There should always be a solution there.

If the real problem only arises when dealing with different triggers, then that may hint that we need better modeling of both: (1) a generalization of "domain" information to color everything that is in the same "domain" and (2) explicit domain-synchronizing operations which define the ordering.

The direction I've been going with FIRRTL domains is closer to your:

@fzi-hielscher wrote:

That would basically shift the problem into the runtime.

The FIRRTL pipeline just checks, "Does the program ever let two domains talk to each other?" If so, then error. However, allow the user to do domain casts (as it's not useful to have a circuit where two domains can never communicate). Provide some lightweight checking via property assertions that the domain cast is legal (as a domain cast is maximally unsafe). However, this provides zero guidance on how the domain cast works and it's essentially up to the designer to not screw it up. I.e., the actual semantics of the crossing are whatever the Verilog simulator determines or the synthesis tools infer/do.

This doesn't work for core as we'd like to simulate it in Arc, though!

This is a roundabout way of saying that you may be bumping up against a facet of domain modeling in core. Perhaps we should tackle that head on.

seldridge · 2026-05-30T02:25:51Z

@nanjo712 wrote:

If we temporarily restrict the discussion to a single module, then the main advantage of this approach seems to be that it allows multiple sim.triggered blocks to have an explicit order between them. I am still not sure whether that extra expressiveness is necessary for the current use case, compared to simply combining the ordered operations into one sim.triggered block.

What you're proposing is an alternative and closer to the runtime having to deal with it. Or: we could assume that there is no defined ordering (like how Verilog handles it) and then it's the responsibility of the language front end or the lowering from the language front end (e.g., FIRRTL's LowerToHW conversion) to get things in the right shape for what it wants.

This doesn't solve the case of needing to define an ordering between blocks with different triggers, though, or to define an ordering between blocks globally and inter-module. There are some use cases for blocks with different triggers. E.g., modules may have multiple clocks. You'd like to be able to guarantee that all the prints from one clock domain (fast clock) happen before the prints from a second domain (slow clock). This could matter for some log post-processing. However, there's no way to express this today. (Note: there are other approaches for this example like using logging suffixes and post-processing. The example was more just trying to provide motivation for something that we can't express today.)

fzi-hielscher · 2026-05-31T22:27:50Z

Thanks for your feedback, @nanjo712 and @seldridge!

So, I think the question we have to answer before anything else is:
Do we want to be able to sequence procedures that are triggered from different domains? And, if not, do we need to sequence TriggeredOps at all?

I had effectively ruled out the idea of sequencing between domains. As of today, the FIRRTL spec only defines an order for operations that are on the same clock. And it is very difficult to do in SV.
Consider @seldridge's example with a fast and a slow clock: When the simulation detects an edge on the slow clock signal but the edge on the fast clock has not happened yet, we don't know if that edge is still about to occur or will never occur in the current time step. So, we cannot decide whether to execute the slow clock's print operations right away or to wait until the fast clock's prints have been completed. The only (rather hacky) solution I can think of here is to nudge the evaluation of the TriggeredOps into a new scheduling region using #0;, where we can assume that no more signal transitions will happen.
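The difference between the two strategies can be shown with a toy model. The Python sketch below is not a model of SystemVerilog's scheduling regions; it only assumes that the edges of one time step are observed in some arbitrary order, and all names (`run_immediately`, `run_deferred`, the `"fast"`/`"slow"` keys) are made up for this example.

```python
def run_immediately(edge_events, procs):
    """Execute each clock's procedures as soon as its edge is observed."""
    log = []
    for clock in edge_events:
        log.extend(procs[clock])
    return log


def run_deferred(edge_events, procs, priority):
    """Record all edges first, then execute once the time step has settled."""
    fired = set(edge_events)
    log = []
    for clock in priority:
        if clock in fired:
            log.extend(procs[clock])
    return log


procs = {"fast": ["fast print"], "slow": ["slow print"]}

# The slow edge happens to be observed before the fast edge in this step.
immediate = run_immediately(["slow", "fast"], procs)
deferred = run_deferred(["slow", "fast"], procs, priority=["fast", "slow"])
slow_only = run_deferred(["slow"], procs, priority=["fast", "slow"])
```

Here `immediate` follows the observation order, while `deferred` always puts the fast domain first. Deferring works only because the full set of fired edges is known before anything executes, which is the assumption the #0 nudge tries to establish.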

@nanjo712 wrote:

From my reading, this PR seems to spend a significant amount of effort on modeling ordering between operations in different modules and instances. I do not yet understand the use cases that require that level of cross-module ordering, and it seems to go beyond the narrower requirement above.

It is definitely not a necessity, but IMHO a "nice to have" option in the IR, as it would allow us to freely move side-effecting operations without changing semantics.

@seldridge wrote:

There's no problem with needing to pass information within a domain, correct? There should always be a solution there.

Correct, as long as there are no cyclic dependencies.

@seldridge wrote:

This is a roundabout way of saying that you may be bumping up against a facet of domain modeling in core. Perhaps we should tackle that head on.

Independently of the TriggeredOp, I would very much appreciate having domain information in the core dialects.

nanjo712 · 2026-06-02T08:12:49Z

I think there are several different issues mixed together here.

First, there is the question of ordering across different clock domains. I agree that clock/domain information should probably be modeled in some way in the core dialects, but we currently do not have such a mechanism. That is one missing piece. More importantly, ordering across different clock domains seems semantically odd to me. If two procedures do not share a clock/domain, I do not see a clear reason why we should impose an order between them. Maybe we have reached some agreement that, at least for now, this problem is not well-defined.

Second, there is the question of ordering across modules within the same clock domain. Without explicit clock/domain information, even determining whether two modules are in the same clock domain can be difficult. I also do not yet see a sufficiently strong use case for this. It would certainly be a nice feature to have, but given the current state of the IR, this also seems difficult to model cleanly.

Third, there is the question of ordering within the same module and the same clock domain. In this case, determining whether operations are under the same clock is much easier, especially during lowering. However, for this case, instead of assigning an explicit order between multiple sim.triggered blocks through some additional mechanism, simply merging the relevant operations into a single sim.triggered block seems like a much simpler and more direct solution.

It would of course be great if the first and second problems could be solved eventually. However, I think that would probably require us to first establish some notion of clock/domain information in the core dialects.
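The merging approach from the third point can be sketched in a few lines. This Python model assumes each side-effecting operation is represented as a `(clock, op)` pair listed in textual order; `squash_by_clock` is a hypothetical helper for illustration and is not the SquashSimTriggered pass or the addToSimTriggered function mentioned earlier in the thread.

```python
def squash_by_clock(ops):
    """Group (clock, op) pairs into one block per clock, keeping op order."""
    blocks = {}
    for clock, op in ops:
        blocks.setdefault(clock, []).append(op)
    return blocks


# Hypothetical side-effecting ops of a single module, in textual order.
module_ops = [
    ("clk", "print A"),
    ("clk2", "print X"),
    ("clk", "print B"),
]

blocks = squash_by_clock(module_ops)
```

Each resulting block keeps the relative order of the operations on its clock, so no ordering between separate blocks on the same clock needs to be expressed. Ordering between the `clk` and `clk2` blocks stays undefined, which matches the first point above.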

uenoku reviewed Oct 10, 2024

fzi-hielscher mentioned this pull request Oct 18, 2024

[Arc] Remove obsolete arc.clock_tree and arc.passthrough ops #7704

Merged

fzi-hielscher force-pushed the sim-triggered branch 2 times, most recently from 0ed5c45 to 27e6769 Compare November 4, 2024 21:40

fzi-hielscher marked this pull request as ready for review November 4, 2024 22:26

fzi-hielscher requested review from fabianschuiki and uenoku November 4, 2024 22:26

fzi-hielscher force-pushed the sim-triggered branch from 27e6769 to b03365d Compare November 22, 2024 11:38

darthscsi reviewed Dec 11, 2024

fzi-hielscher force-pushed the sim-triggered branch from ad81565 to 54a2f26 Compare December 12, 2024 15:13

fzi-hielscher mentioned this pull request Dec 14, 2024

[Do-Not-Merge][FIRRTL][Sim][SV] Rework printf lowering pipeline #7973

Draft

maerhart reviewed Dec 22, 2024

fzi-hielscher force-pushed the sim-triggered branch 2 times, most recently from 587820e to 42e2f59 Compare January 8, 2025 15:24

fzi-hielscher added 9 commits June 3, 2025 00:03

Implement trigger ops.

fd15723

Add trigger tests

428fc3d

Don't CSE TriggerSequenceOps

91d16a3

EOF

e2bdb81

Improve TriggerSequenceOp canonicalizer

bc51768

Flatten nested sequences in a single rewrite rather than iteratively.

Remove TriggeredOp condition operand

7aba24a

Doc review

1601d40

Tablegen comments

0a27e1b

Remove EdgeEventAttr from sim.trigger.edge type

080017a

fzi-hielscher force-pushed the sim-triggered branch from 42e2f59 to 080017a Compare June 2, 2025 22:20

Update error test

4fcf651

fzi-hielscher mentioned this pull request Sep 2, 2025

[LLHD] Remove redundant destination operands from wait operation #8870

Draft

fzi-hielscher mentioned this pull request Apr 8, 2026

[Sim][SimToSV] Supplementing the infrastructure for Sim dialects #10146

Closed

fzi-hielscher mentioned this pull request May 28, 2026

[FIRRTLToHW] Add support for lower-to-core in firrtl.fflush #10545

Open

		let summary = [{Yield results from a triggered region with 'seq'
		(i.e. register-like) semantics.}];
