[RFC][Sim] Add triggered simulation procedures #7676
uenoku
left a comment
There was a problem hiding this comment.
Interesting, thank you for working on this! Introducing triggered to core dialects might be controversial since it essentially represents behavioral constructs. I have a couple of questions:
- How does this relate to LLHD? I think LLHD is really good at this kind of representation and is more flexible. Is it possible to promote LLHD to a core dialect and use it for behavioral constructs?
- on_init as an operation seems a bit weird to me. Also, when on_init is provided to TriggerSequenceOp it must be the first element, correct? It may be more reasonable to put on_init as an attribute on TriggeredOp.
- TriggeredOp could capture values from outside, so I think it's fairly easy to cause race conditions. If two triggered ops are triggered at the same edge and one depends on the other's results, what is the expected behavior? Also, I think there is the same problem as what we talked about for seq.to_immutable: if a triggered op operand is a port, there is an initialization ordering problem.
|
Really cool 😎! I'm wondering how this relates to |
|
Thank you both for your feedback, yet another time. Let's see if I can defend my design decisions - apologies if it is getting a bit longer:
I would argue that
%init = sim.on_init
%isSimulation = sim.triggered () on (%init : !sim.trigger.init) tieoff [0 : i1] {
  %true = hw.constant true
  sim.yield_seq %true : i1
} : () -> i1
In SV this would become:
logic isSimulation = 1'b0;
`ifndef SYNTHESIS
initial isSimulation <= 1'b1;
`endif
I'm not saying that you should do that, but at least the difference clearly originates from a
I have to shamefully admit that I only have superficial knowledge of LLHD. But from what I have picked up so far, it is mostly aimed at event queue and time based simulation. It's great that we can do that if we must. But for frontends like FIRRTL, which don't really have a concept of time, it seems like overkill to me. For
So, I guess the body of a TriggeredOp is pretty much the same as a "function" in LLHD. Thinking of arcilator, I reckon the difference between using Sim vs. LLHD would be like the difference between using
TriggeredOps simultaneously capture their argument at the occurrence of their root event. A chain of TriggeredOps on the same clock/event would behave like a shift register or a clocked pipeline. This is meant to avoid race conditions. If I ended up creating them, I did something wrong. 😬
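To make the "shift register" claim concrete, here is a minimal Python sketch of the intended semantics. It is only a model under my own assumptions: the `Triggered` class and the `fire` helper are hypothetical names, not part of CIRCT, and the chain is simplified to one argument and one result per op.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Triggered:
    """Hypothetical stand-in for a TriggeredOp with one argument and one result."""
    body: Callable[[int], int]  # procedure body: captured argument -> yielded value
    tieoff: int                 # value visible before the first invocation
    result: int = 0

    def __post_init__(self) -> None:
        self.result = self.tieoff


def fire(chain: List[Triggered], first_input: int) -> None:
    """Model one occurrence of the root event shared by the whole chain."""
    # Phase 1: every op captures its argument from the *old* results.
    captured = [first_input] + [op.result for op in chain[:-1]]
    # Phase 2: run all bodies, then publish the new results together.
    new_results = [op.body(arg) for op, arg in zip(chain, captured)]
    for op, value in zip(chain, new_results):
        op.result = value


# Three pass-through procedures chained on the same event.
chain = [Triggered(body=lambda x: x, tieoff=0) for _ in range(3)]
history = []
for stimulus in (7, 8, 9):
    fire(chain, stimulus)
    history.append([op.result for op in chain])
```

Because all arguments are sampled before any result is updated, each value needs one event per stage to travel down the chain, regardless of the order in which the bodies are executed.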
Yes. It is frustrating but at least for SV I'm afraid we cannot avoid it. As I mentioned in the other PR, I think our best option here is some sort of interface contract, either encoded by type or by an attribute, promising that any initialization of the port has occurred before the
There is definitely a functional overlap with
If I understand you correctly you are suggesting to schedule operations via the topological order of their arguments and results, like @uenoku did for TL;DR: I like trees. 🌲 |
0ed5c45 to 27e6769
|
After letting this settle for a month I still think it is a viable approach (which, for me, is somewhat unusual 😅). So, let me nudge it out of draft mode.
My condensed argument in favor of trigger trees instead of a token-flow approach would be that they structurally guarantee deadlock freedom even through opaque interfaces, while clinging to the familiar concept of clock trees. The only new concept that is added is
I think it is worth highlighting that
The problem of non-deterministic initialization order in SV remains a pain point. I don't know how much of that should bleed into the core dialects. At the moment my gut feeling is to avoid being overly restrictive in the middle-end and then have a legalization/sanitizer pass try to convert it into deterministic behavior during SV lowering.
I haven't gotten around to writing a lowering for the new arc passes yet. I've been eyeing the discussion on |
|
Starting to go through this. Is it simpler to say that the ordering of tasks is the in-order traversal of the tree from the root of trigger-sequences with multiple leaves on the same edge being unordered? |
Going by my nomenclature above leaves on the same edge would be "concurrent". "Unordered" would imply that they are not part of the same tree. But other than that, yes. In-order traversal always provides a legal order. Iff no trigger value has more than one user, it is the only legal order. |
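A small Python sketch may help to check the claim about in-order traversal. The tree encoding is purely an assumption for illustration: `("seq", [...])` stands for a TriggerSequenceOp whose children are ordered by result index, `("fork", [...])` for a trigger value with more than one user, and a plain string for a leaf procedure.

```python
from itertools import permutations


def in_order(node):
    """Leaves of the trigger tree in in-order traversal."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [leaf for child in children for leaf in in_order(child)]


def constraints(node):
    """Pairs (a, b) meaning leaf a must run before leaf b."""
    if isinstance(node, str):
        return set()
    kind, children = node
    pairs = set()
    for child in children:
        pairs |= constraints(child)
    if kind == "seq":
        # Only sequence nodes order their sub-trees; forks leave them concurrent.
        leaves = [in_order(child) for child in children]
        for i, earlier in enumerate(leaves):
            for later in leaves[i + 1:]:
                pairs |= {(a, b) for a in earlier for b in later}
    return pairs


def legal_orders(node):
    """Brute-force enumeration of all serializations that respect the constraints."""
    must_precede = constraints(node)
    return [
        list(order)
        for order in permutations(in_order(node))
        if all(order.index(a) < order.index(b) for a, b in must_precede)
    ]


chain_only = ("seq", ["p0", ("seq", ["p1", "p2"]), "p3"])
with_fork = ("seq", ["p0", ("fork", ["p1", "p2"]), "p3"])
```

For `chain_only` the in-order traversal is the single legal order. For `with_fork` the traversal is still legal, but swapping the two concurrent leaves `p1` and `p2` is legal as well.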
27e6769 to b03365d
| Pure, | ||
| DeclareOpInterfaceMethods<InferTypeOpInterface, ["inferReturnTypes"]> | ||
| ]> { | ||
| let summary = "Invoke a trigger on a clock edge event."; |
| def YieldSeqOp : SimOp<"yield_seq",[ | ||
| Terminator, HasParent<"circt::sim::TriggeredOp"> | ||
| ]> { | ||
| let summary = [{Yield results form a triggerd region with 'seq' |
| (i.e. register-like) semantics."}]; | ||
| let description = [{ | ||
| Terminates a triggered region and produces the given list of values. | ||
| The results only become visible after all triggers and register updates |
There was a problem hiding this comment.
What does register updates mean here? Registers in seq are not tied to these triggers.
There was a problem hiding this comment.
This relates to what I've tried to formulate in my comment above:
The exact conditions and time at which an event 'occurs' are determined by the simulation environment. The only requirement is that it is the same mechanism which is used to trigger the sampling and updating of registers.
That point is crucial to ensure we don't get any race conditions when mixing registers and the results of TriggeredOps. In practice this means that, for a legal SV lowering, the results must be produced via a non-blocking assignment that is evaluated at the same timestep and in the same scheduling region as registers updated on the clock that the given trigger is sensitive to.
| ^bb0(%arg0: i8): | ||
| %cst1 = hw.constant 1 : i8 | ||
| %inc = comb.add bin %arg0, %cst1 : i8 | ||
| sim.yield_seq %inc : i8 |
There was a problem hiding this comment.
yield significantly overlaps with registers. It makes sense to be able to get values out of triggered regions, but do we want trigger regions to have implicit storage?
%r = reg %v
%v = sim.triggered() on ... {
%inc = comb.add bin $r, const(1)
sim.yield %inc
}
performs the same thing without introducing redundant ways to store values.
There was a problem hiding this comment.
True, it would make for tidier semantics. But I'm afraid the register-like behavior of yield is required to not turn sim.triggered into a massive footgun. My points of concern are:
- Since TriggeredOp only provides a defined value at the point(s) in time when the trigger's root event occurs, what would be the observed result values outside of these points? I.e., what happens if the clock of the trigger does not match the clock of the register? It would be possible to use the tieoff values here, but I'm not sure that's a good idea.
- What would be the appropriate clock for a register at the output of an initially triggered procedure?
- Without an appropriately formed register at the output, a valid SV lowering may become next to impossible. We'd have to explicitly sequence/synchronize every single use of the result value with the producing procedure. How would we possibly do this with e.g., non-procedural continuous assignments?
| For non-simulation flows the results are replaced by their tie-off values. | ||
| }]; | ||
| let arguments = (ins AnyTriggerType:$trigger, | ||
| Optional<I1>:$condition, |
There was a problem hiding this comment.
Isn't the condition redundant with an if inside the body?
There was a problem hiding this comment.
Yes. But without the condition on the outside, we'd have to loop back the results of the TriggeredOp to its arguments, so the if on the inside can switch between the old and the new value. It's basically the same discussion as having an explicit enable on a register vs. muxing the current value into the input. At this level of abstraction I'd prefer to avoid back-edges when possible.
Anyhow - I've removed the condition for now. But I intend to replace it with something else. See my comment below.
| root event occurs. | ||
| The body region must complete without 'consuming' simulation time. It | ||
| may not run indefinitely or wait for any simulation event. It is allowed to | ||
| have side-effects and produce results. |
There was a problem hiding this comment.
what does side-effect mean here?
There was a problem hiding this comment.
That's the million-dollar question. 😬
To quote myself from above:
Side-effects of operations in a triggered operation must not be observable by operations outside of the same operation, unless they are passed as a result.
That requirement is both confusingly phrased and overly conservative. Maybe a better way of thinking about this is to distinguish between the hardware model (i.e., anything written in hw, comb, seq) and the environment (everything else, including the body of TriggeredOps, the simulator, the OS, etc.). The requirement would then be that all information flow between the model and the environment has to be relayed either through the top-level IOs, or the results and arguments of sim operations (maybe also inner symbols ?!?). At that point the body regions of TriggeredOps would be pretty much free to do whatever they want, unless it causes an effect that is observable from within the model.
But I cannot say I've nailed this down, yet.
| may not run indefinitely or wait for any simulation event. It is allowed to | ||
| have side-effects and produce results. | ||
| For every result a 'tieoff' constant must be provided. It specifies the | ||
| respective result's value before the body is first invoked. |
There was a problem hiding this comment.
how is a tie off different from an on_init trigger? This also seems to be a mechanism to make these sequences encode registers.
Can we have a clean separation of state and triggers or do we really need to go down the verilog style inferred registers path?
There was a problem hiding this comment.
Tie-offs provide a pre-initial value, i.e., the value an on_init triggered procedure observes when looking at its own results. They are closely coupled to the "initialization at declaration" in SV (Section 10.5 of the spec). The tie-off and implied state make sure we have a defined value for the results at any point of execution. IMHO that's the lesser of evils.
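As a rough illustration of the pre-initial value, here is a toy Python model of a single on_init-triggered procedure. The `simulate` function and its trace format are hypothetical. Passing the procedure its own previous result as an argument is also just a modelling shortcut.

```python
def simulate(tieoff, init_body, simulation=True):
    """Return the sequence of values a result takes, starting at its tie-off."""
    result = tieoff
    trace = [("pre-init", result)]
    if simulation:
        # The on_init body observes its own result as the tie-off value.
        result = init_body(result)
        trace.append(("after on_init", result))
    # In a non-simulation flow the result simply stays at the tie-off.
    return trace


sim_trace = simulate(0, lambda own_result: own_result | 1)
synth_trace = simulate(0, lambda own_result: own_result | 1, simulation=False)
```

This mirrors the isSimulation example from earlier in the thread: the result reads 0 until the init procedure has run, and it stays 0 if the procedure never runs.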
| ::mlir::Type type, | ||
| ::mlir::Location loc) { | ||
|
|
||
| // Delegate non 'sim' types to the HW dialect materializer. |
There was a problem hiding this comment.
I'm curious why this is necessary? Are hw constant requests really winding up here? Why?
There was a problem hiding this comment.
This is necessary to materialize the tie-off integer attributes of TriggeredOps as hw.constant when they get folded.
I've just removed the fold method for the moment, as it relied on the condition argument. But I'd keep this section for future use, if you don't mind.
| // leave it to the parent's canonicalization. | ||
| if (auto parentSeq = op.getParent().getDefiningOp<TriggerSequenceOp>()) { | ||
| if (parentSeq == op) { | ||
| op.emitWarning("Recursive trigger sequence."); |
There was a problem hiding this comment.
Isn't this a failure? Shouldn't it be covered by the verifier? What about multi-op sequences which are circular?
There was a problem hiding this comment.
In practice a cyclic trigger graph is almost certainly a bug. But theoretically it is still well-formed: It has no root event, thus it is never invoked. There is no good way of handling this right now, but that should change with the soon to be added sim.never op, which just creates a constant "dead" trigger.
Multi-op circular graphs will be opportunistically handled by the flattening DFS. I'm not sure it is worth spending any effort on making sure all cycles are detected and replaced.
| if (!canBeChanged) | ||
| return failure(); | ||
|
|
||
| // DFS for inlinable values. |
There was a problem hiding this comment.
Can this be done instead on an op-by-op canonicalizer which doesn't walk the IR? having DFS in a canonicalizer is a good way to have n^2 behavior.
There was a problem hiding this comment.
This is how I've implemented it originally, but I think the current variant is more efficient. Note that:
- The first thing the canonicalizer does is to check whether the current op can be inlined by the parent. If so, it bails out immediately. That way we can guarantee that the DFS is only performed from the root of a collapsible sub-tree and the entire sub-tree is collapsed with a single op rewrite.
- The DFS will only enter a child node if it will be inlined. Thus, it should not do any more traversal than an op-by-op canonicalizer would also need to do.
- The DFS itself is pretty cheap. Since all nodes are guaranteed to have a single predecessor we do not need to maintain a set of visited nodes to prevent walking into a cycle.
My motivation to replace the original implementation was to minimize the number of op rewrites. Since a TriggerSequenceOp can have a lot of results and users, I didn't want to substitute them repeatedly. The DFS makes sure this happens in one shot. Given all this, I do not see how this would cause a performance regression. But maybe I'm missing something?
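The flattening strategy described above can be sketched in a few lines of Python. This is not the actual canonicalizer: the `Seq` class is a hypothetical stand-in for TriggerSequenceOp, and the inlining condition (a result whose only user is another sequence) is my simplified assumption.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Seq:
    """Hypothetical sequence op: results[i] lists the users of the i-th trigger."""
    results: List[List[Union["Seq", str]]] = field(default_factory=list)


def flatten(root: Seq) -> Seq:
    """Collapse a whole sub-tree of nested sequences into one op in a single pass."""
    flat: List[List[Union[Seq, str]]] = []

    def visit(seq: Seq) -> None:
        for users in seq.results:
            if len(users) == 1 and isinstance(users[0], Seq):
                # Sole user is a nested sequence: descend and inline it.
                visit(users[0])
            else:
                # Anything else keeps its own result in the flattened op.
                flat.append(users)

    # No visited set is needed: every node has a single predecessor.
    visit(root)
    return Seq(flat)


nested = Seq([["a"], [Seq([["b"], [Seq([["c"]])]])], ["d", "e"]])
flat = flatten(nested)
```

The DFS only enters children that get inlined, and the users of all collected results would be rewired once, when the flattened op replaces the root.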
darthscsi
left a comment
There was a problem hiding this comment.
I think this is on a good track. My biggest concern is the implicit register representation being redundant with seq.
Minor questions about n^2 behavior in canonicalization.
ad81565 to 54a2f26
|
Thanks a lot for your review. I hope, I could provide a somewhat convincing motivation for my decisions in the comments. I have removed the condition argument for TriggeredOps for the moment. My intent is to replace them with the |
| include "mlir/Interfaces/SideEffectInterfaces.td" | ||
| include "mlir/Interfaces/FunctionInterfaces.td" | ||
| include "circt/Dialect/Sim/SimDialect.td" | ||
| include "circt/Dialect/Sim/SimTypes.td" | ||
| include "circt/Dialect/Seq/SeqTypes.td" |
| let summary = [{Yield results from a triggerd region with 'seq' | ||
| (i.e. register-like) semantics."}]; |
There was a problem hiding this comment.
Extra " at the end. Summary should probably also be short enough to fit in one 80 col line. Everything else can go in 'description'.
| def EdgeTriggerType : SimTypeDef<"EdgeTrigger"> { | ||
| let summary = "Trigger derived from an edge event."; | ||
| let parameters = (ins "::circt::hw::EventControl":$edgeEvent); | ||
| let mnemonic = "trigger.edge"; | ||
| let assemblyFormat = "`<` $edgeEvent `>`"; | ||
| } | ||
|
|
||
| def InitTriggerType : SimTypeDef<"InitTrigger"> { | ||
| let summary = "Trigger derived from the simulation start event."; | ||
| let mnemonic = "trigger.init"; | ||
| } | ||
|
|
||
| def AnyTriggerType : AnyTypeOf<[EdgeTriggerType, InitTriggerType]>; |
There was a problem hiding this comment.
Why do we need separate types for init and edge and why does edge need an event control attr? Why would a procedural region care by which trigger it was invoked?
There was a problem hiding this comment.
That was a provision for verilog lowering. The idea was to be able to determine whether we need to produce an always @(posedge ... ) or an always @(negedge ... ) etc. purely based on the trigger's type, in case the root trigger op is out of scope. I've since changed how verilog lowering works. It should no longer be necessary and I'll remove the event control attribute.
I'm more hesitant though to merge !trigger.edge and !trigger.init. I think !trigger.init could come in handy for side-effecting non-synthesizable register/memory initializers. E.g., it could be used to sequence sim.initial ops, should we decide to go that way.
| } | ||
|
|
||
| def TriggeredOp : SimOp<"triggered", [ | ||
| IsolatedFromAbove, |
There was a problem hiding this comment.
Just wondering about why you decided to isolate it from above and pass through inputs explicitly?
There was a problem hiding this comment.
Good point. I don't see a hard technical reason to have it IsolatedFromAbove, but conceptually it seemed to me like the obvious thing to do. I'm thinking of triggered ops as inline-defined procedures, so giving them arguments simply felt natural. It should also drive home the point that all arguments of a trigger tree are captured at the same time and can never change during execution of that tree. Whether we can actually keep that promise for verilog lowering is (sadly) a different story...
587820e to 42e2f59
Flatten nested sequences in a single rewrite rather than iteratively.
42e2f59 to 080017a
|
Let's do some PR necromancy! We've discussed this again in yesterday's ODM (CC @uenoku @seldridge @darthscsi ). In hindsight I should have done a proposal on the general shape of the sequenced IR first before diving into all the implementation details. My excuse is that I wanted to make sure that what I propose can actually be implemented. So, let's focus on the overall concept for now:
I had based the trigger tree design on two premises:
Realistically, to achieve this, we need some kind of token SSA values to represent the dependencies in the graph. Note that trigger trees, as proposed here, are effectively token graphs with bidirectional flow on the SSA edges. When a trigger user op receives a token, it guarantees to return that token on the same edge within the same time step. Forks are encoded implicitly where a trigger SSA value has more than one user. Joins are also present implicitly in between all results of a TriggerSequenceOp.
Fundamentally, when modeling token flows, I see two problems we have to consider:
Problem 1: Since we have ruled out suspending execution at a join until all tokens have arrived, what do we do if we join tokens originating from different root triggers (e.g., clock edges)? Some of them may never arrive until we progress the simulation time.
Problem 2: How can we safely pass results between TriggeredOps within the same time step if the determined execution order is not a topological order of the result value dependencies? We could be using values before they are defined.
IMHO, trigger trees are a fairly elegant solution to Problem 1. They allow us to model joins without having an operation that receives more than one trigger/token value. Thus, the DAG cannot depend on more than one root trigger. However, they do not solve Problem 2, and I think that is their most notable drawback. But consider that in FIRRTL (as far as I know) it is currently not even possible to pass results between synchronous side-effecting operations. Until that changes, I don't see this as a significant issue. And even if it changes, I would assume that we usually could simply lower dependent operations into the same TriggeredOp body.
But let's take a step back and consider "generic" token DAGs. The obvious solution to Problem 2 then would be to consider all returned values as tokens (either implicitly or explicitly by wrapping them in a parameterized TokenType), and execute the TriggeredOps in a topological order. This effectively means that a TriggeredOp has to do a join on all the token values flowing into it. So we're now facing Problem 1. We could try to fix that problem purely through semantics: If we know that a given root trigger is inactive during a time step, we can specify that all TriggeredOps that directly or indirectly depend on it won't be executed at this step. That would basically shift the problem into the runtime. A structural fix would be to encode the root trigger (domain) in the type of the token values and only allow joins on matching types.
So, in the end, I think we cannot avoid both problems by design and the choice boils down to which one is more important to dodge. Either way, we can of course always try to check and reject problematic IR during lowering. |
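To sketch what the structural fix could look like, here is a tiny Python model of domain-typed tokens. Everything in it is hypothetical: the `Token` class, the `join` function, and the use of a string to name the root trigger are illustration only and do not correspond to any existing type or op.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token value whose 'type' carries its root trigger (domain)."""
    domain: str


def join(*tokens: Token) -> Token:
    """Join tokens, rejecting any mix of domains up front (like a type check)."""
    domains = {token.domain for token in tokens}
    if len(domains) != 1:
        raise TypeError(f"cannot join tokens from domains {sorted(domains)}")
    return tokens[0]


same_domain = join(Token("clk_a"), Token("clk_a"))
```

A join across `clk_a` and `clk_b` is rejected before anything is executed, so a join can never end up waiting for a token from a root trigger that does not fire in the current time step.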
|
I think it would be helpful to clarify the exact requirement here. At least for the current use case, my understanding is that we want to preserve the order of side-effecting sim operations that appear in graph regions, probably within a single module. Maybe I am missing some context, but for this narrower problem, it seems that we could also solve it by putting the relevant operations into a single sim.triggered block, either with a dedicated SquashSimTriggered pass or with a helper similar to the addToSimTriggered function I used in #10545. I am not saying that the design in this PR would not solve the problem. My concern is more about whether it may be too heavy-weight for this particular requirement. From my reading, this PR seems to spend a significant amount of effort on modeling ordering between operations in different modules and instances. I do not yet understand the use cases that require that level of cross-module ordering, and it seems to go beyond the narrower requirement above. If we temporarily restrict the discussion to a single module, then the main advantage of this approach seems to be that it allows multiple sim.triggered blocks to have an explicit order between them. I am still not sure whether that extra expressiveness is necessary for the current use case, compared to simply combining the ordered operations into one sim.triggered block. I would be very grateful if you could clarify my confusion! 😘 |
|
@fzi-hielscher wrote:
I think this is a reasonable way to handle things. However, the graph region should provide no ordering guarantees. I.e., if things are in a certain order in the graph region, then it is "nice" to lower this to Verilog in the same order. However, it's also legal for us to completely shuffle these. Similarly, most simulations will schedule always blocks in the same module with the same trigger sequentially (it's logical and it's likely the easiest way to implement it).
@fzi-hielscher wrote:
Yeah, the traditional side effecting ops (assert, stop, etc.) don't have results so there's no way to pass anything around. Also, their side effects can't "interact". It may be possible to have interactions with DPI, though, if a user is doing this, I think they're hitting undefined behavior. (I don't think there's a problem here. I agree with your comment.)
@fzi-hielscher wrote:
There's no problem with needing to pass information within a domain, correct? There should always be a solution there. If the real problem only arises when dealing with different triggers, then that may hint that we are needing better modeling of both: (1) a generalization of "domain" information to color everything that is in the same "domain" and (2) there are explicit domain synchronizing operations which define the ordering. The direction I've been going with FIRRTL domains is closer to your:
@fzi-hielscher wrote:
The FIRRTL pipeline just checks, "Does the program ever let two domains talk to each other?" If so, then error. However, allow the user to do domain casts (as it's not useful to have a circuit where two domains can never communicate). Provide some lightweight checking via property assertions that the domain cast is legal (as a domain cast is maximally unsafe). However, this provides zero guidance on how the domain cast works and it's essentially up to the designer to not screw it up. I.e., the actual semantics of the crossing are whatever the Verilog simulator determines or the synthesis tools infer/do. This doesn't work for core as we'd like to simulate it in Arc, though! This is a roundabout way of saying that you may be bumping up against a facet of domain modeling in core. Perhaps we should tackle that head on. |
|
@nanjo712 wrote:
What you're proposing is an alternative and closer to the runtime having to deal with it. Or: we could assume that there is no defined ordering (like how Verilog handles it) and then it's the responsibility of the language front end or the lowering from the language front end (e.g., FIRRTL's This doesn't solve the case of needing to define an ordering between blocks with different triggers, though, or to define an ordering between blocks globally and inter-module. There are some use cases for blocks with different triggers. E.g., modules may have multiple clocks. You'd like to be able to guarantee that all the prints from one clock domain (fast clock) happen before the prints from a second domain (slow clock). This could matter for some log post-processing. However, there's no way to express this today. (Note: there are other approaches for this example like using logging suffixes and post-processing. The example was more just trying to provide motivation for something that we can't express today.) |
|
Thanks for your feedback, @nanjo712 and @seldridge! So, I think the question we have to answer before anything else is: I had effectively ruled out the idea of sequencing between domains. As of today, the FIRRTL spec only defines an order for operations that are on the same clock. And it is very difficult to do in SV.
It is definitely not a necessity, but IMHO a "nice to have" option in the IR, as it would allow us to freely move side-effecting operations without changing semantics.
Correct, as long as there are no cyclic dependencies.
Independently of the TriggeredOp, I would very much appreciate having domain information in the core dialects. |
|
I think there are several different issues mixed together here. First, there is the question of ordering across different clock domains. I agree that clock/domain information should probably be modeled in some way in the core dialects, but we currently do not have such a mechanism. That is one missing piece. More importantly, ordering across different clock domains seems semantically odd to me. If two procedures do not share a clock/domain, I do not see a clear reason why we should impose an order between them. Maybe we have reached some agreement that, at least for now, this problem is not well-defined. Second, there is the question of ordering across modules within the same clock domain. Without explicit clock/domain information, even determining whether two modules are in the same clock domain can be difficult. I also do not yet see a sufficiently strong use case for this. It would certainly be a nice feature to have, but given the current state of the IR, this also seems difficult to model cleanly. Third, there is the question of ordering within the same module and the same clock domain. In this case, determining whether operations are under the same clock is much easier, especially during lowering. However, for this case, instead of assigning an explicit order between multiple It would of course be great if the first and second problems could be solved eventually. However, I think that would probably require us to first establish some notion of clock/domain information in the core dialects. |
Continuing the series of #7314 and #7335 (and hoping to finally get to lower the sim.proc.print operation) this PR adds trigger-related types and operations to the Sim Dialect. The primary point is to be able to express the execution order of side-effecting ops and procedures without having to rely on operation order within a HWModule's graph region. As added benefits, triggers allow us to:
Triggers span virtual clock trees. Their root node is either an edge event of a "real" clock (sim.on_edge) or the start of simulation (sim.on_init). When the root event occurs, all leaf operations of the given tree are triggered. In contrast to normal clock trees, trigger trees impose a partial order on their leaf nodes from which we can derive their execution order. Two leaf nodes are unordered (incomparable) if they are not part of the same trigger tree. They are concurrent (equal) if their lowest common ancestor operation is not a TriggerSequenceOp. If the lowest common ancestor is a TriggerSequenceOp, the order depends on the result indices of the sequence op.
So... in practice:
sim.triggered provides a region in which we can place procedural operations. These operations can have side-effects. However, they are required to make forward progress and eventually terminate independently of all other procedures and simulation events. This means that concurrent procedures are not actually required to be run in parallel. Any chosen serialization should be legal / deadlock-free. Note that during lowering previously unordered procedures can become concurrent, e.g., by CSEing their root triggers.
TriggeredOps can also produce results via the sim.yield_seq terminator. The "seq" is to indicate an implicit register at the output. I.e., results are produced in a clock synchronous fashion. At some point we'll probably need an asynchronous sim.yield_comb. But this can create all sorts of complex interactions, so I try to put it off as long as I can 😅.
All results of sim.triggered must have an explicit tie-off constant specified. These are used both as results outside of simulation contexts (i.e., synthesis), and as (pre)initial value of the implicit register.
I have a very much proof-of-conceptish implementation of an arcilator lowering in my github fork. It can compile this little gadget, showing how to do sequenced calls to a side-effecting procedure during initialization across module instances (and print stuff).
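The partial order on leaf nodes described above can be modelled with a short Python sketch. The encoding is an assumption made for illustration: each trigger tree is either a leaf name or a pair `(kind, children)`, where `"seq"` stands for a TriggerSequenceOp and `"fork"` for a root event or trigger value with several users. The `relation` helper and all leaf names are hypothetical.

```python
def find_path(node, leaf, path=()):
    """Return the steps (kind, child_index) from the root down to a leaf, or None."""
    if isinstance(node, str):
        return path if node == leaf else None
    kind, children = node
    for index, child in enumerate(children):
        found = find_path(child, leaf, path + ((kind, index),))
        if found is not None:
            return found
    return None


def relation(forest, a, b):
    """Classify two distinct leaves as unordered, concurrent, or ordered."""
    location = {}
    for tree_id, tree in enumerate(forest):
        for leaf in (a, b):
            path = find_path(tree, leaf)
            if path is not None:
                location[leaf] = (tree_id, path)
    if location[a][0] != location[b][0]:
        return "unordered"  # not part of the same trigger tree
    for (kind, index_a), (_, index_b) in zip(location[a][1], location[b][1]):
        if index_a != index_b:
            # The first diverging step identifies the lowest common ancestor.
            if kind != "seq":
                return "concurrent"
            return "before" if index_a < index_b else "after"
    raise ValueError("a and b must be distinct leaves")


forest = [
    ("seq", ["init_a", ("fork", ["init_b", "init_c"])]),  # tree rooted at on_init
    ("fork", ["clk_x", "clk_y"]),                         # tree rooted at on_edge
]
```

With this forest, `init_a` runs before both `init_b` and `init_c`, the latter two are concurrent, and none of them is ordered relative to `clk_x` or `clk_y`.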