Review and update pattern subsumption and exhaustion#1691
Conversation
There was a problem hiding this comment.
The second sentence of §11.3 starts “Informally” but I don't think the following text is informal at all, it simply states the definition of subsumption (in terms of patterns not Engish!)
That aside before I can aye/nay this I need some clarification of what §11.3 (and from that §11.4) are trying to say.
You could reduce §11.3 to:
In a switch statement, it is an error if a case’s pattern is subsumed by the preceding set of cases, this means that any input value would have been matched by one of the previous cases.
That’s a concise high-level definition.
The use of “unguarded cases” in the original is an indication that subsumption for patterns wass understood to be either unsolvable, or deemed potentially too expensive, in general to determine. Given that is the goal here to:
- encourage error messages when an implementation can prove subsumption; and
- require error messages is a set of known provable cases
or something else?
If the above is the goal then material in the informative note regarding “precise extent” is surely normative? (The bit about “applies recursively” is informative as it is a direct consequence of the preceding bullet.)
42a615f to
035b5ca
Compare
|
I haven't made edits for the review comments yet. We should be sure to agree on what the scope of the spec should be for this area before going forward. I'd like us to get to consensus on that discussion in our next meeting. I'm writing a summary of the exhaustiveness and subsumption algorithm used by a certain compiler to help us discuss what we want the standard to say about these areas of pattern matching. A more complete discussion is in the repo for that compiler. My purpose here is to provide enough of a framework for our discussion. The algorithm determines the set of all possible input values for the pattern input expression. For example, if the input expression is of type Note that some input expression types which might appear unbounded actually aren't because an ordering function applies to the type. For example, an input expression of type var output = someDouble switch
{
< 0 => "negative",
0 => "zero",
> 0 => "positive",
double.NaN => "Not a number",
};As this compiler examines each switch arm, it determine if any of the set of input values matches that pattern. If none of the input values in the set matches, that switch are is considered to be subsumed. If any of the input values matches, those values are removed from the input set as "handled" before proceeding to the next arm. Consider this example: bool b1 = SomeTruthiness();
bool b2 = SomeTruthiness();
var output = (b1, b2) switch
{
(true, true) => "very truthy",
(false, false) => "much false",
_ => "mixed",
};At the beginning of the switch, the set of values is: (true, true)
(true, false)
(false, true)
(false, false)After the first arm, the remaining set is: (true, false)
(false, true)
(false, false)After the second arm, it's: (true, false)
(false, true)The third arm is a catch-all, and handles the rest. The same analysis works on the
The fourth arm makes the set exhaustive. After the sixth arm, the remaining input set is empty. A modification to the first example shows how this process works for subsumption: var output = someDouble switch
{
double.NegativeInfinity => "negative infinity",
< 0 => "negative",
0 => "zero",
> 0 => "positive",
double.PositiveInfinity => "positive infinity",
double.NaN => "Not a number",
};In this version, the analysis is:
I've only described the process in terms of a single pattern. More complex patterns (those involving recursive patterns, or logical patterns) becomes more complicated to analyze. That's important for the questions below in that this overview isn't nearly complete enough to normatively describe the algorithm. The discussion questions we should cover:
|
|
I'd suggest we try to establish an approach which may not be as exhaustive as we want it to be eventually, but allows us to get there over time. Ideally, let's start with just simple cases - I'd suggest at least Possibly we should even do this in two stages even for C# 8:
I think it might be worth having a separate numbered subsection for each "kind" of subsumption (so that it's easy to refer to that specific kind, and it's easy when reading to know that you've read everything relevant to that kind). That'll look a bit odd when we've only got We can then have an umbrella issue with "things we want to add" (other integer types, double, tuples, recursive matching), and tick them off over time. How does that sound in terms of a general principle? (I don't yet have a firm opinion on normative vs informative. I think it wouldn't be unreasonable to be normative with an increasing set of kinds of subsumption, explicitly allowing an implementation to implement additional unspecified kinds. We don't really need to worry about other implementations becoming non-compliant IMO - but we do want to make sure that everything that's normative is "reasonable".) |
|
(I've read through the changes in the commits, but I don't think I'm in the right headspace to approve or request changes; discussion at the meeting may well help there.) |
This first commit makes minimal adjustments to the subsumption and exhaustiveness sections. This keeps the rules fairly high level, and provides more leeway for an implementation to devise any algorithm for managing recursive patterns.
Provide more extensive rule definitions for recursive patterns on subsumption and exhaustiveness. This version leans more closely to the implementation, but also doesn't dive deep enough to provide more clarity on pattern forms where exhaustiveness or subsumption might not be detected. I'm tempted to use the first commit, but I want people to see both.
This reverts commit 4858e65.
Use a minimal definition for subsumption and exhaustiveness.
035b5ca to
fef79e0
Compare
|
@jskeet @Nigel-Ecma I've made the updates to remove discussion of which patterns could correctly catch exhaustiveness. |
Nigel-Ecma
left a comment
There was a problem hiding this comment.
This is looking good. Some suggestions & questions.
| > ``` | ||
| > | ||
| > Arms 1 and 2 have non-constant guards and so are not *unguarded*; only arm 3 is *unguarded* with pattern `int i`, which does not subsume the final `_` arm because it does not match a non-`int` value such as `null`. *end example* | ||
| > *Note*: This means that any input value would have been matched by one of the previous cases or arms. *end note* |
There was a problem hiding this comment.
With the detail removed this line should no longer be a note. Line 450 introduces the use of set, so maybe something along the lines of:
| > *Note*: This means that any input value would have been matched by one of the previous cases or arms. *end note* | |
| A pattern `P` is *subsumed* by set of unguarded patterns `Q` if any input value matched by `P` is matched by one of the members of `Q`. |
This maybe should become before 450 – i.e. short definition of subsumed followed by the error cases?
Can a pattern only be subsumed if it is itself unguarded? I.e. should it be “An unguarded pattern P is…”?
| > ``` | ||
| > | ||
| > *end example* | ||
| A set of patterns is exhaustive if, for every possible input value, some pattern in the set is applicable. When an implementation detects that a set of patterns is not exhaustive, it shall issue a warning. |
There was a problem hiding this comment.
This is now an incredibly short section, and we're not planning on adding more detail. 11.3 is really short now too.
Is there a reasonable way of combining them, or do we want to avoid confusing the two terms?
Fixes #1650
This should be reviewed commit by commit
The first commit takes a minimal approach to describing the new rules for subsumption and exhaustiveness. That keeps the rules at a high level, and avoids any discussion of the particular algorithm to determine either subsumption or exhaustiveness in property patterns and other recursive patterns.
The second commit takes a more maximal approach to the rules, but still tries to avoid the actual algorithm used by a well known compiler. I'm less than satisfied, as it makes more assumptions, but still does a fair amount of handwaving. (Fourth commit fixes a lint issue).
My recommendation is to revert the final commit and then merge.After the last meeting, we agreed to update the PR with a new commit to define the terms of subsumption and exhaustiveness, and limit the requirement to produce the diagnostics when either non-exhaustive patterns or subsumed branches are detected.