-
Notifications
You must be signed in to change notification settings - Fork 2k
Document and test use of char::is_whitespace in collapsible_if #16840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
iAmChidiebereh
wants to merge
1
commit into
rust-lang:master
Choose a base branch
from
iAmChidiebereh:test/confirm-is_whitespace-intent
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+101
−1
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, my eyes caught a mistake here.
I made an edit on this line (line 188).
We had u8::is_ascii_whitespace instead of the correct char::is_whitespace, hence the update
View changes since the review
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think your comment change is correct, please check the documentation for the lexer:
https://doc.rust-lang.org/reference/whitespace.html
And for
char::is_whitespace(which uses Unicode):https://doc.rust-lang.org/std/primitive.char.html#method.is_whitespace
If you mean
ascii::Char::is_whitespace, then that method is unstable:https://doc.rust-lang.org/std/ascii/enum.Char.html#method.is_whitespace
So it is better to refer to the underlying method, which is
u8::is_ascii_whitespace:https://doc.rust-lang.org/src/core/ascii/ascii_char.rs.html#1078
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay.. Thank you so much
I will go through the links and get back to you 🙏
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have gone through the links.
This is what the code for checking snippet-ending characters in the collapsible_else_if lint is using currently, hence my documenting it. Clicking on the is_whitespace call in the codebase and opting to go to its implementation leads us to it. Image shown below:

Please can you help re-confirm? Maybe I am still looking at the wrong place?
Patiently awaiting your response. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are looking in the right place, but "the 2 zero-width characters" are recognised as whitespace by
unicode::White_Space.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your code suggestions might be over-complicating things, and introducing a special case?
As long as clippy provides a useful suggestion with visible space, and rustfmt introduces visible space, I think the current code and test are fine.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't particularly get what you meant when you said 'introducing a special case'. Could it be that it is any different from when we are using the current char::is_whitespace? Do you mind explaining, please?
Oh, Okay thank you... Will just leave it as it is then.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda asked this because the only special case I could see was when I singled out
\u{200E}and\u{200F}when using the lexer's whitespace check... which is not different from what char::is_whitespace is doing since for any valid snippet, any ending character identified as NOT whitespace by char::is_whitespace is also identified as non-whitespace by the lexer... with the sole exception of\u{200E}and\u{200F}(the lexer recognizes these as whitespace)... so char::is_whitespace already implicitly creates a special case for those 2 characters which was why clippy still provided useful suggestions with visible space in those cases... and was the basis of my argument at the beginning of this PR that its (char::is_whitespace) use might be intentional.. The code i suggested just makes the special case explicit.That said, I agree with you that we should leave it as it is since they both do same thing. Thank you.
I would appreciate your response though 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a single lint in the clippy tool.
It's ok to add tests and fix any bugs caused by unusual whitespace when we find them.
But any larger changes in the compiler or tools to handle directional markers would be better made in a principled manner across the entire codebase. I'm not sure what those principles are yet, so I think there would need to be some discussion first.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Teor...
I spent the last 2 days with my dentist so I couldn't respond earlier.
To respond to your message, I think I need to clarify something because I see we have a lot of mixups. Pardon my lengthy text.
You spoke about how "any larger changes in the compiler or tools to handle directional markers would be better made in a principled manner across the entire codebase"... But there is no where in the PR I suggested any such kind of change... I am assuming you mistook the suggestion I gave for such kind of change?
If so, then that's not the case.
An analogy would be like having
a + b * cbut then, changing it toa + (b*c). They both do same thing, the second one just shows the operator precedence more clearly, thereby improving readability.In the case of the pr, what happens in the current code, that is, when we apply the negation of
char::is_whitespaceon a char (!c.is_whitespace)?For a regular character that is not whitespace by any definition, it returns true because they are not whitespace. Now, for whitespaces, remember that if the whitespace is not recognized by the lexer, the lint wouldn't run because it is not valid rust... So we are not concerned with those cases because the responsibility is on the lexer to stop such program from running by throwing an error. This is basically why @dswij asked that we remove the test we added before for whitespaces not recognized by the lexer.
So we are only left with whitespaces that are recognized by the lexer. Passing any of those whitespaces to (!c.is_whitespace) returns false... With the sole exception of
\u200Eand\200F... Which returns true because they don't have the Unicode White_Space property.We can see that there is an implicit 'special case' already for the directional markers.
Now let's check the suggested code, specifically the whitespace checker. What happens when we pass a char to
!rustc_lexer::is_whitespace(c) || matches!(c, '\u{200E}' | '\u{200F}')If it is not a whitespace, it returns true. But if it is a whitespace, the first expression returns false and we then have to check the right hand of OR (||)... Which checks if the char matches the directional markers. If it doesn't, it returns false... Otherwise, if it does match any of the directional marks, it returns true.
This is exact same behaviour... Just improving the readability and showing intent behind code. So I am not suggesting any kind of change that alters the behavior and because they are same behavior, we could leave the code at what it is currently. I have already documented that we intentionally used char::is_whitespace and written test for the directional marks... So we are good. Just wanted to clarify things.
Thank you.