Skip to content

Document and test use of char::is_whitespace in collapsible_if#16840

Open
iAmChidiebereh wants to merge 1 commit intorust-lang:masterfrom
iAmChidiebereh:test/confirm-is_whitespace-intent
Open

Document and test use of char::is_whitespace in collapsible_if#16840
iAmChidiebereh wants to merge 1 commit intorust-lang:masterfrom
iAmChidiebereh:test/confirm-is_whitespace-intent

Conversation

@iAmChidiebereh
Copy link
Copy Markdown

@iAmChidiebereh iAmChidiebereh commented Apr 10, 2026

View all comments

changelog: none

This PR addresses the whitespace check in collapsible_if.

// ./clippy_lints/src/collapsible_if.rs:145

let requires_space = snippet(cx, up_to_else, "..").ends_with(|c: char| !c.is_whitespace());

My investigation revealed that the current char::is_whitespace check is actually intentional. If we switched to rustc_lexer::is_whitespace, zero-width characters (like \u{200E}) would be treated as valid spacing, which would make Clippy to output suggestions that visibly look like elseif. To prevent future regressions, this PR adds a UI test with a zero-width space and an inline comment explaining why we must keep the current check.

The issue is for outreachy applicants and is being tracked here: rustfoundation/interop-initiative#53

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Apr 10, 2026
@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Apr 10, 2026

r? @dswij

rustbot has assigned @dswij.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: 7 candidates
  • 7 candidates expanded to 7 candidates
  • Random selection from Jarcho, dswij, llogiq, samueltardieu

@iAmChidiebereh
Copy link
Copy Markdown
Author

When I tried out rust_lexer::is_whitespace.

stephen@debian:~/outreachy/rust-clippy$ TESTNAME=collapsible_else_if.rs cargo uitest
Screenshot From 2026-04-10 09-06-28
stephen@debian:~/outreachy/rust-clippy$ TESTNAME=collapsible_else_if.rs cargo uibless

Snippet from generated collapsible_else_if.stderr file:

Screenshot From 2026-04-10 09-08-05

 
 

We can see we have an undesirable result collapse suggestion using elseif instead of else if

Copy link
Copy Markdown
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please make the PR description shorter?
Just 3-4 dot points saying what the tests and code does it great.
If you need help summarising it, please let me know.

Edit: you can just link to the Outreachy ticket, you don't need to explain that ticket in detail.

I also want to check you're looking at the right is_whitespace method?
https://doc.rust-lang.org/std/primitive.char.html#method.is_whitespace

The snippet function returns a str, so we are working with Unicode chars here:
https://github.com/rust-lang/rust/blob/ad4b9354009cb6bd5a9ff1b5f5a63a13ec98ebc9/src/tools/clippy/clippy_utils/src/source.rs#L535

View changes since this review

Comment thread tests/ui/collapsible_else_if.fixed Outdated
#[rustfmt::skip]
fn spacing_zero_width_ws() {
// This test shows that the Char::is_whitespace gives a more desirable result than rust_lexer::is_whitespace for require spaacing logic
// We test out 2 zero-wisth characters recognized as whitespaces by the lexer but not by Char::is_whitespace
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can see we have an undesirable result collapse suggestion using elseif instead of else if

It looks like you're testing whitespace characters that are included in Rust language whitespace (Unicode Pattern_White_Space or rust_lexer::is_whitespace). This won't find whitespace bugs, but it is still a good test to have.

Please add another test with a whitespace character that is Unicode White_Space, but not Pattern_White_Space / rust_lexer::is_whitespace. Here are those lists:

For example, some characters to test are:

  • 00A0 ; White_Space # Zs NO-BREAK SPACE
  • 202F ; White_Space # Zs NARROW NO-BREAK SPACE

Copy link
Copy Markdown
Author

@iAmChidiebereh iAmChidiebereh Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're testing whitespace characters that are included in Rust language whitespace (Unicode Pattern_White_Space or rust_lexer::is_whitespace). This won't find whitespace bugs, but it is still a good test to have

I found out that the lexer panics on non-Pattern_White_Space characters, meaning that adding a test sample involving if else blocks that will generate snippets ending with whitespace characters in White_Space, but not in Pattern_White_Space is not possible.. We can only test within the Pattern_White_Space set. Given this, rustc_lexer::is_whitespace seemed logical to use. However, it counts the zero-width characters \u{200E} and \u{200F} as whitespace (and rightfully so), which causes Clippy to output what visibly looks like elseif as suggestion to users. While syntactically valid to the lexer, this invisible spacing is confusing to users. On the other hand, char::is_whitespace doesn't classify those two as whitespace, forcing the insertion of a visible space (else if) for readability. Therefore, this test only acts as a safeguard to prevent future developers from mistakenly "fixing" the check by switching to the lexer's method, which would degrade the visual output...

I could still use rustc_lexer::is_whitespace but I will also have to add a check that will let us treat \u{200E} and \u{200F} as non-whitespace characters for this particular problem. That will give us the same desirable behaviour. This is actually more intuitive as anyone reading the code will know we just treating them differently but they are still whitespaces.

Please add another test with a whitespace character that is Unicode White_Space, but not Pattern_White_Space / rust_lexer::is_whitespace. Here are those lists:

image

The image here was the 'unknown start of token' error when I tried testing fo a snippet that ends with 202F. It is the same for others.

Though it seems like I am missing out out on some detail and also, my assumption that we wouldn't want to print a suggestion that appears to look like 'elseif' even when there is actually a whitespace in between might very well be wrong.
I think you are trying to draw my attention to something that I am currently not figuring out. Please, I would appreciate a bit more of a pointer. Thank you.

Copy link
Copy Markdown
Contributor

@teor2345 teor2345 Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found out that the lexer panics on non-Pattern_White_Space characters,

If adding a non-Pattern_White_Space character to a Rust source file causes a panic, then this exactly the kind of bug in clippy (or rustc) that we're trying to find.

meaning that adding a test sample involving if else blocks that will generate snippets ending with whitespace characters in White_Space, but not in Pattern_White_Space is not possible.

You can add a test and mark it with should_panic in one commit, then fix the bug in clippy/rustc, and remove should_panic from the test:
https://doc.rust-lang.org/reference/attributes/testing.html#the-should_panic-attribute

If it is a UI test, you can use //@ should-ice instead:
https://rustc-dev-guide.rust-lang.org/tests/directives.html#controlling-outcome-expectations

If using those directives isn't possible, it's ok to just make one final commit with the test.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.. Thank you
It is a ui test, so i tried out the should-ice directive and got "should-ice is not a command known to ui_test".
In trying to research on what could be done, I also realized @should-ice is not a supported directive in ui_test: https://docs.rs/ui_test/0.30.4/ui_test/#supported-comment-annotations

If using those directives isn't possible, it's ok to just make one final commit with the test.

Alright...I will just push the panicking test as it is. Thank you!

@iAmChidiebereh
Copy link
Copy Markdown
Author

iAmChidiebereh commented Apr 10, 2026

Can you please make the PR description shorter? Just 3-4 dot points saying what the tests and code does it great. If you need help summarising it, please let me know.

Edit: you can just link to the Outreachy ticket, you don't need to explain that ticket in detail.

I have shortened it. Please help confirm it is okay.

I also want to check you're looking at the right is_whitespace method? https://doc.rust-lang.org/std/primitive.char.html#method.is_whitespace

Yes, this was what I was looking at

The snippet function returns a str, so we are working with Unicode chars here: https://github.com/rust-lang/rust/blob/ad4b9354009cb6bd5a9ff1b5f5a63a13ec98ebc9/src/tools/clippy/clippy_utils/src/source.rs#L535

Yes, I get this part... but my investigation showed that the lexer fails to scan the file before the lint even has a chance to run... so we don't even get to the part where we can extract the snippet to test on (right?).

@iAmChidiebereh iAmChidiebereh requested a review from teor2345 April 10, 2026 23:38
@iAmChidiebereh iAmChidiebereh changed the title Add test to confirm intentional use of Char::is_whitespace Document and test intentional use of char::is_whitespace in collapsible_if Apr 12, 2026
@iAmChidiebereh iAmChidiebereh changed the title Document and test intentional use of char::is_whitespace in collapsible_if Document and test use of char::is_whitespace in collapsible_if Apr 13, 2026
@iAmChidiebereh
Copy link
Copy Markdown
Author

iAmChidiebereh commented Apr 14, 2026

Hi @teor2345
I have added the test and It currently panicks
Rustfmt wouldn't let my commit go through (i added a hook that auto-runs rustfmt on commit) and even #![rustfmt::skip] at the top of my file didn't work... so I had to write this test in a new file and then, instruct rustfmt to ignore the file by adding it to the ignore array in rustfmt.toml. That worked.

Comment thread tests/ui/collapsible_else_if_snippet_ends_with_npws.rs Outdated
@iAmChidiebereh
Copy link
Copy Markdown
Author

iAmChidiebereh commented Apr 14, 2026

Hi @teor2345
Even though the ui test runs now, the one of the tests triggered to run in the CI is failing due to our generated .stderr file not meeting established convention for lint messages... Please could you look into this: https://github.com/rust-lang/rust-clippy/actions/runs/24382289363/job/71208280778?pr=16840
I think this is because a .stderr file now contains non-lint message. The clippy team probably ensured all lint messages are lowercase... Our error (not lint) message comes from the lexer and as such, do not conform to the convention. They contains uppercase letter.

@iAmChidiebereh
Copy link
Copy Markdown
Author

iAmChidiebereh commented Apr 14, 2026

Also, I have been thinking... We are currently not testing any whitespace specific behaviour. We are just suppressing panic.
And I am trying to wrap my head around how this will play out because I currently don't see a way we can fix the main code (probably by switching to rustc_lexer::is_whitespace), take lines like //~ ERROR: unknown start of token: \u{202f} off and bring back //~ collapsible_else_if to finally have things running without suppressing the error. I think the error is a constant. Can you help clear my confusion?
I will be looking forward to your response. Thanks!

Copy link
Copy Markdown
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the tests … is failing due to our generated .stderr file not meeting established convention for lint messages: rust-lang/rust-clippy/actions/runs/24382289363/job/71208280778?pr=16840
this is because a .stderr file now contains [lexer] message. The clippy team probably ensured all lint messages are lowercase

This is a question for a clippy maintainer. They might decide to exempt the test, move it to another test category, or silence the whitespace errors.

But the test is important, because it shows that clippy doesn't crash, or break valid code.

(i've tried to just quote the essential parts of your question, to make it easier to read.)

We are currently not testing any whitespace specific behaviour. We are just suppressing panic. And I am trying to wrap my head around how this will play out because I currently don't see a way we can fix the main code (probably by switching to rustc_lexer::is_whitespace), take lines like //~ ERROR: unknown start of token: \u{202f} off and bring back //~ collapsible_else_if to finally have things running without suppressing the error. I think the error is a constant.

I don't see any clippy panics here. I see errors, but those errors are a correct response to invalid Rust source code.

Let's take a step back, and think about the purpose of the lint, this test, and any code changes:

  • the lint finds else { if that can be collapsed to else if, and suggests a fix if possible
  • for whitespace accepted by the Rust lexer, if the suggestion/fix changes that whitespace to other whitespace, the meaning of the program doesn't change (rustfmt also makes similar changes)
  • for whitespace not accepted by the Rust lexer, the Rust code has an error that will show up in the compiler. So it's ok for clippy to show that error, fix or not fix it, or ignore it. Clippy just can't panic or crash.

One test shows that clippy fixes code accepted by the lexer, and doesn't break the code when there's unusual white space. Another test shows that clippy correctly prints lexer errors when the code isn't valid Rust code.

So the only improvement we could make here is fixing non-lexer whitespace in the suggestion. But that is optional, and might not be possible, because the lint comes after the lexer.

View changes since this review

Comment thread tests/ui/collapsible_else_if_snippet_ends_with_npws.rs Outdated
Comment thread tests/ui/collapsible_else_if.rs Outdated
@iAmChidiebereh
Copy link
Copy Markdown
Author

iAmChidiebereh commented Apr 15, 2026

This is a question for a clippy maintainer. They might decide to exempt the test, move it to another test category, or silence the whitespace errors.

But the test is important, because it shows that clippy doesn't crash, or break valid code.

Oh... Okay.. I will just wait for a maintainer to come around when they are free so that they can address this. Is it okay to tag someone? I'm cautious of appearing like I am pressing them for response despite their schedule... but I honestly wish I could tag them though.

I don't see any clippy panics here. I see errors, but those errors are a correct response to invalid Rust source code.

Oh... I keep mixing things up... Pardon my use of the word 'panic'... They are errors as you mentioned. I meant 'suppressing test failure due to errors'.

Let's take a step back, and think about the purpose of the lint, this test, and any code changes:

  • the lint finds else { if that can be collapsed to else if, and suggests a fix if possible
  • for whitespace accepted by the Rust lexer, if the suggestion/fix changes that whitespace to other whitespace, the meaning of the program doesn't change (rustfmt also makes similar changes)
  • for whitespace not accepted by the Rust lexer, the Rust code has an error that will show up in the compiler. So it's ok for clippy to show that error, fix or not fix it, or ignore it. Clippy just can't panic or crash.

One test shows that clippy fixes code accepted by the lexer, and doesn't break the code when there's unusual white space. Another test shows that clippy correctly prints lexer errors when the code isn't valid Rust code.

So the only improvement we could make here is fixing non-lexer whitespace in the suggestion. But that is optional, and might not be possible, because the lint comes after the lexer.

Thank you so much! I fully get it now. Thanks for the explanation...

But that is optional, and might not be possible, because the lint comes after the lexer.

This cleared things up honestly because I was confused on how we were going to fix this when the lint actually comes after the lexer. I have been researching on how I could go about fixing this, but then, like you said, it might not be possible. I think I found an issue in the course of investigating this problem and that was like few days ago. Will tag you to it
So yeah... we keep the tests as they act as documentation of the behaviour for snippets ending in the 2 category of whitespaces (recognized by the lexer and not recognized by the lexer).
Thank you once again!

@dswij
Copy link
Copy Markdown
Member

dswij commented Apr 15, 2026

the test collapsible_else_if_snippet_ends_with_npws is not necessary. Clippy will not run if it is not a valid rust. Let's not include those tests in this PR

@dswij
Copy link
Copy Markdown
Member

dswij commented Apr 15, 2026

My investigation revealed that the current char::is_whitespace check is actually intentional.

Can you point out where it is agreed/explicitly mentioned that it is intentional?

@iAmChidiebereh
Copy link
Copy Markdown
Author

the test collapsible_else_if_snippet_ends_with_npws is not necessary. Clippy will not run if it is not a valid rust. Let's not include those tests in this PR

Alright thank you!
I will take it out now.

@iAmChidiebereh
Copy link
Copy Markdown
Author

iAmChidiebereh commented Apr 15, 2026

Can you point out where it is agreed/explicitly mentioned that it is intentional?

Looking back now, I think my choice of words didn't really tell what I meant. I am sorry about that. There was no place where it was explicitly mentioned that it was intentional. What I meant to say was "My investigation revealed that the current char::is_whitespace check 'might' actually be intentional".

The statement came from my investigation after testing out all lexer-recognized whitespaces... Using char::is_whitespace for the whitespace check, every whitespace recognized by the lexer is also classified as whitspace, except for 2: \u{200E} and \u{200F}. Co-incidentally, these 2 characters turns out to be zero-width. Using char::is_whitespace, thereby, ensures a space is added betweeen else and if when a snippet ends with any of those 2 characters... so we get to see the desirable "else if" suggestion.... If we had used the rustc_lexer::is_whitespace function, those 2 will be classified as whitespaces (and rightfully so), therefore, no space will be added in between, making us see suggestions like "elseif" (no space in between on print, even though there is an invisible whitespace). I don't think that is desirable... So it felt deliberate and intentional from the person that implemented this to use a function that classified zero-width whitespce characters as non-whitespace to enable addition of a space between else and if for them. So it was just an assumption from my end backed by the thought I presented here.

@teor2345
Copy link
Copy Markdown
Contributor

I will just wait for a maintainer to come around when they are free so that they can address this. Is it okay to tag someone? I'm cautious of appearing like I am pressing them for response despite their schedule... but I honestly wish I could tag them though.

If you are feeling time pressure because final Outreachy applications are due in a few hours' time, you don't need to worry. You can put links to incomplete PRs in your final application.

What happens to your PR is up to the maintainers of the tool you're modifying or testing. Sometimes PRs (or parts of PRs) don't get accepted for good reasons, and that's ok. Sometimes there are delays because reviewers are busy, and that's also ok.

@iAmChidiebereh
Copy link
Copy Markdown
Author

If you are feeling time pressure because final Outreachy applications are due in a few hours' time, you don't need to worry. You can put links to incomplete PRs in your final application.

What happens to your PR is up to the maintainers of the tool you're modifying or testing. Sometimes PRs (or parts of PRs) don't get accepted for good reasons, and that's ok. Sometimes there are delays because reviewers are busy, and that's also ok.

Thank you so much!

@dswij
Copy link
Copy Markdown
Member

dswij commented Apr 18, 2026

@iAmChidiebereh can you please help to squash the commits?

@iAmChidiebereh iAmChidiebereh force-pushed the test/confirm-is_whitespace-intent branch from 9149b3e to 575c612 Compare April 18, 2026 17:31
@iAmChidiebereh
Copy link
Copy Markdown
Author

Sure... I just did @dswij

Copy link
Copy Markdown
Member

@dswij dswij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dswij dswij added this pull request to the merge queue Apr 23, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 23, 2026
…_else_if lint

docs: document intentional char::is_whitespace usage

chore: make test function name clearer

test: add panicking test for snipets ending with non-Pattern_White_Space whitespaces

fix: test panic

chore: fix typos

Co-authored-by: teor <teor@riseup.net>

docs: add comment to non-pattern_white_space test

chore: take out redundant tests

docs: fix wrong func name
@iAmChidiebereh
Copy link
Copy Markdown
Author

The merge failed.. from the logs, it failed to access 'https://github.com/rust-lang/cargo/'... maybe github server was bad as it returned 500

@iAmChidiebereh iAmChidiebereh force-pushed the test/confirm-is_whitespace-intent branch from 575c612 to 777bdce Compare April 24, 2026 02:08
#[rustfmt::skip]
fn ends_with_zero_width_whitespace() {
// Test out snippets ending with the 2 zero-width characters recognized as whitespaces by the lexer,
// but not by char::is_whitespace
Copy link
Copy Markdown
Author

@iAmChidiebereh iAmChidiebereh Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, my eyes caught a mistake here.
I made an edit on this line (line 188).
We had u8::is_ascii_whitespace instead of the correct char::is_whitespace, hence the update

View changes since the review

Copy link
Copy Markdown
Contributor

@teor2345 teor2345 Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think your comment change is correct, please check the documentation for the lexer:
https://doc.rust-lang.org/reference/whitespace.html

And for char::is_whitespace (which uses Unicode):
https://doc.rust-lang.org/std/primitive.char.html#method.is_whitespace

If you mean ascii::Char::is_whitespace, then that method is unstable:
https://doc.rust-lang.org/std/ascii/enum.Char.html#method.is_whitespace

So it is better to refer to the underlying method, which is u8::is_ascii_whitespace:
https://doc.rust-lang.org/src/core/ascii/ascii_char.rs.html#1078

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay.. Thank you so much
I will go through the links and get back to you 🙏

Copy link
Copy Markdown
Author

@iAmChidiebereh iAmChidiebereh Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have gone through the links.

And for char::is_whitespace (which uses Unicode):
https://doc.rust-lang.org/std/primitive.char.html#method.is_whitespace

This is what the code for checking snippet-ending characters in the collapsible_else_if lint is using currently, hence my documenting it. Clicking on the is_whitespace call in the codebase and opting to go to its implementation leads us to it. Image shown below:
image


Please can you help re-confirm? Maybe I am still looking at the wrong place?
Patiently awaiting your response. Thank you.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are looking in the right place, but "the 2 zero-width characters" are recognised as whitespace by unicode::White_Space.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your code suggestions might be over-complicating things, and introducing a special case?

As long as clippy provides a useful suggestion with visible space, and rustfmt introduces visible space, I think the current code and test are fine.

Copy link
Copy Markdown
Author

@iAmChidiebereh iAmChidiebereh Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your code suggestions might be over-complicating things, and introducing a special case?

I don't particularly get what you meant when you said 'introducing a special case'. Could it be that it is any different from when we are using the current char::is_whitespace? Do you mind explaining, please?

As long as clippy provides a useful suggestion with visible space, and rustfmt introduces visible space, I think the current code and test are fine.

Oh, Okay thank you... Will just leave it as it is then.

Copy link
Copy Markdown
Author

@iAmChidiebereh iAmChidiebereh Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your code suggestions might be over-complicating things, and introducing a special case?

I don't particularly get what you meant when you said 'introducing a special case'. Could it be that it is any different from when we are using the current char::is_whitespace? Do you mind explaining, please?

I kinda asked this because the only special case I could see was when I singled out \u{200E} and \u{200F} when using the lexer's whitespace check... which is not different from what char::is_whitespace is doing since for any valid snippet, any ending character identified as NOT whitespace by char::is_whitespace is also identified as non-whitespace by the lexer... with the sole exception of \u{200E} and \u{200F} (the lexer recognizes these as whitespace)... so char::is_whitespace already implicitly creates a special case for those 2 characters which was why clippy still provided useful suggestions with visible space in those cases... and was the basis of my argument at the beginning of this PR that its (char::is_whitespace) use might be intentional.. The code i suggested just makes the special case explicit.
That said, I agree with you that we should leave it as it is since they both do same thing. Thank you.
I would appreciate your response though 🙏

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a single lint in the clippy tool.

It's ok to add tests and fix any bugs caused by unusual whitespace when we find them.

But any larger changes in the compiler or tools to handle directional markers would be better made in a principled manner across the entire codebase. I'm not sure what those principles are yet, so I think there would need to be some discussion first.

Copy link
Copy Markdown
Author

@iAmChidiebereh iAmChidiebereh Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Teor...

I spent the last 2 days with my dentist so I couldn't respond earlier.
To respond to your message, I think I need to clarify something because I see we have a lot of mixups. Pardon my lengthy text.

You spoke about how "any larger changes in the compiler or tools to handle directional markers would be better made in a principled manner across the entire codebase"... But there is no where in the PR I suggested any such kind of change... I am assuming you mistook the suggestion I gave for such kind of change?

If so, then that's not the case.
An analogy would be like having a + b * c but then, changing it to a + (b*c). They both do same thing, the second one just shows the operator precedence more clearly, thereby improving readability.

In the case of the pr, what happens in the current code, that is, when we apply the negation of char::is_whitespace on a char (!c.is_whitespace)?
For a regular character that is not whitespace by any definition, it returns true because they are not whitespace. Now, for whitespaces, remember that if the whitespace is not recognized by the lexer, the lint wouldn't run because it is not valid rust... So we are not concerned with those cases because the responsibility is on the lexer to stop such program from running by throwing an error. This is basically why @dswij asked that we remove the test we added before for whitespaces not recognized by the lexer.
So we are only left with whitespaces that are recognized by the lexer. Passing any of those whitespaces to (!c.is_whitespace) returns false... With the sole exception of \u200E and \200F... Which returns true because they don't have the Unicode White_Space property.
We can see that there is an implicit 'special case' already for the directional markers.

Now let's check the suggested code, specifically the whitespace checker. What happens when we pass a char to
!rustc_lexer::is_whitespace(c) || matches!(c, '\u{200E}' | '\u{200F}')
If it is not a whitespace, it returns true. But if it is a whitespace, the first expression returns false and we then have to check the right hand of OR (||)... Which checks if the char matches the directional markers. If it doesn't, it returns false... Otherwise, if it does match any of the directional marks, it returns true.

This is exact same behaviour... Just improving the readability and showing intent behind code. So I am not suggesting any kind of change that alters the behavior and because they are same behavior, we could leave the code at what it is currently. I have already documented that we intentionally used char::is_whitespace and written test for the directional marks... So we are good. Just wanted to clarify things.
Thank you.

@iAmChidiebereh
Copy link
Copy Markdown
Author

@dswij please when time permits, can you take a look at this PR? Thank you 🙏🏾

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants