Redesigned Components iterator to use front and back indexing instead mutating and subslicing path field#156496
Redesigned Components iterator to use front and back indexing instead mutating and subslicing path field#156496asder8215 wants to merge 6 commits into
Components iterator to use front and back indexing instead mutating and subslicing path field#156496Conversation
|
rustbot has assigned @Mark-Simulacrum. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
1627e2f to
33e69e1
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
This comment has been minimized.
This comment has been minimized.
33e69e1 to
ed9d33d
Compare
This comment has been minimized.
This comment has been minimized.
ed9d33d to
0b0f84c
Compare
This comment has been minimized.
This comment has been minimized.
… of mutating and subslicing path field; as a result, Components iterator memory size goes from 64 bytes to 40 bytes and as_path does not use cloning at all
0b0f84c to
8ed33ea
Compare
This comment was marked as outdated.
This comment was marked as outdated.
2151b8f to
83cdbed
Compare
This comment was marked as outdated.
This comment was marked as outdated.
…ity, added safety comments, and check for root dir after Prefix component (e.g., '\\?\checkout\src\tools' should produce Prefix, RootDir, Normal, Normal, None, ...) in Components::parse_single_component
83cdbed to
3921fff
Compare
…ng iter().position()/rposition()
…here to use iter().position()/.iter().rposition(), refactored code in compare_components, and removed stale comments
0a25dda to
92e0132
Compare
|
New benchmarking results. You can see what the benchmark code looks like here and run it yourself to see if there are any difference in measurements on your end: This is the measurement of the current implementation of This is the measurement of the new implementation of Edit: Updated |
|
Here are the benchmark results with black box: From current From this Edit: Updated |
92e0132 to
574d7f2
Compare
|
I'm confident this code works (passed CI in previous run, the current amended commit change I made doesn't change logic, but makes the code written in a more idiomatic way). In my opinion, the logic in this code should look more readable than how |
|
@rustbot label +I-libs-nominated Since |
…omponents::normalize_back instead, refactored Components::as_path code
574d7f2 to
1d45aef
Compare
|
Have been meaning to take a closer look at this implementation; will try to give it a read over the weekend. Would like to see to what extent this changes performance for comparisons. |
By comparisons, are you referring to Path equality or Path ordering? I was a bit confused from the issue because you mentioned |
|
I was thinking about both, but I was particularly using paths in a |
Got you. Also, I think I could optimize |
|
Yeah, I suspect that probably the best solution would be to do something similar to what Python does and offer some sort of I'll take any wins if the code ends up working better. |
|
I realized my benchmarking for path ordering is incorrect; I thought the |
…e in previous implementation, but making it work with Components<'_> front index
|
New benchmarks for path ordering comparisons: Performance is nearly the same using what the current implementation of |
This PR entirely changes how
Components<'_>is implemented. Currently, theComponents<'_>iterator 'consumes' components through mutating its path field to a subslice that presents the left over unconsumed path components (this consumed path component is what's returned inComponents::nextorComponents::next_back). However, this PR keeps the path field alive/unmodified and uses front and back indexing strategy to extract consumed/unconsumed components.This PR benefits implementations like
Components::as_path, which is pretty used is multiple areas of the standard library. Previously,Components<'_>iterator was required to clone inside the function to present the unconsumed path because our originalComponent<'_>consuming behavior on path will not allow the returned&'a PathfromComponents::as_pathto last after aComponents::nextorComponents::next_backcall. Due to the current implementation ofComponentsiterator has a size of 64 bytes, if you're usingComponents::as_pathafter eachComponents::next/Components::next_back, then it's pretty unfortunate to be cloning 64 bytes again and again, especially if each of your path components are a few bytes (e.g., "foo/bar/baz").On the point of size, with the indexing strategy, this PR has further optimized the size of
Components<'_>from 64 bytes -> 40 bytes since a large chunk of theComponents<'_>was taken up by theOption<Prefix>(this takes up 40 bytes). Instead of holding a prefix field inComponents<'_>, we can encode the length of thePrefixwithin ourfrontfield index and use another enum calledFirstComponentto check whether our first component of the given path isPrefix(or something else). If it's aPrefix, we can useparse_prefixon the subsliceself.path[..self.front]since we know our front index encodes thePrefixlength.Due to not having the prefix
Option<Prefix>field insideComponents<'_>anymore, all the prefix functions inComponents<'_>have been removed in favor of callingparse_prefix,Prefix::is_verbatim,Prefix::is_drive, etc.I'm curious if this redesign of
Components<'_>improves Path equality as pointed out by @clarfonthey in #154521 with Path equality (not to be confused with Path ordering as mentioned in the issue, since that usesComponents:::compare_componentsand the example code shows equality) being slow.I haven't benchmarked this though.I have benchmarked the result and I can say that currently this implementation improves Path equality due toComponents::next_backrunning faster with this implementation than the current mutating path with a subslice implementation. However, Path ordering runs slightly slower. You can check the benchmark code I used here, and play around with the number of bytes in a component, the number of components, etc..Right now, when I tested it locally on my PC (Fedora OS), it passed all the standard library tests and rust analyzer didn't crash on me (had a few crash reports coming from rust analyzer early on when I messed around with
Components<'_>dealing something with threads usingPath::components, but now that's all resolved).I have not tested this on Windows yet, and I would probably need someone to help me test on this platform as my Windows VM is not working properly to run the standard library test suite.There's a lot of things being done here, and possibly there may be better approaches or ways I could improve this implementation or write the code in a neater way here. I am open to any advice or feedback on this approach.
Update: I got to testing some things out with Prefixes on my Windows VM manually, so the prefix component index encoded into the
Components<'_>front field seems to work out nicely. I've also accounted for root directory being able to exists after a Prefix component like "\?\checkout\src\tools" having the following components:PrefixVerbatim->RootDir->Normal->Normal->None(learnt this from the fail that occurred in miri tests, which is nice to see thisComponents<'_>implementation works on the Windows tests in CI).