Added a query function to get minimum distance and the indices of closest points #149
plooney wants to merge 1 commit into georust:main
michaelkirk left a comment
I'm new to this crate, so don't consider my review as blocking. Probably @kylebarron or someone else more familiar should make the final call.
    let node_size = self.node_size();

    // Use TinyVec to avoid heap allocations
    let mut stack: TinyVec<[usize; 33]> = TinyVec::new();
Since all pushes and pops happen in threes, maybe it makes more sense to have:
    // 11 tuple slots hold the same data as the original 33 usize slots
    let mut stack: TinyVec<[(usize, usize, usize); 11]> = TinyVec::new();
    stack.push((0, indices.len() - 1, 0));
Fair. I implemented pushing and popping in 3's to be a relatively literal port of the upstream JS code, such as in https://github.com/mourner/kdbush/blob/92b25eb356852b6cade256f147ada41b1a7e04d2/index.js#L162-L164
    // recursively search for items within radius in the kd-sorted arrays
    while !stack.is_empty() {
        let axis = stack.pop().unwrap_or(0);
This unwrap_or "should never happen", right?
If you adopt the 3-tuple approach above, I think this could become:
while let Some((axis, right, left)) = stack.pop() {
...
}
And you could get rid of the unwrap_or
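Taken together, the tuple stack and the while let loop might look like the sketch below. It is a minimal radius query in the spirit of the snippet above, not the crate's code, and it rests on assumptions: points are stored as kd-sorted (x, y, id) tuples (median of each range at (left + right) / 2, splitting on x first and then alternating), a plain Vec stands in for TinyVec so there are no dependencies, and the node_size leaf buckets are skipped. The function name within is illustrative.

```rust
/// Return the ids of all points within `r` of (qx, qy).
///
/// Assumes `points` is a kd-sorted slice of (x, y, id) tuples and that there
/// are no leaf buckets (every node is a single median point).
fn within(points: &[(f64, f64, u32)], qx: f64, qy: f64, r: f64) -> Vec<u32> {
    let mut result = Vec::new();
    if points.is_empty() {
        return result;
    }
    let r2 = r * r;
    // One stack entry per (left, right, axis) frame: pushes and pops cannot
    // get out of step, and `while let` ends the loop without unwrap_or.
    let mut stack: Vec<(usize, usize, usize)> = vec![(0, points.len() - 1, 0)];
    while let Some((left, right, axis)) = stack.pop() {
        let m = (left + right) / 2;
        let (x, y, id) = points[m];
        let (dx, dy) = (x - qx, y - qy);
        if dx * dx + dy * dy <= r2 {
            result.push(id);
        }
        // Queue the halves that intersect the query circle's bounding box.
        // The `m > left` / `m < right` guards skip empty ranges (and avoid
        // underflow when m == 0).
        let (q, split) = if axis == 0 { (qx, x) } else { (qy, y) };
        if q - r <= split && m > left {
            stack.push((left, m - 1, 1 - axis));
        }
        if q + r >= split && m < right {
            stack.push((m + 1, right, 1 - axis));
        }
    }
    result
}
```

The destructuring order in the while let only has to match the order used when pushing; (left, right, axis) is used here because it reads in the same order as the frame is built.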
Looking around a bit, it looks like this was copy/pasted from another method. In fact, a quick scan shows this traversal logic is used in at least two other places. Is it reasonable to separate the traversal logic from the query logic at this point?
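One possible shape for that split, sketched under simplifying assumptions (kd-sorted (x, y, id) tuples, no leaf buckets, a std Vec as the stack): a hypothetical traverse helper owns the stack and the median arithmetic, and each query supplies only a closure that inspects the median point and reports which halves are still worth searching. The helper name traverse is hypothetical, and range and within here are standalone illustrations rather than the crate's methods.

```rust
/// Hypothetical shared traversal over a kd-sorted slice of (x, y, id) points.
/// `visit(point, axis)` handles the median point of the current range and
/// returns (search_low_half, search_high_half).
fn traverse<F>(points: &[(f64, f64, u32)], mut visit: F)
where
    F: FnMut((f64, f64, u32), usize) -> (bool, bool),
{
    if points.is_empty() {
        return;
    }
    let mut stack = vec![(0usize, points.len() - 1, 0usize)];
    while let Some((left, right, axis)) = stack.pop() {
        let m = (left + right) / 2;
        let (search_low, search_high) = visit(points[m], axis);
        if search_low && m > left {
            stack.push((left, m - 1, 1 - axis));
        }
        if search_high && m < right {
            stack.push((m + 1, right, 1 - axis));
        }
    }
}

/// Bounding-box query expressed purely as query logic.
fn range(points: &[(f64, f64, u32)], min_xy: (f64, f64), max_xy: (f64, f64)) -> Vec<u32> {
    let mut result = Vec::new();
    traverse(points, |(x, y, id), axis| {
        if x >= min_xy.0 && x <= max_xy.0 && y >= min_xy.1 && y <= max_xy.1 {
            result.push(id);
        }
        let (lo, hi, split) = if axis == 0 {
            (min_xy.0, max_xy.0, x)
        } else {
            (min_xy.1, max_xy.1, y)
        };
        (lo <= split, hi >= split)
    });
    result
}

/// Radius query reusing the same traversal.
fn within(points: &[(f64, f64, u32)], qx: f64, qy: f64, r: f64) -> Vec<u32> {
    let mut result = Vec::new();
    traverse(points, |(x, y, id), axis| {
        let (dx, dy) = (x - qx, y - qy);
        if dx * dx + dy * dy <= r * r {
            result.push(id);
        }
        let (q, split) = if axis == 0 { (qx, x) } else { (qy, y) };
        (q - r <= split, q + r >= split)
    });
    result
}
```

A nearest query fits the same closure shape, since an FnMut closure can keep the best distance found so far in captured state.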
    /// - qy: y value of query point
    ///
    /// Returns distance squared and indices of found items
    fn query(&self, qx: N, qy: N) -> (N, Vec<u32>) {
I have several superficial suggestions, but they are admittedly all subjective matters of taste.
1. Naming: query is vague; nearest would be more specific without being overly verbose.
2. Returning distance squared: I see how this could help performance in some cases, but I'd guess it's surprising. (The current implementation actually seems to return distance already, not distance squared, so maybe you agree and this was just an out-of-date doc.)
3. Tuple order: I'd expect most people to be more interested in the Vec<u32> than the distance, so I'd flip the order of the return tuple.
So, all together, I'm suggesting something like this:
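    /// Returns indices of found items and their distance from the query point.
    fn nearest(&self, qx: N, qy: N) -> (Vec<u32>, N) {
        let (ids, distance_squared) = self.nearest_distance_squared(qx, qy);
        // N::sqrt returns an Option; a squared distance is never negative
        (ids, distance_squared.sqrt().unwrap())
    }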
    // Optional: if you think it's worth exposing the squared distance, give it a more obvious name:

    /// Returns indices of found items and their distance squared from the query point.
    fn nearest_distance_squared(&self, qx: N, qy: N) -> (Vec<u32>, N) {
        // ...
    }
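The suggested method pair can be sketched as follows, with two assumptions worth flagging: coordinates are plain f64 rather than the generic N, and a hypothetical Points struct does a linear scan as a stand-in for the kd-tree search. The point is only the API shape: nearest_distance_squared does the work and nearest converts its result with a single sqrt. Because the PR returns a Vec<u32>, the sketch keeps every index tied for the minimum distance; an empty index yields an empty Vec and infinity.

```rust
/// Hypothetical stand-in for the index: a linear scan over f64 points.
struct Points {
    coords: Vec<(f64, f64)>,
}

impl Points {
    /// Returns indices of found items and their distance squared from the query point.
    /// All indices tied for the minimum are kept.
    fn nearest_distance_squared(&self, qx: f64, qy: f64) -> (Vec<u32>, f64) {
        let mut best = f64::INFINITY;
        let mut ids = Vec::new();
        for (i, &(x, y)) in self.coords.iter().enumerate() {
            let (dx, dy) = (x - qx, y - qy);
            let d2 = dx * dx + dy * dy;
            if d2 < best {
                // Strictly closer: start a new set of winners.
                best = d2;
                ids.clear();
                ids.push(i as u32);
            } else if d2 == best {
                // Tie: keep it alongside the current winners.
                ids.push(i as u32);
            }
        }
        (ids, best)
    }

    /// Returns indices of found items and their distance from the query point.
    fn nearest(&self, qx: f64, qy: f64) -> (Vec<u32>, f64) {
        let (ids, distance_squared) = self.nearest_distance_squared(qx, qy);
        // f64::sqrt(INFINITY) is INFINITY, so the empty case carries through.
        (ids, distance_squared.sqrt())
    }
}
```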
On the naming: I have been using KD-trees in SciPy, and I mimicked the naming there.
    /// - qy: y value of query point
    ///
    /// Returns distance squared and indices of found items
    fn query(&self, qx: N, qy: N) -> (N, Vec<u32>) {
Why is this u32? Aren't these array indices? I think if you use usize, some of the unwraps below can go away.
Ah, my bad. I see that indices are either u16 or u32, so converting to u32 should never fail. For future cleanup (not in this PR), maybe Indices should have some kind of infallible method like get_u32 so we don't have so many unwraps in the calling code.
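A minimal sketch of that infallible accessor, assuming a hypothetical Indices enum that holds either a Vec<u16> or a Vec<u32> (the crate's real type may be shaped differently): u32::from is lossless for u16, so neither branch needs an unwrap.

```rust
/// Hypothetical index buffer that stores ids as either u16 or u32.
enum Indices {
    U16(Vec<u16>),
    U32(Vec<u32>),
}

impl Indices {
    /// Infallible accessor: every u16 fits in a u32, so no conversion can fail.
    /// Like slice indexing, this panics if `i` is out of bounds.
    fn get_u32(&self, i: usize) -> u32 {
        match self {
            Indices::U16(v) => u32::from(v[i]),
            Indices::U32(v) => v[i],
        }
    }
}
```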
Yeah, indices were defined as u32 to match the upstream JavaScript implementation in https://github.com/mourner/kdbush
            stack.push(1 - axis);
        }
    }
    let d = min_val.sqrt().unwrap();
Why is it safe to unwrap here? When is this None?
It would be None if min_val is < 0.0. That should never happen, since min_val is a squared distance.
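To illustrate when such a sqrt can return None, here is a sketch with a hypothetical CheckedSqrt trait standing in for the Option-returning sqrt used on N; its f64 implementation is an assumption, not the crate's. A sum of squares is never negative, so in this version the only way to reach None is a NaN input, and expect with a message recording that invariant is a slightly more informative alternative to a bare unwrap.

```rust
/// Hypothetical trait standing in for an Option-returning sqrt on the
/// coordinate type N.
trait CheckedSqrt: Sized {
    fn checked_sqrt(self) -> Option<Self>;
}

impl CheckedSqrt for f64 {
    fn checked_sqrt(self) -> Option<f64> {
        // Negative inputs have no real square root; NaN also fails this test.
        if self >= 0.0 {
            Some(self.sqrt())
        } else {
            None
        }
    }
}

/// Euclidean distance from per-axis differences.
fn distance(dx: f64, dy: f64) -> f64 {
    // A sum of squares is never negative, so None here means a NaN slipped in.
    (dx * dx + dy * dy)
        .checked_sqrt()
        .expect("squared distance is non-negative unless a coordinate is NaN")
}
```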
I'd like to find time to review this, but I'm pretty swamped right now and just got selected to serve on a 3-week jury trial, so my time will likely be very limited for the rest of January.
@kylebarron No problem. I have some other work I need to make progress on and will come back to this when you have more time.
Kontinuation left a comment
We should add some tests for the newly added query function.
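A common way to test a nearest-neighbor query is to compare it against a brute-force scan over many query points. The sketch below is self-contained and makes several assumptions: points are (x, y, id) tuples, kd_sort is a hypothetical helper that fully sorts each range instead of using kdbush's select, there are no leaf buckets, and nearest returns every id tied for the minimum along with the squared distance. The test grid uses a spacing of 0.1, well below 1.0, since that is the regime where a pruning bound mixing squared and unsquared distances can discard valid candidates.

```rust
/// Recursively sort so each range's median splits it on alternating axes.
fn kd_sort(points: &mut [(f64, f64, u32)], axis: usize) {
    if points.len() <= 1 {
        return;
    }
    if axis == 0 {
        points.sort_by(|a, b| a.0.total_cmp(&b.0));
    } else {
        points.sort_by(|a, b| a.1.total_cmp(&b.1));
    }
    let m = (points.len() - 1) / 2;
    kd_sort(&mut points[..m], 1 - axis);
    kd_sort(&mut points[m + 1..], 1 - axis);
}

/// kd-tree nearest: returns (ids tied for closest, squared distance).
fn nearest(points: &[(f64, f64, u32)], qx: f64, qy: f64) -> (Vec<u32>, f64) {
    let mut best = f64::INFINITY;
    let mut ids = Vec::new();
    if points.is_empty() {
        return (ids, best);
    }
    let mut stack = vec![(0usize, points.len() - 1, 0usize)];
    while let Some((left, right, axis)) = stack.pop() {
        let m = (left + right) / 2;
        let (x, y, id) = points[m];
        let (dx, dy) = (x - qx, y - qy);
        let d2 = dx * dx + dy * dy;
        if d2 < best {
            best = d2;
            ids.clear();
            ids.push(id);
        } else if d2 == best {
            ids.push(id);
        }
        // Compare the squared axis gap with the squared best distance so both
        // sides are in the same units; the query's own side is always searched.
        let diff = if axis == 0 { qx - x } else { qy - y };
        let gap_in_range = diff * diff <= best;
        if (diff <= 0.0 || gap_in_range) && m > left {
            stack.push((left, m - 1, 1 - axis));
        }
        if (diff >= 0.0 || gap_in_range) && m < right {
            stack.push((m + 1, right, 1 - axis));
        }
    }
    (ids, best)
}

/// Oracle: scan every point, using the same distance expression.
fn brute_force(points: &[(f64, f64, u32)], qx: f64, qy: f64) -> (Vec<u32>, f64) {
    let mut best = f64::INFINITY;
    for &(x, y, _) in points {
        let (dx, dy) = (x - qx, y - qy);
        best = best.min(dx * dx + dy * dy);
    }
    let mut ids = Vec::new();
    for &(x, y, id) in points {
        let (dx, dy) = (x - qx, y - qy);
        if dx * dx + dy * dy == best {
            ids.push(id);
        }
    }
    (ids, best)
}
```

Because both functions compute each squared distance with the identical expression, exact equality on the distances and on the sorted tie sets is a fair check.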
    // queue search in halves that intersect the query
    let lte = if axis == 0 {
        qx - min_val <= x
    } else {
        qy - min_val <= y
    };
    if lte {
        stack.push(left);
        stack.push(m - 1);
        stack.push(1 - axis);
    }

    let gte = if axis == 0 {
        qx + min_val >= x
    } else {
        qy + min_val >= y
    };
If I understand correctly, min_val is the smallest squared distance found so far. Won't adding a squared distance to (or subtracting it from) a coordinate cause problems? This may lead to incorrect pruning.
    // queue search in halves that intersect the query
    let lte = if axis == 0 {
        qx - min_val <= x
Should we use sqrt(min_val) for correct pruning? For distances < 1.0, this bug can cause the algorithm to miss valid results.
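The concern can be checked with a small predicate for the low half (coordinates <= the split) on the current axis. With a best distance of 0.5 (min_val = 0.25), a split 0.4 away from the query can still hold a closer point, yet q - min_val <= split rejects it. Two possible fixes are sketched; the function names are hypothetical, and the high half is symmetric. When the best distance is above 1.0, the squared bound is merely looser than necessary, so it costs extra work rather than missing results.

```rust
/// The check from the diff: subtracts a squared distance from a coordinate.
fn search_low_buggy(q: f64, split: f64, min_dist_sq: f64) -> bool {
    q - min_dist_sq <= split
}

/// Fix 1: take the square root so both sides are plain distances.
fn search_low_sqrt(q: f64, split: f64, min_dist_sq: f64) -> bool {
    q - min_dist_sq.sqrt() <= split
}

/// Fix 2: stay in squared units and avoid the sqrt entirely.
/// If the query is on the low side (gap <= 0), always search it.
fn search_low_squared(q: f64, split: f64, min_dist_sq: f64) -> bool {
    let gap = q - split;
    gap <= 0.0 || gap * gap <= min_dist_sq
}
```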
    /// - qx: x value of query point
    /// - qy: y value of query point
    ///
    /// Returns distance squared and indices of found items
Looks like the actual return value is min_val.sqrt() (the actual distance)?
Yes, the documentation I wrote is incorrect. It should say distance.