Skip to content

Speed Up Nearest Zone Calculation in Disaggregate Accessibilities#1031

Open
dhensle wants to merge 6 commits into
ActivitySim:mainfrom
RSGInc:disagg_access_nearest_zone_speed_up
Open

Speed Up Nearest Zone Calculation in Disaggregate Accessibilities#1031
dhensle wants to merge 6 commits into
ActivitySim:mainfrom
RSGInc:disagg_access_nearest_zone_speed_up

Conversation

@dhensle
Copy link
Copy Markdown
Contributor

@dhensle dhensle commented Jan 27, 2026

Fix for #1030

This pull request introduces significant performance improvements to zone lookup operations in the accessibility calculations by vectorizing nearest zone searches and optimizing the mapping of zone IDs to skim indices. The changes focus on reducing redundant operations and leveraging efficient numpy-based lookups, which should result in faster computations, especially for large datasets.

Performance improvements in zone lookup and mapping:

  • Replaced the per-origin nearest zone search with a new vectorized function find_nearest_zones_via_skims, enabling a single batched skim lookup for all origin-destination pairs instead of one lookup per origin zone. This reduces computational overhead in disaggregate_accessibility.py. [1] [2]
  • Updated the code to use the new vectorized nearest zone search in place of the old loop-based approach.

Optimizations in OffsetMapper for skim index mapping:

  • Added a fast-path numpy array (_offset_array) to OffsetMapper for O(1) zone ID to index mapping when the offset is specified as a pandas Series, replacing the slower pandas .map() approach. This is constructed only when zone IDs are non-negative and the range is reasonable. [1] [2]
  • Modified the map method to use this numpy array when available, including safe handling of out-of-range indices, which improves performance for large zone lists.
  • Ensured that the numpy array is not used or constructed when the offset is a simple integer, maintaining correct behavior for all offset types.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets the performance bottleneck in disaggregate accessibility zone lookups (Issue #1030) by optimizing (1) nearest-zone identification when using skims and (2) zone-id-to-skim-index mapping.

Changes:

  • Introduces a vectorized nearest-zone skim lookup helper and switches find_nearest_accessibility_zone to use it.
  • Adds a numpy-array fast path to OffsetMapper to speed up zone id → skim index mapping when an offset series is used.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
activitysim/core/skim_dictionary.py Adds a numpy-based fast path for OffsetMapper mapping to reduce pandas .map() overhead.
activitysim/abm/tables/disaggregate_accessibility.py Replaces per-origin skim nearest-zone lookups with a vectorized approach intended to reduce Python-level overhead.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +76 to +94
# Build numpy lookup array for fast O(1) mapping
# This replaces slow pandas Series.map() with direct numpy indexing
index_vals = offset_series.index.values
if len(index_vals) > 0:
min_zone = int(index_vals.min())
max_zone = int(index_vals.max())
# Only build array if zone IDs are non-negative and range is reasonable
# (avoid huge arrays for sparse zone IDs)
if min_zone >= 0 and (max_zone - min_zone + 1) <= len(index_vals) * 10:
self._offset_array = np.full(
max_zone + 1, NOT_IN_SKIM_ZONE_ID, dtype=np.int32
)
self._offset_array[
index_vals.astype(int)
] = offset_series.values.astype(np.int32)
else:
self._offset_array = None
else:
self._offset_array = None
Comment on lines +149 to +162
if self._offset_array is not None:
# Fast path: use numpy array indexing (O(1) per element)
zone_ids = np.asanyarray(zone_ids).astype(int)
# Clip to valid range to avoid index errors, then mark out-of-range as NOT_IN_SKIM
max_valid = len(self._offset_array) - 1
valid_mask = (zone_ids >= 0) & (zone_ids <= max_valid)
# Use clip to safely index, then apply mask
clipped_ids = np.clip(zone_ids, 0, max_valid)
offsets = np.where(
valid_mask,
self._offset_array[clipped_ids],
NOT_IN_SKIM_ZONE_ID,
)
return offsets
Comment on lines +35 to +58
origin_zones = np.asarray(origin_zones)
dest_zones = np.asarray(dest_zones)
n_origins = len(origin_zones)
n_dests = len(dest_zones)

# handle empty input case
if n_origins == 0 or n_dests == 0:
return []

# Create all origin-destination pairs in one go
# all_orig: [o1, o1, o1, ..., o2, o2, o2, ..., oN, oN, oN, ...]
# all_dest: [d1, d2, d3, ..., d1, d2, d3, ..., d1, d2, d3, ...]
all_orig = np.repeat(origin_zones, n_dests)
all_dest = np.tile(dest_zones, n_origins)

# Single skim lookup for all pairs
all_dists = skim_dict.lookup(all_orig, all_dest, "DIST")

# Reshape to (n_origins, n_dests) and find argmin per origin
dist_matrix = np.asarray(all_dists).reshape(n_origins, n_dests)
nearest_indices = np.argmin(dist_matrix, axis=1)

# Return list of (origin, nearest_dest) tuples
return list(zip(origin_zones, dest_zones[nearest_indices]))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants