-
Notifications
You must be signed in to change notification settings - Fork 175
Implement zero-copy optimization for single-read operations #840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
googlyrahman
wants to merge
2
commits into
fsspec:main
Choose a base branch
from
ankitaluthra1:buffx
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+370
−218
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -27,12 +27,7 @@ | |
| from gcsfs import zb_hns_utils | ||
| from gcsfs.core import GCSFile, GCSFileSystem | ||
| from gcsfs.retry import DEFAULT_RETRY_CONFIG, get_storage_control_retry_config | ||
| from gcsfs.zb_hns_utils import ( | ||
| DirectMemmoveBuffer, | ||
| MRDPool, | ||
| PyBytes_AsString, | ||
| PyBytes_FromStringAndSize, | ||
| ) | ||
| from gcsfs.zb_hns_utils import DirectMemmoveBuffer, MRDPool | ||
| from gcsfs.zonal_file import ZonalFile | ||
|
|
||
| logger = logging.getLogger("gcsfs") | ||
|
|
@@ -411,39 +406,34 @@ async def _fetch_range_split( | |
| await mrd.close() | ||
|
|
||
| async def _concurrent_mrd_fetch(self, offset, length, concurrency, mrd_or_pool): | ||
| """Helper to handle concurrent chunk downloads into a DirectMemmoveBuffer.""" | ||
| """Helper to handle concurrent chunk downloads cleanly.""" | ||
| concurrency = ( | ||
| concurrency if length >= self.MIN_CHUNK_SIZE_FOR_CONCURRENCY else 1 | ||
| ) | ||
| result_bytes = PyBytes_FromStringAndSize(None, length) | ||
| buffer_ptr = PyBytes_AsString(result_bytes) | ||
|
|
||
| part_size = length // concurrency | ||
| tasks = [] | ||
| buffers = [] | ||
| loop = asyncio.get_running_loop() | ||
|
|
||
| # Track if the core download process failed | ||
| tasks = [] | ||
| views = [] | ||
| has_error = False | ||
|
|
||
| async def _download(o, s, b, mrd_or_pool): | ||
| # The master buffer manages its own allocation under the hood | ||
| master_buffer = DirectMemmoveBuffer(length, self._memmove_executor) | ||
|
|
||
| async def _download(o, s, view, mrd_or_pool): | ||
| async with _get_mrd_from_pool_or_mrd(mrd_or_pool) as m_client: | ||
| await m_client.download_ranges([(o, s, b)]) | ||
| await m_client.download_ranges([(o, s, view)]) | ||
|
|
||
| for i in range(concurrency): | ||
| part_offset = offset + (i * part_size) | ||
| actual_size = part_size if i < concurrency - 1 else length - (i * part_size) | ||
|
|
||
| part_address = buffer_ptr + (part_offset - offset) | ||
| buf = DirectMemmoveBuffer( | ||
| part_address, | ||
| part_address + actual_size, | ||
| self._memmove_executor, | ||
| ) | ||
| buffers.append(buf) | ||
| # Give each task a restricted view of the master buffer | ||
| view = master_buffer.get_view(part_offset - offset, actual_size) | ||
| views.append(view) | ||
|
|
||
| tasks.append( | ||
| asyncio.create_task( | ||
| _download(part_offset, actual_size, buf, mrd_or_pool) | ||
| _download(part_offset, actual_size, view, mrd_or_pool) | ||
| ) | ||
| ) | ||
|
|
||
|
|
@@ -453,6 +443,8 @@ async def _download(o, s, b, mrd_or_pool): | |
| if isinstance(res, Exception): | ||
| has_error = True | ||
| raise res | ||
| for view in views: | ||
| view.close() | ||
| except BaseException: | ||
| has_error = True | ||
| for t in tasks: | ||
|
|
@@ -461,18 +453,17 @@ async def _download(o, s, b, mrd_or_pool): | |
| await asyncio.gather(*tasks, return_exceptions=True) | ||
| raise | ||
| finally: | ||
| for buf in buffers: | ||
| try: | ||
| await loop.run_in_executor(None, buf.close) | ||
| except BufferError: | ||
| # If we are already handling a network/download exception, | ||
| # ignore the BufferError (which is just a symptom of the drop). | ||
| # If there's no download error, this means the buffer logic | ||
| # itself failed, so we must surface the error. | ||
| if not has_error: | ||
| raise | ||
|
|
||
| return result_bytes | ||
| try: | ||
| master_buffer.close() | ||
| except Exception: | ||
| # If we are already handling a network/download exception, | ||
| # ignore the exception from buffer (which is just a symptom of the drop). | ||
| # If there's no download error, this means the buffer logic | ||
| # itself failed, so we must surface the error. | ||
| if not has_error: | ||
| raise | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are multiple errors possible? |
||
|
|
||
| return master_buffer.get_value() | ||
|
|
||
| async def _cat_file( | ||
| self, | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large reads can block the asyncio loop while waiting for memmove, as it calls master_buffer.close() synchronously within the finally block of _concurrent_mrd_fetch. Since this coroutine executes directly on the main asyncio event loop thread, this causes a block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was actually an intentional change, and I want to block the async event loop in this specific case. I know it seems like an anti-pattern, but it is required for performance in these heavily multithreaded scenarios.
When testing with 32 coroutines, I found that running
master.close()synchronously was about 500 MB/s faster than usingawait loop.run_in_executor(None, buff.close)with SDK. Because the system is completely bombarded byctypes.memmoveoperations, the overhead of offloading the close operation becomes too high.If you look at the previous implementation, the initial write operation using
io.BytesIOwas synchronous and consumed the GIL. We were already blocking the event loop to write data into the buffer. With the new approach, we are simply blocking the event loop to wait for a thread to finish writing that data. The end result on the event loop is effectively the same, but with much better throughput.I'm open for suggestions, so feel free to let me know if we can do it in a better way :) - The only condition here is that we need a synchronous
def writemethod to a buffer (similar toio.BytesIOinterface) called from async method.Regarding the concern about large reads, I do not think it will cause problems. Downloading large blocks of data over the network will inherently take significantly longer than writing that data into the buffer.
I have included a script below for your reference. If you run it with
master.close(), you will see it hits over 2 GB/s. If you switch toawait loop.run_in_executor(None, master.close), it drops to around 1.4 GB/s on average.Script
Numbers with loop.run_in_executor(None, master.close)
Numbers for master.close
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhixiangli : are you satisfied?