feat: implement merge source for compaction#413
Conversation
dab7993 to
e0dfb13
Compare
| type ( | ||
| Document = util.Pair[seq.ID, []byte] | ||
| DocBlockLocation = util.Pair[[]byte, uint64] | ||
| TokenPosting = util.Pair[[]byte, []uint32] |
There was a problem hiding this comment.
I have a strange feeling that we start to use two terms for same thing (lid and posting). I'd stick to a single term (which is a lid since we don't have much choise). At the same time docs/comments might mention that lids are postings.
Besides, a single posting == a single lid. So, util.Pair[[]byte, []uint32] is techically a posting list. But I'd suggest to stick to lids and rename to TokenLids
There was a problem hiding this comment.
Yeah, what I really meant here is TokenToPostingList, but for brevity I've dropped with part with To because this is Pair. Will rename it.
|
|
||
| // NB: This buffer will be reused across | ||
| // all calls within current field. | ||
| var lidRenamed []uint32 |
| return s.f.blocksData.BlocksOffsets | ||
| } | ||
|
|
||
| func (s *SealedSource) ID() iter.Seq2[indexwriter.DocLocation, error] { |
There was a problem hiding this comment.
nit: not that I care, but you use single noun func name which returns an iter. Is it some Go best practise?
There was a problem hiding this comment.
I do not know, honestly.
|
|
||
| func (s *SealedSource) DocBlock() iter.Seq2[DocBlockLocation, error] { | ||
| return func(yield func(DocBlockLocation, error) bool) { | ||
| // We do not want to cache payload of DocBlock because |
There was a problem hiding this comment.
Yes, but we would sill evict a lot from page cache. And page cache is used a lot by our search workers, it's just not that you can see it directly. So, for compaction (and write path) direct I/O would be preferable. The only downside - compaction could also benefit from readahead.
There is also sendfile system call which will allow to bypass even userspace (zero copy). It seems io.Copy directly uses sendfile. Might be a good choise for copying docblocks. However, it will not work if we unpack doc blocks.
There was a problem hiding this comment.
Yes, but we would sill evict a lot from page cache
Unfortunately, we do not have low-level access methods otherwise I would use fadvise(FADV_DONTNEED), of course.
But, I am not sure this will affect the page cache anyway. Linux is smart about page cache and eviction so such workloads with one-pass pattern will not affect active-list in page cache [1]
direct I/O would be preferable
Yep, however it requires some guarantees that we cannot provide (alignment guarantees, for instance). And this requirement is stated in the article as well -- "It requires both disk operations and off-heap memory buffers to be aligned to the filesystem block size"
There is also
sendfilesystem call which will allow to bypass even userspace
Yep, I am aware of that. However, using sendfile for transferring data between files does not show noticeable performance improvements [2]:
sendfile 159ms 168ms 175ms
pipe+splice 161ms 162ms 163ms (+1.3%)
read+write 16bs 164ms 165ms 178ms (+3.1%)
and, again, we do not have necessary API for doing such low-level IO. On the other hand, there is one slightly different approach explored in this article with preallocating filesystem metadata with fallocate and it shows real performance gains.
And, AFAIK, sendfile was designed to efficient transferring between sockets, not files. So there's a better alternative copy_file_range [3]
[1] https://biriukov.dev/docs/page-cache/4-page-cache-eviction-and-page-reclaim/
[2] https://blog.plenz.com/2014-04/so-you-want-to-write-to-a-file-real-fast.html
[3] https://man7.org/linux/man-pages/man2/copy_file_range.2.html
7689790 to
353316d
Compare
d3e1c9e to
0f1955e
Compare
Description
Yet another pull request of series #336.
This one introduces all buildings blocks required for compaction of fractions.
If you have used LLM/AI assistance please provide model name and full prompt: