-
Notifications
You must be signed in to change notification settings - Fork 0
Added gRPC interface definition for data cache #46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 2 commits
79d3db3
87b194e
441cc13
7145c9e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| syntax = "proto3"; | ||
|
|
||
| package services.cache; | ||
|
|
||
| service KafkaService { | ||
| // Create a topic with partitions and replication factor | ||
| rpc CreateTopic(CreateTopicRequest) returns (CreateTopicResponse); | ||
|
|
||
| // Send a single message to a topic | ||
| rpc Produce(ProduceRequest) returns (ProduceResponse); | ||
|
|
||
| // Stream messages from a topic (server streaming) | ||
| rpc Consume(ConsumeRequest) returns (stream ConsumeResponse); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not familiar with Kafka standard operations. Is there a way to request multiple topics? This seems to imply a socket connection per device, which I think is a single device at a single rate.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Most implementations of a kafka consumer support one-consumer-many-topics setups. Seems like the question is where we want the complexity to emerge. If each consumer is on one topic, we could have data from a single consumer be streamed to any requesting external client, and thereby limit the maximum number of active consumers. Allowing arbitrary combinations of topics means it gets harder to reuse consumers for different clients. On the other hand, we have the concern you bring up, of a single client now needing many separate connections to listen on many topics, instead of a few connections. But there's also the question of why we'd want data from many topics to be mangled into one stream. Usually a topic contains a specific kind of data. The data can come from many sources, but the idea is each topic is its own little pool of things that can be operated on or reasoned about in the same way. Allowing consumers to stream many topics back in one connection kinda breaks this pattern, making it harder to know what the data is that we're getting, and forcing external clients to implement a bunch of logic to disambiguate the data that comes in.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Normally, consumer has ability to consume messages from different topics. We can make an interface which has ability to consumer messages from different topics with single gRPC call. |
||
| } | ||
|
|
||
| message CreateTopicRequest { | ||
| string topic = 1; | ||
| int32 partitions = 2; | ||
| int32 replication_factor = 3; | ||
| } | ||
|
|
||
| message CreateTopicResponse { | ||
| bool success = 1; | ||
| string message = 2; | ||
| } | ||
|
|
||
| message ProduceRequest { | ||
| string topic = 1; | ||
| string key = 2; // optional | ||
| bytes value = 3; | ||
| } | ||
|
|
||
| message ProduceResponse { | ||
| bool success = 1; | ||
| string message = 2; | ||
| int64 offset = 3; | ||
| } | ||
|
|
||
| message ConsumeRequest { | ||
| string topic = 1; | ||
| string group_id = 2; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is group_id? It doesn't mirror the ProduceRequest.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a kafka-ism. Each consumer of a topic can optionally be in a group. If in a group, kafka will spread consumers across the partitions for the topic as evenly as possible. If there are fewer consumers in the group than partitions, some consumers will get multiple partitions. If there are more consumers than partitions, some consumers will not get any data at all. Probably better to not use groups unless we're sure we're ok with some client only getting some of the messages from a topic. EDIT: Wanted to clarify, if a consumer is not in a group, or if it is the only consumer in its group, it will get all the partitions of a topic (unless the consumer has been configured to listen on a specific subset of partitions, which is also a possibility). And you can have as many consumers listening to the topic that you want. They only get throttled when in the same group.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Regarding group_id, I should not include (thank you Beau nice catch) |
||
| int32 timeout_seconds = 3; // optional, stop after secs (0 = no timeout) | ||
| } | ||
|
|
||
| message ConsumeResponse { | ||
| string key = 1; | ||
| bytes value = 2; | ||
| int64 offset = 3; | ||
| int32 partition = 4; | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our data logger process/service is likely a good example of why a streaming producer is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Streaming is added to the gRPC interface