Skip to content
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 51 additions & 0 deletions proto/controls/service/data-cache/cache.proto
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
syntax = "proto3";

package services.cache;

service KafkaService {
// Create a topic with partitions and replication factor
rpc CreateTopic(CreateTopicRequest) returns (CreateTopicResponse);

// Send a single message to a topic
rpc Produce(ProduceRequest) returns (ProduceResponse);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our data logger process/service is likely a good example of why a streaming producer is a good idea.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streaming is added to the gRPC interface


// Stream messages from a topic (server streaming)
rpc Consume(ConsumeRequest) returns (stream ConsumeResponse);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with Kafka standard operations. Is there a way to request multiple topics? This seems to imply a socket connection per device, which I think is a single device at a single rate.

@jacob-curley-fnal jacob-curley-fnal Nov 17, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most implementations of a kafka consumer support one-consumer-many-topics setups.

Seems like the question is where we want the complexity to emerge. If each consumer is on one topic, we could have data from a single consumer be streamed to any requesting external client, and thereby limit the maximum number of active consumers. Allowing arbitrary combinations of topics means it gets harder to reuse consumers for different clients.

On the other hand, we have the concern you bring up, of a single client now needing many separate connections to listen on many topics, instead of a few connections.

But there's also the question of why we'd want data from many topics to be mangled into one stream. Usually a topic contains a specific kind of data. The data can come from many sources, but the idea is each topic is its own little pool of things that can be operated on or reasoned about in the same way. Allowing consumers to stream many topics back in one connection kinda breaks this pattern, making it harder to know what the data is that we're getting, and forcing external clients to implement a bunch of logic to disambiguate the data that comes in.

@amolfnal amolfnal Nov 17, 2025

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, consumer has ability to consume messages from different topics. We can make an interface which has ability to consumer messages from different topics with single gRPC call.
is this the requirement?

}

message CreateTopicRequest {
string topic = 1;
int32 partitions = 2;
int32 replication_factor = 3;
}

message CreateTopicResponse {
bool success = 1;
string message = 2;
}

message ProduceRequest {
string topic = 1;
string key = 2; // optional
bytes value = 3;
}

message ProduceResponse {
bool success = 1;
string message = 2;
int64 offset = 3;
}

message ConsumeRequest {
string topic = 1;
string group_id = 2;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is group_id? It doesn't mirror the ProduceRequest.

@jacob-curley-fnal jacob-curley-fnal Nov 17, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a kafka-ism. Each consumer of a topic can optionally be in a group. If in a group, kafka will spread consumers across the partitions for the topic as evenly as possible. If there are fewer consumers in the group than partitions, some consumers will get multiple partitions. If there are more consumers than partitions, some consumers will not get any data at all. Probably better to not use groups unless we're sure we're ok with some client only getting some of the messages from a topic.

EDIT: Wanted to clarify, if a consumer is not in a group, or if it is the only consumer in its group, it will get all the partitions of a topic (unless the consumer has been configured to listen on a specific subset of partitions, which is also a possibility). And you can have as many consumers listening to the topic that you want. They only get throttled when in the same group.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding group_id, I should not include (thank you Beau nice catch)
This is more kafka related which may not be present if we changed from kafka to something else
Therefore, I should remove this.....

int32 timeout_seconds = 3; // optional, stop after secs (0 = no timeout)
}

message ConsumeResponse {
string key = 1;
bytes value = 2;
int64 offset = 3;
int32 partition = 4;
}