Kafka consumer groups explained

Hello, this is codeshow.
This time, we will learn about Kafka consumer groups.
In practice, consumer groups are used more often than attaching a consumer directly to a partition.
A consumer group assigns partitions to its consumers automatically, instead of you wiring each consumer to a partition by hand.
For example, to consume a topic with 4 partitions by pointing consumers directly at partitions, you need to create 4 consumers, one for each of partitions 0 to 3.
This is because a consumer started with the partition option reads only that one partition.
With a consumer group, however, a single consumer can consume all four partitions.
And when more consumers join the same group, the partitions are automatically rebalanced among them.
Unlike consuming partitions directly, partition assignment happens automatically, which makes operation much easier.

Let’s practice with consumer groups.
We will continue with the environment set up in the previous video.
Start the devcontainer for the Kafka exercises.
Wait until the container is running.
When the container is ready, open Docker Desktop.
Open a shell in the kafka container.

Let’s delete the previously created topic and start practicing.
If there is no hello topic, you can skip this step.

kafka-topics --bootstrap-server kafka:9092 --delete --topic hello

The kafka-topics command deletes the topic.

Create a hello topic with two partitions using the kafka-topics command.

kafka-topics --bootstrap-server kafka:9092 --create --topic hello --partitions 2

Check the created topic in the Topics menu of AKHQ.
You can confirm that it has two partitions.

Consume partitions 0 and 1 with the kafka-console-consumer command, each in its own terminal.

kafka-console-consumer --bootstrap-server kafka:9092 --topic hello --partition 0
kafka-console-consumer --bootstrap-server kafka:9092 --topic hello --partition 1

As in the previous video, we will produce keys and values separated by a hyphen.
Run kafka-console-producer.

kafka-console-producer --bootstrap-server kafka:9092 --topic hello --property "key.separator=-" --property "parse.key=true"
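For reference, here is a simplified Python model of how each line typed into this producer is split when `parse.key=true` and `key.separator=-` are set. It assumes the key is everything before the first separator and the value is everything after it, which is our reading of the console producer’s line reader; `parse_line` is a hypothetical helper, not part of Kafka.

```python
from __future__ import annotations


def parse_line(line: str, separator: str = "-") -> tuple[str, str]:
    """Split one console-producer input line into (key, value).

    Simplified model of parse.key=true (an assumption, not Kafka code):
    everything before the first separator is the key, the rest is the value.
    """
    index = line.find(separator)
    if index == -1:
        # Modeled as an error, as the console producer reports by default.
        raise ValueError(f"No key separator found in line: {line!r}")
    return line[:index], line[index + len(separator):]


if __name__ == "__main__":
    print(parse_line("user1-hello"))        # ('user1', 'hello')
    print(parse_line("user1-hello-world"))  # ('user1', 'hello-world')
```

Because only the first hyphen counts, a value may itself contain hyphens, but a key cannot.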

You can see that the two consumers are consuming messages.
A consumer started with the partition option can consume only that one partition.
Therefore, if one of these consumers is terminated, the messages that accumulate in its partition are not consumed by anyone.

From now on, we will use consumer groups instead of consuming partitions directly.
Stop the running kafka-console-consumer processes.
Add the group option to the existing command.
The group option specifies the id of the group the consumer belongs to.
Enter work as the group id.

kafka-console-consumer --bootstrap-server kafka:9092 --topic hello --partition 0 --group work

Running the command as written results in an error,
because the partition option and the group option cannot be used together.
Only one of the two options may be specified.
Remove the partition option and run the kafka-console-consumer command again.

kafka-console-consumer --bootstrap-server kafka:9092 --topic hello --group work

Now let’s check how consumer groups work in AKHQ.
Open AKHQ at localhost:8080.
Go to the AKHQ Topics menu.
Select the hello topic from the topic list.
Select the Consumer Groups tab.
Select the consumer group whose id is work.
In the Topics tab, you can see the topics and partitions currently being consumed.
Note that a consumer group can consume one or more topics.
In this exercise, we will consume only one topic.

On the AKHQ page, select the Members tab.
It has the columns client id, id, host, and assignments.
The client id is a value set to identify the consumer.
In this exercise, you can see ‘console-consumer’, the default value used by the shell command.
Client ids may be duplicated.
The value in the id column, on the other hand, is a unique id assigned automatically within the consumer group.
The host column shows the consumer’s host information.
The assignments column shows which topic partitions the consumer is consuming.
Currently, it is consuming both partitions 0 and 1 of the hello topic.
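The difference between the client id and the id column can be modeled in a few lines. This sketch assumes the group coordinator builds each member id from the client id plus a random UUID suffix, which is how we understand Kafka to do it; `join_group` and the dict used as a group are hypothetical stand-ins, not Kafka APIs.

```python
from __future__ import annotations

import uuid


def join_group(group: dict[str, str], client_id: str) -> str:
    """Register a consumer in a toy group and return its member id.

    group maps member id -> client id. Client ids may repeat, but the
    random UUID suffix (assumed format) keeps every member id unique.
    """
    member_id = f"{client_id}-{uuid.uuid4()}"
    group[member_id] = client_id
    return member_id


if __name__ == "__main__":
    work: dict[str, str] = {}
    first = join_group(work, "console-consumer")
    second = join_group(work, "console-consumer")
    print(first != second, len(work))  # True 2
```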
Next, run another consumer with group id work in the bottom terminal.

kafka-console-consumer --bootstrap-server kafka:9092 --topic hello --group work

Refresh the AKHQ page.
You can see that one more console consumer has been added.
Both consumers have the same client id,
but the id column shows that each consumer has a unique id.

One thing to pay attention to is the assignments column.
Previously, one consumer consumed both partitions 0 and 1.
Now that there are two consumers, they consume partition 0 and partition 1 respectively.

When the set of consumers in a consumer group changes, the group rebalances its partitions automatically.
Note that because the number of consumers doubled, the group’s throughput can roughly double as well, as long as consumer-side processing was the bottleneck.

Partition rebalancing divides the partitions among the consumers as evenly as possible.
If there are 10 partitions and two consumers, each consumes 5.
Conversely, let’s look at the case where there are more consumers than partitions.
Suppose there are 2 partitions and 3 consumers, one consumer more than there are partitions.
Add a new consumer in the terminal so that the work consumer group has 3 consumers.

kafka-console-consumer --bootstrap-server kafka:9092 --topic hello --group work

Refresh the AKHQ page.
You can see that there are 3 clients with group id work.
However, the assignments column shows that two consumers are assigned a partition and one is not.
If the number of consumers exceeds the number of partitions, the extra consumers are not assigned any partition.
Be careful, because idle consumers waste server resources.
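Both cases above (10 partitions over 2 consumers, and 2 partitions over 3 consumers) can be reproduced with a range-style assignment for a single topic. This sketch follows the idea behind Kafka’s RangeAssignor: sort the members, give each a contiguous block of partitions, and hand the remainder to the first members. `range_assign` is a hypothetical helper, and the assignor your client actually uses (set by `partition.assignment.strategy`) may distribute partitions differently.

```python
from __future__ import annotations


def range_assign(num_partitions: int, members: list[str]) -> dict[str, list[int]]:
    """Split partitions 0..num_partitions-1 of one topic across members."""
    ordered = sorted(members)
    if not ordered:
        return {}
    per_member, remainder = divmod(num_partitions, len(ordered))
    assignment: dict[str, list[int]] = {}
    start = 0
    for index, member in enumerate(ordered):
        # The first `remainder` members each take one extra partition.
        count = per_member + (1 if index < remainder else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    print(range_assign(10, ["c1", "c2"]))
    # {'c1': [0, 1, 2, 3, 4], 'c2': [5, 6, 7, 8, 9]}
    print(range_assign(2, ["c1", "c2", "c3"]))
    # {'c1': [0], 'c2': [1], 'c3': []}  c3 gets nothing and sits idle
```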

Let’s go back to the lab and terminate one consumer in the terminal.
Quickly refresh the AKHQ page.
You may see that no partitions are assigned in the assignments column.
When the membership of a consumer group changes,
all consumers stop consuming their partitions and wait for the rebalance to finish.
When partition rebalancing is complete, the consumers resume consuming.
This process is short, so you need to refresh quickly to see it.
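The pause described above matches Kafka’s eager rebalance protocol, in which every member gives up all of its partitions before any are handed out again. The toy model below logs those phases to show that, in between, no partition has an owner; the round-robin reassignment and the `eager_rebalance` name are assumptions for illustration. Newer clients can instead use cooperative incremental rebalancing, which revokes only the partitions that actually move.

```python
from __future__ import annotations


def eager_rebalance(
    current: dict[str, list[int]], members: list[str], num_partitions: int
) -> tuple[dict[str, list[int]], list[str]]:
    """Toy stop-the-world rebalance: returns (new assignment, event log)."""
    log: list[str] = []
    # Phase 1: every previous owner loses all of its partitions.
    for member, partitions in sorted(current.items()):
        log.append(f"revoke {member} {partitions}")
    log.append("no partitions assigned (consumption paused)")
    # Phase 2: reassign round-robin over the new member list (assumed strategy).
    ordered = sorted(members)
    new: dict[str, list[int]] = {m: [] for m in ordered}
    if ordered:
        for partition in range(num_partitions):
            new[ordered[partition % len(ordered)]].append(partition)
    for member in ordered:
        log.append(f"assign {member} {new[member]}")
    return new, log


if __name__ == "__main__":
    # c3 leaves a 3-consumer group that reads a 2-partition topic.
    assignment, events = eager_rebalance({"c1": [0], "c2": [1], "c3": []}, ["c1", "c2"], 2)
    for event in events:
        print(event)
```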

Now, instead of changing the consumers,
what happens if we increase the topic’s partition count with the kafka-topics command?
Will the idle consumer immediately start consuming the new partition?
Let’s run an exercise to find out.
Add a consumer so that one consumer is idle again.

kafka-console-consumer --bootstrap-server kafka:9092 --topic hello --group work

Refresh the AKHQ page.
You can see the unassigned consumer in the assignments column.
Now let’s increase the number of partitions to 3 with kafka-topics.

kafka-topics --bootstrap-server kafka:9092 --alter --topic hello --partitions 3

However, unlike adding or removing a consumer in the group,
increasing the partition count with the kafka-topics command does not immediately assign the new partition to a consumer.
For reference, in this exercise it took about 2 minutes for the new partition to be assigned to the consumer group.
Because of this delay, if you want the new partition processed quickly,
another option is to add a consumer right after running the alter command, which triggers a rebalance.
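One plausible cause of the delay, offered as an assumption rather than something measured in this video: a consumer learns about new partitions only when it refreshes topic metadata, and the Java client refreshes at least every `metadata.max.age.ms`, which defaults to 300000 ms (5 minutes). The sketch below computes the wait under that assumption, ignoring any earlier refresh triggered by other events; `next_refresh` is a hypothetical helper.

```python
from __future__ import annotations

DEFAULT_METADATA_MAX_AGE_MS = 300_000  # Java client default for metadata.max.age.ms


def next_refresh(
    altered_at_ms: int, last_refresh_ms: int, max_age_ms: int = DEFAULT_METADATA_MAX_AGE_MS
) -> tuple[int, int]:
    """Return (time of next periodic refresh, wait after the alter), in ms.

    Assumes refreshes happen exactly every max_age_ms starting at
    last_refresh_ms and that nothing else triggers an earlier refresh.
    """
    refresh_at = last_refresh_ms + max_age_ms
    while refresh_at < altered_at_ms:
        refresh_at += max_age_ms
    return refresh_at, refresh_at - altered_at_ms


if __name__ == "__main__":
    # Alter 3 minutes after the last refresh: wait (300000, 120000) -> 2 minutes.
    print(next_refresh(180_000, 0))
```

Under this model the worst-case wait approaches the full `metadata.max.age.ms`, which is why adding a consumer (forcing a rebalance) can be faster.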

Why does this delay matter? Because of message ordering in Kafka.
For example, suppose message A must be processed before message B can be processed.
However, assume A is stored in partition 0
and B is stored in partition 1.
If partition 0 is delayed for 1 minute, B in partition 1 is consumed first.
When the delay ends and A is consumed from partition 0,
an error occurs: A should have been saved before B, but B was saved first.
Since Kafka guarantees ordering only within a partition,
messages A and B that must stay in order should use the same key so that they are stored in the same partition.
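To see why the same key always lands in the same partition, here is a Python port of the murmur2-based rule that, as far as we know, Kafka’s Java producer uses for records with a key: hash the serialized key bytes, clear the sign bit, and take the result modulo the partition count. Treat it as a sketch for experimenting, not a drop-in replacement for the client; `murmur2` and `partition_for_key` are our own names.

```python
from __future__ import annotations


def murmur2(data: bytes) -> int:
    """Port of the 32-bit murmur2 variant Kafka uses, returned as an unsigned int."""
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (0x9747B28C ^ length) & mask  # Kafka's fixed seed

    # Mix the input four bytes (little-endian) at a time.
    length4 = length // 4
    for i in range(length4):
        i4 = i * 4
        k = data[i4] | (data[i4 + 1] << 8) | (data[i4 + 2] << 16) | (data[i4 + 3] << 24)
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Mix in the remaining 1-3 bytes (mirrors the fall-through switch in Java).
    tail = length4 * 4
    remaining = length % 4
    if remaining == 3:
        h ^= data[tail + 2] << 16
    if remaining >= 2:
        h ^= data[tail + 1] << 8
    if remaining >= 1:
        h ^= data[tail]
        h = (h * m) & mask

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for_key(key: str, num_partitions: int) -> int:
    """Partition for a UTF-8 string key: clear the sign bit, then take the modulo."""
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions
```

Because the result depends on the partition count, increasing a topic’s partitions (as done earlier with `--alter`) can move an existing key to a different partition, so per-key ordering only holds while the partition count stays the same.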
When one consumer in a group consumes two or more partitions,
it should read the partition information from each record’s metadata and process each partition separately.
I will cover this in code in another video.
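The per-partition handling described above can be sketched without a broker: group the records a consumer received by partition, keep each group in offset order, and process each partition’s records sequentially. The `Record` class is a hypothetical stand-in that only mirrors the partition, offset, key, and value fields a real consumer record carries.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a consumed record."""
    partition: int
    offset: int
    key: str
    value: str


def split_by_partition(records: list[Record]) -> dict[int, list[Record]]:
    """Group records by partition; each list is sorted by offset."""
    by_partition: dict[int, list[Record]] = defaultdict(list)
    for record in records:
        by_partition[record.partition].append(record)
    # Records from one partition normally arrive in offset order already;
    # sorting just makes the invariant explicit.
    return {p: sorted(rs, key=lambda r: r.offset) for p, rs in sorted(by_partition.items())}


def process(records: list[Record]) -> list[str]:
    """Handle each partition separately, preserving order within it."""
    handled: list[str] = []
    for partition, partition_records in split_by_partition(records).items():
        for record in partition_records:
            handled.append(f"p{partition}:{record.key}={record.value}")
    return handled
```

Since Kafka gives no ordering guarantee across partitions anyway, each partition’s list could also be handed to its own worker.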

Finally, let’s look at how consumer groups manage offsets.
When consuming directly by partition number, offsets are not tracked separately for each consumer.
A consumer group, however, records the last processed offset of each partition under its group id.
So if a consumer terminates because of a failure, its partitions are reassigned to another consumer in the group after a rebalance, and consumption resumes from the last processed offset.
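The bookkeeping described here can be modeled as a table keyed by (group id, topic, partition). The sketch uses Kafka’s convention that the committed value is the offset of the next record to read (last processed offset + 1); `OffsetStore` is hypothetical, and its `reset` argument stands in for what `auto.offset.reset` decides when a group has no committed offset yet.

```python
from __future__ import annotations


class OffsetStore:
    """Toy model of committed offsets per (group id, topic, partition)."""

    def __init__(self) -> None:
        self._committed: dict[tuple[str, str, int], int] = {}

    def commit(self, group: str, topic: str, partition: int, last_processed: int) -> None:
        # Kafka convention: store the position of the *next* record to read.
        self._committed[(group, topic, partition)] = last_processed + 1

    def start_position(self, group: str, topic: str, partition: int, reset: int = 0) -> int:
        """Where any member of this group resumes; `reset` applies if nothing was committed."""
        return self._committed.get((group, topic, partition), reset)


if __name__ == "__main__":
    store = OffsetStore()
    store.commit("work", "hello", 0, last_processed=4)
    print(store.start_position("work", "hello", 0))  # 5
```

Because the position belongs to the group rather than to an individual consumer, whichever member takes over partition 0 after a failure resumes at offset 5.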
Run two consumers and a producer in the terminal.

kafka-console-consumer --bootstrap-server kafka:9092 --topic hello --group work
kafka-console-consumer --bootstrap-server kafka:9092 --topic hello --group work
kafka-console-producer --bootstrap-server kafka:9092 --topic hello --property "key.separator=-" --property "parse.key=true"

Publish several messages with the same key.
Check that only one consumer is consuming them.
Stop the consumer that received the messages.
From the producer, publish messages with the same key again.
You can see that the consumer that previously received nothing is now processing them.
Then, stop all consumers.
From the producer, publish messages with various keys.

Run kafka-console-consumer again.

kafka-console-consumer --bootstrap-server kafka:9092 --topic hello --group work

You can see that the messages published while no consumer was running are now consumed.
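The messages published while no consumer was running are the gap between each partition’s log-end offset and the group’s committed offset, which Kafka calls consumer lag; `kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group work` reports it in a `LAG` column. A minimal sketch of the arithmetic, with made-up offsets and a hypothetical `consumer_lag` helper:

```python
from __future__ import annotations


def consumer_lag(log_end: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag = log-end offset - committed offset.

    Assumption for this sketch: a partition with no commit counts from offset 0.
    """
    return {p: end - committed.get(p, 0) for p, end in sorted(log_end.items())}


if __name__ == "__main__":
    # Made-up numbers: 2 messages waiting in partition 0, none in partition 1.
    print(consumer_lag({0: 12, 1: 7}, {0: 10, 1: 7}))  # {0: 2, 1: 0}
```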

In this way, even if a consumer fails, Kafka lets you resume from the offset after the last completed processing, so you can achieve eventual consistency.
This helps you operate services reliably.

For reference, the offsets of consumer groups are stored in the __consumer_offsets topic.
To find it, select ‘show all topics’ on the AKHQ Topics page and search for __consumer_offsets.
Press the ‘live tail’ button at the bottom of the __consumer_offsets topic page to view its records in real time.
Records whose key contains a group id, topic, and partition number hold offset information; open a record’s detail view to see it.
This is how you can look up the offsets a consumer group has processed.
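A group’s offsets are not spread over all of `__consumer_offsets`; each group id maps to one partition of it. As far as we know, Kafka computes that partition as the absolute value of the Java `String.hashCode()` of the group id modulo `offsets.topic.num.partitions` (50 by default). The sketch below reproduces that rule in Python so you know which partition to watch in live tail; the function names are ours.

```python
from __future__ import annotations

INT_MIN = -(2 ** 31)
DEFAULT_OFFSETS_TOPIC_PARTITIONS = 50  # default offsets.topic.num.partitions


def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): h = 31*h + c over UTF-16 code units, signed 32-bit."""
    units = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(units), 2):
        code_unit = (units[i] << 8) | units[i + 1]
        h = (31 * h + code_unit) & 0xFFFFFFFF
    return h - 2 ** 32 if h >= 2 ** 31 else h


def offsets_partition(group_id: str, num_partitions: int = DEFAULT_OFFSETS_TOPIC_PARTITIONS) -> int:
    """Partition of __consumer_offsets assumed to hold this group's offsets."""
    h = java_string_hash(group_id)
    # Kafka's Utils.abs maps Integer.MIN_VALUE to 0 instead of overflowing.
    positive = 0 if h == INT_MIN else abs(h)
    return positive % num_partitions


if __name__ == "__main__":
    print(offsets_partition("work"))  # 41
```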

This concludes the explanation of consumer groups.

Subscribing, liking, and turning on notifications are a big help to content creators.
Thank you.