Distributed messaging systems and message brokers (basics before understanding Kafka and RabbitMQ)

Hello, this is codeshow.
In this class, we will learn about distributed messaging systems.
In a messaging system, a message is data exchanged between two or more parties.
This data is usually transmitted as bytes, so it is serialized and deserialized with formats such as JSON or Protocol Buffers.
With the messaging system in the middle, a publisher sits on one side and a subscriber on the other.
A publisher is a client that produces messages and submits them to a message queue. Sending a message to a queue is called publishing.
A subscriber is a client that receives messages from a message queue. Getting messages from the queue is called consuming.
A publisher publishes messages, and those messages travel through the messaging system to the subscribers.
Messages always flow in one direction: from the publisher, through the messaging system, to the subscribers.
For reference, the terms producer and consumer are often used instead of publisher and subscriber.
Publisher and subscriber are also shortened to pub and sub.
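To make the serialization step concrete, here is a minimal Go sketch using the standard encoding/json package. The Message type and its fields are hypothetical, chosen only for illustration; a real system might use Protocol Buffers instead of JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is a hypothetical payload type used only for this illustration.
type Message struct {
	ID   int    `json:"id"`
	Body string `json:"body"`
}

// encode serializes a Message into the bytes a publisher would send.
func encode(m Message) ([]byte, error) {
	return json.Marshal(m)
}

// decode turns the received bytes back into a Message on the subscriber side.
func decode(data []byte) (Message, error) {
	var m Message
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	data, err := encode(Message{ID: 1, Body: "Hello World"})
	if err != nil {
		panic(err)
	}
	// These bytes are what actually travels through the messaging system.
	fmt.Println(string(data))

	m, err := decode(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(m.ID, m.Body)
}
```

Publisher and subscriber only need to agree on the format, not on each other's language or runtime.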

To illustrate the relationship between publisher and subscriber, let's look at an example in code.

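package main

import (
    "fmt"
    "time"
)

func main() {
    c := make(chan string) // the channel plays the role of the messaging system
    go pub(c)              // publisher goroutine
    go sub(c)              // subscriber goroutine
    select {}              // block main forever so the goroutines keep running
}

// sub is the subscriber: it consumes messages from the channel in an infinite loop.
func sub(c <-chan string) {
    for {
        message := <-c
        fmt.Println(message)
    }
}

// pub is the publisher: it publishes four messages, one second apart.
func pub(c chan<- string) {
    c <- "Hello World 1"
    time.Sleep(1 * time.Second)
    c <- "Hello World 2"
    time.Sleep(1 * time.Second)
    c <- "Hello World 3"
    time.Sleep(1 * time.Second)
    c <- "Hello World 4"
    // Keep this goroutine alive. If it returned, every remaining goroutine
    // would be blocked and the Go runtime would abort with
    // "all goroutines are asleep - deadlock!".
    time.Sleep(24 * time.Hour)
}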

In this code, the main function creates a channel of type string.
The pub function publishes “Hello World” strings to the channel, one per second.
The sub function runs an infinite for loop and prints each message as soon as it arrives on the channel.
In other words, the pub function publishes strings to the channel,
and the sub function consumes them from the channel.
A module that consumes and processes messages is also called a worker.
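Several workers can consume from the same channel. The sketch below is a hypothetical extension of the example above (runWorkers is not part of the original code): a number of worker goroutines compete for messages on one channel, each message is received by exactly one worker, and closing the channel tells the workers that no more messages will arrive.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts n workers that compete for messages on the same channel.
// Each message is received by exactly one worker. It returns how many
// messages each worker processed.
func runWorkers(n int, messages []string) []int {
	c := make(chan string)
	counts := make([]int, n)
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// The range loop ends when the publisher closes the channel.
			for msg := range c {
				fmt.Printf("worker %d processed %q\n", id, msg)
				counts[id]++ // each worker writes only its own slot, so there is no data race
			}
		}(i)
	}

	// Publisher side: send every message, then signal that no more will come.
	for _, m := range messages {
		c <- m
	}
	close(c)
	wg.Wait()
	return counts
}

func main() {
	counts := runWorkers(3, []string{"job 1", "job 2", "job 3", "job 4", "job 5"})
	fmt.Println("per-worker counts:", counts)
}
```

Which worker receives which message depends on the Go scheduler, so only the total is predictable: every message is processed exactly once.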

We discussed the relationship between publisher, messaging system, and subscriber earlier.
In the Go code above, the pub function acts as the publisher and the sub function acts as the subscriber.
The string channel c between pub and sub can be thought of as the messaging system.

Go provides messaging at the language level through channels. When a language does not, a framework often does.
Taking Java’s Spring Framework as an example, an application can publish events of type ApplicationEvent through an ApplicationEventPublisher.
Listeners can then subscribe to those published events with the @EventListener annotation.
In this case, the Spring Framework takes care of the messaging.
Note that messages and events are similar.
The difference is that a messaging system usually stores messages in a storage area called a queue, so subscribers can process them later rather than immediately.
An event, by contrast, is dispatched to its subscribers the moment it occurs and is then discarded.
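The difference between a dispatched event and a queued message can be sketched in Go. EventBus is a hypothetical synchronous dispatcher, not Spring's API, and a buffered channel stands in for a queue. A listener that subscribes after an event has been emitted never sees it, while a message placed in the queue waits until a consumer reads it.

```go
package main

import "fmt"

// EventBus is a hypothetical synchronous dispatcher: handlers run the moment
// an event is emitted, and nothing is kept afterwards.
type EventBus struct {
	handlers []func(string)
}

func (b *EventBus) Subscribe(h func(string)) { b.handlers = append(b.handlers, h) }

func (b *EventBus) Emit(e string) {
	for _, h := range b.handlers {
		h(e)
	}
	// The event is not stored, so a subscriber added later never sees it.
}

func main() {
	// Event: a listener registered after Emit misses the event entirely.
	bus := &EventBus{}
	bus.Emit("user signed up")
	var seen []string
	bus.Subscribe(func(e string) { seen = append(seen, e) })
	fmt.Println("late listener saw:", seen)

	// Message: a buffered channel acts as a small queue that holds the
	// message until a consumer gets around to reading it.
	queue := make(chan string, 10)
	queue <- "user signed up"
	fmt.Println("late consumer got:", <-queue)
}
```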

As explained above, messaging relies on a storage area called a queue.
That is why it is called a message queue.

In general, messaging systems are designed to deliver and process messages in a distributed environment.
So, in the strict sense, neither Go’s channels nor Spring’s event listeners are messaging systems.
However, both follow an event-driven architecture and can be viewed as messaging systems in a broad sense.

From here on, let’s look at Kafka and RabbitMQ, two of the most widely used distributed messaging systems.

First, Kafka and RabbitMQ are both messaging systems that operate in a distributed environment.
Such systems are also called message brokers.

Distributed messaging systems typically provide scalability, high availability, and reliability.

For scalability, Kafka and RabbitMQ can each run as a cluster of nodes.
This allows horizontal scaling by adding nodes,
and horizontal scaling increases throughput.
High availability means the messaging system keeps operating even if some nodes go down.
Reliability means message delivery behaves dependably: published messages are not lost, and duplicate deliveries to subscribers are avoided.

Kafka and RabbitMQ meet these criteria as distributed messaging systems.

A publisher publishes one message at a time, but the messaging system can replicate that message and deliver it to multiple queues.

For example,

Kafka organizes messages into topics.
Each topic is further divided into several partitions.
On the subscriber side, multiple workers can share the work of consuming a topic through a consumer group, and separate consumer groups each receive every message.
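A topic split into partitions can be sketched with an in-memory structure. Topic, Publish, and partitionFor are hypothetical names, not the Kafka client API. For keyed messages, Kafka's default partitioner hashes the key with murmur2; this sketch uses FNV-1a from Go's standard library as a stand-in. The property that matters is the same: messages with the same key always land in the same partition, so their order is preserved there.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Topic is a hypothetical in-memory topic split into partitions.
// Each partition is an append-only slice of messages.
type Topic struct {
	Name       string
	Partitions [][]string
}

func NewTopic(name string, n int) *Topic {
	return &Topic{Name: name, Partitions: make([][]string, n)}
}

// partitionFor maps a key to a partition index. FNV-1a is a stand-in here;
// it is not the hash Kafka itself uses.
func (t *Topic) partitionFor(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(len(t.Partitions)))
}

// Publish appends the message to the partition chosen by its key and
// returns that partition's index.
func (t *Topic) Publish(key, value string) int {
	p := t.partitionFor(key)
	t.Partitions[p] = append(t.Partitions[p], value)
	return p
}

func main() {
	orders := NewTopic("orders", 3)
	for _, key := range []string{"alice", "bob", "alice", "carol", "alice"} {
		p := orders.Publish(key, "order from "+key)
		fmt.Printf("key=%s -> partition %d\n", key, p)
	}
}
```

Adding partitions spreads a topic's data over more places that can be read in parallel, which is how partitioning supports the horizontal scaling described earlier.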

RabbitMQ consists of exchanges and queues.
An exchange decides which queues a message is routed to.
RabbitMQ has four exchange types: direct, fanout, topic, and headers.
A message can be routed to multiple queues, and exchanges can also be bound to other exchanges so that routing continues from exchange to exchange.
Compared to Kafka, RabbitMQ offers a very flexible and convenient message routing design.
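The three key-based exchange types can be sketched as a routing function. Binding, route, and matchTopic are hypothetical helpers, not the RabbitMQ client API; the headers exchange (which matches on message headers instead of the routing key) and exchange-to-exchange bindings are left out. For a topic exchange, routing keys are dot-separated words, "*" matches exactly one word, and "#" matches zero or more words.

```go
package main

import (
	"fmt"
	"strings"
)

// Binding connects an exchange to a queue with a binding key.
type Binding struct {
	Queue string
	Key   string
}

// matchTopic reports whether a routing key matches a topic binding pattern,
// both already split into words. "*" matches exactly one word and "#"
// matches zero or more words.
func matchTopic(pattern, key []string) bool {
	if len(pattern) == 0 {
		return len(key) == 0
	}
	switch pattern[0] {
	case "#":
		// Let "#" absorb 0, 1, 2, ... words and try to match the rest.
		for i := 0; i <= len(key); i++ {
			if matchTopic(pattern[1:], key[i:]) {
				return true
			}
		}
		return false
	case "*":
		return len(key) > 0 && matchTopic(pattern[1:], key[1:])
	default:
		return len(key) > 0 && pattern[0] == key[0] && matchTopic(pattern[1:], key[1:])
	}
}

// route returns the queues that receive a message for the given exchange type.
func route(kind string, bindings []Binding, routingKey string) []string {
	var out []string
	for _, b := range bindings {
		var ok bool
		switch kind {
		case "fanout":
			ok = true // a fanout exchange ignores the routing key
		case "direct":
			ok = b.Key == routingKey
		case "topic":
			ok = matchTopic(strings.Split(b.Key, "."), strings.Split(routingKey, "."))
		}
		if ok {
			out = append(out, b.Queue)
		}
	}
	return out
}

func main() {
	bindings := []Binding{
		{Queue: "all-errors", Key: "*.error"},
		{Queue: "kern-logs", Key: "kern.#"},
		{Queue: "exact", Key: "kern.error"},
	}
	for _, kind := range []string{"direct", "fanout", "topic"} {
		fmt.Println(kind, route(kind, bindings, "kern.error"))
	}
}
```

The same publish call reaches one queue or several depending only on how the exchange is configured, which is where RabbitMQ's routing flexibility comes from.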

However, because RabbitMQ is built on queues, once a consumer takes a message off a queue (and acknowledges it), the message is removed from the queue.
In Kafka, by contrast, the storage units are topics and partitions, not queues.
A topic is a logical unit for grouping messages, and the unit where messages are actually stored is the partition.
Also, when a consumer reads messages from a partition, the messages are not deleted the way they would be from a queue.
Instead, the consumer remembers a specific offset and reads the data after that offset, much like indexing into an array.
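The contrast between removing a message from a queue and reading a log by offset can be sketched with two small in-memory types. Queue, Log, NewLog, and Poll are hypothetical simplifications: real RabbitMQ removes a message once the consumer acknowledges it, and real Kafka tracks committed offsets per consumer group and per partition.

```go
package main

import "fmt"

// Queue: consuming a message removes it (RabbitMQ-style, simplified).
type Queue struct{ items []string }

func (q *Queue) Push(m string) { q.items = append(q.items, m) }

func (q *Queue) Pop() (string, bool) {
	if len(q.items) == 0 {
		return "", false
	}
	m := q.items[0]
	q.items = q.items[1:]
	return m, true
}

// Log: consuming only advances a per-group offset; the records stay
// (Kafka-style, simplified to a single partition).
type Log struct {
	records []string
	offsets map[string]int // consumer group -> next offset to read
}

func NewLog() *Log { return &Log{offsets: map[string]int{}} }

func (l *Log) Append(m string) { l.records = append(l.records, m) }

// Poll returns every record after the group's offset, then moves the
// group's offset to the end of the log.
func (l *Log) Poll(group string) []string {
	start := l.offsets[group]
	batch := l.records[start:]
	l.offsets[group] = len(l.records)
	return batch
}

func main() {
	q := &Queue{}
	q.Push("m1")
	q.Push("m2")
	first, _ := q.Pop()
	fmt.Printf("queue consumer got %s; %d left in queue\n", first, len(q.items))

	lg := NewLog()
	lg.Append("m1")
	lg.Append("m2")
	fmt.Println("group A:", lg.Poll("A"))
	fmt.Println("group B:", lg.Poll("B")) // group B still sees every record
	fmt.Println("records kept:", len(lg.records))
}
```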

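Because of this difference, RabbitMQ can attach multiple competing consumers to one queue, whereas in Kafka, within a single consumer group, each partition is consumed by only one consumer at a time (other consumer groups can still read the same partition independently).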
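The one-consumer-per-partition rule within a group can be sketched as an assignment function. This uses simple round-robin, which is only one possible strategy (Kafka ships several assignors), and assign is a hypothetical helper. With more consumers than partitions, the extra consumers sit idle, while a second consumer group gets its own independent assignment over the same partitions.

```go
package main

import "fmt"

// assign spreads partitions across the consumers of ONE consumer group in
// round-robin order. Every partition gets exactly one owner; consumers
// beyond the number of partitions receive nothing and sit idle.
func assign(partitions int, consumers []string) map[string][]int {
	out := make(map[string][]int, len(consumers))
	if len(consumers) == 0 {
		return out
	}
	for _, c := range consumers {
		out[c] = nil
	}
	for p := 0; p < partitions; p++ {
		owner := consumers[p%len(consumers)]
		out[owner] = append(out[owner], p)
	}
	return out
}

func main() {
	// Group "billing": 3 partitions, 2 consumers, so one consumer owns two partitions.
	fmt.Println("billing:", assign(3, []string{"b1", "b2"}))
	// Group "audit": 3 partitions, 4 consumers, so one consumer is idle.
	fmt.Println("audit:", assign(3, []string{"a1", "a2", "a3", "a4"}))
	// Both groups read all 3 partitions, each tracking its own offsets.
}
```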

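The two also differ in how they store messages.
RabbitMQ traditionally keeps queued messages mostly in memory and deletes them once they are consumed, while Kafka writes messages to disk and retains them for a configured period even after they are consumed.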

That is why Kafka is sometimes used as a persistent data store, much like a database.

These differences in behavior reflect what each messaging system is designed for.
RabbitMQ is aimed at message queuing.
So, compared to Kafka, it is very flexible about routing and about creating and deleting queues.
Kafka is aimed at processing large volumes of messages.
So, compared to RabbitMQ, it performs better for large-scale batch workloads.

Above, we looked at distributed messaging systems and briefly looked at rabbitmq and kafka.
We’ll take a closer look at kafka and rabbitmq through code.

Finally, here is a summary of five reasons to use a messaging system.

Communication: Data can be passed between publisher and subscriber. This frees you from constraints between different languages and platforms and lets heterogeneous systems talk to each other.
Asynchronous: Compared to synchronous (procedural) communication, asynchronous communication can improve performance, because the publisher does not have to wait for the subscriber to finish processing.
High Availability: In a distributed messaging system, normal operation continues even if some nodes of the messaging system go down, although performance may degrade.
Retry: Even if a worker goes down, messages remain in the queue, so when the worker comes back up it can resume from the operation that previously failed. Processing may be less real-time, but the data reliably ends up consistent.
Loose coupling: Because the publisher reaches the subscriber indirectly, through the messaging system, rather than calling it directly, the publisher does not need to know the business logic that runs after a message is published. The publisher's role ends at publication. The messaging system then delivers the message to the subscriber, which becomes responsible for handling it. This indirection makes our software more flexible and scalable.

Subscribing, liking, and turning on notifications are a big help to content creators.

Thank you.