Overview of Kafka Cluster and Components, Replication and Order
Kafka Cluster Architecture
Kafka Broker
A Kafka broker is a server running in a Kafka cluster (or, put another way: a Kafka cluster is made up of a number of brokers).
Each broker instance is capable of handling read and write quantities reaching to hundreds of thousands each second (and terabytes of messages) without any impact on performance.
Each broker has a unique ID and can be responsible for partitions of one or more topic logs.
Connecting to any broker will bootstrap a client to the full Kafka cluster. To achieve reliable failover, a minimum of three brokers should be utilized — with greater numbers of brokers comes increased reliability.
How they assist ZooKeeper:
- Brokers achieve load balancing and reliable redundancy and failover.
- Brokers utilize Apache ZooKeeper for the management and coordination of the cluster.
- Kafka brokers also leverage ZooKeeper for leader elections, in which a broker is elected to lead the dealing with client requests for an individual partition of a topic.
Kafka Component Overview
- A producer sends a message to 1 topic (at a time)
- A consumer subscribes to 1 or more topics.
- A topic has 0 or more consumers.
- A consumer is a member of 1 consumer group.
- A partition has 1 consumer (per group).
- A consumer pulls messages from 0 or more partitions (per topic).
- A partition has 1 or more replicas.
- A partition has 1 leader.
- A partition has 0 or more followers.
- A broker has 0 or 1 replicas (per partition).
- A replica is on 1 broker.
- A cluster has 1 or more brokers.
- A broker is part of 1 cluster.
- A topic is replicated over 1 or more partitions.
Replication
Kafka brokers are able to host multiple partitions. There is no limit on the number of Kafka partitions that can be created (subject to the processing capacity of a cluster).
Each of a partition’s replicas has to be on a different broker:
- All writes to and reads from a topic happen through the leader.
- Each broker can be the leader for 0 or more topic/partition pairs.
- If a leader fails, a replica takes over as a new leader (only in-sync replicas can become a leader).
What does it require to be called a “in-sync” replica:
- A node must be able to maintain its session with ZooKeeper (via ZooKeeper’s heartbeat mechanism)
- If it is a follower it must replicate the writes happening on the leader and not fall “too far” behind
One of the brokers is elected as the “controller”. This controller detects failures at the broker level and is responsible for changing the leader of all affected partitions in a failed broker. If the controller fails, one of the surviving brokers will become the new controller.
If you choose “the number of acknowledgements required” and “the number of logs that must be compared” to elect a leader such that there is guaranteed to be an overlap, then this is called a Quorum.
How to Guarantee Order?
From each partition, multiple consumers can read from a topic in parallel.
Kafka guarantees order within a partition, but not across partitions in a topic.
It is also possible to have producers add a key to a message — all messages with the same key will go to the same partition.
While messages are added and stored within partitions in sequence, messages without keys are written to partitions in a round robin fashion.
By leveraging keys, you can guarantee the order of processing for messages in Kafka that share the same key.
References:
https://kafka.apache.org/documentation
https://www.tutorialspoint.com/apache_kafka/apache_kafka_cluster_architecture.htm
https://devshawn.com/blog/apache-kafka-introduction/
https://www.instaclustr.com/apache-kafka-architecture/
https://www.slideshare.net/FlorentRamiere/devoxx-university-kafka-de-haut-en-bas
https://stackoverflow.com/questions/29511521/is-key-required-as-part-of-sending-messages-to-kafka
Happy Coding!