Sunday, February 8, 2026

Why Kafka is Popular

Apache Kafka is a distributed log system designed to handle high-throughput data streams. Below is a structured summary of its architecture, strategies, and trade-offs based on the transcript provided.


Core Architecture

Kafka operates as a distributed streaming platform that allows systems to communicate asynchronously.

  • Decoupling: Kafka acts as a buffer between producers and consumers, allowing them to evolve independently and preventing systems from being overwhelmed by traffic spikes.

  • The Distributed Log: Messages are written to "partitions," which are append-only files stored on disk.

  • Brokers & Clusters: Partitions live on servers called brokers; a collection of brokers forms a Kafka cluster.

  • Topics: Messages are categorized into topics (e.g., payments, user clicks, or video uploads).

  • Message Structure: Each message typically includes a key, a value, a timestamp, and optional headers for metadata.
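The record structure above can be sketched as a small data class (a toy model for illustration, not the real client's record type):

```python
from dataclasses import dataclass, field
import time

# Toy sketch of a Kafka-style record: a key used for partition routing,
# an opaque value, a timestamp, and optional headers for metadata.
@dataclass
class Record:
    key: bytes
    value: bytes
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    headers: dict[str, bytes] = field(default_factory=dict)

payment = Record(
    key=b"user-42",
    value=b'{"amount": 19.99, "currency": "USD"}',
    headers={"trace-id": b"abc123"},
)
```

The key matters beyond identification: Kafka hashes it to pick a partition, which is why key choice drives the scaling behavior discussed next.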


Partitioning & Scaling Strategies

Partitioning determines how effectively a system scales under heavy load.

  • Hot Partitions: Choosing the wrong key (like a movie ID) can lead to "hot partitions," where one server is overwhelmed while others remain idle.

  • Compound Keys: To balance load, developers use compound keys (e.g., combining a movie ID with a hash of a user ID) to spread events across multiple partitions.

  • Time-based Partitions: These are ideal for log data and simple retention policies but can complicate real-time data aggregation.
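The hot-partition problem and the compound-key fix can be illustrated with a toy partitioner (the hash function and counts here are illustrative; Kafka's default partitioner uses murmur2, not MD5):

```python
import hashlib

NUM_PARTITIONS = 12

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash of the key, modulo the partition count --
    # the same idea as Kafka's default key-based partitioner.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# A raw movie ID routes every event for a popular movie to ONE partition:
hot = {partition_for(b"movie-inception") for _ in range(1000)}

# A compound key (movie ID plus a bucket derived from the user ID)
# spreads that same movie's events across up to `buckets` partitions.
def compound_key(movie_id: str, user_id: str, buckets: int = 8) -> bytes:
    bucket = partition_for(user_id.encode(), buckets)
    return f"{movie_id}:{bucket}".encode()

spread = {partition_for(compound_key("movie-inception", f"user-{i}"))
          for i in range(1000)}
```

Note the trade-off: with a compound key, events for one movie are no longer totally ordered, since they now span several partitions.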


Consumption & Reliability

Kafka provides built-in mechanics to track progress and ensure data durability.

Offsets and Consumer Groups

  • Offsets: These act as bookmarks, allowing consumers to record their progress and pick up exactly where they left off after a crash.

  • Consumer Groups: This feature allows multiple consumers to divide the work, with Kafka ensuring each message is processed by only one consumer in the group.
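A minimal sketch of how committed offsets let a consumer resume after a crash (in-memory only; Kafka itself persists committed offsets in an internal topic):

```python
from collections import defaultdict

# Toy offset store: for each (consumer group, partition) pair it records
# the offset of the NEXT message that group should read.
class OffsetStore:
    def __init__(self):
        self.committed = defaultdict(int)

    def commit(self, group: str, partition: int, offset: int) -> None:
        # Committing offset N means "everything up to N is processed".
        self.committed[(group, partition)] = offset + 1

    def resume_from(self, group: str, partition: int) -> int:
        return self.committed[(group, partition)]

store = OffsetStore()
log = ["msg-0", "msg-1", "msg-2", "msg-3"]

# A consumer in group "billing" processes two messages, then crashes.
for offset in range(2):
    _ = log[offset]
    store.commit("billing", 0, offset)

# After a restart it picks up exactly where it left off: offset 2.
restart_at = store.resume_from("billing", 0)
```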

Delivery Guarantees

  1. At Most Once: Fast execution, but carries a risk of message loss.

  2. At Least Once: Ensures no data is lost, but may result in duplicate processing.

  3. Exactly Once: The most reliable, but also the most complex and the slowest; Kafka achieves it through idempotent producers and transactions.
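A common way to make at-least-once delivery safe is idempotent handling on the consumer side: remember which message IDs have been processed and skip redeliveries. A minimal sketch (in a real system the dedup set would be persisted alongside the results):

```python
# Under at-least-once delivery, a message can be redelivered if the
# consumer crashes after processing but before committing its offset.
# Idempotent handling makes the duplicate harmless.
processed_ids: set[str] = set()
balance = 0

def handle_payment(msg_id: str, amount: int) -> None:
    global balance
    if msg_id in processed_ids:
        return  # duplicate redelivery: ignore
    processed_ids.add(msg_id)
    balance += amount

handle_payment("pay-1", 100)
handle_payment("pay-1", 100)  # redelivered duplicate, applied only once
handle_payment("pay-2", 50)
```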

Replication & Durability

  • Leader/Follower Model: Every partition has one "leader" for reads/writes and several "followers" that replicate the data.

  • Acknowledgment Settings (acks): Kafka can be configured to wait for all replicas to acknowledge a write, providing maximum safety at the cost of speed.
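In the producer client this trade-off surfaces as the `acks` setting. A sketch of the relevant configuration (key names follow the standard Kafka producer config; the broker addresses are placeholders):

```python
# Standard Kafka producer settings controlling durability vs. speed.
producer_config = {
    "bootstrap.servers": "broker1:9092,broker2:9092",
    # "acks" controls how many replicas must confirm each write:
    #   "0"   -> fire and forget (fastest, may lose data)
    #   "1"   -> leader only (lost if the leader dies before replicating)
    #   "all" -> every in-sync replica (safest, slowest)
    "acks": "all",
}
```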


Production Patterns

  • Real-time Processing: Uber uses geographic partitioning to calculate driver surge pricing in real time for specific regions.

  • Event Sourcing: Using Kafka as a "source of truth" by appending every state change as an event, providing a complete audit trail and the ability to replay the system's state.
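Rebuilding state by replaying the event log can be sketched in a few lines (a toy account ledger, not a real event-sourcing framework):

```python
# Event sourcing sketch: the append-only event stream is the source of
# truth; current state is derived by replaying every event in order.
events = [
    {"type": "deposit",  "amount": 100},
    {"type": "withdraw", "amount": 30},
    {"type": "deposit",  "amount": 25},
]

def replay(event_log: list[dict]) -> int:
    balance = 0
    for e in event_log:
        if e["type"] == "deposit":
            balance += e["amount"]
        elif e["type"] == "withdraw":
            balance -= e["amount"]
    return balance

current_balance = replay(events)
```

Because the log is never mutated, the same replay over any prefix of the events reconstructs the state at that point in time, which is what gives the audit trail and replayability mentioned above.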


Summary of Trade-offs

While powerful, Kafka is not a "one-size-fits-all" solution.

  • Throughput vs. Latency: Optimized for high throughput; batching and buffering make it unsuitable for request-response patterns needing low latency.

  • Ordering: Guarantees order only within a single partition, not across an entire topic.

  • Parallelization: Global ordering requires a single partition, which prevents parallel processing.

  • Complexity: Offers immense power (replayability, decoupling) but adds significant operational complexity to the tech stack.