On Kafka

https://on-kafka.netlify.app

What Is Kafka?

Apache Kafka is an open-source distributed event streaming platform

What Is Event Streaming?

  • Capturing data from event sources
  • Storing these event streams durably
  • Reacting to the event streams

So What Is Kafka?

  • Publish/subscribe messaging system
  • Distributed commit log
  • Data is stored durably, in order, and can be read deterministically
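
The commit-log idea above can be sketched in a few lines. This is a minimal in-memory illustration, not Kafka's implementation or API: the `CommitLog` class and its `append` and `read` methods are hypothetical names, and durability (writing to disk, replication) is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Append-only, in-memory log (hypothetical sketch, not a Kafka API)."""

    records: list = field(default_factory=list)

    def append(self, value: str) -> int:
        """Append a record and return the offset it was stored at."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at offset, in append order."""
        return self.records[offset:offset + max_records]


log = CommitLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

# Records are never modified in place, so reading from the same offset
# twice returns the same records in the same order.
first_read = log.read(1)
second_read = log.read(1)
```

Because records are only ever appended, an offset identifies a record for as long as it is retained, which is what makes reads repeatable.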

What Is Publish/Subscribe Messaging?

  • Publish/subscribe messaging is a pattern in which senders do not address messages to specific receivers
  • Publishers (producers) send messages to a topic
  • Subscribers (consumers) read messages from a topic
  • A topic is like a database table or a folder in a filesystem; it decouples publishers from subscribers
  • Pub/sub systems often have a broker, a central point where messages are published
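
As a rough illustration of the pattern, here is a minimal in-memory broker. The `Broker` class and the topic names are hypothetical and chosen only for this sketch. It pushes each message to subscriber callbacks, which keeps the example short; Kafka consumers instead pull messages from the broker.

```python
from collections import defaultdict


class Broker:
    """Central point where messages are published (hypothetical sketch)."""

    def __init__(self):
        # Maps a topic name to the handlers subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every message sent to topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver message to all subscribers of topic; return how many got it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
received_by_billing = []
received_by_shipping = []
broker.subscribe("orders", received_by_billing.append)
broker.subscribe("orders", received_by_shipping.append)

# The publisher only names the topic; it does not know who is subscribed.
delivered = broker.publish("orders", "order-1")
unheard = broker.publish("payments", "payment-1")
```

The publisher and the two subscribers share nothing but the topic name, which is the decoupling the bullets describe.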

Kafka Cluster


Producers


Consumers


Why Kafka?

  • Multiple producers
  • Multiple consumers
  • Disk-based retention
  • Scalable
  • High performance
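
Two of these points, multiple consumers and retention, can be illustrated with a small sketch. It assumes a simplified model in which each consumer tracks its own offset and records expire by age whether or not anyone has read them. `RetainedLog` and `Consumer` are hypothetical names, and timestamps are passed in explicitly to keep the example deterministic. Real Kafka retention is more involved than this record-by-record model.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedLog:
    """In-memory log with time-based retention (hypothetical sketch)."""

    retention_seconds: float
    base_offset: int = 0  # offset of the oldest record still retained
    records: list = field(default_factory=list)  # (timestamp, value) pairs

    def append(self, value, now):
        """Store value with timestamp now and return its offset."""
        self.records.append((now, value))
        return self.base_offset + len(self.records) - 1

    def expire(self, now):
        """Drop records older than the retention period, consumed or not."""
        while self.records and now - self.records[0][0] > self.retention_seconds:
            self.records.pop(0)
            self.base_offset += 1

    def read_from(self, offset):
        """Return all retained values at or after offset."""
        start = max(offset, self.base_offset) - self.base_offset
        return [value for _, value in self.records[start:]]


@dataclass
class Consumer:
    """Reads the log at its own pace by tracking a private offset."""

    log: RetainedLog
    offset: int = 0

    def poll(self):
        values = self.log.read_from(self.offset)
        self.offset = max(self.offset, self.log.base_offset) + len(values)
        return values


log = RetainedLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=30)

fast = Consumer(log)
slow = Consumer(log)

fast_batch = fast.poll()   # the fast consumer reads everything written so far
log.append("c", now=90)
log.expire(now=90)         # "a" is now older than 60 seconds and is dropped
slow_batch = slow.poll()   # the slow consumer starts at the oldest retained record
fast_second = fast.poll()  # the fast consumer only sees what is new to it
```

Reading does not remove anything from the log, so consumers do not interfere with each other; only retention removes data.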

Use Cases

  • Stream processing (data science, Hadoop)
  • Messaging
  • Activity tracking (websites, LinkedIn)
  • Metrics and logging
  • Commit log

Challenges

  • Distributed systems are complex (first law)
  • Kafka is difficult to operate
  • Many parameters: replication factor, in-sync replicas (ISR), retention, …
  • Logic within clients (consumers and producers)
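
To give a feel for how such parameters interact, the sketch below collects a few settings and runs one sanity check. `min.insync.replicas` and `retention.ms` are Kafka topic configuration names; the surrounding dictionary layout, the `replication_factor` key, the chosen values, and the `find_problems` helper are assumptions made for this example only.

```python
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000  # retention is expressed in milliseconds

# Example values only; suitable numbers depend on the cluster and workload.
topic_settings = {
    "replication_factor": 3,  # hypothetical key: number of copies of the data
    "configs": {
        "min.insync.replicas": 2,  # replicas that must acknowledge a write
        "retention.ms": SEVEN_DAYS_MS,
    },
}


def find_problems(settings):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    replication_factor = settings["replication_factor"]
    min_isr = settings["configs"]["min.insync.replicas"]
    if replication_factor < 1:
        problems.append("replication factor must be at least 1")
    if min_isr > replication_factor:
        problems.append("min.insync.replicas exceeds the replication factor")
    return problems


problems = find_problems(topic_settings)
```

A check like this catches only the most obvious mismatch; it is meant to show that the parameters cannot be chosen independently of each other.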

whoami

https://github.com/johanngyger/on-kafka