What Is Kafka?
Apache Kafka is an open-source distributed event streaming platform.
What Is Event Streaming?
- Capturing data from event sources
- Storing these event streams durably
- Reacting to the event streams
So What Is Kafka?
- Publish/subscribe messaging system
- Distributed commit log
- Data is stored durably, in order, and can be read deterministically
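To make the commit-log idea concrete, here is a minimal in-memory sketch of an append-only log with offsets. It is only an illustration: the class name CommitLog and its methods are hypothetical, nothing is written to disk or replicated, and it is not how Kafka itself is implemented. It only shows why reads from a given offset are ordered and repeatable.

```python
class CommitLog:
    """A toy append-only log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return the offset it was assigned."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset, in append order."""
        return self._records[offset:offset + max_records]


log = CommitLog()
for event in ["user_created", "user_renamed", "user_deleted"]:
    log.append(event)

# Reading does not remove anything, so reading the same offset twice
# yields the same records in the same order.
first_read = log.read(0)
second_read = log.read(0)
```

Because records are never modified or reordered once appended, an offset is enough to describe exactly where a reader is in the log.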
What Is Publish/Subscribe Messaging?
- Publish/subscribe messaging is a pattern
- Publishers (producers) send messages to a topic
- Subscribers (consumers) read messages from a topic
- A topic is like a database table or a folder in a filesystem; it decouples publishers from subscribers
- Pub/sub systems often have a broker, a central point where messages are published
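The pattern above can be sketched in a few lines. This is a toy, in-memory broker, not a Kafka client: the Broker class and the topic names are made up for illustration, and delivery is push-style via callbacks to keep the example short, whereas Kafka consumers pull messages from the broker.

```python
from collections import defaultdict


class Broker:
    """Central point that routes published messages to topic subscribers."""

    def __init__(self):
        # topic name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for every message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Hand the message to every subscriber of the topic."""
        for callback in self._subscribers[topic]:
            callback(message)


broker = Broker()
page_views = []   # first subscriber collects messages here
audit_trail = []  # second, independent subscriber

broker.subscribe("page-views", page_views.append)
broker.subscribe("page-views", audit_trail.append)

# The publisher only knows the topic name, not who is listening.
broker.publish("page-views", {"user": "alice", "path": "/home"})
broker.publish("orders", {"id": 1})  # no subscribers on this topic
```

The publisher and the two subscribers never reference each other; the topic name is the only thing they share, which is the decoupling the pattern is about.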
Why Kafka?
- Multiple producers
- Multiple consumers
- Disk-based retention
- Scalable
- High performance
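The combination of multiple consumers and retention can be illustrated with a small sketch. It assumes a plain Python list standing in for a retained topic and a hypothetical Consumer class; real Kafka consumers track offsets per partition and commit them, which is omitted here.

```python
class Consumer:
    """Tracks its own read position in a shared, retained list of messages."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=2):
        """Return the next batch and advance only this consumer's offset."""
        batch = self._log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


retained = ["m0", "m1", "m2"]  # messages stay here after being read

fast = Consumer(retained)
slow = Consumer(retained)

fast_batches = [fast.poll(), fast.poll()]  # fast consumer reads everything
slow_batch = slow.poll(max_records=1)      # slow consumer is still at the start
```

Since reading does not delete messages, a slow or newly started consumer can still see everything that is within the retention window, independently of other consumers.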
Use Cases
- Stream processing (data science, Hadoop)
- Messaging
- Activity tracking (websites, LinkedIn)
- Metrics and logging
- Commit log
Challenges
- Distributed systems are complex (first law)
- Kafka is difficult to operate
- Many parameters: replication factor, in-sync replicas (ISR), retention, …
- Much of the logic lives in the clients (producers and consumers)
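One example of logic that sits in the client is choosing the partition for a keyed message, which the producer does before sending. The sketch below is an assumption-laden illustration: choose_partition is a hypothetical helper, and it uses CRC32 from the standard library as a stand-in hash, whereas Kafka's default partitioner uses a murmur2 hash of the key.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index (stand-in hash, not murmur2)."""
    return zlib.crc32(key) % num_partitions


keys = [b"user-1", b"user-2", b"user-1"]
assignments = [choose_partition(key, 3) for key in keys]

# The same key always lands on the same partition, which is what
# preserves per-key ordering.
same_key_same_partition = assignments[0] == assignments[2]
```

Note that the mapping depends on the number of partitions, so changing the partition count changes where existing keys land. This is one reason client-side logic adds operational complexity.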