Choosing Apache Kafka For A New Project – A Questionnaire

August 29, 2023 · #kafka #system-design

In any modern project that needs to process events, whether a set of messages or a stream of data, developers often propose Apache Kafka as the infrastructure solution. This is not always a carefully weighed choice: sometimes a classic broker like ActiveMQ would do, but marketing prevails.

But let's assume that you have deliberately chosen Kafka, or that the infrastructure team has left you no alternative. Before setting up broker parameters and writing producers and consumers, what questions should you ask yourself? To ensure a smooth start, I have prepared the following checklist:

  1. How much data will the producers generate? → Will the network bandwidth be sufficient for the entire system and its critical components? Can you make use of message compression?

  2. Data retention policy: How long do you need to keep data? → Consider the business and data protection requirements of the product you are developing, and the cost of storing the data.

  3. Message sending guarantees (acks) → Find the right balance between latency and durability: how many replicas must confirm a write before the producer considers the message sent?

  4. Message delivery guarantees → How critical is message loss or duplication to the business objective? Are idempotent producers or transactions needed?

  5. What partitioning strategy will producers use? → Is the default strategy (Default partitioner) appropriate?

  6. For a particular topic, is it important to store the entire message log, or is the latest value for each key sufficient? → Consider using compacted topics.

  7. Will the created topics be read by a consumer group? → How do you plan to scale consumers and their throughput? What happens when the group is rebalanced?
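Questions 1 and 2 lend themselves to a back-of-the-envelope estimate before any broker is configured. The Python sketch below is a rough illustration rather than a sizing tool: the names TopicLoad and estimate, the sample figures, and the compression ratio are all assumptions made up for this example. It also assumes data is stored on disk in the same compressed form it is sent in, and it ignores consumer traffic, per-message overhead, and index files.

```python
from dataclasses import dataclass


@dataclass
class TopicLoad:
    """Hypothetical workload figures; replace them with your own measurements."""
    messages_per_second: float
    avg_message_bytes: float
    compression_ratio: float  # compressed size / raw size (assumed, measure it)
    replication_factor: int
    retention_days: float


def estimate(load: TopicLoad) -> dict:
    """Rough producer bandwidth and retained disk volume for one topic."""
    raw_bytes_per_sec = load.messages_per_second * load.avg_message_bytes
    wire_bytes_per_sec = raw_bytes_per_sec * load.compression_ratio
    # Every message is written once per replica across the cluster.
    cluster_write_bytes_per_sec = wire_bytes_per_sec * load.replication_factor
    retained_bytes = cluster_write_bytes_per_sec * load.retention_days * 86_400
    return {
        "producer_mbit_per_sec": wire_bytes_per_sec * 8 / 1e6,
        "cluster_write_mbit_per_sec": cluster_write_bytes_per_sec * 8 / 1e6,
        "retained_gb": retained_bytes / 1e9,
    }


# Made-up example: 5,000 messages/s of 1 KB each, 2:1 compression,
# replication factor 3, one week of retention.
result = estimate(TopicLoad(5_000, 1_000, 0.5, 3, 7))
print(result)
```

With these sample inputs the producers need about 20 Mbit/s, while the cluster as a whole writes three times that and retains roughly 4.5 TB. Even a crude number like this shows quickly whether retention or replication, rather than the network, is the real cost driver.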
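Several answers from the checklist end up as plain configuration values. As a hedged illustration, the Python dictionaries below group some standard Kafka property names by the question they relate to. The property names are real Kafka settings, but every value is an assumption chosen for the example, and the transactional id is hypothetical.

```python
# Producer-side settings (questions 1, 3 and 4). Values are illustrative only.
producer_config = {
    "acks": "all",                 # wait for all in-sync replicas; "1" or "0" trade durability for latency
    "enable.idempotence": "true",  # broker discards duplicates caused by producer retries
    "compression.type": "lz4",     # alternatives: none, gzip, snappy, zstd
    # "transactional.id": "orders-writer-1",  # hypothetical id; set only if transactions are needed
}

# Topic-level settings (questions 2, 3 and 6). Values are illustrative only.
topic_config = {
    "retention.ms": str(7 * 24 * 60 * 60 * 1000),  # keep data for 7 days
    "cleanup.policy": "delete",    # "compact" retains at least the latest record per key
    "min.insync.replicas": "2",    # with acks=all, a write needs at least 2 in-sync replicas
}
```

How these values are passed to a client or an admin tool depends on the library you use, so treat the dictionaries as a note-taking format for your answers rather than as ready-made client code.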
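For question 5, it helps to see what key-based partitioning implies. The sketch below is a simplified stand-in, not Kafka's implementation: the Java client hashes the serialized key with murmur2, whereas this example uses CRC32 from the Python standard library purely for illustration. The function name pick_partition and the customer keys are made up.

```python
import zlib
from collections import Counter


def pick_partition(key: bytes, num_partitions: int) -> int:
    """Illustrative stand-in for a key-hash partitioner (Kafka itself uses murmur2)."""
    return zlib.crc32(key) % num_partitions


# Hypothetical message keys.
keys = [f"customer-{i}".encode() for i in range(1000)]

# Same key and same partition count always give the same partition,
# which is what preserves per-key ordering.
assert pick_partition(b"customer-42", 6) == pick_partition(b"customer-42", 6)

# How evenly do the keys spread? A handful of very hot keys would skew this.
spread = Counter(pick_partition(k, 6) for k in keys)
print(sorted(spread.items()))

# Growing the topic from 6 to 8 partitions sends many keys to a different partition.
moved = sum(pick_partition(k, 6) != pick_partition(k, 8) for k in keys)
print(f"{moved} of {len(keys)} keys change partition")
```

Two things follow from this. If ordering per key matters, the partition count should be chosen with future growth in mind, because adding partitions later changes where existing keys land. And if the key distribution is heavily skewed, the default strategy may overload a few partitions, which is the case where a custom partitioner becomes worth considering.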

That's it. This checklist is by no means exhaustive and covers only the essentials. It leaves out a lot of things, such as data encryption, authentication, authorization, and cluster configuration, on the assumption that the SRE team or some PaaS will take care of those for you.