Lessons Learned from Managing Kafka Costs
You have probably seen ads claiming that an app can save you money by finding subscriptions you forgot about. I have a hard time imagining someone with hundreds of dollars in forgotten expenses, but I have occasionally missed one myself. The problem is that people are inefficient when it comes to managing “stuff.” That is why there are so many places to store it. In daily life, we keep things in closets, here and there, and even in external storage units. We take pictures of everything and rarely go back to clean them up. Our computers are full of documents and downloaded files of who knows what. Why? Because getting rid of stuff is hard. But stuff is becoming increasingly important when dealing with cloud-hosted services. Let me give you a few examples.
I recently gave a Tech Talk Webinar on Kafka partitions. Not long afterward, I received the bill for my Confluent Cloud Kafka service. It reminded me that they bill by the partition-hour. That is, each partition you define incurs a set charge whether you use it or not, and actual usage incurs additional charges. It was just another reminder that you need to keep monitoring your usage and, in some cases, change your behavior to account for what your costs may be.
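To make partition-hour billing concrete, here is a minimal Python sketch that estimates the partition portion of a monthly bill from a map of topic names to partition counts. The topic names and the rate of half a cent per partition-hour are hypothetical placeholders, not Confluent's actual pricing; check your provider's price list for real figures, and remember that usage charges come on top.

```python
# Hypothetical rate for illustration only; real partition pricing varies by
# provider, cluster type, and region.
PARTITION_HOUR_RATE = 0.005  # dollars per partition per hour (assumed)
HOURS_PER_MONTH = 730        # common billing approximation (8,760 hours / 12)


def monthly_partition_cost(topics: dict[str, int],
                           rate: float = PARTITION_HOUR_RATE,
                           hours: float = HOURS_PER_MONTH) -> float:
    """Estimate the partition charge alone; usage is billed separately."""
    total_partitions = sum(topics.values())
    return total_partitions * rate * hours


# Hypothetical topics, including a demo topic left over after a webinar.
topics = {"orders": 12, "payments": 6, "webinar-demo": 30}
print(f"{sum(topics.values())} partitions: ${monthly_partition_cost(topics):,.2f}/month")
```

Under these assumptions, the forgotten 30-partition demo topic alone accounts for more than half of the estimate, even if no one ever writes to it again.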
Another case arose when I attended a training session on using Kafka on Confluent Cloud. As part of it, the trainer suggested that we set up a KSQL database. I set it up as instructed, and it was pretty simple. At that point, I more or less forgot about it. However, that wasn’t the end of it. While reviewing my monthly billing statement, I noticed a charge I didn’t expect. On closer inspection, I found that it was for the KSQL database. There is a small flat-rate charge for it, so once I had set it up, I incurred that cost whether I used it or not. I had to go back and delete it before the cost went down.
It is not just Confluent Cloud where this happens. On another occasion, one of our teams wanted to try out Amazon MSK (Amazon’s managed Kafka service). While setting it up, they ran into issues they were unable to resolve and decided not to use MSK for their specific use case. However, as in the scenario above, they did not delete the MSK instance they had created, and there was a charge for the servers and components that hosted it. Again, this was not discovered until the billing statement arrived and someone asked why there was a charge for MSK.
While the examples above were testing scenarios, the same applies to production applications. They have a lifecycle of usage. For example, an application may start with limited usage and then consume more resources as it grows in popularity. Eventually, it may become obsolete while still having resources allocated that are no longer being used. Because production applications typically use more expensive resources, leaving them in place costs more.
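One way to catch resources at the end of their lifecycle is a periodic check for anything still billed but no longer active. The Python sketch below flags resources with no recorded activity within a chosen window and sorts them by hourly cost. The `Resource` record, its fields, the hourly rates, and the sample inventory are all hypothetical; in practice the data would come from your provider's billing export or a monitoring tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Resource:
    # Hypothetical inventory record; real data would come from billing/monitoring exports.
    name: str
    kind: str
    hourly_cost: float                 # assumed dollars per hour while the resource exists
    last_activity: Optional[datetime]  # None means no activity was ever recorded


def idle_resources(resources: List[Resource], now: datetime,
                   idle_after: timedelta = timedelta(days=30)) -> List[Resource]:
    """Return resources with no activity within `idle_after`, most expensive first."""
    cutoff = now - idle_after
    idle = [r for r in resources
            if r.last_activity is None or r.last_activity < cutoff]
    return sorted(idle, key=lambda r: r.hourly_cost, reverse=True)


now = datetime(2024, 6, 1)
inventory = [
    Resource("orders-topic", "kafka-topic", 0.06, now - timedelta(hours=1)),
    Resource("training-ksql", "ksql", 0.25, None),
    Resource("msk-trial", "msk-cluster", 0.60, now - timedelta(days=90)),
]

for r in idle_resources(inventory, now):
    print(f"{r.name} ({r.kind}): idle, about ${r.hourly_cost * 24 * 30:,.2f} per 30 days")
```

Sorting by hourly cost puts the most expensive leftovers first, which is where a cleanup review should start. The 30-day window is an assumption; a workload that runs only monthly or quarterly would need a longer one to avoid false alarms.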
Cloud instances are convenient, but they also demand more attention. Something as small as 25 cents an hour may not sound like much, but over a year it adds up to more than $2,000. Some cloud services are much more expensive than that, and different services charge for different components: some charge for the servers or the resources they consume, others for the number of objects defined, and so on. It is very important to know the basis on which your cloud services are billed. Even a rate as small as half a cent per hour adds up quickly with Kafka partitions when a single topic has many of them.
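The arithmetic behind these figures is worth spelling out, since small hourly rates are easy to underestimate. This Python sketch annualizes an hourly charge using 8,760 hours per year (365 × 24). The rates are the ones quoted above; the 50-partition topic is a hypothetical example.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def annual_cost(hourly_rate: float, units: int = 1) -> float:
    """Annual cost of `units` items, each billed at `hourly_rate` dollars per hour."""
    return hourly_rate * units * HOURS_PER_YEAR


# A single service at 25 cents an hour.
print(f"${annual_cost(0.25):,.2f}")            # -> $2,190.00
# A hypothetical 50-partition topic at half a cent per partition-hour.
print(f"${annual_cost(0.005, units=50):,.2f}")  # -> $2,190.00
```

Both lines print the same total: fifty partitions at half a cent each cost as much per year as one service at 25 cents an hour.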
While this is not the typical use case associated with observability, it is related. You need to be able to see what is going on, what is defined, and how it is used over time. The meshIQ platform is very well suited for this: we can observe what is defined and how it is used, manage those definitions to ensure that they are needed and correct, and track application usage to determine which applications are using the most resources.