Kubernetes Optimization Engine
An MLOps company solving cost, performance, and capacity problems
Welcome to Katen.AI, a Kubernetes MLOps company.
Imagine trillions of containers running across clouds, with roughly $60 billion of company wealth transferred to cloud providers every year, most of it with no power to negotiate.
Are you happy with what you are paying your cloud provider? Is it justified?
Do you have similar questions?
“Constant trial and error to figure out the right size of my compute.”
- A machine learning startup
“It’s complete nonsense that I have 4 machines with avg. usage of 43% CPU and 16% memory.”
- Movie scheduling company
“You are monitoring my workloads, why don’t you tell me how to distribute them optimally”
- Healthcare ML company
“My developers want to deploy and monitor their apps. They don’t care how, and don’t want to know how the Kubernetes or Prometheus sausage is made. They know what SLAs (USE/RED) they want. What does it cost?”
— VP Engineering, E-Commerce Platform Company
How do you justify what you are paying your cloud provider?
By plugging a cost model into your performance monitoring solution, because, as we all know, resources aren’t free. Someone is paying for them: either your team or your company. The cloud providers will gladly take your money. And there was no way to compare the cost of cloud services across providers - until Kubernetes.
With Kubernetes providing the abstraction layer, we can now do an apples-to-apples comparison of your cost as it relates to performance, and optimize capacity allocation at the right time, at the right price, in the right cloud. What was not possible in the past is now very possible, albeit still a hard problem.
Developers, SRE, and DevOps teams should be in control of their costs so they can easily justify why they are spending what they are. Performance SLAs that achieve the desired SLOs are what developers know how to instrument, measure, and see. Understanding what these SLAs cost in real time helps you optimize your Kubernetes workloads to maximize capacity utilization while allocating and provisioning it optimally. Over time, machine learning can make these decisions automatic.
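To make the idea concrete, here is a minimal sketch of pricing the gap between allocated and used capacity. The per-node price, the CPU/memory cost split, and the utilization figures (borrowed from the quote above) are all assumptions for illustration, not real benchmarks.

```python
def wasted_spend(hourly_price, cpu_util, mem_util,
                 cpu_weight=0.5, mem_weight=0.5):
    """Estimate the hourly cost of idle capacity on one node.

    cpu_util / mem_util are average utilization fractions (0.0-1.0);
    the weights split the node price between CPU and memory
    (an assumed 50/50 split here).
    """
    idle_fraction = cpu_weight * (1 - cpu_util) + mem_weight * (1 - mem_util)
    return hourly_price * idle_fraction

# The numbers from the quote above: 4 machines averaging 43% CPU and
# 16% memory utilization, at a hypothetical $0.20/hour per node.
per_node = wasted_spend(0.20, 0.43, 0.16)
fleet_monthly = 4 * per_node * 24 * 30
print(f"${per_node:.3f}/hour wasted per node, "
      f"~${fleet_monthly:.0f}/month across the fleet")
```

Even with these toy numbers, more than two thirds of the fleet’s spend is paying for idle capacity, which is exactly the kind of signal a cost model surfaces continuously rather than by trial and error.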
Let’s start by superimposing a pluggable cost model onto your existing observability stack.
The observability stack: Monitoring, Logging, and Tracing.
Monitoring is already standardized on Prometheus. Upcoming standards such as OpenTelemetry (a CNCF sandbox project) and OpenMetrics will further expand the reach of Prometheus. Grafana Labs already provides Prometheus-style logging with Loki.
No proprietary agents required, no custom metrics formats needed. Prometheus exporters already show you the way to standardize reporting metrics.
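Because the metrics are already in a standard shape, a cost model can be pure post-processing on Prometheus query results. The sketch below assumes a result set shaped like the Prometheus HTTP API’s `data.result` field for an instant query (e.g. CPU cores used per namespace); the $0.03 per core-hour price is a made-up blended rate, not a quote from any provider.

```python
CPU_PRICE_PER_CORE_HOUR = 0.03  # assumed blended on-demand rate


def cost_by_namespace(prom_result, price=CPU_PRICE_PER_CORE_HOUR):
    """Map Prometheus instant-query samples (cores used) to hourly cost.

    Each sample has the standard API shape:
    {"metric": {<labels>}, "value": [<timestamp>, "<value>"]}.
    """
    return {
        sample["metric"]["namespace"]: float(sample["value"][1]) * price
        for sample in prom_result
    }


# Sample data mimicking the result of a query such as
# sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))
sample_result = [
    {"metric": {"namespace": "ml-training"}, "value": [1710000000, "12.5"]},
    {"metric": {"namespace": "serving"}, "value": [1710000000, "3.2"]},
]
print(cost_by_namespace(sample_result))
```

Swapping the price table is all it takes to re-run the same numbers against a different provider, which is the “pluggable” part of the model.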
We are doing a survey. It’s anonymous and takes < 5 minutes to complete. If you feel the pain of sending your money to cloud providers without having a clear justification for it, let us know what you think.
In an upcoming post, we’ll go into a little more detail about the observability stack. We are NOT an observability company, but we do ride on some of the base technology used in observability. Unlike an observability platform, which requires very large volumes of metrics at high granularity, we don’t need that, so our infrastructure costs stay manageable and we don’t have to pass a large metrics-storage cost on to customers.
Please follow us on Twitter