Do you need managed Grafana? “Everything” is moving to the cloud, and the popular catch-all motivations are “you don’t have to manage your servers” and “save time on patching servers and updating software”. That’s all fine in many situations. This article aims to give you some insights and help you on your way as you decide just how you want to run Grafana for your business.
When is the operational overhead too small to justify a fully managed cloud solution? Take Grafana and Prometheus, for instance. For years I have been running a server with Grafana and Prometheus, and yes, it has led to the occasional week where I had to put an hour into working on the backend (updates and configuration changes such as adding and removing hosts and system monitoring). But is that reason enough to give up control of the server and pay an additional monthly amount (equivalent to a simple dinner out) for a cloud-based solution?
If you can’t spare the minutes and want to let go of the knowledge, then go full steam into a fully managed cloud solution. But if you want to understand what you are running, avoid integration points outside your chosen data center, and you want to know how your observability solution ties into your business and network endpoints, then stick with your own.
Open source and commercial options for visualization, dashboarding, and metrics collection
Monitoring software and the observability sphere are saturated with competing solutions. Some focus on logs, and some focus on time-based, real-time monitoring of processes, memory, and storage. One common denominator is that they are becoming more accessible in terms of cost of ownership and the time needed to set up and manage them. There are a few high-priced options out there, but there is also open-source software such as Grafana (visualization and dashboarding), Nagios Core (infrastructure monitoring platform), and Prometheus (metrics collection in a time-series database).
Is managed Grafana worth it? Questions you should ask yourself
Let’s say you are working with AWS or Azure (the two most used cloud providers). Both offer their own managed Grafana product: Amazon Managed Grafana and Azure Managed Grafana. Then there is Grafana Cloud.
- What is the core philosophy of your IT services infrastructure? On-premise, hybrid cloud, or simply one cloud provider?
- What will it cost you?
- Do you have the in-house competence to set up Grafana?
- Do you have the time to allocate to manage a server running Grafana, with its importers?
- What are some scenarios where you might roll your own not-entirely-cloud-based observability solution?
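The cost question above can be turned into a quick back-of-the-envelope calculation. The following is only a minimal sketch: the function names are made up for illustration, and the hourly rate, server cost, and managed fee are hypothetical placeholders that you should replace with your own numbers.

```python
def self_hosting_cost(hours_per_month: float, hourly_rate: float, server_cost: float) -> float:
    """Monthly cost of running Grafana yourself: your time plus the server."""
    return hours_per_month * hourly_rate + server_cost


def break_even_hours(managed_fee: float, hourly_rate: float, server_cost: float) -> float:
    """Hours of maintenance per month at which self-hosting costs as much as the managed fee."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    # Never report negative hours: if the server alone costs more than
    # the managed fee, the break-even point is zero hours.
    return max(0.0, (managed_fee - server_cost) / hourly_rate)


# Hypothetical figures, for illustration only.
print(self_hosting_cost(hours_per_month=1.0, hourly_rate=80.0, server_cost=10.0))
print(break_even_hours(managed_fee=50.0, hourly_rate=80.0, server_cost=10.0))
```

A calculation like this only captures money. It says nothing about the control and knowledge you keep by running the server yourself, so treat the result as one input among several.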
Wait, you can self-host parts of your observability platform
Before making a decision, pick up knowledge about self-hosting Prometheus, Mimir, or Graphite. Grafana Cloud could then be used for visualizations.
If you want to start simple, you can set up a Grafana service quickly on your own server: wget the latest Grafana .deb release (using a Debian-centric package as an example), install it, and then use systemctl to start and stop the service. Or go the managed Grafana route.
From there on, your focus becomes the collection of metrics, which is also the vital data source for your monitoring setup. As you start the Prometheus metrics collector for the instances you want to monitor with your particular configuration file, one thing is clear: a basic setup is not much work as long as it is working fine. But when problems arise, the hours spent troubleshooting can easily rack up and take valuable time away from your core business. That is also where a managed Grafana solution can be worth the price.
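Once the service has been started, a quick way to confirm that a self-hosted Grafana is actually up is to call its health endpoint. This is a minimal sketch, assuming Grafana listens on its default port 3000 on the same machine; the function names are made up for illustration.

```python
import json
import urllib.request
from urllib.error import URLError

# Assumption: Grafana runs locally on its default HTTP port.
GRAFANA_URL = "http://localhost:3000"


def parse_health(body: str) -> bool:
    """Return True if the /api/health response reports the database as ok."""
    data = json.loads(body)
    return data.get("database") == "ok"


def grafana_is_healthy(base_url: str = GRAFANA_URL, timeout: float = 3.0) -> bool:
    """Query Grafana's /api/health endpoint; any network or parse error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        return False
```

Calling `grafana_is_healthy()` from a cron job or a small script gives you a simple yes/no answer before you start building dashboards.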
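To give an idea of how small a basic Prometheus configuration file can be, here is a sketch that renders one as text. The keys (`global`, `scrape_interval`, `scrape_configs`, `job_name`, `static_configs`, `targets`) are standard Prometheus configuration keys. The function name, the job name, and the target hosts are hypothetical, and the targets assume an exporter such as node_exporter on its default port 9100.

```python
def render_prometheus_config(job_name: str, targets: list[str], scrape_interval: str = "15s") -> str:
    """Render a minimal prometheus.yml with one scrape job and a static list of targets."""
    target_list = ", ".join(f'"{t}"' for t in targets)
    return (
        "global:\n"
        f"  scrape_interval: {scrape_interval}\n"
        "scrape_configs:\n"
        f'  - job_name: "{job_name}"\n'
        "    static_configs:\n"
        f"      - targets: [{target_list}]\n"
    )


# Hypothetical hosts, for illustration only.
config = render_prometheus_config("node", ["localhost:9100", "10.0.0.5:9100"])
print(config)
```

Adding or removing a monitored host then comes down to editing one list, which is the kind of occasional backend work described earlier.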
Managing infrastructure — Yay or nay?
From a broader perspective, the prospect of tedious and repetitive infrastructure management is all too familiar. But so is the concept of letting go of all control over infrastructure, where one is left at the mercy of the various cloud providers’ decisions to change features, their bugs, and sometimes mere “knowledge overhead”. The latter denotes the need to learn the concepts and details of a third party’s systems and manuals so that integration can be completed.
“Cloud-first” is a mantra for some. But just how large is your observability platform, and what are the possible contention points? Does it crash a lot? Do you often have to jump on the command line to make sure that the server is configured properly (or did you just write some Bash scripts and cron the entire thing)? You might be better off just trying both a self-hosted and a managed solution.
Observability according to AWS
I am partial to AWS as a public cloud provider (though I acknowledge that Azure can be used equally well), so here is how AWS describes the pillars of full observability: metrics, logs, and traces. For more information about these three main sources of full observability as presented by AWS, see: https://docs.aws.amazon.com/whitepapers/latest/aws-caf-operations-perspective/observability.html
In short, metrics cover collecting and visualizing data to show health and performance. Logs collect log data, usually as files. Traces follow the paths of requests as they travel through different services, to help identify and solve the root causes of performance issues.
The Prometheus UI instead of Grafana Explore View, and Grafana and Prometheus as separate projects
After reaching out to Grafana to learn more about the ecosystem, I learned that there is an overlap between the Grafana and Prometheus communities, because contributors can be involved with both projects.
Moreover, the Prometheus UI can be used instead of the Grafana Explore View, although the latter has more features. There is no tight coupling between the Prometheus and Grafana software release cycles; the two integrate through the public APIs of both tools.
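As an illustration of the public APIs mentioned above, the Prometheus HTTP API exposes instant queries at `/api/v1/query`. The sketch below builds such a request URL and reads the values out of a response. The function names and the sample response are made up for illustration, and the base URL assumes Prometheus on its default port 9090.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_values(body: str) -> dict[str, float]:
    """Map each series' instance label to its current value from an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    values: dict[str, float] = {}
    for series in payload["data"]["result"]:
        instance = series["metric"].get("instance", "unknown")
        _timestamp, value = series["value"]  # value arrives as a string
        values[instance] = float(value)
    return values


# Hand-written sample response in the documented format, not real output.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
             "value": [1435781451.781, "1"]},
        ],
    },
})

print(build_query_url("http://localhost:9090", "up"))
print(extract_values(sample))
```

Grafana's Prometheus data source talks to this same kind of API, which is one reason the two projects can evolve on separate release schedules.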
Next steps?
Install the open-source version of Grafana. Then try a demo of a managed Grafana cloud solution. Don’t stop at the theory; simply run the software and involve your own business stakeholders to show what works and what does not in your particular environment.