Update Your Monitoring

Created on 2020-12-05 09:11

Published on 2020-12-05 09:21


From time to time you will need to go through all your monitoring tooling and check what is outdated and what still works fine. For those moments I have a few suggestions. I will assume you are a DEV/OPS organization.

I will split these suggestions into five monitoring areas:

·      Logging

·      Metrics

·      Real user

·      Tracing

·      Alerting

First, a few basic things that apply to all of them:

·      Has the amount of data changed in the last year, or will it change in the next year?

·      What is the budget for this item, and are we under or over budget?

·      What retention period do you need for the data you create?

·      From where do you want to access the monitoring: the internal network, the Internet or a mobile application?

·      If possible, I would always go for decoupled monitoring.

·      You want the central part to be maintained by a monitoring team.

·      The configuration for the DEV/OPS teams should be maintained by the DEV/OPS teams themselves.

·      Preferably as monitoring as code, with the configuration kept in a Git repository.

·      Security can be separated per DEV/OPS team.

·      Look for a product with a large user base.

·      Always have at least three environments: Test, Acceptance and Production.
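The questions about data volume, budget and retention can be turned into a quick back-of-the-envelope calculation. A minimal Python sketch, assuming you know your daily ingest in GB, how many replicas you keep and a storage cost per GB per month; the function names and all numbers are hypothetical and not taken from any product:

```python
# Hypothetical sizing helpers: every input is an assumption you replace
# with your own numbers; nothing here comes from a specific product.

def estimate_storage_gb(daily_ingest_gb: float, retention_days: int,
                        replicas: int = 1, growth_factor: float = 1.0) -> float:
    """Storage needed to keep `retention_days` of data.

    growth_factor models the expected change in daily volume over the
    next year (1.5 means 50% more data per day).
    """
    if daily_ingest_gb < 0 or retention_days < 0 or replicas < 1:
        raise ValueError("invalid sizing input")
    return daily_ingest_gb * growth_factor * retention_days * replicas


def within_budget(storage_gb: float, cost_per_gb_month: float,
                  monthly_budget: float) -> bool:
    """True when the monthly storage cost fits inside the budget."""
    return storage_gb * cost_per_gb_month <= monthly_budget
```

For example, 20 GB per day that is expected to grow by 50%, kept for 30 days with 2 replicas, gives estimate_storage_gb(20, 30, replicas=2, growth_factor=1.5) = 1800 GB; at a hypothetical 0.05 per GB per month that is about 90 per month, which you can then hold against the budget.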

Logging

I would advise everybody to look at open source solutions. A few years ago (around six), these solutions were in my opinion not mature enough for production, but most of them have now matured to the point where they rival the paid products.

The scalability of this product should be split into two parts: storage capacity and search capacity. Do not make the mistake of linking the two; that can become very expensive.
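To see why coupling storage and search gets expensive, size the two independently. A sketch under assumed per-node capacities; GB_PER_STORAGE_NODE and QUERIES_PER_SEARCH_NODE are made-up figures you would replace with measurements from your own logging cluster:

```python
import math

# Hypothetical capacity per node; measure these on your own cluster.
GB_PER_STORAGE_NODE = 2000
QUERIES_PER_SEARCH_NODE = 50


def storage_nodes(total_gb: float) -> int:
    """Nodes needed to hold the data; depends only on volume and retention."""
    return max(1, math.ceil(total_gb / GB_PER_STORAGE_NODE))


def search_nodes(peak_queries_per_sec: float) -> int:
    """Nodes needed to answer queries; depends only on search load."""
    return max(1, math.ceil(peak_queries_per_sec / QUERIES_PER_SEARCH_NODE))
```

With 10,000 GB of logs but only 20 queries per second at peak, this gives 5 storage nodes and 1 search node. If every node had to carry both roles, you would be paying for search capacity on all 5.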

Metrics

Metrics have become the gold standard of monitoring and alerting. If you get this right, you will use your logging only for investigation, not for monitoring.

You need a low-overhead system that can handle a lot of data quickly. Metrics produce a lot of data points, but each one is very small.

For this, there have been many products on the open source market for years that do their job very well, so I would also go open source in this case.

For metrics there are two architectural approaches. The first is to give every team its own small system, which the team then maintains. I favour the second approach: one “big” system, because there will always be things that need to be done for everybody. If everybody is in one system, updating it is a lot easier.
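To make "a lot of data but not very big" concrete: a metric sample is just a name, a few labels and a number. A minimal sketch that renders a counter in the Prometheus text exposition format using only the standard library; the metric name and labels in the example are hypothetical, HELP-text escaping is omitted, and in practice a client library does this for you:

```python
def _escape(value: str) -> str:
    # Label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter family in the Prometheus text format.

    samples maps a tuple of (label, value) pairs to the counter value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{_escape(val)}"' for key, val in labels)
        # Omit the braces entirely when a sample has no labels.
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"
```

render_counter("http_requests_total", "Total HTTP requests.", {(("method", "get"), ("code", "200")): 1027}) ends with the single sample line http_requests_total{method="get",code="200"} 1027: a few dozen bytes per series per scrape.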
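One big system still fits the advice that teams own their configuration as code. A hypothetical sketch of how a monitoring team could merge team-owned scrape definitions (think of one file per team in a Git repository) into one central configuration, with central defaults applied to everybody. The field names loosely echo Prometheus' job_name and scrape_interval, but the structure is simplified and illustrative, not a real configuration schema:

```python
# Hypothetical central defaults owned by the monitoring team.
CENTRAL_DEFAULTS = {"scrape_interval": "30s"}


def merge_team_configs(team_configs: dict, defaults: dict = CENTRAL_DEFAULTS) -> list:
    """Merge per-team job lists into one list; job names must be unique across teams."""
    merged = []
    owner = {}
    for team in sorted(team_configs):
        for job in team_configs[team]:
            name = job["job_name"]
            if name in owner:
                raise ValueError(f"job {name!r} defined by both {owner[name]} and {team}")
            owner[name] = team
            # Team settings override the defaults; the team label is always set centrally.
            merged.append({**defaults, **job, "team": team})
    return merged
```

Changing CENTRAL_DEFAULTS once changes every team's jobs, which is exactly the kind of "thing that needs to be done for everybody" that is easy in one system and tedious across many small ones.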

Real User

I am sorry to say I have not found any open source (non-paid) solution that works as well as the paid solutions, so for this you will need to pay.

What do you want to monitor? That is the main question. Is your application or web page available from the internet? Are the APIs available from the internet?

Do you want to walk through a website?

Do you need extra authentication?

If it is on the internet, from which countries are your consumers?

All these things play a role in deciding which tool to use.

Also be aware that setting this up takes time and effort if you want to do it correctly and without too many false positives.
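One common way to keep false positives down is to alert only after several consecutive failed checks, so a single network hiccup from one location does not wake anybody up. A minimal sketch; the threshold of 3 is an assumption to tune, and CheckState is a hypothetical name, not part of any monitoring product:

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    threshold: int = 3              # hypothetical: failures in a row before alerting
    consecutive_failures: int = 0
    alerting: bool = False

    def record(self, success: bool) -> bool:
        """Record one check result; return True only when a new alert should fire."""
        if success:
            # Any success resets the streak and clears an open alert.
            self.consecutive_failures = 0
            self.alerting = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.alerting:
            self.alerting = True
            return True
        return False
```

Feeding it fail, ok, fail, fail, fail, fail fires exactly once, on the third failure in a row, and does not repeat the alert while the check keeps failing.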

Tracing

The most useful tool in a large microservice environment.

Again, go open source. In the last three years a lot has changed in the world of transaction tracing. Before, only expensive solutions worked correctly with minimal lag; now there are one or two open source solutions that do just as well.

This type of monitoring creates a huge amount of data, so be careful with your retention periods.

If possible, go for solutions built into the application (tracing libraries). There are also some good Java agents, but they always seem to add some lag to the transactions. Not a lot, but you have to be aware of it.

Alerting

Why do I mention this separately? 

All the monitoring applications come with alerting built in. The problem is that the DEV/OPS teams then have to maintain their telephone numbers, schedules, etc. in every one of those applications. It is a lot better to have that centralized.

So find an alerting application that can route all the alerts from your monitoring applications, via webhook or plugin, and then have one location where teams keep their schedules.
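The routing idea can be sketched in a few lines: every monitoring tool posts its alerts to one webhook, and the router looks up who is on call for the owning team in one central schedule. This is a hypothetical sketch: the payload field names per tool, the team names and the Shift structure are assumptions, not the API of PagerDuty or any other product:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Shift:
    person: str
    phone: str
    start: datetime  # inclusive
    end: datetime    # exclusive


# Hypothetical: the payload field that names the owning team, per monitoring tool.
TEAM_FIELD = {"metrics": "team", "logging": "owner", "tracing": "service_team"}


def route(source: str, payload: dict, schedules: Dict[str, List[Shift]],
          now: datetime) -> Optional[Shift]:
    """Return the on-call shift for the team that owns this alert, or None."""
    field = TEAM_FIELD.get(source)
    team = payload.get(field) if field else None
    for shift in schedules.get(team, []):
        if shift.start <= now < shift.end:
            return shift
    return None
```

The teams maintain only the schedules dictionary (in practice, the alerting application's schedules); the monitoring tools never need to know a phone number.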

Conclusion

Every organization is different, so keep an eye on what your organization needs. But if you stick to the things I wrote, I do not believe you will go wrong.

Now the big one: I will recommend one application per category. These are my choices, and nobody paid me for them. For real user monitoring I do not have a choice at this moment.

Logging: Elasticsearch https://www.elastic.co

Metrics: Prometheus https://prometheus.io

Tracing: OpenTracing https://opentracing.io

Alerting: PagerDuty https://www.pagerduty.com