Created on 2019-05-13 06:36
Published on 2019-05-15 14:36
Our applications and systems are generating more and more data. This is not just user-generated data but also data generated by the applications and systems themselves: logging from applications, systems, and networks, as well as metrics from those same sources. A small application can quickly generate 15 MB of data a day, and that is a conservative estimate. (For comparison, the Bible is about 3.2 MB.) We as humans can’t read or comprehend this amount of data, so when something goes wrong, most people search through the data they have to find out what is wrong.
Most people learn some of the errors by heart and search for those. (90% of all data that is created is not used, ‘generally’ because we always look for the same data, the same errors or metrics, and ignore the rest because there is too much of it.)
We try to change the way we look at data with dashboards that show “counters, averages and percentages”; this is a way of reducing the amount of data we have to look at. The side effects are one-dimensional views of the data, and you only get the view of the person who created the dashboard. Dashboards also cluster data, which always loses the fine granularity of the data. This can happen to time elements or any other element of the data, such as log lines, response times, etc.
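To make the loss of granularity concrete, here is a minimal sketch in Python. The response times and the 1000 ms threshold are made-up values for illustration, not measurements from a real system. It shows how a single dashboard average can hide the one request that actually went wrong.

```python
from statistics import mean
from typing import Dict, List


def summarize(samples_ms: List[float], slow_threshold_ms: float) -> Dict[str, object]:
    """Compare the aggregated view (average) with the raw view (individual samples)."""
    return {
        # What a typical dashboard tile would show.
        "average_ms": mean(samples_ms),
        # What is only visible when you keep the fine-grained data.
        "slowest_ms": max(samples_ms),
        "slow_requests": [t for t in samples_ms if t > slow_threshold_ms],
    }


# Hypothetical response times (ms) for ten requests; one of them is an outlier.
response_times_ms = [120, 115, 130, 125, 118, 122, 4800, 119, 121, 117]

summary = summarize(response_times_ms, slow_threshold_ms=1000)
print(summary)
```

The average sits well below the outlier, so a dashboard that only plots the average would suggest a mildly slow system rather than one request that took several seconds.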
All this comes at a cost: not just missed problems or longer search times, but a real monetary cost, because we still need to process and store all the data somewhere, regardless of whether it is used or not.
We need to look very carefully at all the data that is generated to see what is ‘really’ useful, so that we can reduce the amount of data we create.
In most cases this has already happened. If not, you will need to go through your “logging and metrics” with determination and eliminate everything that doesn’t help or doesn’t indicate a problem. Even then, most applications will still produce a large amount of data, so you will need to start looking into automating your operations.
The easiest way to start automating your operations is to put applications in place that evaluate the current data on the basis of simple rules and send out notifications when problematic conditions arise. Some applications that may already be in use in your organization, like Elasticsearch for logging or Grafana for metrics, have features built in to do this. These applications are very generic and very easy to set up.
Or you can go one step further and set up ‘specialized’ applications that are built for this purpose, like Prometheus (which specializes in alerting on metrics).
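The idea of evaluating current data against simple rules can be sketched in a few lines of Python. This is only an illustration of the concept, not how Elasticsearch, Grafana, or Prometheus implement alerting; the `Rule` class, the metric names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    """A hypothetical threshold rule: alert when a metric rises above a limit."""
    name: str
    metric: str
    threshold: float


def evaluate(rules: List[Rule], metrics: Dict[str, float]) -> List[str]:
    """Return one notification message for every rule whose condition is met."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        # Metrics that were not reported are skipped rather than treated as errors.
        if value is not None and value > rule.threshold:
            alerts.append(
                f"{rule.name}: {rule.metric}={value} exceeds {rule.threshold}"
            )
    return alerts


# Made-up rules and a made-up snapshot of current metric values.
rules = [
    Rule(name="HighErrorRate", metric="error_rate_percent", threshold=5.0),
    Rule(name="DiskAlmostFull", metric="disk_used_percent", threshold=90.0),
]
current_metrics = {"error_rate_percent": 7.5, "disk_used_percent": 42.0}

alerts = evaluate(rules, current_metrics)
for message in alerts:
    print(message)  # a real setup would send a notification here
```

A real tool adds scheduling, notification channels, and suppression of repeated alerts on top of this loop, but the core remains a list of conditions checked against the latest data.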
The above solutions have a few things in common.
What if that is not enough and you want more?
Artificial Intelligence and Machine Learning applications help to make sense of your data. The new monitoring technologies provide a unified view of “ALL” components of a service, from the application code to the infrastructure. When these systems are integrated into the operations environment, they are usually called AIOps applications.
The definition of AIOps:
A multi-layered technology platform that automates and enhances IT operations by:
AIOps applications bring together several different disciplines of your operations.
AIOps applications can help you find solutions in the following areas:
What should a good AIOps application be able to do?
A small side note: the term AIOps is somewhat misleading. Most of the applications that are now available are “rule and machine learning” based. Real artificial intelligence is not implemented anywhere (with good reason)! Machine learning has been used within the IT world for some time: by large social media firms, in applications like Google Maps, Yelp, and Waze, and extensively by online marketplaces. In short, it is used any place where there is a need for reliable real-time responses, dynamically changing conditions, and user customization. There is built-up knowledge of how these systems work, and confidence in them. This is NOT the case for real AI systems.
IT operations personnel are, in general, conservative in their use of “new technologies”. But we need to be able to adapt to fast-changing environments. We need to keep pace with new changes and deal with them directly. We need to handle big data and get useful information out of it. We need a good overview of the complex environments we manage, and we need to be able to manage application systems with less manpower. To meet these challenges, I think we need these complex AIOps tools, or we will not see the forest for the trees.
Marcel Koert. B.S.E.E.