Applying Machine Learning and Self-Healing techniques to the day operations of a production system has become common practices for most SREs. This post cover some real production use cases like automated failover, forecasting, anomalies detection, risk classification and so on.