When you manage your server with RunCloud, we work to keep it up and running. We understand that unexpected service failures can lead to downtime, impacting your users and business.

That’s why we have built the RunCloud Auto Healing feature, a proactive, automated system designed to be your first line of defense against such disruptions.

Auto Healing is built directly into the RunCloud agent and continuously monitors the health of your server’s critical services.

If a service stops unexpectedly, Auto Healing automatically intervenes to bring it back online.

The services currently covered are:

  • Database: MariaDB / MySQL
  • Web Server: Apache / NGINX / OpenLiteSpeed (OLS)
  • Caching & In-Memory Stores: Redis, Memcached
  • Application Runtimes: PHP-FPM
  • Containerization: Docker
  • Queueing: Beanstalkd
  • Process Management: Supervisor
  • Security: Fail2Ban, Firewall (UFW/Firewalld)

How Auto Healing Works

The Auto Healing process is designed to be transparent. It follows a fixed sequence of steps to resolve issues without manual intervention.

1. Detection of an Unplanned Outage

The RunCloud agent performs continuous health checks on essential services. Auto Healing does not use supervisord or any other external daemon process. The entire functionality is managed internally by the RunCloud agent’s own background process.

This means Auto Healing is self-contained and operates independently, so it will not interfere with any existing applications, custom scripts, or process managers that you might be running on your server.

The only dependency for Auto Healing is the RunCloud agent itself. If the RunCloud agent goes down, Auto Healing is also inactive until the agent is restored.

Additionally, Auto Healing is triggered only when a service stops unexpectedly. If you intentionally stop a service using the RunCloud dashboard or the command-line interface (CLI), Auto Healing will respect your action and will not attempt to restart it.
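As a rough illustration of the rule above (this is not the RunCloud agent’s actual code), the decision can be sketched in Python. The ServiceState class, its stopped_by_user flag, and the needs_healing function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Hypothetical snapshot of one monitored service."""
    name: str
    running: bool
    # Assumption: the agent records when a stop was requested on purpose
    # (for example from the dashboard or the CLI).
    stopped_by_user: bool = False


def needs_healing(state: ServiceState) -> bool:
    """Return True only for an unplanned outage."""
    return (not state.running) and (not state.stopped_by_user)


# A crashed service qualifies for healing; an intentionally stopped one does not.
crashed = ServiceState(name="nginx", running=False)
stopped_on_purpose = ServiceState(name="redis", running=False, stopped_by_user=True)
healthy = ServiceState(name="mariadb", running=True)
```

In this sketch, only the crashed service would be handed to the restart cycle. The other two are left alone.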

2. Starting the Healing Process

As soon as an unplanned outage is detected, the system sends you a notification. This alert informs you that a specific service has failed and that the Auto Healing process is beginning its recovery attempts.

This ensures you are always aware of what’s happening on your server.

3. The Automated Restart Cycle

Once initiated, Auto Healing attempts to restart the failed service. It follows a bounded retry policy so that a persistent underlying issue cannot cause an endless restart loop.

Auto Healing will try to restart the service a maximum of five times. Each attempt is logged, and the system waits a short interval between retries to allow the service time to initialize properly.
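The retry policy can be sketched as a bounded loop. This is a minimal illustration under stated assumptions, not the agent’s implementation: heal_service, the restart callable, and the wait_seconds parameter are hypothetical, and the real interval between retries is not documented here. Only the limit of five attempts comes from the description above.

```python
import time
from typing import Callable

MAX_ATTEMPTS = 5  # limit taken from the description above


def heal_service(
    restart: Callable[[], bool],
    wait_seconds: float = 0.0,  # assumption: the real interval is not specified
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> bool:
    """Try to restart a service up to MAX_ATTEMPTS times.

    `restart` is a hypothetical callable that returns True when the
    service came back up. Returns True on the first success, or False
    once every attempt has failed.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        succeeded = restart()
        # Every attempt is logged, whether it worked or not.
        log(f"attempt {attempt}/{MAX_ATTEMPTS}: {'ok' if succeeded else 'failed'}")
        if succeeded:
            return True
        if attempt < MAX_ATTEMPTS:
            # Pause before the next try so the service has time to initialize.
            sleep(wait_seconds)
    return False
```

Passing the sleep and log functions as parameters keeps the sketch easy to exercise without waiting or printing.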

4. Successful Recovery and Counter Reset

If the service restarts and stays up on any of the five attempts, the Auto Healing process is considered a success.

  • When the service comes back online, its individual Auto Healing counter is reset to 0.
  • The system will then resume its normal monitoring state for that service.

Each service has its own independent counter. For example, if MariaDB fails and is restarted, its counter is reset without affecting the counters for NGINX or Redis.
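One way to picture the independent counters is a small per-service table. The HealingCounters class and its method names below are hypothetical and serve only to illustrate the behavior described above.

```python
from typing import Dict


class HealingCounters:
    """Hypothetical per-service counter of restart attempts."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def record_attempt(self, service: str) -> int:
        """Increment and return the attempt count for one service."""
        self._counts[service] = self._counts.get(service, 0) + 1
        return self._counts[service]

    def record_recovery(self, service: str) -> None:
        """Reset only the recovered service's counter to 0."""
        self._counts[service] = 0

    def count(self, service: str) -> int:
        return self._counts.get(service, 0)


counters = HealingCounters()
counters.record_attempt("mariadb")
counters.record_attempt("mariadb")
counters.record_attempt("nginx")
counters.record_recovery("mariadb")  # MariaDB came back online
```

After the recovery call, the MariaDB counter is back at 0 while the NGINX counter still holds its single attempt.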

5. Handling Persistent Failures

If all five restart attempts fail, the system concludes that the issue is persistent and likely requires manual investigation. After the fifth failed attempt, Auto Healing stops trying to restart the service.

A final notification is sent to you, stating that the automated restart has failed and that manual intervention is now required. This prevents the system from masking a deeper problem (such as a misconfiguration, corrupted files, or resource exhaustion) and ensures you are alerted to take action.
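To tie the notifications to the restart outcomes, the sketch below lists the messages you would receive for a single outage. It is illustrative only: notifications_for and the message wording are hypothetical, and because this article does not say whether a notice is sent on successful recovery, the sketch sends none in that case.

```python
from typing import Iterable, List

MAX_ATTEMPTS = 5  # limit taken from the description above


def notifications_for(service: str, restart_results: Iterable[bool]) -> List[str]:
    """Return the messages sent for one outage of `service`.

    `restart_results` holds the outcome of each restart attempt in order
    (True means the service came back up). Message text is hypothetical.
    """
    # The first alert goes out as soon as the unplanned outage is detected.
    messages = [f"{service} stopped unexpectedly; Auto Healing is starting"]
    for attempt, succeeded in enumerate(restart_results, start=1):
        if succeeded:
            return messages  # recovered, so no failure notice is needed
        if attempt == MAX_ATTEMPTS:
            # The final alert goes out once the last allowed attempt has failed.
            messages.append(
                f"{service} could not be restarted after {MAX_ATTEMPTS} attempts; "
                "manual intervention is required"
            )
            return messages
    return messages
```

For example, five failed results in a row produce two messages (the initial alert and the final one), while a success on the second attempt produces only the initial alert.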

Enabling and Disabling Auto Healing

The Auto Healing feature is enabled by default on all new and existing servers to ensure your server benefits from this protection immediately.

You can enable or disable Auto Healing at any time.

To manage these settings, navigate to Server → Settings → Auto Healing Services Settings in your RunCloud dashboard.