Optimized healing function for port conflicts

2025-04-15 15:27:33 +02:00
parent b5dc999584
commit 58229e2255
6 changed files with 81 additions and 16 deletions

@@ -1,2 +1,28 @@
# heal-docker
docker-compose restart for containers which are unhealthy or exited
# Docker Healer 🩺
## Description
This Ansible role automatically restarts Docker Compose configurations with exited or unhealthy containers on Arch Linux systems. It ensures the stability of containerized workloads by recovering from common error conditions like port binding issues.
## Overview
Tailored for Arch Linux, this role monitors containers for failure states and initiates a controlled restart of affected Compose configurations. If port conflicts prevent recovery, the role stops the affected stack, restarts Docker, and recreates the container environment.
## Purpose
The purpose of this role is to provide automated healing for Docker Compose environments, minimizing manual recovery effort and reducing downtime.
## Features
- **Container Health Monitoring:** Detects unhealthy or exited containers.
- **Automated Recovery:** Restarts failed containers and resolves port binding issues.
- **Run-once Setup Logic:** Ensures idempotent execution by controlling task flow with internal flags.
- **System Role Integration:** Seamlessly integrates with CyMaIS system maintenance logic.
## Credits 📝
Developed and maintained by **Kevin Veen-Birkenbach**.
Learn more at [www.veen.world](https://www.veen.world)
Part of the [CyMaIS Project](https://github.com/kevinveenbirkenbach/cymais)
License: [CyMaIS NonCommercial License (CNCL)](https://s.veen.world/cncl)
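
The README says the role detects unhealthy or exited containers, but the detection code itself is not part of this diff. The following is only a minimal sketch of one way such a check could look, assuming the `docker ps` filters `health=unhealthy` and `status=exited`. The names `failing_container_names` and `_docker_ps` are hypothetical and do not come from `heal-docker.py`.

```python
import subprocess


def _docker_ps(container_filter):
    """Return container names matching one `docker ps` filter (assumed helper)."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--filter", container_filter,
         "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def failing_container_names(run=None):
    """Return the sorted, de-duplicated names of unhealthy or exited containers.

    `run` is injectable so the logic can be exercised without a Docker daemon.
    """
    if run is None:
        run = _docker_ps
    names = set()
    for container_filter in ("health=unhealthy", "status=exited"):
        # Skip empty lines that an empty `docker ps` result may produce.
        names.update(line.strip() for line in run(container_filter) if line.strip())
    return sorted(names)
```

Passing a fake `run` callable makes it possible to check the de-duplication logic on a machine where Docker is not installed.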

@@ -12,10 +12,11 @@ def bash(command):
process = subprocess.Popen([command], stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
out, err = process.communicate()
stdout = out.splitlines()
stderr = err.decode("utf-8").strip() # decode stderr
output = [line.decode("utf-8") for line in stdout]
if process.wait() > bool(0):
if process.returncode > 0:
print(command, out, err)
raise Exception("Exitcode is greater than 0")
raise Exception(stderr) # pass the actual error text
return output
def list_to_string(lst):
@@ -60,10 +61,22 @@ def main(base_directory):
if compose_file_path:
print("Restarting unhealthy container in:", compose_file_path)
print_bash(f'cd {os.path.dirname(compose_file_path)} && docker-compose -p "{repo}" restart')
project_path = os.path.dirname(compose_file_path)
try:
print_bash(f'cd {project_path} && docker-compose -p "{repo}" restart')
except Exception as e:
if "port is already allocated" in str(e):
print("Detected port allocation problem. Executing recovery steps...")
print_bash(f'cd {project_path} && docker-compose down')
print_bash('systemctl restart docker')
print_bash(f'cd {project_path} && docker-compose -p "{repo}" up -d')
else:
print("Unhandled exception during restart:", e)
errors += 1
else:
print("Error: Docker Compose file not found for:", repo)
errors += 1
print("Finished restart procedure.")
exit(errors)
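
The two hunks above work together: `bash()` now raises with the decoded stderr text, which lets `main()` match on `port is already allocated` and run the down / restart Docker / up sequence. The sketch below restates that flow as a standalone function so it can be read and tested in isolation. It is an illustration under assumptions, not the code from this commit: `run_shell` and `restart_with_recovery` are hypothetical names, paths are quoted with `shlex.quote`, and `-p` is also passed to `down`, which the commit does not do.

```python
import shlex
import subprocess


def run_shell(command):
    """Run a shell command; raise RuntimeError carrying stderr if it fails."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()


def restart_with_recovery(project_path, project, run=run_shell):
    """Restart a Compose project and return the number of errors (0 or 1).

    On a port allocation error the stack is taken down, the Docker daemon
    is restarted, and the stack is brought up again.
    """
    # Assumption: the same project name is used for every compose call.
    base = (
        f"cd {shlex.quote(project_path)} && "
        f"docker-compose -p {shlex.quote(project)}"
    )
    try:
        run(f"{base} restart")
    except RuntimeError as error:
        if "port is already allocated" not in str(error):
            print("Unhandled exception during restart:", error)
            return 1
        print("Detected port allocation problem. Executing recovery steps...")
        run(f"{base} down")
        run("systemctl restart docker")
        run(f"{base} up -d")
    return 0
```

Injecting `run` keeps the recovery order verifiable without touching a real Docker daemon. An error raised during the recovery steps themselves is not caught here and would propagate to the caller.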

@@ -1,4 +1,5 @@
- name: "reload heal-docker.cymais.service"
- name: restart heal-docker.cymais.service
systemd:
name: heal-docker.cymais.service
state: restarted
daemon_reload: yes

@@ -1,2 +1,26 @@
---
galaxy_info:
author: "Kevin Veen-Birkenbach"
description: "Automated recovery for unhealthy or exited Docker Compose containers."
license: "CyMaIS NonCommercial License (CNCL)"
license_url: "https://s.veen.world/cncl"
company: |
Kevin Veen-Birkenbach
Consulting & Coaching Solutions
https://www.veen.world
min_ansible_version: "2.9"
platforms:
- name: Archlinux
versions:
- rolling
galaxy_tags:
- docker
- docker-compose
- systemd
- automation
- archlinux
repository: https://s.veen.world/cymais
issue_tracker_url: https://s.veen.world/cymaisissues
documentation: https://s.veen.world/cymais
dependencies:
- system-maintenance-lock
- system-maintenance-lock

@@ -9,13 +9,14 @@
copy:
src: heal-docker.py
dest: "{{heal_docker}}heal-docker.py"
notify: restart heal-docker.cymais.service
when: run_once_heal_docker is not defined
- name: create heal-docker.cymais.service
template:
src: heal-docker.service.j2
dest: /etc/systemd/system/heal-docker.cymais.service
notify: reload heal-docker.cymais.service
notify: restart heal-docker.cymais.service
when: run_once_heal_docker is not defined
- name: set service_name to the name of the current role