computer-playbook/roles/sys-ctl-hlth-docker-container
Kevin Veen-Birkenbach 26b392ea76
refactor!: replace sys-systemctl with sys-service, add sys-daemon, and rename systemctl_* → system_service_* across repo
- Swap role includes: sys-systemctl → sys-service in all roles
- Rename variables everywhere: systemctl_* → system_service_* (incl. systemctl_id → system_service_id)
- Templates: ExecStart now uses {{ system_service_script_exec }}; add optional RuntimeMaxSec via SYS_SERVICE_DEFAULT_RUNTIME
- Move SYS_SERVICE defaults into roles/sys-service/defaults (remove SYS_SERVICE_ALL_ENABLED & SYS_SERVICE_DEFAULT_STATE from group_vars/07_services.yml)
- Tidy group_vars/all/08_timer.yml formatting
- Introduce roles/sys-daemon:
  - default manager timeouts (timeouts.conf)
  - optional purge of /etc/systemd/system.conf.d
  - validation via systemd-analyze verify
  - handlers for daemon-reload & daemon-reexec
- Refactor sys-timer to system_service_* variables (docs and templates updated)
- Move filter_plugins/filetype.py under sys-service
- Update meta/README to point to official systemd docs
- Touch many roles (backup/cleanup/health/repair/certs/nginx/csp/wireguard/ssd-hdd/keyboard/update-docker/alarm compose/email/telegram/etc.) to new naming

BREAKING CHANGE:
- Role path/name change: use `sys-service` instead of `sys-systemctl`
- All `systemctl_*` vars are now `system_service_*` (e.g., on_calendar, state, timer_enabled, script_exec, id)
- If you have custom templates, adopt RuntimeMaxSec and new variable names

Chat context: https://chatgpt.com/share/68a47568-312c-800f-af3f-e98575446327
2025-08-19 15:00:44 +02:00

Docker Container Health Check

Description

This role monitors the health status of Docker containers on the system. It detects containers that are either unhealthy or have exited with a non-zero exit code, and triggers alerts if any are found.

Overview

The role installs a health check script along with a systemd service and timer to run these checks at scheduled intervals.
If unhealthy or failed containers are detected, the configured failure notifier (via sys-ctl-alm-compose) is triggered.
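The check described above could look roughly like the following Python sketch. It is not the script shipped with this role. The function names (`failed_exits`, `docker_ps`, `main`) are hypothetical, and the sketch assumes the `docker` CLI is on the PATH and that stopped containers report a status of the form `Exited (<code>) ...`. It also assumes that a non-zero exit status of the script is what marks the systemd service as failed, so that a notifier can react.

```python
import re
import subprocess

# Status text docker prints for stopped containers, e.g. "Exited (137) 2 hours ago".
EXITED_RE = re.compile(r"^Exited \((\d+)\)")


def failed_exits(ps_lines):
    """Return names of containers whose status reports a non-zero exit code.

    Each input line is expected as: name, a tab character, then the status text.
    """
    failed = []
    for line in ps_lines:
        name, _, status = line.partition("\t")
        match = EXITED_RE.match(status.strip())
        if match and int(match.group(1)) != 0:
            failed.append(name)
    return failed


def docker_ps(*args):
    """Run `docker ps` with extra arguments and return its non-empty output lines."""
    result = subprocess.run(
        ["docker", "ps", *args], capture_output=True, text=True, check=True
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def main():
    """Print problem containers and return 1 if any were found, else 0."""
    # Containers whose HEALTHCHECK currently reports "unhealthy".
    unhealthy = docker_ps("--filter", "health=unhealthy", "--format", "{{.Names}}")
    # Stopped containers, narrowed down to those with a non-zero exit code.
    exited = failed_exits(
        docker_ps(
            "--all",
            "--filter", "status=exited",
            "--format", "{{.Names}}\t{{.Status}}",
        )
    )
    for name in unhealthy:
        print(f"unhealthy: {name}")
    for name in exited:
        print(f"exited with non-zero code: {name}")
    # Assumption: the caller passes this value to sys.exit() so the unit fails.
    return 1 if unhealthy or exited else 0
```

A wrapper script would end with `sys.exit(main())`. Keeping the parsing in `failed_exits` separate from the `docker` call makes the exit-code logic testable without a running Docker daemon.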

Purpose

The primary purpose of this role is to ensure that Docker-based services remain operational. By automatically monitoring container health, it enables administrators to react quickly to failures, reducing downtime and preventing unnoticed service degradation.

Features

  • Automated Health Checks: Detects containers that are in an unhealthy state or have exited with a non-zero exit code.
  • Systemd Integration: Installs a one-shot service and timer to run health checks on a schedule.
  • Alerting Support: Works with the sys-ctl-alm-compose role for failure notifications.
  • Configurable Script Location: Controlled via the PATH_ADMINISTRATOR_SCRIPTS variable.

Further Resources