5 Commits
v1.8.0 ... main

Author SHA1 Message Date
d6c37435f0 fix: приглушить httpcore/httpx DEBUG в логах; INFO при успешном сохранении алерта
All checks were successful
CI / test (push) Successful in 39s
- httpcore, httpx, asyncio, uvicorn.access → WARNING (убирает TCP-шум из /ui/logs)
- После успешного INSERT irm_alerts: INFO с alert_id, title, sev, team
- Теперь в /ui/logs видно: пришёл вебхук → сохранено (id=...) или ERROR

Made-with: Cursor
2026-04-03 16:05:40 +03:00
c9b97814a5 feat: логирование вебхука до БД + файловый лог с ротацией
All checks were successful
CI / test (push) Successful in 47s
- Каждый входящий POST /ingress/grafana: INFO-строка (status, кол-во алертов,
  первые лейблы) и DEBUG-блок с полным JSON телом (до 8КБ)
  — видно даже если БД упала с 500
- LOG_FILE в .env / env: RotatingFileHandler 10MB×5 файлов
- LOG_LEVEL=debug теперь показывает полные тела вебхуков
- basicConfig уровень DEBUG (uvicorn.access / asyncio приглушены)

Made-with: Cursor
2026-04-03 15:59:17 +03:00
80645713a0 feat: страница логов /ui/logs с SSE real-time потоком
Some checks failed
CI / test (push) Successful in 39s
Deploy / deploy (push) Failing after 15s
- log_buffer: RingBufferHandler, кольцевой буфер 600 записей, fan-out SSE
- ui_logs: GET /ui/logs (HTML), GET /ui/logs/stream (EventSource)
- main: install_log_handler при старте, подключён router логов
- nav_rail: ссылка Логи, root_html: кнопка-ссылка Логи
- Исправлено: NaN/Inf/NUL в теле вебхука → 500 от PostgreSQL jsonb
- Тесты: test_log_buffer, test_json_sanitize; 51 passed

Made-with: Cursor
2026-04-03 15:56:58 +03:00
18ba48e8d0 release: v1.10.0 — модуль команд (teams), team_id на алертах
Some checks failed
CI / test (push) Successful in 43s
Deploy / deploy (push) Failing after 17s
- Alembic 006: teams, team_label_rules, irm_alerts.team_id
- Вебхук: сопоставление команды по правилам лейблов (priority)
- API/UI Команды; алерты: JOIN team, фильтр team_id
- Тесты test_team_match, test_teams_api; обновлён test_root_ui

Made-with: Cursor
2026-04-03 15:34:46 +03:00
a8ccf1d35c release: v1.9.0 — IRM-алерты отдельно от инцидентов
Some checks failed
Deploy / deploy (push) Has been cancelled
CI / test (push) Successful in 37s
- Alembic 005: таблицы irm_alerts и incident_alert_links
- Модуль alerts: API/UI, Ack/Resolve, привязка к инциденту через alert_ids
- Вебхук Grafana: одна транзакция ingress + irm_alerts; разбор payload в grafana_payload
- По умолчанию инцидент из вебхука не создаётся (AUTO_INCIDENT_FROM_ALERT)
- Документация IRM_GRAFANA_PARITY.md, обновления IRM.md и CHANGELOG

Made-with: Cursor
2026-04-03 15:26:38 +03:00
32 changed files with 1974 additions and 70 deletions

View File

@ -3,9 +3,17 @@
HTTP_ADDR=0.0.0.0:8080 HTTP_ADDR=0.0.0.0:8080
LOG_LEVEL=info LOG_LEVEL=info
# Запись логов в файл с авто-ротацией (10 МБ × 5 файлов = ~50 МБ).
# Пусто = не писать в файл (логи только в stdout и страницу /ui/logs).
# Пример для docker-compose (volume /logs): LOG_FILE=/logs/onguard24.log
# LOG_FILE=
# Опционально: общий секрет для вебхуков (если у источника в JSON не задан свой webhook_secret) # Опционально: общий секрет для вебхуков (если у источника в JSON не задан свой webhook_secret)
# GRAFANA_WEBHOOK_SECRET= # GRAFANA_WEBHOOK_SECRET=
# Устаревшее: автосоздание инцидента на каждый вебхук (дублирует irm_alerts). Обычно не нужно.
# AUTO_INCIDENT_FROM_ALERT=1
# Несколько Grafana: JSON-массив. slug — часть URL вебхука: /api/v1/ingress/grafana/{slug} # Несколько Grafana: JSON-массив. slug — часть URL вебхука: /api/v1/ingress/grafana/{slug}
# Пример: [{"slug":"adibrov","api_url":"https://grafana-adibrov.example","api_token":"glsa_...","webhook_secret":"длинный-секрет"}] # Пример: [{"slug":"adibrov","api_url":"https://grafana-adibrov.example","api_token":"glsa_...","webhook_secret":"длинный-секрет"}]
# Если пусто, но задан GRAFANA_URL — один источник со slug "default" (вебхук /api/v1/ingress/grafana/default) # Если пусто, но задан GRAFANA_URL — один источник со slug "default" (вебхук /api/v1/ingress/grafana/default)

View File

@ -2,6 +2,49 @@
Формат: семантическое версионирование `MAJOR.MINOR.PATCH`. Git-теги `v1.0.0`, `v1.1.0` и т.д. — см. [docs/VERSIONING.md](docs/VERSIONING.md). Формат: семантическое версионирование `MAJOR.MINOR.PATCH`. Git-теги `v1.0.0`, `v1.1.0` и т.д. — см. [docs/VERSIONING.md](docs/VERSIONING.md).
## [1.10.1] — 2026-04-03
### Добавлено
- **Страница логов** `/ui/logs` — кольцевой буфер (600 записей) + **SSE real-time** поток; фильтр по уровню, авто-прокрутка; ссылка на главной и в nav rail (раздел «📋 Логи»).
### Исправлено
- **Вебхук Grafana:** санитизация тела перед записью в `jsonb``NaN` / `±Inf``None`, удаление `\x00` в строках (иначе PostgreSQL и строгий JSON часто давали **500** на реальных алертах с метриками, тогда как «Test contact point» оставался рабочим).
- Ошибки подписчиков **`alert.received`** после успешного коммита в БД больше не рвут ответ вебхука (логируются).
## [1.10.0] — 2026-04-03
Команды (teams) по лейблам, как ориентир на Grafana IRM **Team**.
### Добавлено
- **Alembic `006_teams`:** таблицы `teams`, `team_label_rules`, колонка **`irm_alerts.team_id`**.
- **Модуль «Команды»:** CRUD команд, правила лейблов (`priority`), UI список и карточка.
- **Вебхук Grafana:** подстановка `team_id` по первому совпадению правила.
- **Алерты:** в API и UI колонка команды, фильтр `team_id`, `GET /alerts/?team_id=…`.
### Изменено
- Документация [IRM_GRAFANA_PARITY.md](docs/IRM_GRAFANA_PARITY.md), [IRM.md](docs/IRM.md).
## [1.9.0] — 2026-04-03
Алерты отдельно от инцидентов (модель ближе к Grafana IRM).
### Добавлено
- **Alembic `005_irm_alerts`:** таблицы `irm_alerts`, `incident_alert_links`.
- **Модуль «Алерты»:** API и UI, статусы firing → acknowledged → resolved, полный JSON вебхука, кнопка «Создать инцидент».
- **Вебхук Grafana:** в одной транзакции `ingress_events` + `irm_alerts`.
- **`extract_alert_row_from_grafana_body`** — заголовок, severity, labels, fingerprint.
- **Документация:** [docs/IRM_GRAFANA_PARITY.md](docs/IRM_GRAFANA_PARITY.md).
### Изменено
- **Инцидент из вебхука по умолчанию не создаётся**; включение старого поведения: `AUTO_INCIDENT_FROM_ALERT=1`.
- **POST /incidents:** опционально `alert_ids` для привязки к `irm_alerts`.
## [1.8.0] — 2026-04-03 ## [1.8.0] — 2026-04-03
UI каталога Grafana и инцидентов; правки CI/CD деплоя. UI каталога Grafana и инцидентов; правки CI/CD деплоя.

View File

@ -0,0 +1,74 @@
"""IRM: алерты отдельно от инцидентов (ack/resolve), связь N:M инцидент↔алерт
Revision ID: 005_irm_alerts
Revises: 004_grafana_catalog
Create Date: 2026-04-03
"""
from typing import Sequence, Union
from alembic import op
revision: str = "005_irm_alerts"
down_revision: Union[str, None] = "004_grafana_catalog"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.execute(
"""
CREATE TABLE IF NOT EXISTS irm_alerts (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
ingress_event_id uuid NOT NULL UNIQUE REFERENCES ingress_events(id) ON DELETE CASCADE,
status text NOT NULL DEFAULT 'firing'
CHECK (status IN ('firing', 'acknowledged', 'resolved', 'silenced')),
title text NOT NULL DEFAULT '',
severity text NOT NULL DEFAULT 'warning',
source text NOT NULL DEFAULT 'grafana',
grafana_org_slug text,
service_name text,
labels jsonb NOT NULL DEFAULT '{}'::jsonb,
fingerprint text,
acknowledged_at timestamptz,
acknowledged_by text,
resolved_at timestamptz,
resolved_by text,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);
"""
)
op.execute(
"""
CREATE INDEX IF NOT EXISTS irm_alerts_status_created_idx
ON irm_alerts (status, created_at DESC);
"""
)
op.execute(
"""
CREATE INDEX IF NOT EXISTS irm_alerts_ingress_event_id_idx
ON irm_alerts (ingress_event_id);
"""
)
op.execute(
"""
CREATE TABLE IF NOT EXISTS incident_alert_links (
incident_id uuid NOT NULL REFERENCES incidents(id) ON DELETE CASCADE,
alert_id uuid NOT NULL REFERENCES irm_alerts(id) ON DELETE CASCADE,
PRIMARY KEY (incident_id, alert_id)
);
"""
)
op.execute(
"""
CREATE INDEX IF NOT EXISTS incident_alert_links_alert_idx
ON incident_alert_links (alert_id);
"""
)
def downgrade() -> None:
op.execute("DROP TABLE IF EXISTS incident_alert_links;")
op.execute("DROP TABLE IF EXISTS irm_alerts;")

View File

@ -0,0 +1,78 @@
"""IRM: команды (teams) и правила сопоставления по лейблам алерта
Revision ID: 006_teams
Revises: 005_irm_alerts
Create Date: 2026-04-03
"""
from typing import Sequence, Union
from alembic import op
revision: str = "006_teams"
down_revision: Union[str, None] = "005_irm_alerts"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.execute(
"""
CREATE TABLE IF NOT EXISTS teams (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
slug text NOT NULL UNIQUE,
name text NOT NULL,
description text,
created_at timestamptz NOT NULL DEFAULT now()
);
"""
)
op.execute(
"""
CREATE INDEX IF NOT EXISTS teams_slug_idx ON teams (slug);
"""
)
op.execute(
"""
CREATE TABLE IF NOT EXISTS team_label_rules (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
team_id uuid NOT NULL REFERENCES teams(id) ON DELETE CASCADE,
label_key text NOT NULL,
label_value text NOT NULL,
priority integer NOT NULL DEFAULT 0,
created_at timestamptz NOT NULL DEFAULT now(),
CONSTRAINT team_label_rules_key_nonempty CHECK (
length(trim(label_key)) > 0
)
);
"""
)
op.execute(
"""
CREATE INDEX IF NOT EXISTS team_label_rules_team_idx ON team_label_rules (team_id);
"""
)
op.execute(
"""
CREATE INDEX IF NOT EXISTS team_label_rules_priority_idx
ON team_label_rules (priority DESC, id ASC);
"""
)
op.execute(
"""
ALTER TABLE irm_alerts
ADD COLUMN IF NOT EXISTS team_id uuid REFERENCES teams(id) ON DELETE SET NULL;
"""
)
op.execute(
"""
CREATE INDEX IF NOT EXISTS irm_alerts_team_id_idx ON irm_alerts (team_id);
"""
)
def downgrade() -> None:
op.execute("ALTER TABLE irm_alerts DROP COLUMN IF EXISTS team_id;")
op.execute("DROP TABLE IF EXISTS team_label_rules;")
op.execute("DROP TABLE IF EXISTS teams;")

View File

@ -6,9 +6,11 @@
| Область | Назначение | onGuard24 | Grafana / внешнее | | Область | Назначение | onGuard24 | Grafana / внешнее |
|---------|------------|-----------|-------------------| |---------|------------|-----------|-------------------|
| **Инциденты** | Учёт сбоев, статусы (open → resolved), связь с алертом | Модуль `incidents`: таблица `incidents`, API, UI, авто-создание из `alert.received` | Contact point **Webhook**`POST /api/v1/ingress/grafana`; правила алертинга в Grafana | | **Инциденты** | Учёт сбоев, статусы (open → resolved), связь с алертами | Модуль `incidents`: `incidents`, `incident_alert_links`, API, UI; создание вручную или с `alert_ids` | См. **Алерты**; [IRM_GRAFANA_PARITY.md](IRM_GRAFANA_PARITY.md) |
| **Алерты (IRM)** | Приём, Ack/Resolve, не смешивать с инцидентом | Модуль `alerts`: `irm_alerts`, UI/API, вебхук пишет в одной транзакции с `ingress_events` | Grafana IRM Alert Groups; у нас без группировки/эскалации на уровне алерта |
| **Задачи** | Подзадачи по инциденту (разбор, фикс) | Модуль `tasks`: таблица `tasks`, привязка к `incident_id` | Опционально: ссылки из алерта; основная работа в onGuard24 | | **Задачи** | Подзадачи по инциденту (разбор, фикс) | Модуль `tasks`: таблица `tasks`, привязка к `incident_id` | Опционально: ссылки из алерта; основная работа в onGuard24 |
| **Цепочки эскалаций** | Кого звать и в каком порядке при таймаутах | Модуль `escalations`: таблица `escalation_policies` (JSON `steps`), API/UI заготовка | Маршрутизация уведомлений может дублироваться в Grafana contact points; целевая логика — в onGuard24 | | **Цепочки эскалаций** | Кого звать и в каком порядке при таймаутах | Модуль `escalations`: таблица `escalation_policies` (JSON `steps`), API/UI заготовка | Маршрутизация уведомлений может дублироваться в Grafana contact points; целевая логика — в onGuard24 |
| **Команды (teams)** | Фильтр и маршрутизация как в IRM | Модуль `teams`: `teams`, `team_label_rules`, `irm_alerts.team_id`; правила по лейблам при вебхуке | Колонка **Team** в Grafana IRM; у нас сопоставление по `label_key` = `label_value` |
| **Календарь дежурств** | Кто в смене, расписание | Модуль `schedules` (развитие) | Календари/команды — данные в onGuard24; уведомления — через интеграции | | **Календарь дежурств** | Кто в смене, расписание | Модуль `schedules` (развитие) | Календари/команды — данные в onGuard24; уведомления — через интеграции |
| **Контакты** | Люди, каналы | Модуль `contacts` | Получатели в **Contact points** (email, Slack, webhook) | | **Контакты** | Люди, каналы | Модуль `contacts` | Получатели в **Contact points** (email, Slack, webhook) |
| **Светофор / статус сервисов** | Агрегат здоровья | Модуль `statusboard` | Источник метрик — Prometheus/Loki; правила — Grafana | | **Светофор / статус сервисов** | Агрегат здоровья | Модуль `statusboard` | Источник метрик — Prometheus/Loki; правила — Grafana |
@ -17,11 +19,12 @@
| **Пользователи / права** | RBAC | *Пока нет* | SSO Grafana, сеть за reverse proxy | | **Пользователи / права** | RBAC | *Пока нет* | SSO Grafana, сеть за reverse proxy |
| **SLO** | Цели по доступности | *Вне скоупа v1* | Grafana SLO / Mimir | | **SLO** | Цели по доступности | *Вне скоупа v1* | Grafana SLO / Mimir |
## Поток данных (алерт → инцидент) ## Поток данных (как в Grafana IRM)
1. Grafana срабатывает правило → шлёт JSON на **webhook** onGuard24. 1. Grafana срабатывает правило → JSON на **webhook** onGuard24.
2. Сервис пишет строку в `ingress_events`, публикует **`alert.received`**. 2. В одной транзакции: **`ingress_events`** + **`irm_alerts`** (статус `firing`), публикуется **`alert.received`**.
3. Модуль **incidents** подписан на событие и создаёт запись в **`incidents`** с ссылкой на `ingress_event_id`. 3. Дежурный в модуле **Алерты** читает заголовок, лейблы, **Acknowledge** / **Resolve** — это не создание инцидента.
4. **Инцидент** создаётся отдельно (вручную или из карточки алерта), опционально с привязкой **`alert_ids`**. Авто-инцидент из вебхука только при **`AUTO_INCIDENT_FROM_ALERT=1`** (legacy).
## Что настроить в Grafana (обязательно для приёма алертов) ## Что настроить в Grafana (обязательно для приёма алертов)

View File

@ -0,0 +1,40 @@
# Сравнение onGuard24 с Grafana IRM (Alerting / Incident)
Grafana Cloud / IRM даёт **группы алертов**, **Acknowledge / Resolve**, **инциденты**, **команды (teams)**, **эскалационные цепочки**, **расписания дежурств**. Ниже — что уже есть в onGuard24 и что планировать отдельно.
## Уже есть (после разделения алерт / инцидент)
| Grafana IRM (идея) | onGuard24 |
|--------------------|-----------|
| Входящие уведомления от интеграции | Webhook `POST /api/v1/ingress/grafana``ingress_events` + **`irm_alerts`** |
| Статусы firing / acknowledged / resolved | Поле **`irm_alerts.status`**, UI **Алерты**, API `PATCH …/acknowledge`, `…/resolve` |
| Просмотр labels / сырого payload | Карточка алерта в UI, JSON вебхука |
| Инцидент как отдельная сущность | **`incidents`**, создание вручную или кнопка «Создать инцидент» на алерте; связь **`incident_alert_links`** |
| Эскалации (JSON-шаги) | Модуль **Эскалации** (`escalation_policies`) — без автодвижка по таймерам |
| Контакты / каналы | Модуль **Контакты** |
| Расписания (заглушка) | **Календарь дежурств** — UI-задел |
| **Teams** (команда по лейблам) | Таблицы **`teams`**, **`team_label_rules`**, поле **`irm_alerts.team_id`**; вебхук подбирает команду по первому совпадению правила (priority); UI/API **Команды**, фильтр по команде в **Алертах** |
## Пока нет (зрелые следующие этапы)
| Функция Grafana IRM | Заметка |
|---------------------|---------|
| **Teams** как маршрутизация уведомлений (кому слать из коробки) | Команда назначена на алерт; **цепочки уведомлений по team** — впереди (связка с escalations / contact points) |
| **Alert groups** (несколько алертов в одной группе с общим ID) | Сейчас **одна строка `irm_alerts` на один webhook**; группировка fingerprint / rule_uid — отдельная задача |
| **Silence / Restart** из UI | Статус `silenced` в БД зарезервирован, логика не подключена |
| **Эскалация по таймеру** (wait 15m → notify next) | Политики есть, **фонового исполнителя** нет |
| **On-call из расписания** в цепочке | Нет связи schedules → escalation executor |
| **Пользователи / «Mine» / назначение** | Нет учётных записей onGuard24 для дежурного; `acknowledged_by` — свободный текст (можно расширить) |
| **Интеграция обратно в Grafana** (resolve в Grafana из IRM) | Не делалось |
## Переменные окружения
- **`AUTO_INCIDENT_FROM_ALERT`** — если `1` / `true`, сохраняется старое поведение: **каждый** вебхук ещё и создаёт строку в **`incidents`**. По умолчанию **выключено**: только **`irm_alerts`**.
## Рекомендуемый поток
1. Grafana → webhook → **алерт** (`firing`).
2. Дежурный в **Алертах**: прочитал → **Ack** → разобрался → **Resolve** (или сразу Resolve).
3. При необходимости **Создать инцидент** (документирование, задачи, эскалация вручную).
Так модель ближе к IRM, где **алерт** и **инцидент** разведены.

View File

@ -1,3 +1,3 @@
"""onGuard24 — модульный монолит (ядро + модули).""" """onGuard24 — модульный монолит (ядро + модули)."""
__version__ = "1.8.0" __version__ = "1.10.1"

View File

@ -34,6 +34,14 @@ class Settings(BaseSettings):
forgejo_url: str = Field(default="", validation_alias="FORGEJO_URL") forgejo_url: str = Field(default="", validation_alias="FORGEJO_URL")
forgejo_token: str = Field(default="", validation_alias="FORGEJO_TOKEN") forgejo_token: str = Field(default="", validation_alias="FORGEJO_TOKEN")
log_level: str = Field(default="info", validation_alias="LOG_LEVEL") log_level: str = Field(default="info", validation_alias="LOG_LEVEL")
# Путь к лог-файлу. Пусто = не писать в файл. Пример: /var/log/onguard24/app.log
# Файл ротируется: 10 МБ × 5 штук (~50 МБ суммарно).
log_file: str = Field(default="", validation_alias="LOG_FILE")
# Устаревшее: автосоздание инцидента на каждый вебхук (без учёта irm_alerts). По умолчанию выкл.
auto_incident_from_alert: bool = Field(
default=False,
validation_alias="AUTO_INCIDENT_FROM_ALERT",
)
def get_settings() -> Settings: def get_settings() -> Settings:

View File

@ -9,6 +9,9 @@ from starlette.responses import Response
from onguard24.domain.entities import Alert, Severity from onguard24.domain.entities import Alert, Severity
from onguard24.grafana_sources import sources_by_slug, webhook_authorized from onguard24.grafana_sources import sources_by_slug, webhook_authorized
from onguard24.ingress.grafana_payload import extract_alert_row_from_grafana_body
from onguard24.ingress.json_sanitize import sanitize_for_jsonb
from onguard24.ingress.team_match import resolve_team_id_for_labels
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
router = APIRouter(tags=["ingress"]) router = APIRouter(tags=["ingress"])
@ -79,6 +82,42 @@ def service_hint_from_grafana_body(body: dict, header_service: str | None) -> st
return None return None
def _log_incoming_webhook(body: object, raw: bytes, path_slug: str | None) -> None:
"""Логирует каждый входящий вебхук: краткое резюме INFO + полное тело DEBUG."""
slug_tag = f"[/{path_slug}]" if path_slug else "[/]"
raw_len = len(raw)
if not isinstance(body, dict):
logger.info("grafana webhook %s raw=%dB (non-dict body)", slug_tag, raw_len)
logger.debug("grafana webhook %s raw body: %s", slug_tag, raw[:4000].decode(errors="replace"))
return
alerts = body.get("alerts") or []
n_alerts = len(alerts) if isinstance(alerts, list) else 0
status = body.get("status", "?")
title = str(body.get("title") or body.get("ruleName") or "")[:120]
first_labels: dict = {}
if isinstance(alerts, list) and alerts and isinstance(alerts[0], dict):
first_labels = alerts[0].get("labels") or {}
logger.info(
"grafana webhook %s status=%s alerts=%d title=%r labels=%s raw=%dB",
slug_tag,
status,
n_alerts,
title,
json.dumps(first_labels, ensure_ascii=False)[:300],
raw_len,
)
try:
pretty = json.dumps(body, ensure_ascii=False, indent=2)
if len(pretty) > 8000:
pretty = pretty[:8000] + "\n…(обрезано)"
except Exception:
pretty = raw[:8000].decode(errors="replace")
logger.debug("grafana webhook %s full body:\n%s", slug_tag, pretty)
async def _grafana_webhook_impl( async def _grafana_webhook_impl(
request: Request, request: Request,
pool, pool,
@ -95,8 +134,14 @@ async def _grafana_webhook_impl(
body = json.loads(raw.decode() or "{}") body = json.loads(raw.decode() or "{}")
except json.JSONDecodeError: except json.JSONDecodeError:
body = {} body = {}
# Логируем входящий вебхук ДО любой обработки — чтобы видеть при любой ошибке
_log_incoming_webhook(body, raw, path_slug)
if not isinstance(body, dict): if not isinstance(body, dict):
body = {} body = {}
else:
body = sanitize_for_jsonb(body)
derived = extract_grafana_source_key(body) derived = extract_grafana_source_key(body)
path_key: str | None = None path_key: str | None = None
@ -119,19 +164,47 @@ async def _grafana_webhook_impl(
logger.warning("ingress: database not configured, event not persisted") logger.warning("ingress: database not configured, event not persisted")
return Response(status_code=202) return Response(status_code=202)
title_row, sev_row, labels_row, fp_row = extract_alert_row_from_grafana_body(body)
async with pool.acquire() as conn: async with pool.acquire() as conn:
row = await conn.fetchrow( async with conn.transaction():
""" row = await conn.fetchrow(
INSERT INTO ingress_events (source, body, org_slug, service_name) """
VALUES ($1, $2::jsonb, $3, $4) INSERT INTO ingress_events (source, body, org_slug, service_name)
RETURNING id VALUES ($1, $2::jsonb, $3, $4)
""", RETURNING id
"grafana", """,
json.dumps(body), "grafana",
stored_org_slug, json.dumps(body, ensure_ascii=False, allow_nan=False),
service_name, stored_org_slug,
) service_name,
raw_id = row["id"] if row else None )
raw_id = row["id"] if row else None
if raw_id is not None:
team_id = await resolve_team_id_for_labels(conn, labels_row)
await conn.execute(
"""
INSERT INTO irm_alerts (
ingress_event_id, status, title, severity, source,
grafana_org_slug, service_name, labels, fingerprint, team_id
)
VALUES ($1, 'firing', $2, $3, 'grafana', $4, $5, $6::jsonb, $7, $8)
""",
raw_id,
title_row or "",
sev_row,
stored_org_slug,
service_name,
json.dumps(labels_row, ensure_ascii=False, allow_nan=False),
fp_row,
team_id,
)
logger.info(
"grafana webhook saved: alert_id=%s title=%r sev=%s team=%s",
raw_id,
(title_row or "")[:80],
sev_row,
str(team_id) if team_id else "",
)
bus = getattr(request.app.state, "event_bus", None) bus = getattr(request.app.state, "event_bus", None)
if bus and raw_id is not None: if bus and raw_id is not None:
title = str(body.get("title") or body.get("ruleName") or "")[:500] title = str(body.get("title") or body.get("ruleName") or "")[:500]
@ -142,12 +215,17 @@ async def _grafana_webhook_impl(
payload=body, payload=body,
received_at=datetime.now(timezone.utc), received_at=datetime.now(timezone.utc),
) )
await bus.publish_alert_received( try:
alert, await bus.publish_alert_received(
raw_payload_ref=raw_id, alert,
grafana_org_slug=stored_org_slug, raw_payload_ref=raw_id,
service_name=service_name, grafana_org_slug=stored_org_slug,
) service_name=service_name,
)
except Exception:
logger.exception(
"ingress: событие alert.received не доставлено подписчикам (БД уже сохранена)"
)
return Response(status_code=202) return Response(status_code=202)

View File

@ -0,0 +1,53 @@
"""Извлечение полей для учёта алерта из тела вебхука Grafana (Unified Alerting)."""
from __future__ import annotations
import json
from typing import Any
def extract_alert_row_from_grafana_body(body: dict[str, Any]) -> tuple[str, str, dict[str, Any], str | None]:
"""
Возвращает: title, severity (info|warning|critical), labels (dict), fingerprint.
"""
title = str(body.get("title") or body.get("ruleName") or "").strip()[:500]
alerts = body.get("alerts")
labels: dict[str, Any] = {}
fingerprint: str | None = None
sev = "warning"
if isinstance(alerts, list) and alerts and isinstance(alerts[0], dict):
a0 = alerts[0]
fp = a0.get("fingerprint")
if fp is not None:
fingerprint = str(fp)[:500]
if isinstance(a0.get("labels"), dict):
labels.update(a0["labels"])
ann = a0.get("annotations")
if isinstance(ann, dict) and not title:
title = str(ann.get("summary") or ann.get("description") or "").strip()[:500]
cl = body.get("commonLabels")
if isinstance(cl, dict):
for k, v in cl.items():
labels.setdefault(k, v)
if not title and isinstance(alerts, list) and alerts and isinstance(alerts[0], dict):
title = str(alerts[0].get("labels", {}).get("alertname") or "").strip()[:500]
raw_s = None
if isinstance(labels.get("severity"), str):
raw_s = labels["severity"].lower()
elif isinstance(labels.get("priority"), str):
raw_s = labels["priority"].lower()
if raw_s in ("critical", "error", "fatal"):
sev = "critical"
elif raw_s in ("warning", "warn"):
sev = "warning"
elif raw_s in ("info", "informational", "none"):
sev = "info"
# JSONB: только JSON-совместимые значения
clean_labels = {str(k): v for k, v in labels.items() if isinstance(v, (str, int, float, bool, type(None)))}
return title, sev, clean_labels, fingerprint

View File

@ -0,0 +1,26 @@
"""Подготовка структур из JSON к записи в PostgreSQL jsonb."""
from __future__ import annotations
import math
from typing import Any
def sanitize_for_jsonb(obj: Any) -> Any:
"""
- float NaN / ±Inf → None (иначе json.dumps даёт невалидный JSON для PG / сюрпризы при записи).
- Символ NUL в строках убрать (PostgreSQL text/jsonb NUL в строке не принимает).
"""
if isinstance(obj, float):
if math.isnan(obj) or math.isinf(obj):
return None
return obj
if isinstance(obj, str):
if "\x00" not in obj:
return obj
return obj.replace("\x00", "")
if isinstance(obj, dict):
return {k: sanitize_for_jsonb(v) for k, v in obj.items()}
if isinstance(obj, list):
return [sanitize_for_jsonb(x) for x in obj]
return obj

View File

@ -0,0 +1,51 @@
"""Сопоставление входящего алерта с командой по правилам лейблов (как Team в Grafana IRM)."""
from __future__ import annotations
from typing import Any, Sequence
from uuid import UUID
import asyncpg
def match_team_for_labels(
labels: dict[str, Any],
rules: Sequence[asyncpg.Record | tuple[UUID, str, str]],
) -> UUID | None:
"""
rules — упорядочены по приоритету (выше priority — раньше проверка).
Первое совпадение label_key == label_value возвращает team_id.
"""
if not labels or not rules:
return None
flat: dict[str, str] = {
str(k): "" if v is None else str(v) for k, v in labels.items()
}
for row in rules:
if isinstance(row, tuple):
tid, key, val = row[0], row[1], row[2]
else:
tid = row["team_id"]
key = row["label_key"]
val = row["label_value"]
if flat.get(str(key)) == str(val):
return tid if isinstance(tid, UUID) else UUID(str(tid))
return None
async def fetch_team_rules(conn: asyncpg.Connection) -> list[asyncpg.Record]:
return await conn.fetch(
"""
SELECT team_id, label_key, label_value
FROM team_label_rules
ORDER BY priority DESC, id ASC
"""
)
async def resolve_team_id_for_labels(
conn: asyncpg.Connection,
labels: dict[str, Any],
) -> UUID | None:
rules = await fetch_team_rules(conn)
return match_team_for_labels(labels, list(rules))

119
onguard24/log_buffer.py Normal file
View File

@ -0,0 +1,119 @@
"""Кольцевой буфер логов + fan-out в SSE-очереди подписчиков.
Подключается через RingBufferHandler в main.py (install_log_handler()).
Потокобезопасен: emit() вызывается в любом потоке, asyncio-очереди
обновляются через loop.call_soon_threadsafe.
"""
from __future__ import annotations
import asyncio
import collections
import logging
import threading
from datetime import datetime, timezone
from typing import Any
MAX_HISTORY = 600
_lock = threading.Lock()
_ring: collections.deque[dict[str, Any]] = collections.deque(maxlen=MAX_HISTORY)
_subscribers: list[asyncio.Queue[dict[str, Any]]] = []
_loop: asyncio.AbstractEventLoop | None = None
def set_event_loop(loop: asyncio.AbstractEventLoop) -> None:
global _loop
_loop = loop
def get_history() -> list[dict[str, Any]]:
with _lock:
return list(_ring)
def subscribe() -> asyncio.Queue[dict[str, Any]]:
q: asyncio.Queue[dict[str, Any]] = asyncio.Queue(maxsize=300)
with _lock:
_subscribers.append(q)
return q
def unsubscribe(q: asyncio.Queue[dict[str, Any]]) -> None:
with _lock:
try:
_subscribers.remove(q)
except ValueError:
pass
def _push_to_subscriber(q: asyncio.Queue[dict[str, Any]], entry: dict[str, Any]) -> None:
try:
q.put_nowait(entry)
except asyncio.QueueFull:
pass
class RingBufferHandler(logging.Handler):
"""Logging handler — пишет в кольцевой буфер и раздаёт SSE-подписчикам."""
def emit(self, record: logging.LogRecord) -> None:
try:
msg = self.format(record)
ts = datetime.fromtimestamp(record.created, tz=timezone.utc).strftime(
"%Y-%m-%d %H:%M:%S"
)
entry: dict[str, Any] = {
"ts": ts,
"level": record.levelname,
"name": record.name,
"msg": msg,
}
with _lock:
_ring.append(entry)
subs = list(_subscribers)
if subs and _loop is not None and _loop.is_running():
for q in subs:
_loop.call_soon_threadsafe(_push_to_subscriber, q, entry)
except Exception:
self.handleError(record)
def install_log_handler(
loop: asyncio.AbstractEventLoop,
log_file: str = "",
) -> None:
"""Вызывается один раз при старте: регистрирует handler на корневом логгере."""
set_event_loop(loop)
fmt = logging.Formatter(
"%(asctime)s %(levelname)-8s %(name)s %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
)
root = logging.getLogger()
# Кольцевой буфер (SSE-страница логов)
if not any(isinstance(h, RingBufferHandler) for h in root.handlers):
ring_h = RingBufferHandler()
ring_h.setFormatter(logging.Formatter("%(name)s %(message)s"))
ring_h.setLevel(logging.DEBUG)
root.addHandler(ring_h)
# Файл с ротацией (если задан LOG_FILE)
if log_file.strip():
import os
from logging.handlers import RotatingFileHandler
log_path = log_file.strip()
os.makedirs(os.path.dirname(log_path) if os.path.dirname(log_path) else ".", exist_ok=True)
if not any(isinstance(h, RotatingFileHandler) for h in root.handlers):
file_h = RotatingFileHandler(
log_path,
maxBytes=10 * 1024 * 1024, # 10 МБ
backupCount=5,
encoding="utf-8",
)
file_h.setFormatter(fmt)
file_h.setLevel(logging.DEBUG)
root.addHandler(file_h)
logging.getLogger("onguard24").info(
"file logging enabled: %s (rotate 10MB×5)", log_path
)

View File

@ -1,3 +1,4 @@
import asyncio
import logging import logging
from contextlib import asynccontextmanager from contextlib import asynccontextmanager
@ -5,17 +6,27 @@ from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware from fastapi.middleware.cors import CORSMiddleware
from starlette.responses import HTMLResponse, Response from starlette.responses import HTMLResponse, Response
from onguard24 import __version__ as app_version
from onguard24.config import get_settings from onguard24.config import get_settings
from onguard24.db import create_pool from onguard24.db import create_pool
from onguard24.domain.events import InMemoryEventBus from onguard24.domain.events import InMemoryEventBus
from onguard24.ingress import grafana as grafana_ingress from onguard24.ingress import grafana as grafana_ingress
from onguard24.log_buffer import install_log_handler
from onguard24.modules.registry import MODULE_MOUNTS, register_module_events from onguard24.modules.registry import MODULE_MOUNTS, register_module_events
from onguard24.root_html import render_root_page from onguard24.root_html import render_root_page
from onguard24.status_snapshot import build as build_status from onguard24.status_snapshot import build as build_status
from onguard24 import __version__ as app_version from onguard24.ui_logs import router as logs_router
logging.basicConfig(level=logging.INFO) logging.basicConfig(level=logging.DEBUG)
logging.getLogger("httpx").setLevel(logging.WARNING) # Приглушаем шумные низкоуровневые библиотеки
for _noisy in (
"httpx",
"httpcore",
"asyncio",
"uvicorn.access",
"uvicorn.error",
):
logging.getLogger(_noisy).setLevel(logging.WARNING)
log = logging.getLogger("onguard24") log = logging.getLogger("onguard24")
@ -32,13 +43,14 @@ def parse_addr(http_addr: str) -> tuple[str, int]:
@asynccontextmanager @asynccontextmanager
async def lifespan(app: FastAPI): async def lifespan(app: FastAPI):
settings = get_settings() settings = get_settings()
install_log_handler(asyncio.get_event_loop(), log_file=settings.log_file)
pool = await create_pool(settings) pool = await create_pool(settings)
bus = InMemoryEventBus() bus = InMemoryEventBus()
register_module_events(bus, pool) register_module_events(bus, pool)
app.state.pool = pool app.state.pool = pool
app.state.settings = settings app.state.settings = settings
app.state.event_bus = bus app.state.event_bus = bus
log.info("onGuard24 started, db=%s", "ok" if pool else "disabled") log.info("onGuard24 started v%s, db=%s", app_version, "ok" if pool else "disabled")
yield yield
if pool: if pool:
await pool.close() await pool.close()
@ -82,6 +94,7 @@ def create_app() -> FastAPI:
return await build_status(request) return await build_status(request)
app.include_router(grafana_ingress.router, prefix="/api/v1") app.include_router(grafana_ingress.router, prefix="/api/v1")
app.include_router(logs_router)
for mount in MODULE_MOUNTS: for mount in MODULE_MOUNTS:
app.include_router(mount.router, prefix=mount.url_prefix) app.include_router(mount.router, prefix=mount.url_prefix)
if mount.ui_router is not None: if mount.ui_router is not None:

457
onguard24/modules/alerts.py Normal file
View File

@ -0,0 +1,457 @@
"""Учёт входящих алертов (отдельно от инцидентов): firing → acknowledged → resolved."""
from __future__ import annotations
import html
import json
import logging
from uuid import UUID
import asyncpg
from fastapi import APIRouter, Depends, HTTPException, Request
from fastapi.responses import HTMLResponse
from pydantic import BaseModel, Field
from onguard24.deps import get_pool
from onguard24.domain.events import EventBus
from onguard24.modules.ui_support import wrap_module_html_page
log = logging.getLogger(__name__)
router = APIRouter(tags=["module-alerts"])
ui_router = APIRouter(tags=["web-alerts"], include_in_schema=False)
_VALID_STATUS = frozenset({"firing", "acknowledged", "resolved", "silenced"})
def register_events(_bus: EventBus, _pool: asyncpg.Pool | None = None) -> None:
pass
class AckBody(BaseModel):
by_user: str | None = Field(default=None, max_length=200, description="Кто подтвердил")
class ResolveBody(BaseModel):
by_user: str | None = Field(default=None, max_length=200)
def _row_to_item(r: asyncpg.Record) -> dict:
tid = r.get("team_id")
out = {
"id": str(r["id"]),
"ingress_event_id": str(r["ingress_event_id"]),
"status": r["status"],
"title": r["title"],
"severity": r["severity"],
"source": r["source"],
"grafana_org_slug": r["grafana_org_slug"],
"service_name": r["service_name"],
"labels": r["labels"] if isinstance(r["labels"], dict) else {},
"fingerprint": r["fingerprint"],
"team_id": str(tid) if tid is not None else None,
"team_slug": r.get("team_slug"),
"team_name": r.get("team_name"),
"acknowledged_at": r["acknowledged_at"].isoformat() if r["acknowledged_at"] else None,
"acknowledged_by": r["acknowledged_by"],
"resolved_at": r["resolved_at"].isoformat() if r["resolved_at"] else None,
"resolved_by": r["resolved_by"],
"created_at": r["created_at"].isoformat() if r["created_at"] else None,
"updated_at": r["updated_at"].isoformat() if r["updated_at"] else None,
}
return out
_ALERT_SELECT = """
SELECT a.*, t.slug AS team_slug, t.name AS team_name
FROM irm_alerts a
LEFT JOIN teams t ON t.id = a.team_id
"""
async def _fetch_alert_with_team(conn: asyncpg.Connection, alert_id: UUID) -> asyncpg.Record | None:
return await conn.fetchrow(
f"{_ALERT_SELECT} WHERE a.id = $1::uuid",
alert_id,
)
@router.get("/")
async def list_alerts_api(
pool: asyncpg.Pool | None = Depends(get_pool),
status: str | None = None,
team_id: UUID | None = None,
limit: int = 100,
):
if pool is None:
return {"items": [], "database": "disabled"}
limit = min(max(limit, 1), 200)
st = (status or "").strip().lower()
if st and st not in _VALID_STATUS:
raise HTTPException(status_code=400, detail="invalid status filter")
async with pool.acquire() as conn:
if st and team_id is not None:
rows = await conn.fetch(
f"""
{_ALERT_SELECT}
WHERE a.status = $1 AND a.team_id = $2::uuid
ORDER BY a.created_at DESC LIMIT $3
""",
st,
team_id,
limit,
)
elif st:
rows = await conn.fetch(
f"""
{_ALERT_SELECT}
WHERE a.status = $1
ORDER BY a.created_at DESC LIMIT $2
""",
st,
limit,
)
elif team_id is not None:
rows = await conn.fetch(
f"""
{_ALERT_SELECT}
WHERE a.team_id = $1::uuid
ORDER BY a.created_at DESC LIMIT $2
""",
team_id,
limit,
)
else:
rows = await conn.fetch(
f"""
{_ALERT_SELECT}
ORDER BY a.created_at DESC LIMIT $1
""",
limit,
)
return {"items": [_row_to_item(r) for r in rows]}
@router.get("/{alert_id}")
async def get_alert_api(alert_id: UUID, pool: asyncpg.Pool | None = Depends(get_pool)):
if pool is None:
raise HTTPException(status_code=503, detail="database disabled")
async with pool.acquire() as conn:
row = await _fetch_alert_with_team(conn, alert_id)
raw = None
if row and row.get("ingress_event_id"):
raw = await conn.fetchrow(
"SELECT id, body, received_at FROM ingress_events WHERE id = $1::uuid",
row["ingress_event_id"],
)
if not row:
raise HTTPException(status_code=404, detail="not found")
out = _row_to_item(row)
if raw:
out["raw_received_at"] = raw["received_at"].isoformat() if raw["received_at"] else None
body = raw["body"]
out["raw_body"] = dict(body) if hasattr(body, "keys") else body
else:
out["raw_received_at"] = None
out["raw_body"] = None
return out
@router.patch("/{alert_id}/acknowledge", status_code=200)
async def acknowledge_alert_api(
alert_id: UUID,
body: AckBody,
pool: asyncpg.Pool | None = Depends(get_pool),
):
if pool is None:
raise HTTPException(status_code=503, detail="database disabled")
who = (body.by_user or "").strip() or None
async with pool.acquire() as conn:
uid = await conn.fetchval(
"""
UPDATE irm_alerts SET
status = 'acknowledged',
acknowledged_at = now(),
acknowledged_by = COALESCE($2, acknowledged_by),
updated_at = now()
WHERE id = $1::uuid AND status = 'firing'
RETURNING id
""",
alert_id,
who,
)
if not uid:
raise HTTPException(status_code=409, detail="alert not in firing state or not found")
row = await _fetch_alert_with_team(conn, alert_id)
assert row is not None
return _row_to_item(row)
@router.patch("/{alert_id}/resolve", status_code=200)
async def resolve_alert_api(
alert_id: UUID,
body: ResolveBody,
pool: asyncpg.Pool | None = Depends(get_pool),
):
if pool is None:
raise HTTPException(status_code=503, detail="database disabled")
who = (body.by_user or "").strip() or None
async with pool.acquire() as conn:
uid = await conn.fetchval(
"""
UPDATE irm_alerts SET
status = 'resolved',
resolved_at = now(),
resolved_by = COALESCE($2, resolved_by),
updated_at = now()
WHERE id = $1::uuid AND status IN ('firing', 'acknowledged')
RETURNING id
""",
alert_id,
who,
)
if not uid:
raise HTTPException(
status_code=409,
detail="alert cannot be resolved from current state or not found",
)
row = await _fetch_alert_with_team(conn, alert_id)
assert row is not None
return _row_to_item(row)
_SYNC_BTN_STYLE = """
<script>
function ogAck(aid){fetch('/api/v1/modules/alerts/'+aid+'/acknowledge',{method:'PATCH',headers:{'Content-Type':'application/json'},body:JSON.stringify({})}).then(r=>{if(r.ok)location.reload();else r.text().then(t=>alert('Ошибка '+r.status+': '+t.slice(0,200)));});}
function ogRes(aid){fetch('/api/v1/modules/alerts/'+aid+'/resolve',{method:'PATCH',headers:{'Content-Type':'application/json'},body:JSON.stringify({})}).then(r=>{if(r.ok)location.reload();else r.text().then(t=>alert('Ошибка '+r.status+': '+t.slice(0,200)));});}
function ogInc(aid,title){var t=prompt('Заголовок инцидента',title||'');if(t===null)return;fetch('/api/v1/modules/incidents/',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({title:t,alert_ids:[aid]})}).then(r=>{if(r.ok)r.json().then(j=>location.href='/ui/modules/incidents/'+j.id);else r.text().then(x=>alert('Ошибка '+r.status+': '+x.slice(0,200)));});}
</script>
"""
@ui_router.get("/", response_class=HTMLResponse)
async def alerts_ui_list(request: Request):
pool = get_pool(request)
body = ""
filter_tid: UUID | None = None
raw_team = (request.query_params.get("team_id") or "").strip()
if raw_team:
try:
filter_tid = UUID(raw_team)
except ValueError:
filter_tid = None
if pool is None:
body = "<p>База не настроена.</p>"
else:
try:
async with pool.acquire() as conn:
teams_opts = await conn.fetch(
"SELECT id, slug, name FROM teams ORDER BY name"
)
if filter_tid is not None:
rows = await conn.fetch(
"""
SELECT a.id, a.status, a.title, a.severity, a.grafana_org_slug,
a.service_name, a.created_at, a.fingerprint,
t.slug AS team_slug, t.name AS team_name, a.team_id
FROM irm_alerts a
LEFT JOIN teams t ON t.id = a.team_id
WHERE a.team_id = $1::uuid
ORDER BY a.created_at DESC
LIMIT 150
""",
filter_tid,
)
else:
rows = await conn.fetch(
"""
SELECT a.id, a.status, a.title, a.severity, a.grafana_org_slug,
a.service_name, a.created_at, a.fingerprint,
t.slug AS team_slug, t.name AS team_name, a.team_id
FROM irm_alerts a
LEFT JOIN teams t ON t.id = a.team_id
ORDER BY a.created_at DESC
LIMIT 150
"""
)
if not rows:
body = "<p>Пока нет алертов. События появляются после вебхука Grafana.</p>"
else:
trs = []
for r in rows:
aid = str(r["id"])
ts = r.get("team_slug")
tn = r.get("team_name")
tid = r.get("team_id")
team_cell = ""
if tid and ts:
team_cell = (
f"<a href=\"/ui/modules/teams/{html.escape(str(tid), quote=True)}\">"
f"{html.escape(ts)}</a>"
)
elif tn:
team_cell = html.escape(str(tn))
trs.append(
"<tr>"
f"<td>{html.escape(r['status'])}</td>"
f"<td><a href=\"/ui/modules/alerts/{html.escape(aid, quote=True)}\">"
f"{html.escape(aid[:8])}…</a></td>"
f"<td>{html.escape((r['title'] or '')[:200])}</td>"
f"<td>{html.escape(r['severity'])}</td>"
f"<td>{team_cell}</td>"
f"<td>{html.escape(str(r['grafana_org_slug'] or ''))}</td>"
f"<td>{html.escape(str(r['service_name'] or ''))}</td>"
f"<td>{html.escape(r['created_at'].isoformat() if r['created_at'] else '')}</td>"
"</tr>"
)
opts = ["<option value=''>Все команды</option>"]
for t in teams_opts:
tid = str(t["id"])
sel = " selected" if filter_tid and str(filter_tid) == tid else ""
opts.append(
f"<option value='{html.escape(tid, quote=True)}'{sel}>"
f"{html.escape(t['slug'])}{html.escape(t['name'])}</option>"
)
filter_form = (
"<form method='get' class='og-filter-bar' style='margin-bottom:1rem'>"
"<label>Команда <select name='team_id' onchange='this.form.submit()'>"
+ "".join(opts)
+ "</select></label></form>"
)
body = (
"<p class='gc-muted'>Алерт — запись о входящем уведомлении. "
"<strong>Инцидент</strong> создаётся вручную (из карточки алерта или раздела «Инциденты») "
"и может ссылаться на один или несколько алертов. Команда назначается по "
"<a href=\"/ui/modules/teams/\">правилам лейблов</a>.</p>"
+ filter_form
+ "<table class='irm-table'><thead><tr><th>Статус</th><th>ID</th><th>Заголовок</th>"
"<th>Важность</th><th>Команда</th><th>Grafana slug</th><th>Сервис</th><th>Создан</th></tr></thead><tbody>"
+ "".join(trs)
+ "</tbody></table>"
)
except Exception as e:
body = f"<p class='module-err'>{html.escape(str(e))}</p>"
page = f"<h1>Алерты</h1>{body}{_SYNC_BTN_STYLE}"
return HTMLResponse(
wrap_module_html_page(
document_title="Алерты — onGuard24",
current_slug="alerts",
main_inner_html=page,
)
)
@ui_router.get("/{alert_id:uuid}", response_class=HTMLResponse)
async def alerts_ui_detail(request: Request, alert_id: UUID):
pool = get_pool(request)
if pool is None:
return HTMLResponse(
wrap_module_html_page(
document_title="Алерт — onGuard24",
current_slug="alerts",
main_inner_html="<h1>Алерт</h1><p>База не настроена.</p>",
)
)
try:
async with pool.acquire() as conn:
row = await _fetch_alert_with_team(conn, alert_id)
raw = None
if row and row.get("ingress_event_id"):
raw = await conn.fetchrow(
"SELECT body, received_at FROM ingress_events WHERE id = $1::uuid",
row["ingress_event_id"],
)
except Exception as e:
return HTMLResponse(
wrap_module_html_page(
document_title="Алерт — onGuard24",
current_slug="alerts",
main_inner_html=f"<h1>Алерт</h1><p class='module-err'>{html.escape(str(e))}</p>",
)
)
if not row:
inner = "<p>Не найдено.</p>"
else:
aid = str(row["id"])
st = row["status"]
title_js = json.dumps(row["title"] or "")
btns = []
if st == "firing":
btns.append(
f"<button type='button' class='og-btn og-btn-primary' "
f"onclick=\"ogAck('{html.escape(aid, quote=True)}')\">Подтвердить (Ack)</button>"
)
if st in ("firing", "acknowledged"):
btns.append(
f"<button type='button' class='og-btn' "
f"onclick=\"ogRes('{html.escape(aid, quote=True)}')\">Resolve</button>"
)
btns.append(
f"<button type='button' class='og-btn' "
f"onclick=\"ogInc('{html.escape(aid, quote=True)}',{title_js})\">"
"Создать инцидент</button>"
)
lab = row["labels"]
lab_s = json.dumps(dict(lab), ensure_ascii=False, indent=2) if isinstance(lab, dict) else "{}"
raw_pre = ""
if raw:
b = raw["body"]
pretty = json.dumps(dict(b), ensure_ascii=False, indent=2) if hasattr(b, "keys") else str(b)
if len(pretty) > 14000:
pretty = pretty[:14000] + "\n"
raw_pre = (
"<h2 style='font-size:1.05rem;margin-top:1rem'>Полное тело вебхука</h2>"
f"<pre style='overflow:auto;max-height:26rem;font-size:0.78rem;"
f"background:#18181b;color:#e4e4e7;padding:0.75rem;border-radius:8px'>"
f"{html.escape(pretty)}</pre>"
)
team_dd = ""
if row.get("team_id") and row.get("team_slug"):
team_dd = (
f"<dt>Команда</dt><dd><a href=\"/ui/modules/teams/{html.escape(str(row['team_id']), quote=True)}\">"
f"{html.escape(row['team_slug'])}</a> ({html.escape(row.get('team_name') or '')})</dd>"
)
elif row.get("team_id"):
team_dd = f"<dt>Команда</dt><dd><code>{html.escape(str(row['team_id']))}</code></dd>"
inner = (
f"<p><a href=\"/ui/modules/alerts/\">← К списку алертов</a></p>"
f"<h1>Алерт</h1><div class='og-sync-bar'>{''.join(btns)}</div>"
f"<dl style='display:grid;grid-template-columns:11rem 1fr;gap:0.35rem 1rem;font-size:0.9rem'>"
f"<dt>ID</dt><dd><code>{html.escape(aid)}</code></dd>"
f"<dt>Статус</dt><dd>{html.escape(st)}</dd>"
f"<dt>Заголовок</dt><dd>{html.escape(row['title'] or '')}</dd>"
f"<dt>Важность</dt><dd>{html.escape(row['severity'])}</dd>"
f"<dt>Grafana slug</dt><dd>{html.escape(str(row['grafana_org_slug'] or ''))}</dd>"
f"<dt>Сервис</dt><dd>{html.escape(str(row['service_name'] or ''))}</dd>"
+ team_dd
+ f"<dt>Fingerprint</dt><dd><code>{html.escape(str(row['fingerprint'] or ''))}</code></dd>"
f"<dt>Labels</dt><dd><pre style='margin:0;font-size:0.8rem'>{html.escape(lab_s)}</pre></dd>"
f"</dl>{raw_pre}"
)
page = f"{inner}{_SYNC_BTN_STYLE}"
return HTMLResponse(
wrap_module_html_page(
document_title="Алерт — onGuard24",
current_slug="alerts",
main_inner_html=page,
)
)
async def render_home_fragment(request: Request) -> str:
pool = get_pool(request)
if pool is None:
return '<p class="module-note">Нужна БД для учёта алертов.</p>'
try:
async with pool.acquire() as conn:
n = await conn.fetchval("SELECT count(*)::int FROM irm_alerts")
nf = await conn.fetchval(
"SELECT count(*)::int FROM irm_alerts WHERE status = 'firing'"
)
except Exception:
return '<p class="module-note">Таблица алертов недоступна (миграция 005?).</p>'
return (
f'<div class="module-fragment"><p>Алертов в учёте: <strong>{int(n)}</strong> '
f'(<strong>{int(nf)}</strong> firing). '
f'<a href="/ui/modules/alerts/">Открыть</a></p></div>'
)

View File

@ -12,6 +12,7 @@ from fastapi import APIRouter, Depends, HTTPException, Request
from fastapi.responses import HTMLResponse from fastapi.responses import HTMLResponse
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from onguard24.config import get_settings
from onguard24.deps import get_pool from onguard24.deps import get_pool
from onguard24.domain.events import AlertReceived, DomainEvent, EventBus from onguard24.domain.events import AlertReceived, DomainEvent, EventBus
from onguard24.modules.ui_support import wrap_module_html_page from onguard24.modules.ui_support import wrap_module_html_page
@ -26,6 +27,7 @@ class IncidentCreate(BaseModel):
title: str = Field(..., min_length=1, max_length=500) title: str = Field(..., min_length=1, max_length=500)
status: str = Field(default="open", max_length=64) status: str = Field(default="open", max_length=64)
severity: str = Field(default="warning", max_length=32) severity: str = Field(default="warning", max_length=32)
alert_ids: list[UUID] = Field(default_factory=list, description="Привязка к irm_alerts")
class IncidentPatch(BaseModel): class IncidentPatch(BaseModel):
@ -39,6 +41,8 @@ def register_events(bus: EventBus, pool: asyncpg.Pool | None = None) -> None:
return return
async def on_alert(ev: DomainEvent) -> None: async def on_alert(ev: DomainEvent) -> None:
if not get_settings().auto_incident_from_alert:
return
if not isinstance(ev, AlertReceived) or ev.raw_payload_ref is None: if not isinstance(ev, AlertReceived) or ev.raw_payload_ref is None:
return return
a = ev.alert a = ev.alert
@ -136,17 +140,29 @@ async def create_incident_api(
if pool is None: if pool is None:
raise HTTPException(status_code=503, detail="database disabled") raise HTTPException(status_code=503, detail="database disabled")
async with pool.acquire() as conn: async with pool.acquire() as conn:
row = await conn.fetchrow( async with conn.transaction():
""" row = await conn.fetchrow(
INSERT INTO incidents (title, status, severity, source, grafana_org_slug, service_name) """
VALUES ($1, $2, $3, 'manual', NULL, NULL) INSERT INTO incidents (title, status, severity, source, grafana_org_slug, service_name)
RETURNING id, title, status, severity, source, ingress_event_id, created_at, updated_at, VALUES ($1, $2, $3, 'manual', NULL, NULL)
grafana_org_slug, service_name RETURNING id, title, status, severity, source, ingress_event_id, created_at, updated_at,
""", grafana_org_slug, service_name
body.title.strip(), """,
body.status, body.title.strip(),
body.severity, body.status,
) body.severity,
)
iid = row["id"]
for aid in body.alert_ids[:50]:
await conn.execute(
"""
INSERT INTO incident_alert_links (incident_id, alert_id)
VALUES ($1::uuid, $2::uuid)
ON CONFLICT DO NOTHING
""",
iid,
aid,
)
return { return {
"id": str(row["id"]), "id": str(row["id"]),
"title": row["title"], "title": row["title"],
@ -312,7 +328,7 @@ async def incidents_ui_home(request: Request):
<thead><tr><th>ID</th><th>Заголовок</th><th>Статус</th><th>Важность</th><th>Источник</th><th>Grafana slug</th><th>Сервис</th><th>Создан</th></tr></thead> <thead><tr><th>ID</th><th>Заголовок</th><th>Статус</th><th>Важность</th><th>Источник</th><th>Grafana slug</th><th>Сервис</th><th>Создан</th></tr></thead>
<tbody>{rows_html or '<tr><td colspan="8">Пока нет записей</td></tr>'}</tbody> <tbody>{rows_html or '<tr><td colspan="8">Пока нет записей</td></tr>'}</tbody>
</table> </table>
<p><small>Создание из Grafana: webhook → <code>ingress_events</code> → событие → строка здесь. Пустой заголовок бывает при тестовом JSON без полей алерта.</small></p>""" <p><small>Сначала вебхук создаёт <a href="/ui/modules/alerts/">алерт</a> (учёт, Ack/Resolve). Инцидент — отдельная сущность: создаётся вручную или из карточки алерта, к нему можно привязать один или несколько алертов. Пустой заголовок в списке — часто тестовый JSON без полей правила.</small></p>"""
return HTMLResponse( return HTMLResponse(
wrap_module_html_page( wrap_module_html_page(
document_title="Инциденты — onGuard24", document_title="Инциденты — onGuard24",

View File

@ -14,6 +14,7 @@ from starlette.requests import Request
from onguard24.domain.events import EventBus from onguard24.domain.events import EventBus
from onguard24.modules import ( from onguard24.modules import (
alerts,
contacts, contacts,
escalations, escalations,
grafana_catalog, grafana_catalog,
@ -21,6 +22,7 @@ from onguard24.modules import (
schedules, schedules,
statusboard, statusboard,
tasks, tasks,
teams,
) )
# async (Request) -> str — фрагмент HTML для главной страницы (опционально) # async (Request) -> str — фрагмент HTML для главной страницы (опционально)
@ -52,6 +54,24 @@ def _mounts() -> list[ModuleMount]:
ui_router=grafana_catalog.ui_router, ui_router=grafana_catalog.ui_router,
render_home_fragment=grafana_catalog.render_home_fragment, render_home_fragment=grafana_catalog.render_home_fragment,
), ),
ModuleMount(
router=alerts.router,
url_prefix="/api/v1/modules/alerts",
register_events=alerts.register_events,
slug="alerts",
title="Алерты",
ui_router=alerts.ui_router,
render_home_fragment=alerts.render_home_fragment,
),
ModuleMount(
router=teams.router,
url_prefix="/api/v1/modules/teams",
register_events=teams.register_events,
slug="teams",
title="Команды",
ui_router=teams.ui_router,
render_home_fragment=teams.render_home_fragment,
),
ModuleMount( ModuleMount(
router=incidents.router, router=incidents.router,
url_prefix="/api/v1/modules/incidents", url_prefix="/api/v1/modules/incidents",

378
onguard24/modules/teams.py Normal file
View File

@ -0,0 +1,378 @@
"""IRM: команды (teams) — как Team в Grafana IRM; правила сопоставления по лейблам алерта."""
from __future__ import annotations
import html
import re
from uuid import UUID
import asyncpg
from fastapi import APIRouter, Depends, HTTPException, Request
from fastapi.responses import HTMLResponse
from pydantic import BaseModel, Field
from onguard24.deps import get_pool
from onguard24.domain.events import EventBus
from onguard24.modules.ui_support import wrap_module_html_page
router = APIRouter(tags=["module-teams"])
ui_router = APIRouter(tags=["web-teams"], include_in_schema=False)
_SLUG_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,62}$")
def register_events(_bus: EventBus, _pool: asyncpg.Pool | None = None) -> None:
pass
def _normalize_slug(raw: str) -> str:
s = raw.strip().lower()
if not _SLUG_RE.match(s):
raise HTTPException(
status_code=400,
detail="slug: 163 символа, a-z 0-9 _ -, начинается с буквы или цифры",
)
return s
class TeamCreate(BaseModel):
slug: str = Field(..., min_length=1, max_length=63)
name: str = Field(..., min_length=1, max_length=200)
description: str | None = Field(default=None, max_length=2000)
class TeamPatch(BaseModel):
name: str | None = Field(default=None, min_length=1, max_length=200)
description: str | None = Field(default=None, max_length=2000)
class RuleCreate(BaseModel):
label_key: str = Field(..., min_length=1, max_length=500)
label_value: str = Field(..., min_length=1, max_length=2000)
priority: int = Field(default=0, ge=-1000, le=100000)
def _team_row(r: asyncpg.Record) -> dict:
return {
"id": str(r["id"]),
"slug": r["slug"],
"name": r["name"],
"description": r["description"],
"created_at": r["created_at"].isoformat() if r["created_at"] else None,
}
def _rule_row(r: asyncpg.Record) -> dict:
return {
"id": str(r["id"]),
"team_id": str(r["team_id"]),
"label_key": r["label_key"],
"label_value": r["label_value"],
"priority": int(r["priority"]),
"created_at": r["created_at"].isoformat() if r["created_at"] else None,
}
@router.get("/")
async def list_teams_api(pool: asyncpg.Pool | None = Depends(get_pool)):
if pool is None:
return {"items": [], "database": "disabled"}
async with pool.acquire() as conn:
rows = await conn.fetch(
"""
SELECT id, slug, name, description, created_at
FROM teams
ORDER BY name
"""
)
return {"items": [_team_row(r) for r in rows]}
@router.post("/", status_code=201)
async def create_team_api(body: TeamCreate, pool: asyncpg.Pool | None = Depends(get_pool)):
if pool is None:
raise HTTPException(status_code=503, detail="database disabled")
slug = _normalize_slug(body.slug)
async with pool.acquire() as conn:
try:
row = await conn.fetchrow(
"""
INSERT INTO teams (slug, name, description)
VALUES ($1, $2, $3)
RETURNING id, slug, name, description, created_at
""",
slug,
body.name.strip(),
(body.description or "").strip() or None,
)
except asyncpg.UniqueViolationError:
raise HTTPException(status_code=409, detail="team slug already exists") from None
return _team_row(row)
@router.get("/{team_id}")
async def get_team_api(team_id: UUID, pool: asyncpg.Pool | None = Depends(get_pool)):
if pool is None:
raise HTTPException(status_code=503, detail="database disabled")
async with pool.acquire() as conn:
row = await conn.fetchrow(
"""
SELECT id, slug, name, description, created_at
FROM teams WHERE id = $1::uuid
""",
team_id,
)
if not row:
raise HTTPException(status_code=404, detail="not found")
return _team_row(row)
@router.patch("/{team_id}", status_code=200)
async def patch_team_api(
team_id: UUID,
body: TeamPatch,
pool: asyncpg.Pool | None = Depends(get_pool),
):
if pool is None:
raise HTTPException(status_code=503, detail="database disabled")
name = body.name.strip() if body.name is not None else None
desc = body.description
if desc is not None:
desc = desc.strip() or None
async with pool.acquire() as conn:
row = await conn.fetchrow(
"""
UPDATE teams SET
name = COALESCE($2, name),
description = COALESCE($3, description)
WHERE id = $1::uuid
RETURNING id, slug, name, description, created_at
""",
team_id,
name,
desc,
)
if not row:
raise HTTPException(status_code=404, detail="not found")
return _team_row(row)
@router.delete("/{team_id}", status_code=204)
async def delete_team_api(team_id: UUID, pool: asyncpg.Pool | None = Depends(get_pool)):
if pool is None:
raise HTTPException(status_code=503, detail="database disabled")
async with pool.acquire() as conn:
r = await conn.execute("DELETE FROM teams WHERE id = $1::uuid", team_id)
if r == "DELETE 0":
raise HTTPException(status_code=404, detail="not found")
@router.get("/{team_id}/rules")
async def list_rules_api(team_id: UUID, pool: asyncpg.Pool | None = Depends(get_pool)):
if pool is None:
return {"items": [], "database": "disabled"}
async with pool.acquire() as conn:
ok = await conn.fetchval("SELECT 1 FROM teams WHERE id = $1::uuid", team_id)
if not ok:
raise HTTPException(status_code=404, detail="team not found")
rows = await conn.fetch(
"""
SELECT id, team_id, label_key, label_value, priority, created_at
FROM team_label_rules
WHERE team_id = $1::uuid
ORDER BY priority DESC, id ASC
""",
team_id,
)
return {"items": [_rule_row(r) for r in rows]}
@router.post("/{team_id}/rules", status_code=201)
async def create_rule_api(
team_id: UUID,
body: RuleCreate,
pool: asyncpg.Pool | None = Depends(get_pool),
):
if pool is None:
raise HTTPException(status_code=503, detail="database disabled")
async with pool.acquire() as conn:
ok = await conn.fetchval("SELECT 1 FROM teams WHERE id = $1::uuid", team_id)
if not ok:
raise HTTPException(status_code=404, detail="team not found")
row = await conn.fetchrow(
"""
INSERT INTO team_label_rules (team_id, label_key, label_value, priority)
VALUES ($1::uuid, $2, $3, $4)
RETURNING id, team_id, label_key, label_value, priority, created_at
""",
team_id,
body.label_key.strip(),
body.label_value.strip(),
body.priority,
)
return _rule_row(row)
@router.delete("/{team_id}/rules/{rule_id}", status_code=204)
async def delete_rule_api(
team_id: UUID,
rule_id: UUID,
pool: asyncpg.Pool | None = Depends(get_pool),
):
if pool is None:
raise HTTPException(status_code=503, detail="database disabled")
async with pool.acquire() as conn:
r = await conn.execute(
"""
DELETE FROM team_label_rules
WHERE id = $1::uuid AND team_id = $2::uuid
""",
rule_id,
team_id,
)
if r == "DELETE 0":
raise HTTPException(status_code=404, detail="not found")
@ui_router.get("/", response_class=HTMLResponse)
async def teams_ui_list(request: Request):
pool = get_pool(request)
inner = ""
if pool is None:
inner = "<p>База не настроена.</p>"
else:
try:
async with pool.acquire() as conn:
rows = await conn.fetch(
"""
SELECT t.id, t.slug, t.name,
(SELECT count(*)::int FROM team_label_rules r WHERE r.team_id = t.id) AS n_rules,
(SELECT count(*)::int FROM irm_alerts a WHERE a.team_id = t.id) AS n_alerts
FROM teams t
ORDER BY t.name
"""
)
if not rows:
inner = (
"<p>Команд пока нет. Создайте команду через API "
"<code>POST /api/v1/modules/teams/</code> и добавьте правила лейблов — "
"новые алерты из Grafana получат <code>team_id</code> при совпадении.</p>"
)
else:
trs = []
for r in rows:
tid = str(r["id"])
trs.append(
"<tr>"
f"<td><a href=\"/ui/modules/teams/{html.escape(tid, quote=True)}\">"
f"{html.escape(r['slug'])}</a></td>"
f"<td>{html.escape(r['name'])}</td>"
f"<td>{int(r['n_rules'])}</td>"
f"<td>{int(r['n_alerts'])}</td>"
"</tr>"
)
inner = (
"<p class='gc-muted'>Команда соответствует колонке <strong>Team</strong> в Grafana IRM. "
"Сопоставление: первое правило по приоритету, у которого совпали ключ и значение лейбла.</p>"
"<table class='irm-table'><thead><tr><th>Slug</th><th>Название</th>"
"<th>Правил</th><th>Алертов</th></tr></thead><tbody>"
+ "".join(trs)
+ "</tbody></table>"
)
except Exception as e:
inner = f"<p class='module-err'>{html.escape(str(e))}</p>"
page = f"<h1>Команды</h1>{inner}"
return HTMLResponse(
wrap_module_html_page(
document_title="Команды — onGuard24",
current_slug="teams",
main_inner_html=page,
)
)
@ui_router.get("/{team_id:uuid}", response_class=HTMLResponse)
async def teams_ui_detail(request: Request, team_id: UUID):
pool = get_pool(request)
if pool is None:
return HTMLResponse(
wrap_module_html_page(
document_title="Команда — onGuard24",
current_slug="teams",
main_inner_html="<h1>Команда</h1><p>База не настроена.</p>",
)
)
try:
async with pool.acquire() as conn:
team = await conn.fetchrow(
"SELECT id, slug, name, description FROM teams WHERE id = $1::uuid",
team_id,
)
rules = await conn.fetch(
"""
SELECT label_key, label_value, priority, id
FROM team_label_rules
WHERE team_id = $1::uuid
ORDER BY priority DESC, id ASC
""",
team_id,
)
except Exception as e:
return HTMLResponse(
wrap_module_html_page(
document_title="Команда — onGuard24",
current_slug="teams",
main_inner_html=f"<h1>Команда</h1><p class='module-err'>{html.escape(str(e))}</p>",
)
)
if not team:
inner = "<p>Не найдено.</p>"
else:
tid = str(team["id"])
rows_html = []
for ru in rules:
rows_html.append(
"<tr>"
f"<td><code>{html.escape(ru['label_key'])}</code></td>"
f"<td><code>{html.escape(ru['label_value'])}</code></td>"
f"<td>{int(ru['priority'])}</td>"
f"<td><code>{html.escape(str(ru['id']))}</code></td>"
"</tr>"
)
desc = html.escape(team["description"] or "")
inner = (
f"<p><a href=\"/ui/modules/teams/\">← К списку команд</a></p>"
f"<h1>{html.escape(team['name'])}</h1>"
f"<p><strong>slug:</strong> <code>{html.escape(team['slug'])}</code></p>"
f"<p><strong>Описание:</strong> {desc}</p>"
"<h2 style='font-size:1.05rem;margin-top:1rem'>Правила лейблов</h2>"
"<p class='gc-muted'>Пример: <code>team</code> = <code>infra</code> — как в ваших алертах Grafana.</p>"
"<table class='irm-table'><thead><tr><th>Ключ</th><th>Значение</th><th>Priority</th><th>ID правила</th></tr></thead><tbody>"
+ ("".join(rows_html) or "<tr><td colspan='4'>Правил нет — добавьте через API.</td></tr>")
+ "</tbody></table>"
f"<p style='margin-top:1rem;font-size:0.85rem'>API: "
f"<code>POST /api/v1/modules/teams/{tid}/rules</code> с JSON "
"<code>{\"label_key\":\"team\",\"label_value\":\"infra\",\"priority\":10}</code></p>"
)
return HTMLResponse(
wrap_module_html_page(
document_title="Команда — onGuard24",
current_slug="teams",
main_inner_html=inner,
)
)
async def render_home_fragment(request: Request) -> str:
pool = get_pool(request)
if pool is None:
return '<p class="module-note">Нужна БД для команд.</p>'
try:
async with pool.acquire() as conn:
n = await conn.fetchval("SELECT count(*)::int FROM teams")
except Exception:
return '<p class="module-note">Таблица teams недоступна (миграция 006?).</p>'
return (
f'<div class="module-fragment"><p>Команд: <strong>{int(n)}</strong>. '
f'<a href="/ui/modules/teams/">Открыть</a></p></div>'
)

View File

@ -46,6 +46,7 @@ APP_SHELL_CSS = """
.gc-subtable th, .gc-subtable td { border: 1px solid #e4e4e7; padding: 0.3rem 0.45rem; } .gc-subtable th, .gc-subtable td { border: 1px solid #e4e4e7; padding: 0.3rem 0.45rem; }
.gc-subtable th { background: #fafafa; } .gc-subtable th { background: #fafafa; }
.gc-orphan { margin-top: 1rem; padding: 0.75rem; background: #fffbeb; border: 1px solid #fcd34d; border-radius: 8px; font-size: 0.88rem; } .gc-orphan { margin-top: 1rem; padding: 0.75rem; background: #fffbeb; border: 1px solid #fcd34d; border-radius: 8px; font-size: 0.88rem; }
.rail-item--util { border-top: 1px solid #e4e4e7; margin-top: 0.4rem; padding-top: 0.4rem; }
""" """
@ -72,6 +73,11 @@ def nav_rail_html(current_slug: str | None = None) -> str:
items.append( items.append(
f'<li class="{licls}"><a href="{html.escape(href)}"{cur}>{html.escape(m.title)}</a></li>' f'<li class="{licls}"><a href="{html.escape(href)}"{cur}>{html.escape(m.title)}</a></li>'
) )
logs_active = current_slug == "__logs__"
items.append(
'<li class="rail-item rail-item--util' + (" is-active" if logs_active else "") + '">'
'<a href="/ui/logs"' + (' aria-current="page"' if logs_active else "") + ">📋 Логи</a></li>"
)
lis = "".join(items) lis = "".join(items)
return ( return (
'<aside class="app-rail" role="navigation" aria-label="Разделы приложения">' '<aside class="app-rail" role="navigation" aria-label="Разделы приложения">'

View File

@ -128,6 +128,7 @@ async def render_root_page(request: Request) -> str:
<a href="/openapi.json">OpenAPI</a> <a href="/openapi.json">OpenAPI</a>
<a href="/health">/health</a> <a href="/health">/health</a>
<a href="/api/v1/status">JSON статус</a> <a href="/api/v1/status">JSON статус</a>
<a href="/ui/logs" style="font-weight:600">📋 Логи</a>
</p> </p>
<h2>Проверки доступа</h2> <h2>Проверки доступа</h2>
<table> <table>

234
onguard24/ui_logs.py Normal file
View File

@ -0,0 +1,234 @@
"""Страница просмотра логов в реальном времени (SSE).
Маршруты:
GET /ui/logs — HTML-страница с историей + EventSource
GET /ui/logs/stream — Server-Sent Events (text/event-stream)
"""
from __future__ import annotations
import asyncio
import html as _html
import json
from collections.abc import AsyncGenerator
from fastapi import APIRouter
from starlette.requests import Request
from starlette.responses import HTMLResponse, StreamingResponse
from onguard24 import log_buffer
from onguard24.modules.ui_support import APP_SHELL_CSS, nav_rail_html
router = APIRouter(include_in_schema=False, tags=["web-logs"])
_LEVEL_COLOR: dict[str, str] = {
"DEBUG": "#71717a",
"INFO": "#a3e635",
"WARNING": "#fbbf24",
"ERROR": "#f87171",
"CRITICAL": "#ef4444",
}
_LOG_CSS = """
.log-wrap { background:#0f0f10; border-radius:10px; padding:0.75rem; min-height:20rem;
max-height:75vh; overflow-y:auto; font-family:monospace; font-size:0.8rem;
line-height:1.55; color:#e4e4e7; }
.log-line { display:flex; gap:0.5rem; border-bottom:1px solid #1e1e21; padding:0.12rem 0; }
.log-line:last-child { border-bottom: none; }
.log-ts { color:#52525b; flex-shrink:0; }
.log-lv { flex-shrink:0; width:5.5rem; font-weight:600; }
.log-name { flex-shrink:0; width:16rem; overflow:hidden; text-overflow:ellipsis;
white-space:nowrap; color:#a1a1aa; }
.log-msg { flex:1; word-break:break-all; white-space:pre-wrap; color:#e4e4e7; }
.lv-DEBUG { color:#71717a; }
.lv-INFO { color:#a3e635; }
.lv-WARNING { color:#fbbf24; }
.lv-ERROR { color:#f87171; }
.lv-CRITICAL { color:#ef4444; background:#3f0000; border-radius:3px; padding:0 2px; }
.log-controls { display:flex; gap:0.75rem; align-items:center; margin-bottom:0.75rem; }
.log-controls label { font-size:0.85rem; color:#52525b; display:flex; align-items:center; gap:0.3rem; }
.badge-live { display:inline-block; width:8px; height:8px; border-radius:50%;
background:#a3e635; box-shadow:0 0 6px #a3e635; animation: pulse 1.6s infinite; }
.badge-live.disconnected { background:#f87171; box-shadow:0 0 6px #f87171; animation:none; }
@keyframes pulse { 0%,100%{opacity:1} 50%{opacity:.4} }
#status-bar { font-size:0.8rem; color:#52525b; margin-bottom:0.5rem; }
"""
_LOG_JS = """
<script>
(function(){
const wrap = document.getElementById('log-wrap');
const cntEl = document.getElementById('log-count');
const statusEl = document.getElementById('status-bar');
const dot = document.getElementById('live-dot');
const autoCheck = document.getElementById('auto-scroll');
const levelFilter = document.getElementById('level-filter');
let count = parseInt(cntEl.textContent || '0', 10);
const LEVEL_COLOR = {
DEBUG:'#71717a', INFO:'#a3e635', WARNING:'#fbbf24', ERROR:'#f87171', CRITICAL:'#ef4444'
};
function makeRow(d) {
const row = document.createElement('div');
row.className = 'log-line';
row.dataset.level = d.level;
const lvl = d.level || 'INFO';
const col = LEVEL_COLOR[lvl] || '#e4e4e7';
row.innerHTML =
'<span class="log-ts">' + esc(d.ts || '') + '</span>' +
'<span class="log-lv lv-' + lvl + '">' + esc(lvl) + '</span>' +
'<span class="log-name">' + esc((d.name||'').slice(0,36)) + '</span>' +
'<span class="log-msg">' + esc(d.msg||'') + '</span>';
return row;
}
function esc(s){ const d=document.createElement('div');d.textContent=s;return d.innerHTML; }
function applyFilter() {
const lv = levelFilter.value;
const ORDER = ['DEBUG','INFO','WARNING','ERROR','CRITICAL'];
const minIdx = lv ? ORDER.indexOf(lv) : 0;
wrap.querySelectorAll('.log-line').forEach(function(el){
const idx = ORDER.indexOf(el.dataset.level);
el.style.display = (idx >= minIdx) ? '' : 'none';
});
}
levelFilter.addEventListener('change', applyFilter);
applyFilter();
function scrollBottom() {
if (autoCheck.checked) wrap.scrollTop = wrap.scrollHeight;
}
scrollBottom();
const src = new EventSource('/ui/logs/stream');
src.onopen = function(){
dot.className = 'badge-live';
statusEl.textContent = 'Live — соединение установлено';
};
src.onmessage = function(e){
try {
const d = JSON.parse(e.data);
const row = makeRow(d);
wrap.appendChild(row);
count++;
cntEl.textContent = count;
// keep DOM manageable: trim oldest
while (wrap.children.length > 1000) wrap.removeChild(wrap.firstChild);
const lv = levelFilter.value;
if (lv) {
const ORDER = ['DEBUG','INFO','WARNING','ERROR','CRITICAL'];
if (ORDER.indexOf(d.level) < ORDER.indexOf(lv)) row.style.display='none';
}
scrollBottom();
} catch(ex){}
};
src.onerror = function(){
dot.className = 'badge-live disconnected';
statusEl.textContent = 'Соединение потеряно — попытка переподключения…';
};
})();
</script>
"""
def _line_html(entry: dict) -> str:
ts = _html.escape(entry.get("ts", ""))
lvl = entry.get("level", "INFO")
name = _html.escape((entry.get("name") or "")[:36])
msg = _html.escape(entry.get("msg") or "")
return (
f'<div class="log-line" data-level="{_html.escape(lvl)}">'
f'<span class="log-ts">{ts}</span>'
f'<span class="log-lv lv-{_html.escape(lvl)}">{_html.escape(lvl)}</span>'
f'<span class="log-name">{name}</span>'
f'<span class="log-msg">{msg}</span>'
f"</div>"
)
@router.get("/ui/logs", response_class=HTMLResponse)
async def logs_page(request: Request) -> HTMLResponse:
history = log_buffer.get_history()
lines_html = "\n".join(_line_html(e) for e in history)
count = len(history)
rail = nav_rail_html("__logs__")
page = f"""<!DOCTYPE html>
<html lang="ru">
<head>
<meta charset="utf-8"/>
<meta name="viewport" content="width=device-width,initial-scale=1"/>
<title>Логи — onGuard24</title>
<style>
{APP_SHELL_CSS}
{_LOG_CSS}
</style>
</head>
<body>
<div class="app-shell">
<main class="app-main module-page-main">
<h1>Логи приложения</h1>
<div class="log-controls">
<span><span id="live-dot" class="badge-live"></span> real-time</span>
<span>Записей: <strong id="log-count">{count}</strong></span>
<label><input type="checkbox" id="auto-scroll" checked> авто-прокрутка</label>
<label>
Уровень:
<select id="level-filter">
<option value="">Все</option>
<option value="DEBUG">DEBUG+</option>
<option value="INFO">INFO+</option>
<option value="WARNING">WARNING+</option>
<option value="ERROR">ERROR+</option>
<option value="CRITICAL">CRITICAL</option>
</select>
</label>
<a href="/ui/logs" class="og-btn" style="text-decoration:none;padding:0.3rem 0.7rem;font-size:0.8rem">Обновить</a>
</div>
<div id="status-bar">Подключаемся к потоку…</div>
<div class="log-wrap" id="log-wrap">
{lines_html}
</div>
</main>
{rail}
</div>
{_LOG_JS}
</body>
</html>"""
return HTMLResponse(page)
@router.get("/ui/logs/stream")
async def logs_stream(request: Request) -> StreamingResponse:
q = log_buffer.subscribe()
async def generator() -> AsyncGenerator[bytes, None]:
try:
while True:
if await request.is_disconnected():
break
try:
entry = await asyncio.wait_for(q.get(), timeout=20.0)
payload = json.dumps(entry, ensure_ascii=False)
yield f"data: {payload}\n\n".encode()
except asyncio.TimeoutError:
yield b": keepalive\n\n"
finally:
log_buffer.unsubscribe(q)
return StreamingResponse(
generator(),
media_type="text/event-stream",
headers={
"Cache-Control": "no-cache",
"X-Accel-Buffering": "no",
"Connection": "keep-alive",
},
)

View File

@ -1,6 +1,6 @@
[project] [project]
name = "onguard24" name = "onguard24"
version = "1.8.0" version = "1.10.1"
description = "onGuard24 — модульный сервис (аналог IRM)" description = "onGuard24 — модульный сервис (аналог IRM)"
readme = "README.md" readme = "README.md"
requires-python = ">=3.11" requires-python = ">=3.11"

View File

@ -25,15 +25,28 @@ class Row:
return self._data.get(key, default) return self._data.get(key, default)
class _FakeTxn:
async def __aenter__(self) -> None:
return None
async def __aexit__(self, *args: Any) -> None:
return None
class IrmFakeConn: class IrmFakeConn:
def __init__(self, store: IrmFakeStore) -> None: def __init__(self, store: IrmFakeStore) -> None:
self.store = store self.store = store
def transaction(self) -> _FakeTxn:
return _FakeTxn()
def _q(self, query: str) -> str: def _q(self, query: str) -> str:
return " ".join(query.split()) return " ".join(query.split())
async def execute(self, query: str, *args: Any) -> str: async def execute(self, query: str, *args: Any) -> str:
q = self._q(query) q = self._q(query)
if "INSERT INTO incident_alert_links" in q:
return "INSERT 0 1"
if "INSERT INTO incidents" in q and "ingress_event_id" in q: if "INSERT INTO incidents" in q and "ingress_event_id" in q:
self.store.insert_incident_alert( self.store.insert_incident_alert(
args[0], args[1], args[2], args[3], args[4] args[0], args[1], args[2], args[3], args[4]

9
tests/test_alerts_api.py Normal file
View File

@ -0,0 +1,9 @@
"""API модуля алертов без БД."""
from fastapi.testclient import TestClient
def test_alerts_list_no_db(client: TestClient) -> None:
r = client.get("/api/v1/modules/alerts/")
assert r.status_code == 200
assert r.json() == {"items": [], "database": "disabled"}

View File

@ -0,0 +1,26 @@
"""Парсинг полей из тела вебхука Grafana."""
from onguard24.ingress.grafana_payload import extract_alert_row_from_grafana_body
def test_extract_title_and_severity_from_unified() -> None:
body = {
"title": "RuleName",
"alerts": [
{
"labels": {"severity": "critical", "alertname": "X"},
"fingerprint": "abc",
}
],
}
title, sev, labels, fp = extract_alert_row_from_grafana_body(body)
assert title == "RuleName"
assert sev == "critical"
assert labels.get("alertname") == "X"
assert fp == "abc"
def test_extract_empty_title_uses_alertname() -> None:
body = {"alerts": [{"labels": {"alertname": "HostDown"}}]}
title, _, _, _ = extract_alert_row_from_grafana_body(body)
assert title == "HostDown"

View File

@ -1,9 +1,24 @@
import json import json
from unittest.mock import AsyncMock, MagicMock, patch from unittest.mock import AsyncMock, MagicMock, patch
from fastapi.testclient import TestClient from fastapi.testclient import TestClient
def _webhook_mock_pool(mock_conn: AsyncMock) -> MagicMock:
"""Пул с транзакцией и execute — как после вставки ingress + irm_alerts."""
tx = AsyncMock()
tx.__aenter__ = AsyncMock(return_value=None)
tx.__aexit__ = AsyncMock(return_value=None)
mock_conn.transaction = MagicMock(return_value=tx)
mock_conn.fetch = AsyncMock(return_value=[])
mock_conn.execute = AsyncMock()
mock_cm = AsyncMock()
mock_cm.__aenter__ = AsyncMock(return_value=mock_conn)
mock_cm.__aexit__ = AsyncMock(return_value=None)
mock_pool = MagicMock()
mock_pool.acquire = MagicMock(return_value=mock_cm)
return mock_pool
def test_grafana_webhook_no_db(client: TestClient) -> None: def test_grafana_webhook_no_db(client: TestClient) -> None:
"""Без пула БД — 202, запись не падает.""" """Без пула БД — 202, запись не падает."""
r = client.post( r = client.post(
@ -41,12 +56,7 @@ def test_grafana_webhook_inserts_with_mock_pool(client: TestClient) -> None:
mock_conn = AsyncMock() mock_conn = AsyncMock()
uid = uuid4() uid = uuid4()
mock_conn.fetchrow = AsyncMock(return_value={"id": uid}) mock_conn.fetchrow = AsyncMock(return_value={"id": uid})
mock_cm = AsyncMock() mock_pool = _webhook_mock_pool(mock_conn)
mock_cm.__aenter__ = AsyncMock(return_value=mock_conn)
mock_cm.__aexit__ = AsyncMock(return_value=None)
mock_pool = MagicMock()
mock_pool.acquire = MagicMock(return_value=mock_cm)
app = client.app app = client.app
real_pool = app.state.pool real_pool = app.state.pool
@ -59,6 +69,7 @@ def test_grafana_webhook_inserts_with_mock_pool(client: TestClient) -> None:
) )
assert r.status_code == 202 assert r.status_code == 202
mock_conn.fetchrow.assert_called_once() mock_conn.fetchrow.assert_called_once()
mock_conn.execute.assert_called_once()
finally: finally:
app.state.pool = real_pool app.state.pool = real_pool
@ -69,11 +80,7 @@ def test_grafana_webhook_auto_org_from_external_url(client: TestClient) -> None:
mock_conn = AsyncMock() mock_conn = AsyncMock()
uid = uuid4() uid = uuid4()
mock_conn.fetchrow = AsyncMock(return_value={"id": uid}) mock_conn.fetchrow = AsyncMock(return_value={"id": uid})
mock_cm = AsyncMock() mock_pool = _webhook_mock_pool(mock_conn)
mock_cm.__aenter__ = AsyncMock(return_value=mock_conn)
mock_cm.__aexit__ = AsyncMock(return_value=None)
mock_pool = MagicMock()
mock_pool.acquire = MagicMock(return_value=mock_cm)
app = client.app app = client.app
real_pool = app.state.pool real_pool = app.state.pool
@ -99,11 +106,7 @@ def test_grafana_webhook_publishes_alert_received(client: TestClient) -> None:
mock_conn = AsyncMock() mock_conn = AsyncMock()
uid = uuid4() uid = uuid4()
mock_conn.fetchrow = AsyncMock(return_value={"id": uid}) mock_conn.fetchrow = AsyncMock(return_value={"id": uid})
mock_cm = AsyncMock() mock_pool = _webhook_mock_pool(mock_conn)
mock_cm.__aenter__ = AsyncMock(return_value=mock_conn)
mock_cm.__aexit__ = AsyncMock(return_value=None)
mock_pool = MagicMock()
mock_pool.acquire = MagicMock(return_value=mock_cm)
app = client.app app = client.app
bus = app.state.event_bus bus = app.state.event_bus
@ -130,11 +133,7 @@ def test_grafana_webhook_org_any_slug_without_json_config(client: TestClient) ->
mock_conn = AsyncMock() mock_conn = AsyncMock()
uid = uuid4() uid = uuid4()
mock_conn.fetchrow = AsyncMock(return_value={"id": uid}) mock_conn.fetchrow = AsyncMock(return_value={"id": uid})
mock_cm = AsyncMock() mock_pool = _webhook_mock_pool(mock_conn)
mock_cm.__aenter__ = AsyncMock(return_value=mock_conn)
mock_cm.__aexit__ = AsyncMock(return_value=None)
mock_pool = MagicMock()
mock_pool.acquire = MagicMock(return_value=mock_cm)
app = client.app app = client.app
real_pool = app.state.pool real_pool = app.state.pool
@ -157,11 +156,7 @@ def test_grafana_webhook_org_ok(client: TestClient) -> None:
mock_conn = AsyncMock() mock_conn = AsyncMock()
uid = uuid4() uid = uuid4()
mock_conn.fetchrow = AsyncMock(return_value={"id": uid}) mock_conn.fetchrow = AsyncMock(return_value={"id": uid})
mock_cm = AsyncMock() mock_pool = _webhook_mock_pool(mock_conn)
mock_cm.__aenter__ = AsyncMock(return_value=mock_conn)
mock_cm.__aexit__ = AsyncMock(return_value=None)
mock_pool = MagicMock()
mock_pool.acquire = MagicMock(return_value=mock_cm)
app = client.app app = client.app
real_json = app.state.settings.grafana_sources_json real_json = app.state.settings.grafana_sources_json

View File

@ -29,8 +29,41 @@ def test_escalations_api_list_no_db(client: TestClient) -> None:
@pytest.mark.asyncio @pytest.mark.asyncio
async def test_incident_inserted_on_alert_received() -> None: async def test_incident_not_created_from_alert_by_default() -> None:
"""При пуле БД подписка создаёт инцидент (INSERT).""" """По умолчанию AUTO_INCIDENT_FROM_ALERT выкл — инцидент из вебхука не создаётся."""
calls: list = []
async def fake_execute(_query, *args):
calls.append(args)
return "INSERT 0 1"
mock_conn = AsyncMock()
mock_conn.execute = fake_execute
mock_cm = AsyncMock()
mock_cm.__aenter__ = AsyncMock(return_value=mock_conn)
mock_cm.__aexit__ = AsyncMock(return_value=None)
mock_pool = MagicMock()
mock_pool.acquire = MagicMock(return_value=mock_cm)
from onguard24.domain.events import InMemoryEventBus
from onguard24.modules import incidents as inc_mod
bus = InMemoryEventBus()
inc_mod.register_events(bus, mock_pool)
uid = uuid4()
ev = AlertReceived(
alert=Alert(source="grafana", title="CPU high", severity=Severity.WARNING),
raw_payload_ref=uid,
)
await bus.publish(ev)
assert calls == []
@pytest.mark.asyncio
async def test_incident_inserted_on_alert_when_auto_enabled(monkeypatch: pytest.MonkeyPatch) -> None:
"""При AUTO_INCIDENT_FROM_ALERT=1 подписка снова создаёт инцидент (legacy)."""
monkeypatch.setenv("AUTO_INCIDENT_FROM_ALERT", "1")
inserted: dict = {} inserted: dict = {}
async def fake_execute(_query, *args): async def fake_execute(_query, *args):

View File

@ -0,0 +1,26 @@
import json
import math
from onguard24.ingress.json_sanitize import sanitize_for_jsonb
def test_sanitize_nan_inf_to_none() -> None:
raw = json.loads('{"a": NaN, "b": Infinity, "c": -Infinity, "d": 1.5}')
out = sanitize_for_jsonb(raw)
assert math.isnan(raw["a"])
assert out["a"] is None
assert out["b"] is None
assert out["c"] is None
assert out["d"] == 1.5
def test_sanitize_strips_nul_in_strings() -> None:
assert sanitize_for_jsonb({"x": "a\x00b"}) == {"x": "ab"}
def test_dumps_after_sanitize_is_valid_json() -> None:
raw = json.loads('{"v": NaN}')
clean = sanitize_for_jsonb(raw)
s = json.dumps(clean, allow_nan=False)
assert "NaN" not in s
assert json.loads(s)["v"] is None

45
tests/test_log_buffer.py Normal file
View File

@ -0,0 +1,45 @@
"""Кольцевой буфер логов и SSE-страница."""
import logging
import pytest
from fastapi.testclient import TestClient
from onguard24 import log_buffer
def test_ring_buffer_captures_log_records() -> None:
log_buffer._ring.clear()
handler = log_buffer.RingBufferHandler()
handler.setFormatter(logging.Formatter("%(name)s %(message)s"))
logger = logging.getLogger("test.ring")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
try:
logger.info("hello ring")
history = log_buffer.get_history()
assert any("hello ring" in e["msg"] for e in history)
finally:
logger.removeHandler(handler)
log_buffer._ring.clear()
def test_logs_page_returns_html(client: TestClient) -> None:
r = client.get("/ui/logs")
assert r.status_code == 200
assert "text/html" in r.headers.get("content-type", "")
assert "Логи" in r.text
assert "log-wrap" in r.text
assert "EventSource" in r.text or "event-stream" in r.text or "ui/logs/stream" in r.text
def test_logs_page_in_nav_rail(client: TestClient) -> None:
r = client.get("/ui/logs")
assert r.status_code == 200
assert "/ui/logs" in r.text
def test_root_has_logs_link(client: TestClient) -> None:
r = client.get("/")
assert r.status_code == 200
assert "/ui/logs" in r.text

View File

@ -33,6 +33,8 @@ def test_rail_lists_all_registered_ui_modules(client: TestClient) -> None:
t = r.text t = r.text
expected = ( expected = (
("grafana-catalog", "Каталог Grafana"), ("grafana-catalog", "Каталог Grafana"),
("alerts", "Алерты"),
("teams", "Команды"),
("incidents", "Инциденты"), ("incidents", "Инциденты"),
("tasks", "Задачи"), ("tasks", "Задачи"),
("escalations", "Эскалации"), ("escalations", "Эскалации"),
@ -49,6 +51,8 @@ def test_each_module_page_single_active_nav_item(client: TestClient) -> None:
"""На странице модуля ровно один пункт с aria-current (текущий раздел).""" """На странице модуля ровно один пункт с aria-current (текущий раздел)."""
for slug in ( for slug in (
"grafana-catalog", "grafana-catalog",
"alerts",
"teams",
"incidents", "incidents",
"tasks", "tasks",
"escalations", "escalations",

38
tests/test_team_match.py Normal file
View File

@ -0,0 +1,38 @@
"""Сопоставление лейблов с командой (без БД)."""
from uuid import UUID, uuid4
from onguard24.ingress.team_match import match_team_for_labels
def test_match_returns_none_when_empty() -> None:
assert match_team_for_labels({}, []) is None
assert match_team_for_labels({"a": "b"}, []) is None
def test_match_first_rule_wins_in_order() -> None:
u_infra = uuid4()
u_other = uuid4()
labels = {"team": "infra", "env": "prod"}
rules: list[tuple[UUID, str, str]] = [
(u_infra, "team", "infra"),
(u_other, "env", "prod"),
]
assert match_team_for_labels(labels, rules) == u_infra
def test_match_skips_until_value_matches() -> None:
u = uuid4()
labels = {"x": "1"}
rules: list[tuple[UUID, str, str]] = [
(uuid4(), "x", "2"),
(u, "x", "1"),
]
assert match_team_for_labels(labels, rules) == u
def test_match_coerces_label_values_to_str() -> None:
u = uuid4()
labels = {"port": 8080}
rules: list[tuple[UUID, str, str]] = [(u, "port", "8080")]
assert match_team_for_labels(labels, rules) == u

9
tests/test_teams_api.py Normal file
View File

@ -0,0 +1,9 @@
"""API модуля команд без БД."""
from fastapi.testclient import TestClient
def test_teams_list_no_db(client: TestClient) -> None:
r = client.get("/api/v1/modules/teams/")
assert r.status_code == 200
assert r.json() == {"items": [], "database": "disabled"}