Commit graph

6 commits

Author SHA1 Message Date
DannyDannyDanny
2e9441f367 Retire dotfiles-rebuild, switch to dm-pull-deploy push timer
- Drop modules/dotfiles-rebuild.nix and its imports in clan.nix;
  sunken-ship + phantom-ship no longer ship the legacy 15-min
  rebuild-from-git timer.
- Add dm-pull-deploy-push systemd timer on sunken-ship: every 15min
  runs dm-send-deploy to announce origin/main rev via data-mesher
  gossip (sunken is the dm-pull-deploy push node).
- Fix mulbo-pull service path: add openssh so 'git fetch' over an
  SSH remote stops failing with 'cannot run ssh'.
- vps-relay authorized_keys: rename Mac key comment to mac-admin,
  add sunken-ship's actual ed25519 key for ZT mesh debugging.
- home.nix: add cinny-desktop (Matrix client).
- neovim: enable cursorline.
2026-05-20 19:31:22 +02:00
DannyDannyDanny
e8158e6c0f monitoring: fix prometheus → alertmanager loopback (IPv4 vs IPv6)
Alertmanager binds [::1]:9093 but Prometheus was dialing
127.0.0.1:9093 — connection refused, so alerts fired internally
but never reached Alertmanager. Switch the target to [::1]:9093
to match the bind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 16:47:37 +02:00
DannyDannyDanny
dc7895e3b2 monitoring: bracket IPv6 listenAddress for node_exporter
The NixOS module concatenates listenAddress and port as `${a}:${p}`,
so "::" became ":::9100" and node_exporter rejected it ("too many
colons in address"). Use "[::]" so the result is "[::]:9100".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 16:17:28 +02:00
DannyDannyDanny
3b6f4545b4 monitoring: prometheus + alertmanager + grafana on sunken-ship
node_exporter on all three hosts (port 9100, ZT-only). Prometheus
server scrapes via the clan ZT IPv6s. Alertmanager routes alerts to
@HarakatBot (chat 66070351); critical repeats every 1h, others 4h.
Starter rule: HostDown when up==0 for 5m. Grafana on :3000 over ZT,
provisioned with the local Prometheus as default datasource.

Manual secrets on sunken-ship: /etc/alertmanager/telegram-token and
/etc/grafana/secret-key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 16:12:08 +02:00
DannyDannyDanny
771cc58076 feat: vps fail2ban + shared server-debug-tools module 🛡️
VPS public SSH: enable fail2ban with bantime-increment so brute-force
probers get evicted with exponential backoff (1h → 4h → 16h → 2.7d →
10.7d, capped at 30d). Default jail covers sshd; maxretry=5 in 10m.

server-debug-tools: htop, tcpdump, dnsutils, jq, curl. Imported by
sunken-ship + phantom-ship via flake.nixosModules.server-debug-tools.
These are the practical bits we'd otherwise pick up by enabling
clan.core.enableRecommendedDefaults — but the full clan defaults flip
systemd-networkd/resolved on, which broke dnsmasq + navidrome's resolv
.conf bind-mount on the homelab servers, so we cherry-pick instead.
2026-04-25 13:51:19 +02:00
DannyDannyDanny
88c51399d0 refactor(nix): move flake to repo root 🚚
clan-cli silently ignores the `?dir=` URL parameter when resolving a
flake source, so with the flake at nixos/flake.nix `clan machines
update` fails with "flake.nix does not exist". Move the flake tree up
so the repo root contains flake.nix, flake.lock, flake-modules/, lib/,
modules/, sops/, and vars/. Host-specific NixOS modules stay in
nixos/{hosts,home,fish.nix,neovim.nix,…}; flake-module paths updated
accordingly.

- dotfiles-rebuild flakeRef is now "${dotfilesDir}#<host>" (was
  "${dotfilesDir}/nixos#<host>").
- CLAUDE.md build commands + clan section updated. nixupdate fish alias
  updated. sunken-ship hostsfile comment updated.
- Existing /etc/dotfiles checkouts on the servers will pick up the new
  layout on the next `dotfiles-rebuild` timer tick; the rebuild service
  was pre-updated via rsync so its flakeRef matches before the pull.

Also includes 4b follow-through: zerotier identities are now live on
both servers (sunken-ship=d553a2de33 controller, phantom-ship=6c048abbdc
peer) and IPv6 ping across the ZT mesh works.
2026-04-19 15:19:59 +02:00