commit bece440c4744f1fe92ba0ce31890fad7b83e61c8 Author: loveuer Date: Mon Dec 8 22:23:45 2025 +0800 wip v1.0.0 diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..87e49d2 --- /dev/null +++ b/.gitignore @@ -0,0 +1,6 @@ +.qoder +.vscode +.idea + +dist +x-* diff --git a/README.md b/README.md new file mode 100644 index 0000000..db02427 --- /dev/null +++ b/README.md @@ -0,0 +1,236 @@ +# go-alived + +A lightweight, dependency-free VRRP (Virtual Router Redundancy Protocol) implementation in Go, designed as a simple alternative to keepalived. + +## Features + +✅ **Phase 1: Core VRRP Functionality (Completed)** +- VRRP protocol implementation (RFC 3768/5798) +- Virtual IP management (add/remove VIPs) +- State machine (INIT/BACKUP/MASTER/FAULT) +- Priority-based master election +- Gratuitous ARP for network updates +- Raw socket VRRP packet send/receive +- Timer management (advertisement & master-down timers) +- VRRP instance manager with multi-instance support +- Configuration hot-reload (SIGHUP) + +✅ **Phase 2: Health Checking (Completed)** +- Health checker interface with rise/fall logic +- TCP health checks +- HTTP/HTTPS health checks +- ICMP ping checks +- Script-based checks (custom commands) +- Periodic health check scheduling +- Health check integration with VRRP priority +- Track scripts: automatic priority adjustment on health changes + +🚧 **Phase 3: Enhanced Features (Planned)** +- State transition scripts (notify_master/backup/fault) +- Email/Webhook notifications +- Sync groups +- Virtual MAC support +- Metrics export + +## Installation + +### Build from source + +```bash +git clone https://github.com/loveuer/go-alived.git +cd go-alived +go build -o go-alived . +``` + +## Quick Start + +### 1. 
Test Your Environment + +Before deployment, test if your environment supports VRRP: + +```bash +# Basic test (auto-detect network interface) +sudo ./go-alived test + +# Test specific interface +sudo ./go-alived test -i eth0 + +# Full test with VIP +sudo ./go-alived test -i eth0 -v 192.168.1.100/24 +``` + +### 2. Run the Service + +```bash +# Run with minimal config +sudo ./go-alived run -c config.mini.yaml -d + +# Run with full config +sudo ./go-alived -c config.yaml + +# Install as systemd service +sudo ./deployment/install.sh +sudo systemctl start go-alived +``` + +## Usage + +### Commands + +``` +go-alived # Run VRRP service (default) +go-alived run # Run VRRP service +go-alived test # Test environment for VRRP support +go-alived --help # Show help +go-alived --version # Show version +``` + +### Global Flags + +``` +-c, --config string Path to configuration file (default "/etc/go-alived/config.yaml") +-d, --debug Enable debug mode +-h, --help Show help +-v, --version Show version +``` + +### Test Command Flags + +``` +-i, --interface string Network interface to test (auto-detect if not specified) +-v, --vip string Test VIP address (e.g., 192.168.1.100/24) +``` + +See [USAGE.md](USAGE.md) for detailed usage documentation. + +## Configuration + +### Minimal Configuration + +```yaml +# config.mini.yaml - VRRP only +global: + router_id: "node1" + +vrrp_instances: + - name: "VI_1" + interface: "eth0" + state: "BACKUP" + virtual_router_id: 51 + priority: 100 + advert_interval: 1 + auth_type: "PASS" + auth_pass: "secret123" + virtual_ips: + - "192.168.1.100/24" +``` + +### Full Configuration Example + +See `config.example.yaml` for complete configuration with health checking. 
+ +### Signals + +- `SIGHUP`: Reload configuration +- `SIGINT/SIGTERM`: Graceful shutdown + +## Architecture + +``` +go-alived/ +├── main.go # Application entry point +├── internal/ +│ ├── cmd/ # Cobra commands +│ │ ├── root.go # Root command +│ │ ├── run.go # Run service command +│ │ └── test.go # Environment test command +│ ├── vrrp/ # VRRP implementation +│ │ ├── packet.go # VRRP packet structure & marshaling +│ │ ├── socket.go # Raw socket operations +│ │ ├── state.go # State machine & timers +│ │ ├── arp.go # Gratuitous ARP +│ │ ├── instance.go # VRRP instance logic +│ │ └── manager.go # Instance manager +│ └── health/ # Health check system +│ ├── checker.go # Checker interface & state +│ ├── monitor.go # Health check scheduler +│ ├── tcp.go # TCP health checker +│ ├── http.go # HTTP/HTTPS health checker +│ ├── ping.go # ICMP ping checker +│ ├── script.go # Script checker +│ └── factory.go # Checker factory +├── pkg/ +│ ├── config/ # Configuration loading & validation +│ ├── logger/ # Logging system +│ └── netif/ # Network interface management +└── deployment/ # Deployment files + ├── go-alived.service # Systemd service file + ├── install.sh # Installation script + ├── uninstall.sh # Uninstallation script + ├── check-env.sh # Environment check script + ├── README.md # Deployment documentation + └── COMPATIBILITY.md # Environment compatibility guide +``` + +## Environment Compatibility + +### ✅ Fully Supported +- Physical servers +- KVM/QEMU virtual machines +- Proxmox VE +- VMware ESXi (with promiscuous mode) +- VirtualBox (with bridged network + promiscuous mode) + +### ⚠️ Limited Support +- Private cloud (depends on network configuration) +- Docker containers (requires `--privileged` and `--net=host`) +- Kubernetes (requires hostNetwork mode) + +### ❌ Not Supported +- AWS EC2 (multicast disabled) +- Aliyun ECS (multicast disabled) +- Azure VM (requires special configuration) +- Google Cloud (multicast disabled by default) + +**Why?** Public clouds typically 
disable multicast traffic (VRRP advertisements are sent to 224.0.0.18) at the network virtualization layer. + +**Alternative**: Use cloud-native solutions like Elastic IP (AWS), SLB/HaVip (Aliyun), Load Balancer (Azure/GCP). + +See [deployment/COMPATIBILITY.md](deployment/COMPATIBILITY.md) for detailed compatibility information. + +## Requirements + +- Go 1.21+ (for building) +- Linux/macOS with root privileges (for raw sockets and interface management) +- Network interface with IPv4 address +- Multicast support (for VRRP) + +## Dependencies + +Minimal external dependencies: +- `github.com/vishvananda/netlink` - Network interface management +- `github.com/mdlayher/arp` - ARP packet handling +- `github.com/spf13/cobra` - CLI framework +- `golang.org/x/net/ipv4` - IPv4 raw socket support +- `golang.org/x/net/icmp` - ICMP ping support +- `gopkg.in/yaml.v3` - YAML configuration parsing + +## Documentation + +- [USAGE.md](USAGE.md) - Detailed usage guide +- [TESTING.md](TESTING.md) - Testing guide +- [deployment/README.md](deployment/README.md) - Deployment guide +- [deployment/COMPATIBILITY.md](deployment/COMPATIBILITY.md) - Environment compatibility +- [roadmap.md](roadmap.md) - Implementation roadmap + +## Roadmap + +See [roadmap.md](roadmap.md) for the detailed implementation plan. + +## License + +MIT License + +## Contributing + +Contributions are welcome! Please feel free to submit issues and pull requests. diff --git a/TESTING.md b/TESTING.md new file mode 100644 index 0000000..6002744 --- /dev/null +++ b/TESTING.md @@ -0,0 +1,301 @@ +# VRRP Functional Testing Guide + +## Preparing the Test Environment + +### 1. Single-machine test (using a virtual interface) + +```bash +# macOS: create a virtual interface (alias on the lo0 loopback interface) +sudo ifconfig lo0 alias 192.168.100.1/24 + +# Linux: create a virtual interface (using the dummy module) +sudo modprobe dummy +sudo ip link add dummy0 type dummy +sudo ip addr add 192.168.100.1/24 dev dummy0 +sudo ip link set dummy0 up +``` + +### 2. 
Two-node test (recommended, realistic scenario) + +Requires two machines (virtual or physical) on the same network segment: +- Node1: 192.168.1.10/24 +- Node2: 192.168.1.20/24 +- VIP: 192.168.1.100/24 + +## Test Configuration Files + +### Node1 configuration (config-node1.yaml) + +```yaml +global: + router_id: "node1" + notification_email: "admin@example.com" + +vrrp_instances: + - name: "VI_1" + interface: "eth0" # change to the actual interface name + state: "BACKUP" + virtual_router_id: 51 + priority: 100 # higher priority + advert_interval: 1 + auth_type: "PASS" + auth_pass: "secret123" + virtual_ips: + - "192.168.1.100/24" # change to the actual network segment +``` + +### Node2 configuration (config-node2.yaml) + +```yaml +global: + router_id: "node2" + notification_email: "admin@example.com" + +vrrp_instances: + - name: "VI_1" + interface: "eth0" # change to the actual interface name + state: "BACKUP" + virtual_router_id: 51 + priority: 90 # lower priority + advert_interval: 1 + auth_type: "PASS" + auth_pass: "secret123" + virtual_ips: + - "192.168.1.100/24" # change to the actual network segment +``` + +## Test Steps + +### Test 1: Startup and log check + +**Node1:** +```bash +sudo ./go-alived --config config-node1.yaml --debug +``` + +**Expected output:** +``` +[2025-12-05 14:25:51] INFO: starting go-alived... 
+[2025-12-05 14:25:51] INFO: loading configuration from: config-node1.yaml +[2025-12-05 14:25:51] INFO: configuration loaded successfully +[2025-12-05 14:25:51] INFO: loaded VRRP instance: VI_1 +[2025-12-05 14:25:51] INFO: starting VRRP instance (VRID=51, Priority=100, Interface=eth0) +[2025-12-05 14:25:51] INFO: [VI_1] state changed: INIT -> BACKUP +[2025-12-05 14:25:51] INFO: [VI_1] transitioning to BACKUP state +``` + +**Node2:** +```bash +sudo ./go-alived --config config-node2.yaml --debug +``` + +### Test 2: Master election + +After both nodes are started, Node1 (the higher priority) should become MASTER. + +**Node1 expected output:** +``` +[2025-12-05 14:25:54] INFO: [VI_1] master down timer expired, becoming master +[2025-12-05 14:25:54] INFO: [VI_1] state changed: BACKUP -> MASTER +[2025-12-05 14:25:54] INFO: [VI_1] transitioning to MASTER state +[2025-12-05 14:25:54] INFO: [VI_1] adding virtual IPs +[2025-12-05 14:25:54] INFO: [VI_1] added VIP 192.168.1.100/32 +[2025-12-05 14:25:54] DEBUG: [VI_1] sent advertisement (priority=100) +``` + +**Verify the VIP:** +```bash +# Run on Node1 +ip addr show eth0 | grep 192.168.1.100 +# The VIP should be listed +``` + +**Node2 stays BACKUP:** +``` +[2025-12-05 14:25:54] DEBUG: [VI_1] received advertisement from 192.168.1.10 (priority=100, state=BACKUP) +# Node2 should remain in the BACKUP state +``` + +### Test 3: Failover + +Stop go-alived on Node1: + +```bash +# On Node1, press Ctrl+C or send SIGTERM +sudo pkill -SIGTERM go-alived +``` + +**Node1 expected output:** +``` +[2025-12-05 14:26:10] INFO: received signal terminated, shutting down... +[2025-12-05 14:26:10] INFO: cleaning up resources... 
+[2025-12-05 14:26:10] INFO: [VI_1] stopping VRRP instance +[2025-12-05 14:26:10] INFO: [VI_1] removing virtual IPs +[2025-12-05 14:26:10] INFO: [VI_1] removed VIP 192.168.1.100/32 +``` + +**Node2 should take over (within 3 seconds):** +``` +[2025-12-05 14:26:13] INFO: [VI_1] master down timer expired, becoming master +[2025-12-05 14:26:13] INFO: [VI_1] state changed: BACKUP -> MASTER +[2025-12-05 14:26:13] INFO: [VI_1] transitioning to MASTER state +[2025-12-05 14:26:13] INFO: [VI_1] adding virtual IPs +[2025-12-05 14:26:13] INFO: [VI_1] added VIP 192.168.1.100/32 +``` + +**Verify the VIP migration:** +```bash +# Run on Node2 +ip addr show eth0 | grep 192.168.1.100 +# The VIP should be listed + +# Ping the VIP from a third machine; it should not be interrupted +ping 192.168.1.100 +``` + +### Test 4: Preemption + +Restart Node1 (the higher priority): + +```bash +# Run on Node1 +sudo ./go-alived --config config-node1.yaml --debug +``` + +**Node1 expected behavior:** +``` +[2025-12-05 14:27:00] INFO: [VI_1] state changed: INIT -> BACKUP +[2025-12-05 14:27:03] INFO: [VI_1] master down timer expired, becoming master +[2025-12-05 14:27:03] INFO: [VI_1] state changed: BACKUP -> MASTER +``` + +**Node2 expected behavior (steps down after seeing a higher priority):** +``` +[2025-12-05 14:27:03] WARN: [VI_1] received higher priority advertisement, stepping down +[2025-12-05 14:27:03] INFO: [VI_1] state changed: MASTER -> BACKUP +[2025-12-05 14:27:03] INFO: [VI_1] transitioning to BACKUP state +[2025-12-05 14:27:03] INFO: [VI_1] removing virtual IPs +[2025-12-05 14:27:03] INFO: [VI_1] removed VIP 192.168.1.100/32 +``` + +### Test 5: Configuration hot-reload + +Edit the Node1 configuration file and change the priority: + +```yaml +priority: 80 # changed from 100 to 80 +``` + +Send the SIGHUP signal: + +```bash +sudo pkill -SIGHUP go-alived +``` + +**Expected output:** +``` +[2025-12-05 14:28:00] INFO: received SIGHUP, reloading configuration... +[2025-12-05 14:28:00] INFO: reloading VRRP configuration... 
+[2025-12-05 14:28:00] INFO: stopping all VRRP instances +[2025-12-05 14:28:00] INFO: loaded VRRP instance: VI_1 +[2025-12-05 14:28:00] INFO: starting VRRP instance (VRID=51, Priority=80, Interface=eth0) +[2025-12-05 14:28:00] INFO: VRRP configuration reloaded successfully +``` + +## Packet Capture Verification + +Use tcpdump to capture VRRP packets: + +```bash +# Capture VRRP protocol packets (IP protocol number 112) +sudo tcpdump -i eth0 -n proto 112 + +# Or capture by multicast address +sudo tcpdump -i eth0 -n dst 224.0.0.18 +``` + +**Expected output:** +``` +14:25:55.123456 IP 192.168.1.10 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s +14:25:56.123456 IP 192.168.1.10 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s +``` + +## Troubleshooting + +### 1. Permission error +``` +failed to create raw socket: operation not permitted +``` +**Fix:** run with `sudo` + +### 2. Interface does not exist +``` +failed to get interface eth0: no such network interface +``` +**Fix:** check the `interface` field in the configuration file and set it to the actual interface name +```bash +ip link show # list all interfaces +``` + +### 3. Failed to add VIP +``` +failed to add VIP: file exists +``` +**Fix:** the VIP may already exist; delete it first: +```bash +sudo ip addr del 192.168.1.100/24 dev eth0 +``` + +### 4. VRRP packets are not received +**Check the firewall:** +```bash +# Linux +sudo iptables -A INPUT -p 112 -j ACCEPT + +# macOS +# System Preferences -> Security & Privacy -> Firewall -> Firewall Options -> allow go-alived +``` + +### 5. macOS-specific issues +macOS does not support `SO_BINDTODEVICE`. The code handles this automatically, but the firewall may need to be disabled: +```bash +sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate off +``` + +## Quick Verification Script + +```bash +#!/bin/bash +# test-vrrp.sh + +echo "=== VRRP functional test ===" + +# 1. Check whether the VIP has been added +echo "1. Checking VIP..." +ip addr show | grep "192.168.1.100" && echo "✓ VIP added" || echo "✗ VIP not added" + +# 2. Check the process +echo "2. Checking process..." +pgrep -f go-alived && echo "✓ process running" || echo "✗ process not running" + +# 3. Capture packets for 5 seconds +echo "3. Capturing VRRP packets (5 seconds)..." +timeout 5 sudo tcpdump -i eth0 -n proto 112 -c 5 + +# 4. Ping VIP +echo "4. Ping VIP..." +ping -c 3 192.168.1.100 && echo "✓ VIP reachable" || echo "✗ VIP unreachable" + +echo "=== Test complete ===" +``` + +## Expected Test Results + +✅ **Pass criteria:** +1. 
After both nodes start, the higher-priority node becomes MASTER +2. The MASTER node adds the VIP successfully +3. After the MASTER is stopped, the BACKUP takes over within 3 seconds +4. The VIP migrates seamlessly and ping is not interrupted +5. The higher-priority node preempts MASTER after it restarts +6. Configuration hot-reload works +7. tcpdump captures periodic VRRP Advertisement packets diff --git a/USAGE.md b/USAGE.md new file mode 100644 index 0000000..d8db144 --- /dev/null +++ b/USAGE.md @@ -0,0 +1,419 @@ +# go-alived Usage Guide + +## Command Overview + +```bash +go-alived # Run the VRRP service (default command) +go-alived run # Run the VRRP service +go-alived test # Test whether the environment supports VRRP +go-alived --help # Show help +go-alived --version # Show version +``` + +## 1. Environment test (test) + +Before deploying go-alived, run the environment check first: + +```bash +# Basic check (interface selected automatically) +sudo ./go-alived test + +# Check a specific interface +sudo ./go-alived test -i eth0 +sudo ./go-alived test --interface eth0 + +# Specify both the interface and a test VIP +sudo ./go-alived test -i eth0 -v 192.168.1.100/24 +sudo ./go-alived test --interface eth0 --vip 192.168.1.100/24 +``` + +**Checks performed**: +- ✓ Root privileges +- ✓ Network interface state +- ✓ VIP add/remove +- ✓ Multicast support +- ✓ Firewall configuration +- ✓ Kernel parameters +- ✓ Conflicting services +- ✓ Virtualization environment detection +- ✓ Cloud environment restrictions + +**Sample output**: +``` +=== go-alived 环境测试 === + +检查运行权限... +检查网络接口... +自动选择网卡: eth0 +测试VIP添加/删除功能... +检查组播支持... +检查防火墙设置... +检查内核参数... +检查冲突服务... +检查虚拟化环境... +检查云环境... + +=== 测试结果 === + +✓ Root权限 以root用户运行 +✓ 网络接口 网卡 eth0 存在且已启动 +✓ VIP添加 成功添加VIP 192.168.1.100/32 +✓ VIP验证 VIP已成功添加到网卡 +✓ VIP可达性 VIP可以ping通 +✓ VIP删除 VIP删除成功 +✓ 组播支持 网卡支持组播 +⚠ 防火墙VRRP 防火墙未配置VRRP规则,建议添加: iptables -A INPUT -p 112 -j ACCEPT +✓ ip_forward ip_forward = 1 (正常) +✓ 服务冲突 未发现冲突的服务 +✓ 虚拟化 KVM/QEMU虚拟机(通常支持良好) +✓ 云环境 未检测到公有云环境限制 + +=== 总结 === + +⚠ 环境基本支持,但有 1 个警告 + 建议修复警告项以获得更好的稳定性 +``` + +## 2. Run the service (run) + +```bash +# Run with the default configuration file +sudo ./go-alived + +# Or use the run command explicitly +sudo ./go-alived run + +# Specify a configuration file +sudo ./go-alived run -c /etc/go-alived/config.yaml +sudo ./go-alived run --config config.yaml + +# Enable debug mode +sudo ./go-alived run -c config.yaml -d +sudo ./go-alived run --config config.yaml --debug + +# Short form (using global flags) +sudo ./go-alived -c config.yaml -d +``` + +## 3. 
Signal control + +```bash +# Reload the configuration (send SIGHUP) +sudo kill -HUP $(pgrep go-alived) + +# Or use systemctl (if installed as a service) +sudo systemctl reload go-alived + +# Graceful stop +sudo kill -TERM $(pgrep go-alived) +# or +sudo systemctl stop go-alived +``` + +## Command-Line Flags + +### Global flags (apply to all commands) + +``` +-c, --config string Path to configuration file (default: /etc/go-alived/config.yaml) +-d, --debug Enable debug logging +-h, --help Show help +-v, --version Show version +``` + +### run command flags + +``` +-c, --config string Path to configuration file (default: /etc/go-alived/config.yaml) +-d, --debug Enable debug logging +``` + +### test command flags + +``` +-i, --interface string Network interface to test (e.g., eth0) +-v, --vip string Test VIP (e.g., 192.168.1.100/24) +``` + +## Configuration File + +### Minimal configuration example + +```yaml +# config.mini.yaml - VRRP only +global: + router_id: "node1" + +vrrp_instances: + - name: "VI_1" + interface: "eth0" + state: "BACKUP" + virtual_router_id: 51 + priority: 100 + advert_interval: 1 + auth_type: "PASS" + auth_pass: "secret123" + virtual_ips: + - "192.168.1.100/24" +``` + +### Full configuration example + +```yaml +# config.example.yaml - includes health checking +global: + router_id: "node1" + notification_email: "admin@example.com" + +vrrp_instances: + - name: "VI_1" + interface: "eth0" + state: "BACKUP" + virtual_router_id: 51 + priority: 100 + advert_interval: 1 + auth_type: "PASS" + auth_pass: "secret123" + virtual_ips: + - "192.168.1.100/24" + - "192.168.1.101/24" + track_scripts: + - "check_nginx" + +health_checkers: + - name: "check_nginx" + type: "tcp" + interval: 3s + timeout: 2s + rise: 3 + fall: 2 + config: + host: "127.0.0.1" + port: 80 +``` + +## Deployment Options + +### Option 1: Run directly + +```bash +# Build +go build -o go-alived . 
+ +# Run the environment test +sudo ./go-alived test --interface eth0 + +# Start the service +sudo ./go-alived --config config.yaml --debug +``` + +### Option 2: Systemd service + +```bash +# Use the install script +sudo ./deployment/install.sh + +# Edit the configuration +sudo vim /etc/go-alived/config.yaml + +# Start the service +sudo systemctl start go-alived + +# Check status +sudo systemctl status go-alived + +# View logs +sudo journalctl -u go-alived -f + +# Enable start on boot +sudo systemctl enable go-alived +``` + +## Common Scenarios + +### Scenario 1: Web service high availability + +**Example configuration**: +```yaml +vrrp_instances: + - name: "WEB_HA" + interface: "eth0" + virtual_router_id: 51 + priority: 100 # primary node + virtual_ips: + - "192.168.1.100/24" + track_scripts: + - "check_nginx" + +health_checkers: + - name: "check_nginx" + type: "http" + interval: 3s + timeout: 2s + rise: 3 + fall: 2 + config: + url: "http://127.0.0.1/health" + expected_status: 200 +``` + +**How it works**: +1. While Nginx is healthy, the primary node (priority=100) holds the VIP +2. When Nginx fails, the health check fails and the primary node's priority is lowered (100-10=90) +3. The backup node takes over the VIP once its priority is higher than the primary's reduced priority (a backup at exactly priority=90 would only tie with 100-10=90, so it should be configured above the reduced value) +4. After Nginx recovers, the primary node's priority is restored and it takes the VIP back + +### Scenario 2: Database primary/standby + +**Primary node configuration**: +```yaml +vrrp_instances: + - name: "DB_MASTER" + interface: "eth0" + priority: 100 + virtual_ips: + - "192.168.1.200/24" + track_scripts: + - "check_mysql" + +health_checkers: + - name: "check_mysql" + type: "tcp" + interval: 5s + config: + host: "127.0.0.1" + port: 3306 +``` + +**Standby node configuration**: +```yaml +vrrp_instances: + - name: "DB_MASTER" + interface: "eth0" + priority: 90 # lower priority + virtual_ips: + - "192.168.1.200/24" + track_scripts: + - "check_mysql" +``` + +### Scenario 3: Load balancing across multiple VIPs + +```yaml +vrrp_instances: + - name: "VI_WEB" + virtual_router_id: 51 + priority: 100 + virtual_ips: + - "192.168.1.100/24" + + - name: "VI_API" + virtual_router_id: 52 + priority: 90 + virtual_ips: + - "192.168.1.101/24" +``` + +## Troubleshooting + +### View logs + +```bash +# Systemd logs +sudo journalctl -u go-alived -f + +# Show the last 100 lines +sudo journalctl -u go-alived -n 100 + +# Show a time range +sudo journalctl -u go-alived --since "1 hour ago" +``` + +### Packet capture debugging + +```bash +# Capture VRRP packets +sudo tcpdump -i eth0 proto 
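112 -v + +# Capture traffic for a specific VIP +sudo tcpdump -i eth0 host 192.168.1.100 + +# Capture multicast packets +sudo tcpdump -i eth0 dst 224.0.0.18 +``` + +### Test the VIP manually + +```bash +# Add the VIP +sudo ip addr add 192.168.1.100/24 dev eth0 + +# Send gratuitous ARP +sudo arping -c 3 -A -I eth0 192.168.1.100 + +# Verify +ip addr show eth0 | grep 192.168.1.100 + +# Remove the VIP +sudo ip addr del 192.168.1.100/24 dev eth0 +``` + +### Check interface state + +```bash +# List interfaces +ip link show + +# Show IP addresses +ip addr show eth0 + +# Show routes +ip route show + +# Show multicast groups +ip maddr show eth0 +``` + +## Performance Tuning + +### 1. Reduce the advertisement interval + +```yaml +advert_interval: 1 # default is 1 second; a smaller value gives faster failover +``` + +### 2. Adjust the health check frequency + +```yaml +health_checkers: + - interval: 2s # check more often + timeout: 1s # shorter timeout + rise: 2 # recover faster + fall: 2 # detect failures faster +``` + +### 3. Kernel parameter tuning + +```bash +# Allow binding to non-local IP addresses +echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind + +# ARP tuning +echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore +echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce +``` + +## Security Recommendations + +1. **Use a strong password**: set `auth_pass` to a complex password +2. **Network isolation**: put VRRP traffic on a dedicated VLAN +3. **Restrict access**: use a firewall to limit the sources of VRRP packets +4. **Log auditing**: review state-change logs regularly +5. **Configuration backup**: back up the configuration file regularly + +## More Resources + +- [GitHub repository](https://github.com/loveuer/go-alived) +- [Deployment guide](deployment/README.md) +- [Compatibility notes](deployment/COMPATIBILITY.md) +- [Testing guide](TESTING.md) \ No newline at end of file diff --git a/deployment/COMPATIBILITY.md b/deployment/COMPATIBILITY.md new file mode 100644 index 0000000..5af0240 --- /dev/null +++ b/deployment/COMPATIBILITY.md @@ -0,0 +1,269 @@ +# VRRP Environment Compatibility + +## Supported Environments + +### ✅ Fully supported +- **Physical servers**: all features fully supported +- **Local virtual machines (with correct network configuration)**: + - KVM/QEMU: fully supported + - Proxmox VE: fully supported + - VMware ESXi: promiscuous mode must be enabled + - VirtualBox: requires bridged networking + promiscuous mode + - Hyper-V: requires an external virtual switch + +### ⚠️ Partially supported +- **Some private cloud environments**: depends on the network configuration +- **Docker containers**: requires `--privileged` and `--net=host` +- **Kubernetes**: requires hostNetwork mode + +### ❌ Not supported +- **AWS EC2**: multicast is not supported, so VRRP cannot run +- **Aliyun ECS**: multicast is not supported, so VRRP cannot run +- **Azure VM**: not supported by default; requires special configuration +- **Google Cloud**: multicast is not supported by default +- **Most public clouds**: multicast is disabled at the network virtualization layer + +## Why Don't Cloud Environments Support VRRP? 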
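+ +1. **Multicast restrictions**: VRRP uses the IP multicast address 224.0.0.18, and the network virtualization layer in cloud environments usually filters multicast traffic +2. **Security considerations**: cloud providers do not want users to manage IP failover themselves, to avoid IP conflicts +3. **Network architecture**: SDN (software-defined networking) architectures do not support traditional MAC address migration + +## Alternatives in Cloud Environments + +### AWS +```yaml +Option 1: Elastic IP (EIP) +- Use the AWS API to attach/detach the EIP dynamically +- Combine with a health check script to implement failover + +Option 2: Application Load Balancer (ALB) +- Layer 7 load balancing +- Automatic health checks and failover + +Option 3: Network Load Balancer (NLB) +- Layer 4 load balancing +- Supports static IPs +``` + +### Aliyun +```yaml +Option 1: High-availability virtual IP (HaVip) +- The VRRP alternative provided by Aliyun +- Supports primary/standby switchover + +Option 2: Server Load Balancer (SLB) +- Layer 4 / layer 7 load balancing +- Automatic health checks +``` + +### Azure +```yaml +Option 1: Azure Load Balancer +- Standard load balancer +- Supports high availability + +Option 2: Azure Traffic Manager +- DNS-level traffic management +- Supports multi-region failover +``` + +## Virtualization Configuration Guide + +### VMware ESXi +1. Select the virtual machine +2. Edit Settings → Network Adapter +3. Expand "Advanced Options" +4. Promiscuous mode: **Accept** +5. MAC address changes: **Accept** +6. Forged transmits: **Accept** + +### VirtualBox +1. Virtual machine Settings → Network +2. Attached to: **Bridged Adapter** +3. Advanced → Promiscuous Mode: **Allow All** +4. Advanced → Cable Connected: **checked** + +### KVM/libvirt +```xml + + + +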
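+ +``` + +### Proxmox VE +The default configuration is sufficient; use the vmbr0 bridged network. + +## Using the Check Script + +Run the environment check script: + +```bash +# Run the check script +sudo ./deployment/check-env.sh +``` + +The script automatically checks: +1. ✓ Run privileges (root) +2. ✓ Operating system compatibility +3. ✓ Network interface state +4. ✓ Ability to add a VIP +5. ✓ VRRP protocol support +6. ✓ Firewall configuration +7. ✓ Kernel parameters +8. ✓ Conflicting services +9. ✓ Multicast support +10. ✓ Virtualization environment +11. ✓ Cloud environment restrictions + +## Troubleshooting + +### 1. The VIP cannot be added + +**Symptom**: the `ip addr add` command fails + +**Possible causes**: +- Insufficient privileges (root is required) +- IP address conflict +- The network interface does not exist or is not up +- Wrong subnet mask + +**Fix**: +```bash +# Check the interface state +ip link show eth0 + +# Check for an IP conflict +arping -I eth0 192.168.1.100 + +# Try adding it manually +sudo ip addr add 192.168.1.100/24 dev eth0 +``` + +### 2. The VIP is added but cannot be pinged + +**Possible causes**: +- The firewall blocks ICMP +- Incorrect routing configuration +- The ARP table has not been updated +- Network isolation (VLAN) + +**Fix**: +```bash +# Send gratuitous ARP +arping -c 3 -A -I eth0 192.168.1.100 + +# Check routes +ip route show + +# Check the firewall (case-insensitive match, since iptables prints the protocol in lowercase) +iptables -L -n | grep -i icmp +``` + +### 3. VRRP packets cannot be sent or received + +**Symptom**: the two nodes cannot elect a Master + +**Possible causes**: +- Multicast is filtered +- The firewall blocks protocol 112 +- Multicast is disabled on the network switch +- Promiscuous mode is not enabled on the virtual machine + +**Fix**: +```bash +# Capture packets to verify VRRP traffic +sudo tcpdump -i eth0 proto 112 -v + +# Check multicast group membership +ip maddr show eth0 + +# Add firewall rules +sudo iptables -A INPUT -p 112 -j ACCEPT +sudo iptables -A OUTPUT -p 112 -j ACCEPT +``` + +### 4. VRRP does not work in a cloud environment + +**How to confirm**: +```bash +# Run the check script +sudo ./deployment/check-env.sh + +# Check for a cloud environment manually +curl -s -m 1 http://169.254.169.254/latest/meta-data/instance-id +``` + +**Solution**: use the high-availability option offered by the cloud provider (see "Alternatives in Cloud Environments" above) + +## Network Requirements + +### Required +- [x] Layer 2 connectivity (same VLAN/subnet) +- [x] Multicast support (224.0.0.18) +- [x] ARP broadcasts allowed +- [x] NIC supports promiscuous mode (virtual machine environments) + +### Recommended +- [x] Gigabit or faster network +- [x] Low-latency network (< 10ms) +- [x] STP disabled or PortFast configured (on the switch) +- [x] Dedicated VLAN (production) + +## Test Steps + +### 1. Basic network test +```bash +# Test connectivity to the peer +ping -c 3 <peer IP> + +# Test multicast connectivity (requires two machines) +# Machine A +iperf3 -s -B 224.0.0.18 + +# Machine B +iperf3 -c 224.0.0.18 -u -b 1M +``` + +### 2. Manual VIP test +```bash +# Add the VIP +sudo ip addr add 192.168.1.100/24 dev eth0 + +# Send gratuitous ARP +sudo arping -c 3 -A -I eth0 192.168.1.100 + +# Ping the VIP from another machine +ping 192.168.1.100 + +# Remove the VIP +sudo ip addr del 192.168.1.100/24 dev eth0 +``` + +### 3. 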
VRRP functional test +```bash +# Start with the minimal configuration +sudo ./go-alived --config config.mini.yaml --debug + +# Watch the interface in another terminal +watch -n 1 "ip addr show eth0 | grep inet" + +# Capture packets to verify +sudo tcpdump -i eth0 proto 112 -v +``` + +## Production Deployment Recommendations + +1. **Use a dedicated network**: isolate VRRP traffic from application traffic +2. **Set up monitoring**: monitor VIP state and VRRP state changes +3. **Test failover**: verify primary/standby switchover regularly +4. **Keep documentation**: record the network topology, IP allocation, and incident procedures +5. **Back up the configuration**: back up the go-alived configuration file regularly + +## References + +- [VRRP RFC 3768](https://tools.ietf.org/html/rfc3768) +- [Linux ip command manual](https://man7.org/linux/man-pages/man8/ip.8.html) +- [iptables VRRP configuration](https://www.netfilter.org/) diff --git a/deployment/README.md b/deployment/README.md new file mode 100644 index 0000000..fe65d2b --- /dev/null +++ b/deployment/README.md @@ -0,0 +1,166 @@ +# go-alived Deployment + +This directory contains the deployment files and install scripts for go-alived. + +## Systemd Service + +### Installation steps + +1. **Build the binary** +```bash +go build -o go-alived . +``` + +2. **Install the binary** +```bash +sudo cp go-alived /usr/local/bin/ +sudo chmod +x /usr/local/bin/go-alived +``` + +3. **Create the configuration directories** +```bash +sudo mkdir -p /etc/go-alived +sudo mkdir -p /etc/go-alived/scripts +``` + +4. **Copy the configuration file** +```bash +sudo cp config.example.yaml /etc/go-alived/config.yaml +sudo vim /etc/go-alived/config.yaml # adjust the configuration for your environment +``` + +5. **Install the systemd service** +```bash +sudo cp deployment/go-alived.service /etc/systemd/system/ +sudo systemctl daemon-reload +``` + +6. 
**Start the service** +```bash +# Start the service +sudo systemctl start go-alived + +# Check status +sudo systemctl status go-alived + +# View logs +sudo journalctl -u go-alived -f + +# Enable start on boot +sudo systemctl enable go-alived +``` + +### Service management commands + +```bash +# Start the service +sudo systemctl start go-alived + +# Stop the service +sudo systemctl stop go-alived + +# Restart the service +sudo systemctl restart go-alived + +# Reload the configuration (sends SIGHUP) +sudo systemctl reload go-alived + +# Show service status +sudo systemctl status go-alived + +# Follow the logs +sudo journalctl -u go-alived -f + +# Show recent logs +sudo journalctl -u go-alived -n 100 + +# Enable start on boot +sudo systemctl enable go-alived + +# Disable start on boot +sudo systemctl disable go-alived +``` + +## Service File Reference + +### Main settings + +- **ExecStart**: the service start command, pointing to `/usr/local/bin/go-alived` +- **ExecReload**: the reload command (sends SIGHUP) +- **User/Group**: runs as root (raw socket and network interface management privileges are required) +- **Restart**: restarts automatically on failure, with a 5-second delay + +### Security settings + +- **Capabilities**: + - `CAP_NET_ADMIN`: manage network interfaces (add/remove IPs) + - `CAP_NET_RAW`: create raw sockets (VRRP protocol) + - `CAP_NET_BIND_SERVICE`: bind privileged ports (optional) + +- **Protection**: + - `ProtectSystem=strict`: keeps system directories read-only + - `ProtectHome=true`: protects user home directories + - `PrivateTmp=true`: uses a private temporary directory + - `ReadWritePaths=/etc/go-alived`: allows writes only to the configuration directory + +### Resource limits + +- `LimitNOFILE=65535`: maximum number of open files +- `LimitNPROC=512`: maximum number of processes + +## Configuration File Location + +Default configuration file location: `/etc/go-alived/config.yaml` + +Recommended directory layout: +``` +/etc/go-alived/ +├── config.yaml # Main configuration file +└── scripts/ # Scripts directory + ├── notify_master.sh # Master state notification script + ├── notify_backup.sh # Backup state notification script + ├── notify_fault.sh # Fault state notification script + └── check_service.sh # Health check script +``` + +## Uninstall + +```bash +# Stop and disable the service +sudo systemctl stop go-alived +sudo systemctl disable go-alived + +# Remove the service file +sudo rm /etc/systemd/system/go-alived.service +sudo systemctl daemon-reload + +# Remove the binary +sudo rm /usr/local/bin/go-alived + +# Remove the configuration files (optional) +sudo rm -rf /etc/go-alived +``` + +## Troubleshooting + +### Check service status +```bash +sudo systemctl status go-alived +``` + +### View detailed logs +```bash +sudo journalctl -u go-alived -n 100 --no-pager +``` + +### Test the configuration file +```bash 
+/usr/local/bin/go-alived --config /etc/go-alived/config.yaml --debug +``` + +### Common problems + +1. **Permission error**: make sure the service runs as root or has the CAP_NET_ADMIN/CAP_NET_RAW capabilities +2. **Interface does not exist**: check that the interface in the configuration file is correct +3. **Service conflict**: make sure no other keepalived or VRRP service is running +4. **Failed to add VIP**: check the network configuration and whether the IP address conflicts diff --git a/deployment/check-env.sh b/deployment/check-env.sh new file mode 100755 index 0000000..e7967ce --- /dev/null +++ b/deployment/check-env.sh @@ -0,0 +1,334 @@ +#!/bin/bash + +# VIP 环境检测脚本 +# 用于检测当前环境是否支持 VRRP 和 VIP 功能 + +set -e + +COLOR_RED='\033[0;31m' +COLOR_GREEN='\033[0;32m' +COLOR_YELLOW='\033[1;33m' +COLOR_BLUE='\033[0;34m' +COLOR_NC='\033[0m' + +ERRORS=0 +WARNINGS=0 + +echo -e "${COLOR_BLUE}=== go-alived 环境检测工具 ===${COLOR_NC}" +echo "" + +check_pass() { + echo -e "${COLOR_GREEN}✓${COLOR_NC} $1" +} + +check_fail() { + echo -e "${COLOR_RED}✗${COLOR_NC} $1" + ERRORS=$((ERRORS + 1)) +} + +check_warn() { + echo -e "${COLOR_YELLOW}⚠${COLOR_NC} $1" + WARNINGS=$((WARNINGS + 1)) +} + +# 1. 检查是否 root 用户 +echo "1. 检查运行权限..." +if [ "$EUID" -eq 0 ]; then + check_pass "以 root 用户运行" +else + check_fail "需要 root 权限,请使用 sudo 运行此脚本" +fi +echo "" + +# 2. 检查操作系统 +echo "2. 检查操作系统..." +OS=$(uname -s) +if [ "$OS" = "Linux" ]; then + check_pass "操作系统: $OS" + DISTRO=$(cat /etc/os-release | grep ^NAME= | cut -d'"' -f2 || echo "Unknown") + echo " 发行版: $DISTRO" +elif [ "$OS" = "Darwin" ]; then + check_warn "操作系统: macOS - 功能受限,仅支持部分 VRRP 功能" + echo " macOS 不支持某些 Linux 特有的网络功能" +else + check_fail "不支持的操作系统: $OS" +fi +echo "" + +# 3. 检查网络接口 +echo "3. 检查网络接口..." 
+read -p "请输入要使用的网卡名称(如 eth0, ens33, en0): " INTERFACE + +if ip link show "$INTERFACE" > /dev/null 2>&1; then + check_pass "网卡 $INTERFACE 存在" + + # 检查接口状态 + STATE=$(ip link show "$INTERFACE" | grep -o "state [A-Z]*" | awk '{print $2}') + if [ "$STATE" = "UP" ]; then + check_pass "网卡状态: UP" + else + check_fail "网卡状态: $STATE (需要是 UP)" + fi + + # 检查是否有 IPv4 地址 + IP_ADDR=$(ip -4 addr show "$INTERFACE" | grep "inet " | awk '{print $2}' | head -n1) + if [ -n "$IP_ADDR" ]; then + check_pass "网卡已配置 IPv4 地址: $IP_ADDR" + else + check_fail "网卡未配置 IPv4 地址" + fi +else + check_fail "网卡 $INTERFACE 不存在" + echo " 可用网卡列表:" + ip link show | grep "^[0-9]" | awk '{print " - " $2}' | sed 's/:$//' +fi +echo "" + +# 4. 检查 VIP 是否可以添加 +echo "4. 测试 VIP 添加功能..." +read -p "请输入要测试的 VIP (如 192.168.1.100/24): " TEST_VIP + +if [ -n "$TEST_VIP" ] && [ -n "$INTERFACE" ]; then + # 检查 VIP 格式 + if [[ $TEST_VIP =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/[0-9]+$ ]]; then + check_pass "VIP 格式正确: $TEST_VIP" + + # 尝试添加 VIP + if ip addr add "$TEST_VIP" dev "$INTERFACE" 2>/dev/null; then + check_pass "VIP 添加成功" + + # 验证 VIP 是否真的添加了 + if ip addr show "$INTERFACE" | grep -q "$TEST_VIP"; then + check_pass "VIP 已添加到网卡" + + # 测试 VIP 是否可达(本机 ping) + VIP_ADDR=$(echo $TEST_VIP | cut -d'/' -f1) + if ping -c 1 -W 1 "$VIP_ADDR" > /dev/null 2>&1; then + check_pass "VIP 可以 ping 通" + else + check_warn "VIP ping 失败(可能需要配置路由)" + fi + else + check_fail "VIP 添加后无法在网卡上找到" + fi + + # 清理:删除测试 VIP + echo " 清理测试 VIP..." + ip addr del "$TEST_VIP" dev "$INTERFACE" 2>/dev/null || true + check_pass "测试 VIP 已删除" + else + check_fail "VIP 添加失败(可能是权限问题或 IP 冲突)" + fi + else + check_fail "VIP 格式错误,正确格式: 192.168.1.100/24" + fi +fi +echo "" + +# 5. 检查 VRRP 协议支持 +echo "5. 检查 VRRP 协议支持..." 
+ +# 检查是否可以创建 raw socket +if [ "$OS" = "Linux" ]; then + if [ -e /proc/sys/net/ipv4/ip_forward ]; then + IP_FORWARD=$(cat /proc/sys/net/ipv4/ip_forward) + if [ "$IP_FORWARD" = "1" ]; then + check_pass "IP 转发已启用" + else + check_warn "IP 转发未启用(某些场景需要)" + echo " 启用命令: echo 1 > /proc/sys/net/ipv4/ip_forward" + fi + fi +fi + +# 检查防火墙 +echo "" +echo "6. 检查防火墙设置..." +if [ "$OS" = "Linux" ]; then + # 检查 iptables + if command -v iptables > /dev/null 2>&1; then + if iptables -L INPUT -n | grep -q "112"; then + check_pass "防火墙已允许 VRRP 协议 (112)" + else + check_warn "防火墙未配置 VRRP 规则" + echo " 添加规则: iptables -A INPUT -p 112 -j ACCEPT" + fi + fi + + # 检查 firewalld + if command -v firewall-cmd > /dev/null 2>&1; then + if systemctl is-active --quiet firewalld; then + if firewall-cmd --list-protocols | grep -q vrrp; then + check_pass "firewalld 已允许 VRRP 协议" + else + check_warn "firewalld 未配置 VRRP 规则" + echo " 添加规则: firewall-cmd --permanent --add-protocol=vrrp" + echo " 重载配置: firewall-cmd --reload" + fi + fi + fi +fi +echo "" + +# 7. 检查内核参数 +echo "7. 检查内核参数..." +if [ "$OS" = "Linux" ]; then + # 检查 ARP 相关参数 + if [ -e /proc/sys/net/ipv4/conf/all/arp_ignore ]; then + ARP_IGNORE=$(cat /proc/sys/net/ipv4/conf/all/arp_ignore) + if [ "$ARP_IGNORE" = "0" ]; then + check_pass "ARP 配置正常" + else + check_warn "ARP ignore 设置为 $ARP_IGNORE,可能影响 VIP" + fi + fi + + # 检查 rp_filter + if [ -e /proc/sys/net/ipv4/conf/all/rp_filter ]; then + RP_FILTER=$(cat /proc/sys/net/ipv4/conf/all/rp_filter) + if [ "$RP_FILTER" = "0" ] || [ "$RP_FILTER" = "2" ]; then + check_pass "反向路径过滤配置正常" + else + check_warn "rp_filter 设置为 $RP_FILTER,建议设置为 0 或 2" + echo " 修改命令: echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter" + fi + fi +fi +echo "" + +# 8. 检查是否有其他 VRRP 服务 +echo "8. 检查冲突的服务..." 
+CONFLICT_SERVICES=("keepalived" "vrrpd") +for service in "${CONFLICT_SERVICES[@]}"; do + if systemctl is-active --quiet "$service" 2>/dev/null; then + check_warn "Found a running $service service, which may conflict" + echo " To stop it: systemctl stop $service" + fi +done + +if pgrep -x "keepalived" > /dev/null; then + check_warn "Found a running keepalived process" +fi +echo "" + +# 9. Check multicast support +echo "9. Checking multicast support..." +if [ -n "$INTERFACE" ]; then + if ip maddr show "$INTERFACE" > /dev/null 2>&1; then + check_pass "Interface supports multicast" + + # Try to ping the VRRP multicast address + if timeout 2 ping -c 1 -I "$INTERFACE" 224.0.0.18 > /dev/null 2>&1; then + check_pass "Multicast packets can be sent" + else + check_warn "Sending multicast packets may be restricted (this is normal)" + fi + else + check_warn "Unable to query the multicast configuration" + fi +fi +echo "" + +# 10. Detect virtualization +echo "10. Checking virtualization environment..." +if [ -e /sys/class/dmi/id/product_name ]; then + PRODUCT=$(cat /sys/class/dmi/id/product_name 2>/dev/null || echo "Unknown") + case $PRODUCT in + *VMware*) + check_warn "Detected a VMware virtual machine" + echo " VMware requires promiscuous mode to support VRRP" + echo " Setting: VM -> Network Adapter -> Advanced -> Promiscuous Mode: Allow All" + ;; + *VirtualBox*) + check_warn "Detected a VirtualBox virtual machine" + echo " VirtualBox requires bridged mode with promiscuous mode enabled" + echo " Setting: Network -> Bridged Adapter -> Advanced -> Promiscuous Mode: Allow All" + ;; + *KVM*|*QEMU*) + check_pass "Detected a KVM/QEMU virtual machine (usually well supported)" + ;; + *Amazon*|*EC2*) + check_fail "Detected an AWS EC2 instance - VRRP is not supported" + echo " AWS does not support multicast; use an AWS Elastic IP instead" + ;; + *) + if [ "$PRODUCT" != "Unknown" ]; then + echo " Physical machine or unrecognized virtualization: $PRODUCT" + fi + ;; + esac +elif command -v systemd-detect-virt > /dev/null 2>&1; then + VIRT=$(systemd-detect-virt) + if [ "$VIRT" != "none" ]; then + check_warn "Detected virtualization environment: $VIRT" + fi +fi +echo "" + +# 11. Detect cloud environments +echo "11. Checking cloud environment restrictions..."
+CLOUD_DETECTED=0 + +# Check AWS +if curl -s -m 1 http://169.254.169.254/latest/meta-data/instance-id > /dev/null 2>&1; then + check_fail "Detected an AWS environment - VRRP is not supported" + echo " AWS does not support the VRRP protocol; use instead:" + echo " - Elastic IP (EIP) for IP failover" + echo " - Application Load Balancer (ALB)" + echo " - Network Load Balancer (NLB)" + CLOUD_DETECTED=1 +fi + +# Check Alibaba Cloud +if curl -s -m 1 http://100.100.100.200/latest/meta-data/instance-id > /dev/null 2>&1; then + check_fail "Detected Alibaba Cloud ECS - VRRP is not supported" + echo " Alibaba Cloud ECS does not support VRRP; use instead:" + echo " - Server Load Balancer (SLB)" + echo " - High-Availability Virtual IP (HaVip)" + CLOUD_DETECTED=1 +fi + +# Check Azure (URL quoted so the shell does not treat '?' as a glob character) +if curl -s -m 1 -H "Metadata: true" "http://169.254.169.254/metadata/instance?api-version=2021-02-01" > /dev/null 2>&1; then + check_warn "Detected an Azure environment - VRRP support is limited" + echo " Recommended on Azure:" + echo " - Azure Load Balancer" + echo " - Azure Traffic Manager" + CLOUD_DETECTED=1 +fi + +# Check GCP +if curl -s -m 1 -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/id > /dev/null 2>&1; then + check_warn "Detected a Google Cloud environment - VRRP support is limited" + echo " Recommended on GCP:" + echo " - Cloud Load Balancing" + echo " - Forwarding Rules" + CLOUD_DETECTED=1 +fi + +if [ $CLOUD_DETECTED -eq 0 ]; then + check_pass "No cloud environment restrictions detected" +fi +echo "" + +# Summary +echo "" +echo -e "${COLOR_BLUE}=== Summary ===${COLOR_NC}" +echo "" + +if [ $ERRORS -eq 0 ] && [ $WARNINGS -eq 0 ]; then + echo -e "${COLOR_GREEN}✓ Environment fully supports go-alived${COLOR_NC}" + echo " All features can be used normally" +elif [ $ERRORS -eq 0 ]; then + echo -e "${COLOR_YELLOW}⚠ Environment is mostly supported, with $WARNINGS warning(s)${COLOR_NC}" + echo " Fixing the warnings is recommended for better stability" +else + echo -e "${COLOR_RED}✗ Found $ERRORS error(s) and $WARNINGS warning(s)${COLOR_NC}" + echo " Fix the errors before using go-alived" +fi + +echo "" +echo "Documentation: https://github.com/loveuer/go-alived" +echo "" + +exit $ERRORS diff --git a/deployment/go-alived.service b/deployment/go-alived.service new file mode 100644 index 0000000..bbf4d40 --- /dev/null +++ b/deployment/go-alived.service @@ -0,0 +1,38 @@ +[Unit] +Description=Go-Alived - VRRP High Availability
Service +Documentation=https://github.com/loveuer/go-alived +After=network-online.target +Wants=network-online.target + +[Service] +Type=simple +User=root +Group=root + +ExecStart=/usr/local/bin/go-alived run --config /etc/go-alived/config.yaml +ExecReload=/bin/kill -HUP $MAINPID + +Restart=on-failure +RestartSec=5s + +StandardOutput=journal +StandardError=journal +SyslogIdentifier=go-alived + +# Security settings +NoNewPrivileges=false +PrivateTmp=true +ProtectSystem=strict +ProtectHome=true +ReadWritePaths=/etc/go-alived + +# Resource limits +LimitNOFILE=65535 +LimitNPROC=512 + +# Capabilities required for VRRP operations +AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW CAP_NET_BIND_SERVICE +CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_RAW CAP_NET_BIND_SERVICE + +[Install] +WantedBy=multi-user.target diff --git a/deployment/install.sh b/deployment/install.sh new file mode 100755 index 0000000..499059f --- /dev/null +++ b/deployment/install.sh @@ -0,0 +1,53 @@ +#!/bin/bash +set -e + +echo "=== Installing go-alived ===" + +if [ "$EUID" -ne 0 ]; then + echo "Please run as root (use sudo)" + exit 1 +fi + +BINARY_PATH="/usr/local/bin/go-alived" +CONFIG_DIR="/etc/go-alived" +SERVICE_FILE="/etc/systemd/system/go-alived.service" + +echo "1. Installing binary to ${BINARY_PATH}..." +if [ ! -f "go-alived" ]; then + echo "Error: go-alived binary not found. Please run 'go build' first." + exit 1 +fi +cp go-alived ${BINARY_PATH} +chmod +x ${BINARY_PATH} +echo " ✓ Binary installed" + +echo "2. Creating configuration directory ${CONFIG_DIR}..." +mkdir -p ${CONFIG_DIR} +mkdir -p ${CONFIG_DIR}/scripts +echo " ✓ Directories created" + +if [ ! -f "${CONFIG_DIR}/config.yaml" ]; then + echo "3. Installing example configuration..." + cp config.example.yaml ${CONFIG_DIR}/config.yaml + echo " ✓ Configuration installed to ${CONFIG_DIR}/config.yaml" + echo " ⚠ Please edit ${CONFIG_DIR}/config.yaml before starting the service" +else + echo "3.
Configuration already exists at ${CONFIG_DIR}/config.yaml" + echo " ⚠ Skipping configuration installation" +fi + +echo "4. Installing systemd service..." +cp deployment/go-alived.service ${SERVICE_FILE} +systemctl daemon-reload +echo " ✓ Service installed" + +echo "" +echo "=== Installation complete ===" +echo "" +echo "Next steps:" +echo " 1. Edit configuration: vim ${CONFIG_DIR}/config.yaml" +echo " 2. Start service: systemctl start go-alived" +echo " 3. Check status: systemctl status go-alived" +echo " 4. View logs: journalctl -u go-alived -f" +echo " 5. Enable autostart: systemctl enable go-alived" +echo "" diff --git a/deployment/uninstall.sh b/deployment/uninstall.sh new file mode 100755 index 0000000..1f42f2c --- /dev/null +++ b/deployment/uninstall.sh @@ -0,0 +1,53 @@ +#!/bin/bash +set -e + +echo "=== Uninstalling go-alived ===" + +if [ "$EUID" -ne 0 ]; then + echo "Please run as root (use sudo)" + exit 1 +fi + +BINARY_PATH="/usr/local/bin/go-alived" +CONFIG_DIR="/etc/go-alived" +SERVICE_FILE="/etc/systemd/system/go-alived.service" + +if systemctl is-active --quiet go-alived; then + echo "1. Stopping service..." + systemctl stop go-alived + echo " ✓ Service stopped" +fi + +if systemctl is-enabled --quiet go-alived 2>/dev/null; then + echo "2. Disabling service..." + systemctl disable go-alived + echo " ✓ Service disabled" +fi + +if [ -f "${SERVICE_FILE}" ]; then + echo "3. Removing service file..." + rm ${SERVICE_FILE} + systemctl daemon-reload + echo " ✓ Service file removed" +fi + +if [ -f "${BINARY_PATH}" ]; then + echo "4. Removing binary..." + rm ${BINARY_PATH} + echo " ✓ Binary removed" +fi + +echo "" +read -p "Do you want to remove configuration directory ${CONFIG_DIR}? 
(y/N) " -n 1 -r +echo +if [[ $REPLY =~ ^[Yy]$ ]]; then + if [ -d "${CONFIG_DIR}" ]; then + rm -rf ${CONFIG_DIR} + echo " ✓ Configuration removed" + fi +else + echo " ⚠ Configuration kept at ${CONFIG_DIR}" +fi + +echo "" +echo "=== Uninstallation complete ===" diff --git a/etc/config.example.yaml b/etc/config.example.yaml new file mode 100644 index 0000000..c2ebb7a --- /dev/null +++ b/etc/config.example.yaml @@ -0,0 +1,65 @@ +global: + router_id: "node1" + notification_email: "admin@example.com" + +vrrp_instances: + - name: "VI_1" + interface: "eth0" + state: "BACKUP" + virtual_router_id: 51 + priority: 100 + advert_interval: 1 + auth_type: "PASS" + auth_pass: "secret123" + virtual_ips: + - "192.168.1.100/24" + - "192.168.1.101/24" + notify_master: "/etc/go-alived/scripts/notify_master.sh" + notify_backup: "/etc/go-alived/scripts/notify_backup.sh" + notify_fault: "/etc/go-alived/scripts/notify_fault.sh" + track_scripts: + - "check_nginx" + +health_checkers: + - name: "check_nginx" + type: "tcp" + interval: 3s + timeout: 2s + rise: 3 + fall: 2 + config: + host: "127.0.0.1" + port: 80 + + - name: "check_web" + type: "http" + interval: 5s + timeout: 3s + rise: 2 + fall: 3 + config: + url: "http://127.0.0.1:80/health" + expected_status: 200 + method: "GET" + insecure_skip_verify: false + + - name: "check_ping" + type: "ping" + interval: 2s + timeout: 1s + rise: 2 + fall: 2 + config: + host: "8.8.8.8" + count: 1 + + - name: "check_service" + type: "script" + interval: 10s + timeout: 5s + rise: 1 + fall: 1 + config: + script: "/usr/local/bin/check_service.sh" + args: + - "nginx" diff --git a/etc/config.mini.yaml b/etc/config.mini.yaml new file mode 100644 index 0000000..474eae8 --- /dev/null +++ b/etc/config.mini.yaml @@ -0,0 +1,14 @@ +global: + router_id: "node1" + +vrrp_instances: + - name: "VI_1" + interface: "eth0" + state: "BACKUP" + virtual_router_id: 51 + priority: 100 + advert_interval: 1 + auth_type: "PASS" + auth_pass: "secret123" + virtual_ips: + - 
"192.168.1.100/24" diff --git a/go.mod b/go.mod new file mode 100644 index 0000000..23af7d4 --- /dev/null +++ b/go.mod @@ -0,0 +1,20 @@ +module github.com/loveuer/go-alived + +go 1.25.0 + +require ( + github.com/inconshreveable/mousetrap v1.1.0 // indirect + github.com/josharian/native v1.0.0 // indirect + github.com/mdlayher/arp v0.0.0-20220512170110-6706a2966875 // indirect + github.com/mdlayher/ethernet v0.0.0-20220221185849-529eae5b6118 // indirect + github.com/mdlayher/packet v1.0.0 // indirect + github.com/mdlayher/socket v0.2.1 // indirect + github.com/spf13/cobra v1.10.2 // indirect + github.com/spf13/pflag v1.0.9 // indirect + github.com/vishvananda/netlink v1.3.1 // indirect + github.com/vishvananda/netns v0.0.5 // indirect + golang.org/x/net v0.47.0 // indirect + golang.org/x/sync v0.0.0-20210220032951-036812b2e83c // indirect + golang.org/x/sys v0.38.0 // indirect + gopkg.in/yaml.v3 v3.0.1 // indirect +) diff --git a/go.sum b/go.sum new file mode 100644 index 0000000..5730267 --- /dev/null +++ b/go.sum @@ -0,0 +1,46 @@ +github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g= +github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= +github.com/google/go-cmp v0.5.7/go.mod h1:n+brtR0CgQNWTVd5ZUFpTBC8YFBDLK/h/bpaJ8/DtOE= +github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8= +github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw= +github.com/josharian/native v1.0.0 h1:Ts/E8zCSEsG17dUqv7joXJFybuMLjQfWE04tsBODTxk= +github.com/josharian/native v1.0.0/go.mod h1:7X/raswPFr05uY3HiLlYeyQntB6OO7E/d2Cu7qoaN2w= +github.com/mdlayher/arp v0.0.0-20220512170110-6706a2966875 h1:ql8x//rJsHMjS+qqEag8n3i4azw1QneKh5PieH9UEbY= +github.com/mdlayher/arp v0.0.0-20220512170110-6706a2966875/go.mod h1:kfOoFJuHWp76v1RgZCb9/gVUc7XdY877S2uVYbNliGc= +github.com/mdlayher/ethernet v0.0.0-20220221185849-529eae5b6118 
h1:2oDp6OOhLxQ9JBoUuysVz9UZ9uI6oLUbvAZu0x8o+vE= +github.com/mdlayher/ethernet v0.0.0-20220221185849-529eae5b6118/go.mod h1:ZFUnHIVchZ9lJoWoEGUg8Q3M4U8aNNWA3CVSUTkW4og= +github.com/mdlayher/packet v1.0.0 h1:InhZJbdShQYt6XV2GPj5XHxChzOfhJJOMbvnGAmOfQ8= +github.com/mdlayher/packet v1.0.0/go.mod h1:eE7/ctqDhoiRhQ44ko5JZU2zxB88g+JH/6jmnjzPjOU= +github.com/mdlayher/socket v0.2.1 h1:F2aaOwb53VsBE+ebRS9bLd7yPOfYUMC8lOODdCBDY6w= +github.com/mdlayher/socket v0.2.1/go.mod h1:QLlNPkFR88mRUNQIzRBMfXxwKal8H7u1h3bL1CV+f0E= +github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM= +github.com/spf13/cobra v1.10.2 h1:DMTTonx5m65Ic0GOoRY2c16WCbHxOOw6xxezuLaBpcU= +github.com/spf13/cobra v1.10.2/go.mod h1:7C1pvHqHw5A4vrJfjNwvOdzYu0Gml16OCs2GRiTUUS4= +github.com/spf13/pflag v1.0.9 h1:9exaQaMOCwffKiiiYk6/BndUBv+iRViNW+4lEMi0PvY= +github.com/spf13/pflag v1.0.9/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg= +github.com/vishvananda/netlink v1.3.1 h1:3AEMt62VKqz90r0tmNhog0r/PpWKmrEShJU0wJW6bV0= +github.com/vishvananda/netlink v1.3.1/go.mod h1:ARtKouGSTGchR8aMwmkzC0qiNPrrWO5JS/XMVl45+b4= +github.com/vishvananda/netns v0.0.5 h1:DfiHV+j8bA32MFM7bfEunvT8IAqQ/NzSJHtcmW5zdEY= +github.com/vishvananda/netns v0.0.5/go.mod h1:SpkAiCQRtJ6TvvxPnOSyH3BMl6unz3xZlaprSwhNNJM= +go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg= +golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= +golang.org/x/net v0.0.0-20190503192946-f4e77d36d62c/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= +golang.org/x/net v0.0.0-20190603091049-60506f45cf65 h1:+rhAzEzT3f4JtomfC371qB+0Ola2caSKcY69NUBZrRQ= +golang.org/x/net v0.0.0-20190603091049-60506f45cf65/go.mod h1:HSz+uSET+XFnRR8LxR5pz3Of3rY3CfYBVs4xY44aLks= +golang.org/x/net v0.47.0 h1:Mx+4dIFzqraBXUugkia1OOvlD6LemFo1ALMHjrXDOhY= +golang.org/x/net v0.47.0/go.mod h1:/jNxtkgq5yWUGYkaZGqo27cfGZ1c5Nen03aYrrKpVRU= +golang.org/x/sync 
v0.0.0-20210220032951-036812b2e83c h1:5KslGYwFpkhGh+Q16bwMP3cOontH8FOep7tGV86Y7SQ= +golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= +golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= +golang.org/x/sys v0.0.0-20210927094055-39ccf1dd6fa6/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.0.0-20220209214540-3681064d5158/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.2.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.10.0 h1:SqMFp9UcQJZa+pmYuAKjd9xq1f0j5rLcDIk0mj4qAsA= +golang.org/x/sys v0.10.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.38.0 h1:3yZWxaJjBmCWXqhN1qh02AkOnCQ1poK6oF+a7xWL6Gc= +golang.org/x/sys v0.38.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks= +golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= +golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= +gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= diff --git a/internal/cmd/root.go b/internal/cmd/root.go new file mode 100644 index 0000000..57516a6 --- /dev/null +++ b/internal/cmd/root.go @@ -0,0 +1,25 @@ +package cmd + +import ( + "os" + + "github.com/spf13/cobra" +) + +var rootCmd = &cobra.Command{ + Use: "go-alived", + Short: "Go-Alived - VRRP High Availability Service", + Long: `go-alived is a lightweight, dependency-free VRRP implementation in Go. 
+It provides high availability for IP addresses with health checking support.`, + Version: "1.0.0", +} + +func Execute() { + if err := rootCmd.Execute(); err != nil { + os.Exit(1) + } +} + +func init() { + rootCmd.CompletionOptions.DisableDefaultCmd = true +} \ No newline at end of file diff --git a/internal/cmd/run.go b/internal/cmd/run.go new file mode 100644 index 0000000..162cc70 --- /dev/null +++ b/internal/cmd/run.go @@ -0,0 +1,133 @@ +package cmd + +import ( + "os" + "os/signal" + "syscall" + + "github.com/loveuer/go-alived/internal/health" + "github.com/loveuer/go-alived/internal/vrrp" + "github.com/loveuer/go-alived/pkg/config" + "github.com/loveuer/go-alived/pkg/logger" + "github.com/spf13/cobra" +) + +var ( + configFile string + debug bool +) + +var runCmd = &cobra.Command{ + Use: "run", + Short: "Run the VRRP service", + Long: `Start the go-alived VRRP service with health checking.`, + Run: runService, +} + +func init() { + rootCmd.AddCommand(runCmd) + + runCmd.Flags().StringVarP(&configFile, "config", "c", "/etc/go-alived/config.yaml", "path to configuration file") + runCmd.Flags().BoolVarP(&debug, "debug", "d", false, "enable debug mode") +} + +func runService(cmd *cobra.Command, args []string) { + log := logger.New(debug) + + log.Info("starting go-alived...") + log.Info("loading configuration from: %s", configFile) + + cfg, err := config.Load(configFile) + if err != nil { + log.Error("failed to load configuration: %v", err) + os.Exit(1) + } + + log.Info("configuration loaded successfully") + log.Debug("config: %+v", cfg) + + healthMgr, err := health.LoadFromConfig(cfg, log) + if err != nil { + log.Error("failed to load health check configuration: %v", err) + os.Exit(1) + } + + vrrpMgr := vrrp.NewManager(log) + if err := vrrpMgr.LoadFromConfig(cfg); err != nil { + log.Error("failed to load VRRP configuration: %v", err) + os.Exit(1) + } + + setupHealthTracking(vrrpMgr, healthMgr, log) + + healthMgr.StartAll() + + if err := vrrpMgr.StartAll(); err != 
nil { + log.Error("failed to start VRRP instances: %v", err) + os.Exit(1) + } + + sigChan := make(chan os.Signal, 1) + signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM, syscall.SIGHUP) + + for { + sig := <-sigChan + switch sig { + case syscall.SIGHUP: + log.Info("received SIGHUP, reloading configuration...") + newCfg, err := config.Load(configFile) + if err != nil { + log.Error("failed to reload configuration: %v", err) + continue + } + if err := vrrpMgr.Reload(newCfg); err != nil { + log.Error("failed to reload VRRP: %v", err) + continue + } + cfg = newCfg + log.Info("configuration reloaded successfully") + case syscall.SIGINT, syscall.SIGTERM: + log.Info("received signal %v, shutting down...", sig) + cleanup(log, vrrpMgr, healthMgr) + os.Exit(0) + } + } +} + +func cleanup(log *logger.Logger, vrrpMgr *vrrp.Manager, healthMgr *health.Manager) { + log.Info("cleaning up resources...") + healthMgr.StopAll() + vrrpMgr.StopAll() +} + +func setupHealthTracking(vrrpMgr *vrrp.Manager, healthMgr *health.Manager, log *logger.Logger) { + instances := vrrpMgr.GetAllInstances() + + for _, inst := range instances { + for _, trackScript := range inst.TrackScripts { + monitor, ok := healthMgr.GetMonitor(trackScript) + if !ok { + log.Warn("[%s] track_script '%s' not found in health checkers", inst.Name, trackScript) + continue + } + + instanceName := inst.Name + monitor.OnStateChange(func(checkerName string, oldHealthy, newHealthy bool) { + vrrpInst, ok := vrrpMgr.GetInstance(instanceName) + if !ok { + return + } + + if newHealthy && !oldHealthy { + log.Info("[%s] health check '%s' recovered, resetting priority", instanceName, checkerName) + vrrpInst.ResetPriority() + } else if !newHealthy && oldHealthy { + log.Warn("[%s] health check '%s' failed, decreasing priority", instanceName, checkerName) + vrrpInst.AdjustPriority(-10) + } + }) + + log.Info("[%s] tracking health check: %s", inst.Name, trackScript) + } + } +} diff --git a/internal/cmd/test.go b/internal/cmd/test.go new 
file mode 100644 index 0000000..0aecdba --- /dev/null +++ b/internal/cmd/test.go @@ -0,0 +1,470 @@ +package cmd + +import ( + "fmt" + "net" + "os" + "os/exec" + "strings" + "time" + + "github.com/loveuer/go-alived/pkg/logger" + "github.com/loveuer/go-alived/pkg/netif" + "github.com/spf13/cobra" +) + +type TestResult struct { + Name string + Pass bool + Message string + Fatal bool +} + +type EnvironmentTest struct { + log *logger.Logger + results []TestResult + errors int + warns int +} + +func NewEnvironmentTest(log *logger.Logger) *EnvironmentTest { + return &EnvironmentTest{ + log: log, + results: make([]TestResult, 0), + } +} + +func (t *EnvironmentTest) AddResult(name string, pass bool, message string, fatal bool) { + t.results = append(t.results, TestResult{ + Name: name, + Pass: pass, + Message: message, + Fatal: fatal, + }) + + if !pass { + if fatal { + t.errors++ + } else { + t.warns++ + } + } +} + +func (t *EnvironmentTest) TestRootPermission() { + t.log.Info("checking privileges...") + + if os.Geteuid() != 0 { + t.AddResult("Root privileges", false, "root privileges are required; please use sudo", true) + } else { + t.AddResult("Root privileges", true, "running as root", false) + } +} + +func (t *EnvironmentTest) TestNetworkInterface(ifaceName string) string { + t.log.Info("checking network interface...") + + if ifaceName == "" { + interfaces, err := net.Interfaces() + if err != nil { + t.AddResult("Network interface", false, "unable to list network interfaces", true) + return "" + } + + for _, iface := range interfaces { + if iface.Flags&net.FlagUp != 0 && iface.Flags&net.FlagLoopback == 0 { + addrs, err := iface.Addrs() + if err == nil && len(addrs) > 0 { + for _, addr := range addrs { + if ipnet, ok := addr.(*net.IPNet); ok && ipnet.IP.To4() != nil { + ifaceName = iface.Name + t.log.Info("auto-selected interface: %s", ifaceName) + break + } + } + } + if ifaceName != "" { + break + } + } + } + + if ifaceName == "" { + t.AddResult("Network interface", false, "no usable network interface found", true) + return "" + } + } + + iface, err := netif.GetInterface(ifaceName) + if err != nil { + t.AddResult("Network interface", false,
fmt.Sprintf("interface %s does not exist", ifaceName), true) + return "" + } + + if !iface.IsUp() { + t.AddResult("Interface state", false, fmt.Sprintf("interface %s is not up", ifaceName), true) + return "" + } + + t.AddResult("Network interface", true, fmt.Sprintf("interface %s exists and is up", ifaceName), false) + return ifaceName +} + +func (t *EnvironmentTest) TestVIPOperations(ifaceName, testVIP string) { + t.log.Info("testing VIP add/delete...") + + if ifaceName == "" || testVIP == "" { + t.AddResult("VIP operations", false, "interface name or test VIP is empty", true) + return + } + + iface, err := netif.GetInterface(ifaceName) + if err != nil { + t.AddResult("VIP operations", false, fmt.Sprintf("failed to get interface: %v", err), true) + return + } + + if !strings.Contains(testVIP, "/") { + testVIP = testVIP + "/32" + } + + exists, _ := iface.HasIP(testVIP) + if exists { + t.AddResult("VIP operations", false, fmt.Sprintf("VIP %s already exists; please test with a different IP", testVIP), true) + return + } + + err = iface.AddIP(testVIP) + if err != nil { + t.AddResult("VIP add", false, fmt.Sprintf("failed to add VIP: %v", err), true) + return + } + + t.AddResult("VIP add", true, fmt.Sprintf("added VIP %s", testVIP), false) + + time.Sleep(100 * time.Millisecond) + + exists, _ = iface.HasIP(testVIP) + if !exists { + t.AddResult("VIP verify", false, "VIP not found on the interface after adding it", true) + iface.DeleteIP(testVIP) + return + } + + t.AddResult("VIP verify", true, "VIP is present on the interface", false) + + vipAddr := strings.Split(testVIP, "/")[0] + cmd := exec.Command("ping", "-c", "1", "-W", "1", vipAddr) + err = cmd.Run() + if err != nil { + t.AddResult("VIP reachability", false, "VIP ping failed (routing may need to be configured)", false) + } else { + t.AddResult("VIP reachability", true, "VIP responds to ping", false) + } + + err = iface.DeleteIP(testVIP) + if err != nil { + t.AddResult("VIP delete", false, fmt.Sprintf("failed to delete VIP: %v", err), false) + } else { + t.AddResult("VIP delete", true, "VIP deleted", false) + } +} + +func (t *EnvironmentTest) TestMulticast(ifaceName string) { + t.log.Info("checking multicast support...") + + if ifaceName == "" { + t.AddResult("Multicast", false, "interface name is empty; skipping check", false) + return + } + + cmd := exec.Command("ip", "maddr", "show", ifaceName) + output,
err := cmd.CombinedOutput() + if err != nil { + t.AddResult("Multicast", false, "unable to query the multicast configuration", false) + return + } + + if len(output) > 0 { + t.AddResult("Multicast", true, "interface supports multicast", false) + } else { + t.AddResult("Multicast", false, "multicast support on the interface is unknown", false) + } +} + +func (t *EnvironmentTest) TestFirewall() { + t.log.Info("checking firewall settings...") + + cmd := exec.Command("iptables", "-L", "INPUT", "-n") + output, err := cmd.CombinedOutput() + if err != nil { + t.AddResult("Firewall check", false, "unable to query iptables rules (iptables may not be installed)", false) + return + } + + if strings.Contains(string(output), "112") || strings.Contains(string(output), "vrrp") { + t.AddResult("Firewall VRRP", true, "firewall has a VRRP rule", false) + } else { + t.AddResult("Firewall VRRP", false, "firewall has no VRRP rule; consider adding: iptables -A INPUT -p 112 -j ACCEPT", false) + } + + cmd = exec.Command("systemctl", "is-active", "firewalld") + err = cmd.Run() + if err == nil { + cmd = exec.Command("firewall-cmd", "--list-protocols") + output, err = cmd.CombinedOutput() + if err == nil { + if strings.Contains(string(output), "vrrp") { + t.AddResult("Firewalld VRRP", true, "firewalld allows the VRRP protocol", false) + } else { + t.AddResult("Firewalld VRRP", false, "firewalld has no VRRP rule; consider: firewall-cmd --permanent --add-protocol=vrrp", false) + } + } + } +} + +func (t *EnvironmentTest) TestKernelParameters() { + t.log.Info("checking kernel parameters...") + + params := map[string]string{ + "/proc/sys/net/ipv4/ip_forward": "1", + "/proc/sys/net/ipv4/conf/all/arp_ignore": "0", + "/proc/sys/net/ipv4/conf/all/arp_announce": "0", + } + + for path, expected := range params { + data, err := os.ReadFile(path) + if err != nil { + continue + } + + value := strings.TrimSpace(string(data)) + name := strings.TrimPrefix(path, "/proc/sys/net/ipv4/") + + if value == expected { + t.AddResult(name, true, fmt.Sprintf("%s = %s (OK)", name, value), false) + } else { + if name == "ip_forward" && value != "1" { + t.AddResult(name, false, fmt.Sprintf("%s = %s (recommended: 1)", name, value), false) + } + } + } +} + +func (t *EnvironmentTest)
TestConflictingServices() { + t.log.Info("checking for conflicting services...") + + services := []string{"keepalived"} + hasConflict := false + + for _, service := range services { + cmd := exec.Command("systemctl", "is-active", service) + err := cmd.Run() + if err == nil { + t.AddResult("Service conflict", false, fmt.Sprintf("found a running %s service, which may conflict", service), false) + hasConflict = true + } + } + + cmd := exec.Command("pgrep", "-x", "keepalived") + err := cmd.Run() + if err == nil { + t.AddResult("Process conflict", false, "found a running keepalived process", false) + hasConflict = true + } + + if !hasConflict { + t.AddResult("Service conflict", true, "no conflicting services found", false) + } +} + +func (t *EnvironmentTest) TestVirtualization() { + t.log.Info("checking virtualization environment...") + + productFile := "/sys/class/dmi/id/product_name" + data, err := os.ReadFile(productFile) + if err != nil { + cmd := exec.Command("systemd-detect-virt") + output, err := cmd.CombinedOutput() + if err == nil { + virt := strings.TrimSpace(string(output)) + if virt != "none" { + t.AddResult("Virtualization", true, fmt.Sprintf("detected virtualization environment: %s", virt), false) + t.log.Warn("virtualized environments may need special configuration (e.g. enabling promiscuous mode)") + } else { + t.AddResult("Virtualization", true, "physical machine", false) + } + } + return + } + + product := strings.TrimSpace(string(data)) + switch { + case strings.Contains(product, "VMware"): + t.AddResult("Virtualization", true, "VMware virtual machine (promiscuous mode must be enabled)", false) + t.log.Warn("VMware requires: VM Settings -> Network Adapter -> Advanced -> Promiscuous Mode: Allow All") + case strings.Contains(product, "VirtualBox"): + t.AddResult("Virtualization", true, "VirtualBox virtual machine (requires bridged mode + promiscuous mode)", false) + t.log.Warn("VirtualBox requires: Network -> Bridged Adapter -> Advanced -> Promiscuous Mode: Allow All") + case strings.Contains(product, "KVM") || strings.Contains(product, "QEMU"): + t.AddResult("Virtualization", true, "KVM/QEMU virtual machine (usually well supported)", false) + case strings.Contains(product, "Amazon") || strings.Contains(product, "EC2"): + t.AddResult("Virtualization", false, "AWS EC2 environment - VRRP is not supported", true) + t.log.Error("AWS does not support multicast, so VRRP cannot run; use an Elastic IP or a load balancer") + default: + t.AddResult("Virtualization", true, fmt.Sprintf("environment: %s", product), false) + } +} + +func (t *EnvironmentTest)
TestCloudEnvironment() { + t.log.Info("checking cloud environment...") + + cloudTests := []struct { + name string + url string + headers map[string]string + isFatal bool + solution string + }{ + { + name: "AWS", + url: "http://169.254.169.254/latest/meta-data/instance-id", + solution: "AWS does not support VRRP; use Elastic IP, ALB, or NLB instead", + isFatal: true, + }, + { + name: "Alibaba Cloud", + url: "http://100.100.100.200/latest/meta-data/instance-id", + solution: "Alibaba Cloud ECS does not support VRRP; use Server Load Balancer (SLB) or High-Availability Virtual IP (HaVip) instead", + isFatal: true, + }, + { + name: "Azure", + url: "http://169.254.169.254/metadata/instance?api-version=2021-02-01", + headers: map[string]string{"Metadata": "true"}, + solution: "recommended on Azure: Azure Load Balancer or Traffic Manager", + isFatal: false, + }, + { + name: "Google Cloud", + url: "http://metadata.google.internal/computeMetadata/v1/instance/id", + headers: map[string]string{"Metadata-Flavor": "Google"}, + solution: "recommended on GCP: Cloud Load Balancing", + isFatal: false, + }, + } + + cloudDetected := false + for _, test := range cloudTests { + cmd := exec.Command("curl", "-s", "-m", "1", test.url) + if len(test.headers) > 0 { + for k, v := range test.headers { + cmd.Args = append(cmd.Args, "-H", fmt.Sprintf("%s: %s", k, v)) + } + } + + err := cmd.Run() + if err == nil { + cloudDetected = true + t.AddResult("Cloud environment", !test.isFatal, fmt.Sprintf("detected %s environment", test.name), test.isFatal) + t.log.Warn(test.solution) + } + } + + if !cloudDetected { + t.AddResult("Cloud environment", true, "no public cloud restrictions detected", false) + } +} + +func (t *EnvironmentTest) PrintResults() { + fmt.Println() + fmt.Println("=== Test Results ===") + fmt.Println() + + for _, result := range t.results { + status := "✓" + if !result.Pass { + if result.Fatal { + status = "✗" + } else { + status = "⚠" + } + } + + fmt.Printf("%s %-20s %s\n", status, result.Name, result.Message) + } + + fmt.Println() + fmt.Println("=== Summary ===") + fmt.Println() + + if t.errors == 0 && t.warns == 0 { + fmt.Println("✓ Environment fully supports go-alived") + fmt.Println(" All features can be used normally") + } else if t.errors == 0 { +
fmt.Printf("⚠ Environment is mostly supported, with %d warning(s)\n", t.warns) + fmt.Println(" Fixing the warnings is recommended for better stability") + } else { + fmt.Printf("✗ Found %d error(s) and %d warning(s)\n", t.errors, t.warns) + fmt.Println(" Fix the errors before using go-alived") + } + + fmt.Println() +} + +func (t *EnvironmentTest) HasErrors() bool { + return t.errors > 0 +} + +var ( + testIface string + testVIP string +) + +var testCmd = &cobra.Command{ + Use: "test", + Short: "Test environment for VRRP support", + Long: `Test the current environment to verify if it supports VRRP functionality. +This includes checking permissions, network interfaces, VIP operations, multicast support, and more.`, + Run: runTest, +} + +func init() { + rootCmd.AddCommand(testCmd) + + testCmd.Flags().StringVarP(&testIface, "interface", "i", "", "network interface to test (auto-detect if not specified)") + testCmd.Flags().StringVarP(&testVIP, "vip", "v", "", "test VIP address (e.g., 192.168.1.100/24)") +} + +func runTest(cmd *cobra.Command, args []string) { + log := logger.New(false) + + fmt.Println("=== go-alived Environment Test ===") + fmt.Println() + + test := NewEnvironmentTest(log) + + test.TestRootPermission() + + selectedIface := test.TestNetworkInterface(testIface) + + if selectedIface != "" && testVIP != "" { + test.TestVIPOperations(selectedIface, testVIP) + } + + if selectedIface != "" { + test.TestMulticast(selectedIface) + } + + test.TestFirewall() + test.TestKernelParameters() + test.TestConflictingServices() + test.TestVirtualization() + test.TestCloudEnvironment() + + test.PrintResults() + + if test.HasErrors() { + os.Exit(1) + } +} \ No newline at end of file diff --git a/internal/health/checker.go b/internal/health/checker.go new file mode 100644 index 0000000..9d1504f --- /dev/null +++ b/internal/health/checker.go @@ -0,0 +1,89 @@ +package health + +import ( + "context" + "time" +) + +type CheckResult int + +const ( + CheckResultUnknown CheckResult = iota + CheckResultSuccess + CheckResultFailure +) + +func (r CheckResult) String() string { + switch r { + case
CheckResultSuccess: + return "SUCCESS" + case CheckResultFailure: + return "FAILURE" + default: + return "UNKNOWN" + } +} + +type Checker interface { + Check(ctx context.Context) CheckResult + Name() string + Type() string +} + +type CheckerConfig struct { + Name string + Type string + Interval time.Duration + Timeout time.Duration + Rise int + Fall int + Config map[string]interface{} +} + +type CheckerState struct { + Name string + Healthy bool + LastResult CheckResult + LastCheckTime time.Time + SuccessCount int + FailureCount int + TotalChecks int + ConsecutiveOK int + ConsecutiveFail int +} + +func (s *CheckerState) IsHealthy() bool { + return s.Healthy +} + +func (s *CheckerState) Update(result CheckResult, rise, fall int) bool { + s.LastResult = result + s.LastCheckTime = time.Now() + s.TotalChecks++ + + oldHealthy := s.Healthy + + switch result { + case CheckResultSuccess: + s.SuccessCount++ + s.ConsecutiveOK++ + s.ConsecutiveFail = 0 + + if !s.Healthy && s.ConsecutiveOK >= rise { + s.Healthy = true + } + + case CheckResultFailure: + s.FailureCount++ + s.ConsecutiveFail++ + s.ConsecutiveOK = 0 + + if s.Healthy && s.ConsecutiveFail >= fall { + s.Healthy = false + } + } + + return s.Healthy != oldHealthy +} + +type StateChangeCallback func(name string, oldHealthy, newHealthy bool) diff --git a/internal/health/factory.go b/internal/health/factory.go new file mode 100644 index 0000000..d7b69ac --- /dev/null +++ b/internal/health/factory.go @@ -0,0 +1,56 @@ +package health + +import ( + "fmt" + + "github.com/loveuer/go-alived/pkg/config" + "github.com/loveuer/go-alived/pkg/logger" +) + +func CreateChecker(cfg *config.HealthChecker) (Checker, error) { + configMap, ok := cfg.Config.(map[string]interface{}) + if !ok { + return nil, fmt.Errorf("invalid config for checker %s", cfg.Name) + } + + switch cfg.Type { + case "tcp": + return NewTCPChecker(cfg.Name, configMap) + case "http", "https": + return NewHTTPChecker(cfg.Name, configMap) + case "ping", "icmp": + return 
NewPingChecker(cfg.Name, configMap) + case "script": + return NewScriptChecker(cfg.Name, configMap) + default: + return nil, fmt.Errorf("unsupported checker type: %s", cfg.Type) + } +} + +func LoadFromConfig(cfg *config.Config, log *logger.Logger) (*Manager, error) { + manager := NewManager(log) + + for _, healthCfg := range cfg.Health { + checker, err := CreateChecker(&healthCfg) + if err != nil { + return nil, fmt.Errorf("failed to create checker %s: %w", healthCfg.Name, err) + } + + monitorCfg := &CheckerConfig{ + Name: healthCfg.Name, + Type: healthCfg.Type, + Interval: healthCfg.Interval, + Timeout: healthCfg.Timeout, + Rise: healthCfg.Rise, + Fall: healthCfg.Fall, + Config: healthCfg.Config.(map[string]interface{}), + } + + monitor := NewMonitor(checker, monitorCfg, log) + manager.AddMonitor(monitor) + + log.Info("loaded health checker: %s (type=%s)", healthCfg.Name, healthCfg.Type) + } + + return manager, nil +} diff --git a/internal/health/http.go b/internal/health/http.go new file mode 100644 index 0000000..d843cf6 --- /dev/null +++ b/internal/health/http.go @@ -0,0 +1,90 @@ +package health + +import ( + "context" + "crypto/tls" + "fmt" + "net/http" + "time" +) + +type HTTPChecker struct { + name string + url string + method string + expectedStatus int + client *http.Client +} + +func NewHTTPChecker(name string, config map[string]interface{}) (*HTTPChecker, error) { + url, ok := config["url"].(string) + if !ok { + return nil, fmt.Errorf("http checker: missing or invalid 'url' field") + } + + method := "GET" + if m, ok := config["method"].(string); ok { + method = m + } + + expectedStatus := 200 + if status, ok := config["expected_status"]; ok { + switch v := status.(type) { + case int: + expectedStatus = v + case float64: + expectedStatus = int(v) + } + } + + insecureSkipVerify := false + if skip, ok := config["insecure_skip_verify"].(bool); ok { + insecureSkipVerify = skip + } + + transport := &http.Transport{ + TLSClientConfig: &tls.Config{ + 
InsecureSkipVerify: insecureSkipVerify, + }, + } + + client := &http.Client{ + Transport: transport, + Timeout: 30 * time.Second, + } + + return &HTTPChecker{ + name: name, + url: url, + method: method, + expectedStatus: expectedStatus, + client: client, + }, nil +} + +func (c *HTTPChecker) Name() string { + return c.name +} + +func (c *HTTPChecker) Type() string { + return "http" +} + +func (c *HTTPChecker) Check(ctx context.Context) CheckResult { + req, err := http.NewRequestWithContext(ctx, c.method, c.url, nil) + if err != nil { + return CheckResultFailure + } + + resp, err := c.client.Do(req) + if err != nil { + return CheckResultFailure + } + defer resp.Body.Close() + + if resp.StatusCode == c.expectedStatus { + return CheckResultSuccess + } + + return CheckResultFailure +} diff --git a/internal/health/monitor.go b/internal/health/monitor.go new file mode 100644 index 0000000..331ddca --- /dev/null +++ b/internal/health/monitor.go @@ -0,0 +1,192 @@ +package health + +import ( + "context" + "sync" + "time" + + "github.com/loveuer/go-alived/pkg/logger" +) + +type Monitor struct { + checker Checker + config *CheckerConfig + state *CheckerState + log *logger.Logger + callbacks []StateChangeCallback + + running bool + stopCh chan struct{} + wg sync.WaitGroup + mu sync.RWMutex +} + +func NewMonitor(checker Checker, config *CheckerConfig, log *logger.Logger) *Monitor { + return &Monitor{ + checker: checker, + config: config, + state: &CheckerState{ + Name: config.Name, + Healthy: false, + }, + log: log, + callbacks: make([]StateChangeCallback, 0), + stopCh: make(chan struct{}), + } +} + +func (m *Monitor) Start() { + m.mu.Lock() + if m.running { + m.mu.Unlock() + return + } + m.running = true + m.mu.Unlock() + + m.log.Info("[HealthCheck:%s] starting health check monitor (interval=%s, timeout=%s)", + m.config.Name, m.config.Interval, m.config.Timeout) + + m.wg.Add(1) + go m.checkLoop() +} + +func (m *Monitor) Stop() { + m.mu.Lock() + if !m.running { + m.mu.Unlock() + 
return + } + m.running = false + m.mu.Unlock() + + m.log.Info("[HealthCheck:%s] stopping health check monitor", m.config.Name) + close(m.stopCh) + m.wg.Wait() +} + +func (m *Monitor) checkLoop() { + defer m.wg.Done() + + ticker := time.NewTicker(m.config.Interval) + defer ticker.Stop() + + m.performCheck() + + for { + select { + case <-m.stopCh: + return + case <-ticker.C: + m.performCheck() + } + } +} + +func (m *Monitor) performCheck() { + ctx, cancel := context.WithTimeout(context.Background(), m.config.Timeout) + defer cancel() + + startTime := time.Now() + result := m.checker.Check(ctx) + duration := time.Since(startTime) + + m.mu.Lock() + oldHealthy := m.state.Healthy + stateChanged := m.state.Update(result, m.config.Rise, m.config.Fall) + newHealthy := m.state.Healthy + callbacks := m.callbacks + m.mu.Unlock() + + m.log.Debug("[HealthCheck:%s] check completed: result=%s, duration=%s, healthy=%v", + m.config.Name, result, duration, newHealthy) + + if stateChanged { + m.log.Info("[HealthCheck:%s] health state changed: %v -> %v (consecutive_ok=%d, consecutive_fail=%d)", + m.config.Name, oldHealthy, newHealthy, m.state.ConsecutiveOK, m.state.ConsecutiveFail) + + for _, callback := range callbacks { + callback(m.config.Name, oldHealthy, newHealthy) + } + } +} + +func (m *Monitor) OnStateChange(callback StateChangeCallback) { + m.mu.Lock() + defer m.mu.Unlock() + m.callbacks = append(m.callbacks, callback) +} + +func (m *Monitor) GetState() *CheckerState { + m.mu.RLock() + defer m.mu.RUnlock() + + stateCopy := *m.state + return &stateCopy +} + +func (m *Monitor) IsHealthy() bool { + m.mu.RLock() + defer m.mu.RUnlock() + return m.state.Healthy +} + +type Manager struct { + monitors map[string]*Monitor + mu sync.RWMutex + log *logger.Logger +} + +func NewManager(log *logger.Logger) *Manager { + return &Manager{ + monitors: make(map[string]*Monitor), + log: log, + } +} + +func (m *Manager) AddMonitor(monitor *Monitor) { + m.mu.Lock() + defer m.mu.Unlock() + 
m.monitors[monitor.config.Name] = monitor +} + +func (m *Manager) GetMonitor(name string) (*Monitor, bool) { + m.mu.RLock() + defer m.mu.RUnlock() + monitor, ok := m.monitors[name] + return monitor, ok +} + +func (m *Manager) StartAll() { + m.mu.RLock() + defer m.mu.RUnlock() + + for _, monitor := range m.monitors { + monitor.Start() + } + + m.log.Info("started %d health check monitor(s)", len(m.monitors)) +} + +func (m *Manager) StopAll() { + m.mu.RLock() + defer m.mu.RUnlock() + + for _, monitor := range m.monitors { + monitor.Stop() + } + + m.log.Info("stopped all health check monitors") +} + +func (m *Manager) GetAllStates() map[string]*CheckerState { + m.mu.RLock() + defer m.mu.RUnlock() + + states := make(map[string]*CheckerState) + for name, monitor := range m.monitors { + states[name] = monitor.GetState() + } + + return states +} diff --git a/internal/health/ping.go b/internal/health/ping.go new file mode 100644 index 0000000..effed23 --- /dev/null +++ b/internal/health/ping.go @@ -0,0 +1,129 @@ +package health + +import ( + "context" + "fmt" + "net" + "time" + + "golang.org/x/net/icmp" + "golang.org/x/net/ipv4" +) + +type PingChecker struct { + name string + host string + count int + timeout time.Duration +} + +func NewPingChecker(name string, config map[string]interface{}) (*PingChecker, error) { + host, ok := config["host"].(string) + if !ok { + return nil, fmt.Errorf("ping checker: missing or invalid 'host' field") + } + + count := 1 + if c, ok := config["count"]; ok { + switch v := c.(type) { + case int: + count = v + case float64: + count = int(v) + } + } + + timeout := 2 * time.Second + if t, ok := config["timeout"].(string); ok { + if d, err := time.ParseDuration(t); err == nil { + timeout = d + } + } + + return &PingChecker{ + name: name, + host: host, + count: count, + timeout: timeout, + }, nil +} + +func (c *PingChecker) Name() string { + return c.name +} + +func (c *PingChecker) Type() string { + return "ping" +} + +func (c *PingChecker) 
Check(ctx context.Context) CheckResult { + addr, err := net.ResolveIPAddr("ip4", c.host) + if err != nil { + return CheckResultFailure + } + + conn, err := icmp.ListenPacket("ip4:icmp", "0.0.0.0") + if err != nil { + return CheckResultFailure + } + defer conn.Close() + + successCount := 0 + for i := 0; i < c.count; i++ { + select { + case <-ctx.Done(): + return CheckResultFailure + default: + } + + if c.sendPing(conn, addr) { + successCount++ + } + } + + if successCount > 0 { + return CheckResultSuccess + } + + return CheckResultFailure +} + +func (c *PingChecker) sendPing(conn *icmp.PacketConn, addr *net.IPAddr) bool { + msg := icmp.Message{ + Type: ipv4.ICMPTypeEcho, + Code: 0, + Body: &icmp.Echo{ + ID: 1234, + Seq: 1, + Data: []byte("go-alived-ping"), + }, + } + + msgBytes, err := msg.Marshal(nil) + if err != nil { + return false + } + + if _, err := conn.WriteTo(msgBytes, addr); err != nil { + return false + } + + conn.SetReadDeadline(time.Now().Add(c.timeout)) + + reply := make([]byte, 1500) + n, _, err := conn.ReadFrom(reply) + if err != nil { + return false + } + + parsedMsg, err := icmp.ParseMessage(ipv4.ICMPTypeEchoReply.Protocol(), reply[:n]) + if err != nil { + return false + } + + if parsedMsg.Type == ipv4.ICMPTypeEchoReply { + return true + } + + return false +} diff --git a/internal/health/script.go b/internal/health/script.go new file mode 100644 index 0000000..7a87583 --- /dev/null +++ b/internal/health/script.go @@ -0,0 +1,73 @@ +package health + +import ( + "context" + "fmt" + "os/exec" + "time" +) + +type ScriptChecker struct { + name string + script string + args []string + timeout time.Duration +} + +func NewScriptChecker(name string, config map[string]interface{}) (*ScriptChecker, error) { + script, ok := config["script"].(string) + if !ok { + return nil, fmt.Errorf("script checker: missing or invalid 'script' field") + } + + var args []string + if argsInterface, ok := config["args"].([]interface{}); ok { + args = make([]string, 
len(argsInterface)) + for i, arg := range argsInterface { + if argStr, ok := arg.(string); ok { + args[i] = argStr + } + } + } + + timeout := 10 * time.Second + if t, ok := config["timeout"].(string); ok { + if d, err := time.ParseDuration(t); err == nil { + timeout = d + } + } + + return &ScriptChecker{ + name: name, + script: script, + args: args, + timeout: timeout, + }, nil +} + +func (c *ScriptChecker) Name() string { + return c.name +} + +func (c *ScriptChecker) Type() string { + return "script" +} + +func (c *ScriptChecker) Check(ctx context.Context) CheckResult { + cmdCtx, cancel := context.WithTimeout(ctx, c.timeout) + defer cancel() + + cmd := exec.CommandContext(cmdCtx, c.script, c.args...) + + err := cmd.Run() + if err != nil { + if exitErr, ok := err.(*exec.ExitError); ok { + if exitErr.ExitCode() != 0 { + return CheckResultFailure + } + } + return CheckResultFailure + } + + return CheckResultSuccess +} diff --git a/internal/health/tcp.go b/internal/health/tcp.go new file mode 100644 index 0000000..26f2696 --- /dev/null +++ b/internal/health/tcp.go @@ -0,0 +1,61 @@ +package health + +import ( + "context" + "fmt" + "net" +) + +type TCPChecker struct { + name string + host string + port int +} + +func NewTCPChecker(name string, config map[string]interface{}) (*TCPChecker, error) { + host, ok := config["host"].(string) + if !ok { + return nil, fmt.Errorf("tcp checker: missing or invalid 'host' field") + } + + var port int + switch v := config["port"].(type) { + case int: + port = v + case float64: + port = int(v) + default: + return nil, fmt.Errorf("tcp checker: missing or invalid 'port' field") + } + + if port < 1 || port > 65535 { + return nil, fmt.Errorf("tcp checker: invalid port number: %d", port) + } + + return &TCPChecker{ + name: name, + host: host, + port: port, + }, nil +} + +func (c *TCPChecker) Name() string { + return c.name +} + +func (c *TCPChecker) Type() string { + return "tcp" +} + +func (c *TCPChecker) Check(ctx context.Context) 
CheckResult { + addr := fmt.Sprintf("%s:%d", c.host, c.port) + + var dialer net.Dialer + conn, err := dialer.DialContext(ctx, "tcp", addr) + if err != nil { + return CheckResultFailure + } + + conn.Close() + return CheckResultSuccess +} \ No newline at end of file diff --git a/internal/vrrp/arp.go b/internal/vrrp/arp.go new file mode 100644 index 0000000..3edb25b --- /dev/null +++ b/internal/vrrp/arp.go @@ -0,0 +1,72 @@ +package vrrp + +import ( + "fmt" + "net" + "net/netip" + + "github.com/mdlayher/arp" +) + +type ARPSender struct { + client *arp.Client + iface *net.Interface +} + +func NewARPSender(ifaceName string) (*ARPSender, error) { + iface, err := net.InterfaceByName(ifaceName) + if err != nil { + return nil, fmt.Errorf("failed to get interface %s: %w", ifaceName, err) + } + + client, err := arp.Dial(iface) + if err != nil { + return nil, fmt.Errorf("failed to create ARP client: %w", err) + } + + return &ARPSender{ + client: client, + iface: iface, + }, nil +} + +func (a *ARPSender) SendGratuitousARP(ip net.IP) error { + if ip4 := ip.To4(); ip4 == nil { + return fmt.Errorf("invalid IPv4 address: %s", ip) + } + + addr, err := netip.ParseAddr(ip.String()) + if err != nil { + return fmt.Errorf("failed to parse IP: %w", err) + } + + pkt, err := arp.NewPacket( + arp.OperationRequest, + a.iface.HardwareAddr, + addr, + net.HardwareAddr{0xff, 0xff, 0xff, 0xff, 0xff, 0xff}, + addr, + ) + if err != nil { + return fmt.Errorf("failed to create ARP packet: %w", err) + } + + if err := a.client.WriteTo(pkt, net.HardwareAddr{0xff, 0xff, 0xff, 0xff, 0xff, 0xff}); err != nil { + return fmt.Errorf("failed to send gratuitous ARP: %w", err) + } + + return nil +} + +func (a *ARPSender) SendGratuitousARPForIPs(ips []net.IP) error { + for _, ip := range ips { + if err := a.SendGratuitousARP(ip); err != nil { + return err + } + } + return nil +} + +func (a *ARPSender) Close() error { + return a.client.Close() +} diff --git a/internal/vrrp/instance.go b/internal/vrrp/instance.go new 
file mode 100644 index 0000000..eb90a4f --- /dev/null +++ b/internal/vrrp/instance.go @@ -0,0 +1,427 @@ +package vrrp + +import ( + "fmt" + "net" + "sync" + "time" + + "github.com/loveuer/go-alived/pkg/logger" + "github.com/loveuer/go-alived/pkg/netif" +) + +type Instance struct { + Name string + VirtualRouterID uint8 + Priority uint8 + AdvertInterval uint8 + Interface string + VirtualIPs []net.IP + AuthType uint8 + AuthPass string + TrackScripts []string + + state *StateMachine + priorityCalc *PriorityCalculator + history *StateHistory + socket *Socket + arpSender *ARPSender + netInterface *netif.Interface + + advertTimer *Timer + masterDownTimer *Timer + + running bool + stopCh chan struct{} + wg sync.WaitGroup + mu sync.RWMutex + + log *logger.Logger + + onMaster func() + onBackup func() + onFault func() +} + +func NewInstance( + name string, + vrID uint8, + priority uint8, + advertInt uint8, + iface string, + vips []string, + authType string, + authPass string, + trackScripts []string, + log *logger.Logger, +) (*Instance, error) { + if vrID < 1 || vrID > 255 { + return nil, fmt.Errorf("invalid virtual router ID: %d", vrID) + } + + if priority < 1 || priority > 255 { + return nil, fmt.Errorf("invalid priority: %d", priority) + } + + virtualIPs := make([]net.IP, 0, len(vips)) + for _, vip := range vips { + ip, _, err := net.ParseCIDR(vip) + if err != nil { + return nil, fmt.Errorf("invalid VIP %s: %w", vip, err) + } + virtualIPs = append(virtualIPs, ip) + } + + var authTypeNum uint8 + switch authType { + case "NONE", "": + authTypeNum = AuthTypeNone + case "PASS": + authTypeNum = AuthTypeSimpleText + default: + return nil, fmt.Errorf("unsupported auth type: %s", authType) + } + + netInterface, err := netif.GetInterface(iface) + if err != nil { + return nil, fmt.Errorf("failed to get interface: %w", err) + } + + inst := &Instance{ + Name: name, + VirtualRouterID: vrID, + Priority: priority, + AdvertInterval: advertInt, + Interface: iface, + VirtualIPs: virtualIPs, 
+ AuthType: authTypeNum, + AuthPass: authPass, + TrackScripts: trackScripts, + state: NewStateMachine(StateInit), + priorityCalc: NewPriorityCalculator(priority), + history: NewStateHistory(100), + netInterface: netInterface, + stopCh: make(chan struct{}), + log: log, + } + + inst.advertTimer = NewTimer(time.Duration(advertInt)*time.Second, inst.onAdvertTimer) + inst.masterDownTimer = NewTimer(CalculateMasterDownInterval(advertInt), inst.onMasterDownTimer) + + inst.state.OnStateChange(func(old, new State) { + inst.history.Add(old, new, "state transition") + inst.log.Info("[%s] state changed: %s -> %s", inst.Name, old, new) + inst.handleStateChange(old, new) + }) + + return inst, nil +} + +func (inst *Instance) Start() error { + inst.mu.Lock() + if inst.running { + inst.mu.Unlock() + return fmt.Errorf("instance %s already running", inst.Name) + } + inst.running = true + inst.mu.Unlock() + + var err error + inst.socket, err = NewSocket(inst.Interface) + if err != nil { + return fmt.Errorf("failed to create socket: %w", err) + } + + inst.arpSender, err = NewARPSender(inst.Interface) + if err != nil { + inst.socket.Close() + return fmt.Errorf("failed to create ARP sender: %w", err) + } + + inst.log.Info("[%s] starting VRRP instance (VRID=%d, Priority=%d, Interface=%s)", + inst.Name, inst.VirtualRouterID, inst.Priority, inst.Interface) + + inst.state.SetState(StateBackup) + inst.masterDownTimer.Start() + + inst.wg.Add(1) + go inst.receiveLoop() + + return nil +} + +func (inst *Instance) Stop() { + inst.mu.Lock() + if !inst.running { + inst.mu.Unlock() + return + } + inst.running = false + inst.mu.Unlock() + + inst.log.Info("[%s] stopping VRRP instance", inst.Name) + + close(inst.stopCh) + inst.wg.Wait() + + inst.advertTimer.Stop() + inst.masterDownTimer.Stop() + + if inst.state.GetState() == StateMaster { + inst.removeVIPs() + } + + if inst.socket != nil { + inst.socket.Close() + } + + if inst.arpSender != nil { + inst.arpSender.Close() + } + + 
inst.state.SetState(StateInit) +} + +func (inst *Instance) receiveLoop() { + defer inst.wg.Done() + + for { + select { + case <-inst.stopCh: + return + default: + } + + pkt, srcIP, err := inst.socket.Receive() + if err != nil { + inst.log.Debug("[%s] failed to receive packet: %v", inst.Name, err) + continue + } + + if pkt.VirtualRtrID != inst.VirtualRouterID { + continue + } + + if err := pkt.Validate(inst.AuthPass); err != nil { + inst.log.Warn("[%s] packet validation failed: %v", inst.Name, err) + continue + } + + inst.handleAdvertisement(pkt, srcIP) + } +} + +func (inst *Instance) handleAdvertisement(pkt *VRRPPacket, srcIP net.IP) { + currentState := inst.state.GetState() + localPriority := inst.priorityCalc.GetPriority() + + inst.log.Debug("[%s] received advertisement from %s (priority=%d, state=%s)", + inst.Name, srcIP, pkt.Priority, currentState) + + switch currentState { + case StateBackup: + if pkt.Priority == 0 { + inst.masterDownTimer.SetDuration(CalculateSkewTime(localPriority)) + inst.masterDownTimer.Reset() + } else if !ShouldBecomeMaster(localPriority, pkt.Priority, inst.socket.localIP.String(), srcIP.String()) { + inst.masterDownTimer.Reset() + } + + case StateMaster: + if ShouldBecomeMaster(pkt.Priority, localPriority, srcIP.String(), inst.socket.localIP.String()) { + inst.log.Warn("[%s] received higher priority advertisement, stepping down", inst.Name) + inst.state.SetState(StateBackup) + } + } +} + +func (inst *Instance) onAdvertTimer() { + if inst.state.GetState() == StateMaster { + inst.sendAdvertisement() + inst.advertTimer.Start() + } +} + +func (inst *Instance) onMasterDownTimer() { + if inst.state.GetState() == StateBackup { + inst.log.Info("[%s] master down timer expired, becoming master", inst.Name) + inst.state.SetState(StateMaster) + } +} + +func (inst *Instance) sendAdvertisement() error { + priority := inst.priorityCalc.GetPriority() + + pkt := NewAdvertisement( + inst.VirtualRouterID, + priority, + inst.AdvertInterval, + 
inst.VirtualIPs, + inst.AuthType, + inst.AuthPass, + ) + + if err := inst.socket.Send(pkt); err != nil { + inst.log.Error("[%s] failed to send advertisement: %v", inst.Name, err) + return err + } + + inst.log.Debug("[%s] sent advertisement (priority=%d)", inst.Name, priority) + return nil +} + +func (inst *Instance) handleStateChange(old, new State) { + switch new { + case StateMaster: + inst.becomeMaster() + case StateBackup: + inst.becomeBackup(old) + case StateFault: + inst.becomeFault() + } +} + +func (inst *Instance) becomeMaster() { + inst.log.Info("[%s] transitioning to MASTER state", inst.Name) + + if err := inst.addVIPs(); err != nil { + inst.log.Error("[%s] failed to add VIPs: %v", inst.Name, err) + inst.state.SetState(StateFault) + return + } + + if err := inst.arpSender.SendGratuitousARPForIPs(inst.VirtualIPs); err != nil { + inst.log.Error("[%s] failed to send gratuitous ARP: %v", inst.Name, err) + } + + inst.masterDownTimer.Stop() + inst.advertTimer.Start() + + inst.sendAdvertisement() + + if inst.onMaster != nil { + inst.onMaster() + } +} + +func (inst *Instance) becomeBackup(oldState State) { + inst.log.Info("[%s] transitioning to BACKUP state", inst.Name) + + inst.advertTimer.Stop() + + if oldState == StateMaster { + if err := inst.removeVIPs(); err != nil { + inst.log.Error("[%s] failed to remove VIPs: %v", inst.Name, err) + } + } + + inst.masterDownTimer.Reset() + + if inst.onBackup != nil { + inst.onBackup() + } +} + +func (inst *Instance) becomeFault() { + inst.log.Error("[%s] transitioning to FAULT state", inst.Name) + + inst.advertTimer.Stop() + inst.masterDownTimer.Stop() + + if err := inst.removeVIPs(); err != nil { + inst.log.Error("[%s] failed to remove VIPs: %v", inst.Name, err) + } + + if inst.onFault != nil { + inst.onFault() + } +} + +func (inst *Instance) addVIPs() error { + inst.log.Info("[%s] adding virtual IPs", inst.Name) + + for _, vipStr := range inst.getVIPsWithCIDR() { + if err := inst.netInterface.AddIP(vipStr); err != nil { 
+ inst.log.Error("[%s] failed to add VIP %s: %v", inst.Name, vipStr, err) + return err + } + inst.log.Info("[%s] added VIP %s", inst.Name, vipStr) + } + + return nil +} + +func (inst *Instance) removeVIPs() error { + inst.log.Info("[%s] removing virtual IPs", inst.Name) + + for _, vipStr := range inst.getVIPsWithCIDR() { + has, _ := inst.netInterface.HasIP(vipStr) + if !has { + continue + } + + if err := inst.netInterface.DeleteIP(vipStr); err != nil { + inst.log.Error("[%s] failed to remove VIP %s: %v", inst.Name, vipStr, err) + return err + } + inst.log.Info("[%s] removed VIP %s", inst.Name, vipStr) + } + + return nil +} + +func (inst *Instance) getVIPsWithCIDR() []string { + result := make([]string, len(inst.VirtualIPs)) + for i, ip := range inst.VirtualIPs { + result[i] = ip.String() + "/32" + } + return result +} + +func (inst *Instance) GetState() State { + return inst.state.GetState() +} + +func (inst *Instance) OnMaster(callback func()) { + inst.onMaster = callback +} + +func (inst *Instance) OnBackup(callback func()) { + inst.onBackup = callback +} + +func (inst *Instance) OnFault(callback func()) { + inst.onFault = callback +} + +func (inst *Instance) AdjustPriority(delta int) { + inst.mu.Lock() + defer inst.mu.Unlock() + + oldPriority := inst.priorityCalc.GetPriority() + + if delta < 0 { + inst.priorityCalc.DecreasePriority(uint8(-delta)) + } + + newPriority := inst.priorityCalc.GetPriority() + + if oldPriority != newPriority { + inst.log.Info("[%s] priority adjusted: %d -> %d (delta=%d)", + inst.Name, oldPriority, newPriority, delta) + } +} + +func (inst *Instance) ResetPriority() { + inst.mu.Lock() + defer inst.mu.Unlock() + + oldPriority := inst.priorityCalc.GetPriority() + inst.priorityCalc.ResetPriority() + newPriority := inst.priorityCalc.GetPriority() + + if oldPriority != newPriority { + inst.log.Info("[%s] priority reset: %d -> %d", + inst.Name, oldPriority, newPriority) + } +} diff --git a/internal/vrrp/manager.go b/internal/vrrp/manager.go new 
file mode 100644 index 0000000..f58010c --- /dev/null +++ b/internal/vrrp/manager.go @@ -0,0 +1,116 @@ +package vrrp + +import ( + "fmt" + "sync" + + "github.com/loveuer/go-alived/pkg/config" + "github.com/loveuer/go-alived/pkg/logger" +) + +type Manager struct { + instances map[string]*Instance + mu sync.RWMutex + log *logger.Logger +} + +func NewManager(log *logger.Logger) *Manager { + return &Manager{ + instances: make(map[string]*Instance), + log: log, + } +} + +func (m *Manager) LoadFromConfig(cfg *config.Config) error { + m.mu.Lock() + defer m.mu.Unlock() + + for _, vrrpCfg := range cfg.VRRP { + inst, err := NewInstance( + vrrpCfg.Name, + uint8(vrrpCfg.VirtualRouterID), + uint8(vrrpCfg.Priority), + uint8(vrrpCfg.AdvertInterval), + vrrpCfg.Interface, + vrrpCfg.VirtualIPs, + vrrpCfg.AuthType, + vrrpCfg.AuthPass, + vrrpCfg.TrackScripts, + m.log, + ) + if err != nil { + return fmt.Errorf("failed to create instance %s: %w", vrrpCfg.Name, err) + } + + m.instances[vrrpCfg.Name] = inst + m.log.Info("loaded VRRP instance: %s", vrrpCfg.Name) + } + + return nil +} + +func (m *Manager) StartAll() error { + m.mu.RLock() + defer m.mu.RUnlock() + + for name, inst := range m.instances { + if err := inst.Start(); err != nil { + return fmt.Errorf("failed to start instance %s: %w", name, err) + } + } + + m.log.Info("started %d VRRP instance(s)", len(m.instances)) + return nil +} + +func (m *Manager) StopAll() { + m.mu.RLock() + defer m.mu.RUnlock() + + for _, inst := range m.instances { + inst.Stop() + } + + m.log.Info("stopped all VRRP instances") +} + +func (m *Manager) GetInstance(name string) (*Instance, bool) { + m.mu.RLock() + defer m.mu.RUnlock() + + inst, ok := m.instances[name] + return inst, ok +} + +func (m *Manager) GetAllInstances() []*Instance { + m.mu.RLock() + defer m.mu.RUnlock() + + result := make([]*Instance, 0, len(m.instances)) + for _, inst := range m.instances { + result = append(result, inst) + } + + return result +} + +func (m *Manager) Reload(cfg 
*config.Config) error { + m.log.Info("reloading VRRP configuration...") + + m.StopAll() + + m.mu.Lock() + m.instances = make(map[string]*Instance) + m.mu.Unlock() + + if err := m.LoadFromConfig(cfg); err != nil { + return fmt.Errorf("failed to load config: %w", err) + } + + if err := m.StartAll(); err != nil { + return fmt.Errorf("failed to start instances: %w", err) + } + + m.log.Info("VRRP configuration reloaded successfully") + return nil +} \ No newline at end of file diff --git a/internal/vrrp/packet.go b/internal/vrrp/packet.go new file mode 100644 index 0000000..4c94453 --- /dev/null +++ b/internal/vrrp/packet.go @@ -0,0 +1,184 @@ +package vrrp + +import ( + "bytes" + "encoding/binary" + "fmt" + "net" +) + +const ( + VRRPVersion = 2 + VRRPProtocolNumber = 112 +) + +type VRRPPacket struct { + Version uint8 + Type uint8 + VirtualRtrID uint8 + Priority uint8 + CountIPAddrs uint8 + AuthType uint8 + AdvertInt uint8 + Checksum uint16 + IPAddresses []net.IP + AuthData [8]byte +} + +const ( + VRRPTypeAdvertisement = 1 +) + +const ( + AuthTypeNone = 0 + AuthTypeSimpleText = 1 + AuthTypeIPAH = 2 +) + +func NewAdvertisement(vrID uint8, priority uint8, advertInt uint8, ips []net.IP, authType uint8, authPass string) *VRRPPacket { + pkt := &VRRPPacket{ + Version: VRRPVersion, + Type: VRRPTypeAdvertisement, + VirtualRtrID: vrID, + Priority: priority, + CountIPAddrs: uint8(len(ips)), + AuthType: authType, + AdvertInt: advertInt, + IPAddresses: ips, + } + + if authType == AuthTypeSimpleText && authPass != "" { + copy(pkt.AuthData[:], authPass) + } + + return pkt +} + +func (p *VRRPPacket) Marshal() ([]byte, error) { + buf := new(bytes.Buffer) + + versionType := (p.Version << 4) | p.Type + if err := binary.Write(buf, binary.BigEndian, versionType); err != nil { + return nil, err + } + + if err := binary.Write(buf, binary.BigEndian, p.VirtualRtrID); err != nil { + return nil, err + } + + if err := binary.Write(buf, binary.BigEndian, p.Priority); err != nil { + return nil, err 
+ } + + if err := binary.Write(buf, binary.BigEndian, p.CountIPAddrs); err != nil { + return nil, err + } + + if err := binary.Write(buf, binary.BigEndian, p.AuthType); err != nil { + return nil, err + } + + if err := binary.Write(buf, binary.BigEndian, p.AdvertInt); err != nil { + return nil, err + } + + if err := binary.Write(buf, binary.BigEndian, uint16(0)); err != nil { + return nil, err + } + + for _, ip := range p.IPAddresses { + ip4 := ip.To4() + if ip4 == nil { + return nil, fmt.Errorf("invalid IPv4 address: %s", ip) + } + if err := binary.Write(buf, binary.BigEndian, ip4); err != nil { + return nil, err + } + } + + if err := binary.Write(buf, binary.BigEndian, p.AuthData); err != nil { + return nil, err + } + + data := buf.Bytes() + + checksum := calculateChecksum(data) + binary.BigEndian.PutUint16(data[6:8], checksum) + + return data, nil +} + +func Unmarshal(data []byte) (*VRRPPacket, error) { + if len(data) < 20 { + return nil, fmt.Errorf("packet too short: %d bytes", len(data)) + } + + pkt := &VRRPPacket{} + + versionType := data[0] + pkt.Version = versionType >> 4 + pkt.Type = versionType & 0x0F + pkt.VirtualRtrID = data[1] + pkt.Priority = data[2] + pkt.CountIPAddrs = data[3] + pkt.AuthType = data[4] + pkt.AdvertInt = data[5] + pkt.Checksum = binary.BigEndian.Uint16(data[6:8]) + + offset := 8 + pkt.IPAddresses = make([]net.IP, pkt.CountIPAddrs) + for i := 0; i < int(pkt.CountIPAddrs); i++ { + if offset+4 > len(data) { + return nil, fmt.Errorf("packet too short for IP addresses") + } + pkt.IPAddresses[i] = net.IPv4(data[offset], data[offset+1], data[offset+2], data[offset+3]) + offset += 4 + } + + if offset+8 > len(data) { + return nil, fmt.Errorf("packet too short for auth data") + } + copy(pkt.AuthData[:], data[offset:offset+8]) + + return pkt, nil +} + +func calculateChecksum(data []byte) uint16 { + sum := uint32(0) + + for i := 0; i < len(data)-1; i += 2 { + sum += uint32(data[i])<<8 | uint32(data[i+1]) + } + + if len(data)%2 == 1 { + sum += 
uint32(data[len(data)-1]) << 8 + } + + for sum > 0xFFFF { + sum = (sum & 0xFFFF) + (sum >> 16) + } + + return uint16(^sum) +} + +func (p *VRRPPacket) Validate(authPass string) error { + if p.Version != VRRPVersion { + return fmt.Errorf("unsupported VRRP version: %d", p.Version) + } + + if p.Type != VRRPTypeAdvertisement { + return fmt.Errorf("unsupported VRRP type: %d", p.Type) + } + + if p.AuthType == AuthTypeSimpleText { + if authPass != "" { + var expectedAuth [8]byte + copy(expectedAuth[:], authPass) + if !bytes.Equal(p.AuthData[:], expectedAuth[:]) { + return fmt.Errorf("authentication failed") + } + } + } + + return nil +} diff --git a/internal/vrrp/socket.go b/internal/vrrp/socket.go new file mode 100644 index 0000000..39b1937 --- /dev/null +++ b/internal/vrrp/socket.go @@ -0,0 +1,141 @@ +package vrrp + +import ( + "fmt" + "net" + "os" + "syscall" + + "golang.org/x/net/ipv4" +) + +const ( + VRRPMulticastAddr = "224.0.0.18" +) + +type Socket struct { + conn *ipv4.RawConn + iface *net.Interface + localIP net.IP + groupIP net.IP +} + +func NewSocket(ifaceName string) (*Socket, error) { + iface, err := net.InterfaceByName(ifaceName) + if err != nil { + return nil, fmt.Errorf("failed to get interface %s: %w", ifaceName, err) + } + + addrs, err := iface.Addrs() + if err != nil { + return nil, fmt.Errorf("failed to get addresses for %s: %w", ifaceName, err) + } + + var localIP net.IP + for _, addr := range addrs { + if ipNet, ok := addr.(*net.IPNet); ok { + if ipv4 := ipNet.IP.To4(); ipv4 != nil { + localIP = ipv4 + break + } + } + } + + if localIP == nil { + return nil, fmt.Errorf("no IPv4 address found on interface %s", ifaceName) + } + + fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_RAW, VRRPProtocolNumber) + if err != nil { + return nil, fmt.Errorf("failed to create raw socket: %w", err) + } + + if err := syscall.SetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1); err != nil { + syscall.Close(fd) + return nil, fmt.Errorf("failed to set 
SO_REUSEADDR: %w", err) + } + + file := os.NewFile(uintptr(fd), "vrrp-socket") + defer file.Close() + + packetConn, err := net.FilePacketConn(file) + if err != nil { + return nil, fmt.Errorf("failed to create packet connection: %w", err) + } + + rawConn, err := ipv4.NewRawConn(packetConn) + if err != nil { + packetConn.Close() + return nil, fmt.Errorf("failed to create raw connection: %w", err) + } + + groupIP := net.ParseIP(VRRPMulticastAddr).To4() + if groupIP == nil { + rawConn.Close() + return nil, fmt.Errorf("invalid multicast address: %s", VRRPMulticastAddr) + } + + if err := rawConn.JoinGroup(iface, &net.IPAddr{IP: groupIP}); err != nil { + rawConn.Close() + return nil, fmt.Errorf("failed to join multicast group: %w", err) + } + + if err := rawConn.SetControlMessage(ipv4.FlagTTL|ipv4.FlagSrc|ipv4.FlagDst|ipv4.FlagInterface, true); err != nil { + rawConn.Close() + return nil, fmt.Errorf("failed to set control message: %w", err) + } + + return &Socket{ + conn: rawConn, + iface: iface, + localIP: localIP, + groupIP: groupIP, + }, nil +} + +func (s *Socket) Send(pkt *VRRPPacket) error { + data, err := pkt.Marshal() + if err != nil { + return fmt.Errorf("failed to marshal packet: %w", err) + } + + header := &ipv4.Header{ + Version: ipv4.Version, + Len: ipv4.HeaderLen, + TOS: 0xC0, + TotalLen: ipv4.HeaderLen + len(data), + TTL: 255, + Protocol: VRRPProtocolNumber, + Dst: s.groupIP, + Src: s.localIP, + } + + if err := s.conn.WriteTo(header, data, nil); err != nil { + return fmt.Errorf("failed to send packet: %w", err) + } + + return nil +} + +func (s *Socket) Receive() (*VRRPPacket, net.IP, error) { + buf := make([]byte, 1500) + + header, payload, _, err := s.conn.ReadFrom(buf) + if err != nil { + return nil, nil, fmt.Errorf("failed to receive packet: %w", err) + } + + pkt, err := Unmarshal(payload) + if err != nil { + return nil, nil, fmt.Errorf("failed to unmarshal packet: %w", err) + } + + return pkt, header.Src, nil +} + +func (s *Socket) Close() error { + if 
err := s.conn.LeaveGroup(s.iface, &net.IPAddr{IP: s.groupIP}); err != nil { + return err + } + return s.conn.Close() +} \ No newline at end of file diff --git a/internal/vrrp/state.go b/internal/vrrp/state.go new file mode 100644 index 0000000..efec825 --- /dev/null +++ b/internal/vrrp/state.go @@ -0,0 +1,261 @@ +package vrrp + +import ( + "bytes" + "fmt" + "net" + "sync" + "time" +) + +type State int + +const ( + StateInit State = iota + StateBackup + StateMaster + StateFault +) + +func (s State) String() string { + switch s { + case StateInit: + return "INIT" + case StateBackup: + return "BACKUP" + case StateMaster: + return "MASTER" + case StateFault: + return "FAULT" + default: + return "UNKNOWN" + } +} + +type StateMachine struct { + currentState State + previousState State + mu sync.RWMutex + stateChangeCallbacks []func(old, new State) +} + +func NewStateMachine(initialState State) *StateMachine { + return &StateMachine{ + currentState: initialState, + previousState: StateInit, + stateChangeCallbacks: make([]func(old, new State), 0), + } +} + +func (sm *StateMachine) GetState() State { + sm.mu.RLock() + defer sm.mu.RUnlock() + return sm.currentState +} + +func (sm *StateMachine) SetState(newState State) { + sm.mu.Lock() + oldState := sm.currentState + sm.previousState = oldState + sm.currentState = newState + callbacks := sm.stateChangeCallbacks + sm.mu.Unlock() + + for _, callback := range callbacks { + callback(oldState, newState) + } +} + +func (sm *StateMachine) OnStateChange(callback func(old, new State)) { + sm.mu.Lock() + defer sm.mu.Unlock() + sm.stateChangeCallbacks = append(sm.stateChangeCallbacks, callback) +} + +type Timer struct { + duration time.Duration + timer *time.Timer + callback func() + mu sync.Mutex +} + +func NewTimer(duration time.Duration, callback func()) *Timer { + return &Timer{ + duration: duration, + callback: callback, + } +} + +func (t *Timer) Start() { + t.mu.Lock() + defer t.mu.Unlock() + + if t.timer != nil { + t.timer.Stop() + } + + t.timer
= time.AfterFunc(t.duration, t.callback) +} + +func (t *Timer) Stop() { + t.mu.Lock() + defer t.mu.Unlock() + + if t.timer != nil { + t.timer.Stop() + t.timer = nil + } +} + +func (t *Timer) Reset() { + t.mu.Lock() + defer t.mu.Unlock() + + if t.timer != nil { + t.timer.Stop() + } + + t.timer = time.AfterFunc(t.duration, t.callback) +} + +func (t *Timer) SetDuration(duration time.Duration) { + t.mu.Lock() + defer t.mu.Unlock() + t.duration = duration +} + +type PriorityCalculator struct { + basePriority uint8 + currentPriority uint8 + mu sync.RWMutex +} + +func NewPriorityCalculator(basePriority uint8) *PriorityCalculator { + return &PriorityCalculator{ + basePriority: basePriority, + currentPriority: basePriority, + } +} + +func (pc *PriorityCalculator) GetPriority() uint8 { + pc.mu.RLock() + defer pc.mu.RUnlock() + return pc.currentPriority +} + +func (pc *PriorityCalculator) DecreasePriority(amount uint8) { + pc.mu.Lock() + defer pc.mu.Unlock() + + if pc.currentPriority > amount { + pc.currentPriority -= amount + } else { + pc.currentPriority = 0 + } +} + +func (pc *PriorityCalculator) ResetPriority() { + pc.mu.Lock() + defer pc.mu.Unlock() + pc.currentPriority = pc.basePriority +} + +func (pc *PriorityCalculator) SetBasePriority(priority uint8) { + pc.mu.Lock() + defer pc.mu.Unlock() + pc.basePriority = priority + pc.currentPriority = priority +} + +func ShouldBecomeMaster(localPriority, remotePriority uint8, localIP, remoteIP string) bool { + if localPriority > remotePriority { + return true + } + + if localPriority == remotePriority { + // Compare addresses numerically; a string comparison would rank "10.0.0.9" above "10.0.0.10". + return bytes.Compare(net.ParseIP(localIP).To16(), net.ParseIP(remoteIP).To16()) > 0 + } + + return false +} + +func CalculateMasterDownInterval(advertInt uint8) time.Duration { + return time.Duration(3*int(advertInt)) * time.Second +} + +func CalculateSkewTime(priority uint8) time.Duration { + skew := float64(256-int(priority)) / 256.0 + return time.Duration(skew * float64(time.Second)) +} + +type StateTransition struct { + From State + To State + Timestamp time.Time + Reason string
+} + +type StateHistory struct { + transitions []StateTransition + maxSize int + mu sync.RWMutex +} + +func NewStateHistory(maxSize int) *StateHistory { + return &StateHistory{ + transitions: make([]StateTransition, 0, maxSize), + maxSize: maxSize, + } +} + +func (sh *StateHistory) Add(from, to State, reason string) { + sh.mu.Lock() + defer sh.mu.Unlock() + + transition := StateTransition{ + From: from, + To: to, + Timestamp: time.Now(), + Reason: reason, + } + + sh.transitions = append(sh.transitions, transition) + + if len(sh.transitions) > sh.maxSize { + sh.transitions = sh.transitions[1:] + } +} + +func (sh *StateHistory) GetRecent(n int) []StateTransition { + sh.mu.RLock() + defer sh.mu.RUnlock() + + if n > len(sh.transitions) { + n = len(sh.transitions) + } + + start := len(sh.transitions) - n + result := make([]StateTransition, n) + copy(result, sh.transitions[start:]) + + return result +} + +func (sh *StateHistory) String() string { + sh.mu.RLock() + defer sh.mu.RUnlock() + + if len(sh.transitions) == 0 { + return "No state transitions" + } + + result := "State transition history:\n" + for _, t := range sh.transitions { + result += fmt.Sprintf(" %s: %s -> %s (%s)\n", + t.Timestamp.Format("2006-01-02 15:04:05"), + t.From, t.To, t.Reason) + } + + return result +} \ No newline at end of file diff --git a/main.go b/main.go new file mode 100644 index 0000000..3ec8ee9 --- /dev/null +++ b/main.go @@ -0,0 +1,9 @@ +package main + +import ( + "github.com/loveuer/go-alived/internal/cmd" +) + +func main() { + cmd.Execute() +} diff --git a/pkg/config/config.go b/pkg/config/config.go new file mode 100644 index 0000000..e6c5059 --- /dev/null +++ b/pkg/config/config.go @@ -0,0 +1,90 @@ +package config + +import ( + "fmt" + "os" + "time" + + "gopkg.in/yaml.v3" +) + +type Config struct { + Global Global `yaml:"global"` + VRRP []VRRPInstance `yaml:"vrrp_instances"` + Health []HealthChecker `yaml:"health_checkers"` +} + +type Global struct { + RouterID string 
`yaml:"router_id"` + NotificationMail string `yaml:"notification_email"` +} + +type VRRPInstance struct { + Name string `yaml:"name"` + Interface string `yaml:"interface"` + State string `yaml:"state"` + VirtualRouterID int `yaml:"virtual_router_id"` + Priority int `yaml:"priority"` + VirtualIPs []string `yaml:"virtual_ips"` + AdvertInterval int `yaml:"advert_interval"` + AuthType string `yaml:"auth_type"` + AuthPass string `yaml:"auth_pass"` + NotifyMaster string `yaml:"notify_master"` + NotifyBackup string `yaml:"notify_backup"` + NotifyFault string `yaml:"notify_fault"` + TrackScripts []string `yaml:"track_scripts"` +} + +type HealthChecker struct { + Name string `yaml:"name"` + Type string `yaml:"type"` + Interval time.Duration `yaml:"interval"` + Timeout time.Duration `yaml:"timeout"` + Rise int `yaml:"rise"` + Fall int `yaml:"fall"` + Config interface{} `yaml:"config"` +} + +func Load(path string) (*Config, error) { + data, err := os.ReadFile(path) + if err != nil { + return nil, fmt.Errorf("failed to read config file: %w", err) + } + + var cfg Config + if err := yaml.Unmarshal(data, &cfg); err != nil { + return nil, fmt.Errorf("failed to parse config file: %w", err) + } + + if err := validate(&cfg); err != nil { + return nil, fmt.Errorf("invalid configuration: %w", err) + } + + return &cfg, nil +} + +func validate(cfg *Config) error { + if cfg.Global.RouterID == "" { + return fmt.Errorf("global.router_id is required") + } + + for i, vrrp := range cfg.VRRP { + if vrrp.Name == "" { + return fmt.Errorf("vrrp_instances[%d].name is required", i) + } + if vrrp.Interface == "" { + return fmt.Errorf("vrrp_instances[%d].interface is required", i) + } + if vrrp.VirtualRouterID < 1 || vrrp.VirtualRouterID > 255 { + return fmt.Errorf("vrrp_instances[%d].virtual_router_id must be between 1 and 255", i) + } + if vrrp.Priority < 1 || vrrp.Priority > 255 { + return fmt.Errorf("vrrp_instances[%d].priority must be between 1 and 255", i) + } + if len(vrrp.VirtualIPs) == 0 { + 
return fmt.Errorf("vrrp_instances[%d].virtual_ips cannot be empty", i) + } + } + + return nil +} diff --git a/pkg/logger/logger.go b/pkg/logger/logger.go new file mode 100644 index 0000000..bf99e52 --- /dev/null +++ b/pkg/logger/logger.go @@ -0,0 +1,44 @@ +package logger + +import ( + "fmt" + "log" + "os" + "time" +) + +type Logger struct { + debug bool + logger *log.Logger +} + +func New(debug bool) *Logger { + return &Logger{ + debug: debug, + logger: log.New(os.Stdout, "", 0), + } +} + +func (l *Logger) Info(format string, args ...interface{}) { + l.log("INFO", format, args...) +} + +func (l *Logger) Error(format string, args ...interface{}) { + l.log("ERROR", format, args...) +} + +func (l *Logger) Debug(format string, args ...interface{}) { + if l.debug { + l.log("DEBUG", format, args...) + } +} + +func (l *Logger) Warn(format string, args ...interface{}) { + l.log("WARN", format, args...) +} + +func (l *Logger) log(level string, format string, args ...interface{}) { + timestamp := time.Now().Format("2006-01-02 15:04:05") + message := fmt.Sprintf(format, args...) 
+ l.logger.Printf("[%s] %s: %s", timestamp, level, message) +} diff --git a/pkg/netif/interface.go b/pkg/netif/interface.go new file mode 100644 index 0000000..18124de --- /dev/null +++ b/pkg/netif/interface.go @@ -0,0 +1,81 @@ +package netif + +import ( + "fmt" + "net" + + "github.com/vishvananda/netlink" +) + +type Interface struct { + Name string + Index int + Link netlink.Link +} + +func GetInterface(name string) (*Interface, error) { + link, err := netlink.LinkByName(name) + if err != nil { + return nil, fmt.Errorf("failed to find interface %s: %w", name, err) + } + + return &Interface{ + Name: name, + Index: link.Attrs().Index, + Link: link, + }, nil +} + +func (iface *Interface) AddIP(ipCIDR string) error { + addr, err := netlink.ParseAddr(ipCIDR) + if err != nil { + return fmt.Errorf("invalid IP address %s: %w", ipCIDR, err) + } + + if err := netlink.AddrAdd(iface.Link, addr); err != nil { + return fmt.Errorf("failed to add IP %s to %s: %w", ipCIDR, iface.Name, err) + } + + return nil +} + +func (iface *Interface) DeleteIP(ipCIDR string) error { + addr, err := netlink.ParseAddr(ipCIDR) + if err != nil { + return fmt.Errorf("invalid IP address %s: %w", ipCIDR, err) + } + + if err := netlink.AddrDel(iface.Link, addr); err != nil { + return fmt.Errorf("failed to delete IP %s from %s: %w", ipCIDR, iface.Name, err) + } + + return nil +} + +func (iface *Interface) HasIP(ipCIDR string) (bool, error) { + targetAddr, err := netlink.ParseAddr(ipCIDR) + if err != nil { + return false, fmt.Errorf("invalid IP address %s: %w", ipCIDR, err) + } + + addrs, err := netlink.AddrList(iface.Link, 0) + if err != nil { + return false, fmt.Errorf("failed to list addresses on %s: %w", iface.Name, err) + } + + for _, addr := range addrs { + if addr.IPNet.String() == targetAddr.IPNet.String() { + return true, nil + } + } + + return false, nil +} + +func (iface *Interface) GetHardwareAddr() (net.HardwareAddr, error) { + return iface.Link.Attrs().HardwareAddr, nil +} + +func (iface 
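*Interface) IsUp() bool { + return iface.Link.Attrs().Flags&net.FlagUp != 0 +} \ No newline at end of file diff --git a/roadmap.md b/roadmap.md new file mode 100644 index 0000000..972623a --- /dev/null +++ b/roadmap.md @@ -0,0 +1,133 @@ +# go-alived Roadmap + +## Project Goal +Implement the core functionality of keepalived in Golang, with no external dependencies and single-binary deployment. + +## Keepalived Core Features + +### 1. VRRP (Virtual Router Redundancy Protocol) +- **Virtual IP management**: manage virtual IP addresses (VIPs) that can float between multiple nodes +- **State machine management**: transitions among the three states MASTER, BACKUP, and FAULT +- **Priority-based election**: elect the MASTER node based on priority (1-255) +- **Gratuitous ARP**: send ARP packets on state changes to update network devices +- **Sync groups**: group multiple VRRP instances so they change state as a unit +- **Virtual MAC support**: support virtual MAC addresses (macvlan) + +### 2. Health Checking +- **HTTP/HTTPS checks**: verify web service status with GET requests +- **TCP checks**: basic TCP connection test +- **SMTP checks**: mail service monitoring +- **DNS checks**: query-based DNS verification +- **Script checks**: custom scripts for flexible monitoring +- **UDP/PING checks**: network connectivity tests +- **Dynamic weights**: adjust weights dynamically based on health check results + +### 3. Load Balancing (LVS Integration) +- **Scheduling algorithms**: support rr, wrr, lc, wlc, sh, and other scheduling algorithms +- **Forwarding modes**: NAT, Direct Routing (DR), IP Tunneling (TUN) +- **Backend server management**: dynamically add/remove backend servers based on health status +- **Quorum support**: configure the minimum number of live servers +- **Sorry server**: fallback server used when there are not enough healthy nodes +- **Session persistence**: support persistent sessions + +### 4.
Auxiliary Features +- **State-change scripts**: run custom scripts on state transitions +- **Email notifications**: SMTP alerting support +- **Process monitoring**: monitor external processes and adjust priority +- **Configuration hot reload**: support reloading the configuration file + +## Implementation Plan + +### Phase 0: Project Infrastructure ✅ +- [x] Project structure setup +- [x] CLI argument parsing (--config, --debug) +- [x] YAML configuration file loading and validation +- [x] Logging system +- [x] Signal handling (SIGHUP reloads configuration) + +### Phase 1: Core VRRP Functionality (first priority) +#### 1.1 Network Interface and IP Management +- [ ] Network interface detection and validation +- [ ] VIP add/remove (using netlink) +- [ ] IP address conflict detection +- [ ] VIP status query + +#### 1.2 VRRP Protocol Stack +- [ ] VRRP packet structure definition (RFC 3768/5798) +- [ ] Raw socket send/receive of VRRP packets +- [ ] Advertisement packet sending +- [ ] Advertisement packet receiving and parsing +- [ ] Authentication support (PASS type) + +#### 1.3 State Machine Implementation +- [ ] State definitions (INIT/BACKUP/MASTER/FAULT) +- [ ] State transition logic +- [ ] Master election algorithm +- [ ] Timer management (Advertisement Timer, Master Down Timer) +- [ ] Priority preemption mode + +#### 1.4 ARP and Network Updates +- [ ] Gratuitous ARP sending +- [ ] ARP reply handling +- [ ] ARP broadcast for multiple VIPs + +#### 1.5 Integration and Testing +- [ ] VRRP instance manager +- [ ] Multi-instance support +- [ ] Basic functional tests +- [ ] Two-node VRRP failover test + +### Phase 2: Health Check System (second priority) +#### 2.1 Health Check Framework +- [ ] Health checker interface definition +- [ ] Check result state management (rise/fall counting) +- [ ] Periodic scheduler +- [ ] Timeout control + +#### 2.2 Checker Implementations +- [ ] TCP health check +- [ ] HTTP/HTTPS health check +- [ ] ICMP ping check +- [ ] Script check (run external commands) +- [ ] DNS check + +#### 2.3 VRRP Integration +- [ ] Track script support +- [ ] Lower priority when a health check fails +- [ ] Restore priority when the check recovers +- [ ] Health check status affects the VRRP state machine + +### Phase 3: Enhanced Features (third priority) +#### 3.1 Notifications and Scripts +- [ ] Run scripts on state changes (notify_master/backup/fault) +- [ ] Script executor (permission control, timeout control) +- [ ] Email notification support (SMTP) +- [ ] Webhook notifications + +#### 3.2 Advanced Features +- [ ] Sync group support +- [ ] Virtual MAC address support +- [ ] Configuration hot-reload improvements +- [ ] Process monitoring and automatic restart + +#### 3.3 Observability +- [ ] Status query API/CLI +- [ ] Metrics export (Prometheus format) +- [ ] Detailed event logs +- [ ] Enhanced debug mode + +### Phase 4: Load Balancing (optional, low priority) +- [ ] LVS integration research +- [ ] IPVS operation wrappers +- [ ] Basic scheduling algorithms (rr, wrr) +- [ ] Backend server health checks +- [ ] Dynamic backend management + +## Current Progress +- ✅ Phase 0 completed +- 🔄 Next: Phase 1.1 network interface and IP management + +## Tech Stack +- Language: Go 1.21+ +- Configuration format: YAML/TOML (compatible with the keepalived.conf style) +- Dependencies: use the standard library where possible and minimize third-party dependencies \ No newline at end of file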
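The rise/fall counting listed under Phase 2.1 of the roadmap can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the `Tracker` type, `NewTracker`, `Observe`, and `Healthy` are hypothetical names, and the sketch assumes a checker starts out healthy, flips to unhealthy after `fall` consecutive failures, and flips back after `rise` consecutive successes.

```go
package main

import "fmt"

// Tracker is a hypothetical rise/fall counter. The name and fields are
// illustrative only and are not taken from the go-alived source.
type Tracker struct {
	rise, fall int  // consecutive results needed to flip the state
	healthy    bool // currently reported state
	streak     int  // consecutive results that disagree with healthy
}

// NewTracker returns a tracker that starts in the healthy state (an assumption).
func NewTracker(rise, fall int) *Tracker {
	return &Tracker{rise: rise, fall: fall, healthy: true}
}

// Observe records one check result and reports whether the state flipped.
func (t *Tracker) Observe(ok bool) (changed bool) {
	if ok == t.healthy {
		t.streak = 0 // result agrees with the current state, so reset the streak
		return false
	}
	t.streak++
	need := t.fall // leaving the healthy state takes `fall` failures
	if !t.healthy {
		need = t.rise // recovering takes `rise` successes
	}
	if t.streak < need {
		return false
	}
	t.healthy = ok
	t.streak = 0
	return true
}

// Healthy reports the current state.
func (t *Tracker) Healthy() bool { return t.healthy }

func main() {
	t := NewTracker(2, 3) // rise=2, fall=3
	for _, ok := range []bool{false, false, true, false, false, false, true, true} {
		changed := t.Observe(ok)
		fmt.Printf("result=%v healthy=%v changed=%v\n", ok, t.Healthy(), changed)
	}
}
```

Counting only consecutive disagreeing results means a single failed probe between successes does not flip the state, so a caller that lowers VRRP priority on a state change would not react to one flapping check.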