wip v1.0.0

loveuer
2025-12-08 22:23:45 +08:00
commit bece440c47
35 changed files with 4893 additions and 0 deletions

.gitignore (new file, 6 lines)

@@ -0,0 +1,6 @@
.qoder
.vscode
.idea
dist
x-*

README.md (new file, 236 lines)

@@ -0,0 +1,236 @@
# go-alived
A lightweight VRRP (Virtual Router Redundancy Protocol) implementation in Go with minimal external dependencies, designed as a simple alternative to keepalived.
## Features
**Phase 1: Core VRRP Functionality (Completed)**
- VRRP protocol implementation (RFC 3768/5798)
- Virtual IP management (add/remove VIPs)
- State machine (INIT/BACKUP/MASTER/FAULT)
- Priority-based master election
- Gratuitous ARP for network updates
- Raw socket VRRP packet send/receive
- Timer management (advertisement & master-down timers)
- VRRP instance manager with multi-instance support
- Configuration hot-reload (SIGHUP)
**Phase 2: Health Checking (Completed)**
- Health checker interface with rise/fall logic
- TCP health checks
- HTTP/HTTPS health checks
- ICMP ping checks
- Script-based checks (custom commands)
- Periodic health check scheduling
- Health check integration with VRRP priority
- Track scripts: automatic priority adjustment on health changes
🚧 **Phase 3: Enhanced Features (Planned)**
- State transition scripts (notify_master/backup/fault)
- Email/Webhook notifications
- Sync groups
- Virtual MAC support
- Metrics export
## Installation
### Build from source
```bash
git clone https://github.com/loveuer/go-alived.git
cd go-alived
go build -o go-alived .
```
## Quick Start
### 1. Test Your Environment
Before deployment, test whether your environment supports VRRP:
```bash
# Basic test (auto-detect network interface)
sudo ./go-alived test
# Test specific interface
sudo ./go-alived test -i eth0
# Full test with VIP
sudo ./go-alived test -i eth0 -v 192.168.1.100/24
```
### 2. Run the Service
```bash
# Run with minimal config
sudo ./go-alived run -c config.mini.yaml -d
# Run with full config
sudo ./go-alived -c config.yaml
# Install as systemd service
sudo ./deployment/install.sh
sudo systemctl start go-alived
```
## Usage
### Commands
```
go-alived # Run VRRP service (default)
go-alived run # Run VRRP service
go-alived test # Test environment for VRRP support
go-alived --help # Show help
go-alived --version # Show version
```
### Global Flags
```
-c, --config string Path to configuration file (default "/etc/go-alived/config.yaml")
-d, --debug Enable debug mode
-h, --help Show help
-v, --version Show version
```
### Test Command Flags
```
-i, --interface string Network interface to test (auto-detect if not specified)
-v, --vip string Test VIP address (e.g., 192.168.1.100/24)
```
See [USAGE.md](USAGE.md) for detailed usage documentation.
## Configuration
### Minimal Configuration
```yaml
# config.mini.yaml - VRRP only
global:
  router_id: "node1"
vrrp_instances:
  - name: "VI_1"
    interface: "eth0"
    state: "BACKUP"
    virtual_router_id: 51
    priority: 100
    advert_interval: 1
    auth_type: "PASS"
    auth_pass: "secret123"
    virtual_ips:
      - "192.168.1.100/24"
```
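It can help to check a file like this against the limits VRRPv2 itself imposes before loading it. The Go sketch below is a hypothetical validator. The `VRRPInstance` struct and the `validate` function are illustrative only and are not taken from `pkg/config`.

The sketch does the following:
- Applies the RFC 3768 ranges for VRID, priority and advertisement interval.
- Parses each VIP as an IPv4 CIDR.
- Checks that a `PASS` password fits the 8-byte Authentication Data field.

Note that the example value `secret123` is 9 bytes, so an implementation would have to truncate or reject it.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// VRRPInstance mirrors one entry under vrrp_instances.
// Hypothetical type: the real pkg/config structs may be named differently.
type VRRPInstance struct {
	Name            string
	Interface       string
	State           string
	VirtualRouterID int
	Priority        int
	AdvertInterval  int // seconds
	AuthType        string
	AuthPass        string
	VirtualIPs      []string
}

// validate checks an instance against VRRPv2 (RFC 3768) field limits and
// returns every problem joined together, or nil if there are none.
func validate(in VRRPInstance) error {
	var errs []error
	if in.VirtualRouterID < 1 || in.VirtualRouterID > 255 {
		errs = append(errs, fmt.Errorf("virtual_router_id %d: must be 1-255", in.VirtualRouterID))
	}
	// 0 means "master is leaving" and 255 is reserved for the address owner.
	if in.Priority < 1 || in.Priority > 254 {
		errs = append(errs, fmt.Errorf("priority %d: must be 1-254", in.Priority))
	}
	// The advertisement interval is an 8-bit count of seconds.
	if in.AdvertInterval < 1 || in.AdvertInterval > 255 {
		errs = append(errs, fmt.Errorf("advert_interval %d: must be 1-255", in.AdvertInterval))
	}
	// Assumption: the configured initial state is MASTER or BACKUP.
	if in.State != "MASTER" && in.State != "BACKUP" {
		errs = append(errs, fmt.Errorf("state %q: must be MASTER or BACKUP", in.State))
	}
	// The Authentication Data field in the packet is 8 octets.
	if in.AuthType == "PASS" && len(in.AuthPass) > 8 {
		errs = append(errs, fmt.Errorf("auth_pass is %d bytes: VRRPv2 carries at most 8", len(in.AuthPass)))
	}
	if len(in.VirtualIPs) == 0 {
		errs = append(errs, errors.New("virtual_ips: at least one VIP is required"))
	}
	for _, vip := range in.VirtualIPs {
		if ip, _, err := net.ParseCIDR(vip); err != nil || ip.To4() == nil {
			errs = append(errs, fmt.Errorf("virtual_ips entry %q: not an IPv4 CIDR", vip))
		}
	}
	return errors.Join(errs...) // nil when errs is empty (Go 1.20+)
}

func main() {
	mini := VRRPInstance{
		Name: "VI_1", Interface: "eth0", State: "BACKUP",
		VirtualRouterID: 51, Priority: 100, AdvertInterval: 1,
		AuthType: "PASS", AuthPass: "secret123",
		VirtualIPs: []string{"192.168.1.100/24"},
	}
	fmt.Println(validate(mini))
}
```

Run against the minimal configuration above, this sketch reports only the `auth_pass` length.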
### Full Configuration Example
See `config.example.yaml` for complete configuration with health checking.
### Signals
- `SIGHUP`: Reload configuration
- `SIGINT/SIGTERM`: Graceful shutdown
## Architecture
```
go-alived/
├── main.go # Application entry point
├── internal/
│ ├── cmd/ # Cobra commands
│ │ ├── root.go # Root command
│ │ ├── run.go # Run service command
│ │ └── test.go # Environment test command
│ ├── vrrp/ # VRRP implementation
│ │ ├── packet.go # VRRP packet structure & marshaling
│ │ ├── socket.go # Raw socket operations
│ │ ├── state.go # State machine & timers
│ │ ├── arp.go # Gratuitous ARP
│ │ ├── instance.go # VRRP instance logic
│ │ └── manager.go # Instance manager
│ └── health/ # Health check system
│ ├── checker.go # Checker interface & state
│ ├── monitor.go # Health check scheduler
│ ├── tcp.go # TCP health checker
│ ├── http.go # HTTP/HTTPS health checker
│ ├── ping.go # ICMP ping checker
│ ├── script.go # Script checker
│ └── factory.go # Checker factory
├── pkg/
│ ├── config/ # Configuration loading & validation
│ ├── logger/ # Logging system
│ └── netif/ # Network interface management
└── deployment/ # Deployment files
├── go-alived.service # Systemd service file
├── install.sh # Installation script
├── uninstall.sh # Uninstallation script
├── check-env.sh # Environment check script
├── README.md # Deployment documentation
└── COMPATIBILITY.md # Environment compatibility guide
```
## Environment Compatibility
### ✅ Fully Supported
- Physical servers
- KVM/QEMU virtual machines
- Proxmox VE
- VMware ESXi (with promiscuous mode)
- VirtualBox (with bridged network + promiscuous mode)
### ⚠️ Limited Support
- Private cloud (depends on network configuration)
- Docker containers (requires `--privileged` and `--net=host`)
- Kubernetes (requires hostNetwork mode)
### ❌ Not Supported
- AWS EC2 (multicast disabled)
- Aliyun ECS (multicast disabled)
- Azure VM (requires special configuration)
- Google Cloud (multicast disabled by default)
**Why?** Public clouds typically filter multicast traffic, including the VRRP group address 224.0.0.18, at the network virtualization layer.
**Alternative**: Use cloud-native solutions like Elastic IP (AWS), SLB/HaVip (Aliyun), Load Balancer (Azure/GCP).
See [deployment/COMPATIBILITY.md](deployment/COMPATIBILITY.md) for detailed compatibility information.
## Requirements
- Go 1.21+ (for building)
- Linux/macOS with root privileges (for raw sockets and interface management)
- Network interface with IPv4 address
- Multicast support (for VRRP)
## Dependencies
Minimal external dependencies:
- `github.com/vishvananda/netlink` - Network interface management
- `github.com/mdlayher/arp` - ARP packet handling
- `github.com/spf13/cobra` - CLI framework
- `golang.org/x/net/ipv4` - IPv4 raw socket support
- `golang.org/x/net/icmp` - ICMP ping support
- `gopkg.in/yaml.v3` - YAML configuration parsing
## Documentation
- [USAGE.md](USAGE.md) - Detailed usage guide
- [TESTING.md](TESTING.md) - Testing guide
- [deployment/README.md](deployment/README.md) - Deployment guide
- [deployment/COMPATIBILITY.md](deployment/COMPATIBILITY.md) - Environment compatibility
- [roadmap.md](roadmap.md) - Implementation roadmap
## Roadmap
See [roadmap.md](roadmap.md) for detailed implementation plan.
## License
MIT License
## Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.

TESTING.md (new file, 301 lines)

@@ -0,0 +1,301 @@
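# VRRP Functional Testing Guide
## Preparing the Test Environment
### 1. Single-Host Test (Virtual Interface)
```bash
# macOS: add an alias address to the lo0 loopback interface
sudo ifconfig lo0 alias 192.168.100.1/24
# Linux: create a virtual interface with the dummy module
sudo modprobe dummy
sudo ip link add dummy0 type dummy
sudo ip addr add 192.168.100.1/24 dev dummy0
sudo ip link set dummy0 up
```
### 2. Two-Node Test (Recommended, Realistic)
You need two machines (VMs or physical hosts) on the same subnet: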
- Node1: 192.168.1.10/24
- Node2: 192.168.1.20/24
- VIP: 192.168.1.100/24
## Test Configuration Files
### Node1 Configuration (config-node1.yaml)
```yaml
global:
  router_id: "node1"
  notification_email: "admin@example.com"
vrrp_instances:
  - name: "VI_1"
    interface: "eth0"            # change to your actual interface name
    state: "BACKUP"
    virtual_router_id: 51
    priority: 100                # higher priority
    advert_interval: 1
    auth_type: "PASS"
    auth_pass: "secret123"
    virtual_ips:
      - "192.168.1.100/24"       # change to your actual subnet
```
### Node2 Configuration (config-node2.yaml)
```yaml
global:
  router_id: "node2"
  notification_email: "admin@example.com"
vrrp_instances:
  - name: "VI_1"
    interface: "eth0"            # change to your actual interface name
    state: "BACKUP"
    virtual_router_id: 51
    priority: 90                 # lower priority
    advert_interval: 1
    auth_type: "PASS"
    auth_pass: "secret123"
    virtual_ips:
      - "192.168.1.100/24"       # change to your actual subnet
```
## Test Procedure
### Test 1: Startup and Log Check
**Node1:**
```bash
sudo ./go-alived --config config-node1.yaml --debug
```
**Expected output:**
```
[2025-12-05 14:25:51] INFO: starting go-alived...
[2025-12-05 14:25:51] INFO: loading configuration from: config-node1.yaml
[2025-12-05 14:25:51] INFO: configuration loaded successfully
[2025-12-05 14:25:51] INFO: loaded VRRP instance: VI_1
[2025-12-05 14:25:51] INFO: starting VRRP instance (VRID=51, Priority=100, Interface=eth0)
[2025-12-05 14:25:51] INFO: [VI_1] state changed: INIT -> BACKUP
[2025-12-05 14:25:51] INFO: [VI_1] transitioning to BACKUP state
```
**Node2:**
```bash
sudo ./go-alived --config config-node2.yaml --debug
```
### Test 2: Master Election
After both nodes start, Node1 (the higher priority) should become MASTER.
**Node1 expected output:**
```
[2025-12-05 14:25:54] INFO: [VI_1] master down timer expired, becoming master
[2025-12-05 14:25:54] INFO: [VI_1] state changed: BACKUP -> MASTER
[2025-12-05 14:25:54] INFO: [VI_1] transitioning to MASTER state
[2025-12-05 14:25:54] INFO: [VI_1] adding virtual IPs
[2025-12-05 14:25:54] INFO: [VI_1] added VIP 192.168.1.100/32
[2025-12-05 14:25:54] DEBUG: [VI_1] sent advertisement (priority=100)
```
**Verify the VIP:**
```bash
# Run on Node1
ip addr show eth0 | grep 192.168.1.100
# The VIP should be listed
```
**Node2 stays BACKUP:**
```
[2025-12-05 14:25:54] DEBUG: [VI_1] received advertisement from 192.168.1.10 (priority=100, state=BACKUP)
# Node2 should remain in BACKUP state
```
### Test 3: Failover
Stop go-alived on Node1:
```bash
# On Node1, press Ctrl+C or send SIGTERM
sudo pkill -SIGTERM go-alived
```
**Node1 expected output:**
```
[2025-12-05 14:26:10] INFO: received signal terminated, shutting down...
[2025-12-05 14:26:10] INFO: cleaning up resources...
[2025-12-05 14:26:10] INFO: [VI_1] stopping VRRP instance
[2025-12-05 14:26:10] INFO: [VI_1] removing virtual IPs
[2025-12-05 14:26:10] INFO: [VI_1] removed VIP 192.168.1.100/32
```
**Node2 should take over within about 3.6 seconds (its master-down interval):**
```
[2025-12-05 14:26:13] INFO: [VI_1] master down timer expired, becoming master
[2025-12-05 14:26:13] INFO: [VI_1] state changed: BACKUP -> MASTER
[2025-12-05 14:26:13] INFO: [VI_1] transitioning to MASTER state
[2025-12-05 14:26:13] INFO: [VI_1] adding virtual IPs
[2025-12-05 14:26:13] INFO: [VI_1] added VIP 192.168.1.100/32
```
**Verify that the VIP moved:**
```bash
# Run on Node2
ip addr show eth0 | grep 192.168.1.100
# The VIP should be listed
# From a third machine, ping the VIP; expect only a few lost replies during the switchover
ping 192.168.1.100
```
### Test 4: Preemption
Restart Node1 (higher priority):
```bash
# Run on Node1
sudo ./go-alived --config config-node1.yaml --debug
```
**Node1 expected behavior:**
```
[2025-12-05 14:27:00] INFO: [VI_1] state changed: INIT -> BACKUP
[2025-12-05 14:27:03] INFO: [VI_1] master down timer expired, becoming master
[2025-12-05 14:27:03] INFO: [VI_1] state changed: BACKUP -> MASTER
```
**Node2 expected behavior (steps down after seeing a higher priority):**
```
[2025-12-05 14:27:03] WARN: [VI_1] received higher priority advertisement, stepping down
[2025-12-05 14:27:03] INFO: [VI_1] state changed: MASTER -> BACKUP
[2025-12-05 14:27:03] INFO: [VI_1] transitioning to BACKUP state
[2025-12-05 14:27:03] INFO: [VI_1] removing virtual IPs
[2025-12-05 14:27:03] INFO: [VI_1] removed VIP 192.168.1.100/32
```
### Test 5: Configuration Hot Reload
Edit the Node1 configuration file and change the priority:
```yaml
priority: 80    # changed from 100 to 80
```
Send SIGHUP:
```bash
sudo pkill -SIGHUP go-alived
```
**Expected output:**
```
[2025-12-05 14:28:00] INFO: received SIGHUP, reloading configuration...
[2025-12-05 14:28:00] INFO: reloading VRRP configuration...
[2025-12-05 14:28:00] INFO: stopping all VRRP instances
[2025-12-05 14:28:00] INFO: loaded VRRP instance: VI_1
[2025-12-05 14:28:00] INFO: starting VRRP instance (VRID=51, Priority=80, Interface=eth0)
[2025-12-05 14:28:00] INFO: VRRP configuration reloaded successfully
```
## Verifying with Packet Capture
Capture VRRP packets with tcpdump:
```bash
# Capture VRRP packets (IP protocol 112)
sudo tcpdump -i eth0 -n proto 112
# Or capture by the multicast destination address
sudo tcpdump -i eth0 -n dst 224.0.0.18
```
**Expected output:**
```
14:25:55.123456 IP 192.168.1.10 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s
14:25:56.123456 IP 192.168.1.10 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s
```
## Troubleshooting
### 1. Permission Error
```
failed to create raw socket: operation not permitted
```
**Fix:** run with `sudo`.
### 2. Interface Not Found
```
failed to get interface eth0: no such network interface
```
**Fix:** check the `interface` field in the configuration file and set it to the actual interface name:
```bash
ip link show    # list all interfaces
```
### 3. Adding the VIP Fails
```
failed to add VIP: file exists
```
**Fix:** the VIP may already exist; remove it first:
```bash
sudo ip addr del 192.168.1.100/24 dev eth0
```
### 4. VRRP Packets Are Not Received
**Check the firewall:**
```bash
# Linux
sudo iptables -A INPUT -p 112 -j ACCEPT
# macOS
# System Preferences -> Security & Privacy -> Firewall -> Firewall Options -> allow go-alived
```
### 5. macOS-Specific Issues
macOS does not support `SO_BINDTODEVICE`. The code handles this automatically, but you may need to disable the firewall:
```bash
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --setglobalstate off
```
## Quick Verification Script
```bash
#!/bin/bash
# test-vrrp.sh
echo "=== VRRP functional test ==="
# 1. Check whether the VIP is present
echo "1. Checking VIP..."
ip addr show | grep -q "192.168.1.100" && echo "✓ VIP present" || echo "✗ VIP missing"
# 2. Check the process
echo "2. Checking process..."
pgrep -f go-alived > /dev/null && echo "✓ process running" || echo "✗ process not running"
# 3. Capture for 5 seconds
echo "3. Capturing VRRP packets (5 s)..."
timeout 5 sudo tcpdump -i eth0 -n proto 112 -c 5
# 4. Ping the VIP
echo "4. Pinging VIP..."
ping -c 3 192.168.1.100 && echo "✓ VIP reachable" || echo "✗ VIP unreachable"
echo "=== Test complete ==="
```
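## Expected Results
**Pass criteria:**
1. After both nodes start, the higher-priority node becomes MASTER
2. The MASTER node adds the VIP successfully
3. After the MASTER stops, the BACKUP takes over within its master-down interval (about 3.6 s with `advert_interval: 1` and priority 90)
4. The VIP moves to the new MASTER, and ping resumes after at most a few lost replies
5. After the higher-priority node restarts, it preempts and becomes MASTER again
6. Configuration hot reload works
7. tcpdump captures periodic VRRP Advertisement packets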

USAGE.md (new file, 419 lines)

@@ -0,0 +1,419 @@
# go-alived Usage Guide
## Command Overview
```bash
go-alived              # Run the VRRP service (default command)
go-alived run          # Run the VRRP service
go-alived test         # Test whether the environment supports VRRP
go-alived --help       # Show help
go-alived --version    # Show version
```
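## 1. Environment Test (test)
Before deploying go-alived, run the environment check first:
```bash
# Basic check (interface selected automatically)
sudo ./go-alived test
# Check a specific interface
sudo ./go-alived test -i eth0
sudo ./go-alived test --interface eth0
# Specify both the interface and a test VIP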
sudo ./go-alived test -i eth0 -v 192.168.1.100/24
sudo ./go-alived test --interface eth0 --vip 192.168.1.100/24
```
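**Checks performed**
- ✓ Root privileges
- ✓ Network interface state
- ✓ VIP add/remove
- ✓ Multicast support
- ✓ Firewall configuration
- ✓ Kernel parameters
- ✓ Conflicting services
- ✓ Virtualization environment detection
- ✓ Cloud environment restrictions
**Sample output**
```
=== go-alived environment test ===
Checking privileges...
Checking network interface...
Auto-selected interface: eth0
Testing VIP add/remove...
Checking multicast support...
Checking firewall settings...
Checking kernel parameters...
Checking for conflicting services...
Checking virtualization environment...
Checking cloud environment...
=== Results ===
✓ Root privileges      running as root
✓ Network interface    eth0 exists and is up
✓ VIP add              added VIP 192.168.1.100/32
✓ VIP verify           VIP is present on the interface
✓ VIP reachability     VIP responds to ping
✓ VIP remove           VIP removed
✓ Multicast            interface supports multicast
⚠ Firewall VRRP        no VRRP firewall rule; consider adding: iptables -A INPUT -p 112 -j ACCEPT
✓ ip_forward           ip_forward = 1 (OK)
✓ Service conflicts    no conflicting services found
✓ Virtualization       KVM/QEMU VMs usually work well
✓ Cloud                no public cloud restrictions detected
=== Summary ===
⚠ Environment mostly supported, with 1 warning
Fixing the warnings is recommended for better stability
```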
## 2. Running the Service (run)
```bash
# Run with the default configuration file
sudo ./go-alived
# Or use the run command explicitly
sudo ./go-alived run
# Specify a configuration file
sudo ./go-alived run -c /etc/go-alived/config.yaml
sudo ./go-alived run --config config.yaml
# Enable debug mode
sudo ./go-alived run -c config.yaml -d
sudo ./go-alived run --config config.yaml --debug
# Short form (global flags)
sudo ./go-alived -c config.yaml -d
```
## 3. Signal Control
```bash
# Reload the configuration (sends SIGHUP)
sudo kill -HUP $(pgrep go-alived)
# Or use systemctl (when installed as a service)
sudo systemctl reload go-alived
# Graceful stop
sudo kill -TERM $(pgrep go-alived)
# or
sudo systemctl stop go-alived
```
## Command-Line Flags
### Global Flags (apply to all commands)
```
-c, --config string    Path to the configuration file (default: /etc/go-alived/config.yaml)
-d, --debug            Enable debug logging
-h, --help             Show help
-v, --version          Show version
```
### run Command Flags
```
-c, --config string    Path to the configuration file (default: /etc/go-alived/config.yaml)
-d, --debug            Enable debug logging
```
### test Command Flags
```
-i, --interface string    Interface to test (e.g. eth0)
-v, --vip string          Test VIP (e.g. 192.168.1.100/24)
```
## Configuration File
### Minimal Configuration Example
```yaml
# config.mini.yaml - VRRP only
global:
  router_id: "node1"
vrrp_instances:
  - name: "VI_1"
    interface: "eth0"
    state: "BACKUP"
    virtual_router_id: 51
    priority: 100
    advert_interval: 1
    auth_type: "PASS"
    auth_pass: "secret123"
    virtual_ips:
      - "192.168.1.100/24"
```
### Full Configuration Example
```yaml
# config.example.yaml - includes health checking
global:
  router_id: "node1"
  notification_email: "admin@example.com"
vrrp_instances:
  - name: "VI_1"
    interface: "eth0"
    state: "BACKUP"
    virtual_router_id: 51
    priority: 100
    advert_interval: 1
    auth_type: "PASS"
    auth_pass: "secret123"
    virtual_ips:
      - "192.168.1.100/24"
      - "192.168.1.101/24"
    track_scripts:
      - "check_nginx"
health_checkers:
  - name: "check_nginx"
    type: "tcp"
    interval: 3s
    timeout: 2s
    rise: 3
    fall: 2
    config:
      host: "127.0.0.1"
      port: 80
```
## Deployment Options
### Option 1: Run Directly
```bash
# Build
go build -o go-alived .
# Test the environment
sudo ./go-alived test --interface eth0
# Start the service
sudo ./go-alived --config config.yaml --debug
```
### Option 2: Systemd Service
```bash
# Use the install script
sudo ./deployment/install.sh
# Edit the configuration
sudo vim /etc/go-alived/config.yaml
# Start the service
sudo systemctl start go-alived
# Check status
sudo systemctl status go-alived
# Follow logs
sudo journalctl -u go-alived -f
# Enable at boot
sudo systemctl enable go-alived
```
## Common Scenarios
### Scenario 1: Highly Available Web Service
**Configuration example**
```yaml
vrrp_instances:
  - name: "WEB_HA"
    interface: "eth0"
    virtual_router_id: 51
    priority: 100        # primary node
    virtual_ips:
      - "192.168.1.100/24"
    track_scripts:
      - "check_nginx"
health_checkers:
  - name: "check_nginx"
    type: "http"
    interval: 3s
    timeout: 2s
    rise: 3
    fall: 2
    config:
      url: "http://127.0.0.1/health"
      expected_status: 200
```
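**How it works**
1. While Nginx is healthy, the primary node (priority=100) holds the VIP.
2. When Nginx fails, the health check fails and the primary node's priority is lowered (in this example 100-10=90).
3. The backup node takes over the VIP once its priority is higher than the primary's lowered priority. With a backup priority of 90, both nodes tie at 90. VRRP then breaks the tie in favor of the higher primary IP address. For a predictable failover, give the backup a priority strictly above the lowered value.
4. When Nginx recovers, the primary node's priority is restored and it takes the VIP back.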
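The arithmetic in step 2 is worth checking against the backup's priority. This sketch assumes a fixed penalty per failing tracked check. The weight of 10 comes from the example above; how go-alived actually derives it is not stated here. The sketch compares the result with a backup at priority 90.

```go
package main

import "fmt"

// effectivePriority subtracts weight for each failing tracked check and
// clamps the result to 1..254. The weight is an assumption for illustration.
func effectivePriority(base, weight, failing int) int {
	p := base - weight*failing
	if p < 1 {
		p = 1
	}
	if p > 254 {
		p = 254
	}
	return p
}

func main() {
	const backup = 90
	for _, weight := range []int{10, 20} {
		primary := effectivePriority(100, weight, 1) // check_nginx failing
		fmt.Printf("weight %d: primary %d, backup %d, backup strictly higher: %v\n",
			weight, primary, backup, backup > primary)
	}
	// weight 10: primary 90, backup 90, backup strictly higher: false
	// weight 20: primary 80, backup 90, backup strictly higher: true
}
```

With a weight of 10 the priorities tie, so the outcome depends on which node has the higher IP address. A larger weight (or a higher backup priority) removes that ambiguity.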
### Scenario 2: Database Primary/Standby
**Primary node configuration**
```yaml
vrrp_instances:
  - name: "DB_MASTER"
    interface: "eth0"
    priority: 100
    virtual_ips:
      - "192.168.1.200/24"
    track_scripts:
      - "check_mysql"
health_checkers:
  - name: "check_mysql"
    type: "tcp"
    interval: 5s
    config:
      host: "127.0.0.1"
      port: 3306
```
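**Standby node configuration**
```yaml
vrrp_instances:
  - name: "DB_MASTER"
    interface: "eth0"
    priority: 90          # lower priority
    virtual_ips:
      - "192.168.1.200/24"
    track_scripts:
      - "check_mysql"
# health_checkers: define check_mysql here exactly as on the primary node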
```
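### Scenario 3: Multiple VIPs for Load Sharing
Each VIP runs as its own instance with its own `virtual_router_id`. On the peer node, swap the priorities (VI_WEB 90, VI_API 100) so that each node is normally MASTER for one VIP.
```yaml
vrrp_instances:
  - name: "VI_WEB"
    virtual_router_id: 51
    priority: 100
    virtual_ips:
      - "192.168.1.100/24"
  - name: "VI_API"
    virtual_router_id: 52
    priority: 90
    virtual_ips:
      - "192.168.1.101/24"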
```
## Troubleshooting
### Viewing Logs
```bash
# Systemd logs
sudo journalctl -u go-alived -f
# Last 100 lines
sudo journalctl -u go-alived -n 100
# A specific time window
sudo journalctl -u go-alived --since "1 hour ago"
```
### Packet Capture
```bash
# Capture VRRP packets
sudo tcpdump -i eth0 proto 112 -v
# Capture traffic to or from a specific VIP
sudo tcpdump -i eth0 host 192.168.1.100
# Capture multicast packets
sudo tcpdump -i eth0 dst 224.0.0.18
```
### Manually Testing a VIP
```bash
# Add the VIP
sudo ip addr add 192.168.1.100/24 dev eth0
# Send gratuitous ARP
sudo arping -c 3 -A -I eth0 192.168.1.100
# Verify
ip addr show eth0 | grep 192.168.1.100
# Remove the VIP
sudo ip addr del 192.168.1.100/24 dev eth0
```
### Checking Interface Status
```bash
# List interfaces
ip link show
# Show IP addresses
ip addr show eth0
# Show routes
ip route show
# Show multicast group memberships
ip maddr show eth0
```
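## Performance Tuning
### 1. Advertisement Interval
```yaml
advert_interval: 1    # seconds; 1 is the default, and VRRPv2 encodes whole seconds, so it is also the minimum
```
### 2. Tuning Health Check Frequency
```yaml
health_checkers:
  - interval: 2s    # check more often
    timeout: 1s     # shorter timeout
    rise: 2         # recover faster
    fall: 2         # detect failures faster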
```
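### 3. Kernel Parameter Tuning
```bash
# Allow binding to non-local addresses (e.g. a VIP this node does not currently hold)
sudo sysctl -w net.ipv4.ip_nonlocal_bind=1
# ARP tuning (note: deployment/check-env.sh warns when arp_ignore is not 0)
sudo sysctl -w net.ipv4.conf.all.arp_ignore=1
sudo sysctl -w net.ipv4.conf.all.arp_announce=2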
```
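## Security Recommendations
1. **Use a strong password**: choose a complex `auth_pass`. Note that simple-password authentication (`auth_type: "PASS"`) is sent in clear text and holds at most 8 bytes. It mainly prevents accidental cross-talk between routers.
2. **Network isolation**: put VRRP traffic on a dedicated VLAN
3. **Restrict access**: use the firewall to limit which sources may send VRRP packets
4. **Audit logs**: review state-change logs regularly
5. **Back up configuration**: back up configuration files regularly
## More Resources
- [GitHub repository](https://github.com/loveuer/go-alived)
- [Deployment guide](deployment/README.md)
- [Compatibility notes](deployment/COMPATIBILITY.md)
- [Testing guide](TESTING.md)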

deployment/COMPATIBILITY.md (new file, 269 lines)

@@ -0,0 +1,269 @@
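# VRRP Environment Compatibility
## Supported Environments
### ✅ Fully Supported
- **Physical servers**: all features supported
- **Local virtual machines (with correct network configuration)**:
  - KVM/QEMU: fully supported
  - Proxmox VE: fully supported
  - VMware ESXi: requires promiscuous mode
  - VirtualBox: requires bridged networking + promiscuous mode
  - Hyper-V: requires an external virtual switch
### ⚠️ Partially Supported
- **Some private clouds**: depends on the network configuration
- **Docker containers**: require `--privileged` and `--net=host`
- **Kubernetes**: requires hostNetwork mode
### ❌ Not Supported
- **AWS EC2**: no multicast, so VRRP cannot run
- **Alibaba Cloud ECS**: no multicast, so VRRP cannot run
- **Azure VM**: not supported by default; requires special configuration
- **Google Cloud**: no multicast by default
- **Most public clouds**: multicast is disabled in the network virtualization layer
## Why Doesn't VRRP Work in Public Clouds?
1. **Multicast restrictions**: VRRP uses the IP multicast address 224.0.0.18, and cloud network virtualization layers usually filter multicast traffic
2. **Security**: cloud providers do not want users moving IP addresses on their own, which could cause IP conflicts
3. **Network architecture**: SDN (software-defined networking) designs do not support traditional MAC address migration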
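## Alternatives in Cloud Environments
### AWS
```text
Option 1: Elastic IP (EIP)
  - Associate/disassociate the EIP dynamically through the AWS API
  - Combine with health-check scripts for failover
Option 2: Application Load Balancer (ALB)
  - Layer 7 load balancing
  - Automatic health checks and failover
Option 3: Network Load Balancer (NLB)
  - Layer 4 load balancing
  - Supports static IPs
```
### Alibaba Cloud (Aliyun)
```text
Option 1: High-Availability Virtual IP (HaVip)
  - A VRRP alternative provided by Alibaba Cloud
  - Supports primary/standby switchover
Option 2: Server Load Balancer (SLB)
  - Layer 4/7 load balancing
  - Automatic health checks
```
### Azure
```text
Option 1: Azure Load Balancer
  - Standard load balancer
  - High availability support
Option 2: Azure Traffic Manager
  - DNS-level traffic management
  - Multi-region failover
```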
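## Virtualization Configuration Guide
### VMware ESXi
1. In the ESXi host client, open Networking and select the port group (or vSwitch) the VM is connected to
2. Edit settings → Security
3. Promiscuous mode: **Accept**
4. MAC address changes: **Accept**
5. Forged transmits: **Accept**
### VirtualBox
1. VM Settings → Network
2. Attached to: **Bridged Adapter**
3. Advanced → Promiscuous Mode: **Allow All**
4. Advanced → Cable Connected: **checked**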
### KVM/libvirt
```xml
<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
```
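### Proxmox VE
The default configuration works; use the vmbr0 bridge.
## Using the Check Script
Run the environment check script (it prompts for the interface name and a test VIP):
```bash
# Run the check script from the repository root
sudo ./deployment/check-env.sh
```
The script checks:
1. ✓ Privileges (root)
2. ✓ Operating system compatibility
3. ✓ Network interface state
4. ✓ Ability to add a VIP
5. ✓ VRRP protocol support
6. ✓ Firewall configuration
7. ✓ Kernel parameters
8. ✓ Conflicting services
9. ✓ Multicast support
10. ✓ Virtualization environment
11. ✓ Cloud environment restrictions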
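## Troubleshooting
### 1. VIP Cannot Be Added
**Symptom**: the `ip addr add` command fails
**Possible causes**:
- Insufficient privileges (root required)
- IP address conflict
- Network interface missing or down
- Wrong prefix length (subnet mask)
**Fix**:
```bash
# Check the interface state
ip link show eth0
# Check for an IP conflict (any reply means another host already uses the address)
sudo arping -c 3 -I eth0 192.168.1.100
# Try adding it manually
sudo ip addr add 192.168.1.100/24 dev eth0
```
### 2. VIP Added but Not Pingable
**Possible causes**:
- Firewall blocks ICMP
- Incorrect routing
- Neighbours' ARP caches not updated
- Network isolation (VLAN)
**Fix**:
```bash
# Send gratuitous ARP
sudo arping -c 3 -A -I eth0 192.168.1.100
# Check routes
ip route show
# Check the firewall for ICMP rules (iptables prints protocol names in lower case)
sudo iptables -L -n | grep -i icmp
```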
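### 3. VRRP Packets Not Sent/Received
**Symptom**: the two nodes cannot elect a single Master
**Possible causes**:
- Multicast is filtered
- Firewall blocks protocol 112
- The network switch blocks multicast
- Promiscuous mode not enabled on the virtual machine
**Fix**:
```bash
# Capture to confirm VRRP packets are on the wire
sudo tcpdump -i eth0 proto 112 -v
# Check multicast group memberships on the interface
ip maddr show eth0
# Add firewall rules
sudo iptables -A INPUT -p 112 -j ACCEPT
sudo iptables -A OUTPUT -p 112 -j ACCEPT
```
### 4. VRRP Does Not Work in a Cloud Environment
**How to confirm**:
```bash
# Run the check script
sudo ./deployment/check-env.sh
# Check for a cloud metadata service manually
curl -s -m 1 http://169.254.169.254/latest/meta-data/instance-id
```
**Solution**: use your cloud provider's high-availability offering (see "Alternatives in Cloud Environments" above)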
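## Network Requirements
### Required
- [x] Layer 2 connectivity (same VLAN/subnet)
- [x] Multicast support (224.0.0.18)
- [x] ARP broadcasts allowed
- [x] NIC promiscuous mode (virtual machine environments)
### Recommended
- [x] Gigabit or faster network
- [x] Low latency (< 10 ms)
- [x] STP disabled or PortFast enabled (on the switch)
- [x] Dedicated VLAN (production)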
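## Test Procedure
### 1. Basic Network Test
```bash
# Test connectivity to the peer
ping -c 3 <peer IP>
# Test multicast connectivity (two machines; uses iperf2, because iperf3 does not support multicast)
# Machine A
iperf -s -u -B 224.0.0.18 -i 1
# Machine B
iperf -c 224.0.0.18 -u -b 1M -T 1 -t 5
```
### 2. Manual VIP Test
```bash
# Add the VIP
sudo ip addr add 192.168.1.100/24 dev eth0
# Send gratuitous ARP
sudo arping -c 3 -A -I eth0 192.168.1.100
# Ping the VIP from another machine
ping 192.168.1.100
# Remove the VIP
sudo ip addr del 192.168.1.100/24 dev eth0
```
### 3. VRRP Functional Test
```bash
# Start with the minimal configuration
sudo ./go-alived --config config.mini.yaml --debug
# In another terminal, watch the interface addresses
watch -n 1 "ip addr show eth0 | grep inet"
# Capture to verify
sudo tcpdump -i eth0 proto 112 -v
```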
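## Production Deployment Recommendations
1. **Use a dedicated network**: keep VRRP traffic separate from application traffic
2. **Set up monitoring**: monitor VIP status and VRRP state changes
3. **Test failover**: regularly verify that primary/standby switchover works
4. **Document**: record the network topology, IP allocation, and incident procedures
5. **Back up configuration**: regularly back up the go-alived configuration files
## References
- [VRRP RFC 3768](https://tools.ietf.org/html/rfc3768)
- [Linux `ip` command manual](https://man7.org/linux/man-pages/man8/ip.8.html)
- [netfilter/iptables project](https://www.netfilter.org/)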

deployment/README.md (new file, 166 lines)

@@ -0,0 +1,166 @@
# go-alived Deployment
This directory contains the go-alived deployment files and installation scripts.
## Systemd Service
### Installation Steps
1. **Build the binary**
```bash
go build -o go-alived .
```
2. **Install the binary**
```bash
sudo cp go-alived /usr/local/bin/
sudo chmod +x /usr/local/bin/go-alived
```
3. **Create the configuration directories**
```bash
sudo mkdir -p /etc/go-alived
sudo mkdir -p /etc/go-alived/scripts
```
4. **Copy the configuration file**
```bash
sudo cp config.example.yaml /etc/go-alived/config.yaml
sudo vim /etc/go-alived/config.yaml  # adapt the configuration to your environment
```
5. **Install the systemd service**
```bash
sudo cp deployment/go-alived.service /etc/systemd/system/
sudo systemctl daemon-reload
```
6. **Start the service**
```bash
# Start the service
sudo systemctl start go-alived
# Check status
sudo systemctl status go-alived
# Follow logs
sudo journalctl -u go-alived -f
# Enable at boot
sudo systemctl enable go-alived
```
### Service Management Commands
```bash
# Start the service
sudo systemctl start go-alived
# Stop the service
sudo systemctl stop go-alived
# Restart the service
sudo systemctl restart go-alived
# Reload the configuration (sends SIGHUP)
sudo systemctl reload go-alived
# Show service status
sudo systemctl status go-alived
# Follow logs in real time
sudo journalctl -u go-alived -f
# Show recent logs
sudo journalctl -u go-alived -n 100
# Enable at boot
sudo systemctl enable go-alived
# Disable at boot
sudo systemctl disable go-alived
```
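## Service File Reference
### Key Settings
- **ExecStart**: start command, pointing to `/usr/local/bin/go-alived`
- **ExecReload**: reload command (sends SIGHUP)
- **User/Group**: runs as root (raw sockets and network interface management require it)
- **Restart**: restarts automatically on failure after 5 seconds
### Security Settings
- **Capabilities**:
  - `CAP_NET_ADMIN`: manage network interfaces (add/remove IPs)
  - `CAP_NET_RAW`: create raw sockets (VRRP protocol)
  - `CAP_NET_BIND_SERVICE`: bind privileged ports (optional)
- **Protection**:
  - `ProtectSystem=strict`: mounts the whole file system read-only for the service, except `/dev`, `/proc`, `/sys` and the `ReadWritePaths`
  - `ProtectHome=true`: makes user home directories inaccessible
  - `PrivateTmp=true`: gives the service a private temporary directory
  - `ReadWritePaths=/etc/go-alived`: allows writes only to the configuration directory
### Resource Limits
- `LimitNOFILE=65535`: maximum open files
- `LimitNPROC=512`: maximum processes
## Configuration File Location
Default configuration file: `/etc/go-alived/config.yaml`
Recommended directory layout: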
```
/etc/go-alived/
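├── config.yaml              # main configuration file
└── scripts/                 # scripts directory
    ├── notify_master.sh     # MASTER state notification script
    ├── notify_backup.sh     # BACKUP state notification script
    ├── notify_fault.sh      # FAULT state notification script
    └── check_service.sh     # health check script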
```
## Uninstalling
```bash
# Stop and disable the service
sudo systemctl stop go-alived
sudo systemctl disable go-alived
# Remove the service file
sudo rm /etc/systemd/system/go-alived.service
sudo systemctl daemon-reload
# Remove the binary
sudo rm /usr/local/bin/go-alived
# Remove the configuration files (optional)
sudo rm -rf /etc/go-alived
```
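## Troubleshooting
### Check Service Status
```bash
sudo systemctl status go-alived
```
### View Detailed Logs
```bash
sudo journalctl -u go-alived -n 100 --no-pager
```
### Test the Configuration File
```bash
# Runs go-alived in the foreground with debug logging; stop the systemd service first
sudo /usr/local/bin/go-alived --config /etc/go-alived/config.yaml --debug
```
### Common Issues
1. **Permission errors**: make sure the service runs as root or has the CAP_NET_ADMIN/CAP_NET_RAW capabilities
2. **Interface not found**: check that `interface` in the configuration file is correct
3. **Conflicting services**: make sure no other keepalived or VRRP service is running (VRRP uses IP protocol 112, not a port)
4. **VIP add fails**: check the network configuration and whether the IP address is already in use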

deployment/check-env.sh (new executable file, 334 lines)

@@ -0,0 +1,334 @@
#!/bin/bash
# VIP environment check script
# Checks whether the current environment supports VRRP and VIPs
set -e
COLOR_RED='\033[0;31m'
COLOR_GREEN='\033[0;32m'
COLOR_YELLOW='\033[1;33m'
COLOR_BLUE='\033[0;34m'
COLOR_NC='\033[0m'
ERRORS=0
WARNINGS=0
echo -e "${COLOR_BLUE}=== go-alived environment check ===${COLOR_NC}"
echo ""
check_pass() {
    echo -e "${COLOR_GREEN}✓${COLOR_NC} $1"
}
check_fail() {
    echo -e "${COLOR_RED}✗${COLOR_NC} $1"
    ERRORS=$((ERRORS + 1))
}
check_warn() {
    echo -e "${COLOR_YELLOW}⚠${COLOR_NC} $1"
    WARNINGS=$((WARNINGS + 1))
}
# 1. Check for root privileges
echo "1. Checking privileges..."
if [ "$EUID" -eq 0 ]; then
    check_pass "Running as root"
else
    check_fail "Root privileges required; run this script with sudo"
fi
echo ""
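# 2. Check the operating system
echo "2. Checking operating system..."
OS=$(uname -s)
if [ "$OS" = "Linux" ]; then
    check_pass "Operating system: $OS"
    # The pipeline's status is tr's, so a missing /etc/os-release does not trip set -e
    DISTRO=$(grep -m1 '^NAME=' /etc/os-release 2>/dev/null | cut -d= -f2- | tr -d '"')
    echo "   Distribution: ${DISTRO:-Unknown}"
elif [ "$OS" = "Darwin" ]; then
    check_warn "Operating system: macOS - limited support, only part of the VRRP functionality works"
    echo "   macOS lacks some Linux-specific networking features"
else
    check_fail "Unsupported operating system: $OS"
fi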
echo ""
# 3. Check the network interface
echo "3. Checking network interface..."
read -p "Enter the interface to use (e.g. eth0, ens33, en0): " INTERFACE
if ip link show "$INTERFACE" > /dev/null 2>&1; then
    check_pass "Interface $INTERFACE exists"
    # Check the link state
    STATE=$(ip link show "$INTERFACE" | grep -o "state [A-Z]*" | awk '{print $2}')
    if [ "$STATE" = "UP" ]; then
        check_pass "Link state: UP"
    else
        check_fail "Link state: $STATE (must be UP)"
    fi
    # Check for an IPv4 address
    IP_ADDR=$(ip -4 addr show "$INTERFACE" | grep "inet " | awk '{print $2}' | head -n1)
    if [ -n "$IP_ADDR" ]; then
        check_pass "Interface has IPv4 address: $IP_ADDR"
    else
        check_fail "Interface has no IPv4 address"
    fi
else
    check_fail "Interface $INTERFACE does not exist"
    echo "   Available interfaces:"
    ip link show | grep "^[0-9]" | awk '{print "   - " $2}' | sed 's/:$//'
fi
echo ""
# 4. Check that a VIP can be added
echo "4. Testing VIP add..."
read -p "Enter a VIP to test (e.g. 192.168.1.100/24): " TEST_VIP
if [ -n "$TEST_VIP" ] && [ -n "$INTERFACE" ]; then
    # Validate the VIP format
    if [[ $TEST_VIP =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/[0-9]+$ ]]; then
        check_pass "VIP format is valid: $TEST_VIP"
        # Try to add the VIP
        if ip addr add "$TEST_VIP" dev "$INTERFACE" 2>/dev/null; then
            check_pass "VIP added"
            # Verify that the VIP is really on the interface
            if ip addr show "$INTERFACE" | grep -q "$TEST_VIP"; then
                check_pass "VIP is present on the interface"
                # Check that the VIP answers (local ping)
                VIP_ADDR=$(echo "$TEST_VIP" | cut -d'/' -f1)
                if ping -c 1 -W 1 "$VIP_ADDR" > /dev/null 2>&1; then
                    check_pass "VIP responds to ping"
                else
                    check_warn "VIP ping failed (routing may need configuration)"
                fi
            else
                check_fail "VIP not found on the interface after adding it"
            fi
            # Clean up: remove the test VIP
            echo "   Removing test VIP..."
            ip addr del "$TEST_VIP" dev "$INTERFACE" 2>/dev/null || true
            check_pass "Test VIP removed"
        else
            check_fail "Failed to add VIP (permission problem or IP conflict?)"
        fi
    else
        check_fail "Invalid VIP format; expected e.g. 192.168.1.100/24"
    fi
fi
echo ""
# 5. Check VRRP protocol support
echo "5. Checking VRRP protocol support..."
# Only IP forwarding is checked here; raw-socket creation is not tested
if [ "$OS" = "Linux" ]; then
    if [ -e /proc/sys/net/ipv4/ip_forward ]; then
        IP_FORWARD=$(cat /proc/sys/net/ipv4/ip_forward)
        if [ "$IP_FORWARD" = "1" ]; then
            check_pass "IP forwarding enabled"
        else
            check_warn "IP forwarding disabled (needed in some setups)"
            echo "   Enable with: echo 1 > /proc/sys/net/ipv4/ip_forward"
        fi
    fi
fi
# 6. Check the firewall
echo ""
echo "6. Checking firewall settings..."
if [ "$OS" = "Linux" ]; then
    # Check iptables
    if command -v iptables > /dev/null 2>&1; then
        if iptables -L INPUT -n | grep -q "112"; then
            check_pass "Firewall allows VRRP (protocol 112)"
        else
            check_warn "No VRRP rule found in the firewall"
            echo "   Add a rule: iptables -A INPUT -p 112 -j ACCEPT"
        fi
    fi
    # Check firewalld
    if command -v firewall-cmd > /dev/null 2>&1; then
        if systemctl is-active --quiet firewalld; then
            if firewall-cmd --list-protocols | grep -q vrrp; then
                check_pass "firewalld allows VRRP"
            else
                check_warn "firewalld has no VRRP rule"
                echo "   Add a rule: firewall-cmd --permanent --add-protocol=vrrp"
                echo "   Reload: firewall-cmd --reload"
            fi
        fi
    fi
fi
echo ""
# 7. Check kernel parameters
echo "7. Checking kernel parameters..."
if [ "$OS" = "Linux" ]; then
    # ARP-related parameters
    if [ -e /proc/sys/net/ipv4/conf/all/arp_ignore ]; then
        ARP_IGNORE=$(cat /proc/sys/net/ipv4/conf/all/arp_ignore)
        if [ "$ARP_IGNORE" = "0" ]; then
            check_pass "ARP settings OK"
        else
            check_warn "arp_ignore is $ARP_IGNORE, which may affect the VIP"
        fi
    fi
    # Reverse path filtering
    if [ -e /proc/sys/net/ipv4/conf/all/rp_filter ]; then
        RP_FILTER=$(cat /proc/sys/net/ipv4/conf/all/rp_filter)
        if [ "$RP_FILTER" = "0" ] || [ "$RP_FILTER" = "2" ]; then
            check_pass "Reverse path filtering OK"
        else
            check_warn "rp_filter is $RP_FILTER; 0 or 2 is recommended"
            echo "   Change with: echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter"
        fi
    fi
fi
echo ""
# 8. Check for other VRRP services
echo "8. Checking for conflicting services..."
CONFLICT_SERVICES=("keepalived" "vrrpd")
for service in "${CONFLICT_SERVICES[@]}"; do
    if systemctl is-active --quiet "$service" 2>/dev/null; then
        check_warn "$service is running and may conflict"
        echo "   Stop with: systemctl stop $service"
    fi
done
if pgrep -x "keepalived" > /dev/null; then
    check_warn "A keepalived process is running"
fi
echo ""
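# 9. Check multicast support
echo "9. Checking multicast support..."
if [ -n "$INTERFACE" ]; then
    if ip maddr show "$INTERFACE" > /dev/null 2>&1; then
        check_pass "Interface supports multicast"
        # Try pinging the VRRP multicast address
        if timeout 2 ping -c 1 -I "$INTERFACE" 224.0.0.18 > /dev/null 2>&1; then
            check_pass "Multicast packets can be sent"
        else
            check_warn "No reply to multicast ping; sending may be restricted (this is often normal)"
        fi
    else
        check_warn "Unable to query multicast configuration"
    fi
fi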
echo ""
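# 10. Detect the virtualization environment
echo "10. Checking virtualization environment..."
if [ -e /sys/class/dmi/id/product_name ]; then
    PRODUCT=$(cat /sys/class/dmi/id/product_name 2>/dev/null || echo "Unknown")
    case $PRODUCT in
        *VMware*)
            check_warn "VMware virtual machine detected"
            echo "   VMware requires promiscuous mode for VRRP"
            echo "   Setting: vSwitch/port group -> Security -> Promiscuous mode: Accept"
            ;;
        *VirtualBox*)
            check_warn "VirtualBox virtual machine detected"
            echo "   VirtualBox requires bridged networking with promiscuous mode enabled"
            echo "   Setting: Network -> Bridged Adapter -> Advanced -> Promiscuous Mode: Allow All"
            ;;
        *KVM*|*QEMU*)
            check_pass "KVM/QEMU virtual machine detected (usually works well)"
            ;;
        *Amazon*|*EC2*)
            check_fail "AWS EC2 instance detected - VRRP not supported"
            echo "   AWS does not support multicast; use an AWS Elastic IP instead"
            ;;
        *)
            if [ "$PRODUCT" != "Unknown" ]; then
                echo "   Physical host or unrecognized virtualization: $PRODUCT"
            fi
            ;;
    esac
elif command -v systemd-detect-virt > /dev/null 2>&1; then
    # systemd-detect-virt exits non-zero when it finds no virtualization; keep set -e from aborting
    VIRT=$(systemd-detect-virt || true)
    if [ -n "$VIRT" ] && [ "$VIRT" != "none" ]; then
        check_warn "Virtualization detected: $VIRT"
    fi
fi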
echo ""
# 11. Detect cloud environments
echo "11. Checking cloud environment restrictions..."
CLOUD_DETECTED=0
# AWS
if curl -s -m 1 http://169.254.169.254/latest/meta-data/instance-id > /dev/null 2>&1; then
    check_fail "AWS environment detected - VRRP not supported"
    echo "   AWS does not support VRRP; use:"
    echo "   - Elastic IP (EIP) for IP failover"
    echo "   - Application Load Balancer (ALB)"
    echo "   - Network Load Balancer (NLB)"
    CLOUD_DETECTED=1
fi
# Alibaba Cloud
if curl -s -m 1 http://100.100.100.200/latest/meta-data/instance-id > /dev/null 2>&1; then
    check_fail "Alibaba Cloud ECS detected - VRRP not supported"
    echo "   Alibaba Cloud ECS does not support VRRP; use:"
    echo "   - Server Load Balancer (SLB)"
    echo "   - High-availability virtual IP (HaVip)"
    CLOUD_DETECTED=1
fi
# Azure
if curl -s -m 1 -H "Metadata: true" "http://169.254.169.254/metadata/instance?api-version=2021-02-01" > /dev/null 2>&1; then
    check_warn "Azure environment detected - limited VRRP support"
    echo "   Azure recommends:"
    echo "   - Azure Load Balancer"
    echo "   - Azure Traffic Manager"
    CLOUD_DETECTED=1
fi
# GCP
if curl -s -m 1 -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/id > /dev/null 2>&1; then
    check_warn "Google Cloud environment detected - limited VRRP support"
    echo "   GCP recommends:"
    echo "   - Cloud Load Balancing"
    echo "   - Forwarding Rules"
    CLOUD_DETECTED=1
fi
if [ $CLOUD_DETECTED -eq 0 ]; then
    check_pass "No cloud environment restrictions detected"
fi
echo ""
# Summary
echo ""
echo -e "${COLOR_BLUE}=== Check Summary ===${COLOR_NC}"
echo ""
if [ $ERRORS -eq 0 ] && [ $WARNINGS -eq 0 ]; then
echo -e "${COLOR_GREEN}✓ Environment fully supports go-alived${COLOR_NC}"
echo " All features are available"
elif [ $ERRORS -eq 0 ]; then
echo -e "${COLOR_YELLOW}⚠ Environment is mostly supported, with $WARNINGS warning(s)${COLOR_NC}"
echo " Fixing the warnings is recommended for better stability"
else
echo -e "${COLOR_RED}✗ Found $ERRORS error(s) and $WARNINGS warning(s)${COLOR_NC}"
echo " Please fix the errors before using go-alived"
fi
echo ""
echo "详细文档: https://github.com/loveuer/go-alived"
echo ""
exit $ERRORS


@@ -0,0 +1,38 @@
[Unit]
Description=Go-Alived - VRRP High Availability Service
Documentation=https://github.com/loveuer/go-alived
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/bin/go-alived run --config /etc/go-alived/config.yaml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5s
StandardOutput=journal
StandardError=journal
SyslogIdentifier=go-alived
# Security settings
NoNewPrivileges=false
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/etc/go-alived
# Resource limits
LimitNOFILE=65535
LimitNPROC=512
# Capabilities required for VRRP operations
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_RAW CAP_NET_BIND_SERVICE
[Install]
WantedBy=multi-user.target

deployment/install.sh Executable file

@@ -0,0 +1,53 @@
#!/bin/bash
set -e
echo "=== Installing go-alived ==="
if [ "$EUID" -ne 0 ]; then
echo "Please run as root (use sudo)"
exit 1
fi
BINARY_PATH="/usr/local/bin/go-alived"
CONFIG_DIR="/etc/go-alived"
SERVICE_FILE="/etc/systemd/system/go-alived.service"
echo "1. Installing binary to ${BINARY_PATH}..."
if [ ! -f "go-alived" ]; then
echo "Error: go-alived binary not found. Please run 'go build' first."
exit 1
fi
cp go-alived ${BINARY_PATH}
chmod +x ${BINARY_PATH}
echo " ✓ Binary installed"
echo "2. Creating configuration directory ${CONFIG_DIR}..."
mkdir -p ${CONFIG_DIR}
mkdir -p ${CONFIG_DIR}/scripts
echo " ✓ Directories created"
if [ ! -f "${CONFIG_DIR}/config.yaml" ]; then
echo "3. Installing example configuration..."
cp etc/config.example.yaml ${CONFIG_DIR}/config.yaml
echo " ✓ Configuration installed to ${CONFIG_DIR}/config.yaml"
echo " ⚠ Please edit ${CONFIG_DIR}/config.yaml before starting the service"
else
echo "3. Configuration already exists at ${CONFIG_DIR}/config.yaml"
echo " ⚠ Skipping configuration installation"
fi
echo "4. Installing systemd service..."
cp deployment/go-alived.service ${SERVICE_FILE}
systemctl daemon-reload
echo " ✓ Service installed"
echo ""
echo "=== Installation complete ==="
echo ""
echo "Next steps:"
echo " 1. Edit configuration: vim ${CONFIG_DIR}/config.yaml"
echo " 2. Start service: systemctl start go-alived"
echo " 3. Check status: systemctl status go-alived"
echo " 4. View logs: journalctl -u go-alived -f"
echo " 5. Enable autostart: systemctl enable go-alived"
echo ""

deployment/uninstall.sh Executable file

@@ -0,0 +1,53 @@
#!/bin/bash
set -e
echo "=== Uninstalling go-alived ==="
if [ "$EUID" -ne 0 ]; then
echo "Please run as root (use sudo)"
exit 1
fi
BINARY_PATH="/usr/local/bin/go-alived"
CONFIG_DIR="/etc/go-alived"
SERVICE_FILE="/etc/systemd/system/go-alived.service"
if systemctl is-active --quiet go-alived; then
echo "1. Stopping service..."
systemctl stop go-alived
echo " ✓ Service stopped"
fi
if systemctl is-enabled --quiet go-alived 2>/dev/null; then
echo "2. Disabling service..."
systemctl disable go-alived
echo " ✓ Service disabled"
fi
if [ -f "${SERVICE_FILE}" ]; then
echo "3. Removing service file..."
rm ${SERVICE_FILE}
systemctl daemon-reload
echo " ✓ Service file removed"
fi
if [ -f "${BINARY_PATH}" ]; then
echo "4. Removing binary..."
rm ${BINARY_PATH}
echo " ✓ Binary removed"
fi
echo ""
read -p "Do you want to remove configuration directory ${CONFIG_DIR}? (y/N) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
if [ -d "${CONFIG_DIR}" ]; then
rm -rf ${CONFIG_DIR}
echo " ✓ Configuration removed"
fi
else
echo " ⚠ Configuration kept at ${CONFIG_DIR}"
fi
echo ""
echo "=== Uninstallation complete ==="

etc/config.example.yaml Normal file

@@ -0,0 +1,65 @@
global:
router_id: "node1"
notification_email: "admin@example.com"
vrrp_instances:
- name: "VI_1"
interface: "eth0"
state: "BACKUP"
virtual_router_id: 51
priority: 100
advert_interval: 1
auth_type: "PASS"
auth_pass: "secret123"
virtual_ips:
- "192.168.1.100/24"
- "192.168.1.101/24"
notify_master: "/etc/go-alived/scripts/notify_master.sh"
notify_backup: "/etc/go-alived/scripts/notify_backup.sh"
notify_fault: "/etc/go-alived/scripts/notify_fault.sh"
track_scripts:
- "check_nginx"
health_checkers:
- name: "check_nginx"
type: "tcp"
interval: 3s
timeout: 2s
rise: 3
fall: 2
config:
host: "127.0.0.1"
port: 80
- name: "check_web"
type: "http"
interval: 5s
timeout: 3s
rise: 2
fall: 3
config:
url: "http://127.0.0.1:80/health"
expected_status: 200
method: "GET"
insecure_skip_verify: false
- name: "check_ping"
type: "ping"
interval: 2s
timeout: 1s
rise: 2
fall: 2
config:
host: "8.8.8.8"
count: 1
- name: "check_service"
type: "script"
interval: 10s
timeout: 5s
rise: 1
fall: 1
config:
script: "/usr/local/bin/check_service.sh"
args:
- "nginx"

etc/config.mini.yaml Normal file

@@ -0,0 +1,14 @@
global:
router_id: "node1"
vrrp_instances:
- name: "VI_1"
interface: "eth0"
state: "BACKUP"
virtual_router_id: 51
priority: 100
advert_interval: 1
auth_type: "PASS"
auth_pass: "secret123"
virtual_ips:
- "192.168.1.100/24"

go.mod Normal file

@@ -0,0 +1,20 @@
module github.com/loveuer/go-alived
go 1.25.0
require (
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/josharian/native v1.0.0 // indirect
github.com/mdlayher/arp v0.0.0-20220512170110-6706a2966875 // indirect
github.com/mdlayher/ethernet v0.0.0-20220221185849-529eae5b6118 // indirect
github.com/mdlayher/packet v1.0.0 // indirect
github.com/mdlayher/socket v0.2.1 // indirect
github.com/spf13/cobra v1.10.2 // indirect
github.com/spf13/pflag v1.0.9 // indirect
github.com/vishvananda/netlink v1.3.1 // indirect
github.com/vishvananda/netns v0.0.5 // indirect
golang.org/x/net v0.47.0 // indirect
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c // indirect
golang.org/x/sys v0.38.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)

go.sum Normal file

@@ -0,0 +1,46 @@
github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g=
github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.7/go.mod h1:n+brtR0CgQNWTVd5ZUFpTBC8YFBDLK/h/bpaJ8/DtOE=
github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
github.com/josharian/native v1.0.0 h1:Ts/E8zCSEsG17dUqv7joXJFybuMLjQfWE04tsBODTxk=
github.com/josharian/native v1.0.0/go.mod h1:7X/raswPFr05uY3HiLlYeyQntB6OO7E/d2Cu7qoaN2w=
github.com/mdlayher/arp v0.0.0-20220512170110-6706a2966875 h1:ql8x//rJsHMjS+qqEag8n3i4azw1QneKh5PieH9UEbY=
github.com/mdlayher/arp v0.0.0-20220512170110-6706a2966875/go.mod h1:kfOoFJuHWp76v1RgZCb9/gVUc7XdY877S2uVYbNliGc=
github.com/mdlayher/ethernet v0.0.0-20220221185849-529eae5b6118 h1:2oDp6OOhLxQ9JBoUuysVz9UZ9uI6oLUbvAZu0x8o+vE=
github.com/mdlayher/ethernet v0.0.0-20220221185849-529eae5b6118/go.mod h1:ZFUnHIVchZ9lJoWoEGUg8Q3M4U8aNNWA3CVSUTkW4og=
github.com/mdlayher/packet v1.0.0 h1:InhZJbdShQYt6XV2GPj5XHxChzOfhJJOMbvnGAmOfQ8=
github.com/mdlayher/packet v1.0.0/go.mod h1:eE7/ctqDhoiRhQ44ko5JZU2zxB88g+JH/6jmnjzPjOU=
github.com/mdlayher/socket v0.2.1 h1:F2aaOwb53VsBE+ebRS9bLd7yPOfYUMC8lOODdCBDY6w=
github.com/mdlayher/socket v0.2.1/go.mod h1:QLlNPkFR88mRUNQIzRBMfXxwKal8H7u1h3bL1CV+f0E=
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/spf13/cobra v1.10.2 h1:DMTTonx5m65Ic0GOoRY2c16WCbHxOOw6xxezuLaBpcU=
github.com/spf13/cobra v1.10.2/go.mod h1:7C1pvHqHw5A4vrJfjNwvOdzYu0Gml16OCs2GRiTUUS4=
github.com/spf13/pflag v1.0.9 h1:9exaQaMOCwffKiiiYk6/BndUBv+iRViNW+4lEMi0PvY=
github.com/spf13/pflag v1.0.9/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
github.com/vishvananda/netlink v1.3.1 h1:3AEMt62VKqz90r0tmNhog0r/PpWKmrEShJU0wJW6bV0=
github.com/vishvananda/netlink v1.3.1/go.mod h1:ARtKouGSTGchR8aMwmkzC0qiNPrrWO5JS/XMVl45+b4=
github.com/vishvananda/netns v0.0.5 h1:DfiHV+j8bA32MFM7bfEunvT8IAqQ/NzSJHtcmW5zdEY=
github.com/vishvananda/netns v0.0.5/go.mod h1:SpkAiCQRtJ6TvvxPnOSyH3BMl6unz3xZlaprSwhNNJM=
go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/net v0.0.0-20190503192946-f4e77d36d62c/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20190603091049-60506f45cf65 h1:+rhAzEzT3f4JtomfC371qB+0Ola2caSKcY69NUBZrRQ=
golang.org/x/net v0.0.0-20190603091049-60506f45cf65/go.mod h1:HSz+uSET+XFnRR8LxR5pz3Of3rY3CfYBVs4xY44aLks=
golang.org/x/net v0.47.0 h1:Mx+4dIFzqraBXUugkia1OOvlD6LemFo1ALMHjrXDOhY=
golang.org/x/net v0.47.0/go.mod h1:/jNxtkgq5yWUGYkaZGqo27cfGZ1c5Nen03aYrrKpVRU=
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c h1:5KslGYwFpkhGh+Q16bwMP3cOontH8FOep7tGV86Y7SQ=
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20210927094055-39ccf1dd6fa6/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220209214540-3681064d5158/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.2.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.10.0 h1:SqMFp9UcQJZa+pmYuAKjd9xq1f0j5rLcDIk0mj4qAsA=
golang.org/x/sys v0.10.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.38.0 h1:3yZWxaJjBmCWXqhN1qh02AkOnCQ1poK6oF+a7xWL6Gc=
golang.org/x/sys v0.38.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

internal/cmd/root.go Normal file

@@ -0,0 +1,25 @@
package cmd
import (
"os"
"github.com/spf13/cobra"
)
var rootCmd = &cobra.Command{
Use: "go-alived",
Short: "Go-Alived - VRRP High Availability Service",
Long: `go-alived is a lightweight, dependency-free VRRP implementation in Go.
It provides high availability for IP addresses with health checking support.`,
Version: "1.0.0",
}
func Execute() {
if err := rootCmd.Execute(); err != nil {
os.Exit(1)
}
}
func init() {
rootCmd.CompletionOptions.DisableDefaultCmd = true
}

internal/cmd/run.go Normal file

@@ -0,0 +1,133 @@
package cmd
import (
"os"
"os/signal"
"syscall"
"github.com/loveuer/go-alived/internal/health"
"github.com/loveuer/go-alived/internal/vrrp"
"github.com/loveuer/go-alived/pkg/config"
"github.com/loveuer/go-alived/pkg/logger"
"github.com/spf13/cobra"
)
var (
configFile string
debug bool
)
var runCmd = &cobra.Command{
Use: "run",
Short: "Run the VRRP service",
Long: `Start the go-alived VRRP service with health checking.`,
Run: runService,
}
func init() {
rootCmd.AddCommand(runCmd)
runCmd.Flags().StringVarP(&configFile, "config", "c", "/etc/go-alived/config.yaml", "path to configuration file")
runCmd.Flags().BoolVarP(&debug, "debug", "d", false, "enable debug mode")
}
func runService(cmd *cobra.Command, args []string) {
log := logger.New(debug)
log.Info("starting go-alived...")
log.Info("loading configuration from: %s", configFile)
cfg, err := config.Load(configFile)
if err != nil {
log.Error("failed to load configuration: %v", err)
os.Exit(1)
}
log.Info("configuration loaded successfully")
log.Debug("config: %+v", cfg)
healthMgr, err := health.LoadFromConfig(cfg, log)
if err != nil {
log.Error("failed to load health check configuration: %v", err)
os.Exit(1)
}
vrrpMgr := vrrp.NewManager(log)
if err := vrrpMgr.LoadFromConfig(cfg); err != nil {
log.Error("failed to load VRRP configuration: %v", err)
os.Exit(1)
}
setupHealthTracking(vrrpMgr, healthMgr, log)
healthMgr.StartAll()
if err := vrrpMgr.StartAll(); err != nil {
log.Error("failed to start VRRP instances: %v", err)
os.Exit(1)
}
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM, syscall.SIGHUP)
for {
sig := <-sigChan
switch sig {
case syscall.SIGHUP:
log.Info("received SIGHUP, reloading configuration...")
newCfg, err := config.Load(configFile)
if err != nil {
log.Error("failed to reload configuration: %v", err)
continue
}
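// Only VRRP instances are reloaded; health checkers and their
// tracking callbacks keep the configuration loaded at startup.
if err := vrrpMgr.Reload(newCfg); err != nil {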
log.Error("failed to reload VRRP: %v", err)
continue
}
cfg = newCfg
log.Info("configuration reloaded successfully")
case syscall.SIGINT, syscall.SIGTERM:
log.Info("received signal %v, shutting down...", sig)
cleanup(log, vrrpMgr, healthMgr)
os.Exit(0)
}
}
}
func cleanup(log *logger.Logger, vrrpMgr *vrrp.Manager, healthMgr *health.Manager) {
log.Info("cleaning up resources...")
healthMgr.StopAll()
vrrpMgr.StopAll()
}
func setupHealthTracking(vrrpMgr *vrrp.Manager, healthMgr *health.Manager, log *logger.Logger) {
instances := vrrpMgr.GetAllInstances()
for _, inst := range instances {
for _, trackScript := range inst.TrackScripts {
monitor, ok := healthMgr.GetMonitor(trackScript)
if !ok {
log.Warn("[%s] track_script '%s' not found in health checkers", inst.Name, trackScript)
continue
}
instanceName := inst.Name
monitor.OnStateChange(func(checkerName string, oldHealthy, newHealthy bool) {
vrrpInst, ok := vrrpMgr.GetInstance(instanceName)
if !ok {
return
}
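// NOTE: every failure subtracts 10, but any single recovery calls
// ResetPriority, which presumably restores the configured priority even
// if another check tracked by this instance is still failing.
if newHealthy && !oldHealthy {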
log.Info("[%s] health check '%s' recovered, resetting priority", instanceName, checkerName)
vrrpInst.ResetPriority()
} else if !newHealthy && oldHealthy {
log.Warn("[%s] health check '%s' failed, decreasing priority", instanceName, checkerName)
vrrpInst.AdjustPriority(-10)
}
})
log.Info("[%s] tracking health check: %s", inst.Name, trackScript)
}
}
}
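In `setupHealthTracking`, each failing check lowers the priority by 10, but any single recovery calls `ResetPriority()`. If two tracked checks are down and one recovers, the instance presumably returns to its full configured priority while the other check is still failing. One way to avoid that, sketched below with a hypothetical `priorityTracker` that is not part of go-alived, is to derive the effective priority from the set of currently failing checks. The 10-point penalty mirrors `run.go`; the clamp to 1..254 reflects RFC 3768, which reserves 255 for the address owner and 0 for a master giving up its role.

```go
package main

import (
	"fmt"
	"sync"
)

// priorityTracker is a hypothetical helper: it remembers which tracked
// checks are failing and derives the effective VRRP priority from them.
type priorityTracker struct {
	mu      sync.Mutex
	base    int
	penalty int
	failing map[string]bool
}

func newPriorityTracker(base, penalty int) *priorityTracker {
	return &priorityTracker{base: base, penalty: penalty, failing: make(map[string]bool)}
}

// Update records a check's new health state and returns the priority to apply.
func (t *priorityTracker) Update(check string, healthy bool) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	if healthy {
		delete(t.failing, check)
	} else {
		t.failing[check] = true
	}
	p := t.base - t.penalty*len(t.failing)
	// Keep the result inside the non-owner VRRP priority range 1..254.
	if p < 1 {
		p = 1
	}
	if p > 254 {
		p = 254
	}
	return p
}

func main() {
	t := newPriorityTracker(100, 10)
	fmt.Println(t.Update("check_nginx", false)) // 90
	fmt.Println(t.Update("check_web", false))   // 80
	fmt.Println(t.Update("check_nginx", true))  // 90: check_web is still failing
}
```

Wiring this in would require a way to set an absolute priority on a VRRP instance; only `AdjustPriority` and `ResetPriority` appear in this section, so that part is left open.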

internal/cmd/test.go Normal file

@@ -0,0 +1,470 @@
package cmd
import (
"fmt"
"net"
"os"
"os/exec"
"strings"
"time"
"github.com/loveuer/go-alived/pkg/logger"
"github.com/loveuer/go-alived/pkg/netif"
"github.com/spf13/cobra"
)
type TestResult struct {
Name string
Pass bool
Message string
Fatal bool
}
type EnvironmentTest struct {
log *logger.Logger
results []TestResult
errors int
warns int
}
func NewEnvironmentTest(log *logger.Logger) *EnvironmentTest {
return &EnvironmentTest{
log: log,
results: make([]TestResult, 0),
}
}
func (t *EnvironmentTest) AddResult(name string, pass bool, message string, fatal bool) {
t.results = append(t.results, TestResult{
Name: name,
Pass: pass,
Message: message,
Fatal: fatal,
})
if !pass {
if fatal {
t.errors++
} else {
t.warns++
}
}
}
func (t *EnvironmentTest) TestRootPermission() {
t.log.Info("Checking run privileges...")
if os.Geteuid() != 0 {
t.AddResult("Root privileges", false, "root privileges required; run with sudo", true)
} else {
t.AddResult("Root privileges", true, "running as root", false)
}
}
func (t *EnvironmentTest) TestNetworkInterface(ifaceName string) string {
t.log.Info("Checking network interface...")
if ifaceName == "" {
interfaces, err := net.Interfaces()
if err != nil {
t.AddResult("Network interface", false, "unable to list network interfaces", true)
return ""
}
for _, iface := range interfaces {
if iface.Flags&net.FlagUp != 0 && iface.Flags&net.FlagLoopback == 0 {
addrs, err := iface.Addrs()
if err == nil && len(addrs) > 0 {
for _, addr := range addrs {
if ipnet, ok := addr.(*net.IPNet); ok && ipnet.IP.To4() != nil {
ifaceName = iface.Name
t.log.Info("auto-selected interface: %s", ifaceName)
break
}
}
}
if ifaceName != "" {
break
}
}
}
if ifaceName == "" {
t.AddResult("Network interface", false, "no usable network interface found", true)
return ""
}
}
iface, err := netif.GetInterface(ifaceName)
if err != nil {
t.AddResult("Network interface", false, fmt.Sprintf("interface %s does not exist", ifaceName), true)
return ""
}
if !iface.IsUp() {
t.AddResult("Interface state", false, fmt.Sprintf("interface %s is down", ifaceName), true)
return ""
}
t.AddResult("Network interface", true, fmt.Sprintf("interface %s exists and is up", ifaceName), false)
return ifaceName
}
func (t *EnvironmentTest) TestVIPOperations(ifaceName, testVIP string) {
t.log.Info("Testing VIP add/remove...")
if ifaceName == "" || testVIP == "" {
t.AddResult("VIP operations", false, "interface name or test VIP is empty", true)
return
}
iface, err := netif.GetInterface(ifaceName)
if err != nil {
t.AddResult("VIP operations", false, fmt.Sprintf("failed to get interface: %v", err), true)
return
}
if !strings.Contains(testVIP, "/") {
testVIP = testVIP + "/32"
}
exists, _ := iface.HasIP(testVIP)
if exists {
t.AddResult("VIP operations", false, fmt.Sprintf("VIP %s already exists; use a different IP for testing", testVIP), true)
return
}
err = iface.AddIP(testVIP)
if err != nil {
t.AddResult("VIP add", false, fmt.Sprintf("failed to add VIP: %v", err), true)
return
}
t.AddResult("VIP add", true, fmt.Sprintf("added VIP %s", testVIP), false)
time.Sleep(100 * time.Millisecond)
exists, _ = iface.HasIP(testVIP)
if !exists {
t.AddResult("VIP verify", false, "VIP not found on the interface after adding it", true)
iface.DeleteIP(testVIP)
return
}
t.AddResult("VIP verify", true, "VIP successfully added to the interface", false)
vipAddr := strings.Split(testVIP, "/")[0]
cmd := exec.Command("ping", "-c", "1", "-W", "1", vipAddr)
err = cmd.Run()
if err != nil {
t.AddResult("VIP reachability", false, "ping to VIP failed; routing may need configuration", false)
} else {
t.AddResult("VIP reachability", true, "VIP responds to ping", false)
}
err = iface.DeleteIP(testVIP)
if err != nil {
t.AddResult("VIP delete", false, fmt.Sprintf("failed to delete VIP: %v", err), false)
} else {
t.AddResult("VIP delete", true, "VIP deleted", false)
}
}
func (t *EnvironmentTest) TestMulticast(ifaceName string) {
t.log.Info("Checking multicast support...")
if ifaceName == "" {
t.AddResult("Multicast", false, "interface name is empty; skipping check", false)
return
}
cmd := exec.Command("ip", "maddr", "show", ifaceName)
output, err := cmd.CombinedOutput()
if err != nil {
t.AddResult("Multicast", false, "unable to query multicast configuration", false)
return
}
if len(output) > 0 {
t.AddResult("Multicast", true, "interface supports multicast", false)
} else {
t.AddResult("Multicast", false, "interface multicast support unknown", false)
}
}
func (t *EnvironmentTest) TestFirewall() {
t.log.Info("Checking firewall settings...")
cmd := exec.Command("iptables", "-L", "INPUT", "-n")
output, err := cmd.CombinedOutput()
if err != nil {
t.AddResult("Firewall", false, "unable to query iptables rules (iptables may not be installed)", false)
return
}
if strings.Contains(string(output), "112") || strings.Contains(string(output), "vrrp") {
t.AddResult("Firewall VRRP", true, "firewall has VRRP rules configured", false)
} else {
t.AddResult("Firewall VRRP", false, "no VRRP firewall rule found; consider adding: iptables -A INPUT -p 112 -j ACCEPT", false)
}
cmd = exec.Command("systemctl", "is-active", "firewalld")
err = cmd.Run()
if err == nil {
cmd = exec.Command("firewall-cmd", "--list-protocols")
output, err = cmd.CombinedOutput()
if err == nil {
if strings.Contains(string(output), "vrrp") {
t.AddResult("Firewalld VRRP", true, "firewalld allows the VRRP protocol", false)
} else {
t.AddResult("Firewalld VRRP", false, "firewalld does not allow VRRP; consider: firewall-cmd --permanent --add-protocol=vrrp && firewall-cmd --reload", false)
}
}
}
}
func (t *EnvironmentTest) TestKernelParameters() {
t.log.Info("Checking kernel parameters...")
params := map[string]string{
"/proc/sys/net/ipv4/ip_forward": "1",
"/proc/sys/net/ipv4/conf/all/arp_ignore": "0",
"/proc/sys/net/ipv4/conf/all/arp_announce": "0",
}
for path, expected := range params {
data, err := os.ReadFile(path)
if err != nil {
continue
}
value := strings.TrimSpace(string(data))
name := strings.TrimPrefix(path, "/proc/sys/net/ipv4/")
if value == expected {
t.AddResult(name, true, fmt.Sprintf("%s = %s (OK)", name, value), false)
} else {
if name == "ip_forward" && value != "1" {
t.AddResult(name, false, fmt.Sprintf("%s = %s (recommended: 1)", name, value), false)
}
}
}
}
func (t *EnvironmentTest) TestConflictingServices() {
t.log.Info("Checking for conflicting services...")
services := []string{"keepalived"}
hasConflict := false
for _, service := range services {
cmd := exec.Command("systemctl", "is-active", service)
err := cmd.Run()
if err == nil {
t.AddResult("Service conflict", false, fmt.Sprintf("%s service is running and may conflict", service), false)
hasConflict = true
}
}
cmd := exec.Command("pgrep", "-x", "keepalived")
err := cmd.Run()
if err == nil {
t.AddResult("Process conflict", false, "a keepalived process is running", false)
hasConflict = true
}
if !hasConflict {
t.AddResult("Service conflict", true, "no conflicting services found", false)
}
}
func (t *EnvironmentTest) TestVirtualization() {
t.log.Info("Checking virtualization environment...")
productFile := "/sys/class/dmi/id/product_name"
data, err := os.ReadFile(productFile)
if err != nil {
// systemd-detect-virt prints "none" and exits non-zero on bare metal,
// so inspect its output rather than relying on the exit status.
output, _ := exec.Command("systemd-detect-virt").Output()
switch virt := strings.TrimSpace(string(output)); virt {
case "":
// Tool missing or no output: nothing to report.
case "none":
t.AddResult("Virtualization", true, "physical machine", false)
default:
t.AddResult("Virtualization", true, fmt.Sprintf("virtualization detected: %s", virt), false)
t.log.Warn("virtualized environments may need extra configuration (e.g. promiscuous mode)")
}
return
}
product := strings.TrimSpace(string(data))
switch {
case strings.Contains(product, "VMware"):
t.AddResult("Virtualization", true, "VMware VM (promiscuous mode must be enabled)", false)
t.log.Warn("VMware setting: VM Settings -> Network Adapter -> Advanced -> Promiscuous Mode: Allow All")
case strings.Contains(product, "VirtualBox"):
t.AddResult("Virtualization", true, "VirtualBox VM (bridged mode and promiscuous mode required)", false)
t.log.Warn("VirtualBox setting: Network -> Bridged Adapter -> Advanced -> Promiscuous Mode: Allow All")
case strings.Contains(product, "KVM") || strings.Contains(product, "QEMU"):
t.AddResult("Virtualization", true, "KVM/QEMU VM (usually well supported)", false)
case strings.Contains(product, "Amazon") || strings.Contains(product, "EC2"):
t.AddResult("Virtualization", false, "AWS EC2 environment - VRRP is not supported", true)
t.log.Error("AWS does not support multicast, so VRRP cannot run; use an Elastic IP or a load balancer instead")
default:
t.AddResult("Virtualization", true, fmt.Sprintf("environment: %s", product), false)
}
}
func (t *EnvironmentTest) TestCloudEnvironment() {
t.log.Info("Checking cloud environment...")
cloudTests := []struct {
name string
url string
headers map[string]string
isFatal bool
solution string
}{
{
name: "AWS",
url: "http://169.254.169.254/latest/meta-data/instance-id",
solution: "AWS does not support VRRP; use an Elastic IP, ALB, or NLB instead",
isFatal: true,
},
{
name: "Alibaba Cloud",
url: "http://100.100.100.200/latest/meta-data/instance-id",
solution: "Alibaba Cloud ECS does not support VRRP; use Server Load Balancer (SLB) or a high-availability virtual IP (HaVip) instead",
isFatal: true,
},
{
name: "Azure",
url: "http://169.254.169.254/metadata/instance?api-version=2021-02-01",
headers: map[string]string{"Metadata": "true"},
solution: "On Azure, consider Azure Load Balancer or Traffic Manager",
isFatal: false,
},
{
name: "Google Cloud",
url: "http://metadata.google.internal/computeMetadata/v1/instance/id",
headers: map[string]string{"Metadata-Flavor": "Google"},
solution: "On GCP, consider Cloud Load Balancing",
isFatal: false,
},
}
cloudDetected := false
for _, test := range cloudTests {
// -f makes curl fail on HTTP error statuses, so an error reply from a
// different cloud's metadata service is not counted as a match.
cmd := exec.Command("curl", "-s", "-f", "-m", "1", test.url)
if len(test.headers) > 0 {
for k, v := range test.headers {
cmd.Args = append(cmd.Args, "-H", fmt.Sprintf("%s: %s", k, v))
}
}
err := cmd.Run()
if err == nil {
cloudDetected = true
t.AddResult("Cloud environment", !test.isFatal, fmt.Sprintf("%s environment detected", test.name), test.isFatal)
t.log.Warn(test.solution)
}
}
if !cloudDetected {
t.AddResult("Cloud environment", true, "no public-cloud restrictions detected", false)
}
}
func (t *EnvironmentTest) PrintResults() {
fmt.Println()
fmt.Println("=== Test Results ===")
fmt.Println()
for _, result := range t.results {
status := "✓"
if !result.Pass {
if result.Fatal {
status = "✗"
} else {
status = "⚠"
}
}
fmt.Printf("%s %-20s %s\n", status, result.Name, result.Message)
}
fmt.Println()
fmt.Println("=== Summary ===")
fmt.Println()
if t.errors == 0 && t.warns == 0 {
fmt.Println("✓ Environment fully supports go-alived")
fmt.Println(" All features are available")
} else if t.errors == 0 {
fmt.Printf("⚠ Environment is mostly supported, with %d warning(s)\n", t.warns)
fmt.Println(" Fixing the warnings is recommended for better stability")
} else {
fmt.Printf("✗ Found %d error(s) and %d warning(s)\n", t.errors, t.warns)
fmt.Println(" Please fix the errors before using go-alived")
}
fmt.Println()
}
func (t *EnvironmentTest) HasErrors() bool {
return t.errors > 0
}
var (
testIface string
testVIP string
)
var testCmd = &cobra.Command{
Use: "test",
Short: "Test environment for VRRP support",
Long: `Test the current environment to verify if it supports VRRP functionality.
This includes checking permissions, network interfaces, VIP operations, multicast support, and more.`,
Run: runTest,
}
func init() {
rootCmd.AddCommand(testCmd)
testCmd.Flags().StringVarP(&testIface, "interface", "i", "", "network interface to test (auto-detect if not specified)")
testCmd.Flags().StringVarP(&testVIP, "vip", "v", "", "test VIP address (e.g., 192.168.1.100/24)")
}
func runTest(cmd *cobra.Command, args []string) {
log := logger.New(false)
fmt.Println("=== go-alived environment test ===")
fmt.Println()
test := NewEnvironmentTest(log)
test.TestRootPermission()
selectedIface := test.TestNetworkInterface(testIface)
if selectedIface != "" && testVIP != "" {
test.TestVIPOperations(selectedIface, testVIP)
}
if selectedIface != "" {
test.TestMulticast(selectedIface)
}
test.TestFirewall()
test.TestKernelParameters()
test.TestConflictingServices()
test.TestVirtualization()
test.TestCloudEnvironment()
test.PrintResults()
if test.HasErrors() {
os.Exit(1)
}
}


@@ -0,0 +1,89 @@
package health
import (
"context"
"time"
)
type CheckResult int
const (
CheckResultUnknown CheckResult = iota
CheckResultSuccess
CheckResultFailure
)
func (r CheckResult) String() string {
switch r {
case CheckResultSuccess:
return "SUCCESS"
case CheckResultFailure:
return "FAILURE"
default:
return "UNKNOWN"
}
}
type Checker interface {
Check(ctx context.Context) CheckResult
Name() string
Type() string
}
type CheckerConfig struct {
Name string
Type string
Interval time.Duration
Timeout time.Duration
Rise int
Fall int
Config map[string]interface{}
}
type CheckerState struct {
Name string
Healthy bool
LastResult CheckResult
LastCheckTime time.Time
SuccessCount int
FailureCount int
TotalChecks int
ConsecutiveOK int
ConsecutiveFail int
}
func (s *CheckerState) IsHealthy() bool {
return s.Healthy
}
func (s *CheckerState) Update(result CheckResult, rise, fall int) bool {
s.LastResult = result
s.LastCheckTime = time.Now()
s.TotalChecks++
oldHealthy := s.Healthy
switch result {
case CheckResultSuccess:
s.SuccessCount++
s.ConsecutiveOK++
s.ConsecutiveFail = 0
if !s.Healthy && s.ConsecutiveOK >= rise {
s.Healthy = true
}
case CheckResultFailure:
s.FailureCount++
s.ConsecutiveFail++
s.ConsecutiveOK = 0
if s.Healthy && s.ConsecutiveFail >= fall {
s.Healthy = false
}
}
return s.Healthy != oldHealthy
}
type StateChangeCallback func(name string, oldHealthy, newHealthy bool)
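`CheckerState.Update` implements hysteresis: a checker starts unhealthy, needs `rise` consecutive successes to become healthy, and needs `fall` consecutive failures to become unhealthy again. The standalone sketch below restates only that counting logic (the `state` type is a trimmed-down copy, not the package's exported type) to show how `rise: 3, fall: 2` from the `check_nginx` example plays out over a sequence of results.

```go
package main

import "fmt"

// state is a trimmed copy of CheckerState's rise/fall counters.
type state struct {
	healthy         bool
	consecutiveOK   int
	consecutiveFail int
}

// update applies one check result and reports whether health flipped.
func (s *state) update(ok bool, rise, fall int) bool {
	old := s.healthy
	if ok {
		s.consecutiveOK++
		s.consecutiveFail = 0
		if !s.healthy && s.consecutiveOK >= rise {
			s.healthy = true
		}
	} else {
		s.consecutiveFail++
		s.consecutiveOK = 0
		if s.healthy && s.consecutiveFail >= fall {
			s.healthy = false
		}
	}
	return s.healthy != old
}

func main() {
	s := &state{}
	results := []bool{true, true, true, false, true, false, false}
	for i, ok := range results {
		changed := s.update(ok, 3, 2)
		fmt.Printf("check %d ok=%v healthy=%v changed=%v\n", i+1, ok, s.healthy, changed)
	}
}
```

The state flips only at check 3 (third consecutive success) and check 7 (second consecutive failure); the isolated failure at check 4 is absorbed, so `OnStateChange` callbacks, and therefore priority changes, fire only twice.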


@@ -0,0 +1,56 @@
package health
import (
"fmt"
"github.com/loveuer/go-alived/pkg/config"
"github.com/loveuer/go-alived/pkg/logger"
)
func CreateChecker(cfg *config.HealthChecker) (Checker, error) {
configMap, ok := cfg.Config.(map[string]interface{})
if !ok {
return nil, fmt.Errorf("invalid config for checker %s", cfg.Name)
}
switch cfg.Type {
case "tcp":
return NewTCPChecker(cfg.Name, configMap)
case "http", "https":
return NewHTTPChecker(cfg.Name, configMap)
case "ping", "icmp":
return NewPingChecker(cfg.Name, configMap)
case "script":
return NewScriptChecker(cfg.Name, configMap)
default:
return nil, fmt.Errorf("unsupported checker type: %s", cfg.Type)
}
}
func LoadFromConfig(cfg *config.Config, log *logger.Logger) (*Manager, error) {
manager := NewManager(log)
for _, healthCfg := range cfg.Health {
checker, err := CreateChecker(&healthCfg)
if err != nil {
return nil, fmt.Errorf("failed to create checker %s: %w", healthCfg.Name, err)
}
monitorCfg := &CheckerConfig{
Name: healthCfg.Name,
Type: healthCfg.Type,
Interval: healthCfg.Interval,
Timeout: healthCfg.Timeout,
Rise: healthCfg.Rise,
Fall: healthCfg.Fall,
Config: healthCfg.Config.(map[string]interface{}),
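}
// time.NewTicker panics on a non-positive interval, and a zero timeout
// gives every check an already-expired context, so reject both up front.
if monitorCfg.Interval <= 0 || monitorCfg.Timeout <= 0 {
return nil, fmt.Errorf("health checker %s: interval and timeout must be positive", healthCfg.Name)
}
monitor := NewMonitor(checker, monitorCfg, log)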
manager.AddMonitor(monitor)
log.Info("loaded health checker: %s (type=%s)", healthCfg.Name, healthCfg.Type)
}
return manager, nil
}
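`CreateChecker` hands each constructor the raw `config:` map, and `NewHTTPChecker` and `NewPingChecker` each repeat the same `int`/`float64` type switch, because a YAML number decodes as `int` while a JSON-decoded one arrives as `float64`. A shared helper might look like the hypothetical `intOption` below. Unlike the existing switches, which silently keep the default for an unexpected type, this version reports an error so that a quoted number in the config is not silently ignored.

```go
package main

import (
	"fmt"
	"math"
)

// intOption reads an integer option from a checker's config map, accepting
// int and whole-number float64 values and returning def when the key is absent.
func intOption(cfg map[string]interface{}, key string, def int) (int, error) {
	raw, ok := cfg[key]
	if !ok {
		return def, nil
	}
	switch v := raw.(type) {
	case int:
		return v, nil
	case float64:
		if v != math.Trunc(v) {
			return 0, fmt.Errorf("option %q: %v is not a whole number", key, v)
		}
		return int(v), nil
	default:
		return 0, fmt.Errorf("option %q: unsupported type %T", key, raw)
	}
}

func main() {
	cfg := map[string]interface{}{"expected_status": 200, "count": 3.0, "port": "80"}
	fmt.Println(intOption(cfg, "expected_status", 200)) // 200 <nil>
	fmt.Println(intOption(cfg, "count", 1))             // 3 <nil>
	_, err := intOption(cfg, "port", 0)
	fmt.Println(err) // option "port": unsupported type string
}
```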

internal/health/http.go Normal file

@@ -0,0 +1,90 @@
package health
import (
"context"
"crypto/tls"
"fmt"
"net/http"
"time"
)
type HTTPChecker struct {
name string
url string
method string
expectedStatus int
client *http.Client
}
func NewHTTPChecker(name string, config map[string]interface{}) (*HTTPChecker, error) {
url, ok := config["url"].(string)
if !ok {
return nil, fmt.Errorf("http checker: missing or invalid 'url' field")
}
method := "GET"
if m, ok := config["method"].(string); ok {
method = m
}
expectedStatus := 200
if status, ok := config["expected_status"]; ok {
switch v := status.(type) {
case int:
expectedStatus = v
case float64:
expectedStatus = int(v)
}
}
insecureSkipVerify := false
if skip, ok := config["insecure_skip_verify"].(bool); ok {
insecureSkipVerify = skip
}
transport := &http.Transport{
TLSClientConfig: &tls.Config{
InsecureSkipVerify: insecureSkipVerify,
},
}
client := &http.Client{
Transport: transport,
Timeout: 30 * time.Second,
}
return &HTTPChecker{
name: name,
url: url,
method: method,
expectedStatus: expectedStatus,
client: client,
}, nil
}
func (c *HTTPChecker) Name() string {
return c.name
}
func (c *HTTPChecker) Type() string {
return "http"
}
func (c *HTTPChecker) Check(ctx context.Context) CheckResult {
req, err := http.NewRequestWithContext(ctx, c.method, c.url, nil)
if err != nil {
return CheckResultFailure
}
resp, err := c.client.Do(req)
if err != nil {
return CheckResultFailure
}
defer resp.Body.Close()
if resp.StatusCode == c.expectedStatus {
return CheckResultSuccess
}
return CheckResultFailure
}

internal/health/monitor.go Normal file

@@ -0,0 +1,192 @@
package health
import (
"context"
"sync"
"time"
"github.com/loveuer/go-alived/pkg/logger"
)
type Monitor struct {
checker Checker
config *CheckerConfig
state *CheckerState
log *logger.Logger
callbacks []StateChangeCallback
running bool
stopCh chan struct{}
wg sync.WaitGroup
mu sync.RWMutex
}
func NewMonitor(checker Checker, config *CheckerConfig, log *logger.Logger) *Monitor {
return &Monitor{
checker: checker,
config: config,
state: &CheckerState{
Name: config.Name,
Healthy: false,
},
log: log,
callbacks: make([]StateChangeCallback, 0),
stopCh: make(chan struct{}),
}
}
func (m *Monitor) Start() {
m.mu.Lock()
if m.running {
m.mu.Unlock()
return
}
m.running = true
m.mu.Unlock()
m.log.Info("[HealthCheck:%s] starting health check monitor (interval=%s, timeout=%s)",
m.config.Name, m.config.Interval, m.config.Timeout)
m.wg.Add(1)
go m.checkLoop()
}
func (m *Monitor) Stop() {
m.mu.Lock()
if !m.running {
m.mu.Unlock()
return
}
m.running = false
m.mu.Unlock()
m.log.Info("[HealthCheck:%s] stopping health check monitor", m.config.Name)
close(m.stopCh)
m.wg.Wait()
}
func (m *Monitor) checkLoop() {
defer m.wg.Done()
ticker := time.NewTicker(m.config.Interval)
defer ticker.Stop()
m.performCheck()
for {
select {
case <-m.stopCh:
return
case <-ticker.C:
m.performCheck()
}
}
}
func (m *Monitor) performCheck() {
ctx, cancel := context.WithTimeout(context.Background(), m.config.Timeout)
defer cancel()
startTime := time.Now()
result := m.checker.Check(ctx)
duration := time.Since(startTime)
m.mu.Lock()
oldHealthy := m.state.Healthy
stateChanged := m.state.Update(result, m.config.Rise, m.config.Fall)
newHealthy := m.state.Healthy
callbacks := m.callbacks
m.mu.Unlock()
m.log.Debug("[HealthCheck:%s] check completed: result=%s, duration=%s, healthy=%v",
m.config.Name, result, duration, newHealthy)
if stateChanged {
m.log.Info("[HealthCheck:%s] health state changed: %v -> %v (consecutive_ok=%d, consecutive_fail=%d)",
m.config.Name, oldHealthy, newHealthy, m.state.ConsecutiveOK, m.state.ConsecutiveFail)
for _, callback := range callbacks {
callback(m.config.Name, oldHealthy, newHealthy)
}
}
}
func (m *Monitor) OnStateChange(callback StateChangeCallback) {
m.mu.Lock()
defer m.mu.Unlock()
m.callbacks = append(m.callbacks, callback)
}
func (m *Monitor) GetState() *CheckerState {
m.mu.RLock()
defer m.mu.RUnlock()
stateCopy := *m.state
return &stateCopy
}
func (m *Monitor) IsHealthy() bool {
m.mu.RLock()
defer m.mu.RUnlock()
return m.state.Healthy
}
type Manager struct {
monitors map[string]*Monitor
mu sync.RWMutex
log *logger.Logger
}
func NewManager(log *logger.Logger) *Manager {
return &Manager{
monitors: make(map[string]*Monitor),
log: log,
}
}
func (m *Manager) AddMonitor(monitor *Monitor) {
m.mu.Lock()
defer m.mu.Unlock()
m.monitors[monitor.config.Name] = monitor
}
func (m *Manager) GetMonitor(name string) (*Monitor, bool) {
m.mu.RLock()
defer m.mu.RUnlock()
monitor, ok := m.monitors[name]
return monitor, ok
}
func (m *Manager) StartAll() {
m.mu.RLock()
defer m.mu.RUnlock()
for _, monitor := range m.monitors {
monitor.Start()
}
m.log.Info("started %d health check monitor(s)", len(m.monitors))
}
func (m *Manager) StopAll() {
m.mu.RLock()
defer m.mu.RUnlock()
for _, monitor := range m.monitors {
monitor.Stop()
}
m.log.Info("stopped all health check monitors")
}
func (m *Manager) GetAllStates() map[string]*CheckerState {
m.mu.RLock()
defer m.mu.RUnlock()
states := make(map[string]*CheckerState)
for name, monitor := range m.monitors {
states[name] = monitor.GetState()
}
return states
}
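`CheckerState.Update` is defined elsewhere in the package and is not part of this excerpt, so its exact rules are not shown here. The sketch below illustrates the rise/fall hysteresis that the `Rise` and `Fall` settings suggest, assuming keepalived-style semantics: the health flag flips only after `rise` consecutive successes or `fall` consecutive failures. The `hysteresis` type and its `update` method are hypothetical stand-ins, not the package's API.

```go
package main

import "fmt"

// hysteresis is a hypothetical stand-in for CheckerState: the health flag only
// flips after `rise` consecutive successes or `fall` consecutive failures.
type hysteresis struct {
	healthy         bool
	consecutiveOK   int
	consecutiveFail int
}

// update records one check result and reports whether the health flag changed.
func (h *hysteresis) update(ok bool, rise, fall int) bool {
	if ok {
		h.consecutiveOK++
		h.consecutiveFail = 0
		if !h.healthy && h.consecutiveOK >= rise {
			h.healthy = true
			return true
		}
		return false
	}
	h.consecutiveFail++
	h.consecutiveOK = 0
	if h.healthy && h.consecutiveFail >= fall {
		h.healthy = false
		return true
	}
	return false
}

func main() {
	h := &hysteresis{} // starts unhealthy, like the initial state in NewMonitor
	results := []bool{true, true, false, true, true, true, false, false}
	for i, ok := range results {
		changed := h.update(ok, 3, 2)
		fmt.Printf("check %d ok=%v healthy=%v changed=%v\n", i+1, ok, h.healthy, changed)
	}
}
```

A single failed probe inside a run of successes resets the success counter, which is what keeps one flaky check from flapping the VRRP priority.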

internal/health/ping.go Normal file

@@ -0,0 +1,129 @@
package health
import (
"context"
"fmt"
"net"
"time"
"golang.org/x/net/icmp"
"golang.org/x/net/ipv4"
)
type PingChecker struct {
name string
host string
count int
timeout time.Duration
}
func NewPingChecker(name string, config map[string]interface{}) (*PingChecker, error) {
host, ok := config["host"].(string)
if !ok {
return nil, fmt.Errorf("ping checker: missing or invalid 'host' field")
}
count := 1
if c, ok := config["count"]; ok {
switch v := c.(type) {
case int:
count = v
case float64:
count = int(v)
}
}
timeout := 2 * time.Second
if t, ok := config["timeout"].(string); ok {
if d, err := time.ParseDuration(t); err == nil {
timeout = d
}
}
return &PingChecker{
name: name,
host: host,
count: count,
timeout: timeout,
}, nil
}
func (c *PingChecker) Name() string {
return c.name
}
func (c *PingChecker) Type() string {
return "ping"
}
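func (c *PingChecker) Check(ctx context.Context) CheckResult {
addr, err := net.ResolveIPAddr("ip4", c.host)
if err != nil {
return CheckResultFailure
}
// "ip4:icmp" opens a raw ICMP socket, which needs root or CAP_NET_RAW.
conn, err := icmp.ListenPacket("ip4:icmp", "0.0.0.0")
if err != nil {
return CheckResultFailure
}
defer conn.Close()
count := c.count
if count < 1 {
count = 1
}
// The host counts as reachable as soon as one echo request is answered.
for i := 0; i < count; i++ {
select {
case <-ctx.Done():
return CheckResultFailure
default:
}
if c.sendPing(ctx, conn, addr, i+1) {
return CheckResultSuccess
}
}
return CheckResultFailure
}
func (c *PingChecker) sendPing(ctx context.Context, conn *icmp.PacketConn, addr *net.IPAddr, seq int) bool {
const echoID = 1234
msg := icmp.Message{
Type: ipv4.ICMPTypeEcho,
Code: 0,
Body: &icmp.Echo{
ID: echoID,
Seq: seq,
Data: []byte("go-alived-ping"),
},
}
msgBytes, err := msg.Marshal(nil)
if err != nil {
return false
}
if _, err := conn.WriteTo(msgBytes, addr); err != nil {
return false
}
// Wait no longer than the per-ping timeout or the caller's deadline,
// whichever comes first.
deadline := time.Now().Add(c.timeout)
if d, ok := ctx.Deadline(); ok && d.Before(deadline) {
deadline = d
}
if err := conn.SetReadDeadline(deadline); err != nil {
return false
}
reply := make([]byte, 1500)
for {
n, peer, err := conn.ReadFrom(reply)
if err != nil {
return false // deadline reached or socket error
}
// A raw ICMP socket receives every ICMP packet addressed to this host,
// so skip packets from other hosts and replies to other pings.
if p, ok := peer.(*net.IPAddr); !ok || !p.IP.Equal(addr.IP) {
continue
}
parsedMsg, err := icmp.ParseMessage(ipv4.ICMPTypeEchoReply.Protocol(), reply[:n])
if err != nil || parsedMsg.Type != ipv4.ICMPTypeEchoReply {
continue
}
if echo, ok := parsedMsg.Body.(*icmp.Echo); ok && echo.ID == echoID && echo.Seq == seq {
return true
}
}
}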
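For reference, this is roughly the byte layout that `icmp.Message.Marshal` produces for the echo request in `sendPing`, rebuilt with the standard library only. It is an illustrative sketch: `echoRequest` and `internetChecksum` are hypothetical helpers, and the identifier 1234 and the payload are taken from `sendPing`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// internetChecksum is the RFC 1071 one's-complement sum used by ICMP (and VRRP).
func internetChecksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(b[i:]))
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8 // pad an odd trailing byte with zero
	}
	for sum > 0xffff {
		sum = (sum & 0xffff) + (sum >> 16) // fold carries back in
	}
	return ^uint16(sum)
}

// echoRequest builds an ICMPv4 echo request (type 8, code 0): an 8-byte header
// of type, code, checksum, identifier and sequence number, then the payload.
func echoRequest(id, seq uint16, data []byte) []byte {
	msg := make([]byte, 8+len(data))
	msg[0] = 8 // echo request; an echo reply uses type 0
	msg[1] = 0
	binary.BigEndian.PutUint16(msg[4:], id)
	binary.BigEndian.PutUint16(msg[6:], seq)
	copy(msg[8:], data)
	// The checksum is computed with its own field still zero.
	binary.BigEndian.PutUint16(msg[2:], internetChecksum(msg))
	return msg
}

func main() {
	msg := echoRequest(1234, 1, []byte("go-alived-ping"))
	fmt.Printf("% x\n", msg)
	// Re-summing a message that already carries its checksum yields zero.
	fmt.Println("checksum verifies:", internetChecksum(msg) == 0)
}
```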

internal/health/script.go Normal file

@@ -0,0 +1,73 @@
package health
import (
"context"
"fmt"
"os/exec"
"time"
)
type ScriptChecker struct {
name string
script string
args []string
timeout time.Duration
}
func NewScriptChecker(name string, config map[string]interface{}) (*ScriptChecker, error) {
script, ok := config["script"].(string)
if !ok {
return nil, fmt.Errorf("script checker: missing or invalid 'script' field")
}
var args []string
if argsInterface, ok := config["args"].([]interface{}); ok {
args = make([]string, len(argsInterface))
for i, arg := range argsInterface {
if argStr, ok := arg.(string); ok {
args[i] = argStr
}
}
}
timeout := 10 * time.Second
if t, ok := config["timeout"].(string); ok {
if d, err := time.ParseDuration(t); err == nil {
timeout = d
}
}
return &ScriptChecker{
name: name,
script: script,
args: args,
timeout: timeout,
}, nil
}
func (c *ScriptChecker) Name() string {
return c.name
}
func (c *ScriptChecker) Type() string {
return "script"
}
func (c *ScriptChecker) Check(ctx context.Context) CheckResult {
cmdCtx, cancel := context.WithTimeout(ctx, c.timeout)
defer cancel()
cmd := exec.CommandContext(cmdCtx, c.script, c.args...)
// Run returns nil only when the script starts and exits with status 0; a
// non-zero exit (*exec.ExitError), a failure to start, or a kill on
// timeout all count as a failed check.
if err := cmd.Run(); err != nil {
return CheckResultFailure
}
return CheckResultSuccess
}

internal/health/tcp.go Normal file

@@ -0,0 +1,61 @@
package health
import (
"context"
"fmt"
"net"
)
type TCPChecker struct {
name string
host string
port int
}
func NewTCPChecker(name string, config map[string]interface{}) (*TCPChecker, error) {
host, ok := config["host"].(string)
if !ok {
return nil, fmt.Errorf("tcp checker: missing or invalid 'host' field")
}
var port int
switch v := config["port"].(type) {
case int:
port = v
case float64:
port = int(v)
default:
return nil, fmt.Errorf("tcp checker: missing or invalid 'port' field")
}
if port < 1 || port > 65535 {
return nil, fmt.Errorf("tcp checker: invalid port number: %d", port)
}
return &TCPChecker{
name: name,
host: host,
port: port,
}, nil
}
func (c *TCPChecker) Name() string {
return c.name
}
func (c *TCPChecker) Type() string {
return "tcp"
}
func (c *TCPChecker) Check(ctx context.Context) CheckResult {
// JoinHostPort brackets IPv6 literals ("[::1]:80"), which "%s:%d" does not.
addr := net.JoinHostPort(c.host, fmt.Sprint(c.port))
var dialer net.Dialer
conn, err := dialer.DialContext(ctx, "tcp", addr)
if err != nil {
return CheckResultFailure
}
conn.Close()
return CheckResultSuccess
}

internal/vrrp/arp.go Normal file

@@ -0,0 +1,72 @@
package vrrp
import (
"fmt"
"net"
"net/netip"
"github.com/mdlayher/arp"
)
type ARPSender struct {
client *arp.Client
iface *net.Interface
}
func NewARPSender(ifaceName string) (*ARPSender, error) {
iface, err := net.InterfaceByName(ifaceName)
if err != nil {
return nil, fmt.Errorf("failed to get interface %s: %w", ifaceName, err)
}
client, err := arp.Dial(iface)
if err != nil {
return nil, fmt.Errorf("failed to create ARP client: %w", err)
}
return &ARPSender{
client: client,
iface: iface,
}, nil
}
func (a *ARPSender) SendGratuitousARP(ip net.IP) error {
if ip4 := ip.To4(); ip4 == nil {
return fmt.Errorf("invalid IPv4 address: %s", ip)
}
addr, err := netip.ParseAddr(ip.String())
if err != nil {
return fmt.Errorf("failed to parse IP: %w", err)
}
pkt, err := arp.NewPacket(
arp.OperationRequest,
a.iface.HardwareAddr,
addr,
net.HardwareAddr{0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
addr,
)
if err != nil {
return fmt.Errorf("failed to create ARP packet: %w", err)
}
if err := a.client.WriteTo(pkt, net.HardwareAddr{0xff, 0xff, 0xff, 0xff, 0xff, 0xff}); err != nil {
return fmt.Errorf("failed to send gratuitous ARP: %w", err)
}
return nil
}
func (a *ARPSender) SendGratuitousARPForIPs(ips []net.IP) error {
for _, ip := range ips {
if err := a.SendGratuitousARP(ip); err != nil {
return err
}
}
return nil
}
func (a *ARPSender) Close() error {
return a.client.Close()
}

internal/vrrp/instance.go Normal file

@@ -0,0 +1,427 @@
package vrrp
import (
"fmt"
"net"
"sync"
"time"
"github.com/loveuer/go-alived/pkg/logger"
"github.com/loveuer/go-alived/pkg/netif"
)
type Instance struct {
Name string
VirtualRouterID uint8
Priority uint8
AdvertInterval uint8
Interface string
VirtualIPs []net.IP
AuthType uint8
AuthPass string
TrackScripts []string
state *StateMachine
priorityCalc *PriorityCalculator
history *StateHistory
socket *Socket
arpSender *ARPSender
netInterface *netif.Interface
advertTimer *Timer
masterDownTimer *Timer
running bool
stopCh chan struct{}
wg sync.WaitGroup
mu sync.RWMutex
log *logger.Logger
onMaster func()
onBackup func()
onFault func()
}
func NewInstance(
name string,
vrID uint8,
priority uint8,
advertInt uint8,
iface string,
vips []string,
authType string,
authPass string,
trackScripts []string,
log *logger.Logger,
) (*Instance, error) {
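if vrID == 0 {
return nil, fmt.Errorf("invalid virtual router ID: %d", vrID)
}
if priority == 0 {
return nil, fmt.Errorf("invalid priority: %d", priority)
}
// An advertisement interval of 0 would arm both timers with a zero
// duration; fall back to the VRRP default of 1 second.
if advertInt == 0 {
advertInt = 1
}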
virtualIPs := make([]net.IP, 0, len(vips))
for _, vip := range vips {
ip, _, err := net.ParseCIDR(vip)
if err != nil {
return nil, fmt.Errorf("invalid VIP %s: %w", vip, err)
}
virtualIPs = append(virtualIPs, ip)
}
var authTypeNum uint8
switch authType {
case "NONE", "":
authTypeNum = AuthTypeNone
case "PASS":
authTypeNum = AuthTypeSimpleText
default:
return nil, fmt.Errorf("unsupported auth type: %s", authType)
}
netInterface, err := netif.GetInterface(iface)
if err != nil {
return nil, fmt.Errorf("failed to get interface: %w", err)
}
inst := &Instance{
Name: name,
VirtualRouterID: vrID,
Priority: priority,
AdvertInterval: advertInt,
Interface: iface,
VirtualIPs: virtualIPs,
AuthType: authTypeNum,
AuthPass: authPass,
TrackScripts: trackScripts,
state: NewStateMachine(StateInit),
priorityCalc: NewPriorityCalculator(priority),
history: NewStateHistory(100),
netInterface: netInterface,
stopCh: make(chan struct{}),
log: log,
}
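inst.advertTimer = NewTimer(time.Duration(advertInt)*time.Second, inst.onAdvertTimer)
// RFC 3768: Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time.
inst.masterDownTimer = NewTimer(CalculateMasterDownInterval(advertInt)+CalculateSkewTime(priority), inst.onMasterDownTimer)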
inst.state.OnStateChange(func(old, new State) {
inst.history.Add(old, new, "state transition")
inst.log.Info("[%s] state changed: %s -> %s", inst.Name, old, new)
inst.handleStateChange(old, new)
})
return inst, nil
}
func (inst *Instance) Start() error {
inst.mu.Lock()
if inst.running {
inst.mu.Unlock()
return fmt.Errorf("instance %s already running", inst.Name)
}
inst.running = true
inst.mu.Unlock()
// Clear the running flag if setup fails so that Start can be retried and
// Stop does not tear down resources that were never created.
fail := func(err error) error {
inst.mu.Lock()
inst.running = false
inst.mu.Unlock()
return err
}
var err error
inst.socket, err = NewSocket(inst.Interface)
if err != nil {
return fail(fmt.Errorf("failed to create socket: %w", err))
}
inst.arpSender, err = NewARPSender(inst.Interface)
if err != nil {
inst.socket.Close()
return fail(fmt.Errorf("failed to create ARP sender: %w", err))
}
inst.log.Info("[%s] starting VRRP instance (VRID=%d, Priority=%d, Interface=%s)",
inst.Name, inst.VirtualRouterID, inst.Priority, inst.Interface)
inst.state.SetState(StateBackup)
inst.masterDownTimer.Start()
inst.wg.Add(1)
go inst.receiveLoop()
return nil
}
func (inst *Instance) Stop() {
inst.mu.Lock()
if !inst.running {
inst.mu.Unlock()
return
}
inst.running = false
inst.mu.Unlock()
inst.log.Info("[%s] stopping VRRP instance", inst.Name)
close(inst.stopCh)
inst.advertTimer.Stop()
inst.masterDownTimer.Stop()
// Close the socket before waiting: receiveLoop is usually blocked in
// Receive, and waiting first would hang until another VRRP packet arrived.
if inst.socket != nil {
inst.socket.Close()
}
inst.wg.Wait()
if inst.state.GetState() == StateMaster {
inst.removeVIPs()
}
if inst.arpSender != nil {
inst.arpSender.Close()
}
inst.state.SetState(StateInit)
}
func (inst *Instance) receiveLoop() {
defer inst.wg.Done()
for {
select {
case <-inst.stopCh:
return
default:
}
pkt, srcIP, err := inst.socket.Receive()
if err != nil {
inst.log.Debug("[%s] failed to receive packet: %v", inst.Name, err)
continue
}
if pkt.VirtualRtrID != inst.VirtualRouterID {
continue
}
if err := pkt.Validate(inst.AuthPass); err != nil {
inst.log.Warn("[%s] packet validation failed: %v", inst.Name, err)
continue
}
inst.handleAdvertisement(pkt, srcIP)
}
}
func (inst *Instance) handleAdvertisement(pkt *VRRPPacket, srcIP net.IP) {
currentState := inst.state.GetState()
localPriority := inst.priorityCalc.GetPriority()
inst.log.Debug("[%s] received advertisement from %s (priority=%d, state=%s)",
inst.Name, srcIP, pkt.Priority, currentState)
switch currentState {
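case StateBackup:
if pkt.Priority == 0 {
// The master is shutting down: take over after Skew_Time instead of
// the full Master_Down_Interval.
inst.masterDownTimer.SetDuration(CalculateSkewTime(localPriority))
inst.masterDownTimer.Reset()
} else if !ShouldBecomeMaster(localPriority, pkt.Priority, inst.socket.localIP.String(), srcIP.String()) {
// Restore the full Master_Down_Interval; an earlier priority-0
// advertisement may have shortened the timer to Skew_Time.
inst.masterDownTimer.SetDuration(CalculateMasterDownInterval(inst.AdvertInterval) + CalculateSkewTime(localPriority))
inst.masterDownTimer.Reset()
}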
case StateMaster:
if ShouldBecomeMaster(pkt.Priority, localPriority, srcIP.String(), inst.socket.localIP.String()) {
inst.log.Warn("[%s] received higher priority advertisement, stepping down", inst.Name)
inst.state.SetState(StateBackup)
}
}
}
func (inst *Instance) onAdvertTimer() {
if inst.state.GetState() == StateMaster {
inst.sendAdvertisement()
inst.advertTimer.Start()
}
}
func (inst *Instance) onMasterDownTimer() {
if inst.state.GetState() == StateBackup {
inst.log.Info("[%s] master down timer expired, becoming master", inst.Name)
inst.state.SetState(StateMaster)
}
}
func (inst *Instance) sendAdvertisement() error {
priority := inst.priorityCalc.GetPriority()
pkt := NewAdvertisement(
inst.VirtualRouterID,
priority,
inst.AdvertInterval,
inst.VirtualIPs,
inst.AuthType,
inst.AuthPass,
)
if err := inst.socket.Send(pkt); err != nil {
inst.log.Error("[%s] failed to send advertisement: %v", inst.Name, err)
return err
}
inst.log.Debug("[%s] sent advertisement (priority=%d)", inst.Name, priority)
return nil
}
func (inst *Instance) handleStateChange(old, new State) {
switch new {
case StateMaster:
inst.becomeMaster()
case StateBackup:
inst.becomeBackup(old)
case StateFault:
inst.becomeFault()
}
}
func (inst *Instance) becomeMaster() {
inst.log.Info("[%s] transitioning to MASTER state", inst.Name)
if err := inst.addVIPs(); err != nil {
inst.log.Error("[%s] failed to add VIPs: %v", inst.Name, err)
inst.state.SetState(StateFault)
return
}
if err := inst.arpSender.SendGratuitousARPForIPs(inst.VirtualIPs); err != nil {
inst.log.Error("[%s] failed to send gratuitous ARP: %v", inst.Name, err)
}
inst.masterDownTimer.Stop()
inst.advertTimer.Start()
inst.sendAdvertisement()
if inst.onMaster != nil {
inst.onMaster()
}
}
func (inst *Instance) becomeBackup(oldState State) {
inst.log.Info("[%s] transitioning to BACKUP state", inst.Name)
inst.advertTimer.Stop()
if oldState == StateMaster {
if err := inst.removeVIPs(); err != nil {
inst.log.Error("[%s] failed to remove VIPs: %v", inst.Name, err)
}
}
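// Re-arm with the full Master_Down_Interval; a priority-0 advertisement
// may have shortened the timer to Skew_Time.
inst.masterDownTimer.SetDuration(CalculateMasterDownInterval(inst.AdvertInterval) + CalculateSkewTime(inst.priorityCalc.GetPriority()))
inst.masterDownTimer.Reset()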
if inst.onBackup != nil {
inst.onBackup()
}
}
func (inst *Instance) becomeFault() {
inst.log.Error("[%s] transitioning to FAULT state", inst.Name)
inst.advertTimer.Stop()
inst.masterDownTimer.Stop()
if err := inst.removeVIPs(); err != nil {
inst.log.Error("[%s] failed to remove VIPs: %v", inst.Name, err)
}
if inst.onFault != nil {
inst.onFault()
}
}
func (inst *Instance) addVIPs() error {
inst.log.Info("[%s] adding virtual IPs", inst.Name)
for _, vipStr := range inst.getVIPsWithCIDR() {
if err := inst.netInterface.AddIP(vipStr); err != nil {
inst.log.Error("[%s] failed to add VIP %s: %v", inst.Name, vipStr, err)
return err
}
inst.log.Info("[%s] added VIP %s", inst.Name, vipStr)
}
return nil
}
func (inst *Instance) removeVIPs() error {
inst.log.Info("[%s] removing virtual IPs", inst.Name)
for _, vipStr := range inst.getVIPsWithCIDR() {
has, _ := inst.netInterface.HasIP(vipStr)
if !has {
continue
}
if err := inst.netInterface.DeleteIP(vipStr); err != nil {
inst.log.Error("[%s] failed to remove VIP %s: %v", inst.Name, vipStr, err)
return err
}
inst.log.Info("[%s] removed VIP %s", inst.Name, vipStr)
}
return nil
}
func (inst *Instance) getVIPsWithCIDR() []string {
result := make([]string, len(inst.VirtualIPs))
for i, ip := range inst.VirtualIPs {
result[i] = ip.String() + "/32"
}
return result
}
func (inst *Instance) GetState() State {
return inst.state.GetState()
}
func (inst *Instance) OnMaster(callback func()) {
inst.onMaster = callback
}
func (inst *Instance) OnBackup(callback func()) {
inst.onBackup = callback
}
func (inst *Instance) OnFault(callback func()) {
inst.onFault = callback
}
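func (inst *Instance) AdjustPriority(delta int) {
inst.mu.Lock()
defer inst.mu.Unlock()
oldPriority := inst.priorityCalc.GetPriority()
// Only negative deltas are applied: PriorityCalculator can lower the
// priority (flooring at 0) but has no increase operation, so a positive
// delta is a no-op. Clamp first so a delta below -255 does not wrap when
// converted to uint8.
if delta < 0 {
amount := -delta
if amount > 255 {
amount = 255
}
inst.priorityCalc.DecreasePriority(uint8(amount))
}
newPriority := inst.priorityCalc.GetPriority()
if oldPriority != newPriority {
inst.log.Info("[%s] priority adjusted: %d -> %d (delta=%d)",
inst.Name, oldPriority, newPriority, delta)
}
}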
func (inst *Instance) ResetPriority() {
inst.mu.Lock()
defer inst.mu.Unlock()
oldPriority := inst.priorityCalc.GetPriority()
inst.priorityCalc.ResetPriority()
newPriority := inst.priorityCalc.GetPriority()
if oldPriority != newPriority {
inst.log.Info("[%s] priority reset: %d -> %d",
inst.Name, oldPriority, newPriority)
}
}

internal/vrrp/manager.go Normal file

@@ -0,0 +1,116 @@
package vrrp
import (
"fmt"
"sync"
"github.com/loveuer/go-alived/pkg/config"
"github.com/loveuer/go-alived/pkg/logger"
)
type Manager struct {
instances map[string]*Instance
mu sync.RWMutex
log *logger.Logger
}
func NewManager(log *logger.Logger) *Manager {
return &Manager{
instances: make(map[string]*Instance),
log: log,
}
}
func (m *Manager) LoadFromConfig(cfg *config.Config) error {
m.mu.Lock()
defer m.mu.Unlock()
for _, vrrpCfg := range cfg.VRRP {
inst, err := NewInstance(
vrrpCfg.Name,
uint8(vrrpCfg.VirtualRouterID),
uint8(vrrpCfg.Priority),
uint8(vrrpCfg.AdvertInterval),
vrrpCfg.Interface,
vrrpCfg.VirtualIPs,
vrrpCfg.AuthType,
vrrpCfg.AuthPass,
vrrpCfg.TrackScripts,
m.log,
)
if err != nil {
return fmt.Errorf("failed to create instance %s: %w", vrrpCfg.Name, err)
}
m.instances[vrrpCfg.Name] = inst
m.log.Info("loaded VRRP instance: %s", vrrpCfg.Name)
}
return nil
}
func (m *Manager) StartAll() error {
m.mu.RLock()
defer m.mu.RUnlock()
for name, inst := range m.instances {
if err := inst.Start(); err != nil {
return fmt.Errorf("failed to start instance %s: %w", name, err)
}
}
m.log.Info("started %d VRRP instance(s)", len(m.instances))
return nil
}
func (m *Manager) StopAll() {
m.mu.RLock()
defer m.mu.RUnlock()
for _, inst := range m.instances {
inst.Stop()
}
m.log.Info("stopped all VRRP instances")
}
func (m *Manager) GetInstance(name string) (*Instance, bool) {
m.mu.RLock()
defer m.mu.RUnlock()
inst, ok := m.instances[name]
return inst, ok
}
func (m *Manager) GetAllInstances() []*Instance {
m.mu.RLock()
defer m.mu.RUnlock()
result := make([]*Instance, 0, len(m.instances))
for _, inst := range m.instances {
result = append(result, inst)
}
return result
}
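func (m *Manager) Reload(cfg *config.Config) error {
m.log.Info("reloading VRRP configuration...")
// Build the new instances before stopping the old ones, so an invalid
// configuration leaves the running instances untouched. NewInstance only
// validates and looks up the interface; sockets are opened in Start.
staged := NewManager(m.log)
if err := staged.LoadFromConfig(cfg); err != nil {
return fmt.Errorf("failed to load config: %w", err)
}
m.StopAll()
m.mu.Lock()
m.instances = staged.instances
m.mu.Unlock()
if err := m.StartAll(); err != nil {
return fmt.Errorf("failed to start instances: %w", err)
}
m.log.Info("VRRP configuration reloaded successfully")
return nil
}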

internal/vrrp/packet.go Normal file

@@ -0,0 +1,184 @@
package vrrp
import (
"bytes"
"encoding/binary"
"fmt"
"net"
)
const (
VRRPVersion = 2
VRRPProtocolNumber = 112
)
type VRRPPacket struct {
Version uint8
Type uint8
VirtualRtrID uint8
Priority uint8
CountIPAddrs uint8
AuthType uint8
AdvertInt uint8
Checksum uint16
IPAddresses []net.IP
AuthData [8]byte
}
const (
VRRPTypeAdvertisement = 1
)
const (
AuthTypeNone = 0
AuthTypeSimpleText = 1
AuthTypeIPAH = 2
)
func NewAdvertisement(vrID uint8, priority uint8, advertInt uint8, ips []net.IP, authType uint8, authPass string) *VRRPPacket {
pkt := &VRRPPacket{
Version: VRRPVersion,
Type: VRRPTypeAdvertisement,
VirtualRtrID: vrID,
Priority: priority,
CountIPAddrs: uint8(len(ips)),
AuthType: authType,
AdvertInt: advertInt,
IPAddresses: ips,
}
if authType == AuthTypeSimpleText && authPass != "" {
copy(pkt.AuthData[:], authPass)
}
return pkt
}
func (p *VRRPPacket) Marshal() ([]byte, error) {
buf := new(bytes.Buffer)
versionType := (p.Version << 4) | p.Type
if err := binary.Write(buf, binary.BigEndian, versionType); err != nil {
return nil, err
}
if err := binary.Write(buf, binary.BigEndian, p.VirtualRtrID); err != nil {
return nil, err
}
if err := binary.Write(buf, binary.BigEndian, p.Priority); err != nil {
return nil, err
}
if err := binary.Write(buf, binary.BigEndian, p.CountIPAddrs); err != nil {
return nil, err
}
if err := binary.Write(buf, binary.BigEndian, p.AuthType); err != nil {
return nil, err
}
if err := binary.Write(buf, binary.BigEndian, p.AdvertInt); err != nil {
return nil, err
}
if err := binary.Write(buf, binary.BigEndian, uint16(0)); err != nil {
return nil, err
}
for _, ip := range p.IPAddresses {
ip4 := ip.To4()
if ip4 == nil {
return nil, fmt.Errorf("invalid IPv4 address: %s", ip)
}
if err := binary.Write(buf, binary.BigEndian, ip4); err != nil {
return nil, err
}
}
if err := binary.Write(buf, binary.BigEndian, p.AuthData); err != nil {
return nil, err
}
data := buf.Bytes()
checksum := calculateChecksum(data)
binary.BigEndian.PutUint16(data[6:8], checksum)
return data, nil
}
func Unmarshal(data []byte) (*VRRPPacket, error) {
if len(data) < 20 {
return nil, fmt.Errorf("packet too short: %d bytes", len(data))
}
pkt := &VRRPPacket{}
versionType := data[0]
pkt.Version = versionType >> 4
pkt.Type = versionType & 0x0F
pkt.VirtualRtrID = data[1]
pkt.Priority = data[2]
pkt.CountIPAddrs = data[3]
pkt.AuthType = data[4]
pkt.AdvertInt = data[5]
pkt.Checksum = binary.BigEndian.Uint16(data[6:8])
offset := 8
pkt.IPAddresses = make([]net.IP, pkt.CountIPAddrs)
for i := 0; i < int(pkt.CountIPAddrs); i++ {
if offset+4 > len(data) {
return nil, fmt.Errorf("packet too short for IP addresses")
}
pkt.IPAddresses[i] = net.IPv4(data[offset], data[offset+1], data[offset+2], data[offset+3])
offset += 4
}
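if offset+8 > len(data) {
return nil, fmt.Errorf("packet too short for auth data")
}
copy(pkt.AuthData[:], data[offset:offset+8])
// The checksum covers the whole VRRP message; re-summing it with the
// transmitted checksum field in place must give zero.
if calculateChecksum(data[:offset+8]) != 0 {
return nil, fmt.Errorf("invalid checksum")
}
return pkt, nil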
}
func calculateChecksum(data []byte) uint16 {
sum := uint32(0)
for i := 0; i < len(data)-1; i += 2 {
sum += uint32(data[i])<<8 | uint32(data[i+1])
}
if len(data)%2 == 1 {
sum += uint32(data[len(data)-1]) << 8
}
for sum > 0xFFFF {
sum = (sum & 0xFFFF) + (sum >> 16)
}
return uint16(^sum)
}
func (p *VRRPPacket) Validate(authPass string) error {
if p.Version != VRRPVersion {
return fmt.Errorf("unsupported VRRP version: %d", p.Version)
}
if p.Type != VRRPTypeAdvertisement {
return fmt.Errorf("unsupported VRRP type: %d", p.Type)
}
if p.AuthType == AuthTypeSimpleText {
if authPass != "" {
var expectedAuth [8]byte
copy(expectedAuth[:], authPass)
if !bytes.Equal(p.AuthData[:], expectedAuth[:]) {
return fmt.Errorf("authentication failed")
}
}
}
return nil
}

internal/vrrp/socket.go Normal file

@@ -0,0 +1,141 @@
package vrrp
import (
"fmt"
"net"
"os"
"syscall"
"golang.org/x/net/ipv4"
)
const (
VRRPMulticastAddr = "224.0.0.18"
)
type Socket struct {
conn *ipv4.RawConn
iface *net.Interface
localIP net.IP
groupIP net.IP
}
func NewSocket(ifaceName string) (*Socket, error) {
iface, err := net.InterfaceByName(ifaceName)
if err != nil {
return nil, fmt.Errorf("failed to get interface %s: %w", ifaceName, err)
}
addrs, err := iface.Addrs()
if err != nil {
return nil, fmt.Errorf("failed to get addresses for %s: %w", ifaceName, err)
}
var localIP net.IP
for _, addr := range addrs {
if ipNet, ok := addr.(*net.IPNet); ok {
// Named ip4 so it does not shadow the imported ipv4 package.
if ip4 := ipNet.IP.To4(); ip4 != nil {
localIP = ip4
break
}
}
}
if localIP == nil {
return nil, fmt.Errorf("no IPv4 address found on interface %s", ifaceName)
}
fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_RAW, VRRPProtocolNumber)
if err != nil {
return nil, fmt.Errorf("failed to create raw socket: %w", err)
}
if err := syscall.SetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1); err != nil {
syscall.Close(fd)
return nil, fmt.Errorf("failed to set SO_REUSEADDR: %w", err)
}
file := os.NewFile(uintptr(fd), "vrrp-socket")
defer file.Close()
packetConn, err := net.FilePacketConn(file)
if err != nil {
return nil, fmt.Errorf("failed to create packet connection: %w", err)
}
rawConn, err := ipv4.NewRawConn(packetConn)
if err != nil {
packetConn.Close()
return nil, fmt.Errorf("failed to create raw connection: %w", err)
}
groupIP := net.ParseIP(VRRPMulticastAddr).To4()
if groupIP == nil {
rawConn.Close()
return nil, fmt.Errorf("invalid multicast address: %s", VRRPMulticastAddr)
}
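if err := rawConn.JoinGroup(iface, &net.IPAddr{IP: groupIP}); err != nil {
rawConn.Close()
return nil, fmt.Errorf("failed to join multicast group: %w", err)
}
// Send advertisements out of the configured interface rather than whichever
// interface the routing table picks for 224.0.0.0/4.
if err := rawConn.SetMulticastInterface(iface); err != nil {
rawConn.Close()
return nil, fmt.Errorf("failed to set multicast interface: %w", err)
}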
if err := rawConn.SetControlMessage(ipv4.FlagTTL|ipv4.FlagSrc|ipv4.FlagDst|ipv4.FlagInterface, true); err != nil {
rawConn.Close()
return nil, fmt.Errorf("failed to set control message: %w", err)
}
return &Socket{
conn: rawConn,
iface: iface,
localIP: localIP,
groupIP: groupIP,
}, nil
}
func (s *Socket) Send(pkt *VRRPPacket) error {
data, err := pkt.Marshal()
if err != nil {
return fmt.Errorf("failed to marshal packet: %w", err)
}
header := &ipv4.Header{
Version: ipv4.Version,
Len: ipv4.HeaderLen,
TOS: 0xC0,
TotalLen: ipv4.HeaderLen + len(data),
TTL: 255,
Protocol: VRRPProtocolNumber,
Dst: s.groupIP,
Src: s.localIP,
}
if err := s.conn.WriteTo(header, data, nil); err != nil {
return fmt.Errorf("failed to send packet: %w", err)
}
return nil
}
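func (s *Socket) Receive() (*VRRPPacket, net.IP, error) {
buf := make([]byte, 1500)
header, payload, _, err := s.conn.ReadFrom(buf)
if err != nil {
return nil, nil, fmt.Errorf("failed to receive packet: %w", err)
}
// RFC 3768 requires advertisements to arrive with IP TTL 255; anything
// lower has been forwarded by a router and must be discarded.
if header.TTL != 255 {
return nil, nil, fmt.Errorf("invalid TTL %d from %s", header.TTL, header.Src)
}
pkt, err := Unmarshal(payload)
if err != nil {
return nil, nil, fmt.Errorf("failed to unmarshal packet: %w", err)
}
return pkt, header.Src, nil
}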
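func (s *Socket) Close() error {
// Always close the connection, even if leaving the group fails, so the
// raw socket is not leaked.
leaveErr := s.conn.LeaveGroup(s.iface, &net.IPAddr{IP: s.groupIP})
if err := s.conn.Close(); err != nil {
return err
}
return leaveErr
}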

internal/vrrp/state.go Normal file

@@ -0,0 +1,258 @@
package vrrp
import (
"fmt"
"sync"
"time"
)
type State int
const (
StateInit State = iota
StateBackup
StateMaster
StateFault
)
func (s State) String() string {
switch s {
case StateInit:
return "INIT"
case StateBackup:
return "BACKUP"
case StateMaster:
return "MASTER"
case StateFault:
return "FAULT"
default:
return "UNKNOWN"
}
}
type StateMachine struct {
currentState State
previousState State
mu sync.RWMutex
stateChangeCallbacks []func(old, new State)
}
func NewStateMachine(initialState State) *StateMachine {
return &StateMachine{
currentState: initialState,
previousState: StateInit,
stateChangeCallbacks: make([]func(old, new State), 0),
}
}
func (sm *StateMachine) GetState() State {
sm.mu.RLock()
defer sm.mu.RUnlock()
return sm.currentState
}
func (sm *StateMachine) SetState(newState State) {
sm.mu.Lock()
oldState := sm.currentState
sm.previousState = oldState
sm.currentState = newState
callbacks := sm.stateChangeCallbacks
sm.mu.Unlock()
for _, callback := range callbacks {
callback(oldState, newState)
}
}
func (sm *StateMachine) OnStateChange(callback func(old, new State)) {
sm.mu.Lock()
defer sm.mu.Unlock()
sm.stateChangeCallbacks = append(sm.stateChangeCallbacks, callback)
}
type Timer struct {
duration time.Duration
timer *time.Timer
callback func()
mu sync.Mutex
}
func NewTimer(duration time.Duration, callback func()) *Timer {
return &Timer{
duration: duration,
callback: callback,
}
}
func (t *Timer) Start() {
t.mu.Lock()
defer t.mu.Unlock()
if t.timer != nil {
t.timer.Stop()
}
t.timer = time.AfterFunc(t.duration, t.callback)
}
func (t *Timer) Stop() {
t.mu.Lock()
defer t.mu.Unlock()
if t.timer != nil {
t.timer.Stop()
t.timer = nil
}
}
func (t *Timer) Reset() {
t.mu.Lock()
defer t.mu.Unlock()
if t.timer != nil {
t.timer.Stop()
}
t.timer = time.AfterFunc(t.duration, t.callback)
}
func (t *Timer) SetDuration(duration time.Duration) {
t.mu.Lock()
defer t.mu.Unlock()
t.duration = duration
}
type PriorityCalculator struct {
basePriority uint8
currentPriority uint8
mu sync.RWMutex
}
func NewPriorityCalculator(basePriority uint8) *PriorityCalculator {
return &PriorityCalculator{
basePriority: basePriority,
currentPriority: basePriority,
}
}
func (pc *PriorityCalculator) GetPriority() uint8 {
pc.mu.RLock()
defer pc.mu.RUnlock()
return pc.currentPriority
}
func (pc *PriorityCalculator) DecreasePriority(amount uint8) {
pc.mu.Lock()
defer pc.mu.Unlock()
if pc.currentPriority > amount {
pc.currentPriority -= amount
} else {
pc.currentPriority = 0
}
}
func (pc *PriorityCalculator) ResetPriority() {
pc.mu.Lock()
defer pc.mu.Unlock()
pc.currentPriority = pc.basePriority
}
func (pc *PriorityCalculator) SetBasePriority(priority uint8) {
pc.mu.Lock()
defer pc.mu.Unlock()
pc.basePriority = priority
pc.currentPriority = priority
}
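// ShouldBecomeMaster reports whether the local router wins the election:
// higher priority wins, and on a tie the numerically higher primary IPv4
// address wins (RFC 3768).
func ShouldBecomeMaster(localPriority, remotePriority uint8, localIP, remoteIP string) bool {
if localPriority != remotePriority {
return localPriority > remotePriority
}
l, lok := ipv4ToUint32(localIP)
r, rok := ipv4ToUint32(remoteIP)
if !lok || !rok {
return localIP > remoteIP
}
return l > r
}
// ipv4ToUint32 parses a dotted-quad IPv4 address into an integer so that
// addresses compare numerically; comparing the strings would rank
// "10.0.0.9" above "10.0.0.10".
func ipv4ToUint32(s string) (uint32, bool) {
var a, b, c, d uint32
if n, err := fmt.Sscanf(s, "%d.%d.%d.%d", &a, &b, &c, &d); err != nil || n != 4 {
return 0, false
}
if a > 255 || b > 255 || c > 255 || d > 255 {
return 0, false
}
return a<<24 | b<<16 | c<<8 | d, true
}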
func CalculateMasterDownInterval(advertInt uint8) time.Duration {
return time.Duration(3*int(advertInt)) * time.Second
}
func CalculateSkewTime(priority uint8) time.Duration {
skew := float64(256-int(priority)) / 256.0
return time.Duration(skew * float64(time.Second))
}
type StateTransition struct {
From State
To State
Timestamp time.Time
Reason string
}
type StateHistory struct {
transitions []StateTransition
maxSize int
mu sync.RWMutex
}
func NewStateHistory(maxSize int) *StateHistory {
return &StateHistory{
transitions: make([]StateTransition, 0, maxSize),
maxSize: maxSize,
}
}
func (sh *StateHistory) Add(from, to State, reason string) {
sh.mu.Lock()
defer sh.mu.Unlock()
transition := StateTransition{
From: from,
To: to,
Timestamp: time.Now(),
Reason: reason,
}
sh.transitions = append(sh.transitions, transition)
if len(sh.transitions) > sh.maxSize {
sh.transitions = sh.transitions[1:]
}
}
func (sh *StateHistory) GetRecent(n int) []StateTransition {
sh.mu.RLock()
defer sh.mu.RUnlock()
if n > len(sh.transitions) {
n = len(sh.transitions)
}
start := len(sh.transitions) - n
result := make([]StateTransition, n)
copy(result, sh.transitions[start:])
return result
}
func (sh *StateHistory) String() string {
sh.mu.RLock()
defer sh.mu.RUnlock()
if len(sh.transitions) == 0 {
return "No state transitions"
}
result := "State transition history:\n"
for _, t := range sh.transitions {
result += fmt.Sprintf(" %s: %s -> %s (%s)\n",
t.Timestamp.Format("2006-01-02 15:04:05"),
t.From, t.To, t.Reason)
}
return result
}
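On equal priority, RFC 3768 breaks the tie in favour of the higher primary IP address, compared as a number. Comparing dotted-quad strings, as a plain `localIP > remoteIP` does, gets this wrong as soon as octets have different digit counts. The sketch below contrasts the two using `net/netip`; `higherIP` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/netip"
)

// higherIP reports whether a is numerically greater than b, the tie-breaker
// used when two routers advertise the same priority.
func higherIP(a, b string) (bool, error) {
	x, err := netip.ParseAddr(a)
	if err != nil {
		return false, err
	}
	y, err := netip.ParseAddr(b)
	if err != nil {
		return false, err
	}
	return x.Compare(y) > 0, nil
}

func main() {
	a, b := "10.0.0.9", "10.0.0.10"
	fmt.Println("string comparison says a wins:", a > b)
	ok, err := higherIP(a, b)
	if err != nil {
		panic(err)
	}
	fmt.Println("numeric comparison says a wins:", ok)
}
```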

main.go Normal file

@@ -0,0 +1,9 @@
package main
import (
"github.com/loveuer/go-alived/internal/cmd"
)
func main() {
cmd.Execute()
}

pkg/config/config.go Normal file

@@ -0,0 +1,90 @@
package config

import (
	"fmt"
	"os"
	"time"

	"gopkg.in/yaml.v3"
)

// Config is the top-level structure of the YAML configuration file.
type Config struct {
	Global Global          `yaml:"global"`
	VRRP   []VRRPInstance  `yaml:"vrrp_instances"`
	Health []HealthChecker `yaml:"health_checkers"`
}

// Global holds settings shared by all VRRP instances.
type Global struct {
	RouterID         string `yaml:"router_id"`
	NotificationMail string `yaml:"notification_email"`
}

// VRRPInstance describes one virtual router.
type VRRPInstance struct {
	Name            string   `yaml:"name"`
	Interface       string   `yaml:"interface"`
	State           string   `yaml:"state"`
	VirtualRouterID int      `yaml:"virtual_router_id"`
	Priority        int      `yaml:"priority"`
	VirtualIPs      []string `yaml:"virtual_ips"`
	AdvertInterval  int      `yaml:"advert_interval"`
	AuthType        string   `yaml:"auth_type"`
	AuthPass        string   `yaml:"auth_pass"`
	NotifyMaster    string   `yaml:"notify_master"`
	NotifyBackup    string   `yaml:"notify_backup"`
	NotifyFault     string   `yaml:"notify_fault"`
	TrackScripts    []string `yaml:"track_scripts"`
}

// HealthChecker describes one periodic check. Interval and Timeout are
// decoded by yaml.v3 from duration strings such as "5s". Rise and Fall are
// the numbers of consecutive successes and failures needed to change state.
// Config holds the type-specific settings and is left untyped here.
type HealthChecker struct {
	Name     string        `yaml:"name"`
	Type     string        `yaml:"type"`
	Interval time.Duration `yaml:"interval"`
	Timeout  time.Duration `yaml:"timeout"`
	Rise     int           `yaml:"rise"`
	Fall     int           `yaml:"fall"`
	Config   interface{}   `yaml:"config"`
}

// Load reads, parses and validates the configuration file at path.
func Load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("failed to read config file: %w", err)
	}

	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("failed to parse config file: %w", err)
	}

	if err := validate(&cfg); err != nil {
		return nil, fmt.Errorf("invalid configuration: %w", err)
	}

	return &cfg, nil
}

// validate checks required fields and value ranges of a parsed configuration.
func validate(cfg *Config) error {
	if cfg.Global.RouterID == "" {
		return fmt.Errorf("global.router_id is required")
	}

	for i, vrrp := range cfg.VRRP {
		if vrrp.Name == "" {
			return fmt.Errorf("vrrp_instances[%d].name is required", i)
		}
		if vrrp.Interface == "" {
			return fmt.Errorf("vrrp_instances[%d].interface is required", i)
		}
		if vrrp.VirtualRouterID < 1 || vrrp.VirtualRouterID > 255 {
			return fmt.Errorf("vrrp_instances[%d].virtual_router_id must be between 1 and 255", i)
		}
		// RFC 3768 reserves priority 255 for the router that owns the virtual IPs.
		if vrrp.Priority < 1 || vrrp.Priority > 255 {
			return fmt.Errorf("vrrp_instances[%d].priority must be between 1 and 255", i)
		}
		if len(vrrp.VirtualIPs) == 0 {
			return fmt.Errorf("vrrp_instances[%d].virtual_ips cannot be empty", i)
		}
	}

	return nil
}

pkg/logger/logger.go Normal file

@@ -0,0 +1,44 @@
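package logger

import (
	"fmt"
	"log"
	"os"
	"time"
)

// Logger writes leveled, timestamped messages to stdout.
// Debug messages are printed only when debug is true.
type Logger struct {
	debug  bool
	logger *log.Logger
}

// New returns a Logger. The underlying log.Logger has no prefix or flags
// because log() adds its own timestamp and level.
func New(debug bool) *Logger {
	return &Logger{
		debug:  debug,
		logger: log.New(os.Stdout, "", 0),
	}
}

// Info logs an informational message.
func (l *Logger) Info(format string, args ...interface{}) {
	l.log("INFO", format, args...)
}

// Error logs an error message.
func (l *Logger) Error(format string, args ...interface{}) {
	l.log("ERROR", format, args...)
}

// Debug logs a message only when debug output is enabled.
func (l *Logger) Debug(format string, args ...interface{}) {
	if l.debug {
		l.log("DEBUG", format, args...)
	}
}

// Warn logs a warning.
func (l *Logger) Warn(format string, args ...interface{}) {
	l.log("WARN", format, args...)
}

// log writes one line formatted as "[2006-01-02 15:04:05] LEVEL: message".
func (l *Logger) log(level string, format string, args ...interface{}) {
	timestamp := time.Now().Format("2006-01-02 15:04:05")
	message := fmt.Sprintf(format, args...)
	l.logger.Printf("[%s] %s: %s", timestamp, level, message)
}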

pkg/netif/interface.go Normal file

@@ -0,0 +1,81 @@
package netif

import (
	"fmt"
	"net"

	"github.com/vishvananda/netlink"
)

// Interface is a network link resolved through netlink.
type Interface struct {
	Name  string
	Index int
	Link  netlink.Link
}

// GetInterface looks up the link with the given name (for example "eth0").
func GetInterface(name string) (*Interface, error) {
	link, err := netlink.LinkByName(name)
	if err != nil {
		return nil, fmt.Errorf("failed to find interface %s: %w", name, err)
	}

	return &Interface{
		Name:  name,
		Index: link.Attrs().Index,
		Link:  link,
	}, nil
}

// AddIP assigns ipCIDR (for example "192.168.1.100/24") to the interface.
func (iface *Interface) AddIP(ipCIDR string) error {
	addr, err := netlink.ParseAddr(ipCIDR)
	if err != nil {
		return fmt.Errorf("invalid IP address %s: %w", ipCIDR, err)
	}

	if err := netlink.AddrAdd(iface.Link, addr); err != nil {
		return fmt.Errorf("failed to add IP %s to %s: %w", ipCIDR, iface.Name, err)
	}

	return nil
}

// DeleteIP removes ipCIDR from the interface.
func (iface *Interface) DeleteIP(ipCIDR string) error {
	addr, err := netlink.ParseAddr(ipCIDR)
	if err != nil {
		return fmt.Errorf("invalid IP address %s: %w", ipCIDR, err)
	}

	if err := netlink.AddrDel(iface.Link, addr); err != nil {
		return fmt.Errorf("failed to delete IP %s from %s: %w", ipCIDR, iface.Name, err)
	}

	return nil
}

// HasIP reports whether ipCIDR is assigned to the interface.
// Both the address and the prefix length must match.
func (iface *Interface) HasIP(ipCIDR string) (bool, error) {
	targetAddr, err := netlink.ParseAddr(ipCIDR)
	if err != nil {
		return false, fmt.Errorf("invalid IP address %s: %w", ipCIDR, err)
	}

	addrs, err := netlink.AddrList(iface.Link, netlink.FAMILY_ALL)
	if err != nil {
		return false, fmt.Errorf("failed to list addresses on %s: %w", iface.Name, err)
	}

	for _, addr := range addrs {
		if addr.IPNet.String() == targetAddr.IPNet.String() {
			return true, nil
		}
	}

	return false, nil
}

// GetHardwareAddr returns the MAC address recorded when the link was looked up.
func (iface *Interface) GetHardwareAddr() (net.HardwareAddr, error) {
	return iface.Link.Attrs().HardwareAddr, nil
}

// IsUp reports whether the link is administratively up. The attributes
// cached in iface.Link are a snapshot taken by GetInterface, so the link is
// re-read by index to get its current flags.
func (iface *Interface) IsUp() bool {
	link, err := netlink.LinkByIndex(iface.Index)
	if err != nil {
		return false
	}
	return link.Attrs().Flags&net.FlagUp != 0
}

roadmap.md Normal file

@@ -0,0 +1,133 @@
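# go-alived Roadmap
## Project Goal
Implement the core functionality of keepalived in Go, with no external dependencies and deployment as a single binary.
## Keepalived Core Features
### 1. VRRP (Virtual Router Redundancy Protocol)
- **Virtual IP management**: manage virtual IP addresses (VIPs) that can float between multiple nodes
- **State machine**: transitions between the MASTER, BACKUP, and FAULT states
- **Priority election**: elect the MASTER node based on priority (1-255)
- **Gratuitous ARP**: send ARP packets on state changes so network devices update their tables
- **Sync groups**: combine multiple VRRP instances so they change state as a unit
- **Virtual MAC support**: support using a virtual MAC address (macvlan)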
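### 2. Health Checking
- **HTTP/HTTPS checks**: verify web service status with GET requests
- **TCP checks**: basic TCP connection test
- **SMTP checks**: mail service monitoring
- **DNS checks**: query-based DNS verification
- **Script checks**: custom scripts for flexible monitoring
- **UDP/PING checks**: network connectivity tests
- **Dynamic weights**: adjust weights dynamically based on health check results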
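### 3. Load Balancing (LVS Integration)
- **Scheduling algorithms**: support rr, wrr, lc, wlc, sh, and other scheduling algorithms
- **Forwarding modes**: NAT, Direct Routing (DR), IP Tunneling (TUN)
- **Backend server management**: add/remove backend servers dynamically based on health status
- **Quorum support**: configure the minimum number of live servers
- **Sorry Server**: fallback server used when too few nodes are healthy
- **Session persistence**: support persistent sessions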
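For the `wrr` algorithm, a small user-space sketch shows what the weights mean. IPVS implements `wrr` in the kernel with its own algorithm; the code below instead uses the smooth weighted round-robin variant popularized by nginx, swapped in because it is short. The `Backend` type and `Next` function are hypothetical. Over every window of `sum(weights)` picks, each backend is chosen exactly `Weight` times, and the picks are interleaved rather than bunched together.

```go
package main

// Backend is a hypothetical real server with a static weight.
type Backend struct {
	Name    string
	Weight  int
	current int // running score used by smooth WRR
}

// Next picks a backend by smooth weighted round-robin: every eligible
// backend's score grows by its weight, the highest score wins, and the
// winner's score is reduced by the total weight. It returns nil when no
// backend has a positive weight.
func Next(bs []*Backend) *Backend {
	total := 0
	var best *Backend
	for _, b := range bs {
		if b.Weight <= 0 {
			continue // weight 0 takes the server out of rotation
		}
		b.current += b.Weight
		total += b.Weight
		if best == nil || b.current > best.current {
			best = b
		}
	}
	if best != nil {
		best.current -= total
	}
	return best
}
```

With weights 5, 1 and 1 for `a`, `b` and `c`, seven consecutive calls return `a a b a c a a`.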
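### 4. Auxiliary Features
- **State change scripts**: run custom scripts on state transitions
- **Email notifications**: SMTP alerting support
- **Process monitoring**: monitor external processes and adjust priority
- **Configuration hot reload**: support reloading the configuration file
## Implementation Plan
### Phase 0: Project Infrastructure ✅
- [x] Project structure setup
- [x] CLI argument parsing (--config, --debug)
- [x] YAML configuration loading and validation
- [x] Logging system
- [x] Signal handling (SIGHUP reloads the configuration)
### Phase 1: Core VRRP Functionality (First Priority)
#### 1.1 Network Interfaces and IP Management
- [ ] Network interface detection and validation
- [ ] VIP add/remove (using netlink)
- [ ] IP address conflict detection
- [ ] VIP status query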
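For IP conflict detection, note that the `HasIP` helper in `pkg/netif` compares full CIDR strings, so `192.168.1.100/24` does not match an existing `192.168.1.100/32`. A conflict check probably wants to compare addresses only. The sketch below is a hypothetical helper that does this against a list of CIDR strings, such as the `String()` values of the `*net.IPNet` entries returned by `net.InterfaceAddrs()`. It only covers addresses on the local host; detecting another host that already uses the VIP would need ARP probing, which is not shown.

```go
package main

import "net"

// IPInUse reports whether the address part of vipCIDR appears in any of the
// assigned CIDR strings. Prefix lengths are ignored on purpose.
func IPInUse(vipCIDR string, assigned []string) (bool, error) {
	vip, _, err := net.ParseCIDR(vipCIDR)
	if err != nil {
		return false, err
	}
	for _, a := range assigned {
		ip, _, err := net.ParseCIDR(a)
		if err != nil {
			return false, err
		}
		if ip.Equal(vip) {
			return true, nil
		}
	}
	return false, nil
}
```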
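#### 1.2 VRRP Protocol Stack
- [ ] VRRP packet structure definitions (RFC 3768/5798)
- [ ] Sending and receiving VRRP packets over raw sockets
- [ ] Sending advertisements
- [ ] Receiving and parsing advertisements
- [ ] Authentication support (PASS type)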
#### 1.3 State Machine Implementation
- [ ] State definitions (INIT/BACKUP/MASTER/FAULT)
- [ ] State transition logic
- [ ] Master election algorithm
- [ ] Timer management (Advertisement Timer, Master Down Timer)
- [ ] Priority preemption mode
#### 1.4 ARP and Network Updates
- [ ] Sending gratuitous ARP
- [ ] Handling ARP replies
- [ ] ARP broadcasts for multiple VIPs
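#### 1.5 Integration and Testing
- [ ] VRRP instance manager
- [ ] Multi-instance support
- [ ] Basic functional tests
- [ ] Two-node VRRP failover tests
### Phase 2: Health Checking System (Second Priority)
#### 2.1 Health Check Framework
- [ ] Health checker interface definition
- [ ] Check result state management (rise/fall counters)
- [ ] Periodic scheduler
- [ ] Timeout control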
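The rise/fall counting can be sketched as a small hysteresis tracker matching the `rise` and `fall` fields of `HealthChecker` in the configuration. `RiseFall`, `NewRiseFall` and `Record` are hypothetical names. The state turns healthy only after `rise` consecutive successes, turns unhealthy only after `fall` consecutive failures, and any result that agrees with the current state resets the streak.

```go
package main

// RiseFall tracks the debounced health state of one checker.
type RiseFall struct {
	rise, fall int
	healthy    bool
	streak     int // consecutive results that disagree with the current state
}

// NewRiseFall returns a tracker starting in the given state.
func NewRiseFall(rise, fall int, startHealthy bool) *RiseFall {
	return &RiseFall{rise: rise, fall: fall, healthy: startHealthy}
}

// Record feeds one check result and returns the current state and whether
// this call changed it.
func (r *RiseFall) Record(ok bool) (healthy, changed bool) {
	if ok == r.healthy {
		r.streak = 0 // result agrees with the current state
		return r.healthy, false
	}
	r.streak++
	need := r.fall
	if ok {
		need = r.rise
	}
	if r.streak >= need {
		r.healthy = ok
		r.streak = 0
		return r.healthy, true
	}
	return r.healthy, false
}
```

A scheduler would call `Record` once per `interval` tick and act only when `changed` is true.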
#### 2.2 Checker Implementations
- [ ] TCP health checks
- [ ] HTTP/HTTPS health checks
- [ ] ICMP ping checks
- [ ] Script checks (running external commands)
- [ ] DNS checks
#### 2.3 VRRP Integration
- [ ] Track script support
- [ ] Lower priority when health checks fail
- [ ] Restore priority when checks recover
- [ ] Health check status drives the VRRP state machine
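One way to tie check results to VRRP priority is keepalived's weight convention, assumed here because the current `track_scripts` option only lists script names: a positive weight is added while the check passes, and a negative weight is added (lowering the priority) while it fails. The `Track` type and `EffectivePriority` function are hypothetical. The result is clamped to 1..254, assuming a non-owner instance, since RFC 3768 reserves 255 for the router that owns the virtual IP addresses.

```go
package main

// Track is a hypothetical tracked check with a keepalived-style weight.
type Track struct {
	Healthy bool
	Weight  int
}

// EffectivePriority applies each track's weight to base: positive weights
// count while the check is healthy, negative weights while it is failing.
// base is assumed to be a non-owner priority (1..254).
func EffectivePriority(base int, tracks []Track) int {
	p := base
	for _, t := range tracks {
		if (t.Weight > 0 && t.Healthy) || (t.Weight < 0 && !t.Healthy) {
			p += t.Weight
		}
	}
	if p < 1 {
		p = 1
	}
	if p > 254 {
		p = 254
	}
	return p
}
```

Recomputing this value whenever a tracked check changes state, and advertising it, lets a healthier BACKUP win the next election without any extra protocol machinery.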
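### Phase 3: Enhanced Features (Third Priority)
#### 3.1 Notifications and Scripts
- [ ] Run scripts on state changes (notify_master/backup/fault)
- [ ] Script executor (permission control, timeout control)
- [ ] Email notification support (SMTP)
- [ ] Webhook notifications
#### 3.2 Advanced Features
- [ ] Sync group support
- [ ] Virtual MAC address support
- [ ] Configuration hot reload improvements
- [ ] Process monitoring and automatic restart
#### 3.3 Observability
- [ ] State query API/CLI
- [ ] Metrics export (Prometheus format)
- [ ] Detailed event logging
- [ ] Enhanced debug mode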
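### Phase 4: Load Balancing (Optional, Low Priority)
- [ ] Research LVS integration
- [ ] IPVS operation wrappers
- [ ] Basic scheduling algorithms (rr, wrr)
- [ ] Backend server health checks
- [ ] Dynamic backend management
## Current Progress
- ✅ Phase 0 complete
- 🔄 Next step: Phase 1.1 Network Interfaces and IP Management
## Technology Choices
- Language: Go 1.21+
- Configuration format: YAML/TOML (compatible with the keepalived.conf style)
- Dependencies: prefer the standard library and minimize third-party dependencies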