Docusaurus Hosting Infrastructure
Introduction
The Ominis Cluster Manager documentation is hosted as a static Docusaurus site at https://docs.app.ominis.ai. This document provides a complete guide to the infrastructure, deployment process, and operational procedures for the documentation hosting system.
Purpose
Host the comprehensive Ominis Cluster Manager documentation with:
- High Availability: Multiple replicas with automated failover
- Automatic TLS: Certificate management via cert-manager
- Fast Performance: Nginx with aggressive caching and compression
- Easy Updates: Containerized deployment with version control
- Security: Best practices for static site hosting
Architecture Components
- Static Site: Docusaurus build output (HTML, CSS, JS)
- Web Server: Nginx Alpine serving static files
- Container: Docker image with nginx + static build
- Kubernetes: Deployment + Service + Ingress
- TLS: Automatic certificate via cert-manager
- Ingress: Traefik routing with Let's Encrypt
URL
Production: https://docs.app.ominis.ai
Architecture Overview
The documentation hosting system follows a standard Kubernetes pattern for static sites:
Runtime Architecture
At runtime, Traefik terminates TLS at the edge and routes requests to the docusaurus ClusterIP service, which load-balances across the Nginx replicas serving the static build.
Build Pipeline
Step 1: Build Docusaurus
The documentation is generated by the agentic system and then built into a static site:
# Generate documentation
cd /home/matt/projects/fml/cluster-manager/agentic-instructions
./run.sh
# Navigate to Docusaurus directory
cd output/docusaurus
# Install dependencies
npm install
# Build static site
npm run build
# Output directory: build/
# Contains: index.html, assets/, *.js, *.css
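Before containerizing, it helps to smoke-test the build output; a minimal sketch, assuming the commands above were run from output/docusaurus:
# Verify the build produced an entry point
test -f build/index.html || { echo "build missing index.html" >&2; exit 1; }
# Optionally preview the static build locally
npm run serve   # serves build/ at http://localhost:3000 by default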
Step 2: Create Docker Image
The Docker image packages Nginx with the static build output:
Dockerfile.docusaurus (create in the agentic-instructions/ directory; the build commands below reference it by this name):
FROM nginx:alpine
# Copy server configuration (a bare server block belongs in conf.d/,
# not /etc/nginx/nginx.conf, which must contain the full http context)
COPY nginx.conf /etc/nginx/conf.d/default.conf
# Copy Docusaurus build
COPY output/docusaurus/build/ /usr/share/nginx/html/
# Expose port
EXPOSE 80
# Health check
HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget --quiet --tries=1 --spider http://localhost/health || exit 1
# Note: the stock nginx:alpine entrypoint starts as root to bind port 80, then
# drops worker processes to the unprivileged nginx user. To run the whole
# container as non-root, switch to nginxinc/nginx-unprivileged and listen on 8080.
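Once the Dockerfile exists, the image can be built and smoke-tested locally before it is wired into the push workflow; a sketch (the docs-test container name and dev tag are arbitrary):
# Build and smoke-test the image locally
docker build -t docusaurus-docs:dev -f Dockerfile.docusaurus .
docker run --rm -d -p 8080:80 --name docs-test docusaurus-docs:dev
curl -fsS http://localhost:8080/health   # expect: healthy
docker rm -f docs-test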
Nginx Configuration (nginx.conf):
The nginx configuration is stored in the Helm chart values.yaml and includes:
server {
    listen 80;
    server_name _;
    root /usr/share/nginx/html;
    index index.html;
    # Gzip compression (nginx defaults to off; required for the
    # compression behavior described elsewhere in this document)
    gzip on;
    gzip_types text/plain text/css application/javascript application/json image/svg+xml;
    gzip_min_length 1024;
    # SPA routing - try files, fall back to index.html
    location / {
        try_files $uri $uri/ /index.html;
    }
    # Cache static assets aggressively
    location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "no-referrer-when-downgrade" always;
    # Health check endpoint
    location /health {
        access_log off;
        default_type text/plain;
        return 200 "healthy\n";
    }
    # Deny access to hidden files
    location ~ /\. {
        deny all;
    }
}
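The configuration can be validated without building an image by mounting it into a throwaway container; a sketch, assuming nginx.conf sits in the current directory:
# Syntax-check the server config in a disposable container
docker run --rm \
  -v "$PWD/nginx.conf:/etc/nginx/conf.d/default.conf:ro" \
  nginx:alpine nginx -t
# expect: syntax is ok / test is successful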
Step 3: Build and Push Image
# Build image
cd /home/matt/projects/fml/cluster-manager/agentic-instructions
docker build -t ghcr.io/ominis-ai/docusaurus:latest \
-f Dockerfile.docusaurus .
# Login to GitHub Container Registry
echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin
# Push image
docker push ghcr.io/ominis-ai/docusaurus:latest
# Tag specific version
docker tag ghcr.io/ominis-ai/docusaurus:latest \
ghcr.io/ominis-ai/docusaurus:v1.0.0
docker push ghcr.io/ominis-ai/docusaurus:v1.0.0
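To confirm a tag actually landed in the registry, docker manifest inspect works without pulling the image (a prior docker login is needed for private packages):
# Verify the pushed tag exists in ghcr.io
docker manifest inspect ghcr.io/ominis-ai/docusaurus:v1.0.0 > /dev/null \
  && echo "v1.0.0 present in ghcr.io"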
Kubernetes Deployment
Helm Chart Structure
The documentation is deployed using a Helm chart located at:
/home/matt/projects/fml/cluster-infra/helm-charts/docusaurus/
Chart Components
Chart.yaml - Chart metadata:
apiVersion: v2
name: docusaurus
description: Docusaurus documentation hosting for Ominis Cluster Manager
type: application
version: 1.0.0
appVersion: "1.0.0"
keywords:
  - docusaurus
  - documentation
  - nginx
  - ominis
maintainers:
  - name: Ominis.ai
    email: ops@ominis.ai
values.yaml - Configuration values:
- Image: ghcr.io/ominis-ai/docusaurus:latest
- Replicas: 2 (high availability)
- Resources: 50m CPU, 64Mi memory (minimal footprint)
- Ingress: Traefik with automatic TLS
- Nginx: Full configuration with caching and security headers
Templates:
- namespace.yaml: Creates the dedicated docusaurus namespace
- deployment.yaml: Nginx deployment with 2 replicas
- service.yaml: ClusterIP service on port 80
- ingress.yaml: Traefik ingress with TLS
- configmap.yaml: Nginx configuration
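Before installing, the chart can be rendered locally to inspect the manifests it will produce; a quick sketch:
# Render the chart without applying anything
helm template docusaurus helm-charts/docusaurus/ \
  --namespace docusaurus | less
# Or, before the first install, validate against the cluster without applying
helm install docusaurus helm-charts/docusaurus/ \
  --namespace docusaurus --dry-run --debug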
Deployment Commands
Install
# Navigate to cluster-infra
cd /home/matt/projects/fml/cluster-infra
# Install Helm chart
helm install docusaurus helm-charts/docusaurus/ \
--namespace docusaurus \
--create-namespace
# Verify deployment
kubectl get pods -n docusaurus
kubectl get svc -n docusaurus
kubectl get ingress -n docusaurus
# Check certificate
kubectl get certificate -n docusaurus
# View logs
kubectl logs -n docusaurus -l app=docusaurus
Expected Output
NAME                          READY   STATUS    RESTARTS   AGE
docusaurus-7b8f5d9c6f-abc12   1/1     Running   0          1m
docusaurus-7b8f5d9c6f-xyz89   1/1     Running   0          1m
Upgrade Deployment
# After building new image
docker push ghcr.io/ominis-ai/docusaurus:v1.0.1
# Update chart values
helm upgrade docusaurus helm-charts/docusaurus/ \
--namespace docusaurus \
--set image.tag=v1.0.1
# Rollout status
kubectl rollout status deployment/docusaurus -n docusaurus
# Verify new version
kubectl get pods -n docusaurus -o jsonpath='{.items[0].spec.containers[0].image}'
Rollback Deployment
# View release history
helm history docusaurus -n docusaurus
# Rollback to previous version
helm rollback docusaurus -n docusaurus
# Rollback to specific revision
helm rollback docusaurus 3 -n docusaurus
# Verify rollback
kubectl rollout status deployment/docusaurus -n docusaurus
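To see which image a given revision deployed before rolling back to it, the per-revision values can be inspected (Helm 3 syntax):
# Values recorded for revision 3 (includes image.tag if it was set via --set)
helm get values docusaurus -n docusaurus --revision 3
# Image currently running
kubectl get deploy docusaurus -n docusaurus \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'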
ADR: Why Nginx Over Other Static Servers?
Context
Multiple options exist for serving static files in Kubernetes. We needed to choose a web server that balances performance, security, and operational simplicity for hosting our Docusaurus documentation.
Decision
Use Nginx Alpine for serving Docusaurus static build
Alternatives Considered
- Apache HTTP Server: Traditional, feature-rich HTTP server
- Caddy: Modern web server with automatic HTTPS
- Node.js http-server: Simple JavaScript-based static server
- Nginx Alpine (chosen): Lightweight, high-performance web server
Comparison Matrix
| Feature | Nginx | Caddy | Apache | http-server |
|---|---|---|---|---|
| Image Size | 40MB | 50MB | 200MB | 100MB |
| Performance | Excellent | Very Good | Good | Fair |
| Config Complexity | Moderate | Simple | Complex | Very Simple |
| Memory Usage | Low | Low | High | Medium |
| Static Files | Excellent | Excellent | Good | Good |
| SPA Routing | Yes | Yes | Yes | Limited |
| Production Ready | Yes | Yes | Yes | No |
Why Nginx?
Pros:
- Performance: Industry-leading static file serving performance
- Small Image: nginx:alpine is ~40MB vs Apache's ~200MB
- Battle-Tested: Proven in production at massive scale
- Flexible Configuration: Powerful nginx.conf syntax
- Caching: Built-in cache control headers and directives
- Health Checks: Easy to configure liveness/readiness probes
- Wide Support: Extensive documentation and community knowledge
- Security: Regular updates, minimal attack surface with Alpine base
- Compression: Built-in gzip support
Cons:
- Configuration syntax less friendly than Caddy's
- No automatic HTTPS (handled by cert-manager/Traefik instead)
- Requires more initial configuration
Why Not Alternatives?
Apache:
- ❌ 5x larger image size (200MB vs 40MB)
- ❌ Higher memory footprint
- ❌ More complex configuration for simple static hosting
- ✅ More features than needed for static files
Caddy:
- ✅ Simpler configuration (Caddyfile)
- ✅ Automatic HTTPS (not needed with cert-manager)
- ❌ Less widespread adoption
- ❌ Slightly larger image
http-server:
- ❌ Not production-ready
- ❌ Limited configuration options
- ❌ No health check endpoints
- ❌ Poor performance under load
Consequences
Accepted:
- Nginx configuration stored in Helm chart ConfigMap
- Alpine base for minimal attack surface (40MB image)
- Manual configuration (tradeoff for control and flexibility)
- TLS termination handled by Traefik (not nginx)
Benefits:
- Excellent performance for documentation site
- Minimal resource usage (50m CPU, 64Mi memory)
- Production-grade reliability
- Well-understood operational model
Future Considerations
- CDN Integration: Consider Cloudflare in front of Nginx for global distribution
- HTTP/3: Nginx now supports QUIC/HTTP3 if needed
- Rate Limiting: Can add nginx rate limiting if needed
Ingress Configuration
Pattern
The ingress follows the cluster-infra standard pattern using Traefik with cert-manager for automatic TLS:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: docs-ingress
  namespace: docusaurus
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - docs.app.ominis.ai
      secretName: docs-tls
  rules:
    - host: docs.app.ominis.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: docusaurus
                port:
                  number: 80
How It Works
1. Ingress Creation: Ingress resource deployed by Helm
2. Traefik Discovery: Traefik controller discovers the new ingress
3. Cert-Manager: Sees the cert-manager.io/cluster-issuer annotation
4. Certificate Request: Cert-manager requests a certificate from Let's Encrypt
5. ACME Challenge: HTTP-01 challenge completed automatically
6. Certificate Storage: Certificate stored in the docs-tls Kubernetes Secret
7. TLS Termination: Traefik serves HTTPS traffic using the certificate
8. Traffic Routing: Traefik routes requests to the docusaurus service
Verifying Ingress
# Check ingress resource
kubectl get ingress -n docusaurus
kubectl describe ingress docs-ingress -n docusaurus
# Check certificate status
kubectl get certificate -n docusaurus
kubectl describe certificate docs-tls -n docusaurus
# Check TLS secret
kubectl get secret docs-tls -n docusaurus
# Test HTTPS
curl -I https://docs.app.ominis.ai
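Beyond the HTTP check, the served certificate's issuer and expiry can be inspected directly with openssl:
# Show issuer and validity window of the live certificate
echo | openssl s_client -connect docs.app.ominis.ai:443 \
  -servername docs.app.ominis.ai 2>/dev/null \
  | openssl x509 -noout -issuer -dates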
CI/CD Integration
Current Manual Workflow
# 1. Generate documentation
cd /home/matt/projects/fml/cluster-manager/agentic-instructions
./run.sh
# 2. Build Docusaurus
cd output/docusaurus
npm run build
# 3. Build Docker image
cd ../..
docker build -t ghcr.io/ominis-ai/docusaurus:latest \
-f Dockerfile.docusaurus .
# 4. Push to registry
docker push ghcr.io/ominis-ai/docusaurus:latest
# 5. Deploy to Kubernetes
cd /home/matt/projects/fml/cluster-infra
helm upgrade docusaurus helm-charts/docusaurus/ \
--install \
--namespace docusaurus
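These steps fold naturally into a single helper script; a hypothetical deploy-docs.sh sketch (the script name and commit-SHA tagging scheme are illustrative, the paths come from the workflow above):
#!/usr/bin/env bash
set -euo pipefail
TAG="${1:-$(git rev-parse --short HEAD)}"   # tag defaults to the current commit
cd /home/matt/projects/fml/cluster-manager/agentic-instructions
./run.sh
(cd output/docusaurus && npm install && npm run build)
docker build -t ghcr.io/ominis-ai/docusaurus:"$TAG" -f Dockerfile.docusaurus .
docker push ghcr.io/ominis-ai/docusaurus:"$TAG"
cd /home/matt/projects/fml/cluster-infra
helm upgrade docusaurus helm-charts/docusaurus/ \
  --install --namespace docusaurus --set image.tag="$TAG"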
Future GitHub Actions Workflow
name: Deploy Documentation
on:
  push:
    branches: [main]
    paths:
      - 'agentic-instructions/**'
      - 'docs/**'
  workflow_dispatch:
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
      - name: Generate documentation
        run: |
          cd agentic-instructions
          ./run.sh
      - name: Build Docusaurus
        run: |
          cd agentic-instructions/output/docusaurus
          npm install
          npm run build
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build Docker Image
        run: |
          docker build -t ghcr.io/ominis-ai/docusaurus:${{ github.sha }} \
            -t ghcr.io/ominis-ai/docusaurus:latest \
            -f Dockerfile.docusaurus .
      - name: Push to GHCR
        run: |
          docker push ghcr.io/ominis-ai/docusaurus:${{ github.sha }}
          docker push ghcr.io/ominis-ai/docusaurus:latest
      - name: Setup Kubernetes
        uses: azure/setup-kubectl@v3
      - name: Deploy to Kubernetes
        run: |
          helm upgrade docusaurus helm-charts/docusaurus/ \
            --install \
            --namespace docusaurus \
            --set image.tag=${{ github.sha }}
        env:
          KUBECONFIG: ${{ secrets.KUBECONFIG }}
Automation Benefits
- Automatic Deployment: Push to main triggers deployment
- Version Tracking: Git SHA tags for every build
- Rollback Support: Can deploy any previous SHA
- Consistency: Same build process every time
- Audit Trail: GitHub Actions logs
Monitoring and Operations
Health Checks
Kubernetes Probes
Liveness Probe:
livenessProbe:
  httpGet:
    path: /health
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
Readiness Probe:
readinessProbe:
  httpGet:
    path: /health
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
Health Endpoint:
# From outside cluster
curl https://docs.app.ominis.ai/health
# From within cluster
kubectl run test-curl --rm -it --image=curlimages/curl -- \
curl http://docusaurus.docusaurus.svc.cluster.local/health
Monitoring Commands
# Check pod status
kubectl get pods -n docusaurus
# View detailed pod info
kubectl describe pod -n docusaurus -l app=docusaurus
# View logs (all pods)
kubectl logs -n docusaurus -l app=docusaurus --tail=100 -f
# View logs (specific pod)
kubectl logs -n docusaurus docusaurus-7b8f5d9c6f-abc12
# Check ingress status
kubectl describe ingress docs-ingress -n docusaurus
# Verify certificate
kubectl get certificate docs-tls -n docusaurus
kubectl describe certificate docs-tls -n docusaurus
# Check service endpoints
kubectl get endpoints docusaurus -n docusaurus
# Resource usage
kubectl top pods -n docusaurus
Metrics
Key metrics to monitor:
- Request Count: Total requests via Traefik dashboard
- Response Times: P50, P95, P99 latencies
- Error Rates: 4xx and 5xx responses
- Certificate Expiry: Days until cert renewal needed
- Pod Health: Liveness/readiness probe failures
- Resource Usage: CPU and memory consumption
Accessing Metrics
# Traefik dashboard
kubectl port-forward -n flow-proxy svc/traefik 9000:9000
# Visit: http://localhost:9000/dashboard/
# Prometheus metrics (if configured)
kubectl port-forward -n monitoring svc/prometheus 9090:9090
# Query: rate(traefik_service_requests_total{service="docusaurus"}[5m])
Troubleshooting
Issue: Site Not Accessible
Symptoms: Cannot reach docs.app.ominis.ai
Diagnosis:
# 1. Check pods
kubectl get pods -n docusaurus
# Look for: Running status, READY 1/1
# 2. Check pod logs
kubectl logs -n docusaurus -l app=docusaurus
# Look for: Nginx startup messages, no errors
# 3. Check service
kubectl get svc -n docusaurus
kubectl describe svc docusaurus -n docusaurus
# Verify: Endpoints are populated
# 4. Check endpoints
kubectl get endpoints docusaurus -n docusaurus
# Should show: Pod IPs and port 80
# 5. Test service internally
kubectl run test-curl --rm -it --image=curlimages/curl -- \
curl -I http://docusaurus.docusaurus.svc.cluster.local
# Should return: 200 OK
# 6. Check ingress
kubectl get ingress -n docusaurus
kubectl describe ingress docs-ingress -n docusaurus
# Verify: Address populated, rules correct
Common Fixes:
- Pod not running: Check image pull, resource limits
- Service not routing: Verify label selectors match
- Ingress not working: Check Traefik controller logs
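When the pods and service look healthy but the public URL still fails, port-forwarding straight to the service isolates the ingress layer:
# Bypass the ingress entirely
kubectl port-forward -n docusaurus svc/docusaurus 8080:80 &
curl -I http://localhost:8080/   # 200 here but not at the public URL => ingress/DNS problem
kill %1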
Issue: Certificate Not Issued
Symptoms: HTTPS not working, certificate errors
Diagnosis:
# 1. Check certificate status
kubectl get certificate -n docusaurus
kubectl describe certificate docs-tls -n docusaurus
# Look for: Ready=True, or error messages
# 2. Check certificate request
kubectl get certificaterequest -n docusaurus
kubectl describe certificaterequest -n docusaurus
# 3. Check challenges
kubectl get challenges -n docusaurus
kubectl describe challenge -n docusaurus
# Look for: ACME challenge status
# 4. Check cert-manager logs
kubectl logs -n cert-manager -l app=cert-manager --tail=100
# Look for: Errors related to docusaurus
# 5. Verify DNS
dig docs.app.ominis.ai
# Should resolve to: Cluster ingress IP
Common Fixes:
- DNS not configured: Update DNS A record
- ClusterIssuer not found: Install cert-manager
- Rate limit: Wait or use staging issuer
- ACME challenge fails: Check firewall, ingress routing
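If Let's Encrypt rate limits are the problem, the ingress can be pointed at a staging issuer while debugging (this assumes a letsencrypt-staging ClusterIssuer exists in the cluster):
# Switch the ingress to the staging issuer
kubectl annotate ingress docs-ingress -n docusaurus \
  cert-manager.io/cluster-issuer=letsencrypt-staging --overwrite
# Delete the old secret so cert-manager re-issues against staging
kubectl delete secret docs-tls -n docusaurus
# Switch back to letsencrypt-prod the same way once the issue is resolved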
Issue: 404 Errors
Symptoms: Some pages return 404
Diagnosis:
# 1. Exec into pod
kubectl exec -it -n docusaurus deployment/docusaurus -- sh
# 2. Check files exist
ls -la /usr/share/nginx/html/
# Should see: index.html, assets/, docs/, api/, etc.
# 3. Check nginx config
cat /etc/nginx/nginx.conf
# Verify: try_files directive correct
# 4. Test nginx config
nginx -t
# Should return: test is successful
# 5. Check specific path
ls /usr/share/nginx/html/infrastructure/docs-hosting/
Common Fixes:
- Files missing: Rebuild Docker image with correct paths
- Nginx config wrong: Update ConfigMap, restart pods
- SPA routing: Ensure try_files $uri $uri/ /index.html is present in the nginx config
Issue: Slow Load Times
Symptoms: Pages load slowly
Diagnosis:
# 1. Check resource usage
kubectl top pods -n docusaurus
# Compare: Against resource limits
# 2. Check if caching works
curl -I https://docs.app.ominis.ai/assets/css/styles.css
# Should see: Cache-Control: public, immutable
# Expires: (1 year in future)
# 3. Check compression
curl -I https://docs.app.ominis.ai/ -H "Accept-Encoding: gzip"
# Should see: Content-Encoding: gzip
# 4. Test from different locations
curl -w "@curl-format.txt" -o /dev/null -s https://docs.app.ominis.ai/
# Measure: time_connect, time_starttransfer, time_total
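The curl-format.txt template referenced above is not shipped with curl; a typical version, written here via a heredoc (the variables are standard curl --write-out fields):
# Create the timing template used by the diagnosis command above
cat > curl-format.txt <<'EOF'
time_namelookup:    %{time_namelookup}s\n
time_connect:       %{time_connect}s\n
time_appconnect:    %{time_appconnect}s\n
time_starttransfer: %{time_starttransfer}s\n
time_total:         %{time_total}s\n
EOF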
Common Fixes:
- Resource limits too low: Increase CPU/memory
- Caching not working: Fix nginx cache headers
- No compression: Enable gzip in nginx config
- Network issues: Check Traefik routing, add CDN
Issue: Pod Restarts
Symptoms: Pods restarting frequently
Diagnosis:
# 1. Check restart count
kubectl get pods -n docusaurus
# Look at: RESTARTS column
# 2. Check events
kubectl describe pod -n docusaurus -l app=docusaurus
# Look for: Recent events, OOMKilled, CrashLoopBackOff
# 3. Check previous logs
kubectl logs -n docusaurus -l app=docusaurus --previous
# 4. Check resource usage
kubectl top pods -n docusaurus
# Compare: Against limits (OOMKilled if exceeded)
# 5. Check liveness probe
# In pod describe output, check probe failures
Common Fixes:
- OOMKilled: Increase memory limits
- Liveness probe failing: Adjust probe settings
- Image pull errors: Check registry authentication
- Config error: Check nginx.conf syntax
Backup and Updates
Backup Strategy
Documentation Source:
- ✅ Version controlled in Git: All markdown files
- ✅ Agentic system can regenerate: Run ./run.sh anytime
- ✅ No manual backup needed: Source of truth is code
Docker Images:
- ✅ Tagged in ghcr.io: Every build pushed with SHA tag
- ✅ Can rollback to any version: Use helm rollback
- ✅ Retention: Keep last 10 versions minimum
Kubernetes State:
- ✅ Helm manages state: Can reinstall from chart
- ✅ No persistent data: Stateless application
- ✅ Configuration in values.yaml: Version controlled
Update Procedure
Minor Documentation Updates
For small documentation changes:
# 1. Edit markdown files
cd /home/matt/projects/fml/cluster-manager/docs/
# 2. Regenerate specific section
cd ../agentic-instructions
./run-one.sh 05 # Example: Update specific agent
# 3. Rebuild Docusaurus
cd output/docusaurus
npm run build
# 4. Build and push image
cd ../..   # back to agentic-instructions/, where Dockerfile.docusaurus lives
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
docker build -t ghcr.io/ominis-ai/docusaurus:$TIMESTAMP \
-t ghcr.io/ominis-ai/docusaurus:latest \
-f Dockerfile.docusaurus .
docker push ghcr.io/ominis-ai/docusaurus:$TIMESTAMP
docker push ghcr.io/ominis-ai/docusaurus:latest
# 5. Update deployment
cd /home/matt/projects/fml/cluster-infra
helm upgrade docusaurus helm-charts/docusaurus/ \
--namespace docusaurus \
--set image.tag=$TIMESTAMP
# 6. Verify deployment
kubectl rollout status deployment/docusaurus -n docusaurus
kubectl get pods -n docusaurus
Major Updates (New Docusaurus Version)
For Docusaurus framework updates:
# 1. Update package.json
cd agentic-instructions/output/docusaurus
npm update @docusaurus/core @docusaurus/preset-classic
# 2. Test build locally
npm run build
npm run serve # Test at http://localhost:3000
# 3. If successful, follow update procedure above
# 4. If issues, rollback
helm rollback docusaurus -n docusaurus
Emergency Rollback
# Quick rollback to previous version
helm rollback docusaurus -n docusaurus
# Or specific version
helm history docusaurus -n docusaurus
helm rollback docusaurus 3 -n docusaurus
# Verify
kubectl rollout status deployment/docusaurus -n docusaurus
Security Considerations
Container Security
Base Image: