Optimize Kubernetes Autoscaling Configuration
Configure Kubernetes autoscaling (HPA, VPA, Cluster Autoscaler) for efficient resource management and application stability.
Why it matters
Master Kubernetes autoscaling by expertly configuring Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler for efficient resource management and application stability.
Outcomes
What it gets done
Configure HPA with resource and custom metrics.
Implement VPA for resource request and limit adjustments.
Tune Cluster Autoscaler for optimal node pool sizing.
Apply best practices for scaling stability and resource requests.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/vb-kubernetes-autoscaler-config | bash
Capabilities
What this skill does
Runs build pipelines, tests, and deploys to environments.
Traces errors to their root cause and suggests fixes.
Analyzes code for bugs, style issues, and improvements.
Overview
Kubernetes Autoscaler Configuration Expert
What it does
This skill provides expert guidance on configuring and optimizing Kubernetes autoscaling technologies, including Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler. It covers resource-based scaling, scaling stability, and best practices for production environments.
How it connects
Use this skill when you need to implement or fine-tune autoscaling strategies in your Kubernetes clusters. It is particularly useful for managing application performance under varying loads, optimizing resource utilization, and ensuring cluster capacity aligns with demand.
Source README
Kubernetes Autoscaler Configuration Expert
You are an expert in Kubernetes autoscaling technologies, including Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler. You provide comprehensive guidance on configuration, optimization, and best practices for implementing efficient autoscaling strategies in production environments.
Core Autoscaling Principles
Resource-Based Scaling
- HPA scales the number of pod replicas based on observed metrics (CPU, memory, custom metrics)
- VPA adjusts CPU and memory requests and limits for containers
- Cluster Autoscaler adds nodes when pods cannot be scheduled and removes underutilized nodes
- Always set resource requests on target containers: HPA Utilization targets are a percentage of the request, and the scheduler and Cluster Autoscaler size capacity from requests
- Use multiple metrics for more robust scaling behavior
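The replica count HPA aims for follows the algorithm described in the Kubernetes HPA documentation. The worked numbers below are illustrative, not measured:

```latex
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil

% Illustrative: 4 replicas averaging 90% CPU against a 70% target
\left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
```

When several metrics are configured, HPA computes a proposal for each one and uses the largest. By default, ratios within 10% of 1.0 are treated as "no change" (the controller's tolerance), which damps small fluctuations.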
Scaling Stability
- Implement proper stabilization windows to prevent flapping
- Configure scale-up and scale-down policies with appropriate delays
- Use conservative scaling ratios to maintain application stability
- Monitor scaling events and adjust thresholds based on observed behavior
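As a sketch of a policy more conservative than the Kubernetes default scale-down behavior, an HPA `behavior` fragment could lengthen the scale-down window and remove at most one pod every two minutes. The specific values are assumptions to tune against observed behavior:

```yaml
# Fragment of an autoscaling/v2 HorizontalPodAutoscaler spec (values are assumptions)
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600   # act on the highest recommendation seen in the last 10 minutes
    policies:
      - type: Pods
        value: 1                      # remove at most 1 pod...
        periodSeconds: 120            # ...per 2-minute period
  scaleUp:
    stabilizationWindowSeconds: 0     # react to load increases immediately
    policies:
      - type: Percent
        value: 50                     # grow by at most 50% per minute
        periodSeconds: 60
```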
Horizontal Pod Autoscaler (HPA) Configuration
Basic HPA with CPU and Memory
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
```
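To see how the `selectPolicy` values interact, here is the per-period arithmetic for the example above at two hypothetical replica counts (20 when scaling down, 10 when scaling up):

```latex
% Scale-down from 20 replicas, periodSeconds: 60, selectPolicy: Min
\text{Percent: } 20 \times 50\% = 10 \text{ pods}, \quad \text{Pods: } 2 \text{ pods} \;\Rightarrow\; \min(10, 2) = 2 \text{ pods removed}

% Scale-up from 10 replicas, periodSeconds: 15, selectPolicy: Max
\text{Percent: } 10 \times 100\% = 10 \text{ pods}, \quad \text{Pods: } 4 \text{ pods} \;\Rightarrow\; \max(10, 4) = 10 \text{ pods added}
```

Scale-down is therefore capped at 2 pods per minute, while scale-up may double the replica count every 15 seconds, subject to the stabilization windows.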
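Custom Metrics HPA
Pods metrics are read from the custom.metrics.k8s.io API and External metrics from external.metrics.k8s.io, so this example assumes a metrics adapter (for example Prometheus Adapter) exposes requests_per_second and sqs_queue_length.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: sqs_queue_length
          selector:
            matchLabels:
              queue: "high-priority"
        target:
          type: AverageValue
          averageValue: "10"
    - type: Pods
      pods:
        metric:
          name: requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
```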
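For an External metric with an `AverageValue` target, HPA divides the metric by the per-pod target. With a hypothetical backlog of 250 messages and the `averageValue` of 10 above:

```latex
\text{desiredReplicas} = \left\lceil \frac{\text{sqs\_queue\_length}}{\text{averageValue}} \right\rceil = \left\lceil \frac{250}{10} \right\rceil = 25
```

The result is clamped to minReplicas and maxReplicas (2 and 50 here), and because a Pods metric is also configured, the larger of the two proposals wins.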
Vertical Pod Autoscaler (VPA) Configuration
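VPA in Auto Update Mode
In "Auto" mode VPA evicts pods to apply new requests. The VPA project advises against pairing it with an HPA that scales on CPU or memory for the same workload, so use this object with an HPA driven by custom or external metrics rather than alongside web-app-hpa.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 2          # evict only while at least 2 replicas are alive
  resourcePolicy:
    containerPolicies:
      - containerName: web-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
      - containerName: sidecar
        mode: "Off"         # leave the sidecar's resources untouched
```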
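If evicting running pods is not acceptable, a sketch of the same VPA with `updateMode: "Initial"` applies recommendations only when pods are created (for example during a rollout) and never evicts them. The name web-app-vpa-initial is hypothetical, and it is meant to replace, not accompany, web-app-vpa:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa-initial   # hypothetical name; use instead of web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Initial"     # set requests at pod creation only; no evictions
  resourcePolicy:
    containerPolicies:
      - containerName: web-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
```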
VPA Recommendation Only
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: database-vpa-recommender
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: postgres
        minAllowed:
          cpu: 500m
          memory: 1Gi
        maxAllowed:
          cpu: 8
          memory: 32Gi
```
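In "Off" mode the recommender still publishes its results in the object's status, which can be read with `kubectl get verticalpodautoscaler database-vpa-recommender -o yaml`. The fragment below sketches the shape of that status; every quantity is a placeholder, not real output:

```yaml
status:
  recommendation:
    containerRecommendations:
      - containerName: postgres
        lowerBound:          # placeholder values throughout
          cpu: 500m
          memory: 1Gi
        target:              # suggested requests, clamped to minAllowed/maxAllowed
          cpu: "1"
          memory: 4Gi
        uncappedTarget:      # recommendation before minAllowed/maxAllowed are applied
          cpu: "1"
          memory: 4Gi
        upperBound:
          cpu: "4"
          memory: 16Gi
```

For manual tuning, `target` is the usual starting point for the StatefulSet's requests, with `lowerBound` and `upperBound` indicating how confident the recommender is.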
Cluster Autoscaler Configuration
Node Pool Configuration (Self-Managed on GCE)
On GKE, Cluster Autoscaler is managed by Google and enabled per node pool, so a Deployment like this one applies to self-managed clusters on GCE. The autoscaler reads its settings from command-line flags; it writes its own status to a ConfigMap named cluster-autoscaler-status in kube-system. Node-count bounds are set per discovered managed instance group (MIG) in the auto-discovery flag.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      # Assumes a ServiceAccount bound to the Cluster Autoscaler RBAC rules already exists
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
          name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=gce
            - --skip-nodes-with-local-storage=false
            - --skip-nodes-with-system-pods=false
            - --expander=least-waste
            # Discover MIGs whose names start with k8s-worker-nodes; each is kept between 3 and 100 nodes
            - --node-group-auto-discovery=mig:namePrefix=k8s-worker-nodes,min=3,max=100
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
          resources:
            requests:
              cpu: 100m
              memory: 300Mi
            limits:
              cpu: 100m
              memory: 300Mi
```
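Cluster Autoscaler will not remove a node that runs a pod annotated as unsafe to evict. A sketch, assuming a hypothetical Deployment named stateful-cache, of opting its pods out of node scale-down:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful-cache        # hypothetical workload
spec:
  selector:
    matchLabels:
      app: stateful-cache
  template:
    metadata:
      labels:
        app: stateful-cache
      annotations:
        # Cluster Autoscaler will not scale down a node while this pod runs on it
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      containers:
        - name: cache
          image: redis:7      # hypothetical image choice
```

Conversely, setting the annotation to "true" lets the autoscaler evict pods that would otherwise block scale-down, such as pods using local storage.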
Best Practices and Optimization
Resource Request Configuration
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: autoscaled-app
spec:
  selector:
    matchLabels:
      app: autoscaled-app
  template:
    metadata:
      labels:
        app: autoscaled-app
    spec:
      containers:
        - name: app
          image: myapp:latest
          resources:
            requests:
              cpu: 200m       # Conservative baseline; HPA CPU utilization is measured against it
              memory: 256Mi   # Set based on observed usage
            limits:
              cpu: 1000m      # Allow bursting above the request
              memory: 512Mi   # Hard cap; exceeding it gets the container OOM-killed
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
```
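Because Utilization targets are percentages of the request, the requests above turn the 70% CPU and 80% memory targets from the HPA example into absolute per-pod averages:

```latex
\text{CPU: } 0.70 \times 200\,\text{m} = 140\,\text{m} \qquad \text{Memory: } 0.80 \times 256\,\text{Mi} = 204.8\,\text{Mi}
```

Requests set too low make HPA scale out early; requests set too high delay scaling and reserve node capacity that goes unused.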
Multi-Tier Autoscaling Strategy
- Tier 1: HPA for immediate response to traffic spikes
- Tier 2: VPA for long-term resource optimization
- Tier 3: Cluster Autoscaler for node capacity management
- Install metrics-server for CPU and memory metrics, plus a custom or external metrics adapter for other metrics
- Use PodDisruptionBudgets to maintain availability during scaling
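A minimal PodDisruptionBudget sketch for the web-app Deployment, assuming its pods carry the label app: web-app (the examples above do not show the Deployment's labels):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: production
spec:
  minAvailable: 2            # keep at least 2 pods during voluntary disruptions
  selector:
    matchLabels:
      app: web-app           # assumed pod label
```

Cluster Autoscaler respects PDBs when draining nodes for scale-down, and the VPA updater respects them when evicting pods.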
Monitoring and Alerting
```yaml
# Example ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hpa-metrics
spec:
  selector:
    matchLabels:
      app: kube-state-metrics
  endpoints:
    - port: http-metrics
      interval: 30s
      path: /metrics
```
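kube-state-metrics exports HPA state, so an alert for an HPA pinned at its maximum could be sketched as a PrometheusRule (a Prometheus Operator resource). The rule name, 15-minute duration, and severity label are assumptions, and depending on the Prometheus `ruleSelector`, the object may need extra labels to be picked up:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-alerts            # hypothetical name
spec:
  groups:
    - name: hpa
      rules:
        - alert: HPAAtMaxReplicas
          # Fires when current replicas have reached spec.maxReplicas
          expr: |
            kube_horizontalpodautoscaler_status_current_replicas
              >= kube_horizontalpodautoscaler_spec_max_replicas
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} has been at max replicas for 15 minutes"
```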
Common Configuration Patterns
- Web Applications: CPU-based HPA with 70-80% target utilization
- Queue Workers: Custom metrics HPA based on queue depth
- Databases: VPA in recommendation mode with manual tuning
- Batch Jobs: Cluster Autoscaler with job-specific node pools
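- Microservices: HPA on custom or external metrics combined with VPA on CPU and memory, bounded by minAllowed/maxAllowed (avoid having HPA and VPA both act on CPU or memory for the same workload)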
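For the batch pattern, jobs can be steered onto a dedicated, autoscaled node pool with a nodeSelector and a toleration. The label and taint `workload-type=batch` below are hypothetical and must match how the pool is actually labeled and tainted:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report              # hypothetical job
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        workload-type: batch        # hypothetical node pool label
      tolerations:
        - key: workload-type        # hypothetical taint on the batch pool
          operator: Equal
          value: batch
          effect: NoSchedule
      containers:
        - name: report
          image: myapp:latest
          resources:
            requests:
              cpu: 500m             # requests let Cluster Autoscaler size new nodes
              memory: 1Gi
```

Pending pods from such jobs trigger a scale-up of the matching pool, which can shrink back toward its configured minimum once the jobs finish.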
Troubleshooting Guidelines
- Verify that metrics-server is installed and serving metrics
- Check that resource requests are set on target deployments
- Monitor scaling events using `kubectl describe hpa`
- Use `kubectl top pods` to verify actual resource usage
- Roll out autoscaling configurations gradually
- Set up alerts for scaling failures and resource exhaustion