Skill

Optimize Kubernetes Autoscaling Configuration

Name: Optimize Kubernetes Autoscaling Configuration
Availability: OnlineOnly
Author: VibeBaza

Configure Kubernetes autoscaling (HPA, VPA, Cluster Autoscaler) for efficient resource management and application stability.

Works with kubernetes, gke, prometheus

Updated 6 months ago

Version 1.0.0

Models

claude

Why it matters

Master Kubernetes autoscaling by expertly configuring Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler for efficient resource management and application stability.

Outcomes

What it gets done

Configure HPA with resource and custom metrics.

Implement VPA for resource request and limit adjustments.

Tune Cluster Autoscaler for optimal node pool sizing.

Apply best practices for scaling stability and resource requests.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/vb-kubernetes-autoscaler-config | bash

Capabilities

What this skill does

Deploy / CI

Runs build pipelines, tests, and deploys to environments.

Debug

Traces errors to their root cause and suggests fixes.

Review code

Analyzes code for bugs, style issues, and improvements.

Overview

Kubernetes Autoscaler Configuration Expert

What it does

This skill provides expert guidance on configuring and optimizing Kubernetes autoscaling technologies, including Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler. It covers resource-based scaling, scaling stability, and best practices for production environments.

How it connects

Use this skill when you need to implement or fine-tune autoscaling strategies in your Kubernetes clusters. It is particularly useful for managing application performance under varying loads, optimizing resource utilization, and ensuring cluster capacity aligns with demand.

Source README

Kubernetes Autoscaler Configuration Expert

You are an expert in Kubernetes autoscaling technologies, including Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler. You provide comprehensive guidance on configuration, optimization, and best practices for implementing efficient autoscaling strategies in production environments.

Core Autoscaling Principles

Resource-Based Scaling

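HPA changes the replica count based on observed metrics (CPU, memory, custom and external metrics)
VPA adjusts the CPU and memory requests (and, optionally, limits) of containers
Cluster Autoscaler adds nodes when pods are unschedulable for lack of capacity and removes nodes that stay underutilized
Always set resource requests: HPA utilization targets are computed as a percentage of the request, and the scheduler and Cluster Autoscaler size capacity from requests
Use multiple metrics for more robust scaling; HPA computes a replica count per metric and applies the largest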

Scaling Stability

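Set stabilization windows so that short-lived metric spikes or dips do not cause replica flapping
Limit how fast replicas are added or removed with scale-up and scale-down policies (pods or percent per period)
Scale down more conservatively than you scale up, since removing capacity too early costs more than holding it a little longer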
Monitor scaling events and adjust thresholds based on observed behavior

Horizontal Pod Autoscaler (HPA) Configuration

Basic HPA with CPU and Memory

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
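  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # act on the highest recommendation from the last 5 minutes
      policies:
      - type: Percent
        value: 50                       # remove at most 50% of current replicas per period
        periodSeconds: 60
      - type: Pods
        value: 2                        # or at most 2 pods per period
        periodSeconds: 60
      selectPolicy: Min                 # apply whichever policy permits the smaller change
    scaleUp:
      stabilizationWindowSeconds: 60    # act on the lowest recommendation from the last minute
      policies:
      - type: Percent
        value: 100                      # at most double the replica count per period
        periodSeconds: 15
      - type: Pods
        value: 4                        # or add at most 4 pods per period
        periodSeconds: 15
      selectPolicy: Max                 # apply whichever policy permits the larger change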
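
As a worked example of how the 70% CPU target above turns into a replica count, the rule documented for the HPA controller is shown below. The input numbers (4 running replicas at 90% average CPU utilization) are illustrative assumptions, not measurements.

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
    \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
\]
```

The result is then clamped to minReplicas and maxReplicas and rate-limited by the behavior policies. When several metrics are configured, the controller evaluates each one and uses the largest replica count. By default it also skips changes when the ratio is within about 10% of 1.0.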

Custom Metrics HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_length
        selector:
          matchLabels:
            queue: "high-priority"
      target:
        type: AverageValue
        averageValue: "10"
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

Vertical Pod Autoscaler (VPA) Configuration

VPA with Update Mode

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
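  updatePolicy:
    # "Auto" evicts pods to apply new requests. Do not combine it with an HPA
    # that scales the same workload on CPU or memory utilization (such as
    # web-app-hpa above): the two controllers would work against each other.
    updateMode: "Auto"
    minReplicas: 2   # the updater evicts pods only while at least this many replicas are live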
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
    - containerName: sidecar
      mode: "Off"

VPA Recommendation Only

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: database-vpa-recommender
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: postgres
      minAllowed:
        cpu: 500m
        memory: 1Gi
      maxAllowed:
        cpu: 8
        memory: 32Gi

Cluster Autoscaler Configuration

Node Pool Configuration (GKE Example)

# On GKE the Cluster Autoscaler is managed by the platform and enabled per
# node pool, so nothing is deployed by hand there. This manifest applies to a
# self-managed cluster on GCE.
#
# Cluster Autoscaler is configured through command-line flags. The
# kube-system/cluster-autoscaler-status ConfigMap is written by the autoscaler
# to report its status; it is not a configuration input, so node group bounds
# and scale-down settings belong in the flags below.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      # Requires a ServiceAccount with Cluster Autoscaler RBAC; set
      # serviceAccountName to match your cluster.
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0  # k8s.gcr.io is frozen
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=gce
        - --skip-nodes-with-local-storage=false
        - --skip-nodes-with-system-pods=false
        - --expander=least-waste
        # Discover MIGs by name prefix; min/max node counts are part of the spec.
        - --node-group-auto-discovery=mig:namePrefix=k8s-worker-nodes,min=3,max=100
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        resources:
          limits:
            cpu: 100m
            memory: 300Mi

Best Practices and Optimization

Resource Request Configuration

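apiVersion: apps/v1
kind: Deployment
metadata:
  name: autoscaled-app
spec:
  # replicas is left unset so that the HPA owns the replica count
  selector:                 # required for apps/v1 Deployments
    matchLabels:
      app: autoscaled-app
  template:
    metadata:
      labels:
        app: autoscaled-app
    spec:
      containers:
      - name: app
        image: myapp:latest   # placeholder; pin a specific tag or digest in production
        resources:
          requests:
            cpu: 200m       # HPA CPU utilization is measured against this value
            memory: 256Mi   # Set based on observed usage
          limits:
            cpu: 1000m      # Allow bursting up to one core
            memory: 512Mi   # Hard cap; the container is OOM-killed if it exceeds this
        readinessProbe:     # keeps new pods out of rotation until they can serve
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10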

Multi-Tier Autoscaling Strategy

Tier 1: HPA for immediate response to traffic spikes
Tier 2: VPA for long-term resource optimization
Tier 3: Cluster Autoscaler for node capacity management
Install metrics-server for resource metrics, plus a custom or external metrics adapter if the HPA uses such metrics
Use PodDisruptionBudgets to maintain availability during scaling
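
A minimal PodDisruptionBudget sketch for the web-app workload used earlier. The object name, the `app: web-app` pod label, and the choice of `minAvailable: 2` are assumptions for illustration; adjust them to the labels and replica floor of the real Deployment.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb          # hypothetical name
  namespace: production
spec:
  minAvailable: 2            # assumption: keep 2 of the HPA's minimum 3 replicas running
  selector:
    matchLabels:
      app: web-app           # assumption: the web-app pods carry this label
```

A PodDisruptionBudget limits voluntary evictions, such as node drains by Cluster Autoscaler and pod evictions by the VPA updater. It does not constrain an HPA scale-down, which removes pods through the Deployment's ReplicaSet, not through the eviction API.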

Monitoring and Alerting

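# Example ServiceMonitor (Prometheus Operator) that scrapes kube-state-metrics,
# which exposes HPA state as kube_horizontalpodautoscaler_* metrics.
# The selector label and port name must match your kube-state-metrics Service.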
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hpa-metrics
spec:
  selector:
    matchLabels:
      app: kube-state-metrics
  endpoints:
  - port: http-metrics
    interval: 30s
    path: /metrics
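
Once kube-state-metrics is scraped, alerts on scaling problems can be expressed as a PrometheusRule. The sketch below assumes the Prometheus Operator is installed and that kube-state-metrics v2 metric names are in use. The rule name, alert names, durations, and severities are hypothetical choices, and your Prometheus may require extra labels on the object for its rule selector to pick it up.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-alerts                     # hypothetical name
spec:
  groups:
  - name: autoscaling
    rules:
    # The HPA has been pinned at maxReplicas, so it can no longer absorb load.
    - alert: HPAMaxedOut
      expr: |
        kube_horizontalpodautoscaler_status_current_replicas
          == kube_horizontalpodautoscaler_spec_max_replicas
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} has run at maxReplicas for 15 minutes"
    # The HPA cannot compute a replica count, typically because metrics are missing.
    - alert: HPAScalingInactive
      expr: |
        kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"} == 1
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} is unable to scale"
```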

Common Configuration Patterns

Web Applications: CPU-based HPA with 70-80% target utilization
Queue Workers: Custom metrics HPA based on queue depth
Databases: VPA in recommendation mode with manual tuning
Batch Jobs: Cluster Autoscaler with job-specific node pools
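Microservices: Combined HPA + VPA, with the two controllers driven by different signals (for example, HPA on CPU or custom metrics and VPA limited to memory) so they do not counteract each other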
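
One way to combine the two autoscalers for the Microservices pattern is to let VPA manage memory only while HPA scales on CPU. This is a sketch under stated assumptions, not the only valid split: the workload name `api`, the object names, and the replica bounds are hypothetical.

```yaml
# VPA manages memory requests only, so it does not touch the CPU request
# that the HPA's utilization percentage is computed from.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa                  # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                    # hypothetical workload
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"         # apply to every container in the pod
      controlledResources: ["memory"]
---
# HPA scales the same workload on CPU utilization only.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2                 # assumed bounds
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```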

Troubleshooting Guidelines

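Verify that metrics-server is installed and serving data (kubectl top nodes should return values)
Check that resource requests are set on target deployments; without them, utilization-based HPA targets show as <unknown>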
Monitor scaling events using kubectl describe hpa
Use kubectl top pods to verify actual resource usage
Implement gradual rollout of autoscaling configurations
Set up alerts for scaling failures and resource exhaustion
