Improve Kubernetes Performance Without Increasing Resources

Explore top LinkedIn content from expert professionals.

Summary

Improving Kubernetes performance without increasing resources means making your applications and clusters run faster and more reliably by removing bottlenecks, balancing workloads, and reducing waste—without allocating extra CPU, memory, or hardware. Instead of just scaling up, the focus is on smarter engineering, monitoring, and fine-tuning to get the most out of what you already have.

  • Identify hidden delays: Break down where time is spent during requests, such as database calls or network hops, and address the biggest sources of slowdown through smart engineering changes.
  • Audit resource settings: Review and adjust CPU, memory, and storage allocations so they match real-world usage, cleaning up forgotten or idle workloads and automating resource management.
  • Balance workloads automatically: Use tools like Kubernetes Descheduler and API Priority and Fairness to redistribute pods and maintain fair access, preventing overload and making sure every component gets the resources it needs.
Summarized by AI based on LinkedIn member posts
  • View profile for sukhad anand

    Senior Software Engineer @Google | Techie007 | Opinions and views I post are my own

    105,678 followers

    Everyone talks about scalability. Very few talk about where the latency is hiding.

    I once worked on a system where a single API call took ~450ms. The team kept trying to “scale the service” by adding more replicas. Pods were multiplied. Autoscaling was tuned. Dashboards were made fancier. But the request still took ~450ms.

    Because the problem was never about scale. It was this:
    - 180ms spent waiting on a downstream service
    - 120ms on a database round-trip over a noisy network hop
    - 80ms wasted in JSON -> DTO -> Internal Model conversions
    - 40ms in logging + metrics I/O
    - The actual business logic: ~15ms

    We were scaling the symptom, not the cause. Optimizing that request had nothing to do with distributed systems wizardry. It was mostly about treating latency as a budget, not as a consequence.

    Here’s the framework we used that changed everything:
    - Latency Budget = Time Allowed for Request
    - Breakdown = Where That Time Is Actually Spent
    - Gap = Budget - Breakdown

    And then we asked just one question: “What is the single biggest chunk of time we can remove without changing the system’s behavior?”

    This is what we ended up doing:
    - Moved DB calls to a closer subnet (dropped ~60ms)
    - Cached the downstream call response intelligently (saved ~150ms)
    - Switched internal models to protobuf (saved ~40ms)
    - Batched our metrics (saved ~20ms)

    The API dropped to ~120ms. Without more servers. Without more Kubernetes magic. Just engineering clarity. 🚀

    Scalability isn’t just about adding compute. It’s about understanding where the time goes. Most “slow” systems aren’t slow. They’re just unobserved.

  • View profile for Deepak Agrawal

    Founder & CEO @ Infra360 | DevOps, FinOps & CloudOps Partner for FinTech, SaaS & Enterprises

    18,220 followers

    Over the last 1 year, we helped 15+ companies cut their cloud bills by 30-40% in 45 days (without a single new tool).

    Here’s what most cloud teams don’t realize:
    ❌ You don’t have a cost problem.
    ✅ You have a waste problem hidden in plain sight.

    We attacked the invisible waste buried deep in their Kubernetes clusters:

    1. 𝐑𝐞𝐪𝐮𝐞𝐬𝐭𝐬 𝐚𝐧𝐝 𝐋𝐢𝐦𝐢𝐭𝐬 𝐖𝐞𝐫𝐞 𝐒𝐞𝐭… 𝐚𝐧𝐝 𝐅𝐨𝐫𝐠𝐨𝐭𝐭𝐞𝐧
    Developers set inflated CPU/memory limits “just in case” and never revisited them. We ran real-time profiling using Prometheus + Grafana and recalibrated limits based on actual sustained usage. This alone brought down cluster size by 15-20%.

    2. 𝐍𝐨𝐧-𝐏𝐫𝐨𝐝 𝐄𝐧𝐯𝐢𝐫𝐨𝐧𝐦𝐞𝐧𝐭𝐬 𝐖𝐞𝐫𝐞 𝐓𝐫𝐞𝐚𝐭𝐞𝐝 𝐋𝐢𝐤𝐞 𝐏𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧
    Dev, QA, and Staging environments ran on on-demand instances (24/7). We moved them to spot instances with scheduled shutdowns during non-working hours. That delivered 18-22% savings instantly.

    3. 𝐀𝐮𝐭𝐨𝐬𝐜𝐚𝐥𝐞𝐫𝐬 𝐖𝐞𝐫𝐞 𝐌𝐢𝐬𝐜𝐨𝐧𝐟𝐢𝐠𝐮𝐫𝐞𝐝 𝐨𝐫 𝐉𝐮𝐬𝐭 𝐈𝐝𝐥𝐞
    Most teams rely purely on CPU-based HPA, which reacts too late. We introduced custom scaling triggers based on business KPIs like request queue lengths, job backlogs, and latency. The result? Clusters scaled proactively, not reactively.

    4. 𝐙𝐨𝐦𝐛𝐢𝐞 𝐏𝐨𝐝𝐬 𝐚𝐧𝐝 𝐅𝐨𝐫𝐠𝐨𝐭𝐭𝐞𝐧 𝐑𝐞𝐬𝐨𝐮𝐫𝐜𝐞𝐬 𝐄𝐯𝐞𝐫𝐲𝐰𝐡𝐞𝐫𝐞
    One client had 300+ idle pods running outdated builds (nobody knew why). We implemented automated cleanup jobs using lifecycle policies and kubectl prune scripts. That reduced node count immediately.

    5. 𝐕𝐞𝐫𝐭𝐢𝐜𝐚𝐥 𝐏𝐨𝐝 𝐀𝐮𝐭𝐨𝐬𝐜𝐚𝐥𝐞𝐫 (𝐕𝐏𝐀) 𝐖𝐚𝐬𝐧’𝐭 𝐄𝐯𝐞𝐧 𝐄𝐧𝐚𝐛𝐥𝐞𝐝
    Once enabled, VPA handled unpredictable workloads far better than manual tuning. For stateful apps with variable patterns, this reduced over-provisioning by up to 25% while maintaining SLAs (see the sketch after this post).

    6. 𝐏𝐞𝐫𝐬𝐢𝐬𝐭𝐞𝐧𝐭 𝐕𝐨𝐥𝐮𝐦𝐞 𝐂𝐥𝐚𝐢𝐦𝐬 (𝐏𝐕𝐂𝐬) 𝐖𝐞𝐫𝐞 𝐚 𝐁𝐥𝐚𝐜𝐤 𝐇𝐨𝐥𝐞
    Storage costs were silently draining budgets. We audited PVC usage, downgraded unnecessary high-IOPS gp2 volumes to gp3, and cleaned up stale volumes. For one client, this alone saved over $30,000 annually.

    Before you buy another cloud cost management tool, ask yourself… Have you really optimized what you already own?

    ♻️ 𝐑𝐄𝐏𝐎𝐒𝐓 𝐒𝐨 𝐎𝐭𝐡𝐞𝐫𝐬 𝐂𝐚𝐧 𝐋𝐞𝐚𝐫𝐧.
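
    As a concrete illustration of points 1 and 5 above, here is a minimal VerticalPodAutoscaler sketch of the kind such a recalibration might use. It assumes the VPA components are installed in the cluster; the workload name, the name payments, and the min/max bounds are hypothetical placeholders, not values from the post.

      apiVersion: autoscaling.k8s.io/v1
      kind: VerticalPodAutoscaler
      metadata:
        name: payments-vpa            # hypothetical name
      spec:
        targetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: payments              # hypothetical Deployment to right-size
        updatePolicy:
          updateMode: "Auto"          # use "Off" to collect recommendations without applying them
        resourcePolicy:
          containerPolicies:
            - containerName: "*"
              minAllowed:             # illustrative guard rails, tune per workload
                cpu: 100m
                memory: 128Mi
              maxAllowed:
                cpu: "2"
                memory: 4Gi

    Running with updateMode: "Off" first and comparing the recommendations against Prometheus/Grafana usage data, as the post describes, is a low-risk way to validate new requests before letting VPA apply them.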

  • View profile for Vikash Kumar

    Senior Platform Engineer | Ex-Intel | DevOps Architect | Specializing in Multi-Cloud, AI/ML & Kubernetes | Mentor & Tech Content Creator

    8,515 followers

    🚀 𝐇𝐨𝐰 𝐖𝐞 𝐂𝐮𝐭 𝐊𝐮𝐛𝐞𝐫𝐧𝐞𝐭𝐞𝐬 𝐂𝐨𝐬𝐭𝐬 𝐛𝐲 60% 𝐖𝐢𝐭𝐡𝐨𝐮𝐭 𝐃𝐨𝐰𝐧𝐭𝐢𝐦𝐞

    Cloud costs were skyrocketing, and after a deep dive, I found hidden inefficiencies bleeding our budget.

    🔥 𝐓𝐨𝐩 𝐊𝐮𝐛𝐞𝐫𝐧𝐞𝐭𝐞𝐬 𝐂𝐨𝐬𝐭 𝐂𝐮𝐥𝐩𝐫𝐢𝐭𝐬 𝐖𝐞 𝐅𝐨𝐮𝐧𝐝:
    ✅ Idle workloads running 24/7—even when no one was using them.
    ✅ Over-provisioned CPU & memory, wasting compute power.
    ✅ Unoptimized autoscaling, keeping expensive nodes active.
    ✅ Orphaned resources—Persistent Volumes, Load Balancers, and Zombie Pods.
    ✅ Mismanaged Spot Instances, leading to unexpected evictions & higher on-demand costs.
    ✅ Excessive network egress charges, especially from cross-region traffic.

    🔍 𝐇𝐞𝐫𝐞’𝐬 𝐇𝐨𝐰 𝐖𝐞 𝐅𝐢𝐱𝐞𝐝 𝐈𝐭 & 𝐒𝐥𝐚𝐬𝐡𝐞𝐝 𝐂𝐨𝐬𝐭𝐬 𝐛𝐲 60%

    1️⃣ 𝑺𝒎𝒂𝒓𝒕𝒆𝒓 𝑨𝒖𝒕𝒐𝒔𝒄𝒂𝒍𝒊𝒏𝒈: 𝑲𝒂𝒓𝒑𝒆𝒏𝒕𝒆𝒓 + 𝑽𝑷𝑨 + 𝑯𝑷𝑨
    ✅ Replaced Cluster Autoscaler with Karpenter for faster & cost-aware node provisioning.
    ✅ Used Vertical Pod Autoscaler (VPA) to automatically adjust CPU/memory requests.
    ✅ Optimized Horizontal Pod Autoscaler (HPA) to scale pods dynamically based on actual traffic patterns.

    2️⃣ 𝑺𝒄𝒉𝒆𝒅𝒖𝒍𝒆𝒅 & 𝑶𝒏-𝑫𝒆𝒎𝒂𝒏𝒅 𝑾𝒐𝒓𝒌𝒍𝒐𝒂𝒅𝒔 𝒘𝒊𝒕𝒉 𝑲𝑬𝑫𝑨 & 𝑨𝒓𝒈𝒐 𝑾𝒐𝒓𝒌𝒇𝒍𝒐𝒘𝒔
    ✅ Used KEDA to spin up workloads only when needed—no more idle background jobs.
    ✅ Moved non-critical workloads to Argo Workflows, reducing long-running container costs.
    ✅ Paused dev/test clusters automatically after work hours using custom automation.

    3️⃣ Cleaning Up Wasted Resources (Automated)
    ✅ Ran kubectl top & Kubecost to find & kill over-provisioned workloads.
    ✅ Created a Garbage Collector Controller to detect & delete:
    🔹 Orphaned PVs & PVCs (saved ~$2,000/month).
    🔹 Unused Load Balancers & Ingresses.
    🔹 Zombie Services & stale Helm releases.

    4️⃣ Network Cost Optimization: Egress & Load Balancers
    ✅ Reduced cross-region traffic by keeping microservices in the same availability zone.
    ✅ Used Cilium for service-to-service communication, avoiding unnecessary egress charges.
    ✅ Optimized Load Balancers with Ingress NGINX & Internal Load Balancers to cut external traffic costs.

    5️⃣ Smarter Spot Instance Management with Karpenter & Ocean by Spot
    ✅ Used Karpenter to prioritize Spot Instances while ensuring fallback to On-Demand only when needed.
    ✅ Implemented Spot.io Ocean to dynamically move workloads across instance types for better cost efficiency.

    🔥 The Impact
    ✅ Cloud spend dropped from $15,000 → $6,000 per month
    ✅ Zero downtime for production workloads
    ✅ Automated alerts for cost anomalies & resource spikes

    💡 Pro Tip: Don’t Just Look at Nodes!
    🔹 Check for unused Persistent Volumes & Load Balancers
    🔹 Optimize network traffic to reduce egress costs
    🔹 Automate workload shutdowns when idle

    💬 Want access to the YAMLs & automation scripts we used? Drop a comment, and I’ll share the GitHub repo!

    #Kubernetes #CloudCostOptimization #DevOps #FinOps #K8s #CloudComputing #SRE #Observability #CostReduction #KEDA #Karpenter #Kubecost
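
    To make the spot-first provisioning in points 1️⃣ and 5️⃣ above concrete, here is a minimal Karpenter NodePool sketch, not taken from the post. It assumes Karpenter is installed with an EC2NodeClass named default; the name, limits, and consolidation settings are placeholders, and field names vary between Karpenter releases (this follows the karpenter.sh/v1 schema), so check the docs for your version. When both capacity types are allowed, Karpenter prefers spot capacity and falls back to on-demand when spot is unavailable.

      apiVersion: karpenter.sh/v1
      kind: NodePool
      metadata:
        name: general-purpose                 # hypothetical name
      spec:
        template:
          spec:
            requirements:
              - key: karpenter.sh/capacity-type
                operator: In
                values: ["spot", "on-demand"] # spot preferred, on-demand as fallback
              - key: kubernetes.io/arch
                operator: In
                values: ["amd64"]
            nodeClassRef:
              group: karpenter.k8s.aws
              kind: EC2NodeClass
              name: default                   # assumes this EC2NodeClass exists
        disruption:
          consolidationPolicy: WhenEmptyOrUnderutilized
          consolidateAfter: 5m                # give workloads time to settle before consolidating
        limits:
          cpu: "200"                          # hypothetical cap on total provisioned vCPU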

  • View profile for Jayas Balakrishnan

    Director Solutions Architecture & Hands-On Technical/Engineering Leader | 8x AWS, KCNA, KCSA & 3x GCP Certified | Multi-Cloud

    3,020 followers

    𝗘𝘃𝗲𝗿 𝗵𝗮𝗱 𝗼𝗻𝗲 𝗺𝗶𝘀𝗯𝗲𝗵𝗮𝘃𝗶𝗻𝗴 𝗮𝗽𝗽 𝗯𝗿𝗶𝗻𝗴 𝘆𝗼𝘂𝗿 𝗲𝗻𝘁𝗶𝗿𝗲 𝗞𝘂𝗯𝗲𝗿𝗻𝗲𝘁𝗲𝘀 𝗰𝗹𝘂𝘀𝘁𝗲𝗿 𝘁𝗼 𝗶𝘁𝘀 𝗸𝗻𝗲𝗲𝘀?

    In multi-tenant Kubernetes environments, especially where tenants or custom controllers interact directly with the API server, this happens more often than we'd like to admit. One tenant's flood of API requests can starve critical components, leading to cascading cluster-wide failures.

    This is where Kubernetes API Priority and Fairness (APF) becomes your control plane's guardian. Unlike basic max-in-flight settings, APF intelligently classifies and prioritizes API requests using:
    • 𝗙𝗹𝗼𝘄𝗦𝗰𝗵𝗲𝗺𝗮𝘀: Categorize requests by user, namespace, resource type, or verb.
    • 𝗣𝗿𝗶𝗼𝗿𝗶𝘁𝘆𝗟𝗲𝘃𝗲𝗹𝗖𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗶𝗼𝗻𝘀: Allocate a share of the API server's total concurrency capacity to each priority level.

    𝗧𝗵𝗲 𝗿𝗲𝗮𝗹 𝗺𝗮𝗴𝗶𝗰? APF uses a fair-queuing algorithm to prevent any single flow from monopolizing resources. Depending on your configuration, it can handle traffic bursts by queuing requests or, if set otherwise, immediately rejecting excess requests with a 429 error.

    For platform teams, implementing APF properly means:
    • Essential system components (like controllers and leader election) remain operational during overload, thanks to their default high-priority settings.
    • Each tenant or workload gets a fair share of API server resources, reducing the risk of noisy neighbors.
    • Traffic bursts can be handled gracefully or rejected quickly, according to your needs.
    • Critical operations always have priority.

    𝗔 𝗳𝗲𝘄 𝗶𝗺𝗽𝗼𝗿𝘁𝗮𝗻𝘁 𝗻𝗼𝘁𝗲𝘀:
    • Some long-running requests (exec, logs, and watch operations) are exempt from APF limits.
    • APF is enabled by default in Kubernetes 1.20+, but the default settings may require tuning for your specific workloads and multi-tenant use cases.

    In production clusters, a well-tuned APF configuration can transform how you handle multi-tenant environments, ensuring service reliability even under extreme load.

    #AWS #awscommunity #kubernetes
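
    As a rough illustration of the two objects described above, here is a minimal FlowSchema plus PriorityLevelConfiguration sketch that caps what one tenant's service accounts can consume. The tenant-a namespace, the object names, and the concurrency numbers are hypothetical; the flowcontrol.apiserver.k8s.io/v1 API shown is the one in recent Kubernetes releases (older clusters use v1beta3 with slightly different field names), so adjust for your version.

      apiVersion: flowcontrol.apiserver.k8s.io/v1
      kind: PriorityLevelConfiguration
      metadata:
        name: tenant-a-level                  # hypothetical
      spec:
        type: Limited
        limited:
          nominalConcurrencyShares: 20        # this level's share of API server concurrency
          limitResponse:
            type: Queue                       # queue bursts instead of rejecting immediately
            queuing:
              queues: 16
              queueLengthLimit: 50
              handSize: 4
      ---
      apiVersion: flowcontrol.apiserver.k8s.io/v1
      kind: FlowSchema
      metadata:
        name: tenant-a                        # hypothetical
      spec:
        priorityLevelConfiguration:
          name: tenant-a-level
        matchingPrecedence: 1000              # lower numbers are matched first
        distinguisherMethod:
          type: ByUser                        # fair-queue per requesting user within the tenant
        rules:
          - subjects:
              - kind: ServiceAccount
                serviceAccount:
                  name: "*"
                  namespace: tenant-a         # hypothetical tenant namespace
            resourceRules:
              - verbs: ["*"]
                apiGroups: ["*"]
                resources: ["*"]
                namespaces: ["*"]

    With this in place, requests from that tenant's service accounts share the 20-share priority level instead of competing directly with system traffic; the apiserver_flowcontrol_* metrics exposed by the API server show whether requests are being queued or rejected.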

  • View profile for Trenton VanderWert

    Kubernetes and Cloud Native Engineer || Ex-Rancher || Ex-Amazon

    4,376 followers

    One of my big complaints with Kubernetes is the lack of auto-balancing of resources. If a node goes down, pods will reschedule onto the active nodes to meet requirements. But once the node comes back up, Kubernetes won't re-balance scheduling onto that node again. Instead it will keep the workloads where they are. This is understandable, as (sadly) a lot of applications running in Kubernetes clusters today probably have no business running in Kubernetes at all (extremely stateful).

    But Kubernetes is pretty mature and I'm certainly not the only person to share this complaint about Kubernetes scheduling - this is where the kubernetes-descheduler comes into the picture: https://lnkd.in/gD3Xv7Ca

    This application allows for balancing rules and will kill pods on nodes that are over-provisioned. This is very useful for stateless applications that can afford a bit of backend turbulence in order to better utilize hardware.

    The descheduler uses a variety of sources, such as the Metrics Server and even Prometheus data, to determine the utilization of nodes and the workloads on them. It then uses the policies you define in its DeschedulerPolicy configuration to flatten the workloads across the cluster. The descheduler relies on the scoring strategies built into the scheduler, such as NodeResourcesFit and NodeResourcesBalancedAllocation (https://lnkd.in/eU9zmtPs), which the scheduler uses when deciding where to place a NEW workload. The descheduler in turn looks at whether it can effectively rebalance when a policy violation is met.

    Here is an example policy:

      apiVersion: "descheduler/v1alpha2"
      kind: "DeschedulerPolicy"
      profiles:
        - name: ProfileName
          pluginConfig:
            - name: "LowNodeUtilization"
              args:
                thresholds:
                  "memory": 20
                targetThresholds:
                  "memory": 70
          plugins:
            balance:
              enabled:
                - "LowNodeUtilization"

    This policy treats nodes below 20% memory utilization as under-utilized and evicts pods from nodes above 70% so the scheduler can place them on the emptier nodes. This helps you get the most out of your horizontal scaling strategy! Happy hacking!
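
    A minimal sketch of how the descheduler is commonly run: a CronJob that reads a policy like the one above from a ConfigMap. This follows the pattern in the upstream descheduler documentation rather than anything in the post; the namespace, schedule, image tag, ConfigMap name, and the descheduler-sa service account are assumptions to verify against the project's current manifests.

      apiVersion: batch/v1
      kind: CronJob
      metadata:
        name: descheduler
        namespace: kube-system
      spec:
        schedule: "*/30 * * * *"            # rebalance every 30 minutes (hypothetical cadence)
        concurrencyPolicy: Forbid
        jobTemplate:
          spec:
            template:
              spec:
                serviceAccountName: descheduler-sa        # needs RBAC permission to evict pods
                priorityClassName: system-cluster-critical
                restartPolicy: Never
                containers:
                  - name: descheduler
                    image: registry.k8s.io/descheduler/descheduler:v0.31.0   # pin to the version you have tested
                    command: ["/bin/descheduler"]
                    args:
                      - "--policy-config-file=/policy-dir/policy.yaml"
                      - "-v=3"
                    volumeMounts:
                      - name: policy-volume
                        mountPath: /policy-dir
                volumes:
                  - name: policy-volume
                    configMap:
                      name: descheduler-policy             # ConfigMap holding the policy shown above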

  • View profile for Jaswindder Kummar

    Engineering Director | Cloud, DevOps & DevSecOps Strategist | Security Specialist | Published on Medium & DZone | Hackathon Judge & Mentor

    22,492 followers

    𝐌𝐨𝐬𝐭 𝐓𝐞𝐚𝐦𝐬 𝐎𝐯𝐞𝐫𝐬𝐩𝐞𝐧𝐝 𝟕𝟎%+ 𝐨𝐧 𝐊𝐮𝐛𝐞𝐫𝐧𝐞𝐭𝐞𝐬 𝐖𝐢𝐭𝐡𝐨𝐮𝐭 𝐑𝐞𝐚𝐥𝐢𝐳𝐢𝐧𝐠 𝐈𝐭. 𝐇𝐞𝐫𝐞 𝐚𝐫𝐞 𝟔 𝐓𝐞𝐜𝐡𝐧𝐢𝐪𝐮𝐞𝐬 𝐭𝐡𝐚𝐭 𝐜𝐮𝐭 𝐨𝐮𝐫 𝐊𝟖𝐬 𝐛𝐢𝐥𝐥 𝐟𝐫𝐨𝐦 $𝟓𝟎𝐊 𝐭𝐨 $𝟏𝟓𝐊 𝐦𝐨𝐧𝐭𝐡𝐥𝐲:

    𝟏. 𝐑𝐈𝐆𝐇𝐓 𝐒𝐈𝐙𝐈𝐍𝐆
    - Analyze real CPU/memory usage
    - Adjust container requests/limits accordingly
    - Stop paying for unused capacity
    Impact: 60% resource reduction with zero performance loss

    𝟐. 𝐄𝐅𝐅𝐈𝐂𝐈𝐄𝐍𝐓 𝐀𝐔𝐓𝐎 𝐒𝐂𝐀𝐋𝐈𝐍𝐆
    - Cluster Autoscaler + HPA + KEDA
    - Scale nodes and pods on actual demand
    - Workload-driven, not predictions
    Impact: 80% weekend cost reduction when traffic drops

    𝟑. 𝐏𝐎𝐃 𝐃𝐈𝐒𝐑𝐔𝐏𝐓𝐈𝐎𝐍 𝐁𝐔𝐃𝐆𝐄𝐓 (𝐏𝐃𝐁)
    - Define minimum pods during disruptions
    - Prevents over-provisioning for HA
    - Balance availability with cost
    Impact: 50% replica count reduction while maintaining SLAs

    𝟒. 𝐍𝐎𝐃𝐄 𝐓𝐀𝐈𝐍𝐓𝐈𝐍𝐆 & 𝐓𝐎𝐋𝐄𝐑𝐀𝐓𝐈𝐎𝐍
    - Taint expensive nodes for specific workloads
    - GPU/high-memory for intensive tasks only
    - Cheaper nodes for regular services
    Impact: $8K/month saved on GPU scheduling

    𝟓. 𝐂𝐎𝐍𝐓𝐀𝐈𝐍𝐄𝐑 𝐈𝐌𝐀𝐆𝐄 𝐎𝐏𝐓𝐈𝐌𝐈𝐙𝐀𝐓𝐈𝐎𝐍
    - Minimal base images (Alpine, Distroless)
    - Multi-stage builds, remove dependencies
    - Layer caching
    Impact: 1.2GB → 200MB images, 6x faster deployments

    𝟔. 𝐒𝐏𝐎𝐓 𝐈𝐍𝐒𝐓𝐀𝐍𝐂𝐄𝐒
    - Fault-tolerant workloads on spot
    - 70-90% infrastructure savings
    - Graceful interruption handling
    Impact: 85% compute cost reduction for batch jobs

    Quick Wins:
    - Right-size containers
    - Enable autoscaling
    - Switch to spot instances

    Tools: Kubecost, Goldilocks, KEDA, Karpenter

    Formula: Right-Sizing (30%) + Autoscaling (40%) + Spot (60%) + Images (10%) = 70%+ savings

    Truth: K8s isn't expensive—default configs are.

    Which technique gave you the biggest savings?

    ♻️ Repost to help your network
    ➕ Follow Jaswindder for more

    #Kubernetes #DevOps #FinOps
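
    Technique 3 above is the least self-explanatory, so here is a minimal PodDisruptionBudget sketch, assuming a Deployment whose pods are labelled app: checkout (a hypothetical name, not from the post). With a PDB in place, voluntary disruptions such as node drains and descheduler evictions can never take the app below the floor you set, which is what makes it safe to run fewer replicas.

      apiVersion: policy/v1
      kind: PodDisruptionBudget
      metadata:
        name: checkout-pdb                 # hypothetical name
      spec:
        minAvailable: 2                    # or use maxUnavailable; never both
        selector:
          matchLabels:
            app: checkout                  # must match the Deployment's pod labels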

  • View profile for Thiruppathi Ayyavoo

    🚀 |Cloud & DevOps Advocate|Application Support Engineer |PIAM|Broadcom Automic Batch Operation|Zerto Certified Associate|

    3,584 followers

    Post 19: Real-Time Cloud & DevOps Scenario

    Scenario: Your organization’s Kubernetes-based microservices faced a production outage due to a misconfigured pod overusing CPU and memory, causing resource starvation. As a DevOps engineer, your task is to prevent such issues and maintain system stability.

    Step-by-Step Solution:

    1. Set Resource Requests and Limits: Define resources.requests and resources.limits in pod specifications to control CPU and memory usage. Example:

       resources:
         requests:
           memory: "500Mi"
           cpu: "250m"
         limits:
           memory: "1Gi"
           cpu: "500m"

    2. Enable Namespace Resource Quotas: Use ResourceQuota objects to restrict the total resource consumption within a namespace. Example:

       apiVersion: v1
       kind: ResourceQuota
       metadata:
         name: namespace-quota
       spec:
         hard:
           requests.cpu: "4"
           requests.memory: "8Gi"
           limits.cpu: "8"
           limits.memory: "16Gi"

    3. Leverage Horizontal Pod Autoscaler (HPA): Use HPA to scale pods dynamically based on CPU, memory, or custom metrics. Example:

       apiVersion: autoscaling/v2
       kind: HorizontalPodAutoscaler
       metadata:
         name: example-hpa
       spec:
         scaleTargetRef:
           apiVersion: apps/v1
           kind: Deployment
           name: my-app
         minReplicas: 2
         maxReplicas: 10
         metrics:
           - type: Resource
             resource:
               name: cpu
               target:
                 type: Utilization
                 averageUtilization: 80

    4. Implement Pod Priority and Preemption: Assign priority classes to pods to ensure critical workloads get resources during contention. Example:

       apiVersion: scheduling.k8s.io/v1
       kind: PriorityClass
       metadata:
         name: high-priority
       value: 1000
       globalDefault: false
       description: "Priority for critical workloads"

    5. Monitor and Analyze Resource Usage: Use tools like Prometheus, Grafana, or the Kubernetes Metrics Server to monitor CPU and memory usage trends. Set up alerts for resource usage thresholds.

    6. Implement Node Affinity and Taints: Use node affinity and taints/tolerations to distribute workloads effectively across nodes, avoiding resource bottlenecks.

    7. Audit Configurations Regularly: Periodically review and update resource configurations for pods and namespaces. Conduct load tests to validate performance under different conditions.

    8. Enable Cluster Autoscaler: Use Cluster Autoscaler to add or remove nodes dynamically based on overall resource demand. This ensures sufficient capacity during peak loads.

    Outcome: Improved resource allocation prevents single pod failures from impacting other services. The system becomes more resilient and scales dynamically based on demand.

    💬 How do you handle resource contention in your Kubernetes clusters? Let’s discuss strategies in the comments!

    ✅ Follow Thiruppathi Ayyavoo for daily real-time scenarios in Cloud and DevOps. Together, we learn and grow!

    #DevOps #Kubernetes #CloudComputing #ResourceManagement #Containers #HorizontalPodAutoscaler #RealTimeScenarios #CloudEngineering #LinkedInLearning #careerbytecode #thirucloud #linkedin #USA CareerByteCode

  • View profile for Matteo Collina

    Platformatic Co-Founder & CTO, Node.js Technical Steering Committee member, Fastify Lead Maintainer, Conference Speaker

    18,925 followers

    We just made Next.js 93% faster in Kubernetes. Median latency dropped from 182ms to 11.6ms, and success rates jumped from 91.9% to 99.8%. The solution was surprisingly simple: stop fighting the Linux kernel and start working with it.

    If you run Node.js at scale, you know the pain. Traffic spikes cause some pods to max out at 100% CPU while others idle at 30%. You overprovision to compensate, your cloud bill explodes, but the problem persists.

    Traditional approaches are broken. PM2 adds 30% IPC overhead for worker coordination. Single-CPU pods create isolated queues where one pod drowns while another sits idle.

    We solved this with Watt, the Node.js application server, leveraging SO_REUSEPORT, a kernel feature introduced in 2013 that almost nobody uses properly. Instead of master-worker coordination, the kernel distributes connections directly. Zero overhead, pure efficiency.

    The AWS EKS benchmarks under 1000 req/s load tell the story. With identical 6 CPU resources, single-CPU pods hit 155ms median latency, PM2 reached 182ms, while Watt delivered 11.6ms. At P95, Watt stays at 235ms versus PM2's 1260ms. That's not marginal improvement, that's transformative.

    In e-commerce, the difference between 182ms and 11.6ms is the difference between a sale and an abandoned cart. Every 100ms of latency measurably impacts conversion rates.

    Implementation is trivial. From PM2, remove ecosystem files and set worker count. From single-CPU pods, reduce pod count and increase CPU per pod. No code changes, just better architecture.

    This works for any CPU-bound Node.js workload. GraphQL servers, API gateways, SSR frameworks. If you're running Node in Kubernetes, you're leaving performance on the table.

    Watt is open source, production-ready, and already delivering these results at scale. 93.6% faster latency, 99.8% reliability, 9.6% more throughput with the same resources. Full technical deep dive at our blog, code at https://lnkd.in/dsmneTBt
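
    To illustrate the "fewer, larger pods" change described above, here is a minimal Deployment resources sketch. The names, image, and sizing are hypothetical and the Watt/worker configuration itself lives outside this snippet; the point is simply to give each pod several CPUs so connections can be spread across workers inside one pod instead of across many single-CPU pods.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: next-app                       # hypothetical
      spec:
        replicas: 2                          # fewer, larger pods instead of six 1-CPU pods
        selector:
          matchLabels:
            app: next-app
        template:
          metadata:
            labels:
              app: next-app
          spec:
            containers:
              - name: next-app
                image: registry.example.com/next-app:latest   # hypothetical image
                resources:
                  requests:
                    cpu: "3"                 # multi-CPU pod so in-process workers share the socket
                    memory: 2Gi
                  limits:
                    cpu: "3"
                    memory: 2Gi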

  • View profile for Indu Tharite

    Senior SRE | DevOps Engineer | AWS, Azure, GCP | Terraform| Docker, Kubernetes | Splunk, Prometheus, Grafana, ELK Stack |Data Dog, New Relic | Jenkins, Gitlab CI/CD, Argo CD | TypeScript | Unix, Linux | AI/ML, LLM| GenAI

    4,949 followers

    In traditional Kubernetes autoscaling, scaling is often tied to CPU and memory thresholds. But real-world workloads don’t always spike in predictable patterns. We needed a way to scale based on external event metrics like message queue length, API request rates, or database lag. That’s where KEDA (Kubernetes Event-Driven Autoscaler) came in.

    Real-World Implementation
    Use Case: Autoscale Kubernetes workloads based on custom metrics like Prometheus alerts, Kafka lag, and SQS message depth.

    Execution:
    - Deployed KEDA as a lightweight controller in our EKS cluster
    - Defined ScaledObjects with custom Prometheus queries as event sources
    - Integrated with external systems (Kafka, Redis, AWS SQS, PostgreSQL) using KEDA scalers
    - Tuned cooldown periods, polling intervals, and scale target thresholds per workload type
    - Monitored metrics using Grafana, confirmed responsiveness in production spikes
    - Used Metrics Server and Prometheus Adapter to bridge HPA requirements with KEDA triggers

    Benefits Realized
    - Enabled fine-grained autoscaling for asynchronous and background jobs
    - Reduced idle pod costs in low-traffic windows by over 60%
    - Ensured instant scale-up during peak event load, with no need for pre-provisioned buffers
    - Centralized scaling logic into GitOps-managed ScaledObjects
    - Achieved tighter alignment between actual demand and resource provisioning

    Event-driven scaling helped us optimize cost, performance, and resource efficiency in a unified Kubernetes-native model.

    Tools Used: KEDA, Kubernetes, Prometheus, Metrics Server, Grafana, Kafka, SQS, Redis, PostgreSQL, ScaledObject, Helm

    #Kubernetes #KEDA #Autoscaling #EventDrivenArchitecture #SRE #CloudNative #Prometheus #Kafka #AWS #Redis #PostgreSQL #PlatformEngineering #GitOps #CI_CD #Helm #MetricsServer #JobSearch #Observability #SiteReliabilityEngineering #InfrastructureAsCode #Scalability #CloudEfficiency #TechCareers #SREJobs #DevOpsJobs #C2C
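
    A minimal ScaledObject sketch along the lines described above, assuming KEDA is installed and the workload is a Deployment named queue-worker. The names, queue URL, query, and thresholds are hypothetical placeholders, not the author's actual configuration, and the SQS trigger's authentication (for example a TriggerAuthentication or pod identity) is omitted for brevity.

      apiVersion: keda.sh/v1alpha1
      kind: ScaledObject
      metadata:
        name: queue-worker-scaler
      spec:
        scaleTargetRef:
          name: queue-worker                  # hypothetical Deployment to scale
        minReplicaCount: 0                    # scale to zero in quiet windows
        maxReplicaCount: 30
        pollingInterval: 15                   # seconds between metric checks
        cooldownPeriod: 120                   # seconds of inactivity before scaling back down
        triggers:
          - type: aws-sqs-queue
            metadata:
              queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/jobs   # hypothetical queue
              queueLength: "50"               # target messages per replica
              awsRegion: us-east-1
          - type: prometheus
            metadata:
              serverAddress: http://prometheus.monitoring.svc:9090
              query: sum(rate(http_requests_total{app="queue-worker"}[2m]))     # hypothetical query
              threshold: "100"

    KEDA creates and manages the underlying HPA for the target, so the cooldown, polling interval, and per-trigger thresholds in the ScaledObject are the knobs referred to in the Execution list above.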

  • View profile for Henrik Rexed

    CNCF Ambassador, Cloud Native Advocate at Dynatrace, Owner of IsitObservable

    6,430 followers

    🌟 New Episode Released! Automating Kubernetes Resource Optimization with Smartscape V2 🚀

    Over the last few months, as soon as Smartscape V2 landed in dev, I started playing with it… and wow, this changes the game. Yes, the new UI is great. But what really unlocks its power is this: everything is now queryable. Every cluster, node, namespace, workload, pod, container and even the manifest itself is now accessible through DQL. And that opened the door for something I’ve wanted to do for a long time.

    💡 I built a workflow that automatically opens PRs to right-size CPU & memory requests. Here’s the idea:
    👉 Look at the last 7 days of real usage
    👉 Detect workloads that are over-provisioned
    👉 Calculate a recommended value with a safety buffer
    👉 Patch the manifest
    👉 Automatically create a GitHub Pull Request with the updated resource requests

    No more guessing. No more manually hunting down slack. And absolutely no more wasted nodes because everything was over-requested “to be safe.” 😅

    You could easily run this workflow once a week and continuously optimize your cluster. GitOps style, fully auditable, and based on real observability data. And the best part? All of this is insanely easy now that Smartscape V2 makes the entire K8s topology fully queryable.

    🎥 In this episode, I walk through:
    - How Smartscape V2 stores & exposes K8s entity data
    - The DQL queries used to detect underutilized workloads
    - How to fetch manifests directly from the graph
    - How the workflow generates patches
    - How the GitHub PR is created

    If you're running Kubernetes at scale, this is one of those “why didn’t we have this sooner?” moments.

    🔗 Watch the full episode here: https://lnkd.in/dcdsrPxT

    💬 If you have ideas for other automations: security checks, compliance scans, anomaly-based tuning, drop them in the comments. I might build them next. 😉
