Is ACE the next Context Engineering Technique?

ACE (Agentic Context Engineering) is a new framework that beats current state-of-the-art optimizers like GEPA by treating context as an evolving, structured space of accumulated knowledge.

What is ACE?
ACE treats context as an evolving space rather than a static prompt. Instead of rewriting the entire context, it manages the context as a collection of discrete, structured items (strategies, code snippets, error handlers) that are incrementally accumulated, refined, and organized over time based on performance feedback.

ACE vs. GEPA (Current SOTA)
GEPA (Genetic-Pareto) is a popular method that uses evolutionary algorithms to iteratively rewrite and optimize prompts for brevity and general performance, but it can suffer from "brevity bias" and "context collapse", erasing the specific, detailed heuristics needed for complex domain tasks. ACE instead builds a comprehensive context: it prioritizes retaining detailed domain insights and uses non-LLM logic to manage context growth, ensuring that hard-learned constraints and edge-case strategies are preserved rather than summarized away.

How it works:
1️⃣ Three components: a Generator (to solve tasks), a Reflector (to analyze outcomes), and a Curator (to manage the context).
2️⃣ The Generator attempts a task using the current context, producing a reasoning trajectory and environment feedback (e.g., code execution results).
3️⃣ The Reflector analyzes that feedback to extract concrete insights, identifying successful tactics or root causes of errors.
4️⃣ The Curator synthesizes these into structured, itemized "delta" entries (specific additions or edits to knowledge bullets).
5️⃣ The deltas are merged programmatically into the context, so the context grows and refines incrementally for the next task.

Insights:
💡 GEPA optimizes for concise prompts; ACE prioritizes comprehensive, detailed context.
📈 ACE outperformed baselines by +10.6% on agentic benchmarks and +8.6% on complex financial reasoning.
📚 ACE's incremental "delta" update approach reduced adaptation latency by an average of 86.9% compared to methods that rewrite full prompts. 📝 The Generator, Reflector, and Curator prompts are included in the paper's appendix. Paper: https://lnkd.in/eBknvYcR
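The key detail is that the Curator's merge step is programmatic, not another LLM rewrite. A minimal sketch of what itemized "delta" merging could look like, assuming a simple bullet store keyed by ID; the `Bullet`/`Delta` names and the add/edit/remove operations are illustrative, not ACE's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Bullet:
    """One structured context item (strategy, code snippet, error handler)."""
    id: str
    text: str
    helpful: int = 0   # performance-feedback counters the Reflector can bump
    harmful: int = 0

@dataclass
class Delta:
    """An itemized update proposed by the Reflector/Curator."""
    op: str            # "add" | "edit" | "remove"
    id: str
    text: str = ""

def merge_deltas(context: dict[str, Bullet], deltas: list[Delta]) -> dict[str, Bullet]:
    """Programmatic (non-LLM) merge: the full context is never rewritten,
    so existing bullets cannot be 'summarized away'."""
    for d in deltas:
        if d.op == "add" and d.id not in context:
            context[d.id] = Bullet(d.id, d.text)
        elif d.op == "edit" and d.id in context:
            context[d.id].text = d.text
        elif d.op == "remove":
            context.pop(d.id, None)
    return context

ctx = {"b1": Bullet("b1", "Always pin dependency versions.")}
ctx = merge_deltas(ctx, [
    Delta("add", "b2", "Retry transient API errors with backoff."),
    Delta("edit", "b1", "Pin dependency versions; never use latest."),
])
```

Because each delta touches only one bullet, the cost of an update is proportional to what changed, not to the size of the whole context, which is where the latency savings come from.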
Improving Code Generation Using Context Curation
Summary
Improving code generation using context curation means providing AI coding tools with the most relevant information from your codebase to help them write, debug, or review code more accurately and quickly. Instead of overwhelming the AI with all available data, context curation organizes, filters, and presents only the important details, making the AI more precise and resource-efficient.
- Curate carefully: Regularly review and update context files to include only information that the AI can't easily discover, like hidden dependencies or custom rules.
- Compress intelligently: Use smart retrieval methods to filter and compress large codebases, so the AI receives just the crucial sections needed for each task.
- Refine iteratively: Continuously improve your context management by analyzing feedback from code generation outcomes and restructuring the code or rules as needed.
-
The #1 Killer of LLM Coding Performance? It's not the model; it's the noise in your large code prompt!

It's a developer tax we all pay: quadratic complexity in transformer attention means long code prompts waste time, tank efficiency, and cost more. When you paste a huge codebase hoping your LLM will debug or generate new code, it often struggles to find the signal in the noise, leading to degraded performance. And forget traditional RAG: lexical similarity fails for complex code dependencies.

The Solution is Not a Bigger Context Window (It's a Compressor)

New research on LongCodeZip shows we don't need trillion-token context limits; we need smarter retrieval. This approach achieves up to 5.6x code compression without sacrificing the LLM's generation accuracy.

How does it beat RAG for code? It uses Approximate Mutual Information (AMI): instead of simple embedding search, it measures the perplexity shift when a function is added to the prompt. If the function significantly increases the model's certainty, it's relevant.

It filters in two stages:
- Coarse-grained: selects relevant functions using the AMI score.
- Fine-grained: selects relevant sub-blocks within those functions by tracking perplexity spikes per line of code.

The result? You give your LLM a fraction of the code, but with all the crucial context, leading to an almost 2x speedup in latency and significant cost savings. If you're building with LLMs over large code repositories and tired of paying the price for massive context windows, this is the architecture you need to know about. I break down the mechanics, the performance metrics (including benchmarks), and show you how to start using the open-source repo today. Video link in the comments! 👇
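The coarse-grained AMI stage can be sketched as follows. The `perplexity` function here is a toy word-overlap proxy standing in for a real LLM perplexity call; the two-stage structure and the "perplexity shift" scoring are the point, not the scorer:

```python
import math

def perplexity(target: str, prompt: str) -> float:
    """Stand-in for an LLM perplexity call. A real implementation would
    score `target` token-by-token with the model, conditioned on `prompt`.
    This toy proxy just rewards word overlap between prompt and target."""
    target_words = target.lower().split()
    prompt_words = set(prompt.lower().split())
    if not target_words:
        return 1.0
    # Fewer 'surprising' (non-overlapping) words -> lower perplexity.
    surprisal = sum(0.1 if w in prompt_words else 1.0 for w in target_words)
    return math.exp(surprisal / len(target_words))

def ami_score(query: str, function_src: str, base_prompt: str = "") -> float:
    """Approximate Mutual Information: the perplexity drop on the query
    when the candidate function is added to the prompt."""
    with_fn = perplexity(query, base_prompt + "\n" + function_src)
    return perplexity(query, base_prompt) - with_fn

def select_functions(query: str, functions: list[str], budget: int) -> list[str]:
    """Coarse-grained stage: keep the top-`budget` functions by AMI score."""
    ranked = sorted(functions, key=lambda f: ami_score(query, f), reverse=True)
    return ranked[:budget]

repo = [
    "def load_config(path):\n    # parse the config file\n    ...",
    "def render_chart(data):\n    # draw the chart\n    ...",
]
picked = select_functions("fix the bug in config parsing", repo, budget=1)
```

The fine-grained stage would apply the same idea per line inside each selected function, keeping sub-blocks where the perplexity shift spikes.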
-
How Context Awareness Transforms AI Programming Tools ...

👉 Why Context Matters More Than You Think
What if the key to better AI code assistants lies not in bigger models, but in smarter context usage? A new systematic review of 146 studies reveals that leveraging contextual signals (like API docs, code dependencies, or compiler feedback) can boost performance in code tasks by up to 206% compared to isolated approaches. Yet most systems today underutilize these signals, leading to:
- Hallucinated API calls
- Missed cross-file dependencies
- Generic commit messages

👉 What We Now Understand About Context
The paper establishes the first taxonomy of context types in code intelligence:
1. Direct context: source code, API docs, bug reports
2. Indirect context: abstract syntax trees (ASTs), control flow graphs, IDE usage patterns

Key findings:
- Code completion benefits most from repository-structure context
- Bug detection requires multi-layered context (code changes + historical reports)
- 52% of LLM-based code generation errors stem from ignoring project-specific patterns

👉 How Leading Teams Are Implementing Context
Three emerging best practices:
1. Hybrid retrieval: combine static analysis with neural retrieval to gather relevant code snippets
2. Iterative refinement: use compiler feedback to progressively improve generated code
3. Task-specific modeling: clone detection benefits from semantic context (e.g., API docs), while commit messages need code change histories

The rise of LLMs introduces new patterns: 31 studies now use retrieval-augmented generation to pull contextual clues from entire repositories.
👉 Critical Gaps and Opportunities
Despite progress, major challenges remain:
- No standard benchmarks for evaluating context-aware systems
- Limited reuse of preprocessing pipelines (only 35% of tools are open-source)
- Under-explored synergies between compiler data and API documentation

The authors propose a roadmap focusing on:
- Context-aware evaluation metrics
- Cross-language generalization
- Unified frameworks for combining multiple context types

👉 Why This Matters
This survey provides a missing foundation for developing code tools that truly understand "your" codebase, not just generic patterns. As AI-assisted coding evolves, systematic context utilization could bridge the gap between prototype demos and production-ready tools.

P.S. How is your team handling context in code intelligence systems? Share your experiences below.
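The "hybrid retrieval" best practice above can be sketched with toy stand-ins: `static_candidates` plays the role of static analysis (a real system would walk import or call graphs) and `neural_candidates` plays the role of embedding retrieval (a real system would use a code embedding model):

```python
import re

def static_candidates(task: str, repo: dict[str, str]) -> set[str]:
    """Static-analysis stand-in: files containing identifiers mentioned
    in the task. A real system would use import/call graphs instead."""
    mentioned = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", task))
    return {path for path, src in repo.items()
            if any(name in src for name in mentioned if len(name) > 3)}

def neural_candidates(task: str, repo: dict[str, str], k: int = 2) -> set[str]:
    """Embedding-retrieval stand-in: rank files by word overlap with the
    task description."""
    task_words = set(task.lower().split())
    scored = sorted(repo, key=lambda p: -len(task_words & set(repo[p].lower().split())))
    return set(scored[:k])

def hybrid_retrieve(task: str, repo: dict[str, str], k: int = 2) -> list[str]:
    """Union the two signals; files found by both channels rank first."""
    s, n = static_candidates(task, repo), neural_candidates(task, repo, k)
    return sorted(s | n, key=lambda p: (p not in s) + (p not in n))

repo = {
    "auth.py": "def login(user): check_password(user)",
    "charts.py": "def render(data): ...",
}
hits = hybrid_retrieve("login fails when check_password raises", repo)
```

The combination matters because each channel misses things the other catches: static analysis finds exact-identifier dependencies, while (real) embeddings find semantically related code with no shared names.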
-
Context Engineering: quietly becoming one of the most important design problems in AI-native development.

We're watching something interesting happen in real time: LLMs are getting smarter (faster reasoning, better planning, more fluid code generation), but they're still not remembering well. So instead of waiting for "memory" to mature, we're building systems around that gap. That's where Context Engineering comes in.

A common approach for teams is to start with AI-assisted coding or AI PR reviews, codify rules and guidance into the monorepo or somewhere highly accessible to the LLM, then achieve decent results with AI pair programming. Along the way, prune dead code and docs and improve your system architecture where you can. That's essentially level 1 context engineering.

⸻

So what is Context Engineering? It's shaping, curating, and delivering relevant information to a language model at inference time, so it can behave more intelligently, without actually learning or remembering anything long-term.

We see this everywhere in dev tools now:
• Cursor: thread-local + repo-aware context stacking
• Claude Code: claude.md as a persistent summary of dev history
• Prompt-engineered PRDs living next to source files
• Custom eval + test suites piped into the session as scaffolding
• Vector stores / RAG / MCP servers acting like external memory prosthetics

I use all of these in my developer workflow day to day. It's a constant time and effort commitment to optimize the context for my AI coding assistants: currently Claude Code primarily, with Cursor as backup. Agents running in GitHub are Copilot, with some autonomous troubleshooting via GPT-4.1.

We're basically tricking the LLM, with a lot of effort, into feeling like it remembers, by embedding the right context at the right time, without overwhelming its token window.
⸻

Why this matters
-> LLMs today are like brilliant interns with no long-term memory: you get great results if you prime them with your wisdom and constrain their thinking boundaries.
-> Context Engineering becomes the new "prompt discipline", but across system design, repo architecture, and real-time tooling.

We're not teaching models to remember (yet). This is a major AI gap, and something we're working on at momentiq. We're teaching ourselves how to communicate, in a relatively inefficient manner, with high-leverage, stateless minds. Context Engineering is working for now and absolutely should be a focus for teams on the AI-native journey.

⸻

Question → How are you engineering context into your LLM workflows today? What's your best practice for context management for your AI code assistants? And where does it still break down?

#AIEngineering #ContextEngineering #SoftwareDevelopment #DevTools
-
Tip: Stop using /init for AGENTS.md to get better performance

My latest deep-dive: https://lnkd.in/gkmZ3HJs ✍

As we transition deeper into AI-assisted engineering and background agent orchestration, there's a common ritual: setting up a new project, running an auto-generation tool, and committing a comprehensive AGENTS.md context file. It feels like the responsible thing to do. But recent 2026 research from ETH Zurich and others reveals a different reality: auto-generated context files can actually reduce task success by 2-3% while inflating costs by over 20%. Why? Because coding agents can already discover your directory structure, tech stack, and module explanations on their own. Handing them an auto-generated summary just adds noise, burns tokens, and dilutes their attention from the actual task.

So, what actually earns a line in your AGENTS.md?
1. The undiscoverable: tooling gotchas that change operations
2. Operational landmines: "The legacy/ directory is deprecated but imported by production - do not delete."
3. Non-obvious conventions: custom middleware patterns that shouldn't be refactored to standard conventions.

If an agent can discover it by reading your code, delete it from the file.

A better mental model: treat your AGENTS.md as a living list of codebase smells you haven't fixed yet. If an agent keeps reaching for the wrong dependency or putting utilities in the wrong folder, don't just add a prose instruction to the context file. Restructure the code, add a linter rule, or improve the test coverage. Treat it as a diagnostic tool, fix the underlying friction, and then delete the line.

I dive into the data, the "pink elephant" problem of context anchoring, and why monolithic context files need to evolve into dynamic routing layers in my latest post.

How is your team currently managing context for your background coding agents? Are you relying on static files, or exploring more dynamic context loading? Let me know in the comments.
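As a hypothetical illustration of the three categories (the file contents below are invented for this example, apart from the `legacy/` rule quoted above):

```markdown
# AGENTS.md  (only what the agent cannot discover by reading the code)

## Tooling gotchas
- `make test` requires the database container to be running first; bare pytest will hang.

## Operational landmines
- The `legacy/` directory is deprecated but imported by production - do not delete.

## Non-obvious conventions
- HTTP handlers use our custom retry middleware; do not refactor them to standard decorators.
```

Everything else (directory layout, tech stack, module summaries) the agent can and should discover itself.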
#ai #programming #softwareengineering
-
To build effective agents, you need sophisticated context engineering. But to achieve sophisticated context engineering at scale, you need agentic systems managing that context ⁉️

Everyone assumes larger context windows solve the problem. They don't. Transformers have an n² attention problem: every token attends to every other token. As context grows, the model's ability to capture these pairwise relationships gets stretched thin.

Why Manual Curation Fails at Scale
Consider a real agent workflow: a multi-hour codebase migration, complex research synthesis, or financial analysis across hundreds of documents. Your agent generates:
→ Thousands of tool outputs
→ Multi-step reasoning chains
→ Execution traces with success/failure signals
→ Architectural decisions and dependencies
→ Domain-specific heuristics discovered through trial and error

A human cannot process this velocity of information and make real-time decisions about what to compress, persist to memory, or discard. The cognitive load exceeds human reaction-time capabilities.

The Agentic Context Engineering Solution
Research from Stanford's ACE (Agentic Context Engineering) framework shows this approach works in production. They implement a three-agent architecture:
- Generator: produces reasoning trajectories and surfaces effective strategies
- Reflector: critiques execution traces to extract concrete lessons
- Curator: synthesizes updates into structured, itemized contexts

Results: 10.6% improvement on agent benchmarks, 8.6% on domain-specific tasks. They matched IBM's production-level system while using smaller open-source models.

The Technical Mechanisms That Matter
Three core techniques emerged across all the research:
1️⃣ Incremental delta updates: instead of rewriting entire contexts (which causes "context collapse"), use structured bullets with metadata and update only the relevant sections. ACE reduced adaptation latency by 87% using this approach.
2️⃣ Just-in-time retrieval: don't pre-load everything. Agents maintain lightweight identifiers (file paths, graph entity IDs) and dynamically load data using tools. Anthropic's Claude Code demonstrates this: it uses commands like head, tail, and grep to analyze large datasets without loading full objects into context.
3️⃣ Grow-and-refine with de-duplication: let contexts expand adaptively while using semantic embeddings to prune redundancy. This prevents both information loss and context bloat.

GEPA (Genetic-Pareto prompt evolution) demonstrates a related idea with reflective optimization: an agent analyzes execution traces, identifies which context elements were useful or misleading, and autonomously proposes improvements. It achieved 10-19% better performance than reinforcement learning while using 35× fewer rollouts.

Knowledge graphs are essentially pre-computed indexes of high-signal relationships. Instead of hoping an LLM extracts relationships from unstructured text in context, you make them explicit and queryable.
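The grow-and-refine technique can be sketched as follows, with hand-written vectors standing in for semantic embeddings from a real model; the 0.9 similarity threshold is an arbitrary choice for illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def grow_and_refine(context: list[dict], new_items: list[dict],
                    threshold: float = 0.9) -> list[dict]:
    """Append new bullets, but skip any whose embedding is a near-duplicate
    of an existing bullet. `item["vec"]` stands in for a semantic embedding
    produced by a real model."""
    for item in new_items:
        if all(cosine(item["vec"], kept["vec"]) < threshold for kept in context):
            context.append(item)
    return context

ctx = [{"text": "Pin dependency versions.", "vec": [1.0, 0.0, 0.1]}]
ctx = grow_and_refine(ctx, [
    {"text": "Always pin your dependency versions.", "vec": [0.99, 0.02, 0.1]},  # near-dup, pruned
    {"text": "Run migrations before tests.", "vec": [0.0, 1.0, 0.2]},            # novel, kept
])
```

The point of the threshold check is that the context is allowed to grow (no information loss) while redundant restatements are refused entry (no bloat).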
-
Most discussions around agentic coding are stuck at code generation. That's the wrong abstraction. Code generation is just a stateless transformation. The real system you're trying to build is an iterative, stateful, self-correcting system. This is the problem we have tried to solve with SuperAGI Code Factory.

The actual problem is that LLM agents are:
- stateless
- non-deterministic
- prone to regression
- unaware of runtime behavior

But production software requires:
- state continuity
- deterministic guarantees
- regression safety
- runtime observability

So naive agent pipelines fail after the first iteration. They can generate code, but they cannot maintain systems.

Failure modes we observed
1. Context collapse: prompt-based context ≠ system state. Agents lose structural understanding of large codebases. No global invariant checking.
2. Test drift: static test suites become invalid as code evolves. No co-evolution of tests with code.
3. No runtime grounding: agents optimize for compilation, not execution. Logs, latency, memory, and edge-case failures are ignored.
4. Unclosed feedback loops: errors are observed but not fed back into generation. No convergence, only iteration.

What we built: a closed-loop agentic SDLC
We stopped thinking in terms of "generate code" and instead modeled the system as a continuous control loop:

(Codebase State)
↓
[Agent Write]
↓
[Agent Review]
↓
[Agent Test Synthesis]
↓
[Execution + Runtime Signals]
↓
[Error Attribution + Root Cause]
↓
[Agent Patch]
↓
[Regression Evaluation]
↓
(next state)

Core system primitives for SuperAGI Code Factory
1. Persistent Codebase Graph: code is represented as a graph (files, functions, dependencies). Agents operate on structured diffs, not raw text blobs. Enables locality-aware edits and impact analysis.
2. Deterministic Execution Harness: sandboxed environments for every iteration. Reproducible runs (same inputs → same outputs).
3. Autonomous Test Generation: tests are generated per change, not static. Includes unit tests (function-level invariants) and integration tests (cross-module contracts).
4. Runtime Signal Ingestion: logs, exceptions, traces, and metrics are converted into structured signals: { error_type, stack_trace, input, expected_behavior }. Not just pass/fail: rich debugging context.
5. Error Attribution Engine: maps runtime failures → code regions → agent actions. Enables targeted patching instead of blind regeneration.
6. Patch Agents (not rewrite agents): constrained edits operating on a minimal diff surface. Reduces regression surface area.
7. Regression Evaluation Layer: historical behavior is preserved as invariants. Every change is evaluated against the previous test corpus and behavioral snapshots.
8. Multi-Agent Specialization: Coder (synthesis), Reviewer (static analysis + style + invariants), QA (test generation), Debugger (root cause + patch). Coordination happens via shared state, not prompt chaining.

We are shipping at 100x speed with SuperAGI Code Factory.
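The runtime-signal and error-attribution primitives above can be sketched like this. The `RuntimeSignal` fields mirror the `{ error_type, stack_trace, input, expected_behavior }` shape from the post; the attribution logic is a simplified illustration, not the actual engine:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RuntimeSignal:
    """Structured signal ingested from logs/exceptions, matching the
    { error_type, stack_trace, input, expected_behavior } shape."""
    error_type: str
    stack_trace: list[str]     # frames as "file:function", innermost first
    input: str
    expected_behavior: str

def attribute(signal: RuntimeSignal, recent_edits: dict[str, str]) -> Optional[str]:
    """Toy error attribution: map the failure to the most recent agent edit
    whose file appears in the stack trace, so a patch agent can make a
    targeted fix instead of blindly regenerating."""
    for frame in signal.stack_trace:
        file = frame.split(":", 1)[0]
        if file in recent_edits:
            return recent_edits[file]   # the responsible agent action
    return None

sig = RuntimeSignal(
    error_type="KeyError",
    stack_trace=["billing.py:compute_total", "api.py:handle_request"],
    input='{"order_id": 42}',
    expected_behavior="returns an invoice total",
)
edits = {"billing.py": "coder-agent@step-17", "ui.py": "coder-agent@step-12"}
culprit = attribute(sig, edits)
```

Carrying the input and expected behavior alongside the stack trace is what turns a pass/fail bit into a signal a patch agent can act on.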