Role of AST in Software Code Quality


Summary

An abstract syntax tree (AST) is a structural representation of code that makes its logical flow explicit, helping to detect errors and playing a crucial role in improving software code quality. By breaking code down into its fundamental components, ASTs let both humans and AI tools review, fix, and maintain programs more accurately.

  • Spot hidden mistakes: Use AST-based tools to catch subtle bugs and logic errors that might slip through simple syntax checks.
  • Streamline code reviews: Apply AST parsing to automate code reviews and corrections, making it easier to maintain consistent quality and style across large projects.
  • Improve maintainability: Rely on AST structure to help organize and trace code, so future changes and debugging become quicker and less error-prone.
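
To make the first point concrete, here is a minimal sketch (not taken from any of the posts below) of an AST-based check using Python's standard `ast` module. It flags a mutable default argument, a bug that is syntactically valid and so invisible to a plain syntax check:

```python
import ast

# A subtle bug: a mutable default argument. Syntactically valid,
# so a plain syntax check passes, but the list is shared across calls.
source = """
def append_item(item, bucket=[]):
    bucket.append(item)
    return bucket
"""

tree = ast.parse(source)  # parse the source into an AST

# Walk every node and flag list/dict/set literals used as defaults.
warnings = []
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        for default in node.args.defaults:
            if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                warnings.append(
                    f"{node.name}: mutable default argument on line {default.lineno}"
                )

print(warnings)  # → ['append_item: mutable default argument on line 2']
```

Linters such as pylint implement essentially this kind of traversal, one node-pattern rule at a time.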
Summarized by AI based on LinkedIn member posts
  • Nicolas Gaudilliere

    Vice President - Chief Technology Officer Invent France | Tech, Cloud & Agentic AI for Enterprise Transformation

    5,087 followers

    🤔 How is Google using AI for internal code migrations? In a new paper, Google computer scientists Stoyan Nikolov, Daniele Codecasa, Anna Sjovall, Maxim Tabachnyk, Satish Chandra, Siddharth Taneja, and Celal Ziftci answer that question, detailing how Google's product teams leverage large language models (LLMs) for systematic code migrations across diverse use cases — transforming how engineers maintain and modernize a massive codebase.

    The migrations covered include: changing 32-bit IDs to 64-bit IDs in the 500-plus-million-line codebase for Google Ads; converting the old JUnit3 testing library to JUnit4; and replacing the Joda time library with Java's standard java.time package.

    The approach combines LLMs with abstract syntax tree (AST) techniques to handle complex transformations that would traditionally require hundreds of engineering years: LLMs handle context-aware code changes, while AST methods manage precise syntax modifications. Notable successes include migrating 5,359 files from JUnit3 to JUnit4 in just 3 months, with 87% of AI-generated code requiring no human modification. For the Joda Time to java.time migration, the solution achieved an 89% reduction in engineering time. The system excels at nuanced changes such as updating test files and managing dependencies across Google's monorepo.

    Rather than replacing human engineers, the AI augments their capabilities by generating initial changes that humans then review and refine. This practical approach demonstrates how enterprises can effectively combine AI with existing tools to accelerate large-scale code modernization efforts.

    Original paper: https://lnkd.in/edeBczgn

    Adil Hihi Guillaume Renaud Julie Spens

    #SoftwareEngineering #CodeMigration #LLM #GoogleEngineering #TechInnovation #DeveloperProductivity
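
As a toy analogue of the AST side of such a migration (the module names `legacy_time` and `std_time` are hypothetical, and Google's actual tooling targets Java, not Python), an `ast.NodeTransformer` can mechanically rewrite every call site of an old API — the kind of precise, repetitive syntax change that needs no LLM judgment:

```python
import ast

# Hypothetical migration: rewrite every reference to `legacy_time`
# (old API) into `std_time` (new API), purely by AST rewriting.
class TimeApiMigrator(ast.NodeTransformer):
    def visit_Attribute(self, node):
        self.generic_visit(node)
        # Match `legacy_time.<anything>` and swap the module name.
        if isinstance(node.value, ast.Name) and node.value.id == "legacy_time":
            node.value = ast.copy_location(
                ast.Name(id="std_time", ctx=ast.Load()), node.value
            )
        return node

source = "stamp = legacy_time.now()\nprint(legacy_time.format(stamp))"
tree = TimeApiMigrator().visit(ast.parse(source))
migrated = ast.unparse(ast.fix_missing_locations(tree))
print(migrated)
# → stamp = std_time.now()
#   print(std_time.format(stamp))
```

Because the transform operates on parsed structure rather than text, it cannot accidentally rename an unrelated string or comment the way a regex search-and-replace can.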

  • Raphaël MANSUY

    Data Engineering | DataScience | AI & Innovation | Author | Follow me for deep dives on AI & data-engineering

    33,964 followers

    Grammar in Code LLMs: An Unnecessary Complexity or Hidden Advantage?

    👉 The Counterintuitive Finding
    Do billion-parameter code LLMs "still" need explicit grammar rules? After all, models like DeepSeek-Coder and Qwen2.5 already generate syntactically valid code >99% of the time. But new research reveals a surprising twist:
    - Syntax correctness ≠ semantic accuracy
    - Minor token-level changes (e.g., missing parentheses) often lead to critical logic errors
    - Grammar-based representation helps LLMs detect these subtle differences

    👉 How Grammar Rules Work in LLMs
    Traditional token-based models learn code as sequences of words/symbols. Grammar-based models instead:
    1. Parse code into abstract syntax trees (ASTs)
    2. Represent programs as sequences of grammar rules + tokens
    3. Train models to generate code by "walking" the AST structure
    This forces models to learn "why" code works rather than just "how" it looks.

    👉 The Surprising Results
    The team trained GrammarCoder (1.3B/1.5B params) by extending DeepSeek-Coder and Qwen2.5 with grammar rules. Key outcomes:
    - +7% absolute improvement on MBPP (67.3% vs 60.3% Pass@1)
    - +29% gain on HumanEval (63.4% vs 43.9% Pass@1)
    - Even token-based models with perfect syntax scored lower
    Grammar-based models better distinguished between "similar-looking but logically distinct" code patterns.

    👉 Why This Matters for Practitioners
    1. Code quality – Reduces "careless programmer" errors in generated code
    2. Maintainability – Structural awareness creates more predictable outputs
    3. Interpretability – AST-based generation offers clearer error tracing

    The takeaway? Grammar rules aren't just for catching missing semicolons. They help LLMs develop deeper "structural reasoning" – a critical capability as we push models toward complex software engineering tasks.
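
The "minor token-level change, major logic change" point is easy to demonstrate with Python's own parser. These two expressions differ by a single pair of parentheses, yet their ASTs have entirely different shapes — exactly the structural distinction a grammar-aware representation makes explicit:

```python
import ast

# Two snippets that differ by one token pair (parentheses) but
# parse to structurally different trees.
with_parens = ast.parse("(a + b) * c", mode="eval")
without_parens = ast.parse("a + b * c", mode="eval")

print(ast.dump(with_parens.body))
print(ast.dump(without_parens.body))

# `(a + b) * c` is a Mult node whose left child is an Add;
# `a + b * c` is an Add node whose right child is a Mult.
assert isinstance(with_parens.body.op, ast.Mult)
assert isinstance(without_parens.body.op, ast.Add)
```

A token-level model sees two nearly identical sequences; a model trained on the grammar-rule sequence sees two different derivations from the root down.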

  • Shubham Kothiya

    AI Engineer | Prev SDE Intern @Amazon | MSSE @SJSU | Cal Hacks 12.0 Winner | Building & learning AI 1% every day | Python · ML · GenAI · LLMs · AWS

    4,442 followers

    𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗔𝗜 𝗔𝗴𝗲𝗻𝘁𝘀 - I've Built 𝗖𝗼𝗱𝗲 𝗥𝗲𝘃𝗶𝗲𝘄 𝗔𝘀𝘀𝗶𝘀𝘁𝗮𝗻𝘁: 𝗔𝗻 𝗜𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝘁 𝗠𝘂𝗹𝘁𝗶-𝗔𝗴𝗲𝗻𝘁 𝗦𝘆𝘀𝘁𝗲𝗺 𝗨𝘀𝗶𝗻𝗴 𝗚𝗼𝗼𝗴𝗹𝗲 𝗔𝗗𝗞, 𝗚𝗲𝗺𝗶𝗻𝗶, 𝗮𝗻𝗱 𝗔𝗦𝗧 𝗣𝗮𝗿𝘀𝗶𝗻𝗴

    𝗖𝗼𝗱𝗲 𝗥𝗲𝘃𝗶𝗲𝘄 𝗔𝘀𝘀𝗶𝘀𝘁𝗮𝗻𝘁 acts as an intelligent code mentor, connecting AST-based structural analysis, PEP 8 style validation, sandboxed test execution, and iterative self-healing fixes to provide comprehensive Python code review and automated corrections using Google's Agent Development Kit (ADK).

    🔗 GitHub Repo: https://lnkd.in/d9f9HZJB

    𝗕𝘂𝘁 𝘄𝗵𝘆 𝗖𝗼𝗱𝗲 𝗥𝗲𝘃𝗶𝗲𝘄 𝗔𝘀𝘀𝗶𝘀𝘁𝗮𝗻𝘁: Static code analysis tools are passive; they find bugs but leave the work to you. I built this to create an active agent that bridges the gap between rigid linters and LLMs. By combining AST precision with Gemini's reasoning, it doesn't just flag issues; it autonomously fixes them in a self-correcting loop, acting as a true "junior engineer" rather than just a tool.

    𝗖𝗼𝗿𝗲 𝗛𝗶𝗴𝗵𝗹𝗶𝗴𝗵𝘁𝘀
    • 𝗗𝘂𝗮𝗹-𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲: Sequential review workflow + iterative fix loop with validation
    • 𝗔𝗦𝗧-𝗣𝗼𝘄𝗲𝗿𝗲𝗱 𝗔𝗻𝗮𝗹𝘆𝘀𝗶𝘀: Deep code parsing for functions, classes, complexity metrics
    • 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗲𝗱 𝗙𝗶𝘅𝗶𝗻𝗴: Self-healing code with up to 3 iterative refinement attempts
    • 𝗦𝘁𝘆𝗹𝗲 𝗩𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻: PEP 8 compliance checking with weighted scoring system
    • 𝗠𝘂𝗹𝘁𝗶-𝗔𝗴𝗲𝗻𝘁 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗶𝗼𝗻: 8 specialized agents working in coordinated pipelines

    𝗧𝗲𝗰𝗵 𝗦𝘁𝗮𝗰𝗸
    • Google 𝗔𝗗𝗞 (𝗣𝘆𝘁𝗵𝗼𝗻) → Multi-agent orchestration & state management
    • 𝗚𝗲𝗺𝗶𝗻𝗶 𝟮.𝟱 𝗣𝗿𝗼/𝗙𝗹𝗮𝘀𝗵 → Dual-model strategy for analysis & feedback
    • 𝗖𝗹𝗼𝘂𝗱 𝗦𝗤𝗟 (𝗣𝗼𝘀𝘁𝗴𝗿𝗲𝗦𝗤𝗟) → Session persistence for Cloud Run deployments
    • 𝗣𝘆𝗰𝗼𝗱𝗲𝘀𝘁𝘆𝗹𝗲 → PEP 8 validation engine
    • 𝗣𝘆𝘁𝗵𝗼𝗻 𝗔𝗦𝗧 → Deep structural code analysis

    𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄: User submits code → Sequential review pipeline analyzes structure, style, and tests → Feedback synthesizer generates comprehensive report → User accepts fix offer → Iterative fix loop attempts corrections up to 3 times → Final synthesizer presents before/after comparison.

    𝗞𝗲𝘆 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻𝘀:
    • Async/await with ThreadPoolExecutor for CPU-bound AST parsing operations
    • Type-safe state management using centralized StateKeys constants
    • Weighted style scoring algorithm based on PEP 8 violation severity
    • Conditional loop exit using escalate flag to prevent unnecessary iterations

    #AgenticAI #GenAI #GoogleCloud #Python #SoftwareEngineering #GeminiAI #MachineLearning #DevOps #AI #CodeQuality #Innovation #TechCommunity #BuildWithAI #LLM
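
For a sense of what an "AST-powered analysis" step might look like, here is an illustrative sketch (not the project's actual code — the `analyze` function and its rough complexity heuristic are my own): an AST pass that collects functions, classes, and a branch-count complexity score, the kind of structured report an agent could hand to an LLM reviewer:

```python
import ast

# Branching constructs counted toward a rough cyclomatic-complexity score.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)

def analyze(source: str) -> dict:
    """Collect functions, classes, and rough per-function complexity."""
    tree = ast.parse(source)
    report = {"functions": [], "classes": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # Rough cyclomatic complexity: 1 + number of branch points.
            branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
            report["functions"].append({"name": node.name, "complexity": 1 + branches})
        elif isinstance(node, ast.ClassDef):
            report["classes"].append(node.name)
    return report

sample = """
class Greeter:
    def greet(self, name):
        if name:
            return f"hi {name}"
        return "hi"
"""
print(analyze(sample))
# → {'functions': [{'name': 'greet', 'complexity': 2}], 'classes': ['Greeter']}
```

Because the output is structured data rather than free text, downstream agents can apply thresholds (e.g., flag functions above a complexity limit) deterministically before any LLM is involved.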

