Our co-founder Wangda Tan breaks down how to make LLMs generate reliable charts and visualizations. It’s often treated as easier than SQL or code, but it isn't. Getting it right takes structure, validation, and a lot of iteration.
Tips for generating visualizations of your data with LLMs: This is a commonly underestimated area (and surprisingly, I haven’t seen many discussions of it). I’ve spoken with many customers recently, and they all ask: how can we get LLMs to produce better visualizations? Some tips and lessons learned:

🚀 (Choose a framework & spec) First, decide where you’ll display these visuals: a Python notebook, a JavaScript app, or static images. Then pick a popular framework with a published specification you can validate. Validating the spec isn’t just about correctness; it also guards against security risks. You don’t want an LLM emitting unchecked JavaScript that could expose vulnerabilities.

📊 (Syntax correctness) LLMs “know” Plotly, ECharts, Recharts, Vega-Lite, etc., but not every library fits every context. Embedding Plotly in a web app can balloon dependencies; Recharts lacks a formal spec; that often leaves Vega-Lite or ECharts. We chose Vega-Lite for its balance of power and clear grammar. Even then, you can’t trust an LLM to handle every corner case. For example, misusing `timeUnit` can completely destroy an area chart. Always validate: the chart must render, reference the right fields, apply the right aggregations, and avoid conflicting settings.

✨ (Readability) Once it renders, readability is your next frontier. A bar chart with 50 tick labels is like reading street names on a crowded map. In such cases, swap the axes or show only the top N entries to keep things legible. Watch your labels too: “1,997” for a year or “21,290,189,100” for a total is confusing and hard to read. Better to format them as “1997” or “21B.” Relying on prompt engineering alone to enforce all these conventions is a losing battle. We’ve tried it, and we don't recommend it: it is a painful and frustrating process!

🛠️ (Implementation) To turn these principles into production, we follow this flow:
1. Analyze dataset → survey distributions, min/max, distinct counts
2. Use agents to draft an IR (Intermediate Representation) → a simpler, purpose-built spec that includes simple flags like `regression_line: true` (rendering a regression line plus a scatter plot in Vega-Lite needs a layered chart, which is very hard to get right)
3. Transpile IR → target Vega-Lite (or ECharts/Metabase) spec
4. Compile & validate → check against published spec for correctness & security
5. Fix & test → catch corner cases before release
6. Ship & monitor → hundreds of test cases each cycle ensure that model swaps, new rules, and IR spec changes improve, not regress, results

It still isn’t perfect, but with a validated spec, an intermediate representation, and relentless testing, it’s far more consistent and reliable than asking an LLM to go it alone.
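To make the IR-to-spec step more concrete, here is a minimal Python sketch of what a transpiler for the `regression_line` flag might look like. The IR shape (`x`, `y`, `regression_line`), the function name `transpile_ir`, and the named dataset `table` are assumptions made for illustration, not our actual IR. The only Vega-Lite features it relies on are layered charts and the `regression` transform.

```python
from typing import Any

VEGA_LITE_SCHEMA = "https://vega.github.io/schema/vega-lite/v5.json"


def _xy_encoding(x: str, y: str) -> dict[str, Any]:
    # Build a fresh encoding dict per layer so layers never share state.
    return {
        "x": {"field": x, "type": "quantitative"},
        "y": {"field": y, "type": "quantitative"},
    }


def transpile_ir(ir: dict[str, Any], columns: set[str]) -> dict[str, Any]:
    """Turn a small, hypothetical IR into a Vega-Lite scatter-plot spec."""
    x, y = ir["x"], ir["y"]

    # Cheap check before anything renders: every referenced field must exist.
    missing = [field for field in (x, y) if field not in columns]
    if missing:
        raise ValueError(f"IR references unknown fields: {missing}")

    spec: dict[str, Any] = {
        "$schema": VEGA_LITE_SCHEMA,
        "data": {"name": "table"},  # hypothetical named dataset
    }
    points = {"mark": "point", "encoding": _xy_encoding(x, y)}

    if ir.get("regression_line"):
        # The LLM only sets a boolean; the layering is written once, here.
        line = {
            "mark": "line",
            "transform": [{"regression": y, "on": x}],
            "encoding": _xy_encoding(x, y),
        }
        spec["layer"] = [points, line]
    else:
        spec.update(points)
    return spec


if __name__ == "__main__":
    import json

    ir = {"x": "year", "y": "sales", "regression_line": True}
    print(json.dumps(transpile_ir(ir, {"year", "sales"}), indent=2))
```

The point of this design is that the model never has to write the layered structure itself, so that class of error cannot occur. In a real pipeline the output would additionally be checked against the published Vega-Lite JSON schema, as in step 4.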