Excited to share that mm-ctx is now live on Hugging Face Spaces! Try it in the browser via an interactive terminal without installing anything: https://lnkd.in/g_jhKyk8 mm-ctx – fast, multimodal context for agents. LLM-based agents handle text fine, but as soon as a directory contains images, videos, or PDFs with visual content, they struggle to understand the full context. mm-ctx is meant to feel familiar: the Unix tools we already love (find/cat/grep/wc), rebuilt for file types LLMs can't read natively and designed to work with agents via the CLI. - mm grep "invoice #1234" ~/Downloads searches across PDFs and returns line-numbered matches - mm cat <document>.pdf returns a metadata description of the file - mm cat <photo>.jpg returns a caption of the photo - mm cat <video>.mp4 returns a caption of the video A few things we obsessed over: ⚡ Speed: Rust core for the hot paths 🏠 Local-first, BYO model: Uses any OpenAI-compatible endpoint: Ollama, vLLM/SGLang, LMStudio with any multimodal LLM (Gemma4, Qwen3.5, GLM-4.6V). 🔗 Composable: stdin + structured outputs 🤖 Drops into any agent via mm-cli-skills: Claude Code, Codex, Gemini CLI, OpenClaw. We’d love to hear your feedback! Especially on the CLI and what file types and workflows you would like to see next.
VLM Run
Technology, Information and Internet
Palo Alto, CA 3,963 followers
The Unified Gateway for Visual AI - Extract JSON from images, videos, and PDFs with our Vision Language Models.
About us
Unified Gateway for Visual Intelligence.
- Website
-
https://vlm.run
External link for VLM Run
- Industry
- Technology, Information and Internet
- Company size
- 2-10 employees
- Headquarters
- Palo Alto, CA
- Type
- Privately Held
Locations
-
Primary
Get directions
Palo Alto, CA 94301, US
-
Get directions
2445 Augustine Dr
Spaces - Santa Clara, Suite 103
Santa Clara, CA 95054, US
Employees at VLM Run
Updates
-
VLM Run reposted this
I made a rock climbing tool using computer vision! I prompted VLM Run’s visual agent Orion to segment all of the blue bouldering holds, and it did a good job! It is interesting that now we can prompt VLMs to segment all of the holds, rather than creating a new dataset from scratch to train a model. With holds detection + pose estimation, I can show how each hold gets activated as a hand or foot uses it. Once we touch the final hold with both hands, the route is completed, and I show the overall path of my torso midpoint. A tool like this could help climbers understand their movement better. I’m still very much a beginner at bouldering, so I could use all the help I can get 🤣 There are definitely things to improve, but overall I’m encouraged by this first demo 🙂 Let me know what you think in the comments! Models used: - VLM Run’s Orion for segmentation - ViTPose+ Huge for pose estimation (via Hugging Face 🤗) - RT-DETR for person detection (via Hugging Face 🤗) Shoutout to Daniel Reiff and his bouldering + computer vision project for the inspiration! #ai #machinelearning #computervision #vlmrun #huggingface #rockclimbing #bouldering
-
Manually parsing handwritten intake forms can be slow and prone to error, while VLM Run's HIPAA-ready API allows you to extract the same details in seconds. In this tutorial by Jeremy Park, PhD, learn how to use VLM Run to extract structured JSON from handwritten healthcare documents at scale. Through this walkthrough, you will learn how to: - Upload documents in the Requests tab and run them against your saved skills - Enable confidence scores and grounding to see exactly where each field came from in the original document - Edit incorrect extractions and provide feedback to improve extraction over time - Run the same workflow programmatically via the VLM Run API as shown in Google Colab
-
Announcing Orion Skills! 🚀 Rather than rewriting prompts every time you want to define a specific task, you can now package all of that knowledge into a reusable skill. Why skills? - Reusable: Create a skill once, reference it from any endpoint (image, document, video, audio, agent) - Versionable: Pin a specific skill version for reproducible results, or use "latest" to always get the newest revision - Composable: Pass multiple skills in a single request, or combine them with custom schemas Unlike purely text-based skills, we have reimagined what skills mean for visual agents and how to codify visual workflows into skills. Try skills in chat today! And check out this skills creation tutorial by Jeremy Park, PhD 👇
-
The AI agent skills conversation focuses mainly on text. But for many tasks, words fall short. We need visual skills: providing images and videos as context, not just text. It's one thing to describe to a robot how to fold a t-shirt. It's another to show it a video. At VLM Run, that's what we're building: visual agents that understand visual data and act on it.
-
Visual AI agents for identifying the most delicious blueberries?! 🫐 Jeremy Park, PhD recently shared his PhD research on computer vision for blueberries and new results using VLM Run's visual agent Orion for segmentation, detection, and metadata tagging. Link in the comments! #agtech #visualanalytics #computervision #blueberry
-
What if visual AI agents could help give feedback on exercise? Jeremy Park, PhD recently reviewed our Orion visual AI agent for providing deadlift feedback. He raises the question: what if visual intelligence could be made accessible for applications in exercise, all through a chat interface? Read the Substack blog here: https://lnkd.in/gx_krdsM
-
Healthcare documents come fragmented across PDFs, images, emails, and faxed scans. OCR fails because real-world documents require visual reasoning of layout and context – not just plain text extraction. Scan.com processes high volumes of documents and images where both speed and accuracy matter. They needed automation that could handle the diversity and complexity of healthcare documents. We built it together with Orion. In a single call: • Classifies multi-page document bundles • Extracts data from emails and attachments • Understands checkboxes, handwriting, and layout • Visually verifies for high confidence The result: faster processing, reduced manual QA, reliable structured data. Document automation isn't a text problem. It's a visual reasoning problem. Read more: https://lnkd.in/gd93-xUY
-
-
We're hiring our first infra engineer (senior/staff) at VLM Run! We're processing tens of millions of VLM requests per month and scaling fast; we're looking for a founding Infrastructure Engineer to serve and operationalize our GPU workloads (custom runtimes on vLLM/Hugging Face transformers, orchestrated with Ray/Modal). The work is technically challenging, the learning curve is steep (in the best way), and you'll be joining a stellar ML team building the go-to visual intelligence platform for enterprises. In-person ONLY. Apply here → https://lnkd.in/gc8GJGVu Tag someone who'd crush this 👇 #hiring #llm #vlm #ai #computervision #infra #k8s
-
🏈 Super Bowl hits different when it's less than a mile from your office. We couldn’t help but generate this 4K poster with chat.vlm.run combining two star quarterbacks into a single face-off. You’ll likely see it while walking to the stadium. Wishing both teams the best of luck tonight. #SuperBowl #SBLX #VLMRun #SeattleSeahawks #NewEnglandPatriots VLM Run
-