VLM Run’s cover photo
VLM Run

VLM Run

Technology, Information and Internet

Palo Alto, CA 3,963 followers

The Unified Gateway for Visual AI - Extract JSON from images, videos, and PDFs with our Vision Language Models.

About us

Unified Gateway for Visual Intelligence.

Website
https://vlm.run
Industry
Technology, Information and Internet
Company size
2-10 employees
Headquarters
Palo Alto, CA
Type
Privately Held

Locations

Employees at VLM Run

Updates

  • Excited to share that mm-ctx is now live on Hugging Face Spaces! Try it in the browser via an interactive terminal without installing anything: https://lnkd.in/g_jhKyk8 mm-ctx – fast, multimodal context for agents. LLM-based agents handle text fine, but as soon as a directory contains images, videos, or PDFs with visual content, they struggle to understand the full context. mm-ctx is meant to feel familiar: the Unix tools we already love (find/cat/grep/wc), rebuilt for file types LLMs can't read natively and designed to work with agents via the CLI. - mm grep "invoice #1234" ~/Downloads searches across PDFs and returns line-numbered matches - mm cat <document>.pdf returns a metadata description of the file - mm cat <photo>.jpg returns a caption of the photo - mm cat <video>.mp4 returns a caption of the video A few things we obsessed over: ⚡ Speed: Rust core for the hot paths 🏠 Local-first, BYO model: Uses any OpenAI-compatible endpoint: Ollama, vLLM/SGLang, LMStudio with any multimodal LLM (Gemma4, Qwen3.5, GLM-4.6V). 🔗 Composable: stdin + structured outputs 🤖 Drops into any agent via mm-cli-skills: Claude Code, Codex, Gemini CLI, OpenClaw. We’d love to hear your feedback! Especially on the CLI and what file types and workflows you would like to see next.

    • No alternative text description for this image
  • VLM Run reposted this

    I made a rock climbing tool using computer vision! I prompted VLM Run’s visual agent Orion to segment all of the blue bouldering holds, and it did a good job! It is interesting that now we can prompt VLMs to segment all of the holds, rather than creating a new dataset from scratch to train a model. With holds detection + pose estimation, I can show how each hold gets activated as a hand or foot uses it. Once we touch the final hold with both hands, the route is completed, and I show the overall path of my torso midpoint. A tool like this could help climbers understand their movement better. I’m still very much a beginner at bouldering, so I could use all the help I can get 🤣 There are definitely things to improve, but overall I’m encouraged by this first demo 🙂 Let me know what you think in the comments! Models used: - VLM Run’s Orion for segmentation - ViTPose+ Huge for pose estimation (via Hugging Face 🤗) - RT-DETR for person detection (via Hugging Face 🤗) Shoutout to Daniel Reiff and his bouldering + computer vision project for the inspiration! #ai #machinelearning #computervision #vlmrun #huggingface #rockclimbing #bouldering

  • Manually parsing handwritten intake forms can be slow and prone to error, while VLM Run's HIPAA-ready API allows you to extract the same details in seconds. In this tutorial by Jeremy Park, PhD, learn how to use VLM Run to extract structured JSON from handwritten healthcare documents at scale. Through this walkthrough, you will learn how to: - Upload documents in the Requests tab and run them against your saved skills - Enable confidence scores and grounding to see exactly where each field came from in the original document - Edit incorrect extractions and provide feedback to improve extraction over time - Run the same workflow programmatically via the VLM Run API as shown in Google Colab

  • Announcing Orion Skills! 🚀 Rather than rewriting prompts every time you want to define a specific task, you can now package all of that knowledge into a reusable skill. Why skills? - Reusable: Create a skill once, reference it from any endpoint (image, document, video, audio, agent) - Versionable: Pin a specific skill version for reproducible results, or use "latest" to always get the newest revision - Composable: Pass multiple skills in a single request, or combine them with custom schemas Unlike purely text-based skills, we have reimagined what skills mean for visual agents and how to codify visual workflows into skills. Try skills in chat today! And check out this skills creation tutorial by Jeremy Park, PhD 👇

  • The AI agent skills conversation focuses mainly on text. But for many tasks, words fall short. We need visual skills: providing images and videos as context, not just text. It's one thing to describe to a robot how to fold a t-shirt. It's another to show it a video. At VLM Run, that's what we're building: visual agents that understand visual data and act on it.

  • View organization page for VLM Run

    3,963 followers

    Healthcare documents come fragmented across PDFs, images, emails, and faxed scans. OCR fails because real-world documents require visual reasoning of layout and context – not just plain text extraction. Scan.com processes high volumes of documents and images where both speed and accuracy matter. They needed automation that could handle the diversity and complexity of healthcare documents. We built it together with Orion. In a single call: • Classifies multi-page document bundles • Extracts data from emails and attachments • Understands checkboxes, handwriting, and layout • Visually verifies for high confidence The result: faster processing, reduced manual QA, reliable structured data. Document automation isn't a text problem. It's a visual reasoning problem. Read more: https://lnkd.in/gd93-xUY

    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
  • We're hiring our first infra engineer (senior/staff) at VLM Run! We're processing tens of millions of VLM requests per month and scaling fast; we're looking for a founding Infrastructure Engineer to serve and operationalize our GPU workloads (custom runtimes on vLLM/Hugging Face transformers, orchestrated with Ray/Modal). The work is technically challenging, the learning curve is steep (in the best way), and you'll be joining a stellar ML team building the go-to visual intelligence platform for enterprises. In-person ONLY. Apply here → https://lnkd.in/gc8GJGVu Tag someone who'd crush this 👇 #hiring #llm #vlm #ai #computervision #infra #k8s

Similar pages

Browse jobs

Funding

VLM Run 1 total round

Last Round

Seed
See more info on crunchbase