LiteRT-LM is a production-ready, open-source inference framework for high-performance, cross-platform LLM deployment on edge devices.

Spotlight

Check out our latest blog to see how LiteRT-LM speeds up on-device GenAI deployments and runs Gemma 4 with high speed and efficiency, now with newly added Swift, JavaScript, and Flutter APIs.

Why LiteRT-LM?

Deploy LLMs across Android, iOS, Web, and Desktop.
Maximize performance with GPU and NPU acceleration.
Run popular LLMs with support for multimodality (Vision, Audio) and Tool Use.

Start building

Python APIs with hardware acceleration on Linux, macOS, Windows, and Raspberry Pi.
Native Android apps and JVM-based desktop tools.
Native Swift APIs for iOS (macOS coming soon).
JavaScript and TypeScript APIs for browser-based web apps with WebGPU acceleration.
Build cross-platform Flutter apps using the community-maintained flutter_gemma package.
Cross-platform C++ APIs.
Build .litertlm files from converted LiteRT models.

Join the Community

Contribute to the open-source project, report issues, and see examples.
Download pre-converted models (Gemma, Qwen, and more) and join the discussion.

Blogs and Announcements

Experience >2x faster decode speeds on mobile GPUs with zero quality degradation.
Deploy Gemma 4 in-app and across a broader range of devices with stellar performance and reach using LiteRT-LM.
Deploy language models on wearables and browser-based platforms using LiteRT-LM at scale.
Explore how to fine-tune FunctionGemma and enable function calling capabilities powered by LiteRT-LM Tool Use APIs.
Latest insights on RAG, multimodality, and function calling for edge language models.