LiteRT-LM is Google's production-ready, high-performance, open-source inference framework for deploying Large Language Models on edge devices.
Product Website
- Swift APIs: Natively integrate LiteRT-LM into iOS applications with Metal GPU acceleration. See the Swift Guide.
- Web JavaScript APIs: Run models inside web browsers with high performance via web GPU/CPU. See the JavaScript Guide.
- LiteRT-LM CLI Update: The command-line interface now supports NPU in addition to the CPU and GPU backends across Linux, macOS, and Windows. See the CLI Guide.
- Community-Maintained Flutter APIs: Build cross-platform Flutter applications using the community flutter_gemma package. See the Flutter Guide.
Try Gemma4-E4B with MTP on Linux, macOS, Windows or Raspberry Pi with the LiteRT-LM CLI:

    litert-lm run \
      --from-huggingface-repo=litert-community/gemma-4-E4B-it-litert-lm \
      gemma-4-E4B-it.litertlm \
      --backend=gpu \
      --enable-speculative-decoding=true \
      --prompt="What is the capital of France?"

- Cross-Platform Support: Android, iOS, Web, Desktop, and IoT (e.g. Raspberry Pi).
- Hardware Acceleration: Peak performance via GPU and NPU accelerators.
- Multi-Modality: Support for vision and audio inputs.
- Tool Use: Function calling support for agentic workflows.
- Broad Model Support: Gemma, Llama, Phi-4, Qwen, and more.
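
To compare backends, the CLI invocation shown earlier can be generated once per backend. The following is a minimal dry-run sketch, not an official tool: it only prints the commands and does not run them. The `build_cmd` helper is hypothetical, and `cpu` as a value for `--backend` is an assumption, since only `--backend=gpu` appears in this README. The repository name, model file, and remaining flags are copied from the example above.

```shell
# Dry-run sketch: print one litert-lm command per backend instead of executing it.
# build_cmd is a hypothetical helper; the repo, model file, and flags come from
# the CLI example in this README.
build_cmd() {
  # $1 is the backend name passed to --backend
  printf 'litert-lm run --from-huggingface-repo=%s %s --backend=%s --enable-speculative-decoding=true --prompt="%s"\n' \
    "litert-community/gemma-4-E4B-it-litert-lm" \
    "gemma-4-E4B-it.litertlm" \
    "$1" \
    "What is the capital of France?"
}

# "cpu" is an assumed backend value; only "gpu" is shown in this README.
for backend in cpu gpu; do
  build_cmd "$backend"
done
```

Printing the commands first makes it easy to review them before running anything on a device with limited memory.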
LiteRT-LM powers on-device GenAI experiences in Chrome, Chromebook Plus, Pixel Watch, and more.
You can also try the Google AI Edge Gallery app to run models immediately on your device.
Install the app today from Google Play or the App Store.
| Link | Description |
|---|---|
| Blazing-fast on-device GenAI with LiteRT-LM | Unlock Gemma 4's full potential with blazing speed and incredible efficiency using newly added Swift, JavaScript, and Flutter APIs. |
| Accelerating Gemma 4: faster inference with multi-token prediction drafters | An overview of how Multi-Token Prediction (MTP) drafters are making Gemma 4 models up to 3x faster at inference. |
| Bring state-of-the-art agentic skills to the edge with Gemma 4 | Deploy Gemma 4 in-app and across a broader range of devices with stellar performance and broad reach using LiteRT-LM. |
| On-device GenAI in Chrome, Chromebook Plus and Pixel Watch | Deploy language models on wearables and browser-based platforms using LiteRT-LM at scale. |
| On-device Function Calling in Google AI Edge Gallery | Explore how to fine-tune FunctionGemma and enable function calling capabilities powered by LiteRT-LM Tool Use APIs. |
| Google AI Edge small language models, multimodality, and function calling | Latest insights on RAG, multimodality, and function calling for edge language models. |
- Technical Overview including performance benchmarks, model support, and more.
- LiteRT-LM CLI Guide including installation, getting started, and advanced usage.
Try LiteRT-LM immediately from your terminal without writing a single line of
code using uv:

    uv tool install litert-lm
    litert-lm run \
      --from-huggingface-repo=google/gemma-3n-E2B-it-litert-lm \
      gemma-3n-E2B-it-int4 \
      --prompt="What is the capital of France?"

Ready to get started? Explore our language-specific guides and setup instructions.
| Language | Status | Best For... | Documentation |
|---|---|---|---|
| Python | Stable | Prototyping & Scripting | Python Guide |
| Kotlin | Stable | Android apps & JVM | Kotlin Guide |
| Swift | Early Preview | Native iOS & macOS | Swift Guide |
| JavaScript (web) | Early Preview | Browser environments | JavaScript Guide |
| Flutter | Community | Cross-platform mobile | Flutter Guide |
| C++ | Stable | High-performance native | C++ Guide |
This guide shows how to compile LiteRT-LM from source. If you build from source, check out the stable tag.
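
When choosing a tag to check out, plain alphabetical sorting ranks `v0.9.0` above `v0.12.0`, so a version-aware sort is needed. The sketch below picks the highest `vX.Y.Z` name from a list. It assumes the stable tag corresponds to the newest release tag, which this README does not confirm, and `latest_tag` is a hypothetical helper. The tag names are the release versions listed in this README.

```shell
# Pick the highest vX.Y.Z tag from a list of tag names on stdin.
# Relies on `sort -V` (version sort), available in GNU coreutils.
# latest_tag is a hypothetical helper, not part of LiteRT-LM.
latest_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}

# Example using the release versions listed in this README.
printf '%s\n' v0.7.0 v0.8.0 v0.9.0 v0.10.1 v0.11.0 v0.12.0 | latest_tag
# prints v0.12.0

# Inside a clone of the repository this could become (not run here, and
# assuming the newest release tag is the stable one):
#   git fetch --tags
#   git checkout "$(git tag --list | latest_tag)"
```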
- v0.12.0: Added early preview of Swift and Web JavaScript APIs, and community Flutter support. Updated LiteRT-LM CLI to have full CPU and GPU backend support across Linux, macOS, and Windows.
- v0.11.0: Support Single Position Multi-token Prediction (MTP) for Gemma 4. Expand LiteRT-LM CLI to run natively on Windows with CPU and GPU backends.
- v0.10.1: Deploy Gemma 4 with stellar performance (blog) and introduce LiteRT-LM CLI.
- v0.9.0: Improved function calling capabilities and better app performance and stability.
- v0.8.0: Desktop GPU support and Multi-Modality.
- v0.7.0: NPU acceleration for Gemma models.
For a full list of releases, see GitHub Releases.

