.github/workflows
build_config
c
cmake
cxxbridge_cmd
docs
kotlin
prebuilt
python
runtime
rust
samples/ios
schema
src
swift
tools/test
.bazeliskrc
.bazelrc
.bazelversion
.gitattributes
.gitignore
BUILD
BUILD.antlr4
BUILD.directx_shader_compiler
BUILD.llguidance
BUILD.miniaudio
BUILD.minizip
BUILD.minja
BUILD.nanobind_json
BUILD.sentencepiece
BUILD.stb
BUILD.tokenizers_cpp
CMakeLists.txt
CMakePresets.json
CONTRIBUTING.md
Cargo.lock
Cargo.toml
LICENSE
PATCH.llguidance
PATCH.llguidance_grammar
PATCH.llguidance_numeric
PATCH.llguidance_parser
PATCH.llguidance_perf
PATCH.llguidance_regexvec
PATCH.minja
PATCH.nanobind_json
PATCH.rules_rust
PATCH.sentencepiece
PATCH.tensorflow
PATCH.toktrie
Package.swift
README.md
WORKSPACE
__init__.py
android_ndk_env.bzl
cargo-bazel-lock.json
requirements.txt
rust_cxx_bridge.bzl
version.bzl

LiteRT-LM

LiteRT-LM is Google's production-ready, high-performance, open-source inference framework for deploying Large Language Models on edge devices.

🔗 Product Website

🔥 What's New: `v0.12.0`

Swift APIs: Natively integrate LiteRT-LM into iOS applications with Metal GPU acceleration. See the Swift Guide.
Web JavaScript APIs: Run models inside web browsers with high performance via web GPU/CPU. See the JavaScript Guide.
LiteRT-LM CLI Update: The command-line interface now has full CPU and GPU backend support across Linux, macOS, and Windows. See the CLI Guide.
Community-Maintained Flutter APIs: Build cross-platform Flutter applications using the community flutter_gemma package. See the Flutter Guide.

👉 Try Gemma 4 E4B with Multi-token Prediction (MTP) on Linux, macOS, Windows, or Raspberry Pi with the LiteRT-LM CLI:

litert-lm run \
  --from-huggingface-repo=litert-community/gemma-4-E4B-it-litert-lm \
  gemma-4-E4B-it.litertlm \
  --backend=gpu \
  --enable-speculative-decoding=true \
  --prompt="What is the capital of France?"

🌟 Key Features

📱 Cross-Platform Support: Android, iOS, Web, Desktop, and IoT (e.g. Raspberry Pi).
🚀 Hardware Acceleration: Peak performance via GPU and NPU accelerators.
👁️ Multi-Modality: Support for vision and audio inputs.
🔧 Tool Use: Function calling support for agentic workflows.
📚 Broad Model Support: Gemma, Llama, Phi-4, Qwen, and more.

🚀 Production-Ready for Google's Products

LiteRT-LM powers on-device GenAI experiences in Chrome, Chromebook Plus, Pixel Watch, and more.

You can also try the Google AI Edge Gallery app to run models immediately on your device.

Install the app today from Google Play or the App Store.

📰 Blogs & Announcements

| Link | Description |
| --- | --- |
| Bring state-of-the-art agentic skills to the edge with Gemma 4 | Deploy Gemma 4 in-app and across a broader range of devices with stellar performance and broad reach using LiteRT-LM. |
| On-device GenAI in Chrome, Chromebook Plus and Pixel Watch | Deploy language models on wearables and browser-based platforms using LiteRT-LM at scale. |
| On-device Function Calling in Google AI Edge Gallery | Explore how to fine-tune FunctionGemma and enable function calling capabilities powered by LiteRT-LM Tool Use APIs. |
| Google AI Edge small language models, multimodality, and function calling | Latest insights on RAG, multimodality, and function calling for edge language models. |

🏃 Quick Start

🔗 Key Links

👉 Technical Overview including performance benchmarks, model support, and more.
👉 LiteRT-LM CLI Guide including installation, getting started, and advanced usage.

⚡ Quick Try (No Code)

Install the CLI with uv and try LiteRT-LM from your terminal without writing a single line of code:

uv tool install litert-lm

litert-lm run \
  --from-huggingface-repo=google/gemma-3n-E2B-it-litert-lm \
  gemma-3n-E2B-it-int4 \
  --prompt="What is the capital of France?"
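To script the same invocation, the argument list can be assembled programmatically. The following is a minimal sketch, not part of LiteRT-LM: the helper name `build_run_command` is hypothetical, and it uses only the flags that appear in this README's example commands. It builds and prints the command rather than running it, so it works without the CLI installed.

```python
import shlex
from typing import List, Optional


def build_run_command(
    repo: str,
    model_file: str,
    prompt: str,
    backend: Optional[str] = None,
    speculative_decoding: bool = False,
) -> List[str]:
    """Assemble a `litert-lm run` argument list (hypothetical helper).

    Only flags shown in the README examples are emitted.
    """
    cmd = ["litert-lm", "run", f"--from-huggingface-repo={repo}", model_file]
    if backend is not None:
        cmd.append(f"--backend={backend}")
    if speculative_decoding:
        cmd.append("--enable-speculative-decoding=true")
    cmd.append(f"--prompt={prompt}")
    return cmd


cmd = build_run_command(
    repo="google/gemma-3n-E2B-it-litert-lm",
    model_file="gemma-3n-E2B-it-int4",
    prompt="What is the capital of France?",
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))

# To execute it (assumes `uv tool install litert-lm` has been run):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.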

📚 Supported Language APIs

Ready to get started? Explore our language-specific guides and setup instructions.

Language	Status	Best For...	Documentation
Python	✅ Stable	Prototyping & Scripting	Python Guide
Kotlin	✅ Stable	Android apps & JVM	Kotlin Guide
Swift	🚀 Early Preview	Native iOS & macOS	Swift Guide
JavaScript (web)	🚀 Early Preview	Browser environments	JavaScript Guide
Flutter	🚀 Community	Cross-platform mobile	Flutter Guide
C++	✅ Stable	High-performance native	C++ Guide

🏗️ Build From Source

This guide shows how to compile LiteRT-LM from source. Before building, check out the stable tag.
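If you need to pin a build to a specific release instead of the stable tag, note that version names do not sort correctly as plain text: `v0.9.0` compares greater than `v0.10.1` as a string. The sketch below compares versions numerically. It assumes release tags follow the `vMAJOR.MINOR.PATCH` pattern used in this README's release notes; the helper names `parse_tag` and `newest_tag` are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed tag shape: "v" followed by three dot-separated integers.
_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch) as integers, or None if the tag does not match."""
    match = _TAG.match(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def newest_tag(tags: Iterable[str]) -> Optional[str]:
    """Pick the tag with the highest numeric version, ignoring non-version tags."""
    versioned = []
    for tag in tags:
        version = parse_tag(tag)
        if version is not None:
            versioned.append((version, tag))
    return max(versioned)[1] if versioned else None


tags = ["v0.7.0", "v0.8.0", "v0.9.0", "v0.10.1", "v0.11.0", "v0.12.0"]
print(newest_tag(tags))               # v0.12.0
print(max(["v0.9.0", "v0.10.1"]))     # v0.9.0 (plain string comparison is wrong)
```

The tag list could come from the output of `git tag --list` in a local clone.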

📦 Releases

v0.12.0: Added early preview of Swift and Web JavaScript APIs, and community Flutter support. Updated LiteRT-LM CLI to have full CPU and GPU backend support across Linux, macOS, and Windows.
v0.11.0: Support Single Position Multi-token Prediction (MTP) for Gemma 4. Expand LiteRT-LM CLI to run natively on Windows with CPU and GPU backends.
v0.10.1: Deploy Gemma 4 with stellar performance (blog) and introduce LiteRT-LM CLI.
v0.9.0: Improvements to function calling capabilities, and better app performance and stability.
v0.8.0: Desktop GPU support and Multi-Modality.
v0.7.0: NPU acceleration for Gemma models.

For a full list of releases, see GitHub Releases.
