Google AI Edge | Google AI for Developers

Introducing Google AI Edge Portal: Benchmark Edge AI at scale. Sign up to request access during the private preview.

Run LLMs on-device with LiteRT-LM

Production-ready, open-source inference framework designed to deliver high-performance, cross-platform LLM deployments on edge devices.

Spotlight

Check out our latest blog post to see how LiteRT-LM speeds up on-device GenAI deployments and runs Gemma 4 quickly and efficiently, now with newly added Swift, JavaScript, and Flutter APIs.

Why LiteRT-LM?

Cross-platform

Deploy LLMs across Android, iOS, Web, and Desktop.

Hardware accelerated

Maximize performance with GPU and NPU acceleration.

Broad GenAI Capabilities

Supports popular LLMs, multimodality (vision and audio), and tool use.

Start building

Python

Python APIs with hardware acceleration on Linux, macOS, Windows, and Raspberry Pi.

Python Guide

Android

Native Android apps and JVM-based desktop tools.

Android Guide

iOS

Native Swift APIs for iOS (macOS coming soon).

Swift Guide

Web

JavaScript and TypeScript APIs for browser-based web apps with WebGPU acceleration.

Web Guide

Flutter

Build cross-platform Flutter apps using the community-maintained flutter_gemma package.

Flutter Guide

C++

Cross-platform C++ APIs.

C++ Guide

File Builder

Build .litertlm files from converted LiteRT models.

File Builder Guide

Join the Community

LiteRT-LM on GitHub

Contribute to the open-source project, report issues, and see examples.

View on GitHub

Hugging Face

Download pre-converted models (Gemma, Qwen, and more) and join the discussion.

View on Hugging Face

Blogs and Announcements

Supercharge Gemma 4 on-device inference with Multi-Token Prediction (MTP)

Experience >2x faster decode speeds on mobile GPUs with zero quality degradation.

Bring state-of-the-art agentic skills to the edge with Gemma 4

Deploy Gemma 4 in-app and across a broader range of devices with stellar performance and reach using LiteRT-LM.

On-device GenAI in Chrome, Chromebook Plus, and Pixel Watch

Deploy language models on wearables and browser-based platforms using LiteRT-LM at scale.

On-device function calling in Google AI Edge Gallery

Explore how to fine-tune FunctionGemma and enable function calling capabilities powered by LiteRT-LM Tool Use APIs.

Google AI Edge small language models, multimodality, and function calling

Latest insights on RAG, multimodality, and function calling for edge language models.