A real-time computer vision system that eliminates the physical drone controller entirely. Hand gestures are detected through a live camera feed, classified by a dual rule-based and ML recognition engine, and translated directly into drone flight commands — with safety fail-safes built in.
The system is structured across six integrated tiers, from raw hardware input through to deployed application and CI/CD automation.
┌──────────────────────────────────────────────────────────────────────┐
│  Tier 1 · Hardware Input                                             │
│  OpenCV · xFly SDK · DJI Tello SDK · AirSim                          │
├──────────────────────────────────────────────────────────────────────┤
│  Tier 2 · CV & Gesture Pipeline          [ Core Intelligence ]       │
│  MediaPipe Hands · NumPy · TFLite · Asyncio                          │
├──────────────────────────────────────────────────────────────────────┤
│  Tier 3 · Backend API                    [ Orchestration ]           │
│  FastAPI · SQLite · Pytest                                           │
├──────────────────────────────────────────────────────────────────────┤
│  Tier 4 · Frontend Dashboard             [ Command & Control ]       │
│  React + TypeScript · Recharts · Jest                                │
├──────────────────────────────────────────────────────────────────────┤
│  Tier 5 · Application Deployment                                     │
│  Electron · PWA / Capacitor · Docker                                 │
├──────────────────────────────────────────────────────────────────────┤
│  Tier 6 · DevOps & Docs                                              │
│  Git · GitHub Actions · SwaggerDocs · Overleaf                       │
└──────────────────────────────────────────────────────────────────────┘
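To make the Tier 2 flow concrete, here is a minimal sketch of how a dual engine could combine the two recognizers: the deterministic rules run first, and the ML model is consulted only when no rule matches. The function names, the gesture labels, and the 0.8 confidence threshold are illustrative assumptions, not the project's actual API. The landmark indices follow the MediaPipe Hands 21-point model (fingertips at 8, 12, 16, 20; PIP joints at 6, 10, 14, 18).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Landmark = Tuple[float, float]  # normalised (x, y); y grows downward in image space

# MediaPipe Hands indices for the four non-thumb fingers: (tip, PIP joint).
FINGERS = {"index": (8, 6), "middle": (12, 10), "ring": (16, 14), "pinky": (20, 18)}


@dataclass
class Recognition:
    label: str
    confidence: float
    source: str  # "rule" or "ml"


def extended_fingers(landmarks: Sequence[Landmark]) -> set:
    """Return the fingers whose tip sits above its PIP joint (upright hand assumed)."""
    return {name for name, (tip, pip) in FINGERS.items()
            if landmarks[tip][1] < landmarks[pip][1]}


def rule_based(landmarks: Sequence[Landmark]) -> Optional[Recognition]:
    """Hypothetical rule set covering only two gestures: open palm and fist."""
    up = extended_fingers(landmarks)
    if len(up) == 4:
        return Recognition("open_palm", 1.0, "rule")
    if not up:
        return Recognition("fist", 1.0, "rule")
    return None  # no rule matched


def recognize(landmarks: Sequence[Landmark], ml_model: Callable,
              min_confidence: float = 0.8) -> Optional[Recognition]:
    """Try the deterministic rules first; fall back to the ML model."""
    result = rule_based(landmarks)
    if result is not None:
        return result
    label, confidence = ml_model(landmarks)  # assumed to return (label, confidence)
    if confidence >= min_confidence:
        return Recognition(label, confidence, "ml")
    return None  # too uncertain: issue no command
```

Returning `None` for low-confidence predictions keeps an uncertain classification from ever reaching the drone, which fits the fail-safe intent described above.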
gesture-drone-control/
├── apps/
│   ├── backend/  # FastAPI backend (REST + WebSocket)
│   │   ├── app/
│   │   │   ├── api/  # Route handlers & WebSocket gateway
│   │   │   ├── core/  # Config, startup, app factory
│   │   │   └── dependencies/  # Dependency injection (auth, db, etc.)
│   │   └── tests/  # Pytest unit & integration tests
│   ├── desktop/  # Electron wrapper (Win/Linux/macOS)
│   ├── frontend/  # React + TypeScript dashboard
│   │   ├── public/  # Static assets (favicon, icons)
│   │   ├── src/  # Components, pages, hooks
│   │   └── tests/  # Playwright E2E tests
│   └── mobile/  # Capacitor shell (iOS + Android)
│       ├── android/
│       └── ios/
│
├── services/  # Core Python service layer
│   ├── cv_pipeline/  # Computer vision & gesture recognition
│   │   ├── camera/  # Camera feed capture (OpenCV)
│   │   ├── gestures/  # Gesture engine
│   │   │   └── recognizers/  # Rule-based & ML recognizers
│   │   ├── hand-detection/  # MediaPipe landmark detection
│   │   └── processing/  # Async queue & pipeline
│   ├── drone_control/  # Flight controller
│   │   └── adapters/  # AirSim, Gazebo, xFly adapters
│   ├── input/  # Input source abstraction
│   │   └── sources/  # Gesture, keyboard & generic adapters
│   ├── commands/  # Command model & dispatch logic
│   ├── telemetry/  # Observer, manager & storage
│   │   └── storage/  # SQLite & PostgreSQL repos
│   └── tests/
│       └── cv_pipeline_testing/
│
├── packages/  # Shared code across apps & services
│   ├── contracts/
│   │   ├── python/  # Pydantic schemas
│   │   └── typescript/  # Shared type definitions
│   ├── domain/  # Domain models
│   └── utils/  # Shared utility helpers
│
├── infrastructure/
│   ├── docker/  # Per-service Dockerfiles
│   │   └── airsim/
│   └── scripts/  # Setup & utility scripts
│
├── docs/  # All project documentation
│   ├── api/
│   ├── assets/
│   │   ├── Sequence Diagrams/
│   │   ├── UC Diagrams/
│   │   └── UI/
│   ├── demo/
│   ├── diagrams/
│   ├── reports/
│   └── testing/
│
├── sandbox/  # Manual testing & throwaway scripts
├── tests/
│   └── integration/
├── docker-compose.yml
├── makefile
└── README.md
▸ Computer Vision & ML Backend
| Technology | Purpose | Version |
| --- | --- | --- |
| Python | Primary language for CV, ML, and the drone SDK | 3.11.x |
| OpenCV | Live camera feed & image preprocessing | 4.8+ |
| MediaPipe Hands | Real-time hand landmark detection & tracking | 0.10+ |
| NumPy | Vector math: angles, distances, gesture ID | 1.26+ |
| TFLite | ML-based gesture recognition (secondary) | 2.x |
| Rule-Based Engine | Deterministic, low-latency gesture mapping (no model inference) | Custom |
| Asyncio + Bounded Queue | Non-blocking real-time pipeline | Built-in |
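The "Asyncio + Bounded Queue" entry can be illustrated with a small sketch: a fast producer (standing in for the camera loop) feeds a slower consumer (standing in for gesture recognition) through an `asyncio.Queue` with a fixed `maxsize`. The drop-oldest policy, the function names, and the queue size of 2 are assumptions chosen for illustration; the actual pipeline may handle back-pressure differently.

```python
import asyncio


def put_latest(queue: asyncio.Queue, frame) -> bool:
    """Enqueue a frame without blocking; if the queue is full, evict the oldest first.

    Returns True when an older frame had to be dropped.
    """
    dropped = False
    if queue.full():
        queue.get_nowait()  # discard the stalest frame
        dropped = True
    queue.put_nowait(frame)
    return dropped


async def producer(queue: asyncio.Queue, frames) -> int:
    """Stand-in for the camera loop; returns how many frames were dropped."""
    dropped = 0
    for frame in frames:
        dropped += put_latest(queue, frame)
        await asyncio.sleep(0)  # yield to the event loop between frames
    await queue.put(None)  # sentinel: no more frames
    return dropped


async def consumer(queue: asyncio.Queue) -> list:
    """Stand-in for gesture recognition; deliberately slower than the producer."""
    seen = []
    while (frame := await queue.get()) is not None:
        seen.append(frame)
        await asyncio.sleep(0.01)  # simulate per-frame processing time
    return seen


async def run_pipeline(frames, maxsize: int = 2):
    queue = asyncio.Queue(maxsize=maxsize)
    dropped, seen = await asyncio.gather(producer(queue, frames), consumer(queue))
    return dropped, seen


if __name__ == "__main__":
    dropped, seen = asyncio.run(run_pipeline(range(20)))
    print(f"processed {len(seen)} frames, dropped {dropped}")
```

Dropping stale frames instead of blocking the producer bounds both memory use and end-to-end latency: the recognizer always works on recent frames, at the cost of skipping some.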
▸ Drone Integration
| Technology | Purpose | Version |
| --- | --- | --- |
| xFly SDK | Primary physical drone communication | Latest |
| DJI Tello SDK | Alternative drone hardware interface | Latest |
| AirSim | Primary simulation environment (lightweight) | Latest |
| Gazebo + ArduPilot | High-fidelity physics simulation | 4.x |
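Supporting several drone backends is usually done with an adapter layer, which the `drone_control/adapters/` folder in the repository tree suggests. The sketch below shows the general pattern only: `DroneAdapter`, `InMemoryAdapter`, and `FlightController.hop` are hypothetical names, and no real xFly, Tello, AirSim, or Gazebo calls are made.

```python
from abc import ABC, abstractmethod


class DroneAdapter(ABC):
    """Hypothetical common interface that each backend adapter would implement."""

    @abstractmethod
    def takeoff(self) -> None: ...

    @abstractmethod
    def land(self) -> None: ...

    @abstractmethod
    def move(self, dx: float, dy: float, dz: float) -> None: ...

    @abstractmethod
    def hover(self) -> None: ...


class InMemoryAdapter(DroneAdapter):
    """Test double that records calls instead of talking to hardware or a simulator."""

    def __init__(self) -> None:
        self.log = []
        self.airborne = False

    def takeoff(self) -> None:
        self.airborne = True
        self.log.append("takeoff")

    def land(self) -> None:
        self.airborne = False
        self.log.append("land")

    def move(self, dx: float, dy: float, dz: float) -> None:
        if not self.airborne:
            raise RuntimeError("cannot move while landed")
        self.log.append(f"move({dx}, {dy}, {dz})")

    def hover(self) -> None:
        self.log.append("hover")


class FlightController:
    """Depends only on the DroneAdapter interface, so backends can be swapped."""

    def __init__(self, adapter: DroneAdapter) -> None:
        self.adapter = adapter

    def hop(self) -> None:
        """Take off, climb half a unit, and land again."""
        self.adapter.takeoff()
        self.adapter.move(0.0, 0.0, 0.5)
        self.adapter.land()
```

With this shape, the same controller code could be exercised against a simulator adapter during development and a hardware adapter in flight, and the in-memory double keeps unit tests free of any SDK dependency.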
▸ Backend & Database
| Technology | Purpose | Version |
| --- | --- | --- |
| FastAPI | REST + WebSocket API (ASGI) | 0.110+ |
| SQLite | Gesture logs, telemetry, command history | 3.x |
| PostgreSQL | Scale-out alternative database | 15+ |
| Pytest | Backend unit, integration & API testing | Latest |
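As an illustration of the SQLite role listed above, the following sketch stores and reads back a command history using only the standard-library `sqlite3` module. The table name, column names, and helper functions are assumptions made for this example; the project's real schema is not shown in this README.

```python
import sqlite3
import time
from typing import List, Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the history table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS command_history (
               id         INTEGER PRIMARY KEY AUTOINCREMENT,
               ts         REAL NOT NULL,
               gesture    TEXT NOT NULL,
               command    TEXT NOT NULL,
               confidence REAL NOT NULL
           )"""
    )
    return conn


def log_command(conn: sqlite3.Connection, gesture: str, command: str,
                confidence: float, ts: Optional[float] = None) -> int:
    """Insert one gesture-to-command event and return its row id."""
    ts = time.time() if ts is None else ts
    cur = conn.execute(
        "INSERT INTO command_history (ts, gesture, command, confidence) "
        "VALUES (?, ?, ?, ?)",
        (ts, gesture, command, confidence),
    )
    conn.commit()
    return cur.lastrowid


def recent_commands(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple]:
    """Return the newest events first."""
    rows = conn.execute(
        "SELECT gesture, command, confidence FROM command_history "
        "ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()
```

Parameterised queries (`?` placeholders) keep user-influenced values such as gesture labels out of the SQL text, and the `:memory:` default makes the store easy to use in tests.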
▸ Frontend
| Technology | Purpose | Version |
| --- | --- | --- |
| React | Component-based UI framework | 18 |
| TypeScript | Static typing across the frontend | 5.x |
| Recharts | Real-time telemetry & data visualisation | 2.x |
| Jest | Frontend unit & component testing | Latest |
| Playwright | End-to-end browser testing | Latest |
▸ Deployment
| Technology | Purpose | Version |
| --- | --- | --- |
| Electron | Native desktop packaging (Win/Linux/macOS) | Latest |
| PWA + Capacitor | Cross-platform mobile delivery (iOS/Android) | Latest |
| Docker | Containerisation for consistent environments | Latest |
▸ DevOps & Documentation
| Technology | Purpose |
| --- | --- |
| GitHub Actions | CI/CD: automated testing, linting, quality gates |
| Git | Version control: trunk-based, main / dev / feature/* |
| SwaggerDocs | Auto-generated interactive API documentation |
| Overleaf | Formal written documentation (LaTeX) |
| # | Use Case | Status |
| --- | --- | --- |
| UC-01 | User Registration & Login | ◆ Core |
| UC-02 | Live Hand Gesture Detection & Tracking | ◆ Core |
| UC-03 | Gesture → Drone Command Mapping | ◆ Core |
| UC-04 | Real-Time Dashboard (telemetry, status, feed) | ◆ Core |
| UC-05 | Safety Logic (hover on tracking loss, emergency stop) | ◆ Core |
| UC-06 | Manual Override (gesture → keyboard/controller) | ◇ Optional |
| UC-07 | Gesture Recording & Automated Playback | ◇ Optional |
| UC-08 | Idle Detection Auto-Land | ◇ Optional |
| UC-09 | Gesture-Activated Follow Mode | ◇ Optional |
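The safety behaviours in UC-05 and UC-08 can be sketched as a small monitor that decides which fail-safe action, if any, should override normal gesture control. The class, the timeout values (0.5 s for tracking loss, 30 s for idle), and the priority order are assumptions for illustration, not the project's confirmed logic. Time is passed in explicitly so the logic is easy to test; a real loop would typically supply `time.monotonic()`.

```python
from dataclasses import dataclass
from enum import Enum


class SafetyAction(Enum):
    NONE = "none"
    HOVER = "hover"
    LAND = "land"
    EMERGENCY_STOP = "emergency_stop"


@dataclass
class SafetyMonitor:
    tracking_timeout_s: float = 0.5  # assumed: hover after 0.5 s without a tracked hand
    idle_timeout_s: float = 30.0     # assumed: auto-land after 30 s without a command
    last_hand_seen: float = 0.0
    last_command: float = 0.0
    emergency: bool = False

    def hand_seen(self, now: float) -> None:
        self.last_hand_seen = now

    def command_issued(self, now: float) -> None:
        self.last_command = now

    def trigger_emergency(self) -> None:
        self.emergency = True

    def check(self, now: float) -> SafetyAction:
        """Return the highest-priority action: e-stop, then auto-land, then hover."""
        if self.emergency:
            return SafetyAction.EMERGENCY_STOP
        if now - self.last_command >= self.idle_timeout_s:
            return SafetyAction.LAND
        if now - self.last_hand_seen >= self.tracking_timeout_s:
            return SafetyAction.HOVER
        return SafetyAction.NONE
```

Calling `check()` on every control-loop tick, before dispatching any gesture command, ensures a fail-safe always takes precedence over a stale or missing gesture.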
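Prerequisites
Python 3.11.x with pip, Git, Node.js with Yarn, and (for the containerised run) Docker with docker-compose.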
Clone & install
git clone https://github.com/codex-merchants/gesture-drone-control.git
cd gesture-drone-control
# Backend
cd apps/backend && pip install -r requirements.txt
# Services layer
cd ../../services && pip install -e .
# Frontend
cd ../apps/frontend && yarn install
Run locally
# Terminal A — backend API
cd apps/backend && uvicorn app.main:app --reload
# Terminal B — frontend dashboard
cd apps/frontend && yarn dev
Or with Docker (recommended)
docker-compose up --build
Run tests
# Backend (from apps/backend)
pytest
# Services layer (from services/)
pytest
# Frontend (from apps/frontend)
yarn test # Jest unit tests
yarn playwright # E2E tests
Trunk-based development. The main branch must always be in a deployable state; dev is merged into main before each demo.
| Branch | Purpose |
| --- | --- |
| `main` | Production. Protected. Receives merges from dev before every demo. |
| `dev` | Integration. Completed features land here before main. |
| `feature/<name>` | Short-lived. One branch per feature or fix, branched from dev. |
Status badges: Build (GitHub Actions) · Code Coverage (Coveralls) · Issues · Uptime
| Constraint | Detail |
| --- | --- |
| ▸ Single-User Operation | One operator tracked at a time, which reduces tracking complexity |
| ▸ Indoor Only | Designed and tested for indoor, controlled-lighting environments |
| ▸ Fixed Gesture Set | Predefined commands only (take-off, land, move, hover) |
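A fixed gesture set maps naturally onto an enumeration plus a lookup table. In the sketch below, the four commands come from the constraint above, while the gesture labels on the left-hand side of the mapping are hypothetical, since this README does not list the actual gesture vocabulary.

```python
from enum import Enum
from typing import Optional


class Command(Enum):
    TAKE_OFF = "take-off"
    LAND = "land"
    MOVE = "move"
    HOVER = "hover"


# Hypothetical gesture labels; only the four commands are taken from the README.
GESTURE_TO_COMMAND = {
    "thumbs_up": Command.TAKE_OFF,
    "fist": Command.LAND,
    "point": Command.MOVE,
    "open_palm": Command.HOVER,
}


def to_command(gesture_label: str) -> Optional[Command]:
    """Map a recognised gesture to a command; labels outside the fixed set are ignored."""
    return GESTURE_TO_COMMAND.get(gesture_label)
```

Because unknown labels return `None` rather than raising or guessing, an unrecognised gesture results in no command being sent.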
COS 301 Software Engineering · University of Pretoria · 2026 · EPI-USE