Kip — Voice Assistant
Codename: Kip
Purpose: Alexa replacement for D J's girlfriend
Architecture: Steam Deck (thin client) ↔ Proxmox LXC (brains)
Overview
Kip is a privacy-first voice assistant. The Steam Deck acts as a dumb terminal (mic, speaker, screen). All intelligence runs on an LXC container on Proxmox.
Hardware
Steam Deck (Client)
- Always-on, propped up in kitchen/living room on charging dock
- Runs: wake word detection, audio capture, audio playback, display UI
- Connects to LXC over local WiFi
LXC Container (Server)
- OS: Ubuntu 22.04 or 24.04
- RAM: 4GB recommended (Whisper needs ~1.5GB, Piper ~200MB, OpenClaw ~500MB)
- Disk: 10GB (models + data)
- CPU: 2-4 cores (Whisper STT is CPU-bound)
- Network: Static IP on LAN, accessible from Steam Deck
Software Stack
LXC Container
| Component | Purpose | Tool |
|---|---|---|
| STT | Speech-to-text | Faster Whisper (base.en model) |
| TTS | Text-to-speech | Piper (en_US voice) |
| Agent | Intelligence | OpenClaw with Kip agent |
| API | Communication | FastAPI HTTP server |
| Data | Grocery list, timers | JSON files + SQLite |
Steam Deck
| Component | Purpose | Tool |
|---|---|---|
| Wake word | "Hey Kip" detection | OpenWakeWord |
| Audio capture | Record after wake | PyAudio / sounddevice |
| Audio playback | Play TTS responses | PyAudio / sounddevice |
| UI | Display info | Web browser (fullscreen PWA) or PyQt |
| Client | Talk to LXC | Python HTTP client |
LXC Setup Checklist
D J creates the LXC with:
- Ubuntu 22.04 or 24.04 template
- 4GB RAM, 2-4 CPU cores, 10GB disk
- Static IP on LAN (e.g., 192.168.86.XX)
- SSH access enabled (key-based)
- Audio passthrough NOT needed (LXC doesn't play audio — Deck does)
- Internet access (for OpenClaw, model downloads)
- Hostname: kip (optional, nice to have)
Once created, give Case:
- IP address
- SSH credentials or key
API Design (LXC ↔ Steam Deck)
POST /listen
Steam Deck sends audio, gets back text response + TTS audio.
Request:
Content-Type: multipart/form-data
Body: audio file (WAV, 16kHz mono)
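As a rough sketch of the request side, the Deck could check the recording and wrap it in a multipart body using only the standard library. The form field name `audio`, the filename `utterance.wav`, and the helper names are assumptions for illustration; the spec above only fixes the content type and the WAV format.

```python
import io
import uuid
import wave


def make_silent_wav(seconds: float = 0.1, rate: int = 16000) -> bytes:
    """Return a mono 16-bit WAV of silence (stand-in for a real recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def check_wav(data: bytes) -> None:
    """Raise ValueError if a WAV is not 16 kHz mono (non-WAV data raises wave.Error)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getframerate() != 16000:
            raise ValueError("expected 16 kHz mono WAV")


def build_multipart(data: bytes, field: str = "audio", filename: str = "utterance.wav"):
    """Encode one file as multipart/form-data; returns (content_type, body)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", head + data + tail
```

The returned content type and body can be passed to any HTTP client; an HTTP library that builds multipart bodies itself would make `build_multipart` unnecessary.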
Response:
{
  "text": "user said this",
  "response": "Kip says this",
  "audio": "<base64 WAV of TTS response>",
  "ui_update": {
    "type": "grocery_list",
    "data": ["eggs", "milk", "bread"]
  }
}
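The response shape above could be packed on the LXC and unpacked on the Deck roughly as follows. This is a minimal sketch: the function and class names are hypothetical, and treating `ui_update` as optional is an assumption, since the spec does not say whether every reply carries one.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListenResult:
    text: str                   # what the user said (STT output)
    response: str               # what Kip answers
    audio: bytes                # decoded WAV bytes, ready for playback
    ui_update: Optional[dict]   # e.g. {"type": "grocery_list", "data": [...]}


def encode_listen_response(text, response, wav_bytes, ui_update=None) -> str:
    """Server side: pack the fields into the JSON shape of POST /listen."""
    payload = {
        "text": text,
        "response": response,
        "audio": base64.b64encode(wav_bytes).decode("ascii"),
    }
    if ui_update is not None:   # assumption: omitted when there is nothing to show
        payload["ui_update"] = ui_update
    return json.dumps(payload)


def decode_listen_response(body: str) -> ListenResult:
    """Client side: parse the JSON and turn the base64 audio back into bytes."""
    payload = json.loads(body)
    return ListenResult(
        text=payload["text"],
        response=payload["response"],
        audio=base64.b64decode(payload["audio"]),
        ui_update=payload.get("ui_update"),
    )
```

Base64 adds roughly a third to the audio size, which should be acceptable for short TTS replies on a LAN.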
GET /status
Health check + current state (timers, lists, etc.)
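One way the timer part of that state might be tracked is sketched below. The class name, the `remaining_s` and `done` fields, and the overall status shape are all assumptions; the spec only says the endpoint reports health plus timers and lists.

```python
import time


class TimerBoard:
    """Tracks named countdown timers against a monotonic clock."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so tests can fake time
        self._deadlines = {}     # label -> absolute deadline in clock seconds

    def start(self, label: str, seconds: float) -> None:
        self._deadlines[label] = self._clock() + seconds

    def cancel(self, label: str) -> bool:
        """Remove a timer; returns False if no such timer existed."""
        return self._deadlines.pop(label, None) is not None

    def snapshot(self) -> list:
        """Timers as JSON-friendly dicts, soonest first."""
        now = self._clock()
        rows = [
            {
                "label": label,
                "remaining_s": max(0, round(deadline - now)),
                "done": deadline <= now,
            }
            for label, deadline in self._deadlines.items()
        ]
        return sorted(rows, key=lambda row: row["remaining_s"])


def build_status(timers: TimerBoard, grocery: list) -> dict:
    """Hypothetical payload for GET /status."""
    return {"ok": True, "timers": timers.snapshot(), "grocery": list(grocery)}
```

Using a monotonic clock keeps countdowns correct if the container's wall clock is adjusted, though timers kept only in memory would be lost on restart.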
GET /grocery
Returns current grocery list (for phone web view)
POST /grocery
Add/remove items (for phone web view)
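Both grocery endpoints could sit on top of a small JSON-file store like the sketch below. The class name, the lowercase normalization, and the duplicate handling are assumptions, not decisions recorded in this plan.

```python
import json
from pathlib import Path


class GroceryList:
    """Persistent grocery list backed by a single JSON file."""

    def __init__(self, path):
        self.path = Path(path)

    def items(self) -> list:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _save(self, items) -> None:
        # Write to a temp file first so a crash cannot leave a half-written list.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(items), encoding="utf-8")
        tmp.replace(self.path)

    def add(self, item: str) -> list:
        item = item.strip().lower()
        items = self.items()
        if item and item not in items:   # ignore blanks and duplicates
            items.append(item)
            self._save(items)
        return items

    def remove(self, item: str) -> list:
        item = item.strip().lower()
        items = [i for i in self.items() if i != item]
        self._save(items)
        return items
```

Because the voice skill and the phone web view would share the same file, both see the same list without any extra sync step.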
GET /ui
Returns current display state for the Deck's screen.
Kip Agent (OpenClaw)
Kip gets its own OpenClaw agent with:
SOUL.md (Personality)
- Name: Kip
- Friendly, concise, warm
- Optimized for voice — short responses, no markdown
- Knows the household (D J, girlfriend, 4 cats)
- Designed for non-technical user
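Even with a prompt that asks for plain speech, a model may still emit markdown now and then. A defensive cleanup pass before TTS might look like this sketch; the function name and the three-sentence cap are assumptions chosen for illustration.

```python
import re


def to_speech_text(reply: str, max_sentences: int = 3) -> str:
    """Strip common markdown and keep only the first few sentences."""
    text = re.sub(r"```.*?```", " ", reply, flags=re.DOTALL)   # drop fenced code
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)       # [label](url) -> label
    # Remove heading markers, bullets, and "1." list numbers at line starts.
    text = re.sub(r"^\s*(#{1,6}|[-*+]|\d+\.)\s+", "", text, flags=re.MULTILINE)
    text = re.sub(r"[*_`]+", "", text)                         # emphasis and inline code marks
    text = re.sub(r"\s+", " ", text).strip()                   # collapse whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])
```

For example, `to_speech_text("**Bake** at 400 for *25 minutes*.\n\n- Check temp.")` yields `Bake at 400 for 25 minutes. Check temp.`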
Capabilities
- Weather: "Hey Kip, what's the weather?" → Nashville forecast
- Timers: "Hey Kip, set a timer for 15 minutes" → countdown with alarm
- Grocery list: "Hey Kip, add eggs to the list" → persistent list
- Grocery check: "Hey Kip, what's on the grocery list?" → reads it back
- General Q&A: "Hey Kip, how long do I bake chicken at 400?" → answer
- Time/Date: "Hey Kip, what time is it?"
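The timer capability needs a duration pulled out of the transcript. A minimal sketch is shown below, assuming the STT output writes numbers as digits ("15 minutes", not "fifteen minutes"); the function name is hypothetical.

```python
import re

_UNITS = {"second": 1, "minute": 60, "hour": 3600}
_PATTERN = re.compile(r"(\d+)\s*(second|minute|hour)s?", re.IGNORECASE)


def parse_timer_seconds(utterance: str):
    """Return total seconds for phrases like 'set a timer for 15 minutes', else None."""
    if "timer" not in utterance.lower():
        return None
    parts = _PATTERN.findall(utterance)
    if not parts:
        return None
    # Sum every "<number> <unit>" pair so "1 hour and 30 minutes" works too.
    return sum(int(n) * _UNITS[unit.lower()] for n, unit in parts)
```

Spelled-out numbers would need a word-to-number step in front of this, which is left out here.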
Future Capabilities
- Calendar integration
- Music control (Spotify)
- Smart home (if they get devices)
- Recipe lookup
- Kroger API for prices/ordering
Phone Web View
Simple responsive web page served by the LXC:
- Shows grocery list
- Can add/remove items by tapping
- Accessible at http://kip.local:8080 or http://192.168.86.XX:8080
- Girlfriend can open it on her iPhone in the store
Build Phases
Phase 1: Voice Loop (MVP)
- LXC setup + dependencies installed
- Faster Whisper running (base.en model)
- Piper TTS running (pick a good voice)
- FastAPI server handling /listen endpoint
- Steam Deck: wake word + record + send + play response
- Test: "Hey Kip, hello" → Kip responds with voice
- Goal: End-to-end voice working
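The Phase 1 loop on the Deck can be sketched as one function with its hardware and network pieces passed in, so it can be exercised before OpenWakeWord or the audio library are wired up. All five callables and the function name are hypothetical placeholders.

```python
import base64


def run_voice_turn(wait_for_wake, record, send, play, show=None):
    """Run one wake -> record -> send -> play cycle and return the server reply.

    wait_for_wake: blocks until the wake word is heard
    record:        returns the utterance as WAV bytes
    send:          POSTs the WAV to /listen and returns the parsed JSON dict
    play:          plays WAV bytes on the Deck speaker
    show:          optional, pushes a ui_update dict to the display
    """
    wait_for_wake()
    wav = record()
    reply = send(wav)
    play(base64.b64decode(reply["audio"]))   # audio arrives base64-encoded
    if show is not None and reply.get("ui_update"):
        show(reply["ui_update"])
    return reply
```

The real client would call this in a `while True:` loop; passing the pieces in as callables also makes it easy to swap the UI or audio library later.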
Phase 2: Grocery List + Weather
- Grocery list CRUD (voice + API)
- Weather skill (Nashville)
- Timer system with alarm sounds
- Phone web view for grocery list
- Steam Deck display: clock + weather + active timers
- Goal: Actually useful in the kitchen
Phase 3: OpenClaw Integration
- Kip agent running on OpenClaw
- General Q&A via Claude/Qwen
- Smarter conversations (context, follow-ups)
- Cost optimization: simple commands (timer, list) handled locally, only complex Q&A hits Claude
- Goal: Smart assistant, not just a voice command box
Phase 4: Polish
- Custom wake word model trained on "Hey Kip"
- Better TTS voice selection
- Deck UI polish (nice weather widget, timer display, list view)
- Ambient mode (clock/weather when idle)
- Multiple room support (add Pi later)
- Goal: Girlfriend actually wants to use it daily
Cost
| Item | Cost |
|---|---|
| Steam Deck | Already owned |
| LXC container | Free (Proxmox) |
| OpenWakeWord | Free (open source) |
| Faster Whisper | Free (open source) |
| Piper TTS | Free (open source) |
| OpenClaw | Already running |
| Claude API for Q&A | Covered by existing subscription |
| Total | $0 |
File Structure
projects/kip/
├── PROJECT.md # This file
├── server/ # LXC-side code
│ ├── main.py # FastAPI server
│ ├── stt.py # Whisper STT wrapper
│ ├── tts.py # Piper TTS wrapper
│ ├── skills/ # Timer, grocery, weather handlers
│ ├── data/ # Grocery lists, state
│ └── requirements.txt
├── client/ # Steam Deck code
│ ├── kip_client.py # Main client app
│ ├── wake_word.py # OpenWakeWord listener
│ ├── audio.py # Record/playback
│ ├── ui/ # Display UI
│ └── requirements.txt
└── agent/ # Kip's OpenClaw agent config
    ├── SOUL.md
    └── config.yaml
Notes
- All voice processing (STT/TTS) on LXC, not Deck — keeps client thin
- Wake word detection is the ONLY speech processing that runs on the Deck locally (the Deck otherwise just captures audio, plays audio, and shows the UI)
- Grocery list syncs to a web page for phone access
- Simple commands (timer, list) should be handled WITHOUT hitting Claude to save tokens
- Only complex Q&A ("how long to bake chicken?") routes through OpenClaw/Claude
- Qwen on Ollama (192.168.86.137) as fallback for simple Q&A
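The local-versus-agent split in these notes could start as a simple keyword router, sketched below. The patterns, skill names, and the `"agent"` label are assumptions; a real version would likely need more phrasings.

```python
import re

# Checked in order; the first matching pattern wins.
LOCAL_PATTERNS = [
    ("timer", re.compile(r"\btimer\b", re.IGNORECASE)),
    ("grocery", re.compile(r"\b(add|remove)\b.+\blist\b|\bgrocery list\b", re.IGNORECASE)),
    ("weather", re.compile(r"\b(weather|forecast)\b", re.IGNORECASE)),
    ("time", re.compile(r"\bwhat time is it\b|\bwhat(?:'s| is) the (?:time|date)\b", re.IGNORECASE)),
]


def route(utterance: str) -> str:
    """Return the local skill name for simple commands, or 'agent' for everything else."""
    for skill, pattern in LOCAL_PATTERNS:
        if pattern.search(utterance):
            return skill
    return "agent"   # hand off to OpenClaw; no local handler matched
```

Defaulting to `"agent"` means an unrecognized phrasing costs tokens rather than failing outright.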