# Kip — Voice Assistant

**Codename:** Kip

**Purpose:** Alexa replacement for D J's girlfriend

**Architecture:** Steam Deck (thin client) ↔ Proxmox LXC (brains)

---

## Overview

Kip is a privacy-first voice assistant. The Steam Deck acts as a dumb terminal (mic, speaker, screen). All intelligence runs on an LXC container on Proxmox.

## Hardware

### Steam Deck (Client)

- Always-on, propped up in kitchen/living room on charging dock
- Runs: wake word detection, audio capture, audio playback, display UI
- Connects to LXC over local WiFi

### LXC Container (Server)

- **OS:** Ubuntu 22.04 or 24.04
- **RAM:** 4GB recommended (Whisper needs ~1.5GB, Piper ~200MB, OpenClaw ~500MB)
- **Disk:** 10GB (models + data)
- **CPU:** 2-4 cores (Whisper STT is CPU-bound)
- **Network:** Static IP on LAN, accessible from Steam Deck

## Software Stack

### LXC Container

| Component | Purpose | Tool |
|-----------|---------|------|
| STT | Speech-to-text | Faster Whisper (base.en model) |
| TTS | Text-to-speech | Piper (en_US voice) |
| Agent | Intelligence | OpenClaw with Kip agent |
| API | Communication | FastAPI HTTP server |
| Data | Grocery list, timers | JSON files + SQLite |

### Steam Deck

| Component | Purpose | Tool |
|-----------|---------|------|
| Wake word | "Hey Kip" detection | OpenWakeWord |
| Audio capture | Record after wake | PyAudio / sounddevice |
| Audio playback | Play TTS responses | PyAudio / sounddevice |
| UI | Display info | Web browser (fullscreen PWA) or PyQt |
| Client | Talk to LXC | Python HTTP client |

---

## LXC Setup Checklist

D J creates the LXC with:

- [ ] Ubuntu 22.04 or 24.04 template
- [ ] 4GB RAM, 2-4 CPU cores, 10GB disk
- [ ] Static IP on LAN (e.g., 192.168.86.XX)
- [ ] SSH access enabled (key-based)
- [ ] Audio passthrough NOT needed (LXC doesn't play audio — Deck does)
- [ ] Internet access (for OpenClaw, model downloads)
- [ ] Hostname: `kip` (optional, nice to have)

Once created, give Case:

- IP address
- SSH credentials or key

---

## API Design (LXC ↔ Steam Deck)

### POST /listen

Steam Deck sends audio, gets back text response + TTS audio.

```
Request:
  Content-Type: multipart/form-data
  Body: audio file (WAV, 16kHz mono)

Response:
{
  "text": "user said this",
  "response": "Kip says this",
  "audio": "",
  "ui_update": {
    "type": "grocery_list",
    "data": ["eggs", "milk", "bread"]
  }
}
```

### GET /status

Health check + current state (timers, lists, etc.)

### GET /grocery

Returns current grocery list (for phone web view)

### POST /grocery

Add/remove items (for phone web view)

### GET /ui

Returns current display state for the Deck's screen.

---

## Kip Agent (OpenClaw)

Kip gets its own OpenClaw agent with:

### SOUL.md (Personality)

- Name: Kip
- Friendly, concise, warm
- Optimized for voice — short responses, no markdown
- Knows the household (D J, girlfriend, 4 cats)
- Designed for non-technical user

### Capabilities

- **Weather:** "Hey Kip, what's the weather?" → Nashville forecast
- **Timers:** "Hey Kip, set a timer for 15 minutes" → countdown with alarm
- **Grocery list:** "Hey Kip, add eggs to the list" → persistent list
- **Grocery check:** "Hey Kip, what's on the grocery list?" → reads it back
- **General Q&A:** "Hey Kip, how long do I bake chicken at 400?" → answer
- **Time/Date:** "Hey Kip, what time is it?"
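
The example phrases in the Capabilities list can be mapped to skills with a small pattern matcher that runs on the transcribed text. The sketch below is one possible approach, not Kip's actual skill code: the intent names, the regexes, and the `route_utterance` function are all assumptions made for illustration. Anything that matches no pattern falls through to a `qa` intent, which would be the case handed to the agent.

```python
import re

# Hypothetical patterns based on the example phrases in the Capabilities list.
# Intent names and regexes are illustrative assumptions, not Kip's real skill API.
_WAKE = re.compile(r"^\s*hey kip[,.!]?\s*", re.IGNORECASE)
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}

_PATTERNS = [
    ("timer", re.compile(r"set a timer for (\d+) (second|minute|hour)s?", re.I)),
    ("grocery_add", re.compile(r"add (.+?) to (?:the|my) (?:grocery )?list", re.I)),
    ("grocery_read", re.compile(r"what'?s on (?:the|my) (?:grocery )?list", re.I)),
    ("weather", re.compile(r"\bweather\b", re.I)),
    ("time", re.compile(r"what time is it|what'?s the date", re.I)),
]


def route_utterance(text):
    """Return (intent, slots) for a transcript; unmatched text becomes 'qa'."""
    # Drop a leading wake phrase in case the transcript still contains it.
    command = _WAKE.sub("", text).strip()
    for intent, pattern in _PATTERNS:
        match = pattern.search(command)
        if not match:
            continue
        if intent == "timer":
            # Convert "15 minutes" into a number of seconds for the countdown.
            unit = match.group(2).lower()
            return intent, {"seconds": int(match.group(1)) * _UNIT_SECONDS[unit]}
        if intent == "grocery_add":
            return intent, {"item": match.group(1).strip().lower()}
        return intent, {}
    # No local pattern matched: treat it as a general question.
    return "qa", {"question": command}
```

For example, `route_utterance("Hey Kip, set a timer for 15 minutes")` returns `("timer", {"seconds": 900})`, while `route_utterance("Hey Kip, how long do I bake chicken at 400?")` returns the `qa` intent with the question text. Regexes are brittle against STT variations, so a real implementation would likely need looser matching.
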
### Future Capabilities

- Calendar integration
- Music control (Spotify)
- Smart home (if they get devices)
- Recipe lookup
- Kroger API for prices/ordering

---

## Phone Web View

Simple responsive web page served by the LXC:

- Shows grocery list
- Can add/remove items by tapping
- Accessible at `http://kip.local:8080` or `http://192.168.86.XX:8080`
- Girlfriend can open it on her iPhone in the store

---

## Build Phases

### Phase 1: Voice Loop (MVP)

- [ ] LXC setup + dependencies installed
- [ ] Faster Whisper running (base.en model)
- [ ] Piper TTS running (pick a good voice)
- [ ] FastAPI server handling /listen endpoint
- [ ] Steam Deck: wake word + record + send + play response
- [ ] Test: "Hey Kip, hello" → Kip responds with voice
- **Goal: End-to-end voice working**

### Phase 2: Grocery List + Weather

- [ ] Grocery list CRUD (voice + API)
- [ ] Weather skill (Nashville)
- [ ] Timer system with alarm sounds
- [ ] Phone web view for grocery list
- [ ] Steam Deck display: clock + weather + active timers
- **Goal: Actually useful in the kitchen**

### Phase 3: OpenClaw Integration

- [ ] Kip agent running on OpenClaw
- [ ] General Q&A via Claude/Qwen
- [ ] Smarter conversations (context, follow-ups)
- [ ] Cost optimization: simple commands (timer, list) handled locally, only complex Q&A hits Claude
- **Goal: Smart assistant, not just a voice command box**

### Phase 4: Polish

- [ ] Custom wake word model trained on "Hey Kip"
- [ ] Better TTS voice selection
- [ ] Deck UI polish (nice weather widget, timer display, list view)
- [ ] Ambient mode (clock/weather when idle)
- [ ] Multiple room support (add Pi later)
- **Goal: Girlfriend actually wants to use it daily**

---

## Cost

| Item | Cost |
|------|------|
| Steam Deck | Already owned |
| LXC container | Free (Proxmox) |
| OpenWakeWord | Free (open source) |
| Faster Whisper | Free (open source) |
| Piper TTS | Free (open source) |
| OpenClaw | Already running |
| Claude API for Q&A | Covered by existing subscription |
| **Total** | **$0** |

---

## File Structure

```
projects/kip/
├── PROJECT.md              # This file
├── server/                 # LXC-side code
│   ├── main.py             # FastAPI server
│   ├── stt.py              # Whisper STT wrapper
│   ├── tts.py              # Piper TTS wrapper
│   ├── skills/             # Timer, grocery, weather handlers
│   ├── data/               # Grocery lists, state
│   └── requirements.txt
├── client/                 # Steam Deck code
│   ├── kip_client.py       # Main client app
│   ├── wake_word.py        # OpenWakeWord listener
│   ├── audio.py            # Record/playback
│   ├── ui/                 # Display UI
│   └── requirements.txt
└── agent/                  # Kip's OpenClaw agent config
    ├── SOUL.md
    └── config.yaml
```

---

## Notes

- All voice processing (STT/TTS) on LXC, not Deck — keeps client thin
- Wake word is the ONLY thing that runs on Deck locally
- Grocery list syncs to a web page for phone access
- Simple commands (timer, list) should be handled WITHOUT hitting Claude to save tokens
- Only complex Q&A ("how long to bake chicken?") routes through OpenClaw/Claude
- Qwen on Ollama (192.168.86.137) as fallback for simple Q&A
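
The notes above say the grocery list is shared between voice commands and the phone web view, and the Software Stack table lists JSON files for that data. A minimal sketch of a JSON-backed list that both paths could call is shown here. The `GroceryList` class, its method names, and the file name `grocery.json` are hypothetical; only the `server/data/` directory comes from the file structure above.

```python
import json
from pathlib import Path


class GroceryList:
    """JSON-file-backed grocery list (hypothetical helper, not Kip's actual code)."""

    def __init__(self, path):
        # e.g. server/data/grocery.json (the file name is an assumption)
        self.path = Path(path)

    def items(self):
        """Return the stored items, or an empty list if nothing is saved yet."""
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _save(self, items):
        # Write to a temp file, then swap it in, so a reader never sees
        # a half-written list.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(items, indent=2), encoding="utf-8")
        tmp.replace(self.path)

    def add(self, item):
        """Add an item (normalized to lowercase), skipping blanks and duplicates."""
        item = item.strip().lower()
        items = self.items()
        if item and item not in items:
            items.append(item)
            self._save(items)
        return items

    def remove(self, item):
        """Remove an item if present and return the remaining list."""
        item = item.strip().lower()
        items = [existing for existing in self.items() if existing != item]
        self._save(items)
        return items
```

Under these assumptions, a voice handler and the `POST /grocery` endpoint could both call `add()` and `remove()` on the same file, and `GET /grocery` could return `items()` directly. This sketch does no locking, so concurrent writers would need extra care, or the SQLite option from the Software Stack table.
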