Kip voice assistant project plan

2026-02-09 17:49:24 -06:00
parent 6592590dac
commit f8e83da59e
4 changed files with 5076 additions and 1 deletions


@@ -1,5 +1,5 @@
 {
-  "last_check": "2026-02-09T23:25:59.773254+00:00",
+  "last_check": "2026-02-09T23:44:00.012836+00:00",
   "total_tracked": 3100,
   "new_this_check": 0
 }

projects/kip/PROJECT.md Normal file

@@ -0,0 +1,224 @@
# Kip — Voice Assistant
**Codename:** Kip
**Purpose:** Alexa replacement for D J's girlfriend
**Architecture:** Steam Deck (thin client) ↔ Proxmox LXC (brains)
---
## Overview
Kip is a privacy-first voice assistant. The Steam Deck acts as a dumb terminal (mic, speaker, screen). All intelligence runs on an LXC container on Proxmox.
## Hardware
### Steam Deck (Client)
- Always-on, propped up in kitchen/living room on charging dock
- Runs: wake word detection, audio capture, audio playback, display UI
- Connects to LXC over local WiFi
### LXC Container (Server)
- **OS:** Ubuntu 22.04 or 24.04
- **RAM:** 4GB recommended (Whisper needs ~1.5GB, Piper ~200MB, OpenClaw ~500MB)
- **Disk:** 10GB (models + data)
- **CPU:** 2-4 cores (Whisper STT is CPU-bound)
- **Network:** Static IP on LAN, accessible from Steam Deck
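As a quick sanity check on the 4GB figure, the per-service estimates above can be summed to see how much headroom remains. This is only a sketch: the numbers are the rough estimates from the RAM line, 4GB is treated as 4096 MB, and `headroom_mb` is a hypothetical helper name.

```python
# Approximate memory per service in MB, copied from the RAM estimate above.
ESTIMATES_MB = {"faster-whisper": 1500, "piper": 200, "openclaw": 500}


def headroom_mb(total_mb: int, estimates: dict) -> int:
    """Return the memory left for the OS, FastAPI, and caches."""
    return total_mb - sum(estimates.values())


print(headroom_mb(4096, ESTIMATES_MB))  # 1896
```

The services add up to roughly 2.2GB, so a 4GB container leaves a little under 2GB for everything else.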
## Software Stack
### LXC Container
| Component | Purpose | Tool |
|-----------|---------|------|
| STT | Speech-to-text | Faster Whisper (base.en model) |
| TTS | Text-to-speech | Piper (en_US voice) |
| Agent | Intelligence | OpenClaw with Kip agent |
| API | Communication | FastAPI HTTP server |
| Data | Grocery list, timers | JSON files + SQLite |
### Steam Deck
| Component | Purpose | Tool |
|-----------|---------|------|
| Wake word | "Hey Kip" detection | OpenWakeWord |
| Audio capture | Record after wake | PyAudio / sounddevice |
| Audio playback | Play TTS responses | PyAudio / sounddevice |
| UI | Display info | Web browser (fullscreen PWA) or PyQt |
| Client | Talk to LXC | Python HTTP client |
---
## LXC Setup Checklist
D J creates the LXC with:
- [ ] Ubuntu 22.04 or 24.04 template
- [ ] 4GB RAM, 2-4 CPU cores, 10GB disk
- [ ] Static IP on LAN (e.g., 192.168.86.XX)
- [ ] SSH access enabled (key-based)
- [ ] Audio passthrough NOT needed (LXC doesn't play audio — Deck does)
- [ ] Internet access (for OpenClaw, model downloads)
- [ ] Hostname: `kip` (optional, nice to have)
Once created, give Case:
- IP address
- SSH credentials or key
---
## API Design (LXC ↔ Steam Deck)
### POST /listen
The Steam Deck uploads the recorded audio and gets back the transcript, Kip's text reply, the TTS audio, and an optional UI update.
```
Request:
  Content-Type: multipart/form-data
  Body: audio file (WAV, 16kHz mono)

Response:
  {
    "text": "user said this",
    "response": "Kip says this",
    "audio": "<base64 WAV of TTS response>",
    "ui_update": {
      "type": "grocery_list",
      "data": ["eggs", "milk", "bread"]
    }
  }
```
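To make the response shape concrete, here is a minimal stdlib-only sketch of how the server might assemble this JSON and how the client might recover the WAV bytes. The function names `build_listen_response` and `decode_listen_response` are hypothetical, and treating `ui_update` as optional is an assumption, since the plan does not say whether it is always present.

```python
import base64
import json


def build_listen_response(heard, reply, tts_wav, ui_update=None):
    """Server side: assemble the /listen JSON body.

    The WAV bytes are base64-encoded so they can travel inside JSON.
    """
    payload = {
        "text": heard,
        "response": reply,
        "audio": base64.b64encode(tts_wav).decode("ascii"),
    }
    # Assumption: ui_update is omitted when there is nothing to show.
    if ui_update is not None:
        payload["ui_update"] = ui_update
    return json.dumps(payload)


def decode_listen_response(body):
    """Client side: parse the JSON and return (fields, raw WAV bytes)."""
    data = json.loads(body)
    return data, base64.b64decode(data["audio"])
```

Base64 inflates the audio by about a third, which is likely acceptable for short replies on a LAN. Returning the audio from a separate endpoint would be an alternative if replies grow long.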
### GET /status
Health check + current state (timers, lists, etc.)
### GET /grocery
Returns current grocery list (for phone web view)
### POST /grocery
Add/remove items (for phone web view)
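The grocery endpoints need a small persistent store behind them. The sketch below uses SQLite, one of the two storage options named in the stack table. The `GroceryList` class, its table layout, and the choice to lowercase item names are all assumptions made for illustration.

```python
import sqlite3


class GroceryList:
    """Tiny SQLite-backed list; pass a file path to persist across restarts."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS items (name TEXT PRIMARY KEY)"
        )

    def add(self, name):
        # INSERT OR IGNORE keeps "add eggs" idempotent if eggs is already listed.
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO items (name) VALUES (?)",
                (name.strip().lower(),),
            )

    def remove(self, name):
        """Delete an item; return True if it was on the list."""
        with self.db:
            cur = self.db.execute(
                "DELETE FROM items WHERE name = ?", (name.strip().lower(),)
            )
        return cur.rowcount > 0

    def items(self):
        # rowid order matches insertion order for the rows still present.
        rows = self.db.execute("SELECT name FROM items ORDER BY rowid")
        return [row[0] for row in rows]
```

Both the voice skill and the HTTP handlers could share one instance, so the Deck and the phone view read the same list.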
### GET /ui
Returns current display state for the Deck's screen.
---
## Kip Agent (OpenClaw)
Kip gets its own OpenClaw agent with:
### SOUL.md (Personality)
- Name: Kip
- Friendly, concise, warm
- Optimized for voice — short responses, no markdown
- Knows the household (D J, girlfriend, 4 cats)
- Designed for non-technical user
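Even with a prompt that asks for no markdown, a model can still emit it occasionally, so a defensive cleanup step before TTS may be worthwhile. The following is a rough sketch, not a full markdown parser. `to_speakable` is a hypothetical helper, and the regular expressions only cover the most common markup.

```python
import re


def to_speakable(text):
    """Strip common markdown so Piper does not read symbols aloud."""
    # Drop fenced code blocks entirely; they are not useful when spoken.
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)
    # Turn [label](url) links into just the label.
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)
    # Remove heading, bullet, and numbered-list markers at line starts.
    text = re.sub(r"^\s*(#{1,6}|[-*+]|\d+\.)\s+", "", text, flags=re.MULTILINE)
    # Remove emphasis and inline-code characters.
    text = re.sub(r"[*_`]+", "", text)
    # Collapse all whitespace, including newlines, into single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(to_speakable("**Bake** at 400 for:\n- 20 to 25 minutes\n- Rest 5 minutes"))
# Bake at 400 for: 20 to 25 minutes Rest 5 minutes
```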
### Capabilities
- **Weather:** "Hey Kip, what's the weather?" → Nashville forecast
- **Timers:** "Hey Kip, set a timer for 15 minutes" → countdown with alarm
- **Grocery list:** "Hey Kip, add eggs to the list" → persistent list
- **Grocery check:** "Hey Kip, what's on the grocery list?" → reads it back
- **General Q&A:** "Hey Kip, how long do I bake chicken at 400?" → answer
- **Time/Date:** "Hey Kip, what time is it?"
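The timer capability depends on pulling a duration out of the transcript. A minimal sketch follows, assuming the STT output writes numbers as digits ("15 minutes"); spelled-out numbers ("fifteen minutes") would need extra handling. `parse_timer_seconds` is a hypothetical name.

```python
import re

_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
_DURATION = re.compile(r"(\d+)\s*(second|minute|hour)s?", re.IGNORECASE)


def parse_timer_seconds(utterance):
    """Sum every '<number> <unit>' pair; return None if none is found."""
    matches = _DURATION.findall(utterance)
    if not matches:
        return None
    return sum(int(n) * _UNIT_SECONDS[unit.lower()] for n, unit in matches)


print(parse_timer_seconds("Hey Kip, set a timer for 15 minutes"))  # 900
```

Returning `None` rather than raising lets the caller fall through to another skill or ask the user to repeat the duration.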
### Future Capabilities
- Calendar integration
- Music control (Spotify)
- Smart home (if they get devices)
- Recipe lookup
- Kroger API for prices/ordering
---
## Phone Web View
Simple responsive web page served by the LXC:
- Shows grocery list
- Can add/remove items by tapping
- Accessible at `http://kip.local:8080` or `http://192.168.86.XX:8080`
- Girlfriend can open it on her iPhone in the store
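For the read-only part of this page, the server only has to turn the list into mobile-friendly HTML. The sketch below shows that step with escaping applied. `render_grocery_page` is a hypothetical helper, the markup is deliberately bare, and the tap-to-add/remove behaviour is left out.

```python
from html import escape


def render_grocery_page(items):
    """Return a minimal mobile-friendly HTML page listing the items."""
    # Escape each item so a dictated "<" or "&" cannot break the page.
    rows = "\n".join(f"  <li>{escape(item)}</li>" for item in items)
    return (
        "<!doctype html>\n"
        '<meta name="viewport" content="width=device-width, initial-scale=1">\n'
        "<title>Kip grocery list</title>\n"
        "<ul>\n" + rows + "\n</ul>\n"
    )
```

The viewport tag is what keeps the text at a readable size on a phone screen.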
---
## Build Phases
### Phase 1: Voice Loop (MVP)
- [ ] LXC setup + dependencies installed
- [ ] Faster Whisper running (base.en model)
- [ ] Piper TTS running (pick a good voice)
- [ ] FastAPI server handling /listen endpoint
- [ ] Steam Deck: wake word + record + send + play response
- [ ] Test: "Hey Kip, hello" → Kip responds with voice
- **Goal: End-to-end voice working**
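The "record + send" step of the Phase 1 client can be sketched with the standard library alone: wrap the captured samples as 16kHz mono WAV, then encode them as a multipart/form-data body. This assumes 16-bit PCM input from the capture library. The helper names and the form field name `audio` are hypothetical and would need to match whatever the FastAPI handler expects.

```python
import io
import uuid
import wave


def pcm_to_wav(pcm, sample_rate=16000):
    """Wrap raw 16-bit mono PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def encode_multipart(field, filename, data, mime="audio/wav"):
    """Return (body, content_type) for a single-file multipart upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"
```

The resulting body and content type can be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": content_type})`, which sends a POST whenever `data` is set. An HTTP library that builds multipart bodies itself would make `encode_multipart` unnecessary.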
### Phase 2: Grocery List + Weather
- [ ] Grocery list CRUD (voice + API)
- [ ] Weather skill (Nashville)
- [ ] Timer system with alarm sounds
- [ ] Phone web view for grocery list
- [ ] Steam Deck display: clock + weather + active timers
- **Goal: Actually useful in the kitchen**
### Phase 3: OpenClaw Integration
- [ ] Kip agent running on OpenClaw
- [ ] General Q&A via Claude/Qwen
- [ ] Smarter conversations (context, follow-ups)
- [ ] Cost optimization: simple commands (timer, list) handled locally, only complex Q&A hits Claude
- **Goal: Smart assistant, not just a voice command box**
### Phase 4: Polish
- [ ] Custom wake word model trained on "Hey Kip"
- [ ] Better TTS voice selection
- [ ] Deck UI polish (nice weather widget, timer display, list view)
- [ ] Ambient mode (clock/weather when idle)
- [ ] Multiple room support (add Pi later)
- **Goal: Girlfriend actually wants to use it daily**
---
## Cost
| Item | Cost |
|------|------|
| Steam Deck | Already owned |
| LXC container | Free (Proxmox) |
| OpenWakeWord | Free (open source) |
| Faster Whisper | Free (open source) |
| Piper TTS | Free (open source) |
| OpenClaw | Already running |
| Claude API for Q&A | Covered by existing subscription |
| **Total** | **$0** |
---
## File Structure
```
projects/kip/
├── PROJECT.md              # This file
├── server/                 # LXC-side code
│   ├── main.py             # FastAPI server
│   ├── stt.py              # Whisper STT wrapper
│   ├── tts.py              # Piper TTS wrapper
│   ├── skills/             # Timer, grocery, weather handlers
│   ├── data/               # Grocery lists, state
│   └── requirements.txt
├── client/                 # Steam Deck code
│   ├── kip_client.py       # Main client app
│   ├── wake_word.py        # OpenWakeWord listener
│   ├── audio.py            # Record/playback
│   ├── ui/                 # Display UI
│   └── requirements.txt
└── agent/                  # Kip's OpenClaw agent config
    ├── SOUL.md
    └── config.yaml
```
---
## Notes
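- All speech processing (STT and TTS) runs on the LXC, not the Deck, which keeps the client thin
- Wake word detection is the ONLY speech model that runs locally on the Deck; beyond that, the Deck only captures audio, plays responses, and renders the UI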
- Grocery list syncs to a web page for phone access
- Simple commands (timer, list) should be handled WITHOUT hitting Claude to save tokens
- Only complex Q&A ("how long to bake chicken?") routes through OpenClaw/Claude
- Qwen on Ollama (192.168.86.137) as fallback for simple Q&A
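The split between local handling and the agent can be sketched as a small pattern-based router that runs on the transcript before anything is sent to OpenClaw. The patterns and the `route` function are illustrative assumptions. The plan names only timers and the list as local commands, so treating time and weather as local is an extra assumption here.

```python
import re

# Patterns for commands that can be answered without an LLM call.
LOCAL_PATTERNS = {
    "timer": re.compile(r"\b(set|start|cancel|stop)\b.*\btimer\b", re.I),
    "grocery": re.compile(
        r"\b(grocery|shopping) list\b|\b(add|remove)\b.*\blist\b", re.I
    ),
    # Assumption: time and weather are also cheap enough to handle locally.
    "time": re.compile(r"\bwhat (time|day) is it\b|\bwhat'?s the (time|date)\b", re.I),
    "weather": re.compile(r"\b(weather|forecast)\b", re.I),
}


def route(utterance):
    """Return ("local", skill) for simple commands, else ("agent", None)."""
    for skill, pattern in LOCAL_PATTERNS.items():
        if pattern.search(utterance):
            return ("local", skill)
    return ("agent", None)


print(route("Hey Kip, add eggs to the list"))               # ('local', 'grocery')
print(route("Hey Kip, how long do I bake chicken at 400?"))  # ('agent', None)
```

Keyword matching like this is brittle, but it is cheap and predictable. Anything it misses falls through to the agent, so a missed match costs tokens rather than producing a wrong action.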