Kip voice assistant project plan
@@ -1,5 +1,5 @@
 {
-  "last_check": "2026-02-09T23:25:59.773254+00:00",
+  "last_check": "2026-02-09T23:44:00.012836+00:00",
   "total_tracked": 3100,
   "new_this_check": 0
 }
projects/kip/PROJECT.md (Normal file, 224 lines)
@@ -0,0 +1,224 @@
# Kip: Voice Assistant

**Codename:** Kip
**Purpose:** Alexa replacement for D J's girlfriend
**Architecture:** Steam Deck (thin client) ↔ Proxmox LXC (brains)

---

## Overview

Kip is a privacy-first voice assistant. The Steam Deck acts as a thin client (mic, speaker, screen, wake word detection). All other intelligence runs in an LXC container on Proxmox.

## Hardware

### Steam Deck (Client)
- Always-on, propped up in kitchen/living room on charging dock
- Runs: wake word detection, audio capture, audio playback, display UI
- Connects to LXC over local WiFi

### LXC Container (Server)
- **OS:** Ubuntu 22.04 or 24.04
- **RAM:** 4 GB recommended (Whisper needs ~1.5 GB, Piper ~200 MB, OpenClaw ~500 MB)
- **Disk:** 10 GB (models + data)
- **CPU:** 2-4 cores (Whisper STT is CPU-bound)
- **Network:** Static IP on LAN, accessible from Steam Deck

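
As a quick sanity check on the RAM recommendation, the estimates above can be added up. This is only a back-of-the-envelope sketch: the figures are the approximate ones from the list, and the names `ESTIMATES_MB` and `ram_headroom_mb` are made up for illustration.

```python
# Approximate per-component RAM estimates from the Hardware list (in MB).
ESTIMATES_MB = {"Faster Whisper": 1500, "Piper": 200, "OpenClaw": 500}

# The recommended container size: 4 GB.
CONTAINER_RAM_MB = 4 * 1024


def ram_headroom_mb(estimates, total_mb):
    """Return the RAM left for the OS, FastAPI, and spikes after the listed components."""
    return total_mb - sum(estimates.values())


if __name__ == "__main__":
    used = sum(ESTIMATES_MB.values())
    print(f"Estimated use: {used} MB, headroom: {ram_headroom_mb(ESTIMATES_MB, CONTAINER_RAM_MB)} MB")
```

The listed components come to roughly 2.2 GB, so a 4 GB container leaves a little under 2 GB for everything else.
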
## Software Stack

### LXC Container

| Component | Purpose | Tool |
|-----------|---------|------|
| STT | Speech-to-text | Faster Whisper (base.en model) |
| TTS | Text-to-speech | Piper (en_US voice) |
| Agent | Intelligence | OpenClaw with Kip agent |
| API | Communication | FastAPI HTTP server |
| Data | Grocery list, timers | JSON files + SQLite |

### Steam Deck

| Component | Purpose | Tool |
|-----------|---------|------|
| Wake word | "Hey Kip" detection | OpenWakeWord |
| Audio capture | Record after wake | PyAudio / sounddevice |
| Audio playback | Play TTS responses | PyAudio / sounddevice |
| UI | Display info | Web browser (fullscreen PWA) or PyQt |
| Client | Talk to LXC | Python HTTP client |

---

## LXC Setup Checklist

D J creates the LXC with:
- [ ] Ubuntu 22.04 or 24.04 template
- [ ] 4 GB RAM, 2-4 CPU cores, 10 GB disk
- [ ] Static IP on LAN (e.g., 192.168.86.XX)
- [ ] SSH access enabled (key-based)
- [ ] Audio passthrough NOT needed (the LXC doesn't play audio; the Deck does)
- [ ] Internet access (for OpenClaw, model downloads)
- [ ] Hostname: `kip` (optional, nice to have)

Once created, give Case:
- IP address
- SSH credentials or key

---

## API Design (LXC ↔ Steam Deck)

### POST /listen
The Steam Deck sends recorded audio and gets back the transcript, Kip's text response, and the TTS audio.

```
Request:
  Content-Type: multipart/form-data
  Body: audio file (WAV, 16kHz mono)

Response:
{
  "text": "user said this",
  "response": "Kip says this",
  "audio": "<base64 WAV of TTS response>",
  "ui_update": {
    "type": "grocery_list",
    "data": ["eggs", "milk", "bread"]
  }
}
```

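
The request and response shapes above could be handled with a few standard-library helpers. This is a minimal sketch, not the project's actual code: the function names are hypothetical, 16-bit samples are an assumption (the plan only says WAV, 16kHz mono), and treating `ui_update` as optional is also an assumption.

```python
import base64
import io
import json
import wave


def pcm16_to_wav(pcm, sample_rate=16000):
    """Wrap raw 16-bit mono PCM in a WAV container (16-bit width is an assumption)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def is_16k_mono(wav_bytes):
    """Check that an uploaded WAV matches the format /listen expects."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnchannels() == 1 and wf.getframerate() == 16000


def build_listen_response(text, response, wav_bytes, ui_update=None):
    """Serialize a /listen reply in the JSON shape shown above."""
    payload = {
        "text": text,
        "response": response,
        "audio": base64.b64encode(wav_bytes).decode("ascii"),
    }
    if ui_update is not None:  # assumption: ui_update is omitted when nothing changes
        payload["ui_update"] = ui_update
    return json.dumps(payload)


def parse_listen_response(body):
    """Decode a /listen reply into (text, response, wav_bytes, ui_update)."""
    payload = json.loads(body)
    wav_bytes = base64.b64decode(payload["audio"])
    return payload["text"], payload["response"], wav_bytes, payload.get("ui_update")
```

Base64 makes the audio about a third larger on the wire, which is probably fine for short replies on a LAN.
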
### GET /status
Health check + current state (timers, lists, etc.)

### GET /grocery
Returns current grocery list (for phone web view)

### POST /grocery
Add/remove items (for phone web view)

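
The two grocery endpoints could sit on top of a small JSON-backed store along these lines. This is a sketch under assumptions: the class name, the file path, and the choice to lowercase and de-duplicate items are all illustrative, and the plan does not yet say whether the list lives in JSON or SQLite.

```python
import json
from pathlib import Path


class GroceryList:
    """Grocery list persisted to a JSON file (the path is a placeholder)."""

    def __init__(self, path):
        self.path = Path(path)

    def items(self):
        """Return the current list, or an empty list if nothing is saved yet."""
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _save(self, items):
        self.path.write_text(json.dumps(items, indent=2), encoding="utf-8")

    def add(self, item):
        """Add an item (normalized to lowercase), ignoring blanks and duplicates."""
        items = self.items()
        name = item.strip().lower()
        if name and name not in items:
            items.append(name)
            self._save(items)
        return items

    def remove(self, item):
        """Remove an item if present and return the updated list."""
        name = item.strip().lower()
        items = [i for i in self.items() if i != name]
        self._save(items)
        return items
```

Because both the voice path and the phone web view would go through the same object, the list stays consistent regardless of which one made the change.
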
### GET /ui
Returns current display state for the Deck's screen.

---

## Kip Agent (OpenClaw)

Kip gets its own OpenClaw agent with:

### SOUL.md (Personality)
- Name: Kip
- Friendly, concise, warm
- Optimized for voice: short responses, no markdown
- Knows the household (D J, girlfriend, 4 cats)
- Designed for a non-technical user

### Capabilities
- **Weather:** "Hey Kip, what's the weather?" → Nashville forecast
- **Timers:** "Hey Kip, set a timer for 15 minutes" → countdown with alarm
- **Grocery list:** "Hey Kip, add eggs to the list" → persistent list
- **Grocery check:** "Hey Kip, what's on the grocery list?" → reads it back
- **General Q&A:** "Hey Kip, how long do I bake chicken at 400?" → answer
- **Time/Date:** "Hey Kip, what time is it?"

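
To make the timer capability concrete, here is one way the duration could be pulled out of a transcript and tracked. It is only a sketch: the names are hypothetical, and it assumes the transcript spells the number with digits ("15 minutes"), which the STT output may not always do.

```python
import re
import time

_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
_DURATION_RE = re.compile(r"(\d+)\s*(second|minute|hour)s?", re.IGNORECASE)


def parse_timer_seconds(utterance):
    """Return the total duration in seconds, or None if no duration is found."""
    matches = _DURATION_RE.findall(utterance)
    if not matches:
        return None
    return sum(int(n) * _UNIT_SECONDS[unit.lower()] for n, unit in matches)


class Timer:
    """Countdown based on a monotonic clock (injectable so it can be tested)."""

    def __init__(self, seconds, now=time.monotonic):
        self._now = now
        self.deadline = now() + seconds

    def remaining(self):
        """Seconds left, never negative."""
        return max(0.0, self.deadline - self._now())

    def expired(self):
        """True once the deadline has passed, i.e. when the alarm should sound."""
        return self.remaining() == 0.0
```

For example, `parse_timer_seconds("set a timer for 15 minutes")` gives 900, and `Timer(900).remaining()` can feed the countdown shown on the Deck's display.
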
### Future Capabilities
- Calendar integration
- Music control (Spotify)
- Smart home (if they get devices)
- Recipe lookup
- Kroger API for prices/ordering

---

## Phone Web View

Simple responsive web page served by the LXC:
- Shows grocery list
- Can add/remove items by tapping
- Accessible at `http://kip.local:8080` or `http://192.168.86.XX:8080`
- Girlfriend can open it on her iPhone in the store

---

## Build Phases

### Phase 1: Voice Loop (MVP)
- [ ] LXC setup + dependencies installed
- [ ] Faster Whisper running (base.en model)
- [ ] Piper TTS running (pick a good voice)
- [ ] FastAPI server handling /listen endpoint
- [ ] Steam Deck: wake word + record + send + play response
- [ ] Test: "Hey Kip, hello" → Kip responds with voice
- **Goal: End-to-end voice working**

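
The Deck-side step in Phase 1 (wake word + record + send + play) might be wired together roughly as follows. This is a sketch, not the planned `kip_client.py`: the four callables are hypothetical stand-ins for the OpenWakeWord listener, the recorder, the HTTP client, and the playback code, and skipping a turn on `OSError` is just one possible way to handle an unreachable server.

```python
import base64


def run_voice_loop(wait_for_wake_word, record_utterance, send_to_server,
                   play_audio, max_turns=None):
    """Run wake -> record -> send -> play; stop after max_turns (None = run forever)."""
    turns = 0
    while max_turns is None or turns < max_turns:
        turns += 1
        wait_for_wake_word()                 # blocks until "Hey Kip" is detected
        wav_bytes = record_utterance()       # WAV bytes of the spoken request
        try:
            reply = send_to_server(wav_bytes)  # dict shaped like the /listen response
        except OSError:
            continue                         # server unreachable: go back to listening
        play_audio(base64.b64decode(reply["audio"]))
    return turns
```

Passing the hardware-facing pieces in as functions keeps the loop testable on a machine with no microphone, by substituting fakes.
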
### Phase 2: Grocery List + Weather
- [ ] Grocery list CRUD (voice + API)
- [ ] Weather skill (Nashville)
- [ ] Timer system with alarm sounds
- [ ] Phone web view for grocery list
- [ ] Steam Deck display: clock + weather + active timers
- **Goal: Actually useful in the kitchen**

### Phase 3: OpenClaw Integration
- [ ] Kip agent running on OpenClaw
- [ ] General Q&A via Claude/Qwen
- [ ] Smarter conversations (context, follow-ups)
- [ ] Cost optimization: simple commands (timer, list) are handled locally; only complex Q&A goes to Claude
- **Goal: Smart assistant, not just a voice command box**

### Phase 4: Polish
- [ ] Custom wake word model trained on "Hey Kip"
- [ ] Better TTS voice selection
- [ ] Deck UI polish (nice weather widget, timer display, list view)
- [ ] Ambient mode (clock/weather when idle)
- [ ] Multiple room support (add Pi later)
- **Goal: Girlfriend actually wants to use it daily**

---

## Cost

| Item | Cost |
|------|------|
| Steam Deck | Already owned |
| LXC container | Free (Proxmox) |
| OpenWakeWord | Free (open source) |
| Faster Whisper | Free (open source) |
| Piper TTS | Free (open source) |
| OpenClaw | Already running |
| Claude API for Q&A | Covered by existing subscription |
| **Total** | **$0** |

---

## File Structure

```
projects/kip/
├── PROJECT.md          # This file
├── server/             # LXC-side code
│   ├── main.py         # FastAPI server
│   ├── stt.py          # Whisper STT wrapper
│   ├── tts.py          # Piper TTS wrapper
│   ├── skills/         # Timer, grocery, weather handlers
│   ├── data/           # Grocery lists, state
│   └── requirements.txt
├── client/             # Steam Deck code
│   ├── kip_client.py   # Main client app
│   ├── wake_word.py    # OpenWakeWord listener
│   ├── audio.py        # Record/playback
│   ├── ui/             # Display UI
│   └── requirements.txt
└── agent/              # Kip's OpenClaw agent config
    ├── SOUL.md
    └── config.yaml
```

---

## Notes
- All voice processing (STT/TTS) runs on the LXC, not the Deck, which keeps the client thin
- Wake word detection is the ONLY speech processing that runs locally on the Deck (audio capture, playback, and the UI aside)
- Grocery list syncs to a web page for phone access
- Simple commands (timer, list) should be handled WITHOUT hitting Claude to save tokens
- Only complex Q&A ("how long to bake chicken?") routes through OpenClaw/Claude
- Qwen on Ollama (192.168.86.137) as fallback for simple Q&A
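
The local-versus-LLM split described in these notes could start as a simple pattern check on the transcript. This is a hedged sketch: the patterns and the `route` function are illustrative guesses at what counts as a "simple command", not the project's actual routing rules.

```python
import re

# Hypothetical patterns for commands that can be answered without an LLM call.
LOCAL_PATTERNS = [
    ("timer", re.compile(r"\b(set|start|cancel)\b.*\btimer\b", re.IGNORECASE)),
    ("grocery", re.compile(r"\blist\b", re.IGNORECASE)),
]


def route(utterance):
    """Return ("local", skill_name) for simple commands, else ("llm", None)."""
    for name, pattern in LOCAL_PATTERNS:
        if pattern.search(utterance):
            return ("local", name)
    # Anything unmatched would go on to OpenClaw/Claude (or the Qwen fallback).
    return ("llm", None)
```

With these patterns, "set a timer for 15 minutes" and "add eggs to the list" stay local, while "how long do I bake chicken at 400?" falls through to the LLM path. Keyword matching will misroute some phrasings, so the list would need tuning against real transcripts.
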