workspace/projects/kip/PROJECT.md
2026-02-09 17:49:24 -06:00

Kip — Voice Assistant

Codename: Kip
Purpose: Alexa replacement for D J's girlfriend
Architecture: Steam Deck (thin client) ↔ Proxmox LXC (brains)


Overview

Kip is a privacy-first voice assistant. The Steam Deck acts as a thin terminal (mic, speaker, screen) and runs only wake word detection locally. Speech recognition, speech synthesis, and the agent all run in an LXC container on Proxmox.

Hardware

Steam Deck (Client)

  • Always-on, propped up in kitchen/living room on charging dock
  • Runs: wake word detection, audio capture, audio playback, display UI
  • Connects to LXC over local WiFi

LXC Container (Server)

  • OS: Ubuntu 22.04 or 24.04
  • RAM: 4GB recommended (Whisper needs ~1.5GB, Piper ~200MB, OpenClaw ~500MB; about 2.2GB combined, leaving headroom for the OS and API server)
  • Disk: 10GB (models + data)
  • CPU: 2-4 cores (Whisper STT is CPU-bound)
  • Network: Static IP on LAN, accessible from Steam Deck

Software Stack

LXC Container

| Component | Purpose | Tool |
| --- | --- | --- |
| STT | Speech-to-text | Faster Whisper (base.en model) |
| TTS | Text-to-speech | Piper (en_US voice) |
| Agent | Intelligence | OpenClaw with Kip agent |
| API | Communication | FastAPI HTTP server |
| Data | Grocery list, timers | JSON files + SQLite |

Steam Deck

| Component | Purpose | Tool |
| --- | --- | --- |
| Wake word | "Hey Kip" detection | OpenWakeWord |
| Audio capture | Record after wake | PyAudio / sounddevice |
| Audio playback | Play TTS responses | PyAudio / sounddevice |
| UI | Display info | Web browser (fullscreen PWA) or PyQt |
| Client | Talk to LXC | Python HTTP client |

LXC Setup Checklist

D J creates the LXC with:

  • Ubuntu 22.04 or 24.04 template
  • 4GB RAM, 2-4 CPU cores, 10GB disk
  • Static IP on LAN (e.g., 192.168.86.XX)
  • SSH access enabled (key-based)
  • Audio passthrough NOT needed (LXC doesn't play audio — Deck does)
  • Internet access (for OpenClaw, model downloads)
  • Hostname: kip (optional, nice to have)

Once created, give Case:

  • IP address
  • SSH credentials or key

API Design (LXC ↔ Steam Deck)

POST /listen

The Steam Deck sends the recorded audio and gets back the transcript, Kip's text response, the TTS audio, and a UI update.

Request:
  Content-Type: multipart/form-data
  Body: audio file (WAV, 16kHz mono)

Response:
  {
    "text": "user said this",
    "response": "Kip says this",
    "audio": "<base64 WAV of TTS response>",
    "ui_update": {
      "type": "grocery_list",
      "data": ["eggs", "milk", "bread"]
    }
  }
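
The round trip above can be sketched with the standard library alone. This is a minimal illustration, not the actual server code: the function names are hypothetical, 16-bit samples are an assumption (the spec only says WAV, 16kHz mono), and treating `ui_update` as optional is also an assumption.

```python
import base64
import io
import json
import wave

SAMPLE_RATE_HZ = 16_000  # from the request spec: WAV, 16kHz mono


def pack_wav(pcm: bytes) -> bytes:
    """Client side: wrap raw mono PCM in a WAV container for upload."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)               # mono
        wav.setsampwidth(2)               # 16-bit samples (assumed)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


def build_listen_response(text, response, tts_wav, ui_update=None) -> str:
    """Server side: serialize the /listen reply with base64-encoded audio."""
    payload = {
        "text": text,
        "response": response,
        "audio": base64.b64encode(tts_wav).decode("ascii"),
    }
    if ui_update is not None:             # assumed optional
        payload["ui_update"] = ui_update
    return json.dumps(payload)


def decode_listen_response(body: str):
    """Client side: recover the spoken text and the playable WAV bytes."""
    reply = json.loads(body)
    return reply["response"], base64.b64decode(reply["audio"])
```

Base64 inflates the audio by roughly a third. If that becomes a latency problem on WiFi, returning the WAV as a separate binary response is one alternative worth considering.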

GET /status

Health check + current state (timers, lists, etc.)

GET /grocery

Returns current grocery list (for phone web view)

POST /grocery

Add/remove items (for phone web view)
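
A minimal sketch of the store both /grocery endpoints could sit on, assuming the list lives in a single JSON file (the stack table lists JSON files + SQLite for data). The `GroceryList` class, its method names, and the file path are hypothetical.

```python
import json
from pathlib import Path


class GroceryList:
    """Persistent grocery list backed by one JSON file (hypothetical layout)."""

    def __init__(self, path):
        self.path = Path(path)

    def items(self):
        """Return the current list, or an empty list if nothing is saved yet."""
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _save(self, items):
        # Write to a temp file, then swap it in, so a crash mid-write
        # does not leave a half-written list behind.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(items), encoding="utf-8")
        tmp.replace(self.path)

    def add(self, item):
        """Add an item (normalized to lowercase); duplicates are ignored."""
        name = item.strip().lower()
        items = self.items()
        if name and name not in items:
            items.append(name)
            self._save(items)
        return items

    def remove(self, item):
        """Remove an item if present and return the updated list."""
        name = item.strip().lower()
        items = [i for i in self.items() if i != name]
        self._save(items)
        return items
```

Because the voice skill and the phone web view would share the same file, both see each other's changes on the next read without extra sync logic.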

GET /ui

Returns current display state for the Deck's screen.


Kip Agent (OpenClaw)

Kip gets its own OpenClaw agent with:

SOUL.md (Personality)

  • Name: Kip
  • Friendly, concise, warm
  • Optimized for voice — short responses, no markdown
  • Knows the household (D J, girlfriend, 4 cats)
  • Designed for non-technical user

Capabilities

  • Weather: "Hey Kip, what's the weather?" → Nashville forecast
  • Timers: "Hey Kip, set a timer for 15 minutes" → countdown with alarm
  • Grocery list: "Hey Kip, add eggs to the list" → persistent list
  • Grocery check: "Hey Kip, what's on the grocery list?" → reads it back
  • General Q&A: "Hey Kip, how long do I bake chicken at 400?" → answer
  • Time/Date: "Hey Kip, what time is it?"
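
As an illustration of the timer capability, here is one way the duration could be pulled out of a transcript without calling an LLM. The function name and regex are hypothetical, and the sketch assumes the STT output writes numbers as digits.

```python
import re

_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
_DURATION = re.compile(r"(\d+)\s*(second|minute|hour)s?\b")


def parse_timer_seconds(utterance: str):
    """Return the total duration in seconds, or None if none is mentioned."""
    matches = _DURATION.findall(utterance.lower())
    if not matches:
        return None
    # Sum every "<number> <unit>" pair so "1 hour and 30 minutes" works.
    return sum(int(n) * _UNIT_SECONDS[unit] for n, unit in matches)
```

Spelled-out numbers ("fifteen minutes") are not handled here; if the transcripts contain them, a word-to-number step would be needed first.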

Future Capabilities

  • Calendar integration
  • Music control (Spotify)
  • Smart home (if they get devices)
  • Recipe lookup
  • Kroger API for prices/ordering

Phone Web View

Simple responsive web page served by the LXC:

  • Shows grocery list
  • Can add/remove items by tapping
  • Accessible at http://kip.local:8080 or http://192.168.86.XX:8080
  • Girlfriend can open it on her iPhone in the store

Build Phases

Phase 1: Voice Loop (MVP)

  • LXC setup + dependencies installed
  • Faster Whisper running (base.en model)
  • Piper TTS running (pick a good voice)
  • FastAPI server handling /listen endpoint
  • Steam Deck: wake word + record + send + play response
  • Test: "Hey Kip, hello" → Kip responds with voice
  • Goal: End-to-end voice working

Phase 2: Grocery List + Weather

  • Grocery list CRUD (voice + API)
  • Weather skill (Nashville)
  • Timer system with alarm sounds
  • Phone web view for grocery list
  • Steam Deck display: clock + weather + active timers
  • Goal: Actually useful in the kitchen

Phase 3: OpenClaw Integration

  • Kip agent running on OpenClaw
  • General Q&A via Claude/Qwen
  • Smarter conversations (context, follow-ups)
  • Cost optimization: simple commands (timer, list) handled locally, only complex Q&A hits Claude
  • Goal: Smart assistant, not just a voice command box

Phase 4: Polish

  • Custom wake word model trained on "Hey Kip"
  • Better TTS voice selection
  • Deck UI polish (nice weather widget, timer display, list view)
  • Ambient mode (clock/weather when idle)
  • Multiple room support (add Pi later)
  • Goal: Girlfriend actually wants to use it daily

Cost

| Item | Cost |
| --- | --- |
| Steam Deck | Already owned |
| LXC container | Free (Proxmox) |
| OpenWakeWord | Free (open source) |
| Faster Whisper | Free (open source) |
| Piper TTS | Free (open source) |
| OpenClaw | Already running |
| Claude API for Q&A | Covered by existing subscription |
| Total | $0 |

File Structure

projects/kip/
├── PROJECT.md          # This file
├── server/             # LXC-side code
│   ├── main.py         # FastAPI server
│   ├── stt.py          # Whisper STT wrapper
│   ├── tts.py          # Piper TTS wrapper
│   ├── skills/         # Timer, grocery, weather handlers
│   ├── data/           # Grocery lists, state
│   └── requirements.txt
├── client/             # Steam Deck code
│   ├── kip_client.py   # Main client app
│   ├── wake_word.py    # OpenWakeWord listener
│   ├── audio.py        # Record/playback
│   ├── ui/             # Display UI
│   └── requirements.txt
└── agent/              # Kip's OpenClaw agent config
    ├── SOUL.md
    └── config.yaml

Notes

  • All voice processing (STT/TTS) on LXC, not Deck — keeps client thin
  • Wake word detection is the ONLY speech processing that runs locally on the Deck (alongside audio capture, playback, and the display UI)
  • Grocery list syncs to a web page for phone access
  • Simple commands (timer, list) should be handled WITHOUT hitting Claude to save tokens
  • Only complex Q&A ("how long to bake chicken?") routes through OpenClaw/Claude
  • Qwen on Ollama (192.168.86.137) as fallback for simple Q&A
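
The routing rule in these notes could look roughly like the sketch below. The patterns, skill names, and function signatures are all hypothetical, and the fallback order (try the primary model, use the second only if it raises) is just one reading of the Qwen note.

```python
import re

# Hypothetical keyword patterns for commands that never need an LLM.
LOCAL_PATTERNS = [
    ("timer", re.compile(r"\b(set|start|cancel)\b.*\btimer\b")),
    ("weather", re.compile(r"\b(weather|forecast)\b")),
    ("grocery", re.compile(r"\b(grocery|shopping)\s+list\b|\b(add|remove)\b.*\blist\b")),
    ("time", re.compile(r"\bwhat(?:'s| is)? (?:the )?(time|date)\b")),
]


def classify(utterance: str):
    """Return the name of the matching local skill, or None for open Q&A."""
    text = utterance.lower()
    for name, pattern in LOCAL_PATTERNS:
        if pattern.search(text):
            return name
    return None


def route(utterance, skills, ask_primary, ask_fallback):
    """Handle simple commands locally; send everything else to an LLM."""
    skill = classify(utterance)
    if skill is not None and skill in skills:
        return skills[skill](utterance)   # no tokens spent
    try:
        return ask_primary(utterance)     # e.g. the OpenClaw/Claude path
    except Exception:
        return ask_fallback(utterance)    # e.g. Qwen on Ollama
```

Here `skills` maps a skill name to a handler callable, and `ask_primary` / `ask_fallback` are whatever functions wrap the two model backends. Keyword matching is brittle, so misrouted commands are the main thing to watch for in testing.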