Architecture Overview
Utsuwa is a client-side web application that combines 3D avatar rendering, LLM chat, text-to-speech, and a relationship simulation engine. Everything runs in the browser with no backend required.
System Diagram
┌─────────────────────────────────────────────────────────────────┐
│ Browser │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ SvelteKit App │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐│ │
│ │ │ Chat UI │ │ 3D Scene │ │ Settings Panel ││ │
│ │ └──────┬──────┘ └──────┬──────┘ └─────────────────────┘│ │
│ │ │ │ │ │
│ │ ┌──────▼──────┐ ┌──────▼──────┐ │ │
│ │ │ xsAI SDK │ │ Three.js │ │ │
│ │ │ (LLM calls) │ │ + Threlte │ │ │
│ │ └──────┬──────┘ └──────┬──────┘ │ │
│ │ │ │ │ │
│ │ ┌──────▼──────┐ ┌──────▼──────┐ ┌─────────────────────┐│ │
│ │ │ Companion │ │ VRM Model │ │ TTS Pipeline ││ │
│ │ │ Engine │ │ @pixiv/vrm │ │ + Lip-sync ││ │
│ │ └──────┬──────┘ └─────────────┘ └──────────┬──────────┘│ │
│ │ │ │ │ │
│ │ ┌──────▼─────────────────────────────────────▼──────────┐│ │
│ │ │ Svelte 5 Runes Stores ││ │
│ │ │ (character.svelte.ts, vrm.svelte.ts, theme.svelte.ts)││ │
│ │ └──────────────────────────┬────────────────────────────┘│ │
│ │ │ │ │
│ │ ┌──────────────────────────▼────────────────────────────┐│ │
│ │ │ IndexedDB (Dexie.js) ││ │
│ │ │ Character state, facts, turns, events ││ │
│ │ └───────────────────────────────────────────────────────┘│ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────┐
│ External APIs │
│ OpenAI / Anthropic / etc. │
│ ElevenLabs / Web Speech API │
└───────────────────────────────┘ Core Components
VRM Rendering
The 3D avatar system uses Three.js with Threlte (a Svelte wrapper) for integration.
Key files:
src/lib/components/vrm/Scene.svelte— Main 3D scene with camera, lighting, and post-processingsrc/lib/components/vrm/VrmModel.svelte— VRM model loading, animation, and expression controlsrc/lib/stores/vrm.svelte.ts— VRM state including head tracking for UI positioning
Libraries:
@pixiv/three-vrm— VRM model loading and runtime@pixiv/three-vrm-animation— VRMA animation support@threlte/core— Svelte-Three.js integrationn8aoandpostprocessing— Visual effects
How it works:
- User uploads a
.vrmfile or URL - VRM loader parses the model and creates a Three.js scene object
- Threlte manages the render loop and integrates with Svelte’s reactivity
- Expressions and animations are applied via the VRM humanoid and expression APIs
Chat System
Messages flow through the xsAI SDK to reach LLM providers.
Key files:
src/lib/components/chat/BottomChatBar.svelte— User input interfacesrc/lib/components/chat/SpeechBubble.svelte— Message displaysrc/lib/engine/— Core companion engine logic
Flow:
User Input
│
▼
┌──────────────┐
│ Heuristics │ ── Calculate baseline state changes
│ Engine │ (energy decay, streak updates)
└──────┬───────┘
│
▼
┌──────────────┐
│ Memory │ ── Retrieve relevant facts and recent turns
│ Retrieval │ using semantic search (embeddings)
└──────┬───────┘
│
▼
┌──────────────┐
│ Prompt │ ── Combine system prompt + character state
│ Builder │ + memory context + instructions
└──────┬───────┘
│
▼
┌──────────────┐
│ LLM Provider │ ── Stream response from OpenAI/Anthropic/etc.
│ (xsAI SDK) │
└──────┬───────┘
│
▼
┌──────────────┐
│ Response │ ── Extract text + JSON state suggestions
│ Parser │
└──────┬───────┘
│
▼
┌──────────────┐
│ State │ ── Merge heuristic + LLM deltas
│ Merger │ and persist to IndexedDB
└──────────────┘ TTS Pipeline
Text-to-speech converts LLM responses to audio with lip-sync.
Key files:
src/lib/services/lipsync/analyzer.ts— Lip-sync audio analysis- Provider implementations in
src/lib/services/tts/
Supported providers:
- Web Speech API (built-in, no key required)
- ElevenLabs (high quality, requires API key)
Flow:
- LLM response text is sent to TTS provider
- Audio is received as a stream or buffer
- Web Audio API plays the audio
- Audio analyzer extracts volume/frequency data
- Expression controller maps audio to mouth shapes
- Mouth blend shapes are applied to VRM model in real-time
Memory System
Three-tier memory architecture for context and recall.
Key files:
src/lib/engine/memory.ts— Memory managementsrc/lib/types/memory.ts— Memory type definitionssrc/lib/db/index.ts— Database schema
Tiers:
- Working Memory — In-memory buffer of recent conversation turns
- Facts — IndexedDB-stored facts with vector embeddings for semantic search
- Sessions — Conversation summaries for long-term context
Semantic search: Uses @xenova/transformers to run embedding models locally in the browser. Facts are embedded and can be retrieved by semantic similarity to the current conversation.
See Companion System for detailed memory documentation.
State Management
Svelte 5 runes-based stores for reactive state.
Key stores:
src/lib/stores/character.svelte.ts— Character/companion statesrc/lib/stores/vrm.svelte.ts— 3D model state, head trackingsrc/lib/stores/theme.svelte.ts— UI theme (light/dark)src/lib/stores/settings.svelte.ts— Provider configurations
Pattern:
// Svelte 5 runes pattern
let count = $state(0);
const doubled = $derived(count * 2);
$effect(() => {
console.log('Count changed:', count);
}); Storage Layer
All data persists client-side via IndexedDB using Dexie.js.
Database tables:
characters— Character state and relationship datafacts— Memory facts with embeddingsturns— Conversation historycompletedEvents— Milestone events that have fired
Key file: src/lib/db/index.ts
Benefits:
- No server required
- Data stays on user’s device
- Works offline after initial load
- Large storage capacity (typically 50MB+)
Project Structure
src/
├── lib/
│ ├── components/
│ │ ├── chat/ # Chat UI components
│ │ ├── docs/ # Documentation components
│ │ ├── ui/ # Shared UI primitives
│ │ └── vrm/ # 3D scene and model
│ ├── config/ # App configuration
│ ├── data/ # Static data (events, etc.)
│ ├── db/ # Database schema
│ ├── engine/ # Companion engine core
│ ├── services/ # Provider integrations
│ ├── stores/ # Svelte stores
│ ├── types/ # TypeScript types
│ └── utils/ # Utility functions
├── routes/
│ ├── (app)/ # Main app routes
│ └── docs/ # Documentation site
└── content/
└── docs/ # Markdown documentation Key Interactions
Expression Updates
When the companion’s mood changes:
- Companion engine calculates new mood state
- State is written to
character.svelte.tsstore - VRM component reacts to store change
- Expression controller maps mood to VRM blend shapes
- VRM model’s face updates in real-time
Event Triggering
When relationship thresholds are crossed:
- State merger detects threshold crossing
- Event system checks for eligible events
- Matching event is marked as triggered
- UI displays event content (if any)
- Event ID is added to
completedEvents
Technologies
| Category | Technology |
|---|---|
| Framework | SvelteKit 2 |
| Language | TypeScript |
| 3D Rendering | Three.js + Threlte |
| VRM Support | @pixiv/three-vrm |
| LLM Integration | xsAI SDK |
| Styling | Tailwind CSS 4 |
| Database | IndexedDB (Dexie.js) |
| Embeddings | Transformers.js |
| Build Tool | Vite |