Dev Log @2026.7.3

Last dev log I wrote out a roadmap. This one is mostly me actually working through it.

The past month has been the most focused stretch Utsuwa has had since I started it, and a big chunk of that list is now shipped. More than that, the app finally crossed a line I have wanted to cross since the beginning: you can run the whole thing locally.

The entire stack runs local now

This was the goal from day one and it is real now. Utsuwa can talk to a local LLM through Ollama or LM Studio, speak through a local text-to-speech server, and, as of this month, transcribe your voice with a local Whisper server too. Local speech-to-text was the missing piece.

If you wire all three up, nothing you say, nothing she says, and nothing you type ever leaves your machine. No keys, no per-minute billing, no company sitting in the middle of your conversations. That is the version of an AI companion I actually want to exist, and now it is one you can build for yourself. There is a short setup guide in the docs if you want to try it.

Built for modest hardware, not just a 5090

Going local only matters if it works on the hardware people actually have. Most of you are not running a 5090, and I did not want the good experience to be locked behind a top-end GPU. So a lot of this month went into making the companion behave on smaller local models.

The tricky part is that Utsuwa asks the model for a bit of structure behind the scenes. Every reply also carries hidden information about mood, memory, and how the relationship is shifting. Big models handle that in stride. Smaller ones get sloppy with it, and when they do, you either lose those updates or you get raw formatting leaking into what she says out loud.

I spent real time hardening that path. If a smaller model skips the structured part, the app now does a quiet second pass to recover it instead of dropping the mood and memory entirely. I tightened the prompts so they are easier for a small model to follow. Reasoning models that like to think out loud get their scratch work stripped before anything reaches the screen. And if a response gets cut off partway through, it no longer dumps half-written formatting into the conversation.

The result is that a 4B model running on a laptop and a big cloud model both feel like the same companion. One is just quieter about it. This was for the people running modest setups, and it made the whole thing more solid for everyone.

Point it at almost anything

Alongside the local work, Utsuwa now supports any OpenAI-compatible endpoint. If a service or a local server speaks the OpenAI API, you drop in a base URL, a key if it needs one, and a model, and you are running. That covers OpenRouter, Together, a self-hosted vLLM, and a long tail of others without me hardcoding every provider one at a time.

Voice input got the same idea. On top of local Whisper and Groq, you can now use OpenAI’s Whisper for transcription. More options, fewer walls.

Showing her things

One of my favorite additions is being able to show your companion a photo. Hand her an image from the camera button or just drag one in, and vision-capable models actually see it. The pictures you keep live on a little scrapbook board, and like everything else, they stay on your device. It makes a conversation feel less like typing into a box and more like sharing a moment with someone.

The desktop app grew up

The desktop app is now on macOS, Windows, and Linux, with quiet in-app updates so you are not hunting for download links every time. Overlay mode lets your companion float on top of everything else on your screen, so she can sit alongside your work instead of being buried in a tab you forget about.

The unglamorous, important stuff

I also spent real time on the parts nobody sees. I did a full pass over the codebase and fixed a pile of things. Conversations are properly remembered between sessions now. The relationship system no longer gets stuck partway through. Time away from the app behaves the way it should instead of quietly punishing you for taking a break. Onboarding is cleaner. None of this is flashy, but it is the difference between a demo and something you can actually live with day to day.

Looking into Live2D

Here is the one I am excited to think out loud about. I am looking into Live2D support.

VRM is core to Utsuwa and it is not going anywhere. The 3D model, the head tracking, the way she exists in space, that is the heart of the experience and it stays that way. But a lot of people love the flat, hand-drawn Live2D style, the kind you see across so many vtuber and character apps, and it is a genuinely different feel. I would like Utsuwa to be able to hold both.

So the plan is to add Live2D as an alternative renderer that sits next to VRM, not on top of it. You pick the style of character that fits you, and the rest of the app works the same either way. It is early, and I am still digging into what it would really take, but I wanted to say it out loud.

Where this is going

The north star has not moved. An open, local-first AI companion that you can shape, inspect, and own. The last month was about turning that from a promise into something you can actually use, and making the app solid enough to trust with a real relationship over time.

Thank you to everyone who has been using it, filing bugs, and sending ideas. It genuinely shapes what I build next. More soon.