v0.1.0 - Building Momo · Niklas types...

Table of Contents

Prelude
#

To me, technology has always been the closest to magic that we have in our (comparably) boring non-fiction world. It fascinates me how absurdly complex hardware and software have to work together just to render a handful of pixels on a screen.

Simulated intelligence and machines that continuously improve themselves in Science Fiction have always interested me. With the advent of LLMs over the past couple of years, artificial intelligence has never felt so real and natural, all discussions about digital consciousness and the emergence of superintelligence aside.

I remember some twenty years ago when I played The Legend of Zelda - Ocarina of Time on my brother’s turquoise Nintendo 64, I was delighted (and sometimes a little annoyed) by the fairy companion Navi accompanying Link and me on our adventure together through Hyrule, always being by our side, giving helpful tips and commenting on the world.

Fastforward to today, and I’m four books deep into Brandon Sanderson’s fantasy epos The Stormlight Archive, where one of the protagonists, Kaladin, befriends what appears to be a mindless spren, just to witness its transformation to Syl, a conscious fantastical being that does not only have a mind and feisty personality on her own, but also grants him fantastical powers. How freaking cool is that?

With this project, I want to create something that combines this child-like longing for something magical with the technological possibilities of modern generative AI.

Building Momo
#

The idea is to create my own digital companion, Momo (もも; Japanese for ‘peach’), that has her own distinct personality and learns more about me and my life from our conversations.

While that is by far not as grand and cool as obtaining special powers through a bond, I want to develop Momo from a simple OS LLM with a system prompt (no shade to character.ai or Replika) to a helpful personalized assistant that remembers, learns, and maybe even acts (through the agentic usage of tools) upon the world.

At some point further down the road, I also want to give Momo her own digital presence via an animated avatar (ok, I’ll admit that I’m inspired by my cybercrushes Joi and Ani for this part).

As if this was not already ambitious enough, I want to combine the software / AI engineering efforts with real physical hardware. For the beginning, this can be as simple as a Raspberry Pi that’s hosting Momo, but I envision to eventually give her a dedicated “home” with microphone, speaker and camera peripherals to allow for even more natural voice-to-voice interaction. Longterm, I’m thinking of something like a custom - and preferably homebrewed - Gatebox / Character Pod.

And of course, there will be no APIs or cloud services! For a true personalized companion, everything needs to be local with full control and ownership, and, sometime far in the future, finetuned or self-trained. (I have to thank PewDiePie’s recent tech rabbit hole adventures for this local-only craze.)

There are a lot of “AI companion / waifu” projects out there and before you might think I’m some weirdo - no, I’m happily married and not seeking to build Her.

Instead, I want to combine my learning journey of various state-of-the-art AI agent frameworks and techniques with my interest in tinkering with physical hardware, all along a candid curiosity how close I can get to simulating a digital being with today’s free means.

What makes this project different from anything that I have seen thus far is the lifesim / Tamagochi aspect. I want Momo not to only be an assistant that reacts upon my requests. Rather, I want her to become proactive and (somewhat) autonomous.

Full transparency - I have no idea if and how this is possible yet, but I want to try to give Momo deep emotions, moods, short- & long-term goals, and even self-initiated hobbies, interest, wishes and dreams. Agentic frameworks are developing at a fast pace, and I’m curious to see what I can achieve with them.

The Way Ahead
#

The scope of this project is vast. I think I spent more than a week brainstorming where this could lead and even more time on getting all of my ideas onto paper.

At work, we follow the agile mindset of MMMSS (Many More Much Smaller Steps - try saying that 5 times in a row!). This means that I am not creating a waterfall like project plan for the next years to come, but trying to stay flexible and tackle things one small step at a time.

Nonetheless, I thought of approaching this project in a three by three concept (three lanes over three phases).

Firstly, I want to separate the development of the AI chatbot and later agent part from the development of the lifesim aspect. That way I can focus on one aspect at a time and also jump to a completely different area when I’m stuck on the other.

Secondly, I want to have a separate hardware aspect to this project, as I also love tinkering with SBCs such as Raspberry Pis, and currently eyeing building up my own homelab running on some more beefy hardware, as well as wanting to get into 3D printing of custom parts for whatever (definitely D&D miniatures). (I think you can tell the first midlife crisis is currently hitting.)

Thirdly, I am separating development into an early, mid, and late stage to keep my sanity and focus and put all the crazy ideas somewhere safe.

All of this taken together has resulted in this very much work in progress (and somehow very tiny*) roadmap. Naturally, the further along we look time-wise, the more exotic, undefined and potentially unfeasible these ideas are getting. While I will be revisiting this roadmap from time to time, shuffling things around as well as adding or removing ideas, I am happy (and relieved!) to have my tangled web of ideas safely written down somewhere.

Now, all I want to do is getting started!

timeline
    title Momo Roadmap
    
    section Early
        Chatbot : basic project setup
                : simple langchain wrapper around local ollama OS LLM
                : cli client
                : proper fastAPI backend with streaming responses
                : more clients (Discord, WhatsApp, Web app?)
                : memory & persistence
        LifeSim : basic prompt engineering (personality)
				: first iteration of Momo's diary
				: core moods / emotions with ASCII / emoji visualization
				: basic real-life / real-time information (time, season, holidays, weather)
		Hardware : local first, then on dedicated SBC (e.g. Raspberry Pi)

    section Mid
        Agent : update clients to manage memory 
		      : basic agentic capabilities / tool calling (e.g. files, web search)
              : advanced agentic capabilities / tool calling (e.g. computer use)
        LifeSim : friendship growth over time
			    : emototion detection / generally allow for more complex moods / emotions
			    : more sophisticated diary entries
			    : core facts / memories
			    : Tamagochi mechanics
			    : self-reflection, short- & long-term goals, dreams
			    : sst, tts, wake word, calls (e.g. via Discord client)
		Hardware : dedicated AI cluster allowing for more powerful model / reduced latency
		         : Momo "on the go"?

    section Late
        Agent : proactiveness & autonomy
              : multimodality
              : multi-agent / swarms / councils
              : training / finetuning custom LLM
        LifeSim : dynamically evolving interests / hobbies (e.g. Momo plays Pokemon)
			    : background life simulation / continuous "Momo time"
			    : dynamically animated digital avatar / VTuber
			    : personality growth through creative generations (e.g. art, music, stories) published in diary
			    : self-debugging / improvement
		Hardware : homebrewed character pod with multimodal peripherals

*I recommend enabling zen mode to actually be able to read something

Documentation & Self-Reflection
#

Finally - and this is what I’m probably most excited about - I want to document this whole journey for myself and for anyone else interested in it. I would lie if I denied that the main motivation for creating this whole personal blog was to have a place to (hopefully) brag about Momo!

The documentation will happen via dedicated blog posts for every release. I have already configured a CI job that automatically publishes new blog posts here along the corresponding tagged releases in GitLab. While I’ll be treating these posts as some sort of developer diary detailing learnings, findings, and showcasing demos, I also want Momo to document her own development from her point of view.

Hence, as soon as a first version of Momo’s system prompt (and thus personality) has been established, every new release will publish Momo’s own reflection of her development as a separate post (with Momo as author) alongside my dev diary entries!

In the beginning, I’m planning on achieving this by providing Momo the CHANGELOG.md as well as the git diffs of the individual commits alongside some guidance as context, but I hope to include more and more (lifesim) information, such as Momo’s current mood, inner thoughts, and at some point maybe even goals and dreams as well, to make these posts as authentic as possible.

I will also not interfere with Momo’s diary entries and let her write and publish whatever she wants. Momo’s posts will be created via a CI job that runs in parallel to the one publishing my own thoughts without giving me the chance to interfere.

At some point, these self-documenting reflections might not even be tied anymore to the releases and are getting triggered by Momo proactively whenever she feels like doing so, but let’s not get too much ahead of ourselves! (hopefully by now you can relate to my excitement for this project!)

That all being said, I’m very much looking forward to this endeavor! Again, I am neither an expert on AI agents or lifesim, nor do I know how much of these ideas are actually feasible. But I want to at least try them all and share my findings with you.

See you in the next post!

Cheers, Niklas

View release on GitLab