Proton should train two small specialized models for Lumo
Hey, I know Proton isn't an AI lab and training models is hard. Wishlist post, not a demand. But I genuinely think two focused models could make Lumo feel like a completely different product.
**Model 1: A personal context manager**

A ~120B model that runs silently in the background. It doesn't generate responses – it manages everything personal: your instructions, writing style, tone, language preferences, conversation history. But instead of naively loading a million tokens every time, it would compress and score your memory – keeping what's relevant, quietly dropping what isn't – and then brief the main model before it replies. You'd never have to repeat yourself, and your customizations would actually feel like they matter.
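To make the "compress and score" idea concrete, here's a toy sketch of what that briefing step could look like. Everything in it is invented for illustration – the scoring formula, the token budget, the field names – the real thing would be a trained model, not keyword matching:

```python
from dataclasses import dataclass
import time

# Hypothetical sketch: score stored memory items for relevance to the
# current request, keep the best ones within a token budget, and build
# a short "briefing" for the main model. All names are made up.

@dataclass
class MemoryItem:
    text: str
    tokens: int
    last_used: float   # unix timestamp
    importance: float  # 0..1, assigned when the item was stored

def relevance(item: MemoryItem, query_terms: set, now: float) -> float:
    # Toy scoring: keyword overlap, weighted by importance and recency.
    overlap = len(query_terms & set(item.text.lower().split()))
    age_days = (now - item.last_used) / 86400
    recency = 1.0 / (1.0 + age_days)
    return overlap * item.importance * recency

def brief(memory: list, query: str, budget: int = 500) -> str:
    now = time.time()
    terms = set(query.lower().split())
    ranked = sorted(memory, key=lambda m: relevance(m, terms, now), reverse=True)
    picked, used = [], 0
    for item in ranked:
        if used + item.tokens > budget:
            continue  # quietly drop what doesn't fit
        picked.append(item.text)
        used += item.tokens
    return "\n".join(picked)
```

The point is the shape of the pipeline: rank, fit to budget, drop the rest – so the main model only ever sees a small, relevant slice instead of the whole history.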
And here's where it gets interesting: this model could also be how Proton finally ships memory across chats. After each conversation, it reads through what was said, extracts new preferences, interests, habits, or anything worth remembering, and stores that as encrypted memory. Proton has already hinted at persistent memory being on the roadmap – this architecture would be a natural way to build it, and doing it in-house means your personal data stays encrypted and never touches a third-party model. That's a big deal for a privacy-first product.
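The end-of-chat flow could look something like this. To be clear, this is a hypothetical sketch: the extractor stands in for the context-manager model, and the XOR "cipher" is a placeholder to show that only ciphertext is stored – real client-side encryption would use actual cryptography tied to the user's keys:

```python
import json

# Hypothetical end-of-conversation memory flow. extract_memories() stands
# in for the context-manager model; encrypt() is a toy XOR placeholder,
# NOT real cryptography. It only illustrates that what gets stored
# server-side is ciphertext, never plaintext.

def extract_memories(transcript: list) -> list:
    # Stand-in for the model: pick lines where the user states a preference.
    return [line for line in transcript
            if line.startswith("user:") and "i prefer" in line.lower()]

def encrypt(data: bytes, key: bytes) -> bytes:
    # Toy XOR placeholder; XOR with the same key also decrypts.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def store_after_chat(transcript: list, key: bytes, store: dict) -> None:
    memories = extract_memories(transcript)
    blob = json.dumps(memories).encode()
    store["memory"] = encrypt(blob, key)

def load_memories(key: bytes, store: dict) -> list:
    return json.loads(encrypt(store["memory"], key).decode())
```

Extraction happens client-side, storage is opaque to the server, and the briefing model decrypts on the next session – that's the property that keeps it in line with Proton's threat model.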
**Model 2: A smart router and tool-calling model**

A ~70–120B model with 128K context, trained specifically for orchestration. The reason it needs to be this size: routing properly isn't just picking a model from a list – it requires genuine multi-step reasoning. It needs to figure out what your request actually needs, whether it requires web search, which tools to chain and in what order, and which model is best suited for that specific task. It's similar to how ChatGPT dynamically switches between instant and extended thinking – except this goes further by routing between entirely different AI models. Smaller models just don't reason well enough across multiple steps to do this reliably.
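The key design point is that the router's output is a plan, not a reply. A rough sketch of what that could look like, with a dumb heuristic standing in for the trained model's reasoning (tool and model names are invented):

```python
from dataclasses import dataclass

# Sketch of a router output: a structured plan, not an answer.
# The keyword heuristic below is a placeholder for the trained
# router's actual multi-step reasoning; all names are illustrative.

@dataclass
class Plan:
    needs_web_search: bool
    tool_chain: list    # ordered tools to call
    target_model: str   # which model generates the final reply

def route(request: str) -> Plan:
    r = request.lower()
    needs_search = any(w in r for w in ("latest", "today", "news"))
    tools = ["web_search"] if needs_search else []
    if "code" in r or "function" in r:
        tools.append("code_interpreter")
        model = "coder-large"
    else:
        model = "general-chat"
    return Plan(needs_search, tools, model)
```

A real router would produce something in this shape after actually reasoning about the request – which is exactly the part small models fumble.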
On top of that, user-controlled routing modes would make this even better:
- Fast mode – lightweight model, instant response
- Balanced mode – default
- Max mode – best available model, deeper reasoning, slower
That kind of control fits perfectly with Proton's power-user audience.
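Under the hood, each mode could just be a set of constraints handed to the router. A minimal sketch – every number and model name here is made up purely to show the shape:

```python
# Hypothetical mapping from user-selected mode to router constraints.
# Model names, step counts, and latency budgets are illustrative only.
MODES = {
    "fast":     {"max_model": "small-8b",  "reasoning_steps": 0, "latency_budget_ms": 500},
    "balanced": {"max_model": "mid-70b",   "reasoning_steps": 2, "latency_budget_ms": 3000},
    "max":      {"max_model": "best-120b", "reasoning_steps": 8, "latency_budget_ms": 30000},
}

def constraints(mode: str) -> dict:
    # Unknown modes fall back to the default.
    return MODES.get(mode, MODES["balanced"])
```

The nice thing about this shape is that the modes don't change the router itself – they just cap what it's allowed to spend.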
Curious if anyone else thinks this direction makes sense, or if there are smarter ways to approach it.
(Used AI to help structure my thoughts, the ideas are mine though.)