Hacker News | sparacha's comments

Hi HN — we’re the team behind Arch-Router [1], a 1.5B preference-aligned LLM router that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing), offering a practical mechanism to encode preferences and subjective evaluation criteria in routing decisions.

Today we’re extending that approach to Claude Code via Arch Gateway[2], bringing multi-LLM access into a single CLI agent with two main benefits:

1. Model Access: Use Claude Code alongside Grok, Mistral, Gemini, DeepSeek, GPT or local models via Ollama.

2. Preference-aware Routing: Assign different models to specific coding tasks, such as:

- Code generation
- Code reviews and comprehension
- Architecture and system design
- Debugging
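As a rough sketch of what that task-to-model mapping could look like on the client side (the task labels, model names, and `route` helper here are illustrative, not the actual Arch Gateway API):

```python
# Illustrative preference map: coding task -> preferred model.
# All task labels and model names below are hypothetical examples.
PREFERENCES = {
    "code_generation": "claude-sonnet",
    "code_review": "gpt-4o",
    "architecture": "claude-opus",
    "debugging": "deepseek-coder",
}

def route(task: str, default: str = "claude-sonnet") -> str:
    """Pick a model for a task, falling back to a default."""
    return PREFERENCES.get(task, default)

print(route("debugging"))     # -> deepseek-coder
print(route("unknown_task"))  # -> claude-sonnet
```

In the gateway itself this mapping lives in config rather than application code, so changing a preference doesn't require touching the agent.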

Why not route based on public benchmarks? Most routers lean on performance metrics — public benchmarks like MMLU or MT-Bench, or raw latency/cost curves. The problem: they miss domain-specific quality, subjective evaluation criteria, and the nuance of what a “good” response actually means for a particular user. They can be opaque, hard to debug, and disconnected from real developer needs.

[1] Arch-Router: https://huggingface.co/katanemo/Arch-Router-1.5B

[2] Arch Gateway: https://github.com/katanemo/archgw


Hey! I built this. AMA. The model router is built into the proxy layer here: https://github.com/katanemo/archgw


But you can also use tokens to implement routing decisions in a proxy, and make RBAC natively available to all agents outside your code. The trade-off is incremental feature work in code versus an out-of-process server: one gets you going super fast; the other offers a design choice that (I think) scales a lot better.
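A minimal sketch of the token-based idea (purely illustrative, not archgw's actual RBAC model): the proxy derives a role from a claim on the caller's token and checks a per-role model allowlist before forwarding.

```python
# Hypothetical RBAC check inside a routing proxy: each role carries a
# model allowlist, derived (for example) from a claim on the caller's token.
ROLE_ALLOWLIST = {
    "admin": {"gpt-4o", "claude-opus", "local-llama"},
    "developer": {"claude-sonnet", "local-llama"},
}

def authorize(role: str, requested_model: str) -> bool:
    """Return True if the caller's role may use the requested model."""
    return requested_model in ROLE_ALLOWLIST.get(role, set())

print(authorize("developer", "local-llama"))  # -> True
print(authorize("developer", "gpt-4o"))       # -> False
```

Because the check lives in the proxy, every agent behind it gets the same policy without any per-agent code.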


There is liteLLM, OpenRouter, Arch (although that’s an edge/service proxy for agents) and now this. We all need a new problem to solve


LiteLLM is kind of a mess TBH, I guess it's ok if you just want a docker container to proxy to for personal projects, but actually using it in production isn't great.


I definitely appreciate all the work that has gone in to LiteLLM but it doesn't take much browsing through the 7000+ line `utils.py` to see where using it could become problematic (https://github.com/BerriAI/litellm/blob/main/litellm/utils.p...)


can you double-click a little bit? many files in professional repos are 1000s of lines. LoC in itself is not a code smell.


LiteLLM is the worst code I have ever read in my life. Quite an accomplishment, lol.


ok still not helpful in giving substantial criticism


Sorry if this sounds harsh, but I'm not really interested in spending time to code review the worst code I've ever seen in 30 years of programming.

Is LiteLLM's code written by an LLM?


and you say you aren't "vested" in liteLLM?


yes, green text hn account, i am not. i just want help in properly identifying flaws in litellm. clearly nobody here is offering actual analysis.


> but actually using it in production isn't great.

I only use it in development. Could you elaborate on why you don't recommend using it in production?


the people behind envoy proxy built: https://github.com/katanemo/archgw - it has the learnings of Envoy but is natively designed to process/route prompts to agents and LLMs. Would be curious about your thoughts


And all of them despite 80% of model providers offering an OpenAI compatible endpoint


I think Mozilla of all people would understand why standardizing on one private organization's way of doing things might not be best for the overall ecosystem. Building a tool that meets LLM providers where they are instead of relying on them to homogenize on OpenAI's choices seems like a great reason for this project.


portkey as well which is both js and open source https://www.latent.space/p/gateway


why provide link if there is not a single portkey keyword there?


it's my interview with the portkey folks, which has more thoughts on the category


we are trying to apply model-routing to academic work and pdf chat with ubik.studio -- def lmk what you think


That’s an example of what the edge component could do. Did you give the preference-based automatic routing a try?


No, but I've already put this at the top of my tinker pile. I'm sure I will soon


RouteLLM is essentially a benchmark-driven approach. Their framework chooses between a weak and a strong model and helps developers optimize for a metric called APGR (Average Performance Gap Recovered) — a measure of how much of the stronger model’s performance can be recovered when routing some queries to the weaker, cheaper model. However, their routing models are trained to maximize performance on public benchmarks like MMLU, BBH, or MT-Bench. These benchmarks may not capture subjective, domain-specific quality signals that surface in practice.
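For concreteness, the performance-gap-recovered idea can be written as a one-liner (this is the generic form of the definition; the exact benchmarks and averaging over cost thresholds vary):

```python
def performance_gap_recovered(router: float, weak: float, strong: float) -> float:
    """Fraction of the weak->strong performance gap that the router recovers.

    APGR averages this quantity across cost/routing thresholds."""
    return (router - weak) / (strong - weak)

# e.g. weak model scores 60, strong model 90, router 81 on some benchmark:
print(performance_gap_recovered(81, 60, 90))  # -> 0.7
```

A router that always picks the strong model scores 1.0 by this metric, but at maximum cost; the interesting region is how much of the gap survives as you shift traffic to the cheap model.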

Arch-Router takes a different approach. Instead of focusing on benchmark scores, we let developers define routing policies in plain language based on their preferences — like “contract analysis → GPT-4o” or “lightweight brainstorming → Gemini Flash.” Our 1.5B model learns to map prompts (along with conversational context) to these policies, enabling routing decisions that align with real-world expectations, not abstract leaderboards. Also, our approach doesn't require retraining the router model when new LLMs are swapped in or when preferences change.
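A minimal sketch of the policy shape (the policy names, descriptions, and lookup logic below are made up for illustration; in Arch the 1.5B router model does the prompt-to-policy mapping, not keyword matching):

```python
# Hypothetical plain-language routing policies, each bound to a model.
POLICIES = [
    {"name": "contract_analysis",
     "description": "legal/contract review and analysis",
     "model": "gpt-4o"},
    {"name": "brainstorming",
     "description": "lightweight, low-stakes brainstorming",
     "model": "gemini-flash"},
]

def dispatch(policy_name: str, fallback: str = "gpt-4o") -> str:
    """Look up the model bound to the policy the router selected.

    Here we take the selected policy name as given; the router model's
    job is to produce it from the prompt and conversation context."""
    for policy in POLICIES:
        if policy["name"] == policy_name:
            return policy["model"]
    return fallback

print(dispatch("brainstorming"))  # -> gemini-flash
```

This is why swapping in a new LLM is just an edit to the policy table: the router predicts policy names, not model names, so it never needs retraining.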

Hope this helps.



Arch is developer friendly, but designed with enterprise-grade customers in mind. The core contributors of Envoy redesigned the proxy substrate to handle prompts, offering something that is battle-tested in terms of resiliency, speed, and deployments. Second, OpenRouter offers a choice of models, but dynamically routing to LLMs based on user-defined usage policies is uniquely available in Arch. Hope that helps


Can you share more about your evaluation setup? I would love to see the specific usage pattern as we have tested our model against smaller LLMs and foundational models and our results show things differently. Of course, routing policies should follow best practices here: https://docs.archgw.com/guides/llm_router.html

Nonetheless, super curious to learn more and see what we may be able to improve. This is technically not a classifier model - it's a usage prediction model (it feels like a classifier, but not quite in terms of intended usage)


yes - we have already published a quantized version here: https://huggingface.co/katanemo/Arch-Router-1.5B.gguf. The performance difference with a quant version is negligible. I'll run another analysis and update the thread shortly


Overall performance degrades from 93.17 -> 92.99 with a quantized version

