Hacker News | sparacha's comments

Hi HN — we’re the team behind Arch-Router [1], a 1.5B preference-aligned LLM router that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing), offering a practical mechanism to encode preferences and subjective evaluation criteria in routing decisions.

Today we’re extending that approach to Claude Code via Arch Gateway[2], bringing multi-LLM access into a single CLI agent with two main benefits:

1. Model Access: Use Claude Code alongside Grok, Mistral, Gemini, DeepSeek, GPT or local models via Ollama.

2. Preference-aware Routing: Assign different models to specific coding tasks, such as:

- Code generation
- Code reviews and comprehension
- Architecture and system design
- Debugging
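As a rough sketch of what that task-to-model mapping could look like on the client side (the task labels, model names, and `route` helper here are illustrative, not the actual Arch Gateway API):

```python
# Illustrative preference map: coding task -> preferred model.
# All task labels and model names below are hypothetical examples.
PREFERENCES = {
    "code_generation": "claude-sonnet",
    "code_review": "gpt-4o",
    "architecture": "claude-opus",
    "debugging": "deepseek-coder",
}

def route(task: str, default: str = "claude-sonnet") -> str:
    """Pick a model for a task, falling back to a default."""
    return PREFERENCES.get(task, default)

print(route("debugging"))     # -> deepseek-coder
print(route("unknown_task"))  # -> claude-sonnet
```

In the gateway itself this mapping lives in config rather than application code, so changing a preference doesn't require touching the agent.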

Why not route based on public benchmarks? Most routers lean on performance metrics — public benchmarks like MMLU or MT-Bench, or raw latency/cost curves. The problem: they miss domain-specific quality, subjective evaluation criteria, and the nuance of what a “good” response actually means for a particular user. They can be opaque, hard to debug, and disconnected from real developer needs.

[1] Arch-Router: https://huggingface.co/katanemo/Arch-Router-1.5B

[2] Arch Gateway: https://github.com/katanemo/archgw


Hey! I built this. AMA. The model router is built into the proxy layer here: https://github.com/katanemo/archgw


But you can also use tokens to implement routing decisions in a proxy, and make RBAC natively available to all agents outside your code. The trade-off is incremental feature work in code versus an out-of-process server: one gets you going super fast; the other offers a design choice that (I think) scales a lot better.
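A minimal sketch of the token-based idea (purely illustrative, not archgw's actual RBAC model): the proxy derives a role from a claim on the caller's token and checks a per-role model allowlist before forwarding.

```python
# Hypothetical RBAC check inside a routing proxy: each role carries a
# model allowlist, derived (for example) from a claim on the caller's token.
ROLE_ALLOWLIST = {
    "admin": {"gpt-4o", "claude-opus", "local-llama"},
    "developer": {"claude-sonnet", "local-llama"},
}

def authorize(role: str, requested_model: str) -> bool:
    """Return True if the caller's role may use the requested model."""
    return requested_model in ROLE_ALLOWLIST.get(role, set())

print(authorize("developer", "local-llama"))  # -> True
print(authorize("developer", "gpt-4o"))       # -> False
```

Because the check lives in the proxy, every agent behind it gets the same policy without any per-agent code.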


There is liteLLM, OpenRouter, Arch (although that’s an edge/service proxy for agents) and now this. We all need a new problem to solve


LiteLLM is kind of a mess TBH, I guess it's ok if you just want a docker container to proxy to for personal projects, but actually using it in production isn't great.


I definitely appreciate all the work that has gone in to LiteLLM but it doesn't take much browsing through the 7000+ line `utils.py` to see where using it could become problematic (https://github.com/BerriAI/litellm/blob/main/litellm/utils.p...)


can you double-click a little bit? many files in professional repos are 1000s of lines. LoC in itself is not a code smell.


LiteLLM is the worst code I have ever read in my life. Quite an accomplishment, lol.


ok still not helpful in giving substantial criticism


Sorry if this sounds harsh, but I'm not really interested in spending time to code review the worst code I've ever seen in 30 years of programming.

Is LiteLLM's code written by an LLM?


and you say you aren't "vested" in liteLLM?


yes, green text hn account, i am not. i just want help in properly identifying flaws in litellm. clearly nobody here is offering actual analysis.


> but actually using it in production isn't great.

I only use it in development. Could you elaborate on why you don't recommend using it in production?


the people behind envoy proxy built: https://github.com/katanemo/archgw - it has the learnings of Envoy but is natively designed to process/route prompts to agents and LLMs. Would be curious about your thoughts


And all of them despite 80% of model providers offering an OpenAI compatible endpoint


I think Mozilla of all people would understand why standardizing on one private organization's way of doing things might not be best for the overall ecosystem. Building a tool that meets LLM providers where they are instead of relying on them to homogenize on OpenAI's choices seems like a great reason for this project.


portkey as well which is both js and open source https://www.latent.space/p/gateway


why provide link if there is not a single portkey keyword there?


it's my interview with the portkey folks, which has more thoughts on the category


we are trying to apply model-routing to academic work and pdf chat with ubik.studio -- def lmk what you think


That’s an example of what the edge component could do. Did you give the preference-based automatic routing a try?


No, but I've already put this at the top of my tinker pile. I'm sure I will soon


RouteLLM is essentially a benchmark-driven approach. Their framework chooses between a weak and a strong model and helps developers optimize for a metric called APGR (Average Performance Gap Recovered) — a measure of how much of the stronger model’s performance can be recovered when routing some queries to the weaker, cheaper model. However, their routing models are trained to maximize performance on public benchmarks like MMLU, BBH, or MT-Bench. These benchmarks may not capture subjective, domain-specific quality signals that surface in practice.
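For concreteness, the performance-gap-recovered idea can be written as a one-liner (this is the generic form of the definition; the exact benchmarks and averaging over cost thresholds vary):

```python
def performance_gap_recovered(router: float, weak: float, strong: float) -> float:
    """Fraction of the weak->strong performance gap that the router recovers.

    APGR averages this quantity across cost/routing thresholds."""
    return (router - weak) / (strong - weak)

# e.g. weak model scores 60, strong model 90, router 81 on some benchmark:
print(performance_gap_recovered(81, 60, 90))  # -> 0.7
```

A router that always picks the strong model scores 1.0 by this metric, but at maximum cost; the interesting region is how much of the gap survives as you shift traffic to the cheap model.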

Arch-Router takes a different approach. Instead of focusing on benchmark scores, we let developers define routing policies in plain language based on their preferences — like “contract analysis → GPT-4o” or “lightweight brainstorming → Gemini Flash.” Our 1.5B model learns to map prompts (along with conversational context) to these policies, enabling routing decisions that align with real-world expectations, not abstract leaderboards. Also, our approach doesn't require retraining the router model when new LLMs are swapped in or when preferences change.
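A minimal sketch of the policy shape (the policy names, descriptions, and lookup logic below are made up for illustration; in Arch the 1.5B router model does the prompt-to-policy mapping, not keyword matching):

```python
# Hypothetical plain-language routing policies, each bound to a model.
POLICIES = [
    {"name": "contract_analysis",
     "description": "legal/contract review and analysis",
     "model": "gpt-4o"},
    {"name": "brainstorming",
     "description": "lightweight, low-stakes brainstorming",
     "model": "gemini-flash"},
]

def dispatch(policy_name: str, fallback: str = "gpt-4o") -> str:
    """Look up the model bound to the policy the router selected.

    Here we take the selected policy name as given; the router model's
    job is to produce it from the prompt and conversation context."""
    for policy in POLICIES:
        if policy["name"] == policy_name:
            return policy["model"]
    return fallback

print(dispatch("brainstorming"))  # -> gemini-flash
```

This is why swapping in a new LLM is just an edit to the policy table: the router predicts policy names, not model names, so it never needs retraining.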

Hope this helps.



Arch is developer friendly, but designed with enterprise-grade customers in mind. The core contributors of Envoy redesigned the proxy substrate to handle prompts, offering something that is battle-tested in terms of resiliency, speed, and deployments. Second, OpenRouter offers a choice of models, but dynamically routing to LLMs based on user-defined usage policies is uniquely available in Arch. Hope that helps


Can you share more about your evaluation setup? I would love to see the specific usage pattern as we have tested our model against smaller LLMs and foundational models and our results show things differently. Of course, routing policies should follow best practices here: https://docs.archgw.com/guides/llm_router.html

Nonetheless, super curious to learn more and see what we may be able to improve. This is technically not a classifier model - it's a usage prediction model (it feels like a classifier, but not quite in terms of intended usage)


yes - we have already published a quantized version here: https://huggingface.co/katanemo/Arch-Router-1.5B.gguf. The performance difference with a quant version is negligible. I'll run another analysis and update the thread shortly


Overall performance degrades from 93.17 -> 92.99 with a quantized version

