Oh yes, for Claude I use LiteLLM as a proxy so I can use it with OpenWebUI.
I'll try LibreChat too (never heard of it before), but I wonder whether it has the same capabilities, like voice and Python tools, and Ollama support (95% of my AI interactions run locally).
I think that's probably the shim I was referring to. It has a hardcoded context length, but either it's implemented incorrectly, Anthropic ignores it, or maybe it's on OpenWebUI to manage the window and it just doesn't. Not sure. I found it kept getting slow, so I was starting new conversations to work around that. Eventually I got suspicious and checked: I'd burned through almost $100 within a few hours.
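To illustrate why an unmanaged window gets expensive so quickly, here is a rough sketch of the arithmetic. It assumes every request resends the full conversation history and that each turn adds a fixed number of tokens. The function name, the turn sizes, and the window size are made up for illustration; none of this is taken from OpenWebUI, LiteLLM, or Anthropic.

```python
def cumulative_input_tokens(turns, tokens_per_turn, window=None):
    """Total input tokens sent over a conversation when each request
    resends the history, optionally capped at `window` tokens."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # the conversation grows every turn
        sent = history if window is None else min(history, window)
        total += sent  # input tokens are billed on every request
    return total


if __name__ == "__main__":
    # Hypothetical numbers: 200 turns of about 1,000 tokens each.
    uncapped = cumulative_input_tokens(200, 1000)
    capped = cumulative_input_tokens(200, 1000, window=8000)
    print(f"no window management: {uncapped:,} input tokens")
    print(f"8k window enforced:   {capped:,} input tokens")
```

With these made-up numbers the uncapped conversation sends about 20.1 million input tokens in total versus about 1.6 million with the cap, so the total grows roughly with the square of the conversation length when nothing trims the history.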
LibreChat isn't as nice in some areas, but it's much more efficient in this regard.
I do exactly this: I use LiteLLM to bridge it. In fact, I use LiteLLM to bridge OpenAI and Groq too. Even though OpenWebUI supports them directly, with LiteLLM I have better control over which models I see. Otherwise my model list gets cluttered. I configured this back when OpenWebUI only supported one OpenAI endpoint, but I kept using it because it's quite handy.
And no, it doesn't cost extra credits, isn't ignored, and doesn't have a hardcoded context length. It works perfectly.
Also, it's pretty easy to find unresolved bugs related to OpenWebUI not handling context length parameters correctly. I believe I actually read something from the author saying that this parameter is effectively disabled (for non-local LLMs maybe?).
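For anyone wondering what "managing the window" would look like if the front end did it, here is a minimal sketch. It is not how OpenWebUI, LibreChat, or LiteLLM actually do it. It assumes OpenAI-style message dicts with `role` and `content` keys, and it uses a crude guess of about 4 characters per token rather than a real tokenizer. `approx_tokens` and `trim_history` are hypothetical helper names.

```python
def approx_tokens(text):
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def trim_history(messages, max_tokens):
    """Keep system messages plus the newest messages that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(approx_tokens(m["content"]) for m in system)

    kept = []
    for m in reversed(rest):  # walk from newest to oldest
        cost = approx_tokens(m["content"])
        if cost > budget:
            break  # everything older than this is dropped
        kept.append(m)
        budget -= cost
    return system + kept[::-1]  # restore chronological order
```

Calling `trim_history(messages, 8000)` before each request would keep the prompt bounded no matter how long the chat gets, at the price of the model forgetting older turns.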