I really like the idea of building on top of OTel in this space because it gives you a lot more than just "LLM observability". In particular, it makes it much easier to get observability across your entire agent (rather than just the LLM calls).
I'm working on a tool to track semantic failures (e.g. hallucination, calling the wrong tools, etc.). We purposefully chose to build on top of Vercel's AI SDK because of its OTel integration. It takes literally 10 lines of code to start collecting all of the LLM-related spans and run analyses on them.
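For anyone curious, the wiring is roughly this (a minimal sketch; the model, prompt, and `functionId` are illustrative, and you still need an OTel SDK registered in your app, e.g. via `registerOTel` from `@vercel/otel` in a Next.js instrumentation file):

```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Once telemetry is enabled, the AI SDK emits OpenTelemetry spans for this
// call, which flow to whatever OTel exporter your app has registered.
const { text } = await generateText({
  model: openai("gpt-4o"),
  prompt: "Summarize this support ticket...",
  experimental_telemetry: {
    isEnabled: true,
    functionId: "summarize-ticket", // tags the span so you can filter on it later
  },
});
```

From there it's just a matter of pointing the exporter at your analysis pipeline.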
> I'm not clear why you are focusing on hashing user ids. Nor how you landed on a 50/50 split
I landed on hashing and splitting from my research on building A/B tools, but none of that research was aimed at building real, enterprise products (which is why I asked the question here). From your reply, I take it this isn't as important as my reading suggested?
> When someone logs in, write a record of which one they got.
I'm confused about what you mean by "which one they got". How do I know which version to assign them in the first place? This is what I assumed hashing would solve - it gives us a reliable way to "choose" a version for any given user (see the sketch at the end of this comment for what I had in mind).
> why assign 50% of an entire userbase to a feature being tested that only 10% of the users touch?
This makes sense; I'm not sure why I had landed on 50%. So the exact percentage split doesn't matter? I had assumed we'd need a way to enforce a certain split - how do I prevent one variant from reaching only 0.01% of the userbase while the other reaches 99.99%?
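For concreteness, here's roughly the hash-based assignment I had in mind (a minimal sketch; the hash function, bucket count, experiment name, and percentages are all just illustrative choices):

```ts
import { createHash } from "node:crypto";

// Hash (experiment, userId) to a stable bucket in [0, 10000). The same user
// always lands in the same bucket, so assignment is deterministic without
// storing anything up front, and over many users roughly pctB% of them fall
// below the cutoff - which is how the split percentage gets enforced.
function assignVariant(userId: string, experiment: string, pctB: number): "A" | "B" {
  const digest = createHash("sha256").update(`${experiment}:${userId}`).digest();
  const bucket = digest.readUInt32BE(0) % 10_000;
  return bucket < pctB * 100 ? "B" : "A";
}

// e.g. a 10% exposure for variant B:
const variant = assignVariant("user-42", "new-checkout-flow", 10);
```

Writing the result to a record at login (as you suggested) would then make the assignment durable even if the percentages change later.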
Thanks again for your reply, you've been really helpful.
Thanks for responding, I appreciate the insights here! I'm not focused on offering every single feature at the moment. This is a brand-new project, and it isn't in the same market as the products you mentioned (AB Tasty, Optimizely, etc.). Those products may offer a lot more than A/B testing, but for my situation I have a single, clear problem to solve, and it doesn't require much beyond that.
No probs at all mate, feel free to reach out if you need to bounce ideas. As I say I’ve quite a bit of experience from leading out the development offering in an agency and building a bespoke AB Testing IDE.