Hacker News | SamDc73's comments

https://talimio.com/ Generate fully personalized courses from a prompt. Fully interactive.

New features shipped last month:

- Adaptive practice: LLM generates and grades questions in real-time, then uses Item Response Theory (IRT) to estimate your ability and schedule the optimal next question. Replaces flashcards; especially for math and topics where each question needs to be fresh even when covering the same concept.
- Interactive math graphs (JSXGraph) that are gradable
- Single-image Docker deployment for easy self-hosting
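
Not the app's actual code, but a minimal sketch of how Rasch-style (1PL) IRT ability estimation and question scheduling can work; the function names, learning rate, and difficulty values are all illustrative:

```python
import math

def prob_correct(theta, difficulty):
    # Rasch (1PL) model: P(correct) given ability theta and item difficulty
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def update_ability(theta, difficulty, correct, lr=0.5):
    # One gradient step on the log-likelihood of the observed response:
    # ability goes up after a correct answer, down after a miss.
    p = prob_correct(theta, difficulty)
    return theta + lr * ((1.0 if correct else 0.0) - p)

def next_question(theta, item_bank):
    # Pick the item whose difficulty is closest to the current ability
    # estimate -- under 1PL, the ~50/50 question is the most informative.
    return min(item_bank, key=lambda item: abs(item["difficulty"] - theta))

# Toy run: ability drifts up after two correct answers, down after a miss
theta = 0.0
for difficulty, correct in [(0.0, True), (0.5, True), (1.0, False)]:
    theta = update_ability(theta, difficulty, correct)
```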

Open source: https://github.com/SamDc73/Talimio


Very cool! I'm chaperoning a python club for teens these days, I should make use of this concept!

I was delighted to see your comment at the top... I'm working on the exact same thing: generating concept DAGs from books and letting a tutor agent use them for structure and textbook reference. Can we discuss this somewhere else?

yeah sure, my email is in the bio

Is this launched? Looks cool, but you should add a privacy policy.

It's kind of launched; still have a couple of things to tighten up.

And I'll add a privacy policy by the end of the day, thank you for pointing that out.




The IRT angle is interesting — most adaptive learning tools just do basic spaced repetition, but using Item Response Theory to estimate ability level in real-time is a much more honest approach to "personalized." The JSXGraph integration for gradable math graphs is a nice touch too, that's a hard problem. Quick question: how do you handle subjects where the "right answer" is more ambiguous? Does the LLM grading struggle with open-ended questions outside of math?

Yeah, we use an LLM for the grading (for the free-form questions).

the flow is basically:

When practice questions are generated, the model generates the question + the reference answer together, but the user only sees the question. Then on submit, a smaller model grades the learner's answer against that reference answer + the grading criteria.

I benchmarked a bunch of judge models for this on a small multi-subject set, and `gpt-oss-20b` ended up being a very solid sweet spot for quality/speed/structured-output reliability. on one of the internal benchmarks it got ~98.3% accuracy over 60 grading cases, with ~1.6s p50 latency, so it feels fast enough to use live.
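
The flow above can be sketched roughly like this (the names and prompt wording are my own, not Talimio's actual code; the judge-model call itself is elided):

```python
import json

def build_grading_prompt(question, reference_answer, learner_answer, criteria):
    # Hypothetical prompt layout -- the judge model sees the reference
    # answer and criteria that the learner never saw.
    return (
        "Grade the learner's answer against the reference.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference_answer}\n"
        f"Grading criteria: {criteria}\n"
        f"Learner answer: {learner_answer}\n"
        'Reply with JSON only: {"correct": bool, "feedback": str}'
    )

def parse_grade(raw):
    # Guard against the judge returning malformed JSON or missing keys
    try:
        grade = json.loads(raw)
        return bool(grade["correct"]), str(grade.get("feedback", ""))
    except (json.JSONDecodeError, KeyError, TypeError):
        return False, "Grading failed, please retry."
```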

for math, it’s not just LLM grading though:

- `SymPy` for LaTeX/math expressions, so if the learner writes an equivalent answer in a different form, it still gets marked correct; `(x+2)(x+3)` and `x^2 + 5x + 6` both pass. (Might remove that one, since it could be easily replaced by an LLM, and it's a niche use that adds some maintenance cost.)

- tolerance-based checks for the JSXGraph board state, so if you plotted x = 5.2 instead of 5.3 on the graph, it's within the margin of error and passes, but you'll get a message about it
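
Both checks are simple to sketch. Here's a dependency-free illustration of the two ideas (the real math path uses SymPy symbolically; the names, tolerances, and eval-based sampling trick here are mine, not Talimio's):

```python
import math
import random

def numerically_equivalent(expr_a, expr_b, trials=20, tol=1e-6):
    # Stand-in for SymPy's symbolic equivalence check: evaluate both
    # expressions at random points and compare. eval() on untrusted
    # input is unsafe -- this is for illustration only.
    for _ in range(trials):
        x = random.uniform(-10, 10)
        if abs(eval(expr_a) - eval(expr_b)) > tol:
            return False
    return True

def grade_plotted_point(submitted, expected, tol=0.15):
    # Tolerance-based check for graph input: close enough passes, but
    # the learner still gets a nudge. The 0.15 threshold is made up.
    if math.isclose(submitted, expected, abs_tol=tol):
        if submitted != expected:
            return True, f"Accepted: {submitted} is within tolerance of {expected}."
        return True, "Correct."
    return False, f"Expected about {expected}, got {submitted}."
```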

I also tried embedding/similarity checking early on, but it was noticeably worse on tricky answers, so I didn’t use that as the main path.


The Stargate money didn’t show up I guess, and now the whole gridlock is collapsing?

I find https://github.com/steipete/gogcli a bit easier (but still confusing to set up).

Google Workspace API keys and roles were always confusing to me on so many levels... and they just seem to keep topping that confusion. No one is addressing the core issue (honestly, not sure if that's even possible at this point).


Pre-AI, if I had to include Google search queries in a commit, I'd be so embarrassed I'd probably never commit code, like, ever.


Didn't Dario Amodei ask for more government intervention regarding AI?


Not a contradiction with this post


I highly suspect he might even be considering Anthropic, since they enforced restrictions at some point on OpenClaw using their APIs.


yes that's the blunder I'm talking about


I switched from YouTube to Invidious, mainly because it doesn't support Shorts, and blocked YouTube at the DNS level. It's a bit slower, but I know I won't be sucked into doom-scrolling.


I mean, they're only running a small version of Codex. Can they run the full one, or isn't the technology there yet?


1000 tokens/sec for a highly specialised model is what we're going to see agents requiring.

Dedicated knowledge, fast output, rapid iteration.

I have been trying out SMOL models, as coding models don't need the full corpus of human history.

My most recent build was good but too small.

I am thinking of a model that is highly tuned to coding and agentic loops.


This is model 12188, which claims to rival SOTA models while not even being in the same league.

In terms of intelligence per compute, it’s probably the best model I can realistically run locally on my laptop for coding. It’s solid for scripting and small projects.

I tried it on a mid-size codebase (~50k LOC), and the context window filled up almost immediately, making it basically unusable unless you're extremely explicit about which files to touch. I tested it with an 8k context window, but will try again with 32k and see if it becomes more practical.

I think the main blocker for using local coding models more is the context window. A lot of work is going into making small models “smarter,” but for agentic coding that only gets you so far. No matter how smart the model is, an agent will blow through the context as soon as it reads a handful of files.
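
Back-of-envelope for why a handful of files is enough (the ~10 tokens-per-line figure is a loose rule of thumb, not a measurement):

```python
def tokens_needed(files, lines_per_file=300, tokens_per_line=10):
    # Rough cost of an agent reading whole files into context
    return files * lines_per_file * tokens_per_line

# Reading just five average-sized files (~15k tokens) already
# overflows an 8k window, before any conversation history.
five_files = tokens_needed(5)
```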


The small context window has been a recognized problem for a while now. Really only Google has the ability to use a good long context window


you should look into using subagents, which each have their own context window and don't pollute the main one


What are you talking about? Qwen3-Coder-Next supports 256k context. Did you mean to say that you don't have enough memory to run it locally yourself?


Yes!

I tried to go as far as 32k on the context window, but beyond that it won't be usable on my laptop (Ryzen AI 365, 32 GB RAM and 6 GB of VRAM).


You need a minimum of 2x 24 GB GPUs for this model (46 GB minimum).

