https://talimio.com/ Generate fully personalized courses from a prompt. Fully interactive.
New features shipped last month:
- Adaptive practice: the LLM generates and grades questions in real time, then uses Item Response Theory (IRT) to estimate your ability and schedule the optimal next question. Replaces flashcards, especially for math and topics where each question needs to be fresh even when covering the same concept.
- Interactive math graphs (JSXGraph) that are gradable
- Single-image Docker deployment for easy self-hosting
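For anyone curious what IRT-based scheduling can look like in practice, here is a minimal sketch (hypothetical, not Talimio's actual code) using the two-parameter logistic (2PL) model: estimate ability by Newton-Raphson maximum likelihood, then pick the unanswered item with the highest Fisher information at the current estimate:

```python
import math

def p_correct(theta, a, b):
    """2PL IRT: probability of a correct response given ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(responses, theta=0.0, iters=20):
    """Newton-Raphson MLE of ability from (a, b, correct) response triples."""
    for _ in range(iters):
        grad, info = 0.0, 0.0
        for a, b, correct in responses:
            p = p_correct(theta, a, b)
            grad += a * ((1.0 if correct else 0.0) - p)  # score function
            info += a * a * p * (1.0 - p)                # Fisher information
        if info == 0:
            break
        theta += grad / info
        theta = max(-4.0, min(4.0, theta))  # clamp against degenerate cases
    return theta

def pick_next_item(theta, items):
    """Choose the (a, b) item that is most informative at the current theta."""
    def fisher(item):
        a, b = item
        p = p_correct(theta, a, b)
        return a * a * p * (1.0 - p)
    return max(items, key=fisher)
```

For equal discriminations, the most informative item is the one whose difficulty is closest to the current ability estimate, which is why adaptive tests converge on questions near the edge of what you can answer.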
I was delighted to see your comment at the top... I'm working on the exact same thing: generating concept DAGs from books and letting a tutor agent use them for structure and textbook reference.
Can we discuss this somewhere else?
The IRT angle is interesting — most adaptive learning tools just do basic spaced repetition, but using Item Response Theory to estimate ability level in real time is a much more honest approach to "personalized." The JSXGraph integration for gradable math graphs is a nice touch too; that's a hard problem. Quick question: how do you handle subjects where the "right answer" is more ambiguous? Does the LLM grading struggle with open-ended questions outside of math?
Yeah, we use an LLM for the grading (for the free-form questions).
The flow is basically:
When practice questions are generated, the model generates the question + the reference answer together, but the user only sees the question. Then on submit, a smaller model grades the learner's answer against that reference answer + the grading criteria.
I benchmarked a bunch of judge models for this on a small multi-subject set, and `gpt-oss-20b` ended up being a very solid sweet spot for quality/speed/structured-output reliability. On one of the internal benchmarks it got ~98.3% accuracy over 60 grading cases, with ~1.6s p50 latency, so it feels fast enough to use live.
For math, it's not just LLM grading, though:
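The generate-hidden-reference-then-judge flow described above could be sketched roughly like this (hypothetical function names and JSON schema, not the actual Talimio implementation; the real judge call would go to the model API):

```python
import json

def build_judge_prompt(question, reference_answer, learner_answer, criteria):
    """Assemble the grading prompt: the judge sees the hidden reference
    answer and criteria that the learner never saw."""
    return (
        "Grade the learner's answer against the reference.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference_answer}\n"
        f"Grading criteria: {criteria}\n"
        f"Learner answer: {learner_answer}\n"
        'Respond with JSON only: {"correct": bool, "feedback": str}'
    )

def parse_judge_reply(raw):
    """Validate the judge model's structured output; fail closed (mark
    ungradable rather than silently correct) on malformed JSON."""
    try:
        data = json.loads(raw)
        return bool(data["correct"]), str(data.get("feedback", ""))
    except (json.JSONDecodeError, KeyError, TypeError):
        return False, "ungradable response, please retry"
```

Failing closed on malformed output matters here: with a smaller judge model, structured-output reliability is exactly the thing you benchmarked for.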
- `SymPy` for LaTeX/math expressions, so if the learner writes an equivalent answer in a different form, it still gets marked correct: `(x+2)(x+3)` and `x^2 + 5x + 6` can both pass. (I might remove that one, though, since it could easily be replaced by an LLM, and it's a niche use that adds some maintenance cost.)
- Tolerance-based checks for the JSXGraph board state, so if you plotted x = 5.2 on the graph instead of 5.3, it's within the margin of error and passes, but you get a message about it.
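Both deterministic checks are simple to sketch (hypothetical helper names; tolerance value is an assumption, and the real board-state check would compare JSXGraph element coordinates):

```python
import sympy as sp

def math_equivalent(expr_a, expr_b):
    """Accept equivalent forms: two expressions are equal iff their
    difference simplifies to zero."""
    a, b = sp.sympify(expr_a), sp.sympify(expr_b)
    return sp.simplify(a - b) == 0

def within_tolerance(plotted, expected, tol=0.15):
    """Tolerance check for a plotted value (e.g. an x-intercept on the
    board). Passes inside the margin, but flags inexact answers."""
    ok = abs(plotted - expected) <= tol
    note = "" if plotted == expected else (
        f"accepted: {plotted} is within {tol} of {expected}"
    )
    return ok, note
```

The symbolic check is what lets `(x+2)*(x+3)` and `x**2 + 5*x + 6` both pass without enumerating acceptable forms; the tolerance check is what produces the "close but not exact" message on the graph.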
I also tried embedding/similarity checking early on, but it was noticeably worse on tricky answers, so I didn’t use that as the main path.
Google Workspace API keys and roles have always been confusing to me on so many levels... and they just seem to keep topping that confusion; no one is addressing the core issue (honestly, not sure if that's even possible at this point).
I switched from YouTube to Invidious mainly because it doesn't support Shorts, and I blocked YouTube at the DNS level. It's a bit slower, but I know I won't get sucked into doom-scrolling.
This is model 12188, which claims to rival SOTA models while not even being in the same league.
In terms of intelligence per compute, it’s probably the best model I can realistically run locally on my laptop for coding. It’s solid for scripting and small projects.
I tried it on a mid-size codebase (~50k LOC), and the context window filled up almost immediately, making it basically unusable unless you're extremely explicit about which files to touch. I tested it with an 8k context window but will try again with 32k and see if it becomes more practical.
I think the main blocker for using local coding models more is the context window. A lot of work is going into making small models “smarter,” but for agentic coding that only gets you so far. No matter how smart the model is, an agent will blow through the context as soon as it reads a handful of files.
What are you talking about? Qwen3-Coder-Next supports 256k context. Did you mean that you don't have enough memory to run it locally yourself?
Open source: https://github.com/SamDc73/Talimio