Hacker News | Dumbledumb's comments

Wouldn't the margin be higher? Moving all the other models from unquantized to quantized would lower their performance, while Bonsai stays the same. I'd get it if this were about score per model size, but not absolute performance.


The metric they're selling this on is intelligence per byte, rather than total intelligence. So, if they used the quantized competing models, the intelligence-per-byte gap shrinks, because most models hold up very well down to 6-bit quantization, and 4-bit is usually still pretty good, though intelligence definitely tends to fall off below 6 bits.
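To make the point concrete, here is an illustrative sketch (all scores and sizes are made up, not real benchmark numbers) of why comparing against quantized competitors shrinks the per-byte gap: the competitor's score drops only slightly per quantization level while its byte count shrinks a lot, so its intelligence-per-byte rises.

```python
def intelligence_per_byte(score: float, params_b: float, bits: int) -> float:
    """Benchmark score divided by approximate weight size in bytes."""
    size_bytes = params_b * 1e9 * bits / 8
    return score / size_bytes

# Hypothetical 8B competitor: small score drop per quantization level,
# but a big drop in bytes, so the per-byte metric climbs.
for bits, score in [(16, 70.0), (6, 68.0), (4, 64.0)]:
    print(f"{bits:>2}-bit: {intelligence_per_byte(score, 8, bits):.3e} score/byte")
```

Under these made-up numbers, the 4-bit competitor scores more than three times as much "per byte" as its 16-bit version, which is exactly why the benchmark choice matters.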

Nonetheless, the Prism Bonsai models are impressive for their size. Where they fall apart is knowledge. They have good prose/logic for tiny models, and they're fast even on modest hardware, but they hallucinate a lot. Which makes sense; you can't fit the world's data in a couple of gigabytes. But as a base model for fine-tuning for use cases where size matters, it's probably a great choice.


Unfortunately, there doesn't seem to be a clear way to fine-tune these models yet. Excited for when that happens, though.


I definitely do this, along with the occasional compulsion to tell the agent how a problem was fixed in the end, after investigating it myself when the model failed to do so. Just common courtesy after working on something together. Let's rationalize this as giving me an opportunity to reflect and rubber-duck the solution.

Regarding not just telling it "try again": of course you are right to suggest that applying human cognitive mechanisms to LLMs is not founded on the same underlying effects.

But due to the nature of training and fine-tuning/RL, I don't think it is unreasonable that instructing the model to reflect backwards could have a positive effect. The model might pattern-match on this and exhibit a few positive behaviors. It could lead to more reflection within the reasoning blocks, catching errors before answering, which is what you want. Those reasoning tokens will attend to the question "what caused you to make this assumption?", further encouraging this behavior. Yes, both mechanisms are realized through linear, forward-going statistical interpolation, but reasoning has proven that this is an effective strategy for arriving at a more grounded result than answering right away.
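A minimal sketch of what that backwards-reflection follow-up could look like in practice. This is pure message construction in the common chat-message convention, no real API calls, and the wording is just one plausible phrasing:

```python
def reflection_followup(question: str, wrong_answer: str) -> list[dict]:
    """Build a chat transcript that asks for backwards reflection
    instead of a bare 'try again'."""
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": wrong_answer},
        {"role": "user", "content": (
            "That answer was incorrect. Before retrying, reflect backwards: "
            "what assumption caused you to arrive at it? State the assumption "
            "explicitly, then re-derive the answer from scratch."
        )},
    ]

msgs = reflection_followup("Why does the test fail only in CI?", "A flaky network call.")
```

The point is simply that the final user turn "sets the stage" for the next reasoning pass, rather than restarting it blind.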

Lastly, back to anthropomorphizing: it shows that you, the user, are encouraging of deeper thought and self-correction. The model does not have psychological safety mechanisms that it guards, but again, the way the models are trained causes them to emulate them. The RL primes the model for certain behaviors, i.e. arriving at an answer at some point rather than thinking for a long time. I think it is fair to assume that by "setting the stage" it is possible to influence which parts of that RL-trained behavior activate. While role-based prompting is not that important anymore, I think the system prompts of the big coding agents still use it, suggesting some, if slight, advantage to putting the model in the right frame of mind. Again, very sorry for that last part, but anthropomorphizing does seem to be a useful analogy for a lot of the concepts we are seeing (the reason for this lying in the more far-off epistemological and philosophical regions, on the side of both the models and us).


Would staying at an LTS version instead of running my production workloads on the bleeding edge also be free-riding, because I am depriving the community of my testing?


Because I think the indie hacker community in particular is not as keen to default to the big-tech stacks, since those are neither indie nor hack-y :)


Then who and in which scenarios?


Getting to know the views and values of your date is not a weird thing to do on the first date. If it’s a question that annoys them, they should consider why.


I like it. Putting established knowledge into skill form is not difficult or a great innovation, but it is definitely useful and worth sharing. I would critique one or two things: for skills like this, which are intended to be shared but are at the same time not complete "frameworks", I think skills should be atomic. Here, I would split it into the technical writing guidelines (tone, grammar, etc.) and the workflow (drafts, review, the specific folders). Agents should be able to discover and load both without a problem, but it makes it easier for other people to pick and choose.

Secondly, I'd combine one/two of them into a single markdown file each. I don't want the reviewer/writer to read those selectively, since a review always needs to apply all of them. I get that the goal of scoping the review procedure into 10 individual steps is to create more focus on each task by giving it its own procedure step, but in my experience, small focused steps like that lead to much longer review times and, in the worst case, a very fragmented text, because small edits are applied on top of each other without considering the big picture. A recent LLM with sufficient reasoning should be able to apply all rules in one go.
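For concreteness, the atomic split I have in mind would look something like this (directory and file names are illustrative, not from the actual repo):

```
skills/
  technical-writing-style/
    SKILL.md        # tone, grammar, terminology rules only
  technical-writing-workflow/
    SKILL.md        # drafting, review procedure, folder conventions
    review.md       # all review rules merged into one pass
```

Each skill then stands alone: someone can adopt the style rules without buying into the whole drafts/review workflow, and vice versa.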


Thanks for the feedback! I'll incorporate this.


I have been using Parakeet TDT v3 with just 0.6B params and it's insanely fast (feels instant, even on an M1 Air). The accuracy is all I could ask for, so I don't see the benefit of a much larger 4B model?

Not knocking your app, but asking because your app seems very focused on one model, while others allow the user to pick according to their needs.


In legal and public opinion, distribution and authorship might not be looked at with such a technical lens, especially in a country trying to ban encrypted communications. A muddying of the two could easily be constructed intentionally, or arise unintentionally from ignorance on the part of executive and judicial powers.


As I mentioned, if you are inconvenient to your government in an authoritarian state, they will not bother with technicalities to get rid of you.

Other people distributing code that you once authored will not be stopped by them getting rid of you.


So, I can’t use my legal name as a username because some random town with a few thousand people is named the same?


That would depend on the folks implementing the API

In its current state, I'd look at the API to check for reserved/premium names (or something that's profane).

If it makes sense contextually: imagine you were building the next Twitter. I'm guessing you'd want a way to charge for premium names and in turn need a way to detect what's premium. For the most part, first and last names are pretty premium, and people do pay for such usernames.
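A toy sketch of that kind of check (the name list is a stand-in for a real dataset of common first/last names, and the classification rule is just one naive possibility):

```python
# Stand-in for a real common-names dataset.
COMMON_NAMES = {"james", "maria", "smith", "garcia", "muller"}

def classify_username(username: str) -> str:
    """Flag a username as 'premium' if any token collides with a
    common first or last name, else 'standard'."""
    tokens = username.lower().replace("_", " ").replace(".", " ").split()
    if any(t in COMMON_NAMES for t in tokens):
        return "premium"
    return "standard"

print(classify_username("maria.garcia"))   # premium
print(classify_username("xx_dragon_99"))   # standard
```

A real service would also layer in reserved words, trademarks, and profanity lists, but the tiering logic is the same idea.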


In Chapter D.7 they describe: "The complex reflection in water is interpreted by the network as a distant mountain, therefore the water surface is broken."

This is really interesting to me because the model would have to encode the reflection as both the depth of the reflecting surface (for texture, scattering, etc.) and the "real depth" of the reflected object. The examples in Figures 11 and 12 already look amazing.

Long tail problems indeed.

