Hacker News | 88j88's comments

I was experimenting with something very similar but got different results, which you may find interesting.

This was part of testing how well a tool of mine worked (github.com/jsuppe/loom), which aims to extract requirements and specs and create tests. At first I had no intention of using it for code generation, but then I tried it out with some early success. I tried splitting the work by using the tool with different frontier models and then handing work to a local Ollama instance running one of several models. Not all local models had the same outcome, and not all coding languages had the same outcome. While nailing down the coding tasks in this experiment, I also wanted to set up positive and negative scenarios, which is where I found that setting guardrails can sometimes backfire with inversion. This essentially elaborates on previous work by Khan 2025 (https://arxiv.org/abs/2510.22251). The most interesting finding to me was that giving guardrails with a rationale reduces compliance and may cause the inversion.

For coding tasks I found that the improvement was not only the ability to use a lower-cost model for these broken-down tasks: wall-clock time also improved over using a frontier model alone, with equivalent outcomes.


I've had a few reversions as well along the way, including in the upcoming v0.7.0 patch. Some models benefited, others regressed. Overall it's better on harder scenarios or I wouldn't be releasing, but yeah, not intuitive.

The biggest challenge has been balancing the desire to hyper-optimize for my favorite models against average behavior and consumer needs.


100%. I found that you think you are smarter than the LLM and know what you want, but this is not the case. Give the LLM some leeway to come up with a solution based on what you are looking to achieve: give requirements, but don't ask it to produce the solution you would have written, because then the response is forced and lower quality.


100% dependent on the person driving it


Smart glasses featuring cameras, a control bracelet, and in-lens displays represent significant technological progress with particularly valuable applications for people with disabilities. The non-screen version could be transformative for blind users, while the display-equipped model offers great potential for the deaf community. However, there's a notable double standard in social acceptance: while these devices are welcomed when serving accessibility needs, they face resistance when used recreationally, reflecting society's discomfort with wearable recording technology in casual social settings.


Why does this read like AI slop?


The response he received had a correction to the code that the user did not expect.


Kinda makes sense to not develop a whole other OS just for Chromebook when they have all the engineering effort for Android.


Is this all real, or faked à la Gemini?


Remember when Zoom lied about having strong encryption and shared data without permission? https://arstechnica.com/tech-policy/2021/08/zoom-to-pay-85m-...


Some years before CUDA there was a lot of hype when the first GPGPU papers were published in 2003, showing significant performance gains from parallel computation on consumer graphics cards. At the time, it looked like competing on general-purpose computation was a solid strategy: multi-core CPUs from Intel were still a couple of years away, showing up in 2005, and from 2000 onward the rate of increase of clock speeds had started slumping. We saw Intel start releasing more variants of processors, but clock speeds weren't advancing exponentially anymore. The new battle for core supremacy was on the horizon.


I have papers collecting digital dust that were already doing compute with GPGPU assembly language.

We already knew some of the possibilities from looking at RenderMan, or early GPGPU attempts like the TMS34010.


I must have missed the fake stories. I just saw the ones about a school bus sized metal structure hovering over the US sent by China, endangering people below it.


Yes, I have just searched through CNN's archives. Apparently CNN has never accused China of using balloons to spy on the US. Apparently we were always at war with Eurasia.

