Hacker News | serced's comments

Yes, I also wonder about this! Progressing from children's books to scientific papers, etc. Could it learn e.g. language structure faster in a pre-training stage? Also, one somehow needs to define a proxy for generalization to compute a loss and do backpropagation.


This field of study is known as "Curriculum Learning" for your Googling pleasure (or I guess ChatGPT Deep Research now).
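A minimal sketch of the curriculum-learning idea: order training examples from "easy" to "hard" and feed them to the model in stages. The difficulty proxy here (sentence length, on the theory that children's books use shorter sentences than papers) is invented purely for illustration; real curricula use much richer difficulty measures.

```python
def difficulty(example: str) -> int:
    # Hypothetical proxy: fewer words = "easier".
    return len(example.split())

def curriculum_batches(examples, n_stages=3):
    """Yield the training set in stages of increasing difficulty."""
    ordered = sorted(examples, key=difficulty)  # stable sort, easy first
    stage_size = max(1, len(ordered) // n_stages)
    for start in range(0, len(ordered), stage_size):
        yield ordered[start:start + stage_size]

corpus = [
    "The cat sat.",
    "We study the asymptotic behaviour of stochastic gradient descent.",
    "Dogs run fast.",
    "Transformers attend to every token in the context window.",
]

# Each stage would be trained on before moving to the next.
stages = list(curriculum_batches(corpus, n_stages=2))
```

Whether the staged ordering actually speeds up pre-training is exactly the open question the parent comment raises.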


Yeah. This comment is profound to me. The internet works differently with these tools.

I haven't used the deep research features much but their ability to hash out concepts and build knowledge or even provide an amplified search experience is something...


Probably don’t need the name of the field for ChatGPT to get it.


I get why this comment was downvoted, but I also get where you're coming from - yes, these models are becoming increasingly good at understanding nuance and where to look, even when you don't know what to begin searching for.

But the downside is that in some cases you end up digging in the wrong direction if you leave it to a generalist system instead of a professional community, which is counterproductive.

Getting burnt is a good way to learn not to sometimes though...


I find it interesting that their blog post on prompt/context engineering kind of stands in contrast to their ultra-long system prompt. Maybe it is not too specific, as in their visual example (too specific - just right - too vague). https://www.anthropic.com/engineering/effective-context-engi... and the system prompt https://docs.claude.com/en/release-notes/system-prompts#sept...


> This attention scarcity stems from architectural constraints of LLMs. LLMs are based on the transformer architecture, which enables every token to attend to every other token across the entire context. This results in n² pairwise relationships for n tokens.

The n² time complexity smells like it could be reduced by algorithm engineering. Maybe do a preprocessing pass that filters out tokens (not sure what the right term of art is here) that do not contribute significantly to the meaning of the input - basically some sort of context compression mechanism.
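A toy sketch of the two ideas above: full self-attention produces an n×n score matrix (the n² pairwise relationships), and pruning "unimportant" tokens before attention shrinks that to keep². The importance heuristic here is invented for illustration; real context-compression methods learn or estimate importance rather than using a magnitude sum.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_scores(q, k):
    # Full self-attention: every token attends to every other token,
    # so the score matrix has shape (n, n) -> n^2 pairwise relationships.
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def prune_tokens(x, importance, keep):
    # Hypothetical "context compression": keep only the `keep` most
    # important tokens before attention, shrinking n^2 to keep^2.
    idx = np.argsort(importance)[-keep:]
    return x[np.sort(idx)]  # preserve original token order

rng = np.random.default_rng(0)
n, d = 8, 4
x = rng.normal(size=(n, d))

scores = attention_scores(x, x)                    # (8, 8): 64 pairs
compressed = prune_tokens(x, np.abs(x).sum(axis=1), keep=4)
small = attention_scores(compressed, compressed)   # (4, 4): 16 pairs
```

The catch, of course, is that deciding which tokens "do not contribute significantly" is itself the hard part.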


It's nice to see Claude.md! I checked out the commits to see which files you wrote in which order (readme/claude) to learn how to use Claude Code. Can you share something on that?


The CLAUDE.md file in the repo is basically just the result of the `/init` command. But honestly, on small repos like this, it's not really needed.

Fun fact: I usually have `- Never say "You're absolutely right!".` in my CLAUDE.md files, but of course, Claude ignores it.


I actually put a directive to always reply to me in French just to see if it was reading the rules. Spoiler: it was reading the rules and ignoring the ones that I cared about but it could tell me about it in French so.. victory?

I've only had good experiences concluding prompts with "and don't talk about it", but my colleague says it hampers the agent because talking to itself helps it think. That hasn't been my experience, and I vastly prefer it not spending tokens I give no shits about.


Great to see this from a public institution here! Can you share evals and the tech report? On Huggingface it leads to a 404 for me.


May I ask what part in M&A needs this much data processing? I am quite familiar with the field but did not yet see such tasks.


Zurich IT market died after COVID. Not sure about other hubs.


It died after Google announced layoffs.

The local IT market is quite small and Google is a very large employer here.

Two things happened then:

1. Other companies froze hiring because they got scared of "what does FAANG know that we don't that makes them reduce headcount?" - no other reason, really.

2. A few hundred pretty good Google engineers flooded the small market.

2023 was very rough; things have gotten much better since then, though.


The Paris market is also dead; only Datadog (and a few others) are still recruiting.


To get some experience launching web apps that can be put into the Play Store, and to play around with image generation models and prompting, I am building an app that generates professional/corporate photos from a few non-professional selfies.

The tech stack is probably FastAPI (I mainly know Python) and likely Nuxt/Ionic (no/not much experience). Not sure yet how hosting and the interaction with Replicate/Hugging Face will work in a phone app, how Stripe payments work without having a company, how to turn the web app into a phone app, etc. It should be a great learning project, with the goal of scoring my first actual sale! Happy to hear early guidance from people with Python backgrounds who have done similar things.
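A stdlib-only sketch of how the backend flow could be structured, independent of framework: accept uploaded selfies, hand them to a hosted model, and return the generated photos. Everything here (the `GenerationJob` shape, the stubbed `run_model` standing in for a Replicate/Hugging Face call) is a hypothetical illustration, not a real API.

```python
from dataclasses import dataclass, field

@dataclass
class GenerationJob:
    selfie_paths: list
    status: str = "pending"
    results: list = field(default_factory=list)

def run_model(selfie_paths):
    # Placeholder for the hosted-model call (e.g. via the Replicate API).
    # In a real app this would upload the images and poll for results.
    return [f"corporate_{i}.png" for i, _ in enumerate(selfie_paths)]

def handle_upload(selfie_paths):
    # What a FastAPI endpoint handler might do, minus the HTTP layer.
    job = GenerationJob(selfie_paths=selfie_paths)
    job.results = run_model(job.selfie_paths)
    job.status = "done"
    return job

job = handle_upload(["selfie1.jpg", "selfie2.jpg"])
```

Keeping the model call behind a small function like `run_model` makes it easy to swap Replicate for a Hugging Face endpoint later without touching the endpoint code.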


TIL there is 'vimtutor'. I barely knew the basics for quickly creating a file and inserting stuff. Will have a look to see if there is something to learn there - thanks for the pointer.


At the previous startup I worked at, we set up a PR action that played a celebration sound after every PR merge. We used https://github.com/leokster/dingdong_sonos to play it on the Sonos speakers. Fun little gimmick, but not really a use case like colors for linting etc.


Hi all, I wrote this "blog"-style Google Colab notebook for friends and family who have little technical know-how or do not yet have an understanding of language models. It's based on nanoGPT (a minimal GPT implementation by Andrej Karpathy, ex-OpenAI). However, I rewrote the code from scratch, made it more explicit, and added lots of textual information so that everyone should be able to follow what is happening when, and train their own little GPT :) Hope it helps some people - let me know if you have any questions/feedback.

Tl;dr: a blog about how LLMs work, where you train a mini model yourself, no prior knowledge needed.
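For readers curious what the notebook's first step looks like: nanoGPT's character-level approach maps every distinct character in the training text to an integer id, which is the simplest possible tokenizer. A miniature version of that idea:

```python
# Character-level tokenization in miniature, in the spirit of nanoGPT:
# every distinct character in the training text gets an integer id.
text = "hello world"
chars = sorted(set(text))                      # the "vocabulary"
stoi = {ch: i for i, ch in enumerate(chars)}   # string -> id
itos = {i: ch for ch, i in stoi.items()}       # id -> string

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode("hello")
roundtrip = decode(ids)  # recovers "hello"
```

The model then only ever sees the integer ids; `decode` turns its sampled output back into text.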

