feliixh's comments | Hacker News

2,3,8,9


Okay, so number 8 got me...


I'm building a catalog for health care price transparency data that aggregates the rates published by all insurers, to put everything in one place and make it easier for developers / researchers to access this data. https://www.accessmrf.com/


Very useful!


Great job, I intend to reproduce this on a similar dataset I've been collecting!

I will say, it would be great to see the color labeling done on the domain URL alone, to see how much of the map's topography is driven simply by the different formatting characteristics of the websites you're gathering data from.


Def a psyop. Classic Catalonian move of seizing independence by writing self determination laws in code and carefully introducing a bug that they can then exploit to secede.


One thing I think will dominate in the future is writing software documentation geared toward easy understanding by LLMs, with the documentation possibly including a fine-tuning dataset against which a model can be tested for proficiency in using that particular tool (like OpenAI Evals). Software will be written to be used by humans through LLMs, because humans will code in natural language, not in the language of your interface.
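As a toy sketch of the idea (the record format and the hypothetical `get_rates`/`get_index` calls are my own invention, not any real Evals schema): documentation could ship with a small set of prompt/expected-call pairs that a model is graded against.

```python
import json

# Hypothetical tool-proficiency dataset bundled with the docs:
# each record pairs a natural-language request with the API call
# the model is expected to emit.
eval_cases = [
    {"prompt": "List all rates for insurer 'acme'",
     "expected_call": "get_rates(insurer='acme')"},
    {"prompt": "Fetch the catalog index",
     "expected_call": "get_index()"},
]

def score(model_output: str, case: dict) -> bool:
    """Exact-match grading: did the model produce the documented call?"""
    return model_output.strip() == case["expected_call"]

# Serialize as JSONL so the test set can ship alongside the documentation.
jsonl = "\n".join(json.dumps(c) for c in eval_cases)
```

Exact-match grading is the simplest possible grader; a real eval would likely normalize whitespace or parse the call before comparing.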


I'm looking forward to the future of debugging how that pesky payment vanished into thin air, despite the money being deducted from the account, using code that's just English writing!


Haha, fair point, what I really meant is that LLMs will translate natural language to code, so building will be mostly in English while debugging will still happen in code.


+1


I don't think that's so clear. When you train a deep learning model, you are making it extract the gist or insight of many works and then use that pattern to produce new works. While the NN does not experience the work like a human does, it is definitely not memorizing.

A silly example: making GPT write a rap battle between Keynes and Mises goes beyond a performative remix; it is transformational work, and nothing is copied explicitly. If a human were to write it, that would not violate copyright.

I think that to tackle this we need a new lens other than copyright in the long term.


You’re right that it’s not so clear; perhaps I overstated for brevity. I don’t actually think requesting permission is absolutely necessary. What I really think is that there aren’t good reasons AI people shouldn’t at least first try to establish training sets that are unambiguously legal, either through use of public domain work, or through an actual attempt to curate licensing models that allow re-use. We have plenty of precedent for doing this, so people claiming they should have access to everything without permission strikes me as lazy. There’s also the problem that the AI winners already are, and will continue to be, the monopoly tech and media companies who stand to make handsome profits off of the results of their trained networks. Even if you believe the results of their tech are “transformational”, there is no question that it wouldn’t work at all without access to the source material.

The argument that NNs aren’t memorizing is definitely debatable and not necessarily true. They are designed to memorize deltas and averages from examples. They are, at the most fundamental level, building high dimensional splines to approximate their training data, and intentionally trying to minimize the error between the output and the examples. It’s fair to say that “usually” they don’t remember any single training sample, but it’s very easy for NNs to accidentally remember outliers verbatim. The whole reason the lawsuits mentioned in the article are happening is because we keep finding more and more examples where the network has reproduced someone’s specific work in large part. If we’re going to claim that today’s AI is producing original work, then we have to guarantee it, not just assert that it doesn’t usually happen.
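The "accidentally remember outliers verbatim" point can be illustrated with a deliberately simple stand-in for a neural network (my own toy example, not from the thread): an exact polynomial interpolant, which is what you get when a model has enough capacity to drive training error to zero.

```python
def lagrange_fit(points):
    """Return a function that interpolates the training points exactly,
    standing in for a model with enough capacity for zero training error."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    # Lagrange basis: 1 at xi, 0 at every other training x.
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f

# Four ordinary samples plus one outlier at x=4. The fitted curve
# reproduces every sample, outlier included, verbatim.
train = [(0, 0.0), (1, 1.1), (2, 1.9), (3, 3.0), (4, 10.0)]
model = lagrange_fit(train)
```

Between the training points the curve looks like a smooth generalization, but query it at a training input and it returns that sample exactly, which is the memorization failure mode in miniature.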

> a rap battle between Keynes and Mises goes beyond a performative remix, it is a transformational work, nothing is copied explicitly.

I don’t buy that the work can be called transformational just because the remix doesn’t have any recognizable snippets. GPT is in fact copying individual words explicitly, and it’s putting words together by studying the statistical occurrence of words in context of other words.
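The "statistical occurrence of words in context of other words" mechanism is easy to see in a toy bigram model (my own minimal sketch, nothing like GPT's actual architecture): every word it emits is copied from the corpus, and every pair it emits was observed in the corpus.

```python
import random
from collections import defaultdict

# Tiny corpus; a real model would train on billions of words.
corpus = "the market clears the market sets the price".split()

# Record which words were observed to follow each word.
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

def generate(start, n, rng=random.Random(0)):
    """Emit up to n words, each sampled from the observed successors."""
    out = [start]
    for _ in range(n):
        nxt = follows.get(out[-1])
        if not nxt:
            break  # dead end: the last word never had a successor
        out.append(rng.choice(nxt))
    return " ".join(out)
```

The output can be a sentence that never appears in the corpus, yet every individual word and word-pair is copied; whether that counts as transformation or copying is exactly the question in dispute.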

> I think that to tackle this we need a new lens other than copyright

I totally agree with that. This question is legitimately hard. We do need a new lens, but we might have to keep and respect the old one too at the same time. I feel like AI work should acknowledge that difficulty and step up to lead the curation of training sets that are legal wrt copyright by design, rather than ignoring the concerns of the very people who made the work they are leveraging.


Automate the Boring Stuff with Python

