This is cool and fixes a very big need I have. I often use Comfy as a shortcut to generating content in a larger pipeline and the JSONified API is OK, but not the easiest to use. Node graphs are just complicated and difficult to edit in the JSON form. Love this! Now if someone would point me toward a way to better pass binary data to/from comfy I'll be a happy camper.
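For the binary-data part: ComfyUI's server does expose an HTTP upload route alongside the JSON workflow API. Here's a minimal, stdlib-only sketch of pushing raw image bytes to it — the `/upload/image` endpoint and the `image` form field are taken from ComfyUI's server routes as I understand them, and the host/port are assumptions (default local install), so verify against your version:

```python
import json
import urllib.request
import uuid


def build_multipart(field_name, filename, data, content_type="image/png"):
    """Hand-roll a multipart/form-data body (no third-party deps)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_image(host, filename, data):
    """POST raw image bytes to ComfyUI; the returned name can then be
    referenced by a LoadImage node in the JSON workflow you submit."""
    body, ctype = build_multipart("image", filename, data)
    req = urllib.request.Request(
        f"http://{host}/upload/image",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage sketch (assumes a local ComfyUI server):
# info = upload_image("127.0.0.1:8188", "frame0001.png", open("frame0001.png", "rb").read())
```

Outputs go the other way via the server's `/view` route using the filename reported in `/history`; same caveat that the exact routes may shift between ComfyUI versions.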
We don't train any models; we're just bringing existing AI models to the VFX workflow. The choice of models and technologies is up to the individual artists.

We're working on demo videos and will be releasing regular content updates on our YouTube channel https://www.youtube.com/@deepmakeai soon. Bear with us as we get everything up and running.
You can still try it with Llama, and no, it wasn't the full text of the page, nor even very accurate. Even the "popular" quotes were VERY likely to be paraphrased, missing any of the poetry or cadence of the original.
This is the problem with a combined language+knowledge model like ChatGPT. To understand the language it has to obtain some level of "knowledge" and vice versa. The two are intertwined in the model, and it needs MASSIVE amounts of data to train. Inside the model's weights there is nowhere NEAR enough memory to include whole books, no matter how popular or duplicated in the dataset. Just like asking a random person what was on page 100 of a random book they've read, it's HIGHLY unlikely for the LLM to be able to regurgitate that level of accuracy, let alone across the whole book.
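A quick back-of-envelope calculation shows why the weights can't hold the training text verbatim. The numbers below are rough, illustrative assumptions (LLaMA-1-scale model, ~1 trillion training tokens, ~4 bytes of text per token), not exact figures:

```python
# Back-of-envelope: could a 7B-parameter model store its training set?
params = 7e9             # assumed model size (LLaMA-1 scale)
bytes_per_param = 2      # fp16 weights
train_tokens = 1e12      # ~1 trillion training tokens (rough)
bytes_per_token = 4      # ~4 characters of text per token (ballpark)

weight_bytes = params * bytes_per_param       # ~14 GB of weights
text_bytes = train_tokens * bytes_per_token   # ~4 TB of training text
ratio = text_bytes / weight_bytes

print(f"training text is roughly {ratio:.0f}x larger than the weights")
```

Even under generous assumptions the training corpus is a few hundred times larger than the weights, so verbatim storage of everything is impossible; at best, heavily duplicated passages get partially memorized.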
Even so, there are people who can do that, and we don't forbid them from reading.
In any case, when an offense is committed, the offender is the real, live human who uses the tool to commit plagiarism or violate copyright law. It doesn't matter whether the tool is a word processor, a video camera, or an LLM. The output is what matters, not the input.
"Cliff Note" style content sometimes outsells the content they're summarizing. LLMs aren't a new problem, the internet did that already. In fact, they're really LESS likely to provide a large amount of the original content.
I do agree that the current laws aren't going to work in this context; trying to shoehorn these new challenges into existing copyright law is especially bad.
In fact, it's basically the same legal team making the same argument again. They're just repeating the same play, hoping for more chances at the huge nest egg that OpenAI has.
You know what else stores nearly verbatim copies of texts and then regurgitates them to the public, often including direct quotes from the text? Cliff Notes.
Those aren't copyright violations. See (Edit: apparently the reference is gone, though I'm sure you can find plenty of sources explaining this; basically, it's Fair Use) for a great in-depth analysis of the legality.
Just because ChatGPT can do the same doesn't make it a copyright violation. The hope of this lawsuit is that the court will treat this as something different and stop it, but in the end, any copyright violations were committed by the piracy sites that put the data on the internet for ChatGPT to scrape.