14 Comments

Taking over the wires today, this OpenAI admission will impact all aspects of the field. It brings into play the complex legal issues of US and international copyright law...

OpenAI says it’s “impossible” to create useful AI models without copyrighted material

"Copyright today covers virtually every sort of human expression" and cannot be avoided.

https://arstechnica.com/information-technology/2024/01/openai-says-its-impossible-to-create-useful-ai-models-without-copyrighted-material

One of the more interesting left-field comments:

"The only solution I really see is to tax the commercial use of AI, and use that money to pay for a minimum income for human beings.

It's the only way to solve the copyright issue, and the issue of AI replacing human workers.

You can't possibly train AI without using copyrighted works just like you'd be hard pressed to really learn about the world if you weren't allowed to watch or to see copyrighted works.

I also don't think one person deserves more protection than others because the training data is so massive. It relies on human-produced work. It doesn't matter whether it's an artistic photograph of an apple or a regular old photograph of an apple."

At the very least, maybe we can get some money for Humanities, and start digitizing dissertations!

I'm all for more dollars for the humanities!

"The hardest part of this problem is getting comprehensive digital access to all potential sources. The LLM you use to execute the validation will need to access a lot of material that is likely not available openly on the internet but is in Jstor, special collections, etc."

This wouldn't work for a commercial project, but if a tool were built by those with an axe to grind against the elitist tier of academia, they might use Sci-Hub.

That's a good point. Independent operators also might use it.

Yes, I believe it will be diluted. It's only remarkable when a person is uniquely responsible or exceptional. If it's demonstrated through today's AI systems that what we call "plagiarism" is in fact commonplace, there will be less reason to make an issue of it. I think widening the review to a vast number of dissertations is likely to show it is commonplace. It is also likely that it frequently happens without the willful intention of the author. As long as the author is willing to go back and give proper attribution when it's called to their attention, there should be no issue, in my opinion.

Does your university subscribe to Turnitin's iThenticate product? If you are unfamiliar with it, it is a beefed-up version of Turnitin built to handle long-form content, with a much deeper database of scholarly papers and books. Some publishers use it to check for plagiarism. At my institution it is available for faculty and graduate students working in their labs to check their own work before submission to a publication. If use of it became mandatory (I know that would be a fraught issue), it would go a long way towards mitigating the effects of Ackman's threats.

The long term should see a lot more education of the public on plagiarism, which, as Sarah Eaton noted yesterday, has no uniform definition across institutions and, as she had previously noted, is likely to be significantly redefined in the era of AI.

Guy, I actually don't know if Georgetown does.

Check the Ian Bogost link up above for his use of it.

The more broadly plagiarism is identified, the less interest it will gain.

Why is that? Because it'll seem too diluted?

Bryan, I just read a Gary Marcus post (https://open.substack.com/pub/garymarcus/p/the-desperate-race-to-save-generative?r=2whxz&utm_medium=ios&utm_campaign=post) and got to thinking about everything that is swirling around plagiarism and intellectual property these days. Consider how these AI problems with plagiarized content and the NYT lawsuit against OpenAI and MS might interact with the kind of mass plagiarism detection of all faculty that Ackman is threatening. Maybe plagiarism and intellectual property law are going to have a moment. If Ackman turns up the heat on plagiarism and makes it a major issue, does that raise its profile enough to turn the public against OpenAI and MS to a greater degree, or does it overload the public on the issue and make things easier for OpenAI/MS? Maybe the two don't really interact in the public mind? Maybe, and this is a very long shot, the public becomes more sophisticated about intellectual property and makes things harder for all involved? The longest shot would be that the public not only becomes more sophisticated but demands changes to intellectual property rights that gut the entertainment industry, publishing, social media, and AI.

That's a very good point, Guy. 2024 as the year of IP.

I'll have to see if I can do some work on this.

Much might depend on timing. One big release of accusations held until something else blows up as an issue for colleges might be very damaging. A trickle, less so, but it will play into the general distrust of colleges and universities, which is already pretty damaging.
