20 Comments

Effects on energy consumption and climate aside for the moment, what happens to the digital divide (IMO widened by LLMs)?


Not only a great article that I'll be referring to, but some excellent replies within hours of you publishing. Applause for all, incredibly useful!


It doesn't strike me as likely that such a ruling, if it comes, will be a bolt from the blue. A lower court could get aggressive, but such a consequential ruling would almost certainly be stayed pending higher-court review. (If an attorney wants to weigh in to rebut this, I'm open-minded.) Some years later, the Supreme Court would render final judgment, in a process so carefully scrutinized that all parties would be more than half expecting whatever ruling emerged. Every party would have had ample time to put mitigation mechanisms in place; if the court were obviously hostile, AI firms would undoubtedly be announcing preemptive content agreements sufficient to enable their continued development. It's still a major change, but far less frantic--and I don't see commercial AI going away as a result. Getting more expensive/exclusive? Yes, but that would make a lot of current AI providers happy, not unhappy.

Bryan, if you take away the element of surprise and allow for rational people to react rationally, what would you expect to be the most consequential results in this more-accommodationist scenario? The one I see clearly is a massive increase in the digital divide, undoing a generation of efforts to reduce it; any others?

Jan 21 · Liked by Bryan Alexander

OpenAI and other LLMs are the inevitable consequence of the commodification of ideas. The fight here is between two corporate models over who owns the ideas of our society. This is a perversion of the original idea enshrined in the US Constitution: "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."

AIs can be one of the most powerful tools ever created for "promoting the progress of science and the useful arts." Fights over copyright kind of miss the point. The point is whether OpenAI has the right to commodify other people's ideas. In doing this, they are following in the venerable tradition of Disney and the content industry that emerged in the 20th century.

Unfortunately, this kind of decision would, most likely, be incredibly destructive. What ChatGPT does in constructing new works out of existing data is merely an automation of what artists and scholars have been doing for centuries. This kind of decision would also likely break truly open AI models, because they wouldn't be able to use "proprietary" ideas either.

Sam Altman and other AI CEOs demanded AI regulation last year. This was a disingenuous ploy to raise the barriers to entry into the field. I think we should take them up on it and regulate to decommodify the models being used. There is a simple expedient to achieve this goal and that is to demand that all models adopt a minimum level of transparency. I don't think that's the kind of regulation they wanted to see.

However, transparency is unlikely to be the regulatory course taken, because the only group that would benefit from such a regulatory model would be the public. Content providers simply want the AI companies to pay "rent" on the ideas they are using (this is already happening: OpenAI has signed agreements with several content providers to train its models on their material).

The problem with this approach is that it raises the barriers to entry and virtually ensures that monolithic corporate models will be the only ones who have access to a significant enough corpus to train powerful models. This also suits the incumbent AI companies because they have the resources to work out these kinds of deals.

The other thing this kind of framework would do is give an advantage to players who are, for one reason or another, not under the US legal umbrella: they could hack and steal information to create models even more powerful than the artificially siloed models that a pay-to-play system would establish in the US.

The losers here would be the general public and the educational communities that could emerge around AI systems. If the cost of entry rises due to "idea fees," you will exclude large chunks of the American populace (as is already happening with the $20 monthly subscriptions). We already suffer under idea fees that hide behind paywalls to scholarly content (this doesn't benefit the authors financially either - still waiting on my check, Elsevier).

The PC revolution was ultimately a democratizing force in our society. The Internet/Web has been a bit more uneven, but more and more people have access to it as costs have come down and its necessity has become apparent (we still have work to do there).

These technologies have transformed how we view ourselves. They have opened up vast new possibilities for lifelong learning, employment and entrepreneurship. Holding onto the Disney version of copyright will bankrupt these kinds of systems.

Furthermore, I think that any effort to do this will be futile in the end as economies based on false scarcity inevitably lead to black markets. AI itself will create a disruptive force in these warring economies.

It would be better, however, if we used the current AI moment as an opportunity to rethink how we approach the commodification of ideas. We need to defend open models and push for the transparency in our systems that AI could open up. I'm trying to be optimistic that will happen without too much chaos in the interim.

author

Tom, this is a fantastic commentary. Want to turn it into a blog post?

"monolithic corporate models will be the only ones who have access to a significant enough corpus to train powerful models" - good warning.

Very good point about the rising costs, too. And how they work against democratic access.


I am actually working on a blog post that contains some of this stuff but it's still gelling. (Like the AI, I've stolen a lot of these ideas. :-P)


Excellent article, Bryan! My guess is that the legal system's naïveté about AI will allow a bunch of alternative paths for AI companies. For example, now that LLMs are built, AI developers can do transfer learning (similar to its meaning in human learning) by taking a portion of a previously built AI to seed the construction of a new one. OpenAI could do so even by using a competitor's model as a starting point (e.g., the open-source Llama) and then adding a much smaller amount of data on top of it. Legally, it seems to me each AI company would need to be separately sued to prevent this transfer learning from happening. The delay alone from appeals, etc. in the legal process would hold a final decision up for years. By that time, AI companies probably hope they will understand better what's happening inside the AI models and just use transfer learning to pluck submodels from larger networks and construct things that way. The genie is out of the bottle, and I don't see any way to put it back in.
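Concretely, that continue-training path might look something like the sketch below. This is only a rough illustration, assuming the Hugging Face transformers and datasets libraries; the base-model name and the corpus file are placeholders, not claims about what any company actually does.

```python
# Minimal sketch: reuse an existing open-weights model and fine-tune it on a
# much smaller, separately sourced corpus (placeholder names throughout).
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

base = "meta-llama/Llama-2-7b-hf"              # placeholder open base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token      # Llama-style tokenizers lack a pad token
model = AutoModelForCausalLM.from_pretrained(base)   # weights someone else already trained

# The "much smaller amount of data" layered on top of the base model.
data = load_dataset("text", data_files={"train": "my_small_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = data["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # fine-tuning on a small corpus, not pretraining from scratch
```

The point of the sketch is how little new data the second stage needs relative to the original pretraining run, which is what makes the "each company would need to be separately sued" problem so thorny.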

author

Oh, good points, Tim.

Fun question: how much data would one have to add to a preexisting dataset to pass muster? I expect lawyers and judges will start thinking of test rules.

"Legally, it seems to me each AI company would need to be separately sued to prevent this transfer learning from happening."

Agreed. Perhaps the way music and movie lawsuits proceeded after Napster is an example. A bunch of takedown notices, then tech responses.

"The delay alone from appeals, etc. in the legal process would hold a final decision up for years."

I think so. What happens if a judge orders weights and software deleted?

"The genie is out of the bottle, and I don't see any way to put it back in."

I agree, but I'm not sure how things will develop with the genie. How much open source work should we see, for example?


Re: "how much data would one have to add to a preexisting dataset to pass muster?", I'm not sure what you're asking. If you mean how much data would need to be added to a preexisting AI model, then that depends whether the model is used as a whole or in part, how much the model being transferred was trained on, etc. My limited experience is that orders of magnitude less data is often necessary w/ transfer learning.


Call me cynical if you wish, but I have to say that, like every other illegal activity on the internet, criminal enterprises will take it forward anyway. Therefore any court of law needs to take that into consideration when making a decision.

I understand that the New York Times is complaining; this is the newspaper that covered up communism's atrocities and praised a certain German dictator.

author

Good point on the NYT, indeed.

Illegal LLMs... I started writing a post on this but need to do more research. I'm thinking of Sci-Hub as an example.

"any court of law needs to take that into consideration when making a decision" - yes, and this is also where governments can leap in with regulations.


Sci-Hub and LibGen are exactly where my mind went (given my dissertation is focusing on scholarly use of those)... ChatGPT gives proof of concept, and open source LLMs give the means...

In that scenario, when LLMs go underground, I think we further shoot ourselves in the foot, because it will make it that much harder to know and understand how people are using them and where LLM outputs are being used.

It's like the cat's out of the bag, but then, we don't actually have any sense of where the cat went....

author

Excellent point about underground LLMs being harder to track.

I still wonder if higher ed will take open source LLMs seriously.


I would expect interest in open source LLMs to grow to substantial levels in the mainstream only when people involved in creating LLMs start to feel more real pain (vs conceptual/intellectual pain) over the shortcomings of the market options. Right now they're the shiny new toys, and the concentrations of capital they enjoy mean they'll continue to out-develop the open-source alternatives for some time to come. Open source thrives best amidst market failure; this market is too new for its failures to be clear yet, or deeply felt. There are some niches that already qualify, particularly AIs serving marginalized communities. But mainstream? That's going to take a while, IMO.

Jan 21 · Liked by Bryan Alexander

I appreciate this essay on the possible implications of copyright infringement challenges for generative AI tools. I would add that this may have bigger implications for AI-powered business models as well, such as those from Amazon and Spotify.

author

Gerald, that's a good point. Do we have decent knowledge about what Spotify and secretive Amazon are up to with LLMs?

Jan 21 · Liked by Bryan Alexander

I am not sure. My understanding has been shaped by Cory Doctorow's recent books: especially The Internet Con and Chokepoint Capitalism.

author

I'll keep poking around and see what I can learn.

Thank you for the pointer.

Jan 21 · Liked by Bryan Alexander

Sure thing. As will I. There is another piece of this, which is the proprietary education efforts by folks like Sal Khan and others. Are there copyrighted materials feeding their models as well?

author

That's a good question. Let me check.
