How might the Ackman plagiarism campaign play out?
First thoughts about a new idea for using AI in academia
Greetings, fellow denizens of the year 2024. I have a bunch of posts in the pipeline, but today I wanted to share thoughts about a sudden new development in artificial intelligence and academia.
This weekend Bill Ackman, a hedge fund manager, Harvard University graduate, and activist, proposed that someone set up an AI-backed effort to check elite university professors’ scholarship for plagiarism.
Why does this matter?
On the face of it Ackman’s call might amount to nothing. He’s just one person, after all, and has no academic position. His business doesn’t work with academia, unlike, say, a publisher. Anyone can Tweet/X out a big idea and find nobody interested.
On the other hand, Ackman’s been a major player in the downfall of Claudine Gay from her Harvard presidency; that success won him allies as well as media attention. Ackman is also a billionaire capable of funding all kinds of projects. (This also raises the question of elite donors influencing higher education, which is a topic for another post.) And a Business Insider story about his wife’s alleged plagiarism has given him even more incentive to act.
So what might this mean? Here are some quick thoughts from your higher education futurist. They draw in part on online conversations from the day, some public and others not.
Who might heed the call?
The best approach, however, is probably to launch an AI startup to do this job (I would be interested in investing in one) as there is plenty of work to do, and many institutions won’t have the resources to do it on their own. Perhaps more importantly, the donors are going to demand that the review is done by an independent third party.
I can imagine various people who might devote resources to creating such a targeted plagiarism engine. Given some of the politics around Harvard in particular and higher education in general, we could easily see Republican party politicians openly or quietly authorizing such a thing. GOP funders might do something similar. And ambitious Republican operatives of any kind - consultants, local organizers, commentators - could try to set up such a machine to win credit within the party. Again, people could do this surreptitiously or openly.
Beyond party politics, other motives could mobilize other people. Some might sense a business model in the offing and form a startup. (Have any formed today?) An existing business might do the same if it has the technical capacity. (I asked Turnitin about this on Twitter/X but haven’t heard back yet.) Alternatively, a freelance group including some technologists might build a faculty plagiarism detector for the intellectual challenge or just to stir things up. (Perhaps there’s a -chan group already at work.) Former university system president Wallace Boston imagines a team with librarians:
I'd build it with a team of research librarians capable of locating the electronic copies of journal articles and books from citations. Some may be capable of loading content into Turnitin as well as interpreting results. May be better to build a Turnitin team given the volume.
Would any academics consider building a plagiarism detector in-house? It might sound perverse, yet there might be an incentive to do it inside the tent, as opposed to outside the tent, ah, detecting in, as the old saying goes. A department head or dean might prefer to have such a tool emerging organically and to use it to fend off challenges.
John Warner thought this unlikely:
How might it work?
This is a trickier question to answer, as this is such a tentative idea. For example, the scope of such a plagiarism detector is unclear. Would it focus on Ivy League universities or the Ivy+ or even more institutions? Ackman’s tweet offered an initial list:
why would we stop at MIT? Don’t we have to do a deep dive into academic integrity at Harvard as well? What about Yale, Princeton, Stanford, Penn, Dartmouth? You get the point.
Yet later on he massively expands the scope:
Now that we know that the academic body of work of every faculty member at every college and university in the country (and eventually the world) is going to be reviewed for plagiarism…
I’m also curious about the people whose work would be examined. Would it be restricted to tenured faculty and those on the tenure-track? Would full time term appointees also be subjects? What about adjuncts, the largest faculty population? And, of course, senior administrators will likely appear for scrutiny. Ian Bogost quotes one plagiarism expert: “If I were a school looking to appoint a new president…I’d consider doing this kind of analysis before doing so.”
Which technologies would such a project use? Picking a commercial AI product like Bard or ChatGPT has the advantage of rapid startup. Alternatively, a team might prefer to use an open source generative AI app to have more control over the product. Which databases would they use and how? I wonder if Bard connects smoothly with Google Scholar on this score.
This brings us to another issue: giving the AI access to the full range of scholarly sources. As Eric Harper observes,
The hardest part of this problem is getting comprehensive digital access to all potential sources. The LLM you use to execute the validation will need to access a lot of material that is likely not available openly on the internet but is in Jstor, special collections, etc.
Assembling these sources might be the function of part of a plagiarism detection team. Would they partner with scholarly publishers? What role would third parties like JSTOR or the Internet Archive play?
Moreover, if we imagine multiple plagiarism projects revving into life, I’m curious how they might relate to each other. They might compete on speed, quality, and reach. If they adopt a collaborative attitude, they might divvy up targeted universities or academic disciplines. Team 1 goes after Yale, Team 2 hits Princeton. Or Team A works on the humanities, while Team B focuses on the natural sciences. If this lasts long enough the providers might form new organizations, like a trade association.
Yet what about the problems large language models have with getting things wrong/generating hallucinations? Computer scientist Valerie Barr sees this as a major problem:
I'm amazed to see such deep trust in AI since we know from numerous other examples that it is completely unwarranted.
Which is very true. How will these projects control for quality - perhaps with the aforementioned librarians, or technologists, or through a software solution? Maybe generative AI’s reputation for making stuff up will weaken the appeal of any plagiarism detector.
How might academics and institutions respond?
I don’t think it’s a stretch to envision people and schools denouncing the plagiarism efforts. Some will argue that it’s intrusive, as John Warner does above. Some will defend the academy, pointing to various structures designed to check plagiarism which are already in place: peer review, editors, academic departments making hiring/tenure/promotion decisions. I expect political responses as well, taking various forms: criticizing Ackman over Israel or his charge against Gay, or for being a billionaire.
Ackman offers one vision of a campus responding:
what if a plagiarism review turned into an incredible embarrassment for the entire university? It could lead to wholesale firings of faculty. Donors terminating their donations. Federal funding being withdrawn, and a massive litigious conflagration where faculty members and universities sue one another about what is plagiarism, and what is not. Think about the inevitable destruction of the reputations of thousands of faculty members as it rolls out around the country, and perhaps the world.
He elsewhere adds:
If every faculty member is held to the current plagiarism standards of their own institutions, and universities enforce their own rules, they would likely have to terminate the substantial majority of their faculty members.
I wonder how many institutions will consider blocking anti-plagiarism probes at the IT layer, such as by setting campus servers to reject traffic from certain IP addresses. They could also turn to lawyers, filing suits or seeking injunctions against the plagiarism detectors.
Perhaps individual academics, departments, divisions, or schools might do in-house plagiarism audits to prove themselves honest before, or in response to, a plagiarism detector probe.
If these Ackman apps proceed, I wonder how many graduate programs will ramp up plagiarism instruction for their graduate students. That’s a longer term response, aimed at equipping grads with a high scholarly standard.
All of the preceding is pretty narrowly focused. I picked up one proposal and ran with it, looking for potential ways it could unfold without assuming background changes. I didn’t take it far into the future, but kept it mostly to the short term.
What might happen next, if an Ackman plagiarism detector (or more of them) takes off?
I suspect we might see a growing flurry of charges against academics and their host institutions, which could in turn lower higher education’s already challenged standing in American society.
Tom Headrose thinks similar applications might appear in other fields:
Other professions can expect the same. Judicial decisions and rulings? Patterns of bias, inconsistencies and plain old 'dodgy', all on the way in your local common law jurisdiction. I suspect the higher courts in Dublin won't fair well. "The worm turns", as the old saying goes.
Perhaps we’ll also see more writers either charging others with plagiarism, or proactively defending themselves. (See the Ian Bogost article above for a prototype.)
Beyond writers and lawyers are politicians. I’m reminded of a 1987 event when journalists established that an American presidential candidate used phrases from a British politician. The story sank the candidate’s attempt on the presidency that year, but he succeeded decades later. I’m speaking, of course, of Joe Biden.
Over time, if there is sufficient demand for Ackman detectors, we could see more investment heading that way, in which case we might expect the historical patterns: boom and bust, investors demanding business changes.
Quick final note: this is an unusual use of generative AI. Yes, many of us have been talking about plagiarism from the time ChatGPT first achieved some basic success, but that was about students plagiarizing. I don’t know of anyone pondering a billionaire calling for an AI attack on Ivy League faculty work.
So a question to end on: what’s the next unexpected use of generative AI?
(thanks to many folks for conversation, including Peter and Ruben)
At the very least, maybe we can get some money for Humanities, and start digitizing dissertations!
"The hardest part of this problem is getting comprehensive digital access to all potential sources. The LLM you use to execute the validation will need to access a lot of material that is likely not available openly on the internet but is in Jstor, special collections, etc."
This wouldn't work for a commercial project, but if a tool were built by those with an axe to grind against the elitist tier of academia, they might use Sci-Hub.