Vesuvius Challenge: deciphering millennial secrets with AI

By Mauro Mereu

“At about one in the afternoon, my mother pointed out a cloud with an odd size and appearance that had just formed. From that distance, it was unclear from which mountain the cloud was rising, although it was found afterward to be Vesuvius.” That’s how the historian Pliny the Younger described – in a letter to the Roman historian Tacitus – the buildup of the Vesuvius eruption in 79 AD. The eructation blanketed the surrounding area of Pompeii, Stabiae, and Herculaneum with meters of ashes and lava. A villa with an extensive library of papyrus scrolls – owned by the father-in-law of the Roman general Julius Caesar – was covered, too.

The Vesuvius Challenge aims to decipher content from two millennia-old scrolls preserved in a villa covered by the Vesuvius eruption in 79 AD;
Using cutting-edge technologies, participants have until the end of 2023 to read four passages from the scrolls, with a $700,000 grand prize at stake.

Launched by former GitHub CEO Nat Friedman, the Vesuvius Challenge is pushing its contestants to decipher the content of some of the two millennia-old scrolls found in the villa using today’s cutting-edge technologies. Despite almost two thousand years passing by, the papyri are still intact. The volcanic debris’ heat carbonized them, but they stayed preserved underground. “It’s very tantalizing because, on the one hand, these papyri have been preserved, but on the other, we can’t read them yet,” says JP Posma, project lead of the Vesuvius Challenge. Its participants have time until the end of 2023 to read four passages and win a $700,000 grand prize.

This is an article from IO Next: The Year Of… For the last magazine of this year, we selected the articles that stuck with us the most, whether it was an impressive interview, an important story or just something funny.

Why Mauro chose to write a follow-up on this article:

Discovering more about our past isn’t the use case that immediately pops up when thinking about AI. But AI is a horizontal technology, finding applications in all spaces, historical research included. AI is proving to be good at knowing more about our ancestors, as participants of the Vesuvius challenge started to decipher two thousands-year-old papyri with the help of machine learning algorithms.

Reading the first words

The challenge builds on a series of discoveries by Dr. Brent Seales‘ lab at the University of Kentucky. In early 2023, their machine-learning model recognized ink from X-ray scans of the papyri. Piece by piece, we are starting to read the content of the manuscripts. Luke Farritor, a computer science student, was the first contestant to read an entire word – precisely πορφύραc, the ancient Greek word for purple – from the scroll, winning the $40,000 first-letter prize. Shortly after, Youssef Nader, a Ph.D. student at the Free University of Berlin, discovered the same word with more precise results, bagging the second-place award.

“I was thrilled and excited to the point of being nervous. It’s also a competition, so I couldn’t share this with many people. I was also still trying to be skeptical not to overestimate it. I redid the experiments a couple of times to validate my method, then I reached out to JP, and he urged me to submit the result as soon as possible,” told Innovation Origins Nader.

The deciphered word – © Vesuvius Challenge

Unearthing Pompeii’s secrets with cutting-edge technology

“At about one in the afternoon, my mother pointed out a cloud with an odd size and appearance that had just formed. From that distance, it was not clear from which mountain the cloud was rising, although it was found afterward to be Vesuvius.”

Flashback

The papyrus villa was first discovered in 1750 when a farmer dug a well, and stumbled across its marble pavement. Archeological research followed suit, recovering many scrolls over the decades. In 2015, Seales lab’s first breakthrough happened. The scientists read the En Gedi scroll — an ancient Hebrew parchment found in the Dead Sea – combining X-ray scans — through computer tomography, CT – and computer vision technologies. Therefore, the desire came to apply the same technique to the Herculaneum Papyri, too.

Four years later, the American researchers used a particle accelerator to scan two of the scrolls retrieved in the villa. Thanks to this experiment, they inferred that machine learning models could recognize surface patterns indicating ink presence. And that became a reality at the beginning of 2023 when their model detected ink on the scans. So the challenge followed.

A 3D reproduction of the scroll

No different than in hospital CT scans, X-rays are sent through the object – the scroll in this case – getting a picture on the opposite side. “But then, one can rotate the object to get many pictures from different angles. Through tomography, one can reconstruct a 3D image from all those pictures,” explains Posma.

Particle accelerators can help get high-resolution images, as they can provide a high-quality energy source and specific light incidences. At this point, the scroll content appears in a conglomerate of voxels, the name for the data points in a 3D grid. From there, it’s easier for AIto identify characters and read them. The idea is to develop neural networks trained on tiny bits of the scroll with and without ink. “Then it’s about inferring patterns and eventually showing some characters,’’ underlines Posma.

Data models become teachers

That’s what Nader did. Specifically, he took the challenge as a chance to experiment with self-supervised learning, one of the latest trends in AI technology, which he combined with his idea to make a data model teacher for another data model.

In other words, this method trains models based on the information retrieved by a previous model. The first letter detected by one of Nader’s models was a ‘pi’ (Π), which was then passed to the next model, which improved based on the pre-acquired knowledge. “After the next models are trained, they become teachers. And it’s like an ongoing cycle until you get a good amount of data from within the scrolls of ink signals, all of them,” clarifies Nader.

“The model takes a small part of the image and tries to predict whether or not there is ink. It trains on with whatever data you have and makes predictions on the new segments from the scrolls,” the Ph.D. student goes on to explain.

AI system self-organises to develop features of brains of complex organisms

Cambridge scientists have shown that placing physical constraints on an artificially-intelligent system allows it to develop features of the brains of complex organisms in order to solve tasks.

Cooperative effort

With a computer engineering background and a master’s in data science, Nader is starting his Ph.D. research. With a strong passion for machine learning, the Vesuvius Challenge is not the first one he has participated in. “The historical aspect of it is exciting, and prize money doesn’t hurt the interest either. It’s a tough problem that makes for a very nice playground for testing ideas and trying things,” he says.

Despite all the efforts made by the University of Kentucky team, many still needed the collective power of many to decipher the scrolls, as per Posma. The Vesuvius Challenge is a competition, but it also created a community where participants come together – on a server of the messaging app Discord – to discuss ways to solve the puzzle. Moreover, both the codes developed by Farritor and Nader were made public to the other contestants. After a period without relevant advancements, the two submissions raised the community’s spirits, pushing which the research forward.

Completing the challenge

With about a month remaining until the end of the competition, there are still chances to read the four tweets from the past – each passage is about 140 characters long. “I think it’s in the realm of possibilities. It’s still unclear whether there is enough text to be recoverable or maybe it’s too damaged. This is still a point of debate, and it will be the work of the next few weeks to try and see if there’s a possibility,” sums up Nader.