SuperSpeare: The Shakespeare quote search tool

Posted on: October 31, 2023 at 02:00 PM

This article was co-written by Alastair and me. Go check out his blog post on his site, and tell him Zeg sent you! :D

Over the last few weeks, Alastair and I built SuperSpeare, a project showing off some of Cloudflare’s latest AI technologies. You can use it to search for your favourite Shakespeare quotes. Ever wondered where the famous quote “What you egg? [he stabs him]” comes from? Now you can find out! It uses a vector database to search through text embeddings of Shakespeare’s plays to find the closest matches for your query. If you click one of the results, it takes you to that quote in the text where you can read the context.

Affiliation Notice

Neither of us is employed by Cloudflare, but we are part of the Community Champions program. This means we have free access to several Cloudflare products, including all of those used in this project. Cloudflare does not have editorial control over this blog post.

How it works

There were three important components that needed to work in this project. We needed the data in a format we could query, we needed to be able to search it from the frontend, and we needed the frontend itself.

Importing the data

We got the Shakespeare plays we used in this project from the Folger Shakespeare Library, whose texts are very permissively licensed. We downloaded the TXT versions and ran them through an importing script we wrote, which removes character names, act and scene designations, and the intro. After this, we were left with (mostly) just the text and stage directions. We then split the text into lines and fed those, in batches, to a Cloudflare Worker that calculates the text embedding for each line. It uses the new Workers AI platform to run @cf/baai/bge-base-en-v1.5 (based on BAAI’s FlagEmbedding model). The Worker then stores each vector, along with the line itself, its line number, and the work it comes from, in Vectorize, Cloudflare’s new vector database. Finally, we upload the entire play to R2 so the frontend can download it.
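To make the import steps above more concrete, here is a minimal sketch in TypeScript. It is not our actual script. It assumes a simplified TXT layout (an "ACT 1" heading where the play starts, "=====" rules under headings, and speaker names on their own all-caps lines), and the names cleanPlay, toBatches, toVectors, embedAndStore, and the metadata fields are all made up for illustration. The small interfaces only stand in for the Workers AI and Vectorize bindings, so treat their shapes as assumptions as well.

```typescript
// Minimal structural stand-ins for the Workers AI and Vectorize bindings.
// These are assumptions for this sketch, not the official type definitions.
interface EmbeddingModel {
  run(model: string, input: { text: string[] }): Promise<{ data: number[][] }>;
}
interface VectorRecord {
  id: string;
  values: number[];
  metadata: { work: string; lineNumber: number; text: string };
}
interface VectorIndex {
  insert(vectors: VectorRecord[]): Promise<unknown>;
}
interface Env {
  AI: EmbeddingModel;
  VECTORIZE: VectorIndex;
}

// Hypothetical record shape for one line of a play.
interface PlayLine {
  work: string;
  lineNumber: number; // 1-based position in the cleaned text
  text: string;
}

const MODEL = "@cf/baai/bge-base-en-v1.5";
const HEADING = /^(ACT|SCENE)\s+\d+$/i; // e.g. "ACT 1", "Scene 3"
const RULE = /^=+$/; // underline beneath a heading
const SPEAKER = /^[A-Z][A-Z .']*$/; // all-caps line, e.g. "FIRST WITCH"

// Drop the intro, headings, and speaker names; keep dialogue and stage directions.
function cleanPlay(work: string, raw: string): PlayLine[] {
  const lines = raw.split(/\r?\n/).map((line) => line.trim());
  const start = lines.findIndex((line) => /^ACT\s+1$/i.test(line));
  const body = start === -1 ? lines : lines.slice(start);
  return body
    .filter((l) => l !== "" && !HEADING.test(l) && !RULE.test(l) && !SPEAKER.test(l))
    .map((text, i) => ({ work, lineNumber: i + 1, text }));
}

// Split the lines into fixed-size batches so each request stays small.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("batch size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Pair every line with its embedding and attach the metadata we want back later.
function toVectors(batch: PlayLine[], embeddings: number[][]): VectorRecord[] {
  if (batch.length !== embeddings.length) {
    throw new Error("expected exactly one embedding per line");
  }
  return batch.map((line, i) => ({
    id: `${line.work}-${line.lineNumber}`,
    values: embeddings[i] ?? [],
    metadata: { work: line.work, lineNumber: line.lineNumber, text: line.text },
  }));
}

// Worker-side step: embed one batch and store the resulting vectors.
async function embedAndStore(batch: PlayLine[], env: Env): Promise<number> {
  const { data } = await env.AI.run(MODEL, { text: batch.map((l) => l.text) });
  const vectors = toVectors(batch, data);
  await env.VECTORIZE.insert(vectors);
  return vectors.length;
}
```

In this sketch, the import script would call cleanPlay once per play and toBatches on the result, and the Worker would call embedAndStore for every batch it receives. The batch size is left to the caller.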

Querying the data

Now that we have this data in Vectorize, we need some way to query it. This is where the aptly named query worker comes in. It takes the search query, generates an embedding for it (again, using Workers AI), and searches Vectorize for the 10 most similar vectors. We then bundle up the data stored with those vectors, including the line number and work each piece of text comes from, and send it to the user. This bundling currently requires some ugly code, which should get better soon™.
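The query flow described above could look roughly like the following TypeScript sketch. Again, this is not our real worker. The interfaces are small stand-ins for the Workers AI and Vectorize bindings, the returnMetadata option and the metadata field names (work, lineNumber, text) are assumptions, and search and bundleMatches are hypothetical names.

```typescript
// Minimal structural stand-ins for the bindings; assumptions, not official types.
interface EmbeddingModel {
  run(model: string, input: { text: string[] }): Promise<{ data: number[][] }>;
}
interface Match {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}
interface VectorIndex {
  query(
    vector: number[],
    options: { topK: number; returnMetadata: boolean },
  ): Promise<{ matches: Match[] }>;
}
interface Env {
  AI: EmbeddingModel;
  VECTORIZE: VectorIndex;
}

// Hypothetical shape of one search result sent to the frontend.
interface SearchResult {
  work: string;
  lineNumber: number;
  text: string;
  score: number;
}

const MODEL = "@cf/baai/bge-base-en-v1.5";
const TOP_K = 10;

// Turn raw matches into results, skipping any match that has no metadata.
function bundleMatches(matches: Match[]): SearchResult[] {
  const results: SearchResult[] = [];
  for (const match of matches) {
    const meta = match.metadata;
    if (!meta) continue;
    results.push({
      work: String(meta["work"]),
      lineNumber: Number(meta["lineNumber"]),
      text: String(meta["text"]),
      score: match.score,
    });
  }
  return results;
}

// Embed the search query, then ask the index for the closest stored lines.
async function search(query: string, env: Env): Promise<SearchResult[]> {
  const { data } = await env.AI.run(MODEL, { text: [query] });
  const vector = data[0];
  if (!vector) throw new Error("the model returned no embedding");
  const { matches } = await env.VECTORIZE.query(vector, {
    topK: TOP_K,
    returnMetadata: true,
  });
  return bundleMatches(matches);
}
```

Keeping bundleMatches separate from search means the result shaping can be tested without touching either binding.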

The Frontend

Finally, the frontend is hosted on Pages and made using SvelteKit. After some experimentation, we found that it looked best when it was all on a single page like it is now. We added some comfort features, such as mobile support, and hid some easter eggs throughout the site.

Future improvements

If we ever come back to this project, we might add the following features:

Closing thoughts

We had fun making this. Being able to run this stuff fully serverless is really powerful, and we’re excited to see where it goes! Give it a try yourself and check out the source code on GitHub.