The Promise and Peril of Digital Research in Yiddish: An Interview with Gerben Zaagsma

Elena Hoffenberg


It seemed fitting to break away from a conference bringing together American and European scholars and researchers working on digital humanities for an old-fashioned face-to-face conversation about Yiddish. During a conference on Digital Hermeneutics at the German Historical Institute in Washington, DC, I sat down to talk with Gerben Zaagsma, who is an assistant professor at the Centre for Contemporary and Digital History at the University of Luxembourg. In addition to being the author of the book Jewish Volunteers, the International Brigades and the Spanish Civil War (2017), Zaagsma is the curator of the invaluable online collection of links to all things Yiddish, yiddish-sources.com.

In this interview, he talks about his chance encounter with Yiddish that started him on the path to studying klezmer music and the Spanish Civil War. While researching a book on an event with global and long-lasting reverberations, he encountered the potency and the frustrations of digital research. Finally, Zaagsma looks to the future of digital studies and Yiddish.

Elena Hoffenberg: To begin, how did you start with Yiddish?

Gerben Zaagsma: By complete coincidence. I studied history in the Netherlands, in Groningen, which is a small university town. At one point, when I was choosing my main courses, one class fell through and I had to choose something else. That something else became a class on Eastern European Jewish history. So by coincidence, I ended up in a Jewish history class. I’m not Jewish myself, so it’s not related in that sense to my background.

In that class, I had a teacher who also encouraged his students to engage with aspects of Jewish culture, like music. I’m a big fan of all kinds of music, so I decided to work on klezmer music in the class and later ended up writing my master’s thesis on it. Alongside that, I decided to learn some Yiddish, so I took some classes in modern Hebrew just to learn the alphabet. There happened to be a small Yiddish reading group in Groningen, so I joined that. I went to a Yiddish summer course in 1998 at Oxford.

Then I got an MA in Yiddish studies in London, where I studied with Mikhail Krutikov and Gennady Estraikh, who are now in the United States. That’s also where I switched topics from Jewish music to what would become my dissertation about Jewish volunteers who fought in the International Brigades. My background is not in literature or linguistics; I really came to studying Yiddish as a historian who simply needed to know Yiddish in order to be able to read the sources and work with Yiddish materials for my research.

EH: That speaks to the difference that any knowledge of the language can make. It seems from my perspective that your research on the Spanish Civil War and Jewish volunteers would not have been possible without some knowledge of Yiddish. It enriches the topic so much.

GZ: Yes, it would have been impossible. That’s exactly why I studied Yiddish, to be able to work with Yiddish sources in my historical research.

EH: It strikes me that klezmer music and the Spanish Civil War seem like very disparate topics. Are there any connections between the two, aside from Yiddish language?

GZ: There’s actually very little. There is some poetry in Yiddish about the Spanish Civil War. And there was a marching song, set to the melody of a Spanish folk song, written by one Jewish volunteer for the Botwin Company, a Jewish military company that fought in the International Brigades and around which a lot of the dissertation and the book revolves.

EH: I was struck by how diverse the archival locations are in your book. You went to archives in New York, in London, in Paris, in Jerusalem—it’s a very international story that you’re telling. Can you talk about how you traced this narrative through the different national contexts?

GZ: When I did the MA in Yiddish studies in London, I wrote a thesis about the Botwin Company, a Jewish military unit created on December 12, 1937, consisting mostly of Yiddish-speaking volunteers from Poland. I elaborated that MA thesis into my dissertation about Jewish volunteers in the brigades, which to a large extent also revolved around the Botwin Company.

The book is less a study of the biographies of Jewish volunteers than it is a study about representations of Jewish volunteerism both during and after the Spanish Civil War. I decided to turn to the Spanish Civil War period itself to look at how Jewish communists in Paris, who were mostly from Poland, were dealing with Jewish volunteers. The Botwin Company was created because these Jewish communists in Paris lobbied for its creation. So I was interested in what was actually happening there. Why is a group of Polish Jewish communists in Paris lobbying for the creation of a Jewish military company on the Spanish battlefields?

There are not a lot of sources relating to the activities of Jewish communists in Paris, but a key source is Naye prese, which was the newspaper that they published between 1934 and 1939. I realized, though, that if you want to look properly at how these Jewish communists in Paris are writing about these Jewish volunteers in their newspaper, and what that says about their broader political strategies — if you want to really see what is specific for the Jewish communist position — you need to compare their newspaper to a non-communist source. There was another daily Yiddish newspaper in Paris, Parizer haynt. It was a French offshoot of Haynt in Warsaw. So part of the book is a comparative study of how these two Yiddish dailies in Paris in the 1930s wrote about Spain and Jewish volunteers.

I was a little bit unlucky when I was doing the research. The main International Brigades archive is in Moscow and is part of the archive of the Communist International. A key part of the archive was not available when I was doing my research because they were cleaning it up before digitization, so the dissertation lacked some key materials. But then a couple of years later, when I was writing the book, I could consult those materials as well since the whole International Brigades archive is now online. I also had the chance to look at additional materials in the national archives in Poland, to which a colleague of mine alerted me.

The book has a lot of extra material that I couldn’t or didn’t access when I was writing the dissertation. In addition to the International Brigades archive, I used the Abraham Lincoln Brigade Archives (ALBA) at New York University’s Tamiment Library, which holds a lot of material about American Jewish volunteers who fought in Spain. There was a whole debate among American Jewish volunteers after the Spanish Civil War about what it had actually meant to be Jewish during the Spanish Civil War. The ALBA archives give a fantastic view into the development of that debate.

The shape of the dissertation and the shape of the book were very much determined by the sources I could find. One of the critiques of the book has been that it is focused much more on representations of Jewish volunteers than on biographical questions. That was a conscious choice at the time, as I felt there was already a lot of biographical work out there, and it was time to ask other important questions. It would be interesting now to delve more into that biographical aspect, especially because the International Brigades Archives are now online and there is suddenly a lot more easily accessible material than there was 10 years ago.

EH: It’s very interesting, and it’s a perfect segue into the second reason why the readers of In Geveb may be interested in your work. You have written about what it means to do digital history in general and digital Jewish studies in particular. In your own career—from a master’s student writing an MA on klezmer music to this point now—how has your understanding of digital sources and the possibilities of digital scholarship changed?

GZ: Completely. When I studied history in the early nineties, digital was only just starting. I remember that we got email at the university, which was a big thing. There was one lecturer who taught some digital classes—building databases—and whose class constructed a website about the American Revolution as one of the very early digital public history projects. Digital was not a big thing then, but I think it’s completely changing the way we can access history now.

There are two important things to mention about digital history in the Jewish context. One is the dispersal of sources. Being able to unite and access all kinds of materials online, without having to travel to archives, is an enormous advantage. That changes a lot. You suddenly have truckloads of new possibilities. For example, I once wrote a small article about Naftali Botwin to trace why he became a symbol in Jewish communist circles. Botwin shot a police infiltrator in the ranks of the Polish Communist Party in 1925 and was put on trial and executed. His deed wasn’t only a symbol of revolutionary self-sacrifice for what many communists saw as a just political cause; for many Jewish youngsters it resonated because they saw Botwin’s act as proof that existing prejudices in Poland about Jews as being submissive, cowardly, and “dodging the fight” were wrong. That’s why his name was chosen for the Botwin Company. He was a perfect symbol for both communists and Jews. Some of my key sources for this paper were the Yiddish newspapers put online by the Lithuanian National Library, which contained reports about the trial. If that material had not been online, I would have never been able to write the article I did.

At the same time, that also signals what you need to be careful about when using digital sources — namely, that they start steering the kind of research that you can do, or the kinds of questions that you can ask and answer. So there’s always the much bigger question, which goes way beyond Jewish studies and applies to everyone using online sources: what is actually being digitized and what is not being digitized? What stories can we tell with the material that is online, and which stories can we not tell? That’s quite a critical question to address, especially if students and researchers start gravitating more and more to materials that are available online instead of going to the archive. Then the question of what is actually online becomes very, very important.

EH: I’m curious what you see as specific challenges for Yiddish sources in the digital age, or Yiddish studies in the digital age.

GZ: Apart from digitization itself, one key challenge is to keep track of the abundance of Yiddish materials that have recently been put online. When I started out ten years ago, only a few Yiddish newspapers had been digitized. Beyond that, there was only the Yiddish Book Center’s Spielberg Digital Yiddish Library, which is of course a fantastic resource. YIVO’s Vilna Collections project did not yet exist. There are many more Yiddish newspapers on the Historical Jewish Press website today. In terms of the availability of Yiddish digitized sources, the landscape has changed dramatically in the last ten years.

Another very important challenge is how you get a hold on the information that is contained in those digitized materials. That’s of course a general question for all digitized sources, especially because they are to a large extent unstructured. For Yiddish, the big challenge is OCR [Optical Character Recognition, the ability to digitally read — and thus search — images of printed text. Editor’s note: See the coda for a discussion of very recent updates to OCR technology for Yiddish.] And then with OCR, you could start doing things like named-entity recognition and use Linked Open Data to connect to other resources. I mean, imagine a unified interface where you could search through these digitized collections of Yiddish sources at once. That would be fantastic.

Printed materials are already difficult enough, especially given the many different variations and orthographies of Yiddish, but handwriting is the next big challenge. A lot of progress has been made in the past couple of years, especially with the Transkribus project. They work a lot with non-Roman alphabets like Sanskrit and all kinds of different types of handwriting, so it would be really nice to do a project to start using Transkribus on a corpus of Yiddish handwriting. But that is a massive challenge. I think that OCR is perhaps even more important in Yiddish studies than in other fields of study because in Yiddish, the discrepancy between the amount of material available and the number of scholars who are actually capable of reading that material is huge. Having some kind of way to get a handle on those digitized materials would be really fantastic. If you can somehow also combine that with advances in translation software, that would be even better. Then you could also make that material accessible to scholars who might not know Yiddish, at least for exploratory purposes.

Now here I should be very clear, before I’m misunderstood: I’m not advocating that scholars can work with Yiddish sources through translation software alone and therefore should abandon learning the language. But translation software can provide a rough impression of what might be useful inside the data when you explore digital resources. To give a very concrete example: when I wrote the book, I was also interested in seeing how the Polish Brigade press wrote about their Jewish comrades in arms in the brigades. The whole International Brigades archive is online. So I downloaded copies of all the Polish press publications, OCR’ed them myself, and started looking for a couple of keywords. I don’t know Polish, but I knew what I needed to look for: through searching the Polish words for “Jew,” “Jewish,” and “antisemitism,” I found a couple of very interesting articles. I ran those through translation software. The quality was not very good, but it gave me a rough picture of what they were hinting at in the articles, such as a discussion about antisemitism within the International Brigades. Then I asked a Polish colleague of mine for help with further analysis. You can’t possibly use a [computer-generated] translation like that to quote, but translation software now offers at least some possibilities to get a handle on materials that otherwise would be off limits. But again, linguistic expertise is always critical for a proper analysis.
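[Editor’s illustration: The keyword step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the workflow Zaagsma actually used: it assumes the OCR’ed pages are already available as plain-text strings, and the function name `find_keyword_hits`, the stem list, and the sample pages are all hypothetical. Matching on word stems rather than whole words is one way to catch inflected Polish forms; OCR errors will still cause missed hits, so results like these are only a starting point for a reader who knows the language.]

```python
import re

# Hypothetical stems for illustration: "żyd" covers forms such as Żyd, Żydzi,
# żydowski; "antysemi" covers antysemityzm, antysemicki, and similar.
STEMS = ["żyd", "antysemi"]


def find_keyword_hits(pages, stems=STEMS, context=40):
    """Return (page_id, matched_word, snippet) for each stem match.

    `pages` maps a page identifier to its OCR'ed plain text (an assumption
    about how the OCR output is stored).
    """
    # Match any word that begins with one of the stems, ignoring case.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in stems) + r")\w*",
        re.IGNORECASE,
    )
    hits = []
    for page_id, text in pages.items():
        for match in pattern.finditer(text):
            # Keep some surrounding text so a reader can judge relevance.
            start = max(match.start() - context, 0)
            end = min(match.end() + context, len(text))
            snippet = " ".join(text[start:end].split())
            hits.append((page_id, match.group(0), snippet))
    return hits


# Invented sample pages standing in for real OCR output.
sample_pages = {
    "page_001": "Towarzysze Żydzi walczyli w brygadach.",
    "page_002": "Artykuł o pogodzie.",
    "page_003": "Dyskusja o antysemityzmie w kompanii.",
}

hits = find_keyword_hits(sample_pages)
for page_id, word, snippet in hits:
    print(page_id, word, "|", snippet)
```

[The pages flagged this way would then go to translation software for a rough impression and, as in the example above, to a colleague with the necessary linguistic expertise for proper analysis.]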

EH: The anecdote you gave about the Polish brigade’s publications is a really good illustration of how digital tools can help to pinpoint the places to apply more traditional research practices. What are the challenges that still exist for working with Yiddish materials?

GZ: Ten years ago, the big challenge in Yiddish was dispersal and the limited availability of sources. Now we have much more, especially when the Vilna Collections project is finished. Optical handwriting recognition and technologies like named-entity recognition and topic modeling are the next frontier to tackle. And that I think is a massive challenge because the field is so small and these things require a lot of investment. Nevertheless, small pilot projects can make a big difference, and there are scholars who are doing really, really important work in this regard, like Sinai Rusinek in Israel.


GZ: In November [after the interview above was conducted], the Yiddish Book Center put online a beta version of its full-text search engine which allows users to search the full text of all the scanned Yiddish books it has put online. The technology is based on French computer scientist Assaf Urieli’s Jochre software. This is an absolutely thrilling development.

EH: Ten years in the future, what would the ideal state of Yiddish studies in the digital age look like? What do you think the challenges will be ten years from now?

GZ: The minimum ideal state would be that most online Yiddish sources are OCR’ed so that they can be searched, and that many of these resources are interlinked, as is already happening in the JudaicaLink project. Next up is the development of technologies to mine and analyze Yiddish materials. Machine learning and neural networks are becoming very important in digital humanities and will also become important for Yiddish Studies. Needless to say, these methods are complementary to what scholars are already doing, but are critical to develop to confront the abundance of new materials, and data, that we now have at our disposal.

EH: For the readers of In geveb who are engaging with digital humanities, what would you recommend that people read, watch, or try out as a tool?

GZ: You can start by exploring the Yiddish Book Center’s brand-new OCR full-text search engine.

Another important project to mention is the Jewish Heritage Network, which is involved in several projects that help to disclose Jewish heritage collections in new ways.

And, finally, if I may shamelessly plug another project: At the Luxembourg Centre for Contemporary and Digital History (C²DH), where I work, we are currently building a website with all kinds of information on the intersection of Jewish Studies and Digital Humanities with support of the Rothschild Foundation Hanadiv Europe. We hope to launch it by late 2020 or early 2021 — so stay tuned!

Hoffenberg, Elena. “The Promise and Peril of Digital Research in Yiddish: An Interview with Gerben Zaagsma.” In geveb, February 2020:
Elena Hoffenberg is a graduate student in the University of Haifa's MA program in Holocaust studies and in the Simmons University School of Library and Information Science. She is currently serving as the Digital Humanities Associate Fellow at the United States Holocaust Memorial Museum in Washington, DC.