Don’t Fear the Data: Coming to Terms with Data and Digital Humanities

Word Cloud created on Wordle.net using the text of this post.


In 2012, Stephen Marche wrote a scathing article about big data and digital humanities in the LA Review of Books, arguing passionately that “literature is not data.” Dismissing digital humanities as nothing more than “instant titillation” and just another “next big thing,” he locates its fundamental problem in the attempt to treat literature as data. For Marche, literature is the antithesis of data, and regarding it otherwise strips away taste, value, distinction, and refinement, reducing the canon to a mere stack of books. Clearly, a digital humanities approach to literature has struck a chord with Marche, and his article may be representative of a deeper fear: of data, of the quantification of literature and other objects of study, and of the digital humanities altogether. But perhaps he is misguided in his approach to the digital humanities, distracted by its supposed shortcomings rather than recognizing its benefits. In his article “Big? Smart? Clean? Messy? Data in the Humanities,” Christof Schöch similarly notes scholars’ suspicion of data and quantitative methods. He connects this distrust to “the apparent empiricism of data-driven research in the humanities [which] seems at odds with principles of humanistic inquiry, such as context-dependent interpretation” (n. pag.). How can we jettison this fear of data in the humanities while maintaining the critical analytical stance we value?

Kate Crawford’s six myths of big data may be an apt place to start. Crawford points out that big data is not new; rather, it has become more ingrained in everyday life, making it more visible and harder to ignore. Moreover, she shows that data is something created and imagined, not an objective truth or “fact,” and thus it always needs to be contextualized. In his response to Marche’s article, Scott Selisker emphasizes that data is unlikely to replace the interpretation of individual texts or to “dehumanize” literature (or other cultural artifacts); instead, it can supplement and strengthen such analyses. He asserts: “They don’t threaten the individuality of literary works, but rather help us return to those literary works with more information at hand” (n. pag.). Accounting for the great mass of ephemeral texts, objects, sounds, and moving and still images that fall outside the canon poses significant methodological challenges. Utilizing digital tools and methods, and thus coming to terms with data and quantitative research, may be one way to meet those challenges, provided one pays heed to Crawford’s cautions. When we shift our perspective and conceive of distant reading, data, and digital methods as potentially advantageous to analysis rather than an encumbrance, we can begin to understand how such methods might uncover new forms of knowledge and pose new research questions. As Christine Borgman elucidates in Scholarship in the Digital Age, recognizing the importance of data is much more than a digital issue; it is a theoretical, methodological, and social one as well.

Another way of ameliorating the fear of data is by gaining a greater grasp of what constitutes data. Schöch proposes the following definition:

Data in the humanities could be considered a digital, selectively constructed, machine-actionable abstraction representing some aspects of a given object of humanistic inquiry. Whether we are historians using texts or other cultural artifacts as windows into another time or another culture, or whether we are literary scholars using knowledge of other times and cultures in order to construct the meaning of texts, digital data add another layer of mediation into the equation. Data (as well as the tools with which we manipulate them) add complexity to the relation between researchers and their objects of study.

Understanding what data is, how it functions, what its limits are, and what it reveals can be viewed as the first step toward meaningful engagement with data and the digital humanities. For a more detailed discussion of the benefits of digital methods and data visualization, see my next article: “Why Digital Humanities? 12 Reasons for Media Historians.”

Works Cited

Borgman, Christine. Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, MA: MIT Press, 2007.

Crawford, Kate, qtd. in Quentin Hardy. “Why Big Data Is Not Truth.” The New York Times. 1 June 2013. Web.

Marche, Stephen. “Literature is not Data: Against Digital Humanities.” LA Review of Books. 28 Oct. 2012. Web.

Schöch, Christof. “Big? Smart? Clean? Messy? Data in the Humanities.” Journal of Digital Humanities 2.3 (2013): n. pag. Web.

Selisker, Scott. “The Digital Inhumanities?” In “Two Rebuttals to ‘Literature is not Data: Against Digital Humanities.’” LA Review of Books. 5 Nov. 2012. Web.