Johanna Drucker’s talk at Concordia in the Fall 2015 left me thinking about all the things that those of us who work with digital tools ask of data. This inspired me to consider the limits of what data can do, which is the topic of this article. A simple Google search locates a plethora of articles on this very topic, which is perhaps indicative of the general skepticism of Big Data. Generally speaking, to illuminate what data can do and how we can effectively employ big data in our research, we need to address its limits.
1. Data cannot function as testimony.
Writing for the Arclight website, in “Teaching with Arclight and POE” Eric Hoyt illustrates how Arclight can be employed in a film or media history classroom. For example, Arclight can be used in the first step in the POE (Predict, Observe, Explain) strategy. Hoyt explains:
Arclight offers one means of integrating POE and active learning into a film or media history classroom. To use my earlier graph example, a teacher might ask, “How did the discourse of sports change from 1900 to 1960 in books and magazines about American entertainment and media?” Students could write down their predictions, then get to work on their computers or phones running queries for baseball, basketball, football, and other terms in Arclight. Something might immediately jump out at them. For me, it was the decline of both baseball and football during the years of 1942, 1943, 1944, and 1945. Based on this observation, I would offer the explanation that this decline of baseball and football occurred due to the impact of World War II and the enlistment of athletes into armed forces. (para. 6)
Approached in this way, Hoyt contends, “distant reading is a new prediction that invites closer inspection, observation, and analysis.” Thus, the graphs produced through Arclight are visualizations and predictions that require further exploration, and not, in fact, evidence or proof of a specific phenomenon. As Digital Archivist Trevor Owens asserts, data is “not in and of itself a kind of evidence but a multifaceted object which can be mobilized as evidence in support of an argument” (para. 2). Thus, it is vital to remember that “visualizations are always interpretations” of data, not mere snapshots (Drucker, Graphesis 7, 129).
2. Data cannot speak for itself.
Data and visualizations of data created through a method of distant reading do not elucidate anything as such, but require analysis to make them “speak.” As Lisa Gitelman and Virginia Jackson argue, “Data require our participation. Data need us” (6). Moreover, data cannot differentiate between real and spurious correlations. While a search term in Arclight may generate a superfluity of results, without a close analysis of the search term in the context in which it was written, we cannot tell whether its occurrence identifies anything meaningful or not. Data visualizations are informative as an interrogation tool, enabling researchers to see large patterns as well as to detect anomalies, and network diagrams have illustrative value. However, as Drucker discusses in Graphesis: Visual Forms of Knowledge Production, they are only effective if we know how to properly read them, a process that is “essential in our contemporary lives” (4). In this way, she suggests we need to “develop a domain of expertise focused on visual epistemology, knowledge production in graphical form in fields that have rarely relied on visual communication” (6), engage with statistics and critical media studies, understand how to make, manipulate, and comprehend structured data, and create training opportunities for the necessary skill acquisition to undertake digital humanities research (126). Jonathan Gray, researcher in the Department of Media Studies at the University of Amsterdam and Director of Policy and Research at Open Knowledge, takes a similar position in his article “What data can and cannot do,” published in The Guardian. He cautions us not to assume that datasets are readily transparent. Noting the many similarities between official datasets and official texts, he advises us “to learn how to read and interpret them critically, to read between the lines, to notice what is absent or omitted, to understand the gravity and implications of different figures, and so on” (para. 6). As both Drucker and Gray demonstrate, in order to make sense of data we not only require the proper tools but also the refined skills to analyze the data and provide it with a voice.
3. Data cannot be raw.
We need to remember that generating data is a process of interpretation. Since data “needs to be imagined as data to exist and function as such,” this imagining involves a process of interpreting (Gitelman and Jackson 3). In their introduction to “Raw Data” Is an Oxymoron, Gitelman and Jackson expose how perceiving data as “raw” or untouched by social and cultural values and biases “leads to an unnoticed assumption that data are transparent, that information is self-evident, the fundamental stuff of truth itself” (2). Drucker explains how such “realist approaches to visualization assume transparency and equivalence, as if the phenomenal world were self-evident and the apprehension of it a mere mechanical task,” and thus are “fundamentally at odds with approaches to humanities scholarship premised on constructivist principles” (126). Drawing a correlation between photography and data, Gitelman and Jackson argue that just as the “photographic image is always framed, selected out of the profilmic experience in which the photographer stands, points, shoots … [d]ata too need to be understood as framed and framing, understood, that is, according to the use to which they are and can be put” (5).
4. Data cannot be singular or universal.
Despite its common usage as a mass noun and singular verb, data is plural. Gitelman and Jackson explain how “Data’s odd suspension between the singular and the plural reminds us” that data are in fact aggregative (9). Since data pile up, imagining data implicitly involves devising a system of classification (8). In other words, the reduction of phenomena to data entails grouping, dividing, and ranking data, which in turn conceals “ambiguity, conflict, and contradiction” (9). Consequently, we must look “under data to consider their root assumptions” (4). This is not to say that data are not objective, but to understand objectivity as “situated and historically specific; it comes from somewhere and is the result of ongoing changes to the conditions of inquiry, conditions that are at once material, social, and ethical” (4).
5. Data will not provide us with salvation.
Curated by Olga Subirós and José Luis de Vicente, Big Bang Data is an ongoing project and exhibition that interrogates the current “information explosion” and questions whether data is the new oil. On the Big Bang Data website, Subirós and de Vicente discuss the dangers of data-centrism, including the notion that data can solve all our problems, and stress the importance of retaining certain values, such as subjectivity and ambiguity, “at a time when it is easy to believe that all solutions are computable.” Part of the Big Bang Data project and commissioned by The New York Times, Jonathan Harris’s “Data Will Help Us” is a multi-colored manifesto that also reflects the “promise and perils of data.” In describing the project, Harris exposes how Big Data “has become a kind of ubiquitous modern salve that now gets applied to almost any kind of ailment. In fields ranging from education, to government, to healthcare, to advertising, to dating, to science, to war, we’re abandoning timeless decision-making tools like wisdom, morality, and personal experience for a new kind of logic which simply says: ‘show me the data.’” Projects such as these remind us to be wary of approaching and understanding data as a solution or the answer and reinforce the use of data and digital tools as just one method or tool in our methodological toolboxes.
It is important to understand that what actually constitutes data is controversial. Indeed, Drucker has argued that as a term, data is insufficient, noting its etymological roots in the term “datum,” which translates to “that which is given” (Drucker qtd. in Schöch). In “Humanities Approaches to Graphical Display,” Drucker explains how data “pass themselves off as mere descriptions of a priori conditions,” in turn collapsing the “critical distance between the phenomenal world and its interpretation” (para. 1). In response, she argues that we must “reconceive all data as capta” (para. 3). For Drucker, the term capta captures the “situated, partial, and constitutive character of knowledge production, the recognition that knowledge is constructed, taken, not simply given as a natural representation of pre-existing fact” (para. 3). In “Big? Smart? Clean? Messy? Data in the Humanities,” Christof Schöch observes how even in the absence of a new term, data in DH can be redefined. He suggests the following definition of data as “a digital, selectively constructed, machine-actionable abstraction representing some aspects of a given object of humanistic inquiry.” At a time when data are frequently presented as the ultimate solution, it is important to remain critical. Examining the boundaries and limitations of data reinforce the ways in which data can and cannot be of assistance to us as scholars working with data and employing digital tools in our research.
Works Cited
Big Bang Data. Curated by Olga Subirós and José Luis de Vicente. 2015. Web.
Brooks, David. “What Data Can’t Do.” New York Times. 18 Feb 2013. Web.
Drucker, Johanna. Graphesis: Visual Forms of Knowledge Production. Cambridge: Harvard UP, 2014. Print.
—. “Humanities Approaches to Graphical Display.” Digital Humanities Quarterly 5.1 (2011): n. pag. Web.
Gitelman, Lisa, and Virginia Jackson. “Introduction.” “Raw Data” Is an Oxymoron. Ed. Lisa Gitelman. Cambridge: MIT Press, 2013. 1-14. Print.
Gray, Jonathan. “What data can and cannot do.” The Guardian 31 May 2012. Web.
Harris, Jonathan. “Data Will Help Us.” Big Bang Data. 2013. Web.
Hoyt, Eric. “Teaching with Arclight and POE.” Project Arclight. 12 Oct. 2015. Web.
Owens, Trevor. “Defining Data for Humanists: Text, Artifact, Information or Evidence?” Journal of Digital Humanities 1.1 (2011): n. pag. Web.
Schöch, Christof. “Big? Smart? Clean? Messy? Data in the Humanities.” Journal of Digital Humanities 2.3 (2013): n. pag. Web.