The Arclight Software Application: An In-Development Preview

This past March, Eric Hoyt previewed an early version of the Arclight software application at the Society for Cinema and Media Studies conference in Montreal (you can read more about that session, as well as other digital projects at SCMS, here). The Arclight app searches the Media History Digital Library’s corpus of 1.4 million pages of film and media periodicals and visualizes the number of hits for searched terms, showing how those terms trend over time. Since SCMS, a number of additions have been made. As progress on the Arclight prototype continues, we wanted to provide an update on where the app currently stands, what it can do, and the work that remains to be done.

Much of the credit for Arclight’s rapid technical development goes to Kevin Ponto, Assistant Professor of Design Studies and faculty in the Living Environments Lab at the University of Wisconsin, Madison. Kevin’s work has resulted in a stunningly fast, flexible, and well-visualized interface. The current version of the software works on a user’s local machine, although a web-based version should be available sometime this summer. Here is the interface as it currently stands:

Arclight SES Interface and Visualization: Streamgraph

This screenshot shows a Scaled Entity Search for two search strings originally suggested by attendees of the SCMS workshop where Arclight was made public: “propaganda” (in blue) and “public relations” (in red). The graph at right visualizes those results: the y-axis represents the percentage of pages in a given year (the x-axis) that include the searched entity.

The interface is already quite user-friendly. The entities to be searched are entered in the “search strings” input field. While this SES only compares two terms, Arclight is capable of searching for more than a thousand quite quickly. Other options are available via checkboxes, and defaults have already been set up to anticipate the needs of most users.

“Literal search,” for example, ensures that two-word entities (like “public relations”) are queried through Lantern as a literal string rather than as an object variable; this significantly cuts down on false positives. Users comfortable with more ambiguous results can deselect this option. The “Streamgraph” option presents results as a stacked graph (as seen above), where the area of each search term in a given year is proportional to its share of the combined results for all terms, showing the relative prevalence of each. Users can also select a non-stacked representation of the data in the form of an overlapping line graph by deselecting the Streamgraph option:
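The proportional weighting behind a streamgraph can be sketched in a few lines of Python. The term counts and the `stream_shares` helper below are hypothetical, for illustration only, and do not reflect Arclight’s actual implementation:

```python
# Hypothetical yearly page-hit counts for two search terms.
hits = {
    "propaganda":       {1917: 120, 1918: 150},
    "public relations": {1917: 30,  1918: 50},
}

def stream_shares(hits):
    """For each year, compute each term's fraction of the combined hits,
    i.e. the relative area it would occupy in a stacked streamgraph."""
    years = sorted({y for series in hits.values() for y in series})
    shares = {}
    for year in years:
        total = sum(series.get(year, 0) for series in hits.values())
        shares[year] = {
            term: (series.get(year, 0) / total if total else 0.0)
            for term, series in hits.items()
        }
    return shares

shares = stream_shares(hits)
# In 1918, "propaganda" occupies 150 / (150 + 50) = 75% of the stacked area.
```

Because each year’s shares sum to one, the stacked areas always fill the graph, which is what makes the streamgraph a picture of relative rather than absolute prevalence.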

Arclight SES Interface and Visualization: Line Graph

This configuration may be more useful for users wishing to visualize the absolute rather than relative prevalence of entities. As this example shows, it is particularly effective at visualizing direct comparisons between a small number of entities.

“Normalize,” also on by default, adjusts the visualization of page hits based on the total number of pages in the MHDL corpus for a given year. This evens out spikes or dips that occur simply because a particular year has a greater or smaller number of pages, yielding a more representative picture of the data. As noted below, normalization can be a tricky process, and we are still working out the best way to accomplish it.
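A minimal sketch of this kind of normalization, assuming invented per-year page totals (Arclight’s exact method may differ):

```python
# Invented figures for illustration: raw page hits for one term, and
# total pages per year (not the MHDL's real counts).
raw_hits    = {1895: 4,    1918: 400}
total_pages = {1895: 2000, 1918: 80000}

def normalize(raw_hits, total_pages):
    """Convert raw hit counts into the percentage of that year's
    pages containing the searched entity."""
    return {
        year: 100.0 * hits / total_pages[year]
        for year, hits in raw_hits.items()
    }

pct = normalize(raw_hits, total_pages)
# 1895:   4 of  2000 pages -> 0.2% of that year's corpus
# 1918: 400 of 80000 pages -> 0.5% of that year's corpus
```

Dividing by the yearly page total is the simplest approach; part of what makes normalization tricky is deciding whether the denominator should be all pages, or only pages from comparable publications.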

In addition to viewing Arclight’s results directly in the app, users can click on the colored portions of the visualization to be taken to the actual Lantern results they represent, for a closer reading. The page-hit data can be saved as a CSV file via the application’s File menu, as can the visualization itself (as an SVG file). While the interface structure necessary for searching by individual journals is in place, only the corpus as a whole is searchable at the moment; this is the next step of the application’s development. Clearer axis scaling is also on the way.

The above example of “propaganda” vs. “public relations” shows both the power and the pitfalls of the kind of distant reading Arclight was originally designed to accomplish. As we might have expected, the two most obvious spikes in the graph correlate with the World Wars (and a closer look at the hits confirms that they are indeed primarily war-related). We can also see the gradual replacement of the first term with the second in discourses surrounding the use of media as a way to influence public opinion (most clearly in the absolute line graph). However, what explains the first spike, around 1895?

The answer is that it is a statistical anomaly: the spike represents four hits from Billboard in that year. Because the MHDL contains relatively few pages from 1895, those four pages work out to a much larger percentage of that year’s corpus than comparable hits do for 1918 or 1941. As work on the app progresses, a feature distinguishing between percentage-of-corpus and raw page-hit views will be incorporated. In any case, Arclight was always intended to bridge the gap between distant and close reading, and the links to specific results in Lantern should help users remain critical of the visualizations it produces.
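The arithmetic behind that anomaly is easy to reproduce with a toy example (the page counts below are invented, not the MHDL’s real yearly totals): a handful of hits over a tiny yearly corpus can outrank hundreds of hits over a large one once results are expressed as percentages.

```python
# Invented corpus sizes showing the small-denominator effect.
data = {
    #      (raw hits, total pages in that year's corpus)
    1895: (4,     500),
    1918: (400, 80000),
}

percentages = {
    year: 100.0 * hits / pages for year, (hits, pages) in data.items()
}

for year in sorted(data):
    hits, _ = data[year]
    print(f"{year}: {hits} raw hits = {percentages[year]:.2f}% of pages")
# 1895: 4 raw hits = 0.80% of pages
# 1918: 400 raw hits = 0.50% of pages
```

Here 1895’s four hits plot higher than 1918’s four hundred, which is exactly why a raw-hits view is a useful complement to the percentage view.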

#DHSCMS: Digital Humanities, Tools, and Approaches at SCMS 2015

This conference report is reposted with the kind permission and cooperation of Antenna.

Arclight Demo

This year’s Society for Cinema and Media Studies conference in Montreal featured a number of excellent panels that were broadly dedicated to “the digital.” Eric Hoyt (@HoytEric) and I (@DerekLong08) live-tweeted digitally-themed panels using the #DHSCMS hashtag, through our personal Twitter accounts as well as the Project Arclight account (@ProjArclight). We defined the digital broadly in our Twitter coverage, attending any panels that conceptualized the digital as a research tool, a methodological approach, or an object of study. What follows are some themes, questions, and trends that emerged from those panels, along with brief thoughts on the status of digital work in our field, and how it might move forward. This summary is far from exhaustive: there were many compelling presentations beyond the ones discussed here. For more extensive coverage, check out #DHSCMS on your preferred Twitter platform.

Wednesday’s panel on “Network Studies” (B19) situated the digital as itself an object of study, and offered a number of valuable insights into historical and contemporary digital practices. Steven Malčić offered a particularly rich history of the early Internet, carefully contextualized in object-oriented ontology as well as the history of computer science as a discipline. According to Malčić, ARPANET engineers conceptualized network “entities” in an inverted way, positing the stability of “processes” (protocols and applications) and the ephemerality of “objects” (users, hosts, and servers). Sheila Murphy, in her presentation on fitness trackers and other wearable devices, showed some of the ways in which personal data collection on these devices has been framed as a game. She made an important distinction between two different models of data collection: that of all-powerful, active “surveillance,” and the more passive “capture model” of wearables. Malčić and Murphy’s papers were only two of many that explored the important intersections between the digital-as-technology and the digital-as-discourse/rhetoric/philosophy, demonstrating that our understanding of how digital technology is implemented is fundamentally linked to the ways in which it is conceptualized.

Thursday offered a pair of workshops on digital archives and their uses, and both testified to the ways that digital technologies are changing research, publishing, and pedagogy. The workshop on “Making the Past Visible” (F19) modeled a number of innovative practices. Michael Newman made a compelling case for the use of archival images as evidence instead of simple “illustration.” He showed how platforms like Pinterest and Tumblr might be used as a supplement to the traditional book publishing model, not only as a means of preserving color, sharing evidence productively, and making visual arguments, but also as a way to connect scholars and other users with common interests. Deborah Jaramillo made the interesting point that pedagogy should be considered an integral part of archival research; teaching students with documents can be a way to curate the often-intimidating volume of digital archival documents. Curation was a theme that returned throughout the workshop, with several participants arguing for a greater valuation in our field—particularly for the purposes of hiring and promotion—of data collection, processing, and sharing.

A pedagogically focused workshop later in the day (G18) showcased several resources for and approaches to teaching film and broadcast history. Beth Corzo-Duchardt stressed the growing importance of teaching students how to evaluate online sources, as well as how to collaborate in their primary-document research. Catherine Clepper modeled a fascinating assignment using the Cinema Treasures website (cinematreasures.org) to engage students with the exhibition history of their own hometowns, while Eric Hoyt, citing Franco Moretti, argued that the humanities needs to do a better job of teaching students how to read data at scale. Toward that end, Hoyt gave the first public demonstration of Arclight, an app that visualizes hits, by year, for search terms across the entire corpus of the Media History Digital Library.

In keeping with Hoyt’s hands-on demonstration of the turn toward digital archives, Martin Johnson made the crucial point that despite this turn, scholars often proceed as if they are still using physical collections. This echoed nicely with the valuation point made in the earlier workshop, hammering home the importance of crediting—and being critical of—digital collections.

Finally, Friday afternoon’s panel on Digital Film Historiography (L8) showcased a large-scale digital project underway at the EYE Film Institute in Amsterdam. Christian Gosvig Olesen presented on the project, which draws on the 900 early films, 2,000 photos, and 120,000 business documents of the almost completely digitized Jean Desmet collection. As Olesen explained, the Desmet project combines a mapping of early film distribution in the Netherlands (based on the business documents) with ImagePlot visualizations of the collection’s color films, in an exploration of the ways in which New Cinema History and statistical style analysis might be combined. Work toward combining these two very different modes of analysis for the project’s ultimate question—Did films with certain patterns of color aesthetics correlate with particular distribution patterns?—remains in an experimental stage. However, the project’s mapping component offers an interesting model for laying bare the inherent uncertainty and often-invisible lacunae of digitized corpora. By using color, the Desmet project clearly distinguishes between all the films in the collection and those for which distribution data actually exists, modeling one solution to the problem of corpus transparency in digital scholarship.

Desmet Project

My personal takeaway from Montreal is that scholars are increasingly using digital tools for every aspect of their work, be it analysis, archival research, publication, or pedagogy—and with a healthy critical understanding of their advantages and pitfalls. Truly exciting work is being done in all of those arenas. What remains, as I see it, is the problem of technical implementation. Aside from a breakout session in the pedagogy workshop, none of the panels I attended offered practical training in the use of software tools or coding. Even a broad discussion of the kinds of tools, platforms, or languages that might be useful for approaching particular problems or research questions would help to expand the base of scholars able to do digital work. Some kind of forum focusing on the use of specific digital tools would almost certainly be well-attended if it were to be offered at next year’s conference, whether as a workshop, practicum, or even a poster-style “drop-in” exhibition. Such a forum would be invaluable, even if it were only a starting point; getting hands-on has a wonderful way of reducing the intimidating character of some more advanced digital tools, for students and advanced scholars alike. If the last few years have seen a digital turn in media studies, then solving the problem of technical implementation may very well usher in a digital acceleration.