Thursday, April 24, 2014

“How Do our Cultural Resources Get Online?” A Critical Approach to Digital Collections

by Robert D. Montoya

Online digital repositories are an increasingly vital part of our academic and educational landscape. Digital library collections are integral components of instruction at all levels, ranging from K-12 schools to graduate-level coursework within the college or university (Gilliland-Swetland; Malkmus). Particularly significant efforts have been made to use digitized primary sources to enhance undergraduate research skills within the university classroom, providing students with the literacies necessary to negotiate and interrogate these online repositories (Krause; Mitchell, Seiden, and Taraba). In this brief post I begin to examine how these online collections come into being and how representative they are, or are not, of the institutional collections from which they are digitized. To do so I will use a “critical information literacy praxis” approach, which examines library repositories, technologies, and seeking infrastructures as “culturally-situated phenomenon, embedded within specific social, political, and economic systems, subject to … the power relations and ideologies that define particular moments in history” (Accardi, Drabinski, and Kumbier ix). In addition, I will use Ajit Pyati’s “critical theory of technology” approach to tease out how the digital library space is not a “neutral,” standalone entity, but rather must be examined within the larger economic and social forces in play in the physical library environment (Pyati). I propose an examination of digital library repositories that takes into consideration the impetus for their digitization. The existence of one digital object rather than another in any given online repository depends on funding, and this model can bias online collections toward the preferences of those who hold economic power within these cultural institutions. As such, digital repositories do not necessarily exemplify the most significant holdings of any given library collection, but rather holdings whose value has been decided by those with more financial resources.

Online digital collections of cultural materials are becoming more widespread and growing astronomically in scope. The Digital Public Library of America (DPLA) celebrated its one-year anniversary on April 18, 2014, and in that first year has become a significant player in the dissemination of cultural collections made digitally available by institutions across the United States (DPLA). Since its inception, the DPLA has attracted contributions from over 1,300 organizations, “attracted over 1 million unique visitors to its website,” and aggregated over 7 million items searchable through its interface (DPLA). The DPLA has rightly been hailed as a “potentially transformative initiative,” creating a vital platform by which libraries and other similar cultural institutions can push their cultural heritage material into the public realm (Drucker). The importance of online digital collections cannot be overstated, especially as educational institutions at all levels increasingly rely on these resources to connect the classroom with primary source material otherwise unavailable for consultation. Calisphere, a University of California-sponsored program designed to integrate primary source documents into “sets that support California Content Standard in History-Social Sciences, English-Language Arts, and Visual Arts for use in K-12 classrooms,” is a prime example of how digitized material can transform the way educators frame a student’s relationship to the material artifact of cultural production (University of California).

It is precisely because of this rising social significance of digital repositories that I argue for increased transparency in how these digital objects find their way into repositories at the local level. University libraries are complex organizations, and the reasons material is digitized in these environments and deposited into an online digital collection are even more complicated. Digitizing costs are hefty, involving not only the labor-time of actual scanning or photography, but also quality control, the application of metadata to digital surrogates, project management, the workflows to shepherd these files into long-term storage, and the maintenance of technical infrastructure to preserve these digital objects (Schaffner, Snyder, and Supple). The source of funding for any particular digital project, or what we might call the economy of digitization, is a key piece of information for understanding how a digital library object fits within the larger institutional structure from which it came. In Figure 1, below, I have identified six possible (not exhaustive) impetuses for the digitization of library material: patron-generated requests, instructional digitization, donor-driven, collections identified as culturally or institutionally significant, preservation digitization, and foundation supported. Each of these categories represents a different source of funding, and thus a vastly different motivation for why an object might get digitized and placed online in the first place. In this way, the digital library environment mirrors the economies of power already present in the physical library environment.

These six categories can roughly be separated into two groups: curated and ad hoc. ‘Ad hoc’ digitizing consists of patron-generated requests, donor-driven projects, and instructional digitization projects that typically arise without advance internal planning. ‘Curated’ digitizing consists of projects that require more foresight and institutional selection, and is broadly defined as preservation digitization, foundation-supported digitization, and the scanning of collections deemed culturally significant.
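For readers who think in data structures, the grouping above can be written down as a small lookup. This is only a sketch: the names `Impetus`, `AD_HOC`, and `group_of` are hypothetical labels introduced here for illustration, and the sketch treats instructional digitization as ad hoc, as the discussion that follows does.

```python
from enum import Enum


class Impetus(Enum):
    """The six impetuses for digitization named in this post (hypothetical identifiers)."""
    PATRON_REQUEST = "patron-generated request"
    INSTRUCTIONAL = "instructional digitization"
    DONOR_DRIVEN = "donor-driven"
    CULTURALLY_SIGNIFICANT = "culturally or institutionally significant"
    PRESERVATION = "preservation digitization"
    FOUNDATION_SUPPORTED = "foundation supported"


# Assumption: these three arise from use of the collection rather than from
# advance institutional planning, so they are grouped as ad hoc.
AD_HOC = frozenset({
    Impetus.PATRON_REQUEST,
    Impetus.INSTRUCTIONAL,
    Impetus.DONOR_DRIVEN,
})


def group_of(impetus: Impetus) -> str:
    """Return 'ad hoc' or 'curated' for a given impetus."""
    return "ad hoc" if impetus in AD_HOC else "curated"


if __name__ == "__main__":
    for impetus in Impetus:
        print(f"{impetus.value}: {group_of(impetus)}")
```

A fixed lookup like this is deliberately cruder than practice; a real project may belong to more than one category at once.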

Ad hoc digitizing is not necessarily pre-planned by the institution, but is rather spurred on by requests that arise from public use of the collection. Patron-generated digital projects are often, but not always, smaller in scale and particular to the research project of the requesting entity. Scanning or photography is often fragmentary (of particular pages within a book, or a selection of documents within a collection) and thus is not representative of the larger corpus from which the images were captured. Depending on staffing, some institutions apply full-scale metadata to the surrogates they create and ingest the resulting images into their local digital library, while others deliver the image to the user and keep no copy of the file within their repository (Schaffner, Snyder, and Supple 8). Instructional digitization projects are those that arise from engagement with classrooms as part of library teaching and learning initiatives, often from student requests for scanning as part of a given assignment, the creation of scans for classroom presentations by staff, or the digitization of items for exhibits generated as part of group projects. Instructional projects are significant because they are in line with university teaching missions and are the fruits of engagement between faculty, library staff, and students. Indicating which classes these digital surrogates arise from provides important contextual information for researchers using these resources. Finally, donor-driven duplication often arises from requests by donors who require portions of their collection for personal use. Many donors, though of course not all, have the financial means to request such (often) larger-scale digitizing. Donors with more financial resources are better equipped to prompt institutions to scan more of their material. If digitized material finds its way into online repositories by any of these means, it is important for repository users to understand that the institution did not decide to scan these items; rather, they are a product of the course of research. For this reason, material of this nature is less likely to reflect the collection development policies adopted by repositories.

Curated digitizing, on the other hand, is often (but not exclusively) planned, larger-scale scanning by the institution. Foundation-supported scanning might limit duplication of material by subject, format, and so on, but institutions either (1) applied for a grant acknowledging the importance of digitizing a particular portion of a collection, or (2) have latitude in what collections they can digitize within broad guidelines. Preservation digitizing is performed on the most fragile items within a collection. Large institutions with significant holdings and limited financial resources, however, often have a great deal of fragile material and must choose only those items that are most significant for teaching or research. In either case, the university can weigh research value against preservation needs at any given point, scanning only those items that meet particular requirements. Finally, universities also digitize material because it is deemed culturally significant or of high research value (regardless of condition). For example, a special collections institution might scan a collection because it gets heavy use in the reading room, thereby providing broader access to the material or alleviating stress on the physical items from repeated use. These curated modes of digitizing involve much more institutional control and are more likely to fall in line with established collection development policies at the local site. As such, users of online digital repositories might find it helpful to know that there was purposive selection in regard to these items.

It should be noted that the categories above are not static or rigid, nor are they mutually exclusive. One might imagine a case where a foundation will initiate support for only a particular collection, thus biasing digitization toward its cultural preferences. The categories are outlined merely to illustrate that not all digitization is equal: an individual or funding entity has the tangible economic power to bias online collections in any number of directions, depending on the amount of resources they are willing to allocate toward the purpose. Although it might otherwise be seen as a neutral “universal good,” the digital repository shows why we need to understand technological structures critically, as expansions of already-existing bureaucratic structures in which exertions of economic power are a daily occurrence, in order to make apparent the inequalities of representation within the system (Pyati 87–88).

Transparency at the institutional level would help alleviate this ambiguity in the digital library, exposing the economic circumstances that brought a particular digital item broader exposure in the digital library. But how might this be done? One solution might be the uniform addition of a metadata line such as was done here:

This example from the UCLA Digital Library indicates, under “Description,” that this particular map, titled "California, the golden state," was digitized as part of the California Cultures Project (and thus ingested into the Calisphere project previously mentioned in this post). Why is this important? Precisely because this scanning project took place within the context of a pedagogical project: a curriculum-compatible K-12 initiative meant to bring primary sources into the classroom. The user is not left to assume this map’s importance within the UCLA collection purely on the basis that it was digitized at all, but immediately understands that this map is important within the context of the California Cultures Project in particular. However, the “Description” metadata field is not uniformly used for such purposes, and few repositories outside of UCLA include such contextual information.
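To make the idea of a uniform metadata line concrete, here is a minimal sketch of how a repository might record the context of digitization in a dedicated field instead of relying on free-text “Description” notes. Everything in it is an assumption made for illustration: the function `with_digitization_context`, the `digitizationContext` field, and its keys are hypothetical and do not come from UCLA's schema or from any metadata standard.

```python
import json


def with_digitization_context(record, project, impetus=None, funder=None):
    """Return a copy of `record` with a hypothetical 'digitizationContext' block.

    `project` names the initiative under which the item was scanned.
    `impetus` and `funder` are optional and omitted when unknown.
    """
    context = {"project": project}
    if impetus is not None:
        context["impetus"] = impetus
    if funder is not None:
        context["funder"] = funder
    enriched = dict(record)  # shallow copy; the original record is left untouched
    enriched["digitizationContext"] = context
    return enriched


# Illustrative record only; a real catalog entry carries many more fields.
record = {"title": "California, the golden state", "type": "map"}
enriched = with_digitization_context(record, project="California Cultures Project")

if __name__ == "__main__":
    print(json.dumps(enriched, indent=2))
```

Keeping this information in its own field, rather than inside a description, would let an aggregator display or filter on it consistently, though agreeing on shared field names is itself a nontrivial task.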

Another way to foster transparency in digital collections would be the adoption of clear digital collection development policies, such as those drafted by a number of physical archival institutions. These development policies would include notes on how items find their way into the digital library, an account of local institutional selection practices for the creation of digital objects, and any large-scale initiatives underway at a given institution (such as foundation support). Such documentation would help not only researchers but also teachers and librarians convey a particular digital item's importance within the context of the digital collection, as well as within the context of the entire local archival collection itself.

I acknowledge that applying metadata is expensive, time-intensive work—it takes real intellectual energy with real institutional costs. That said, there is a great deal that institutions can do quite easily to expose the economies of digitization at play within their local repositories. This transparency will only become more important as entities such as the DPLA continue to grow in prominence and bring to light the myriad (often free) cultural resources our American cultural institutions have available for learners and scholars at all levels. Interrogating our online digital repositories more critically with an eye toward the social structures that bring them into being will help make for more transparent, valuable, and ethical repositories, and help users of these resources to understand not only the strengths of online collections but also the limitations they represent.


