The Social Networks and Archival Context (SNAC) project is an ambitious one that seeks to locate records of historical importance across repositories and make them available to patrons on a massive scale. Our panel updated us on its fascinating progress. Look at what we records and information management professionals can do.
The Society of American Archivists (SAA) 2012 annual meeting, “Beyond Borders,” concluded Saturday, August 11, 2012 in San Diego.
Tammy Peters of the Smithsonian Institution introduced her panel:
- Ray R. Larson (University of California, Berkeley)
- Daniel Pitti (University of Virginia, Institute for Advanced Technology in the Humanities)
- Jerry Simmons (National Archives and Records Administration).
The Social Networks and Archival Context Project: Status Report
Ray R. Larson
Mr. Larson delivered an update on SNAC. Officially, the goals of the project are to further the transformation of archival description and to separate the description of records from the description of the people documented in them. Translation: the project is meant to make available records of historical importance and
- enhance access to archives resources, through all cultural heritage resources; and
- enhance understanding of those resources.
We’re talking big data: a sample of 150,000 EAD-encoded finding aids contributed from around the world by national libraries and others, including:
- Library of Congress
- National Archives and Records Administration
- Smithsonian Institution
- British Library
- Archives nationales (France)
- Bibliothèque nationale de France
- OCLC WorldCat and VIAF
- Getty Vocabulary Program.
Institutes like the Getty Vocabulary Program have contributed a union list of artist names (make that: 293,000 personal and corporate names).
The problem: a proliferation of the forms of names (for example, different people with the same name). EAD records are full of family names, and the EAD structure identifies the creator of the archive (typically a complete biographical note is provided). This note is extracted into an Encoded Archival Context for Corporate Bodies, Persons, and Families (EAC-CPF) record.
“We’re given names -- sometimes multiple names. Identical names mean a complete Library of Congress record with attributes is available. If it’s an exact match, it’s marked. But marking doesn’t work for everything. Abbreviations are troublesome -- think transliteration of non-Roman characters. We take names where we didn’t get an exact match, then test against library authority files. Do we find an exact match? We flag it as a potential merge. Is nothing matched by this stage? We create overlapping segments of three characters. Finally, we take all flagged as potential matches, do a find, make sure these are the ones we want. With the authoritative form of the name, we combine all EAC-CPF records. To give you an idea of volume, a recent test merged 93,033 person names from 114,639 person records,” said Larson.
In other words, the names are extracted from EAC-CPF and from existing EAD. If the EAC-CPF records match against one another and against existing authority records (for example, VIAF), then prototypes of historical resources and accessibility are created.
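The matching steps Larson describes (exact match first, then overlapping three-character segments for whatever is left) can be sketched roughly as follows. This is a simplified illustration, not SNAC's actual code: the function names, the Jaccard overlap score, and the 0.6 threshold are all assumptions made for the example, and the final human check of flagged matches is left out.

```python
def normalize(name):
    """Lowercase and replace punctuation with spaces so trivial variants compare equal."""
    kept = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(kept.split())


def trigrams(name):
    """Return the set of overlapping three-character segments of a normalized name."""
    text = normalize(name)
    if len(text) < 3:
        return {text} if text else set()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def similarity(a, b):
    """Jaccard overlap of two trigram sets (an assumed scoring choice)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def classify(name, authority_names, threshold=0.6):
    """Label a name 'exact', 'potential' (flag for review), or 'none'."""
    norm = normalize(name)
    # Step 1: exact match against an authority form.
    for auth in authority_names:
        if normalize(auth) == norm:
            return ("exact", auth)
    # Step 2: fall back to trigram overlap and flag the closest candidate.
    best = max(authority_names, key=lambda auth: similarity(name, auth), default=None)
    if best is not None and similarity(name, best) >= threshold:
        return ("potential", best)
    return ("none", None)


authority = ["Franklin, Benjamin", "Adams, John"]
for candidate in ["franklin, benjamin", "Frankln, Benjamin", "Smith, Jane"]:
    print(candidate, "->", classify(candidate, authority))
```

Trigram overlap tolerates dropped letters and small spelling differences, which is why it suits the abbreviation and transliteration problems mentioned above; anything labeled "potential" would still need the review step Larson describes before records are merged.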
The most recent extraction results:
- Total: 175,637 EAC-CPF from 30,496
- corporate body: 47,189
- person: 125,554
- family: 2,894
“What’s important to Randy the Researcher: we’re creating standardized personas for target audiences. This is meant to be really elegant, with enhanced search that merges information from multiple sources and multiple fields from finding aids. Our future plans include conducting an assessment of activities involving members of target audiences to establish mental models. We’re going to scale the interface to millions of names; we will create visualizations that are both useful and integrated; we’ll create stable URLs between batches; provide social and personalization features; and integrate with local systems.”
Establishing a National Archival Authorities Cooperative: Developing a Blueprint
Funded by the Institute of Museum and Library Services, the National Archival Authorities Cooperative (NAAC) aims to realize archival authority description at last, because archival authority control needs to be cooperative. “Imagine consistent use of names for the same entity across descriptions. The need to maintain only a single set of shared authority records and the economic benefits of cooperation outweigh the effort,” asserted Pitti.
The benefits for archivists:
- Working cooperatively will interrelate different collections, interrelate people;
- Cooperating authorities will enable integrated access to distributed records; and
- A shared national archival authority data set would be a substantial historical resource.
The benefits for users:
- Integrated access to distributed archival resources;
- Context for not only archival records but all cultural heritage resources;
- Social-professional-intellectual networks in which people lived and worked; and
- This context is also a bio-historical resource in itself.
“Archivists describe the creators of and other people documented in records. The description of people is currently intermixed with the description of records through a single apparatus, the finding aid. Archivists have advocated for several decades to separate description of people from description of records. SNAC is demonstrating the power of this idea. SNAC is proving to be reasonably effective in extracting the relevant data, assembling authority records, matching and combining them,” Pitti explained.
Realize, though, that SNAC is a research and demonstration project; it will end.
At the May 2012 meeting, 83 participants (representing archives, libraries, and museums) were enthused and confused. Enthused, because they saw integrated access to archival holdings; access to social-professional-intellectual networks; access to individual biographies and histories. Confused, because with resources already stretched, what will be the costs of participating? What would participating require? Why not simply use computational methods? Why are human editors necessary? Where will it reside, and who will govern?
Participants decided that algorithms alone will not ensure accuracy, nor can the SNAC processing ensure the currency of the data. The NAAC should be built and maintained using a combination of computational methods, professionals with computer assistance, and professionals with “crowd” assistance. Indeed, the professional community should be interpreted broadly (archivists, librarians, scholars), with a “continuum of perfection”: built, maintained and improved over time.
The next meeting, of 24 or so archivists, librarians, and scholars, is in early October. The challenge: it is very difficult to address the business, governance and technological requirements separately. They’re interdependent. NARA has expressed a strong interest in hosting the cooperative. With shared vision, governance, development, and maintenance, we understand this is for the benefit of the professional community and the benefit of the national and international users of archives. Expect a whitepaper published next year.
Transformation of NARA’s Authorities: a New Era of Context and Connection with EAC-CPF
Mr. Simmons, the authority cataloging team lead at NARA/Maryland, got straight to the point. “I have never been this excited about any initiative. Until this moment, NARA’s policy for person names has been AACR2 since the start of formal cataloging twelve years ago. Looking ahead, NARA’s authorities are being transformed by Resource Description and Access (RDA), EAC-CPF and SNAC. We’re anticipating using RDA March 31, 2013 in NACO.”
NARA’s NACO contributions to the Library of Congress Name Authority File (LCNAF) eventually result in NARA’s names appearing in VIAF. NARA can repurpose its own name authorities out of LCNAF in various formats. Once in SNAC, NARA will get a head start on EAC-CPF work through its local lifecycle data requirements guide (LCDRG), not by AACR2.
NARA’s new description and authority service will provide a more comfortable environment for manipulating data, with more advanced record editing and merging features. Linking person names with corporate names isn’t a new concept at NARA. ARC, from the beginning, allowed for assigning and linking person names as access points in corporate name authority records. So linking between people and organizations has been in practice at NARA for 10 years.
NARA isn’t naïve; it knows it will face challenges moving the “organization” names forward with RDA and EAC-CPF. Consider: in most cases there are no source notes to support names/variants. When source notes do exist, they are unstructured and cryptic. The silver lining, though, is EAC-CPF’s <conventionDeclaration> tag to ease the transition.
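To make the <conventionDeclaration> idea concrete, here is a small sketch of how a record might declare both the old and the new rules during a transition. It is an illustration only and not NARA's practice: the helper name is hypothetical, the child elements (abbreviation, citation) follow my understanding of the EAC-CPF schema, and the XML namespace and the other children a valid control section needs are left out for brevity.

```python
import xml.etree.ElementTree as ET


def convention_declaration(abbreviation, citation):
    """Build one conventionDeclaration element (hypothetical helper for this sketch)."""
    decl = ET.Element("conventionDeclaration")
    ET.SubElement(decl, "abbreviation").text = abbreviation
    ET.SubElement(decl, "citation").text = citation
    return decl


# A partial control section: a complete EAC-CPF record requires more elements
# here and the EAC-CPF namespace, both omitted in this sketch.
control = ET.Element("control")
control.append(convention_declaration("AACR2", "Anglo-American Cataloguing Rules, 2nd edition"))
control.append(convention_declaration("RDA", "Resource Description and Access"))

fragment = ET.tostring(control, encoding="unicode")
print(fragment)
```

Declaring the rules inside the record means a name formulated under AACR2 and one formulated under RDA can sit in the same data set while each states which convention it follows.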
Editor's Note: Mimi has been reporting extensively from the SAA12 Conference. To read more: