Engineering the Transition to Large-Scale Digitization

Over the past few weeks, I’ve been revising our digital collections proposal process at Duke. Over the last decade, our digitization efforts have been project-focused. Library staff looked for meaningful themes and connections among our primary source collections and digitized materials to demonstrate those connections. Through this approach we developed some of our most popular flagship collections, such as Ad*Access, Historic American Sheet Music, and others, and we continue to build on this solid foundation with our digitization today.

In order to transition to large-scale digitization, however, I’d like to re-engineer our approach so that the discovery of these meaningful connections between our primary sources happens AFTER digitization, not before. We hope to shift digitization from these small-scale, project-focused collections to digitization of entire archival collections or entire series, whenever possible.

One issue we’re having with this switch is that the very first step, our current proposal process for selecting and prioritizing digitization projects, was built to support the small-scale, interpretive approach, rather than the large-scale, comprehensive approach. My revision work over the past few weeks has been to transition from our existing project proposal process to what I’ve been calling a collection nomination process. I thought it would be useful to get feedback from Collaboratory readers.

So far the process looks like this:

  1. Review university and library strategic goals and determine collection strengths we want to support through digitization on an ongoing basis.
  2. Existing collection development groups determine the top priority collections or series to digitize to support those strengths. Nominations can be submitted anytime, but review and prioritization will occur quarterly.
  3. Special Collections reading room circulation and duplication requests will be analyzed quarterly and will be evaluated along with the collections nominated by library staff.
  4. Priorities will be staged and scheduled for production.

I’ve also been reading Laura Clark Brown’s excellent report, Extending the Reach of Southern Sources: Proceeding to Large-Scale Digitization of Manuscript Collections (PDF, 85pp), which includes an excellent “decision matrix” for prioritizing collections (Appendix G, p. 57), and I hope to develop something similar for prioritizing the nominated collections at Duke.
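To make the idea of a decision matrix concrete, here is a minimal sketch of weighted scoring for nominated collections. The criteria names, weights, and the 0 to 5 scale are hypothetical placeholders chosen for illustration; they are not taken from Appendix G of the report or from any process Duke has adopted.

```python
from dataclasses import dataclass

# Hypothetical criteria and weights, for illustration only. They are NOT
# the criteria from the report's Appendix G or from Duke's process.
WEIGHTS = {"research_value": 3, "reading_room_use": 2, "physical_condition": 1}


@dataclass
class Nomination:
    name: str
    scores: dict  # criterion name -> score from 0 (low) to 5 (high)


def weighted_score(nomination, weights=WEIGHTS):
    """Sum each criterion score times its weight; missing criteria count as 0."""
    return sum(
        weight * nomination.scores.get(criterion, 0)
        for criterion, weight in weights.items()
    )


def prioritize(nominations, weights=WEIGHTS):
    """Return nominations ordered from highest to lowest weighted score."""
    return sorted(
        nominations,
        key=lambda n: weighted_score(n, weights),
        reverse=True,
    )
```

A quarterly review could score each nomination once and then call `prioritize` on the full list to get a proposed production order, which reviewers would still adjust by hand.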

I hope to launch this new approach at Duke starting in October 2009. Over the course of the 2010 Fiscal Year, our goal is for 75% of digital objects published to be part of large-scale or open-ended collections of diverse formats. Wish us luck!


Comments

Comment from Gretchen
Time September 29, 2009 at 7:35 am

I think the scheme is a good one. Our approach at ECU takes nominations from the special collections unit first, then works with a team representing all library units to rank and prioritize once per semester. Our unit is a bit different from yours, though, and we have many other competing demands, so our digitization projects remain fairly small. Still, we’ve taken the same path of thinking of it as a “nomination” of what collection (or large portion thereof) to scan next rather than which “digital collection” to create next.

My only question for you is, how are you handling the metadata? If this increases scanning will it mean an increase in the original metadata you need to create or are you reusing finding aid description?

Comment from Rich Murray
Time October 1, 2009 at 11:26 am

Hi Gretchen — We certainly always try to reuse metadata whenever possible, including reusing finding aid description as you mention. But large-scale digitization as Jill describes it will certainly require more original metadata creation here at Duke. The head of our Cataloging & Metadata Services Department is committed to incorporating creation and management of metadata for digital collections into the regular job responsibilities of many if not all of the staff in her department, and the head of Special Collections Tech Services is also onboard. Our goal is to mainstream digital collections work, and treat it as just one of the many types of materials our catalogers and archivists work with. The new metadata editor we’re building as part of our Trident project will definitely be a critical part of this plan, because without a usable tool we won’t be able to ramp up metadata production like we want and need to.

We’ve put together a pilot team of 8 staff members from the Cataloging and Special Collections TS departments who are currently creating original non-MARC metadata for our Broadsides & Ephemera Digital Collection, using an early prototype of the new metadata tool. This project is a test of how integrating this work into catalogers’ and archivists’ routines will go, what reasonable turnaround times and expectations for productivity will be, etc. So far things are looking promising, but it’s early days yet. Still, in our distributed digital collections environment here at Duke, we feel fortunate to have department heads who are committed to supporting digital collections work and to making sure their staffs aren’t left out of this important direction the library is moving in.

Rich

Comment from Elizabeth
Time October 2, 2009 at 7:37 am

Rich/Jill, how in-depth is the metadata? Are you experimenting with a minimal approach, and if so, what fields are you minimally requiring?

Comment from Rich Murray
Time October 2, 2009 at 2:57 pm

To some extent the depth of the metadata varies from project to project, but we’re certainly investigating how minimal we can go without negatively affecting discovery too much. But when you’re working with thousands or tens of thousands of items, as we do in some of our digital collections, you often can’t describe at the level you’d ideally like to. Technically we’re only requiring Title, Type, and Identifier, but I don’t think we’ve had any projects where we’ve gone THAT minimal yet. But we’re seriously examining some of the more time-consuming parts of the metadata process, like using controlled-vocabulary subject headings, and looking for alternatives that might be quicker but still allow powerful searching (as we did with our Sidney Gamble photos from China). Leaning towards minimal approaches makes training quicker and easier, and opens up the work to more levels of staffing, as well as (obviously) letting us describe more digital objects in less time, but at the same time, you don’t want to cut the metadata down so much the objects aren’t discoverable. So it’s always a balancing act….
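As a rough illustration of the technical minimum mentioned above, the sketch below checks a record for the three required fields (Title, Type, Identifier). The dictionary record shape and the function name are assumptions made for this example; it does not describe the metadata editor being built for Trident.

```python
# The three required fields come from the comment above; the record format
# (a plain dict) and the function name are hypothetical.
REQUIRED_FIELDS = ("Title", "Type", "Identifier")


def missing_required(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in the record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


record = {"Title": "Street scene", "Type": "image", "Identifier": ""}
print(missing_required(record))
```

A record passes the minimum when the returned list is empty; any richer description, such as subject headings, is layered on top of that check rather than required by it.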

For an example of a digital collection where we’ve taken a more minimal approach with the metadata, take a look at our AdViews collection in iTunes U. The metadata there is pretty basic, due partly to the limitations of iTunes but also because we needed to push out thousands of items in a short period of time.

We’re also talking about collecting a bare minimum of metadata at the point of digitization and using it to make the digital objects immediately available, then going back later and enhancing the metadata. Currently we wait till all the metadata is complete and then push out an entire collection at once, which means there can be a significant lag between digitization and publication. I’d like to at least try the “publish immediately with a metadata stub and then enhance it later” approach to see what the pros and cons are, and how it fits into our workflows.
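The “publish immediately with a metadata stub and then enhance it later” idea could be sketched roughly as follows. The status values, function names, and the rule that the Identifier must not change after publication are all assumptions made for illustration, not a description of our actual workflow.

```python
# Hypothetical sketch: stub fields, status labels, and function names are
# illustrative assumptions, not an existing Duke workflow or tool.
STUB_FIELDS = ("Title", "Type", "Identifier")


def publish_stub(record):
    """Publish as soon as the stub fields are present; raise otherwise."""
    missing = [field for field in STUB_FIELDS if not record.get(field)]
    if missing:
        raise ValueError(f"cannot publish, missing: {missing}")
    return {**record, "status": "published-stub"}


def enhance(published, extra):
    """Merge later description into a published record.

    The Identifier is assumed to stay stable so existing links keep working.
    """
    if "Identifier" in extra and extra["Identifier"] != published["Identifier"]:
        raise ValueError("Identifier must stay stable after publication")
    return {**published, **extra, "status": "published-enhanced"}


stub = publish_stub({"Title": "Untitled ad", "Type": "video", "Identifier": "x001"})
full = enhance(stub, {"Subject": "Advertising", "Date": "1965"})
```

The point of the split is that the lag between digitization and publication depends only on the stub, while enhancement can proceed on its own schedule.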

Rich

Comment from Kim Klausner
Time November 20, 2009 at 4:31 pm

I manage the Legacy Tobacco Documents Library (http://legacy.library.ucsf.edu), with 10+ million documents, and the Drug Industry Document Archive (http://dida.library.ucsf.edu), with 2,500 pharma documents. There are potentially millions of pharma documents becoming available as a result of litigation. We’d love to add these documents, but we need a certain amount of metadata for each document in order to do so. We’re planning on developing web-based, open-source software for metadata entry and enlisting students’ help (on the crowd-sourcing model). I’d like to email or talk with you about a proposal I’m writing for funding of this project.
