HathiTrust Research Center Digital Scholarship Workshops (Day 1)
- Thursday, May 2, 2019 from 9:00am to 1:00pm
- Reid Hall 302
This workshop will introduce attendees to the HathiTrust Research Center’s (HTRC) tools and services for computational text analysis of the massive HathiTrust Digital Library. The HTRC leverages the scope and scale of the HathiTrust Digital Library’s holdings to let researchers perform text data mining. Attendees will develop the skills to conduct text analysis research using HathiTrust data.
The workshop will be split across two days. Day 1 will include an introduction to HathiTrust and the HTRC, how to access data from each, and a gentle overview of text mining using off-the-shelf tools. Day 2 will cover the HTRC Bookworm tool, the Extracted Features dataset, and the Data Capsule computing environment.
Topics to be covered include:
- How the HTRC makes HathiTrust volumes available for text mining
- How to identify relevant volumes and build worksets (datasets) of content for analysis
- How to access HathiTrust data and metadata via provided APIs, request procedures, and open datasets
- HTRC tools and systems for text data mining