I'm Gregory Wiedeman,

an archivist who makes information more accessible. Sometimes I write, give presentations, or code.


This is a place to put some of the things I'm working on. I am currently the University Archivist at the University at Albany, SUNY.

I focus on collecting documentation of the university and making it available for research use. I get to solve problems with paper records and legacy digital records, work with metadata at scale, and create ways to make a lot of content available on the web.

I do a ton of work with XML, Python, obsolete digital formats, and web technologies.

I have a research background in 19th-century American history and did some work on how people understood faraway places, particularly the Eastern Mediterranean.

This is not really a blog, but I have some plans to write stuff here that's not really suitable for publication.


I write mostly on archives, particularly automating collection management and born-digital records.

This is an overview of our ArchivesSpace migration process, with descriptions of the code used. Hopefully this rather long-winded description helps document our processes and thinking.

Review of Metadata by Marcia Lei Zeng and Jian Qin. Discusses different views of metadata among libraries and archives. Published in Archival Issues.

Article that shows archivists how Python can be used to automate workflows between different systems and reduce manual tasks. Published in Practical Technology for Archives Issue no.7.

Working with a multitude of digital tools is now a core part of an archivist’s skillset. We work with collection management systems, digital asset management systems, public access systems, ticketing or request systems, local databases, general web applications, and systems built on smaller systems linked through application programming interfaces (APIs). Over the past years, more and more of these applications have evolved to meet a variety of archival processes. We no longer expect a single tool to solve all our needs and have embraced the "separation of concerns" design principle that smaller, problem-specific and modular systems are more effective than large monolithic tools that try to do everything. All of this has made the lives of archivists easier and empowered us to make our collections more accessible to our users.

Guest post for the Archive-It Blog: In recent years many archives have expanded to preserve the web among other formats in their traditional collecting areas. Yet, unlike traditional formats, the best way to make web archives available to researchers is not with boxes and call numbers, but in their original environment — an Internet-connected web browser. This post discusses how at the University at Albany, SUNY, we are providing minimal access to this new type of archival material together with other formats that originated with the same creators.

Instructional module for the CPR Electronic Records Committee that describes how to do a web crawl using Archive-It

Code4Lib article on the development of ANTS, the digital records transfer system I developed to package records with checksums and filesystem metadata and make network transfers.

Archivists have developed a consensus that forensic disk imaging is the easiest and most effective way to preserve the authenticity and integrity of born-digital materials. Yet, disk imaging also has the potential to conflict with the needs of institutional archives – particularly those governed by public records laws. An alternative possibility is to systematically employ digital forensics tools during accession to acquire a limited amount of contextual metadata from filesystems. This paper will discuss the development of a desktop application that enables records creators to transfer digital records while employing basic digital forensics tools in the records’ native computing environment to gather record-events from NTFS filesystems.

Article that introduces XQuery to non-programmer archivists and encourages them to see archival description as data. Published in Practical Technology for Archives Issue no.3.

XML has long been an important tool for archivists. The addition of XQuery provides a simple and easy-to-learn tool to extract, transform, and manipulate the large amounts of XML data that archival repositories have committed resources to develop and maintain – particularly EAD finding aids. XQuery allows archivists to make use of that data. Furthermore, using XQuery to query EAD finding aids, rather than merely reformat them with XSLT, forces archivists to look at finding aids as data. This will provide better knowledge of how EAD may be used and further understanding of how finding aids may be better encoded. This article provides a simple how-to guide to get archivists to start experimenting with XQuery.

Unpublished paper I wrote in graduate school for a preservation class. It argues that Nicholson Baker's criticism of microfilming newspapers is muddled in misunderstanding and that the more effective argument against this practice is that preservation microfilming often ignores access concerns. The argument I like best in this piece is that use, not just time, is what damages acidic newsprint.

Unpublished paper I wrote in graduate school on how to make finding aids more accessible on the web.

Unpublished master's thesis.

This study addresses how nineteenth-century Americans perceived the lands of the Eastern Mediterranean. The project rests upon a detailed examination of American primary school geography textbooks that enjoyed widespread circulation during the century. The lack of an effective education apparatus in the period rendered American students incredibly reliant on their textbooks. These texts reflect the general common knowledge of the region shared by most educated Americans. Additionally, this study draws support from a thorough analysis of travel accounts that were extraordinarily popular during the period. These works offered Americans a chance to explore vicariously the most interesting lands of the Levant.

Nineteenth-century Americans sought to locate their essential place, meaning and mission within a universal system of world processes. Geography authors fulfilled this social need by providing students with a systemized structure of knowledge about the Eastern Mediterranean. This framework enabled students to address the complex realities of the region in a simplified and palatable manner – a process that was also used to satisfy various social pressures. This episteme of the Eastern Mediterranean provided the context for Americans to regulate their self-meanings and cultural missions in the nineteenth century. Often, the concepts of this knowledge structure took the form of dichotomies which acted as defining antitheses. Students located themselves within these oppositions which became constructs of Sameness and Otherness. The structured framework of knowledge about the Levant provided the setting in which these processes played out. Thus, the people, places, and practices of the region were marked as aspects of “us” and “them” – of heritage and Otherness.


Providing Computational Access to Records of American Capital Punishment

2019 February

Code4Lib 2019 talk on the challenges of providing computational access to the Espy Papers, including presenting the material in context and the metadata empathetically.

Challenges and Conflicts of Linked Data in Archives

2018 August

Talk that asks if publishing Linked Data conflicts with the mission of archives. Part of SAA 2018 Session 303: Progress (and Pitfalls) of Linked Data Projects. (pptx)

Describing Web Archives with the Partner Data API

2018 August

Talk at the Archive-It Partner Meeting on using DACS and archival practices to describe web archives using the Partner Data API.

Processing Born-Digital Images at Scale

2018 August

Remote talk for the NDSA Content Interest Group looking back on processing born-digital images at scale, maintenance issues, and building infrastructure.

The Espy Project: Enabling New Access to Archival Materials

2018 April

CNI project report on providing API access to data from a digitization project and using that technology to rework our university records collecting system.

The Espy Project: Building Digital Infrastructure

2018 March

Talk I gave to our library's Professional Activities Committee on the Espy Project and how it's helping us build technology for collecting university records.

Git and GitHub for Archives

2017 November

Short workshop on Git and GitHub that Mark Wolfe and I gave at the New York Library Association (NYLA) Annual Conference

Challenges of Digital Archives at UAlbany

2017 October

A talk I gave to a Capital Area Archives (CAA) event on UAlbany's Downtown Campus about born-digital archives

Documenting the Macabre: The Espy Project

2017 October

My part of our 2017 MARAC panel on the National Death Penalty Archives, "Documenting the Macabre"

Data Analytics Approaches for Web Archives

2017 October

Talk on potential research uses of web archives for UAlbany's Data Analytics Lightning Talks series

University Archives in Practice at UAlbany

2017 October

Guest class for UAlbany's Archives and Manuscript course

Beyond Finding Aids: New Approaches to Archival Description

2017 July

Our SAA 2017 panel on rethinking finding aids for presenting archival description on the web

Describing Web Archives: Automating Access with APIs and ArchivesSpace

2017 June

Talk about using DACS to describe web archives and automating the process with APIs

Link Ranking in Web Archives

2017 June

Winning project for Archives Unleashed 4.0 at the British Library

Providing Basic Access to Web Archives Provenance Data

2017 June

Lightning Talk at Archives Unleashed 4.0 at the British Library

Pragmatic Processing: Diverse Approaches to Different Collections

2017 June

Intro to a panel I chaired on pragmatic/extensible processing for the 2017 New York Archives Conference in Utica

Born-Digital Records in Practice at UAlbany

2017 April

Guest class for Electronic Records Management graduate course.

What's a Digital Archivist?

2017 April

Talk at MARAC Newark on the differences between "Digital Archivists" and IT Professionals

Automating Web Archives Records in ASpace

2017 February

Short talk at WASAPI Symposium at the Internet Archive

No More Finding Aids: A New Frontend for Special Collections & Archives at UAlbany

2016 December

Presentation at New England Code4Lib in Amherst, MA that argues the finding aid as a concept is no longer useful.

APIs, WARCs, and NLP: How Technology has Created New Opportunities for Historical Research

2016 November

Presentation at Researching New York/Conference on New York State History in Albany, NY

Updating web archives extents/dates using the ASpace API and Archive-It CDX

2016 October

Talk and hands-on workshop at Beyond the Basics ArchivesSpace Skill Share, Philadelphia, PA

SIP Your Pics: an almost-OAIS workflow for 180,000 images

2016 August

Part of a panel on processing born-digital photographs for the SAA Annual Meeting 2016 in Atlanta.

A Sustainable, Large-Scale, Minimal Approach to Accessing Web Archives

2016 August

Talk for the Archive-It Partner Meeting on automating access to web archives in finding aids using Archive-It's CDX API

Marcia Brown at the State College for Teachers (1936-1940)

2016 April

Renowned author and illustrator's college experience

How to Metadata

2016 March

Guest lecture for graduate Digital Libraries class

Transferring Records to the Archives with ANTS

2016 March

Code4Lib 2016 Lightning Talk

Auto Upload and On-Demand Digitization

2016 January

Talk at METRO conference

Managing Descriptive Metadata with Open XML... For Now

2015 August

Lightning talk I gave during the Collection Management Tools Roundtable at SAA in Cleveland

Effective Metadata Systems for Archives: EAD and Unified Collection Management Systems

2015 June

Talk I gave at the 2015 New York Archives Conference in Fredonia, NY

Using XML data with XQuery

2014 November

Guest lesson for graduate XML class

Lowell Thomas and American Cultural Knowledge of Palestine and Arabia

2013 March

Talk at Northeastern University Graduate History Conference


Espy Project Metadata Tool

Ruby on Rails

Workflow application for linking different types of records and creating metadata for the Espy Project.



Actively used tool for importing and exporting file-level ArchivesSpace inventories with spreadsheets.


Python, PowerShell

Tool to facilitate on-demand digitization with ArchivesSpace. Includes UAlbany on-demand digitization package spec.

ArchivesSpace Python Library


A Python library for working with the ArchivesSpace API. Will be superseded by community ArchivesSnake project.



Experimental tool for automating description for Web Archives in ArchivesSpace according to archival principles using the Archive-It CDX and Partner Data APIs.



Command line tool for automating SIPs as bags with an additional metadata file extracted from filesystem data. Currently being reworked to comply with BagIt Profiles.

ArchivesSpace Migration


UAlbany's ArchivesSpace migration docs and scripts.

Link Ranking Project Repo

Python, D3Plus, Spark

Link Ranking Project repository for Archives Unleashed 4.0 Web Archives Datathon.


Python, NLTK, D3Plus

Experimenting with Twarc and an estimated list of Twitter bots for a university Data Analytics talk.

Example Scripts for Learning Python


Basic sample scripts to get some results with Python. For use with "Python for Archivists".

Researching NY 2016 Scripts

Python, D3Plus

Example scripts for using computational tools like NLP for historical research. For November 2016 presentation

ArchivesSpace/Archive-It Example Workshop


This is a working example of how to automate description of Web Archives in ArchivesSpace using the API and the Archive-It and Internet Archive CDX Servers. For ASpace Skill Share. Superseded by describingWebArchives.

Basic access to web archives scripts


Scripts for adding web archives to EAD files from Archive-It's CDX API. For Archive-It Blog post

Processing Scripts for Born-Digital Photos


Scripts used to automate extraction of 180,000 photos from disk images and make static pages for display

Random Born-Digital Scripts

Mostly Python

Includes script to automate disk imaging from external drives

Collections Access System

XSLT, Bootstrap, CSS, JQuery

Custom XTF instance with Bootstrap 3

ANTS: Archives Network Transfer System

Python, wx, Pyinstaller

ANTS is designed for transferring digital records to an institutional archives using digital forensics tools.

EADMachine 2.0

Python, wx, Pyinstaller

Simple GUI to quickly create basic collection-level or series-level EAD finding aids

Auto Upload


Experimental scripts for on-demand digitization with EAD. Superseded by UploadTool

EAD tools


Python scripts for cleaning up and standardizing EAD, and inserting semantic identifiers


Python, Bootstrap

Script for strict rule-based validation for EAD files


Python, wx, Pyinstaller

EADMachine is an easy EAD creation and editing tool. It is a Python .exe that reads and writes complete EAD files from spreadsheets.


Gregory Wiedeman

Science Library 356
1400 Washington Ave
Albany, NY 12222


2013 M.S. Information Science
University at Albany, SUNY

  • ALA accredited program with concentration in Archives & Records Management

2013 M.A. History
University at Albany, SUNY

2010 B.A. History
Marist College, Poughkeepsie, NY


2015-Present University Archivist

M.E. Grenander Department of Special Collections and Archives
University Libraries
University at Albany, SUNY

  • Manages acquisitions, accessioning, collection development, archival processing for the University Archives

  • Developing born-digital records collections program for University Archives that meets public records laws and professional standards

    • Also leads born-digital collecting for manuscript collections
  • Implemented minimal processing techniques and use-based processing to reduce the number of inaccessible collections

  • Performs primary reference services for University Archives, working with students, faculty, staff, alumni, and the general public

  • Oversees web archiving program for Albany.edu domain and outside collecting areas

  • Leading role in implementing new technologies in collection management and public access

    • Designed and developed new web access system for archival collections

    • Mix of Drupal, XTF, and Static page generation, with Bootstrap 3 and back-end Python scripting

    • Leading ArchivesSpace implementation and data migration

    • Implemented guerrilla user testing program

    • Led year-long legacy metadata cleanup process.

    • Implemented reference ticket system and provided vision and direction for on-demand digitization

  • Manages undergraduate and graduate student assistants performing archival processing and digitization

2014-2015 Project Archivist

M.E. Grenander Department of Special Collections and Archives
University Libraries
University at Albany, SUNY

  • Implemented Council on Library and Information Resources (CLIR) Hidden Collections grant to arrange and describe 710 cubic feet of the National Death Penalty Archives (July 2014-March 2015)

    • Processed records of empirical research on Capital Punishment and the effect of race in charging, sentencing, and jury decision-making
  • Supervised three Graduate Student Assistants and three Undergraduate Student Assistants processing a variety of archival material in accordance with Describing Archives: A Content Standard (DACS)

  • Undertook New York State Documentary Heritage Program Grant (April 2014-June 2014) to arrange and describe four collections containing 82 cubic feet of material concerning LGBT civil rights activism

  • Developed an automated Encoded Archival Description (EAD) authoring workflow that maintains consistent local practices and is comfortable for archivists without technical expertise

    • Uses Microsoft Excel XML mapping and basic Python desktop GUI

2011-2014 Project Archivist (2013-2014), Archives Assistant (2011-2013)

Archives and Special Collections
James A. Cannavino Library
Marist College, Poughkeepsie, N.Y.

  • Responsible for the arrangement and description of all major incoming collections from July 2011-April 2014, using professional processing standards, including DACS, MARC, LCNAF, AAT

  • Performed regular in-person and remote reference services, working regularly with undergraduate students and scholarly researchers

  • Regularly managed 5-12 undergraduate student assistants and occasional MLS interns in archival processing, digitization, and creating online exhibits

  • Developed web exhibits, online finding aids, and access tools using XML, EAD, XSLT, XQuery, HTML, and CSS

  • Regularly managed large-scale digitization projects

    • Helped manage Lowell Thomas Digitization Project ($103,979 NHPRC Grant)

    • Over 39,000 images in more than 8 photographic formats

    • Coordinated metadata creation and quality control workflow

    • Developed imaging standards in collaboration

  • Performed regular and on-demand preservation activities

  • Directed the movement and storage of the department’s entire holdings for renovation in summer of 2012

  • Developed copyright and use statements for online collections

  • Performed off-site appraisal of Arthur Glowka Papers

2012-2014 Collections Project Manager

Dutchess County Historical Society, Poughkeepsie, NY

  • Arranged and described 19th century collections and made them available for use

  • Collections included comprehensive administrative records from local units in both the Civil War and the American Revolution

  • Implemented DACS, minimal processing techniques, and other professional standards

  • Wrote and obtained $8,297 New York State Documentary Heritage Grant for the Hubbard Family Papers

  • Developed and implemented low-cost preservation procedures

  • Designed new, easy-to-use collection management and access workflows that are not software-dependent

2009-2010 Undergraduate Student Assistant

Archives and Special Collections
James A. Cannavino Library
Marist College, Poughkeepsie, N.Y.

  • Participated in the two-year project to arrange and describe the 1,200-linear-foot Lowell Thomas Papers, supported by a $140,000 NHPRC grant


Applications – git, Microsoft Office, Adobe Creative Suite, eXist XML Database, BaseX
Archives Applications – ArchivesSpace, Archivists’ Toolkit, BitCurator Environment, PastPerfect
CMS/Frameworks – Drupal, WordPress, Bootstrap
Disk Imaging – Guymager, dd, FTK Imager
Digital Forensic Tools – The Sleuth Kit, fiwalk, Plaso
Digital Repositories – Luna Insight
Digital Scholarship Tools – NLTK, twarc, D3Plus
Markup Languages – HTML 5, JSON, XML, XSLT, Markdown
Metadata Standards – DACS, EAD (EAD 2002 and EAD3), MODS, METS, PREMIS, Dublin Core, Schema.org
Operating Systems – Windows, Ubuntu Linux
Programming Languages – Python, PHP, XQuery, CSS 3, JavaScript
Shell Scripting – Bash, Windows PowerShell
System Administration – Apache, Tomcat, MySQL, ArchivesSpace