ML4ARC Agenda

9:00-9:10 Welcome and Introduction – Cal Lee, University of North Carolina
9:10-9:45 Visualization and Access
• Ray Wang - Using Machine Learning to Visualize and Categorize Large Document Collections
• Carl Wilson, Open Preservation Foundation - Machine Learning for Web Content Accessibility Guidelines (WCAG) and PDF/UA Validation
10:00-10:40 New Workflows
• Mike Shallcross, Indiana University – Workflow Pain Points and Opportunities for ML
• Emily Higgs, North Carolina State University - Embedding NER in Born-Digital Processing Workflows
• Kathleen Jordan, Library of Virginia - "I don't understand.....Wikileaks was able to get everyone else's records up very quickly": Virginia HB170
10:40-11:20 Interoperability and System Dependencies
• Matthew Farrell, Duke University – Planning for Handoffs: Lessons from OSSArcFlow
• Justin Simpson, Artefactual - Preservation Action Registries and Beyond: Avoiding Stovepipes
• Euan Cochrane, Yale University - Machine Learning for Software Preservation
1:00-2:00 Curation of Email – Challenges and Strategies
• Kate Murray, Library of Congress - The Future of Email Archives: A Report Summary from the Task Force on Technical Approaches for Email Archives
• Kevin De Vorsey, US National Archives and Records Administration - Conversion Criteria and Requirements for Archiving Email into PDF Containers
• Lynda Schmitz Fuhrig, Smithsonian Institution Archives - Tackling PII Challenges in Email: Revisiting the Smithsonian Institution Archives’ Collections
• Joanne Kaczmarek, University of Illinois - Processing Capstone Email Using Predictive Coding
2:00-3:00 Curation of Email – Developing Tools
• Glynn Edwards, Stanford University – Natural Language Processing (NLP) with ePADD
• Jamie Patrick-Burns and Camille Tyndall Watson, State Archives of North Carolina - TOMES
• Kam Woods and Antoine de Torcy, University of North Carolina at Chapel Hill – Review, Appraisal and Triage of Mail (RATOM)
3:15-4:00 Breakout Discussions
4:00-5:00 Conclusions and Next Steps