RATOM Hackathon 2020: Email Processing Tools, Scripts, and Workflows for Archives

As the current iteration of the RATOM project comes to a close, our team wanted to give the community an opportunity to engage with the software and its surrounding work. From October 19 to 21, 2020, we hosted a hackathon that encouraged community members to explore the software across three lightly scheduled and loosely structured days. The intent was to dedicate time and digital space for participants to work with RATOM alongside fellow archival workers and with the guidance of our team.

We provided recommended topics for those interested in the project, including CLI and scripting, the RATOM web tool, and workflow or prototype development. Work and discussions largely took place in designated Slack channels and during a check-in Zoom session. RATOM’s Technical Lead, Kam Woods, led the hackathon’s kick-off meeting and provided reference materials about the software, installation guides, and sample data sets for participants to use. RATOM Software Engineer Antoine de Torcy, Co-PI Camille Tyndall Watson, and Investigator Jamie Patrick-Burns worked with hackathon participants, fielding a range of software usability questions and providing firsthand technical and archival expertise.

The CLI and scripting channel saw consistent activity across all three days. Most discussions revolved around RATOM-specific capabilities, but general threads also touched on, for example, the pros and cons of tools such as SQLite and DBeaver. Participants asked whether the email system type (Gmail, Microsoft, etc.) would be included in metadata outputs. Others noted factors that affect the accuracy of entities extracted with spaCy, such as third-party tagging, incomplete sentences, and acronyms. Individuals also raised differences in processing speed between older and newer email formats. Lengthier threads worked through the null headers that result when notes and calendar data are extracted from the bodies of emails. Finally, participants suggested workflow modifications to the RATOM team. One participant specifically suggested a workflow in which formatted JSON files are used to export EML files for other tasks such as extracting, rendering, and indexing.
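To make the JSON-to-EML suggestion concrete, here is a minimal sketch of what such a conversion step could look like using only Python's standard library. It is not part of libratom or the RATOM tools, and the JSON field names (`sender`, `recipients`, `subject`, `date`, `body`) are hypothetical placeholders rather than an actual RATOM output schema. The sketch also skips null header values instead of writing them out, which relates to the null-header threads mentioned above.

```python
import json
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Hypothetical mapping from JSON keys to standard email header names.
# These keys are assumptions for illustration, not a RATOM schema.
HEADER_FIELDS = {
    "sender": "From",
    "recipients": "To",
    "subject": "Subject",
    "date": "Date",
}


def record_to_eml(record: dict) -> bytes:
    """Build EML (RFC 5322) bytes from one JSON message record."""
    msg = EmailMessage()
    for key, header in HEADER_FIELDS.items():
        value = record.get(key)
        if value:  # skip null or empty headers instead of writing "None"
            msg[header] = value
    body = record.get("body")
    if body:  # only attach a text body when one is present
        msg.set_content(body)
    return msg.as_bytes()


def eml_subject(eml: bytes):
    """Parse EML bytes and return the Subject header, or None if absent."""
    return BytesParser(policy=policy.default).parsebytes(eml)["Subject"]


# A sample record with a null date, as it might appear in a JSON export.
sample = json.loads(
    '{"sender": "alice@example.org", "recipients": "bob@example.org", '
    '"subject": "Records request", "date": null, "body": "Files are ready."}'
)
eml = record_to_eml(sample)
```

The resulting bytes could be written to a `.eml` file and handed to other tools for rendering or indexing. A real workflow would also need to handle attachments, multiple recipients, and non-text bodies, which this sketch leaves out.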

Beyond CLI and scripting, participants considered alternative approaches to access and delivery in the context of fulfilling records requests. This included case-specific examples of current delivery methods, such as FTP, hard drives, and public interfaces. Conversations also touched on levels of digital literacy and education as broader yet necessary components of these workflows that could be examined and possibly integrated into future work. PDFs came up alongside Adobe’s redaction capabilities and the possibility of archiving to PDF, although the latter would require further development of the RATOM tool specifically. Closing remarks underscored the importance of understanding audience needs and the fact that processes may need to function on a case-by-case basis, depending on a number of researcher-related variables.

Collaborative troubleshooting of minor installation hurdles, Jupyter notebooks, mybinder, and Xcode version compatibility occurred throughout the hackathon, as did conversations about a future version of the web tool that could be deployed to a local container. Although we did not intend to use the hackathon for beta testing, the RATOM team will be able to implement a number of changes based on participant feedback. General comments from participants indicated that existing documentation was, for the most part, easy to understand and that libratom was simple to download. Even so, we are working on a number of documentation changes after watching participants go through the process individually. We also encountered build issues related to compatibility with Xcode 12 on macOS; these have been resolved in a recent update to the codebase and are now tested automatically in our continuous integration workflow.

The RATOM team would like to note that it was especially rewarding to see so many participants representing state or government archives during the hackathon. Although some identified themselves as “non-technical” people, it was exciting to see communities that are not typically part of these conversations engaging with the software and with the broader discussions as they took place on Slack and Zoom.

You can learn more about the RATOM project here, or you can follow our work on GitHub here