Library hosts data archiving event


Hannah Bernstein

After the presidential election, a fellow in the University of Pennsylvania’s (UPenn) Environmental Humanities program received an e-mail. It was from an urban planner, concerned that federal data they used for their work would become unavailable under the new administration.

That e-mail spurred the Environmental Humanities program into action, said Margaret Janz, a data curation librarian at UPenn. They created a project, DataRefuge, which aims to preserve and archive federal data and websites.

DataRefuge has spread across the country and made its way to Northeastern on Friday, when Snell Library hosted a data rescue event on campus. The event focused on archiving, or “rescuing,” federal websites that could be at risk under the new administration, such as those with climate and environmental information.

Jen Ferguson, the Northeastern research data management librarian, said librarians are realizing that while data and information can still be found in books and journal articles, online data is vulnerable under a new administration that insists on denying climate change.

“We know that our faculty and students and staff need access to these data sets to do their work,” Ferguson said. “We know that our concerned citizens need it as well to be the best informed people they can be and participate in their democracy.”

Sara Wylie, assistant professor of sociology, anthropology and health sciences, said the event featured three tracks: seeding and harvesting, archiving websites and storytelling.

Wylie explained that seeding and harvesting, a less technical track, involves downloading or preserving the information on the websites. Archiving tackles websites that cannot be easily downloaded—these include data like interactive maps and search databases that are harder to understand.

Storytelling, Wylie said, can involve many different pieces depending on the event. At Northeastern, the storytelling track featured sign-making for science marches, web design and data visualization projects.

Wylie is also a co-founding member of the Environmental Data Governance Initiative (EDGI), a project to preserve and monitor federal data. She said EDGI seeks to use technology to protect free information.

“How can we, the public, take on some of the jobs of archiving this work and making sure it remains accessible and in the public domain?” Wylie said. “Then, pairing that work with monitoring, how can we track what’s happening to federal websites? How can we track what’s happening to federal agencies better, using our digital tools?”

Freshman Meghan McCallister, an environmental science and political science combined major, participated in the seeding portion of the event, archiving pages about fire on the U.S. Fish and Wildlife Service website. She said despite her lack of technical skills, she wanted to help.

“We’ve talked a lot about how things like the EPA [Environmental Protection Agency] are getting cut in my classes,” McCallister said. “I really don’t know much about computers and this seemed like an accessible way to learn more.”

Looking forward, McCallister said she wants to learn Python, a computer coding language often used at data rescue events, in order to get more involved. She also hopes to continue learning about climate change science and policy on dialogue in Indonesia and Singapore this summer.

“A lot of the information we’re going to be dealing with [on dialogue] is looking at big data sets and learning programming like Python,” McCallister said. “This has showed me a real application of those skills.”

Janz said archiving federal data matters regardless of the administration.

“While it was in reaction to the election, the election really just opened our eyes,” Janz said. “Having this data only on federal servers is not a good idea. It should have been backed up by institutions off of federal servers for a long time.”