COVID-19 has undeniably overtaken the world in the past year, but infectious disease specialists saw it coming. In the early months of 2020, a team of data scientists and disease specialists began compiling a list of every reported case worldwide.
Last year, the team maintained a Google Sheet that grew to roughly 80,000 COVID-19 case records from around the world. As of February, that spreadsheet has become a fully functional, free online database that consolidates information on global disease outbreaks under the name ‘Global.health’.
The database compiles all coronavirus cases on record since the initial outbreak in January 2020. Global.health’s team hopes that making the information completely open to the public will let anyone use the data to develop new disease response research and initiatives. The website features an interactive map and a data dictionary, as well as a comprehensive list of cases, their outcomes and anonymized patient data.
Sam Scarpino, an assistant professor of marine and environmental sciences at Northeastern, was one of the co-founders of this data initiative. Scarpino, whose doctoral work involved infectious disease modeling and public health decision-making, helped plan Northeastern’s reopening last fall.
“At the end of last January, there was a group of researchers who were just manually entering COVID-19 case records as they got reported. So, there would be a news alert that somebody had tested positive in Japan, and we would capture that information on a Google spreadsheet. By about this time last year, we were running up against the limit of the size of a Google spreadsheet, which, in our case, is about 80,000 [datapoints],” Scarpino said.
Once Global.health moved beyond a single spreadsheet, Scarpino said, the project operated like an early- to mid-stage technology startup, with many different people gathering information about how the public would interact with the published data.
“That information then fed into the design descriptions of the software tool that we were going to build, which then feeds into the engineering teams who actually figure out how to build it and then make it,” Scarpino said.
The platform is free for anyone to access, and Scarpino said its data will remain useful even after the pandemic.
“[Monitoring for flare-ups] is a lot harder to do from a surveillance perspective because, right now, if somebody has a respiratory infection, chances are still pretty good that it’s COVID,” Scarpino said. “But hopefully coming into the fall, things will be a bit more normal, and, if somebody has an upper respiratory infection, it could be one of many different things.”
Scarpino began as a volunteer before eventually becoming one of the official co-founders. He worked alongside product designers, researchers and data analysts, as well as 15 Google employees dedicated to building the software. Throughout Global.health’s development, Scarpino worked closely with project lead Joe Brilliant.
Brilliant, who had worked with Scarpino on earlier start-up projects, brought him onto the project.
“I learned a lot from him, and I think he’s only grown. His skillset, like a lot of epidemiologists and people in his field, became really critical in the pandemic just to help decision-makers understand what was going on,” Brilliant said. “What he’s so great at is communicating, [he was] working with reporters, through his own channels on Twitter and other partnerships to help contextualize it and make sense of a very complex, challenging, fast moving, ever-changing problem.”
Robel Kassa, whose background is in computer science, also worked closely with Scarpino, both on Global.health and on earlier projects. Kassa worked on data visualization for Global.health during development but has since moved on to other projects. This year, he has also done contract work on Northeastern’s COVID-19 reopening plan.
“Sam sort of has been my gateway into all these amazing projects that I love, [and] that I’ve been lucky enough to work on,” Kassa said.
As a software engineer for Global.health, Kassa worked on many different parts of the project.
“My main focus was building out the online presence or the marketing website. So, when you go to Global.health, what you see there in terms of presentation and functionality, that was something I collaborated on with the design team at Google Earth,” Kassa said. “When the marketing website and all my personal stuff was sort of getting tied up, I moved on to supporting the map visualization part of Global.health.”
Kassa said the sheer scale of the data was hard to grasp, calling the initiative “overwhelming at first.”
“I couldn’t even get my head around the fact that there’s a chance that there’s going to be more than 20 million rows of individual lined data,” he said. “That was an incredible amount of data, and it took a lot of work. It was definitely designed to challenge [us], in terms of what’s the best way that all of this [information] can function in your browser and not kill you.”
Now that Global.health has launched, Scarpino, Brilliant, Kassa and their team are planning the project’s next steps.
“The surveillance systems that we’re building, [and] the data we’re capturing are going to become increasingly important as we go back down toward lower levels of COVID-19 transmission, and we need really fine grained detail and carefully curated data to help guide the response and avoid additional lockdowns,” Scarpino said. “We’re also interested in Global.health software tools being used by ministries of health globally as a part of a rapid outbreak response.”