Environmental Factor – May 2024: Data science event showcases new approaches to environmental health
Researchers and big data specialists from across the country, including NIEHS grantees, gathered at North Carolina State University (NC State) April 5 for the Data Science and Environmental Health Science Research Symposium. Sponsored in part by two NIEHS-funded centers, the Center for Environmental and Health Effects of PFAS and the Center for Human Health and the Environment, the event was designed to foster collaborative discussion and present cutting-edge work in the field of data science.
During the daylong gathering, hosted by NC State Professor Seth Kullman, Ph.D., attendees shared their research, participated in poster sessions, and networked with fellow scientists. Topics varied from addressing environmental health disparities through data mining to new methods in population genomics, which involves providing hazard and risk assessments without animal testing.
NIEHS Director Rick Woychik, Ph.D., kicked off the day by outlining the institute’s strategic priorities: the exposome, precision environmental health, mechanistic toxicology/biology, climate change and health, environmental justice, and — particularly relevant to the symposium’s theme — data science.
“There’s a lot of data management that’s going to have to happen if environmental health science is to move forward systematically,” Woychik said. “But how do we develop infrastructure that is robust, avoids duplication, and allows us to test hypotheses? That’s the problem for us to solve.”
Meeting the big data challenge
In her keynote address, Francesca Dominici, Ph.D., a professor of biostatistics at the Harvard T.H. Chan School of Public Health and NIEHS grantee, responded to this challenge. In particular, she described how her team was using innovative data science methods to change environmental policy.
Prior to February 2024, the annual air quality limit set by the Environmental Protection Agency (EPA) for fine particulate matter (PM) measuring less than 2.5 microns in diameter (PM2.5) was 12 micrograms per cubic meter. However, Dominici believed that even these levels of pollution could be harmful.
To test this theory, her team compared mortality rates in different ZIP codes across the U.S. to PM2.5 levels at each location. They found that every increase of 10 micrograms per cubic meter in PM2.5 was associated with an 11% increase in mortality, even at levels well below the limit of 12 micrograms per cubic meter.
“Based on our research, the EPA raised its standards for fine particulate matter to 9 micrograms per cubic meter,” said Dominici. “This is a big deal, not only because it means we are going to be breathing cleaner air, but also because it showed the impact data science can have on policy.”
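To give a sense of scale, the reported figure of an 11% increase in mortality per 10 micrograms per cubic meter can be applied to the 3-microgram gap between the old and new limits. The short Python sketch below is illustrative only and is not the method used by Dominici's team. It assumes a log-linear exposure-response relationship, so that the 11% figure compounds multiplicatively; the function and constant names are hypothetical.

```python
import math

# Figures quoted in the article.
PCT_INCREASE_PER_STEP = 11.0  # percent increase in mortality
STEP_UG_M3 = 10.0             # per 10 micrograms per cubic meter of PM2.5


def relative_mortality(delta_pm25: float) -> float:
    """Relative mortality for a change of delta_pm25 (ug/m^3) in PM2.5.

    ASSUMPTION: a log-linear exposure-response curve, so the quoted
    11% increase per 10 ug/m^3 scales multiplicatively with the change
    in concentration. The article does not state the model used.
    """
    # Log relative risk per 1 ug/m^3 implied by the quoted figure.
    beta = math.log(1.0 + PCT_INCREASE_PER_STEP / 100.0) / STEP_UG_M3
    return math.exp(beta * delta_pm25)


if __name__ == "__main__":
    # Moving from the old limit (12) to the new limit (9) is a change of -3.
    ratio = relative_mortality(9.0 - 12.0)
    print(f"Relative mortality at 9 vs. 12 ug/m^3: {ratio:.3f}")
```

Under these assumptions, mortality at 9 micrograms per cubic meter works out to roughly 3% lower than at 12. This is back-of-the-envelope arithmetic on the quoted figure, not a result reported at the symposium.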
Fostering collaborative data science
Following the keynote, the symposium split up into targeted sessions on various subjects. Though each speaker focused on a different topic, many of them highlighted the same underlying theme: the importance of collaboration and data sharing between institutions.
For instance, NIEHS scientist Alison Motsinger-Reif, Ph.D., called attention to her team’s efforts to provide access to data from the Personalized Environment and Genes Study (PEGS), which includes health information on more than 9,000 participants.
“We talk a lot about data sharing in our lab, so we decided to create an interactive web tool that can be used to engage directly with PEGS,” she said. “You can download the data, visualize it, and manipulate it. We’re thrilled to have this resource and fully committed to growing it in the future.”
For some participants, making environmental health data more widely accessible was not only a way of helping to promote collaboration, but also a means of producing better science — especially when dealing with complicated phenomena such as gene-environment interactions.
Chirag Patel, Ph.D., a professor of biomedical informatics at Harvard University, noted that data sharing is particularly crucial for large-scale research projects that seek to differentiate between possible causes of disease within a population.
“We’re at an incredible time in the field of environmental health sciences when we have all these different data sets coming together,” said Patel. “Because of this, we can integrate multidimensional data across a number of measures to try and figure out where the elusive risk is for highly complicated traits.”
(Ben Richardson, Ph.D., is a Presidential Management Fellow in the NIEHS Office of Communications and Public Liaison.)