Computing Community Consortium Blog

The goal of the Computing Community Consortium (CCC) is to catalyze the computing research community to debate longer range, more audacious research challenges; to build consensus around research visions; to evolve the most promising visions toward clearly defined initiatives; and to work with the funding organizations to move challenges and visions toward funding initiatives. The purpose of this blog is to provide a more immediate, online mechanism for dissemination of visioning concepts and community discussion/debate about them.


CCC Responds to RFI on NIH’s Strategic Plan for Data Science 2023-2028

March 14th, 2024 / in Announcements, CCC, health / by Haley Griffin

Today, CCC submitted a response to a Request for Information released by the National Institutes of Health (NIH) on their Strategic Plan for Data Science 2023-2028. The response was written by the following computing experts: Tony Capra (University of California, San Francisco), David Danks (University of California, San Diego, CCC Council Member), Haley Griffin (CCC), Carl Kingsford (Carnegie Mellon University), Rittika Shamsuddin (Oklahoma State University), Katie A. Siek (Indiana University, CCC Council Member), Mona Singh (Princeton University, CCC Council Member), Donna Slonim (Tufts University), and Tammy Toscos (Parkview Health, CRA-I Council Member).

The authors applauded NIH for an impressive list of aspirations in the Strategic Plan, but raised concerns about the training, expertise, data and additional funds needed to implement the plan. They also noted that more of the recommendations should be required rather than suggested.

They made the following recommendations for improving the Strategic Plan:

Additional details needed to enable implementation: 

  • Consider how to capture qualitative and media-rich data that can be used in future data science analysis.
  • Encourage the definition and maintenance of metadata that capture the context and history of data collected.
  • Include IT leaders from state and local departments of health when adopting health IT standards.
  • Support the design of strategies that address the social needs of individuals and communities, so that the data collected are representative, ethically sourced, and meaningfully impactful.
  • Define strategies to address miscommunication and lack of awareness among the general public about health data use for research, as transparency does not automatically lead to community understanding.
  • Require higher education institutions to document how they support interdisciplinary research.
  • Clearly define and support public-private partnerships to account for the real-world pressure on health systems.
  • Consider the issues and opportunities of synthetic data generated by AI/ML systems.
  • Include a plan for handling incorrect data once it has been integrated, and support AI/ML tools for identifying and correcting such errors.
  • Require institutions to have checks and balances to ensure people from historically excluded groups are provided with real research experiences and treated ethically.
  • Use mechanisms, documentation, and reporting as necessary to show how funded institutions have worked to decrease the need to teach diverse groups about “resilience.”
  • Include the closure of gaps in data of communities that do not have regular access to health care systems as a major goal or subgoal in the plan.
  • Consider opportunity gaps in data access between well-funded, established institutions and institutions with less funding and access, and account for these gaps in grant budgets so that funding is accessible to all health organizations.

Additional funds/resources to support implementation:

  • Increase initiatives aimed at supporting dual-appointment and interdisciplinary positions.
  • Support implementation science training, perhaps in the form of a call to adapt implementation science frameworks for the development of new software technologies.
  • Support access to compute resources such as GPUs, both by funding new hardware at diverse institutions and by providing access to shared cloud resources at rates that are affordable given current NIH grant budget levels.
  • Support, in study sections and review criteria, pure computational research that has applications to biological data, rather than only applied biomedical research.
  • Support standardized data formats that include requirements on data content (required fields, standardized terminology) so that the data is ready to be inserted into AI systems and analyzed.
  • Support summer research opportunities for MS students to help build the pipeline of future data science researchers.
  • Provide funding that allows mentors not only to mentor but also to keep their own research going, through low-overhead research funding proposals. Additionally, require institutions to document how research mentoring of historically excluded groups is valued in promotion and tenure decisions across service, teaching, and research.
  • Provide funding mechanisms that help trainees stay in the training pipeline.
  • Develop tools that make it easy for users to contribute to, access data within, and interpret information derived from these resources (such as the NIH's website), broadening access and making the data easier to leverage.

The authors also suggested the following partnerships that NIH could engage in:

  • Local nonprofits/community organizations to help NIH reach under-resourced communities, provide funding where it is needed most, and communicate with impacted populations. 
  • Federal institutions that support data and/or systems research, including FFRDCs that have a major emphasis on data science and data management (e.g., the Software Engineering Institute).
  • Public health experts, as it is essential to understand the public health network and how patient care fits into it. Public health professionals often lack the latest EHR systems, as well as the funding required to integrate with computing technologies.
  • Pharmaceutical companies. Although they are very unlikely to share their data, they use a great deal of public data and address public health needs, so working with them would be beneficial.
  • NSF (especially supercomputing centers), including NSF AI Institutes with a focus on biomedical challenges (e.g., AI-CARING) as well as divisions within the CISE directorate that focus on systems, programming languages, computational biology, and algorithms.
  • Department of Energy (DOE)
  • Military research systems
  • Veterans Affairs (VA) – The VA hospitals and associated care systems collect large amounts of patient data representing both common (e.g., cardiovascular) and unique (e.g., combat-related PTSD) health challenges. Partnering with them might provide unique data resources and highlight very different patient and provider perspectives.

At a high level, the authors emphasized that many biomedical research efforts require advances in fundamental computer science research, including in areas such as programming languages, algorithms, and systems. These areas also need to be supported at an unprecedented scale to meet the goals of this plan, especially to enable data interoperability, reproducible and distributed processing, low-latency data availability, and the compression, search, and storage of data.

Read CCC’s full response here.
