Computing Community Consortium Blog

The goal of the Computing Community Consortium (CCC) is to catalyze the computing research community to debate longer range, more audacious research challenges; to build consensus around research visions; to evolve the most promising visions toward clearly defined initiatives; and to work with the funding organizations to move challenges and visions toward funding initiatives. The purpose of this blog is to provide a more immediate, online mechanism for dissemination of visioning concepts and community discussion/debate about them.


NSTC Releases Report on Lessons Learned from Federal Use of Cloud Computing to Support AI Research and Development

July 18th, 2022 / in Announcements / by Maddy Hunter

Artificial Intelligence (AI) and Machine Learning (ML) have made huge strides in the past decade. A large part of this progress can be attributed to the availability of large datasets and computing resources. Recently, many federal agencies have started investing in commercial cloud computing resources to advance AI/ML research and development (R&D). The White House’s National Science and Technology Council (NSTC) and AI Subcommittee just released a report, Lessons Learned from Federal Use of Cloud Computing to Support AI Research and Development, summarizing lessons learned from Federal agencies on the use of cloud computing to further AI R&D.

The report follows an earlier report from the Federal Government’s Select Committee on AI, Recommendations for Leveraging Cloud Computing Resources for Federally Funded Artificial Intelligence Research and Development, which details recommendations for the Federal Government to advance the use of cloud computing to support AI innovation. Acting on these recommendations, the Machine Learning and AI (MLAI) Subcommittee facilitated a series of dialogues among agency representatives and commercial cloud computing providers to help identify challenges and best practices in cloud computing and R&D. The new “Lessons Learned” report summarizes key findings from these dialogues, grouped into benefits of investment, best practices, common challenges, and opportunities looking forward. The full report from the dialogue can be found here.

Benefits of Investment 

  • Providing researchers with persistent, on-demand access to cutting-edge capabilities, accelerating experimentation and the use of AI in new domains
  • Enabling reproducibility and scalability of research activities and their results
  • Helping researchers quickly gain access to specialized AI hardware
  • Providing agencies with access to the most up-to-date computational capabilities

Best Practices 

  • Dedicated administration teams. Building this capability has provided agencies with the necessary expertise and authority to manage and oversee access to cloud computing resources, services, and platforms. Such teams have also provided training to the user community and vetted appropriateness of requested resources for achieving specific research goals.
  • User authentication. Most of the programs have restricted access to known, qualified, and credentialed users. Many also require two-factor authentication as a component of their security measures. Together, these measures provide a baseline level of security and an ability to create user-based access controls.
  • Training and education. Training assistance and educational opportunities have been critically important for addressing existing skills gaps, advancing equitable access opportunities, and building expertise among the user base. Making these resources available has helped the supported researchers navigate the various cloud computing resource offerings and match specific research needs to the right compute architectures and software tools.
  • Pre-computed resources and workflows. Particularly when supporting internal or mission-focused research efforts, pre-computed workflows have reduced duplicative work and created accessible baseline approaches for common starting points for analyses.

Common Challenges

  • Efficient user authorization. Authenticating users can create bottlenecks related to verifying identities and provisioning sign-on capabilities. The underfunding or understaffing of governing organizations can lead to delays in account activations and resolution of issues that arise at every level of access. Furthermore, a lack of authoritative agency and government-wide guidance on approved services, which includes variable data privacy and access considerations, slows adoption and creates variation across agency policies and procedures.
  • Costs. The costs of data storage and access complicate the ability of multiple teams to access shared data. In addition, billing and budgets are further complicated by the variability of cloud computing costs per project and the ease with which researchers can inadvertently exhaust credits through the use of incorrect settings. Furthermore, variable charges add complications for Federal procurement processes, as does uncertainty around which appropriations categories can be used to purchase which compute capabilities.
  • Organization. It is difficult to ensure that the users of a given cloud computing platform can locate and maintain awareness of data, experiments, and results relevant to their work and interests.
  • Privacy and Security. Agencies must determine ways to host and facilitate access to the right kinds of data with appropriate privacy and security safeguards, subject to budget considerations, changing research priorities, and the evolving user community being served.
  • Integration of cloud services with non-cloud resources. Such integration presents challenges in enabling researchers to effectively access the full breadth of agency resources.
  • Workforce development. Many Federal employees have limited familiarity with cloud computing technologies, and few have industry certification on cloud computing systems. These limitations challenge both internal research efforts and the ability to provide guidance and resources to external researchers.

Opportunities Looking Forward

To address financial models, the Federal Government could do the following:

  • Take better advantage of the purchasing power reflected in the consolidated Federal investments in commercial cloud computing platforms. This action would facilitate access to the most advanced capabilities of the cloud and provide a means to speak with a common voice on the expectations and needs of the federally funded AI research community.
  • Create explainable models with corresponding costs to better manage budget uncertainty, as these would illustrate for researchers and program managers the cost dynamics associated with cloud computing, particularly in terms of decisions related to the training parameters and processes.
  • Capture and share best practices from agency cloud programs regarding the contractual agreements and strategies to manage overspending.

To move toward the envisioned seamless, multi-cloud environment, agencies could do the following:

  • Leverage and help cultivate open-source technologies that can support standard ways of building and executing workloads for multi-cloud deployment (e.g., containerization and automation).
  • Facilitate and automate identity and access management through federated systems that bring together the research community inside and outside government.
  • Conduct an evaluation to assess the feasibility of developing a federated data mesh to reduce data movement and replication.

Agencies could be further assisted in their adoption of commercial cloud computing resources through the following:

  • Creation of a portal kit that would outline a standard template and put forward best practices to implement portals at various levels, depending on the organizational needs.
  • Provision of a guide for resource selection that would help agencies determine the circumstances under which different types of resources are best suited, such as the choice of cloud versus high-performance computing and commercial offerings versus on-premise machines.
  • Provision of a guide on approved policies, procedures, resources, and services for commercial cloud offerings, to the extent practicable, by leveraging the purchasing power described above.

Finally, addressing workforce development needs will require the following:

  • Investments in training resources that can serve the entire range of end users, researchers, and technical staff, differentiated for their skill levels, needs, and interests.
  • Recruitment and retention strategies that include high-demand skill sets that support cloud computing, such as cloud architects, research computing and data professionals, research software engineers, and data scientists.

Read the full report here.

 

 
