The Computing Community Consortium (CCC) recently released the Wide-Area Data Analytics workshop report. The workshop was organized by Rachit Agarwal (Cornell University) and CCC Council Member Jen Rexford (Princeton University) to identify challenges and opportunities in data analytics and related research given that modern datasets are often distributed across many locations. In some cases, datasets are naturally distributed because they are collected from multiple locations, such as sensors spread throughout a geographic region. In other cases, datasets are distributed across different data centers to improve scalability or reliability, or to reduce cost; these distributed locations could be a mix of public clouds, private data centers, and edge computing sites.
“The workshop identified several key use cases for wide-area data analytics, including:
- Video analytics for real-time city surveillance, airport security, traffic monitoring, sporting events, and more;
- Analysis of diverse sensor data for monitoring wildlife, agriculture, laboratory experiments, manufacturing, offshore drilling, hospitals, and more;
- Augmented reality and virtual reality;
- Autonomous vehicles such as self-driving cars and drones;
- Sharing of sensitive data (e.g., personal fitness data, medical test results, financial data, information about security breaches) with analytics services, including combining data from multiple users or organizations to help create better models;
- Offloading of traditional data-analysis applications to the public cloud;” (p. 1).
Additionally, “the workshop identified the following high-level ‘grand challenge’: Write a data-analysis question in a high-level language, without regard for where the data are collected, stored, or analyzed, and it ‘just works’ — all while grappling with the extreme heterogeneity under the hood. The results of the analysis should be explainable so the user understands the meaning of the results and the data used to compute them.” (p. 1). The report focuses on the research challenges and opportunities across distributed systems, databases, computer networking, and security and privacy. It also argues “that research that cuts across the traditional ‘layers’ of the stack, and is driven by compelling use cases and the capabilities of emerging devices, can lead to unprecedented progress toward this ambitious goal.” (p. 7).
To learn more about the research recommendations that emerged at the workshop, read the full report here.