Data sets are growing rapidly. Yahoo, Google, and Amazon work with data sets that consist of billions of items. The size and scale of data, which can be overwhelming today, will only increase as the Internet of Things matures. Data sets are also increasingly complex. It is becoming more important to increase the pool of qualified scientists and engineers who can extract value from big data.
The National Academies released a report on training students to extract value from big data, based on a Committee on Applied and Theoretical Statistics (CATS) workshop held in April 2014.
From the report:
Training students to be capable in exploiting big data requires experience with statistical analysis, machine learning, and computational infrastructure that permits the real problems associated with massive data to be revealed and, ultimately, addressed. Analysis of big data requires cross-disciplinary skills, including the ability to make modeling decisions while balancing trade-offs between optimization and approximation, all while being attentive to useful metrics and system robustness. To develop those skills in students, it is important to identify whom to teach, that is, the educational background, experience, and characteristics of a prospective data science student; what to teach, that is, the technical and practical content that should be taught to the student; and how to teach, that is, the structure and organization of a data science program.
The full report is available from The National Academies.