The Intelligence Advanced Research Projects Activity (IARPA) issued a request for information (RFI) this month, seeking input on “a possible future IARPA investment (such as a program or grand challenge)” in automatic machine learning:
Machine learning (ML) is used extensively in application areas of interest to IARPA including speech, language, vision, sensor processing, and multi-modal integration. Typically, expert practitioners in ML select appropriate architectures and algorithms for the application domain, performance requirements, and data characteristics of the problem at hand. Additionally, they engineer an appropriate set of features to be extracted from the data for use in the system design. Then, depending on the problem, data may be selected for training and scheduled for presentation to the system according to the requirements of the task. In some application areas, the data needed for training are extremely sparse, consisting of only a few instances, and important information may be missing, requiring the application of supplementary information and real-world knowledge for intelligent inference. In many other application areas, the amount of data to be analyzed has been increasing exponentially (sensors, audio and video, social network data, web information) stressing even the most efficient procedures and most powerful processors. Most of these data are unorganized and unlabeled and human effort is needed for annotation and to focus attention on those data that are significant.
The focus of this RFI is on recent advances toward automatic machine learning, including automation of architecture and algorithm selection and combination, feature engineering, and training data scheduling for usability by non-experts, as well as scalability for handling large volumes of data. Useful automatic machine learning systems will require significant innovations in the science and technology of machine learning, possibly including (but not limited to) hierarchical architectures like Deep Belief Nets and hierarchical clustering, methods for parallelization of computation, attentional mechanisms for focusing on data of significance, methods for transfer of previously learned knowledge to a new task, methods for incorporation of real-world knowledge to include human advisors and one-shot learning methods, methods to include different temporal scales and the effects of causality, the role of goals and environmental feedback in learning, and model selection from approaches like meta-learning.
In particular, IARPA is interested in having the following questions addressed:
1. What are your proposed methods for (a) automation of architecture and algorithm selection and combination, (b) feature engineering, and (c) training data scheduling? How will these automation methods affect the usability of an analytic system by non-experts?
2. What are the compelling reasons to use your proposed approach in a scalable multi-modal analytic system?
3. How will your approach handle different time scales, missing data, and sparse data?
4. How will your approach be applied to diverse data, such as speech, language, vision, sensor processing, and multi-modal integration?
5. How will you supplement training data with real-world and previously learned knowledge?
6. What is known about your proposed approach? Please provide suitable references.
7. What are the appropriate metrics to measure performance?
8. What other solutions are being suggested to overcome the challenges in this RFI?
9. What is the timescale needed to demonstrate progress?
10. What are the data sets and other resources needed?
11. Are supporting technologies readily available or does new technology need to be created?
IARPA seeks responses of up to five pages by 4pm EST on Jan. 27, 2012 via electronic submission to dni-iarpa-rfi-12-01@ugov.gov. According to the RFI, “teams with complementary areas of expertise are encouraged.”
For more details, check out the RFI in its entirety.
(Contributed by Erwin Gianchandani, CCC Director)