Layered architectures involving lots of linearity, some smooth nonlinearities, and stochastic gradient descent seem to be able to memorize huge numbers of patterns while interpolating smoothly (not oscillating) "between" the patterns; moreover, there seems to be an ability to discard irrelevant details, particularly if aided by weight-sharing in domains like vision where it's appropriate. I do think that Bayesian nonparametrics has just as bright a future in statistics/ML as classical nonparametrics has had and continues to have. My colleague Yee Whye Teh and I are nearly done with writing just such an introduction; we hope to be able to distribute it this fall. Liberating oneself from that normalizing constant is a worthy thing to consider, and general CRMs do just that. Decision trees, nearest neighbor, logistic regression, kernels, PCA, canonical correlation, graphical models, K-means and discriminant analysis come to mind, and also many general methodological principles (e.g., method of moments, which is having a mini-renaissance, Bayesian inference methods of all kinds, M-estimation, bootstrap, cross-validation, EM, ROC, and of course stochastic gradient descent, whose pre-history goes back to the 50s and beyond), and many, many theoretical tools (large deviations, concentration inequalities, empirical processes, Bernstein-von Mises, U-statistics, etc.). For example, I've worked recently with Alex Bouchard-Cote on evolutionary trees, where the entities propagating along the edges of the tree are strings of varying length (due to deletions and insertions), and one wants to infer the tree and the strings. Note that many of the most widely used graphical models are chains---the HMM is an example, as is the CRF. In particular, I recommend A. Tsybakov's book "Introduction to Nonparametric Estimation" as a very readable source for the tools for obtaining lower bounds on estimators, and Y.
Nesterov's very readable "Introductory Lectures on Convex Optimization" as a way to start to understand lower bounds in optimization. When Leo Breiman developed random forests, was he being a statistician or a machine learner? Let's not impose artificial constraints based on cartoon models of topics in science that we don't yet understand. What are the most important high-level trends in machine learning research and industry applications these days? This seems like as good a place as any (apologies, though, for not responding directly to your question). This last point is worth elaborating---there's no reason that one can't allow the nodes in graphical models to represent random sets, or random combinatorial general structures, or general stochastic processes; factorizations can be just as useful in such settings as they are in the classical settings of random vectors. (Another example of an ML field that benefited from such interdisciplinary crossover would be Hybrid MCMC, which is grounded in dynamical systems theory.) One characteristic of your "extended family" of researchers has always been a knack for implementing complex models using real-world, non-trivial data sets such as Wikipedia or the New York Times archive. (6) How do I deal with non-stationarity? That's a useful way to capture some kinds of structure, but there are lots of other structural aspects of joint probability distributions that one might want to capture, and PGMs are not necessarily going to be helpful in general. I take the pizza and I tell them: "I've got a bad feeling about this." He was a professor at MIT from 1988 to 1998. With all due respect to neuroscience, one of the major scientific areas for the next several hundred years, I don't think that we're at the point where we understand very much at all about how thought arises in networks of neurons, and I still don't see neuroscience as a major generator for ideas on how to build inference and decision-making systems in detail.
Why do you believe nonparametric models haven't taken off as well as other work you and others have done in graphical models? Do you mind explaining the history behind how you learned about variational inference as a graduate student? Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley. The emergence of the "ML community" has (inter alia) helped to enlarge the scope of "applied statistical inference". Anything beyond CRFs? Lastly, I'm certainly a fan of coresets, matrix sketching, and random projections. If you got a billion dollars to spend on a huge research project that you get to lead, what would you like to do? Moreover, not only do I think that you should eventually read all of these books (or some similar list that reflects your own view of foundations), but I think that you should read all of them three times---the first time you barely understand, the second time you start to get it, and the third time it all seems obvious. Do you still think this is the best set of books, and would you add any new ones? I have a few questions on ML theory, nonparametrics, and the future of ML. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. Are the SVM and boosting machine learning while logistic regression is statistics, even though they're solving essentially the same optimization problems up to slightly different shapes in a loss function? Why does anyone think that these are meaningful distinctions? In addition, Jordan had completely cemented his position as best SG in history, a title that many had thought would be Drexler's just a few years earlier. These are a few examples of what I think is the major meta-trend, which is the merger of statistical thinking and computational thinking.
I've personally been doing exactly that at Berkeley, in the context of the "RAD Lab" from 2006 to 2011 and in the current context of the "AMP Lab". What current techniques do you think students should be learning now to prepare for future advancements in approximate inference? Note also that exponential families seemed to have been dead after Larry Brown's seminal monograph several decades ago, but they've continued to have multiple after-lives (see, e.g., my monograph with Martin Wainwright, where studying the conjugate duality of exponential families led to new vistas). I'd use the billion dollars to build a NASA-size program focusing on natural language processing (NLP), in all of its glory (semantics, pragmatics, etc). On ML theory, nonparametrics, and the future of ML. I'm not sure that I'd view them as "less data-hungry methods", though; essentially they provide a scalability knob that allows systems to take in more data while still retaining control over time and accuracy. Prof. Jordan is a member of the National Academy of Sciences, a member of the National Academy of Engineering and a member of the American Academy of Arts and Sciences. He is a Fellow of the American Association for the Advancement of Science. 