## AI and Symbolic Regression

At Berlin's Data Science Retreat, I gave a talk that posed the following questions:

- What is technology?
- What is AI?
- WTF is genetic programming or symbolic regression? Why should I care?
- How *does* Google find furry robots?

Here's my take on each.

**Technology is** a tool for which other people have worked out enough theory and algorithms, with implementations so good, that the rest of us can just type `A\b`. I got this view from Stephen Boyd of Stanford. Your brain = tech. Your car = tech. This view has been incredibly helpful to me in building algorithms over the years. If you've got a pet algorithm where you're fiddling & twiddling parameters just to make it work... that's *not* tech. Set the bar so that when you ship the tech inside a user-facing tool, you don't have to hold your users' hands just to make it work.
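Boyd's classic example of a mature technology is least squares, which in MATLAB is literally the backslash operator `A\b`. As a rough Python analog (an assumption: NumPy standing in for MATLAB), the whole solve is one call to `numpy.linalg.lstsq`, with no parameters to twiddle beyond the `rcond` cutoff:

```python
import numpy as np

# An overdetermined system: 4 equations, 2 unknowns (slope and intercept).
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
b = np.array([1.0, 3.0, 5.0, 7.0])  # generated by b = 2*x + 1

# One call does the least-squares solve; rcond=None uses NumPy's default cutoff.
x, residuals, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
print(x)  # slope and intercept, approximately [2. 1.]
```

The point is the interface, not the math: the caller supplies `A` and `b` and gets an answer without babysitting the solver.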

**AI is** <definition keeps changing, and that's fine!> In the early days of AI, it was: a machine that can replicate human cognitive behavior, as embodied by the Turing test. Later it was: a machine that can perform a cognitive task that was previously possible only for a human, as embodied by Deep Blue for chess. Today there are several definitions. A modern pragmatic one is: a machine that can perform a non-analytical information processing task at a speed, accuracy, or capacity not possible for a human. I do appreciate the efforts of the Artificial General Intelligence folks, whose work goes back to the earliest goals of AI. I just don't see that as the *only* AI work.

There's another way to think about AI: the things in computer science that are still a mystery. There was a time when databases were sufficiently mysterious that they were considered AI. No more. Same for constraint logic programming, spreadsheets, density estimation; the list goes on and on. These days many machine learning tools, such as large-scale linear regression or SVMs, are sufficiently understood (and useful as technologies) that they no longer fall under the "mystery" umbrella. And, mysteries remain!

**Genetic Programming** (GP) is cool! It asks questions almost as general as AI's, but scopes them down enough that you can sink your teeth in and build real systems to answer them. At its core it pursues the long-standing aim of *automatic programming*: can I automatically generate a computer program that performs task x? GP has many great examples of automatic programming, from designing sorting networks to automatic bug detection & repair. And since computer programs can represent nearly anything, they can represent engineering structures, which is a very broad class of problems in itself. GP researchers have automatically designed space antennas, analog circuit topologies, robot morphologies, scientific formulae, quantum computing algorithms, and more.

GP's solver is an evolutionary algorithm, though of course other solvers work too: for example, Una-May O'Reilly used simulated annealing, and Kumara Sastry used an iterative Bayesian approach. Even random sampling works pretty well in some cases, as John Koza himself showed. The power of GP isn't the solver; rather, it's the questions that it has asked *and successfully answered* for the first time.
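To make "the power isn't the solver" concrete, here is a minimal sketch of GP-style program search where the solver is nothing but random sampling: programs are expression trees, and we keep the best random tree seen. The primitive set (`+`, `-`, `*` with terminals `x`, `1.0`, `2.0`), the 0.3 stop-growing probability, and the toy target are all illustrative assumptions, not Koza's actual setup.

```python
import math
import random

# Hypothetical primitive set for this sketch.
BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
TERMINALS = ["x", 1.0, 2.0]

def random_tree(rng, depth):
    """Grow a random expression tree: a terminal, or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(BINARY_OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Recursively evaluate a tree at input value x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return BINARY_OPS[op](evaluate(left, x), evaluate(right, x))

def mse(tree, xs, ys):
    """Mean squared error of a tree over the training points."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def to_string(tree):
    """Render a tree as an infix formula."""
    if not isinstance(tree, tuple):
        return str(tree)
    op, left, right = tree
    return f"({to_string(left)} {op} {to_string(right)})"

def random_search(xs, ys, n_samples=20000, max_depth=3, seed=0):
    """The 'solver' is pure random sampling: keep the lowest-error tree."""
    rng = random.Random(seed)
    best_tree, best_err = None, math.inf
    for _ in range(n_samples):
        tree = random_tree(rng, max_depth)
        err = mse(tree, xs, ys)
        if err < best_err:
            best_tree, best_err = tree, err
    return best_tree, best_err

# Toy target: y = x*x + x + 1 on a handful of points.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x * x + x + 1 for x in xs]
tree, err = random_search(xs, ys)
print(to_string(tree), err)
```

Swapping `random_search` for an evolutionary loop (selection, crossover, mutation on the same trees), simulated annealing, or a Bayesian model-building search changes only the solver; the representation and fitness function stay the same.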

**Symbolic regression** (SR) is the problem of inducing an equation that describes a relation among variables, given a set of training datapoints. Ideally the equation is simple enough for a human to interpret. That was one of the founding goals of machine learning, and of course it goes back further than that. It has had many labels over the years, from automatic function induction to nonlinear system identification (for the special case of dynamical systems). It's useful for everything from scientific discovery (e.g. discovering laws for pendulums with more than one joint, as Lipson and Schmidt did) to gaining insight into designing electric circuits (e.g. work I've done). It turns out that GP is pretty good at SR, and SR has probably become the biggest sub-branch of GP research. GP isn't the only way, however. For example, I found that generating a giant set of basis functions, then solving for all their coefficients at once with large-scale linear regression, also works. Not only does it work, it's orders of magnitude faster and more scalable (at a small cost in generality).
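The basis-function idea can be sketched in a few lines: enumerate candidate nonlinear terms, put each in a column, and let one linear solve pick the coefficients. This is a simplification, not the author's implementation: plain `numpy.linalg.lstsq` plus a coefficient threshold stands in for a regularized large-scale solver, and the menu of basis functions (`v`, `v^2`, `sqrt`, `log`, `1/v`, pairwise products) and the synthetic target are assumptions chosen for illustration. It assumes all inputs are positive so `sqrt`, `log`, and `1/v` are defined.

```python
import itertools
import numpy as np

# Hypothetical menu of univariate basis functions (assumes inputs > 0).
UNARY = {
    "{}": lambda v: v,
    "{}^2": lambda v: v ** 2,
    "sqrt({})": np.sqrt,
    "log({})": np.log,
    "1/{}": lambda v: 1.0 / v,
}

def build_bases(X):
    """Return (names, matrix) with one column per candidate basis function."""
    names, cols = ["1"], [np.ones(X.shape[0])]
    n_vars = X.shape[1]
    for j in range(n_vars):
        for template, fn in UNARY.items():
            names.append(template.format(f"x{j}"))
            cols.append(fn(X[:, j]))
    # Pairwise products of the raw inputs.
    for i, j in itertools.combinations(range(n_vars), 2):
        names.append(f"x{i}*x{j}")
        cols.append(X[:, i] * X[:, j])
    return names, np.column_stack(cols)

def fit_bases(X, y, tol=1e-6):
    """Solve for all basis coefficients at once, then drop negligible ones."""
    names, B = build_bases(X)
    coefs, *_ = np.linalg.lstsq(B, y, rcond=None)
    return {n: c for n, c in zip(names, coefs) if abs(c) > tol}

rng = np.random.default_rng(0)
X = rng.uniform(0.5, 3.0, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] ** 2 - 1.5 * np.log(X[:, 1]) + 0.5 * X[:, 0] * X[:, 1]
model = fit_bases(X, y)
for name, c in model.items():
    print(f"{c:+.3f} * {name}")
```

Because the target here lies exactly in the span of the candidate bases, the solve recovers its four terms; on real, noisy data a regularized solver is what keeps the resulting formula short.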

Finally... how does **Google find furry robots**? In other words, how does Google do image search? Today their approach is based more on deep learning. But back in the days of NIPS '09, they did it with large-scale linear classification. Image search was one of the drivers of large-scale linear modeling. It was a revelation to me that you could build a linear model with 10K or 1M input variables (or even 1T these days, thanks to Vowpal Wabbit) in a sane amount of time on a single processor, with far fewer training points than input variables. This last bit is thanks to regularization, which mathematically amounts to minimizing the volume of the confidence ellipsoid of future predictions; put simply, it minimizes how much you can screw up when predicting. The effect is that models use fewer input variables (as with L1 regularization) or have smaller slopes, i.e. smaller coefficients (as with L2 regularization). Large-scale linear modeling is ridiculously useful.
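Here is a small sketch of why "far fewer training points than input variables" is tractable. It uses L2 (ridge) regression rather than classification, an assumption made to keep the code short: minimizing ||Xw − y||² + λ||w||² has the solution w = Xᵀ(XXᵀ + λI)⁻¹y, which needs only an n-by-n solve, so 50 points and 10,000 variables is cheap. The sizes, λ values, and synthetic data are made up for illustration; L1 regularization, which zeroes out variables rather than shrinking them, would need an iterative solver and is not shown.

```python
import numpy as np

def ridge_dual(X, y, lam):
    """Ridge regression via the n-by-n dual system (cheap when n << p).

    Minimizes ||X w - y||^2 + lam * ||w||^2. The solution
    w = X^T (X X^T + lam I)^{-1} y only requires an n-by-n linear solve.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

rng = np.random.default_rng(0)
n, p = 50, 10_000                          # far fewer points than input variables
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]   # only a few variables matter
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Larger lam means smaller coefficients: the "smaller slope" effect.
for lam in [0.1, 10.0, 1000.0]:
    w = ridge_dual(X, y, lam)
    print(f"lam={lam:>7}: ||w|| = {np.linalg.norm(w):.3f}")
```

Without the λI term, XᵀX here would be a singular 10,000-by-10,000 matrix; regularization is what makes the problem well posed.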