AI and Symbolic Regression

At Berlin's Data Science Retreat, I gave a talk that posed the following questions:

  • What is technology? 
  • What is AI?
  • WTF is genetic programming or symbolic regression? Why should I care?
  • How does Google find furry robots?

Here's my take on each.

Technology is a tool where other people know enough theory and algorithms, with implementations so good that the rest of us can just type 'A/B'. I got this view from Steve Boyd of Stanford. Your brain = tech. Your car = tech.  I've found it incredibly helpful in building algorithms over the years. If you've got a pet algorithm where you're fiddling & twiddling parameters just to make it work... that's not tech. Set the bar such that when you ship the tech inside a user-facing tool, you're not holding your users' hands just to make it work.

AI is <definition keeps changing, and that's fine!> In the early days of AI it was: a machine that can replicate human cognitive behavior, as embodied by the Turing test. More recently, it was: a machine that can perform a cognitive task that was previously only possible for a human, as embodied by Deep Blue for chess. Most recently, there are several definitions. A modern pragmatic one is: a machine that can perform a non-analytical information processing task, at a speed / accuracy / capacity not possible for a human. I do appreciate the efforts of the Artificial General Intelligence folks, whose work goes back to the earliest goals of AI. I just don't see that as the only AI work.

There's another way to think about AI: the things in computer science that are still a mystery. There was a time when databases were sufficiently mysterious that they were considered AI. No more. Same for constraint logic programming, spreadsheets, density estimation, the list goes on and on. These days many machine learning tools, such as large-scale linear regression or SVMs, are sufficiently understood (and useful as technologies) that they no longer fall under the "mystery" umbrella. And, mysteries remain!

Genetic Programming is cool! It asks questions almost as general as AI, but scopes them down enough to sink your teeth into and make real systems to answer. At its core it's going for the long-standing aim of automatic programming, i.e. can I automatically generate a computer program that performs task x? GP has many great examples of automatic programming for everything from designing sorting networks to automatic bug detection & repair. And since computer programs can represent nearly anything, they can represent engineering structures, which is a very broad class of problems in itself. GP researchers have auto-designed space antennae, analog circuit topologies, robot morphologies, scientific formulae, quantum computing algorithms, and more.

GP's solver is an evolutionary algorithm, though of course other solvers work too; for example Una-May O'Reilly used Simulated Annealing, and Kumara Sastry used an iterative Bayesian approach. Even random sampling works pretty well in some cases, as John Koza himself showed. The power of GP isn't the solver; rather, it's the questions that it has asked and successfully answered for the first time.

Symbolic regression is the problem of inducing an equation describing a relation among variables, given a set of training datapoints. Ideally the equation is simple enough to be interpreted by a human. That goal was one of the originating goals in the field of machine learning, and of course goes back further than that. It's had many labels over the years, from automatic function induction to nonlinear system identification (for the special case of dynamical systems). It's useful for everything from scientific discovery (e.g. discovering laws for pendulums with >1 joints, as Lipson and Schmidt did) to gaining insight into designing electric circuits (e.g. work I've done). It turns out that GP is pretty good at SR, and SR has become probably the biggest sub-branch of GP research. GP isn't the only way, however. For example, I found that generating a giant set of basis functions, then solving for all of their coefficients at once with large-scale linear regression, also works. It not only works, it's orders of magnitude faster and more scalable (with a small hit to generality).
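To make the "giant set of basis functions plus linear regression" idea concrete, here is a minimal sketch. It is not the actual method from my work: the basis library (`build_basis`), the helper `fit_symbolic`, and the toy target are all made up for illustration, and plain least squares followed by thresholding stands in for the large-scale regularized regression you would want at real scale. The data is noise-free to keep the example clean.

```python
import numpy as np

def build_basis(X):
    """Expand raw inputs into a (hypothetical) library of candidate basis functions."""
    names, cols = ["1"], [np.ones(len(X))]
    n_vars = X.shape[1]
    for i in range(n_vars):
        xi = X[:, i]
        names += [f"x{i}", f"x{i}^2", f"abs(x{i})"]
        cols += [xi, xi**2, np.abs(xi)]
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            names.append(f"x{i}*x{j}")
            cols.append(X[:, i] * X[:, j])
    return names, np.column_stack(cols)

def fit_symbolic(X, y, tol=1e-6):
    """Solve for all basis coefficients at once, then keep the non-negligible terms."""
    names, B = build_basis(X)
    coefs, *_ = np.linalg.lstsq(B, y, rcond=None)
    return {name: float(c) for name, c in zip(names, coefs) if abs(c) > tol}

# Toy data generated from a known equation: y = 2*x0*x1 + 0.5*x1^2 (no noise).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 2.0 * X[:, 0] * X[:, 1] + 0.5 * X[:, 1] ** 2

model = fit_symbolic(X, y)
print(" + ".join(f"{c:.3g}*{name}" for name, c in model.items()))
```

The result is a human-readable equation made of the surviving basis functions. The "small hit to generality" shows up here too: the fit can only ever find terms that are already in the library.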

Finally... how does Google find furry robots? In other words, how does Google do image search? How they do it now is more Deep Learning based. But back in the days of NIPS '09 they did it by large scale linear classification. Image search was one of the drivers of large scale linear modeling. It was a revelation to me that you could build a linear model with 10K or 1M input variables (or even 1T these days - thanks VW), in a sane amount of time on a single processor, with # training points << # input variables. This last bit is thanks to regularization, which mathematically amounts to minimizing the volume of the confidence ellipsoid of future predictions, or, put simply, minimizing how much you can screw up when predicting. The effect is that models use fewer input variables or have smaller slopes. Large scale linear modeling is ridiculously useful.
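Here is a small sketch of why regularization makes "# training points << # input variables" workable. It uses ridge (L2) regression, which is the "smaller slope" flavor; an L1 penalty (not shown) is the flavor that drops input variables. The function name `ridge_fit`, the problem sizes, and the synthetic data are assumptions for illustration only. The trick is the dual form of the ridge solution, which only needs an n-by-n solve even when the number of inputs p is huge.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge weights via the dual form: w = X^T (X X^T + lam*I)^-1 y."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

# Synthetic problem with far fewer training points than input variables.
rng = np.random.default_rng(0)
n, p = 50, 5000
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:5] = 1.0                      # only five inputs actually matter
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Stronger regularization shrinks the weight vector (smaller slopes).
lambdas = (0.1, 10.0, 1000.0)
norms = [float(np.linalg.norm(ridge_fit(X, y, lam))) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:g}  ||w||={norm:.4f}")
```

Without the penalty term the problem is underdetermined: with 50 equations and 5000 unknowns there are infinitely many weight vectors that fit the training data exactly, and regularization is what picks a sane one.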


Could Bitcoin save Moore's Law?

I originally posted the following on my Google+ account in Jan 2014. My semiconductor friends thought it was a little off the wall. But I posted it anyway. 

Moore's Law is at risk, not because of the physics but because of the economics, as discussed on e.g. SemiWiki.

Bitcoin mining is taking off. In 2013, TSMC, AMD and others saw $200M in sales for bitcoin-related parts (from basically zero a year or two before). Link.

The Bitcoin network difficulty is growing exponentially, and steeply: in the last several months, it's been doubling every month. Link.

There are many arguments why Bitcoin could become the underpinning of the whole global economy, or even just the internet economy. For example, ask Marc Andreessen [link]. Of course, it might not. But it might!

Given these points, Bitcoin might just become the biggest driver of semi revenue. And since Moore's Law is at the most risk for economic reasons, Bitcoin might just be the new driver for Moore's Law... 

What's cool is that it really could be happening: bitcoin mining company KnCMiner is one of the very first companies to tape out at TSMC's 16nm process node. Here. (TSMC manufactures about half the silicon in the world, and 16nm is their newest, smallest node.) This upstart bitcoin (!) company beat Apple, Qualcomm, Nvidia and almost everybody else to the punch.

What gives? If you think about it, it's fully rational. Since they're building money-printing machines (more BTC every 10 minutes, when they win the lottery), they can calculate precisely how much money they expect to make based on how many Ghashes they can run. Maximize the hash rate, minimize the power costs, and the difference is profit. Marcus Erlandsson, the CTO of KnCMiner, confirmed this when I chatted with him recently. Cool!
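The arithmetic behind "the difference is profit" can be sketched in a few lines. This is a back-of-envelope model, not KnCMiner's actual calculation: the function name and every number in the usage example are hypothetical, and it ignores difficulty growth, transaction fees, hardware cost, and pool fees. It assumes one block roughly every 10 minutes (about 144 per day) and that a miner's expected share of blocks equals its share of the network hash rate.

```python
def expected_daily_profit_usd(miner_hashrate, network_hashrate, block_reward_btc,
                              btc_price_usd, power_kw, usd_per_kwh,
                              blocks_per_day=144):
    """Expected mining revenue minus electricity cost, per day, in USD."""
    share = miner_hashrate / network_hashrate          # chance of winning each block
    revenue = share * blocks_per_day * block_reward_btc * btc_price_usd
    power_cost = power_kw * 24 * usd_per_kwh           # kW * hours/day * price per kWh
    return revenue - power_cost

# Hypothetical numbers, chosen only to make the arithmetic easy to follow:
# 1% of the network hash rate, 25 BTC per block, $100 per BTC, 10 kW at $0.10/kWh.
profit = expected_daily_profit_usd(
    miner_hashrate=1.0, network_hashrate=100.0,
    block_reward_btc=25.0, btc_price_usd=100.0,
    power_kw=10.0, usd_per_kwh=0.10,
)
print(f"expected profit per day: ${profit:,.2f}")
```

Only the ratio of hash rates matters, so the units (GH/s, TH/s) cancel as long as both use the same one. The formula also shows why a smaller process node pays off directly: more hashes per watt raises `share` and lowers `power_kw` at the same time.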


Active Analytics, and Auto vs. Manual Design

In 2014, "Predictive Analytics" hit the mainstream. Many people got very excited about the idea that you could take a pinch of "big data" or "data mining", add in a dash of "visualization", and get "business value". I agree. I only use the air quotes because it was framed as something novel. But this stuff has been going on for decades (though to be fair, for much of that time it was with smaller datasets). For example, go to the appendix of Friedman's famous 1991 MARS paper and you'll find data mining + visualization for new insights. And then there's statistics + Tufte-style visualization. Then you have the likes of Spotify and Tableau. We'd been doing this sort of thing at Solido since 2004, and ADA before that, to help designers get insight into designing computer chips. My PhD included "knowledge extraction." It's great to see that this tech is starting to hit the mainstream - it's incredibly useful.

What's cool is that there is state of the art beyond predictive analytics. It's basically about closing the loop, rather than working with a static dataset. Get some data, do some analysis, but then (auto) find new data and repeat. The "find new data" part can be active, i.e. you can choose which sample to take next. You could also think of it as classic optimization, but with a visual element. I call it "Active Predictive Analytics", or "Active Analytics" for short. We've been doing this with a new tool at Solido, and designers really like it as a new style of design tool. It turns out to address auto vs. manual design too.
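The closed loop (get data, analyze, choose the next sample, repeat) can be sketched as below. This is a toy under stated assumptions, not how the Solido tool works: `expensive_measurement` is a hypothetical stand-in for a slow simulation, the "analysis" is just a low-order polynomial fit, and the active step is pure space-filling (take the candidate farthest from everything sampled so far). A real tool would presumably let the model, or the user, steer that choice.

```python
import numpy as np

def expensive_measurement(x):
    """Hypothetical stand-in for a slow simulation or experiment."""
    return np.sin(3.0 * x) + 0.5 * x

def choose_next(candidates, sampled_x):
    """Active step: pick the candidate farthest from every point sampled so far."""
    gaps = np.abs(candidates[:, None] - np.asarray(sampled_x)[None, :]).min(axis=1)
    return float(candidates[np.argmax(gaps)])

candidates = np.linspace(0.0, 2.0, 201)
xs = [0.0, 2.0]                                   # initial static dataset
ys = [float(expensive_measurement(x)) for x in xs]

for _ in range(8):
    x_new = choose_next(candidates, xs)           # find new data
    xs.append(x_new)
    ys.append(float(expensive_measurement(x_new)))
    degree = min(3, len(xs) - 1)
    model = np.polyfit(xs, ys, deg=degree)        # redo the analysis on all data so far

print(f"{len(xs)} samples taken; latest fit has degree {len(model) - 1}")
```

The visual element would sit between the two steps of the loop: plot the current model, and let a person override `choose_next` whenever they see a region worth exploring.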

There's been a long running debate on whether automatic or manual design is better, and both sides have had really great arguments. But what if you can get the best of both worlds, if you can reconcile manual vs. automatic design? That's what the tool turns out to do: if you want to design fully manually, i.e. you pull the design, you can. If you want fully automatic, i.e. the tool pushes the design, you can. But the cool thing is that it allows the shades of gray in between: it gives insight into which designs and design regions might be good, and you can easily pull the design with a visual editor. Call it supercharged manual design, if you will. I'm quite excited about this because it has applications far beyond circuits, for everything from deep learning to business intelligence to website optimization (evolution from A/B testing to multi-armed bandit to this).

I gave an invited talk on this at the Berlin Machine Learning group in May 2014. Slides are here.


The Ultimate Bootstrap: AI & Moore's Law

People talk about a Moore's Law for gene sequencing, a Moore's Law for software, etc. But what about the Moore's Law? Transistors keep getting exponentially smaller. It's the bull that the other "Laws" ride, a "Silicon Midas Touch": once a technology gets touched by the silicon Moore's Law, that technology goes exponential. Moore's Law is a technology backbone that is driving humanity. I love that! It's a driving reason why I've spent 15+ years of my life in semiconductors, to help drive Moore's Law. I've co-created software enabling chip design on bleeding-edge process nodes. 

What's cool: it's AI-based software, which runs on the most advanced microprocessors. To design the next generation of microprocessors. For that smartphone in your pocket, for the servers powering Google, and for the companies designing the next gen of chips. Put another way: the computation drives new chip designs, and those new chip designs are used for new computations, ... ad infinitum. It's the ultimate bootstrap of silicon brains. The only thing it's clocked by is manufacturing speed.

I've given a couple talks about this. Here's one from 2013 I gave at a singularity meetup. And here's one I gave as an invited talk to the PyData Berlin conference (and the video too).


Predicting Black Swans for Fun and Profit

I've always been a big fan of Nassim Nicholas Taleb's writing, though not always of his conclusions. In "The Black Swan: The Impact of the Highly Improbable" he describes "black swan" events, which have extremely low probability but huge impact when they do happen. Partway through, he makes an assumption that they're so hard to predict that you should just not bother, and instead protect yourself against the downside (if a negative event) or make sure you're exposed to the upside (if a positive event). I disagree: just because something's hard doesn't mean it's impossible. It's just a challenge! And it's worth going for if the upside to prediction is high.

Case in point: designing memory chips where the chance of failure is 1 in a billion or so. The Sonys and TSMCs of the world have huge motivation to estimate that value quite precisely. What's cool: they can now estimate these "black swans" with good confidence (using tech I helped develop), and they're very happy about it. It was hard, but not impossible!
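To give a feel for why 1-in-a-billion estimates are hard but not impossible, here is a textbook sketch using importance sampling. It is not the tech described above, just a standard illustration: the "failure" is a standard normal variable exceeding 6 sigma, which has a true probability of about 9.87e-10. Naive Monte Carlo would need billions of samples to see even one failure. Importance sampling instead draws samples centered on the failure region and reweights them by the likelihood ratio. The function name and sample count are assumptions for the example.

```python
import math
import numpy as np

def tail_prob_importance_sampling(threshold, n_samples, seed=0):
    """Estimate P(Z > threshold) for Z ~ N(0, 1) by sampling from N(threshold, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=threshold, scale=1.0, size=n_samples)
    # Likelihood ratio: N(0,1) density over N(threshold,1) density, evaluated at x.
    weights = np.exp(-threshold * x + 0.5 * threshold**2)
    return float(np.mean(weights * (x > threshold)))

exact = 0.5 * math.erfc(6.0 / math.sqrt(2.0))       # closed form, for comparison only
estimate = tail_prob_importance_sampling(6.0, 200_000)
print(f"exact    = {exact:.3e}")
print(f"estimate = {estimate:.3e}")
```

About half of the shifted samples land in the failure region, so a couple of hundred thousand draws are enough for a tight estimate. The hard part in practice is that you rarely know in advance where the failure region is, which is where the real work goes.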

I gave a talk on this at the Berlin Algorithms group in Feb 2014. The slides are here.



Artificial Intelligence and the Future of Cognitive Enhancement

I was invited to keynote Berlin's "Data Science Day" for 2014. They asked me to give something visionary. So I talked about cognitive enhancement (CogE), a longtime pet interest of mine and related to my work at Solido. Whereas the first machine age was about augmenting muscles, our second machine age is about augmenting brains, i.e. CogE. Today's CogE has examples like search and recommendation, and also more extreme versions that we see in designing advanced computer chips. Future CogE will continue to be catalyzed by the positive feedback cycle of AI & the "Silicon Midas Touch", and my favorite singularity scenario (BW++).

The slides are here.



Welcome! This is my first post. I've had a buildup of things I've been meaning to blog about, so the next several posts will be a flurry of activity while I get those off my chest. Many will be based on talks I've given in the last year. PS in Saskatchewan, flurry = mild wind + medium sized snowflakes. Bigger flakes than you'd see in a tweetstorm.
