Freud, Genetic Programming, and Blockchain

Sigmund Freud was a funny guy [1]. He got famous by asking a bunch of novel questions (or near-novel questions framed well)… then answering them with exactly the wrong answers! Of course no one knew they were wrong at the time. Though everyone should have been more skeptical, as Freud himself repeatedly said he was not using the scientific method to arrive at his answers. He didn’t pose testable hypotheses, let alone test them and publish data and analyses back up or refute the hypotheses.

Genetic programming’s a bit like that too. It asked a bunch of novel questions  (or near-novel questions framed well)… then answering them with a one-size-fits-all technology. The algorithm was able to answer the questions, but usually in a horribly inefficient way. Moore’s Law is a partial answer, but if you’re needing 10,000x speedup to get something usable you probably aren’t willing to wait the decades for Moore’s Law to meet the needs. But I love the questions that GP asked. And to be fair, GP and other evolutionary algorithms are super convenient. I like to think of them as the second-best algorithm for everything: ranked highly because the effort to prototype it is super low, but can never be #1 because it’s (usually) not fast enough for production and there’s going to be a more suitable algorithm for the problem at hand.

I see Bitcoin like Freud too. It’s asked a bunch of novel questions  (or near-novel questions framed well)… then answering them with a one-size-fits-all technology in a horribly inefficient way and (arguably) the wrong answers. A global distributed trusted database is cool. But do you really need all that electricity for consensus? Is a 30Gb database -- small enough for a thumb drive -- really “bloated”? (Ironically, GP has a so-called bloat problem too.) Do you really even need coins if it’s fundamentally just a database? The terminology is admittedly fun – just like “penis envy” was back in the day.

I really do love the questions that Bitcoin has asked. And GP before. And Freud before that. And their related fields. I just prefer thoughtfully engineered solutions. So for now I’m going to call Bitcoin the #2 solution: convenient (now, anyway), but not fast enough for production, and there’s going to be a more suitable technology.


[1] Funny interesting more than funny ha-ha


Blockchain Scalability Part I – The Problem

The most interesting thing about Bitcoin isn’t electronic money. It’s the blockchain. As many have written [M. Andreesen 2014][J. Monegro 2014][M. Swan 2015], the blockchain has great promise as an open and global ledger, transforming not only the monetary system but virtually every industry. We'll be seeing:

If it exists, there is blockchain of it.

If the vision is realized at full potential, the blockchain will be swallowing all the photographs, videos and personal data streams of the world, in real time. It would need millions of transactions per second, and each transaction might have media files of 20Gb or more. The Bitcoin community calls the size problem “bloat” but that assumes that we want a small blockchain. I disagree: an ideal blockchain should be able to handle this volume and throughput.

The blockchain wants to be big.

But wishing doesn’t make it true. The hitch in the grand vision of the blockchain is that it simply doesn’t scale.  Unfortunately, the blockchain’s not even close. It doesn’t just need a 2x or 10x, it needs 10+ orders of magnitude.

The blockchain has several issues. Let’s enumerate:

Throughput. The Bitcoin network processes just 1 transaction per second (tps) on average, with a theoretical max of 7 tps []. It could handle higher throughput if each block was bigger, though right now that leads to size issues (see below). Compare this to Visa (2,000 tps typical, 1,000 tps peak) [] or twitter (5,000 tps typical, 15,000 tps peak), or ad networks (>100,000 tps typical).

Latency. Each block takes 10 minutes to process. For sufficient security, one should wait for about an hour. And for larger transfer amounts it needs to be longer, because it must outweigh the cost of a double spend attack. Visa takes seconds at most. Financial systems are orders of magnitude faster yet.

Size and bandwidth. The blockchain is 28Gb large, and grew by 14Gb in the last year []. So it already takes a long time to download (e.g. 1 day). If throughput increased by 2000x to Visa standards, that would be 1.42 Pb/year. At 150,000 tps, the blockchain would grow by 214 Pb/year. Try downloading that!

Permanence and data flexibility are issues too. Let’s table those for another day.

Interestingly, trust isn’t perfect either. At first glance Bitcoin may seem tailor-made to solve the double-spend problem.  However, the known 51% problem is really a problem when mining pools like keep bumping up against it [J. Matonis 2014]. Just a handful of mining pools and ASIC builders control the whole network. There’s nothing preventing these organizations from colluding [M. Hearn 2014].

Note: I gave a similar list to my friend Melanie Swan, who incorporated many of those ideas into her new book on the blockchain, from O’Reilly. Great! And Melanie - your book rocks! Also, obviously other people have talked about these issue in their own way too. This is simply my take:)

So, is there a way out?

If we want to engineer a solution, we first need to specify the problem in more technical detail. So let’s get to it. Starting simply, the blockchain’s a database, just a very special one.  Technically speaking, the blockchain solves the “Byzantine Generals + Sybil Attacks” problem. In the Byzantine Generals problem [L. Lamport 1980], nodes need to agree on some value for a database entry, with the constraint that the nodes don’t necessarily get along. The Sybil Attack problem [J. Douceur 2002] is when one or more nodes figure out how to get unfairly disproportionate vote in the course of agreeing. Some have proposed to label this the Strong Byzantine Generals’ (SBG) problem [P. Koshy 2014]. (Note: Bitcoin and others aren't technically BFT because they aren't consistent in the CAP sense. But they are still robust to Byzantine behavior.)

Blockchain is not the first solution to plain old Byzantine Generals problem. [PBFT] is the first practical solution. Or if you only have crash faults and don't need to worry about malicious faults, then [Paxos] is the standard. Basically any distributed database needs it, from [Bigtable] to [Cassandra] to [Neo4J]. They all use Paxos.

Distributed databases are big. They’re the technology of “Big Data”. They start on petabytes and grow up to exabytes and beyond. Yet somehow the bitcoin community worries about “bloat” on a 30 Gb database. You better have the “right” transaction in the database, because if not you’re giving it “dust”. That’s right, this database that’s a million times smaller than a modern distributed database, that fits on your thumb, has “bloat”. Lol!

Here’s some insight why Bitcoin scales so poorly. Consider a RAID array (Redundant Array of Independent Disks), which behaves like a large monolithic database but consists of multiple cheap hard drives. A data record is stored redundantly in >1 drives, so that if a drive fails then the data can be still easily recovered. If only one disk fails at a time, then there only needs to be one backup drive for that data. One can make the risk arbitrarily small, based on assumptions of how many disks might fail at once. RAID arrays have one or two backups to be reliable at scale in industry [ref]. Similar to RAID arrays, databases like a Cassandra have a “replication factor” of 3. Bitcoin has about 6,500 full nodes [ref]. That’s akin to a RAID array with 6,500 backups, or Cassandra with a replication factor or 6,500. Lol!

I gave a talk on these issues and database disparities for Bitcoin Startups Berlin in Oct 2014. [These slides] have more detail. (Thanks for the opportunity, Brian!) 


MARS, Deep Learning, and Space Lego

I’m not really into trends. Definitely not fashion (ask anyone) and not even music (though I’d like to think I was). Also, not algorithms. There are definitely trendy algorithms. These days it’s deep neural networks. I acknowledge that they’re pretty cool. They’ve had a nice string of successes, and I expect several more. They’ve been eating image processing and are nibbling the edges of other fields too. Though they’re far from perfect: they take forever to train, there’s limited theory, and intuition is difficult.

Like anything, the shiny new thing obscures the older things. Sometimes those older things are really awesome.

One example of a really awesome older thing is MARS. It’s 24 years old. It has linear scaling on the # input variables, and the # training samples. It has theory. In practice it performs as well or better than many “state of the art” algorithms. It’s elegant and intuitive. You can do cool things like extract important variables. It builds on something even older and “more boring”, decision trees aka CARTs. Yet hardly anyone talks about MARS, let alone uses it. Well, forget that. I love it, and I don’t care if it’s not the shiny new thing. I’m using it and shipping product. Without needing a Tesla cluster. So there.

And by the way, MARS hinge functions = rectified linear units (ReLUs). So sorry, deep learning people, ReLUs was a rediscovery (and that’s ok!). And also BTW, if think about deep NNs as big PWL models, then you get gobs of theory, e.g. in the control systems and analog circuits literature.

Other old but awesome things: Wilson confidence interval (1927), the bootstrap (1979), the lasso (1996), the Reimann zeta function (1740/1859) and 1980s Space Lego.

Want more detail? Here’s the talk I gave to Berlin’s Machine Learning group in Feb 2015. 


Blockchain: The Moon Laser Analogy

While I got my first bitcoins in 2010 (alas not enough – woulda coulda shoulda), I’ve been spending a lot more time thinking about it this last year. One of my pet thought threads is to think up new analogies to the blockchain. There’s the standard ones like “writing into a book that everyone can see” or “etching into stone”. Those are ok, but a little too simplistic because the analogy doesn’t cover certain characteristics of the blockchain.

In my view, the blockchain has four specific characteristics: “write once, read forever, delete never, and trusted”; where “trusted” is implemented as “public, transparent, and owned by no one via decentralization” (also called “trustless” by some).

My current pet analogy is the moon laser:

Imagine that the moon’s surface is a ledger; anyone can etch with a laser from earth; once there it’s there forever, and anyone can read it with a telescope. This moon laser system is also “write once, read forever, delete never, and trusted”. (But flipping bits is easier than shooting lasers to the moon; hence the blockchain.)

This analogy seems to cover more characteristics than other analogies. It’s a bit on the weird side though. I don’t use this one when explaining bitcoin to central bankers!

I’m sure there are several great analogies out there, and improvements to the moon laser one too. Suggestions?



Natural Born Cyborgs: A Prelude to the BW++ Singularity Scenario

I recently chaired a panel about the singularity at the [OEB conference]. The aim was to paint a picture of what the singularity could look like, and the shades-of-gray technologies leading to it. I gathered friends who live and breathe those technologies – virtual reality (Tore Knabe), augmented reality (Miho Tanaka), bio engineering (Jan Saam), and technology for thinking (Bruce Pon). I was the AI guy. Each of us discussed the techs in the context of singularity.

We also laid out a few of the singularity scenarios and related risks. I discussed my pet scenario (bandwidth++) which I think is the safest path for humanity. The BW++ scenario avoids the robot-overlord scenario while still keeping us in the flesh until we’re ready. Bill J and Elon M – you’re wrong, there is a way. The BW++ way means we don’t need to lock down scientists or make interplanetary colonies of our meat bag selves. Following the conference I gave an interview about the BW++ scenario; the article is [here][local]. I’ve given talks on it before - to a group of neuroscientist in Vancouver in 2012 [slides] and a group of futurists in Berlin in early 2014 [slides]. One of these days I’ll make an essay (and proper post) of it.



AI and Symbolic Regression

At Berlin's Data Science Retreat, I gave a talk that posed the following questions:

  • What is technology? 
  • What is AI?
  • WTF is genetic programming or symbolic regression? Why should I care?
  • How does Google find furry robots?

Here's my take on each.

Technology is a tool where other people know enough theory and algorithms, with implementations so good that the rest of us can just type 'A/B'. I got this view from Steve Boyd of Stanford. Your brain = tech. Your car = tech.  I've found it incredibly helpful in building algorithms over the years. If you've got a pet algorithm where you're fiddling & twiddling parameters just to make it work... that's not tech. Set the bar such that when you ship the tech inside a user-facing tool, you're not holding your users' hands just to make it work.

AI is <definition keeps changing, and that's fine!> In the early days of AI it was: a machine that can replicate human cognitive behavior, as embodied by the Turing test. More recently, it was: a machine that can perform a cognitive task, that was previously only possible with a human, as embodied by Deep Blue for chess. Most recently, there are several definitions. A modern pragmatic one is: a machine that can perform a non-analytical information processing task, at speed / accuracy / capacity not possible by a human. I do appreciate the efforts of the Artificial General Intelligence folks, which goes back to the earliest goals of AI. I just don't see that as the only AI work.

There's another way to think about AI: the things in computer science that are still a mystery. There was a time when databases were sufficiently mysterious that they were considered AI. No more. Same for constraint logic programming, spreadsheets, density estimation, the list goes on and on. These days many machine learning tools, such as large-scale linear regression or SVMs, are sufficiently understood (and useful as technologies) that they no longer fall under the "mystery" umbrella. And, mysteries remain!

Genetic Programming is cool! It asks questions almost as general as AI, but scopes them down enough to sink your teeth into and make real systems to answer. At its core it's going for the long-standing aim of automatic programming, i.e. can I automatically generate a computer program that performs task x? GP has many great examples of automatic programming for everything from designing sorting networks to automatic bug detection & repair. And since computer programs can represent near anything, they can represent engineering structures, which is a very broad class of problems in itself. GP researchers have auto designed space antennae, analog circuit topologies, robot morphologies, scientific formulae, quantum computing algorithms, and more. 

GP's solver is an evolutionary algorithm, though of course other solvers work too; for example Una-May O'Reilly used Simulated Annealing, and Kumara Sastry used an iterative Bayesian approach. Even random sampling works pretty well in some cases, as John Koza himself showed. The power of GP isn't the solver; rather, it's the questions that it has asked and successfully answered for the first time.

Symbolic regression is the problem of inducing an equation describing a relation among variables, given a set of training datapoints. Ideally the equation is simple enough to be interpreted by a human. That goal was one of the originating goals in the field of machine learning, and of course goes back further than that. It's had many labels over the years, from automatic function induction to nonlinear system identification (for the special case of dynamical systems). It's useful for everything from scientific discovery (e.g. discovering laws for pendulums with >1 joints, as Lipson and Schmidt did) to gaining insight into designing electric circuits (e.g. work I've done). It turns out that GP is pretty good at SR, and has become probably the biggest sub-branch of GP research. GP isn't the only way however. For example, I found that generating a giant set of basis functions, then solving them for them all at once with large-scale linear regression also works. It not only works, it's orders of magnitude faster and more scalable (with a small hit to generality).

Finally... how does Google find furry robots? In other words, how does Google do image search? How they do it now is more Deep Learning based. But back in the days of NIPS '09 they did it by large scale linear classification. Image search was one of the drivers of large scale linear modeling. It was a revelation to me that you could build a linear model with 10K or 1M input variables (or even 1T these days - thanks VW), in a sane amount of time on a single processor, with # training points << # input variables. This last bit is thanks to regularization, which mathematically amounts to minimizing the volume of the confidence ellipsoid of future predictions, or put simply minimizing how much you can screw up when predicting. The effect is that models use fewer input variables or have a smaller slope. Large scale linear modeling is ridiculously useful. 


Could Bitcoin save Moore's Law?

I originally posted the following on my Google+ account in Jan 2014. My semiconductor friends thought it was a little off the wall. But I posted it anyway. 

Moore's Law is at risk, not because of the physics but because of the economics. As discussed on e.g. SemiWiki.

Bitcoin mining is taking off. In 2013, TSMC, AMD and others saw $200M in sales for bitcoin-related parts (from basically zero a year or two before). Link.

The Bitcoin network difficulty is growing sharply exponentially. In the last several months, it's been 2x per month. Link.

There are many arguments why Bitcoin could become the underpinning of the whole global economy, or even just the internet economy. For example, ask Marc Andreesen [link]. Of course, it might not as well. But it might!  

Given these points, Bitcoin might just become the biggest driver of semi revenue. And since Moore's Law is at the most risk for economic reasons, Bitcoin might just be the new driver for Moore's Law... 

What's cool is that it really could be happening: bitcoin mining company KnCMiner is one of the very first companies to tape out at TSMC's 16nm process node. Here. (TSMC manufactures about half the silicon in the world, and 16nm is their newest, smallest node.) This upstart bitcoin (!) company beat Apple, Qualcomm, Nvidia and almost everybody else to the punch.

What gives? If you think about it, it's fully rational. Since they're building money-printing machines - more BTC every 10 minutes (when they win the lottery), they can calculate precisely how much money they expect to make based on how many Ghashes they can run. Maximize the hash rate, minimize the power costs, and the difference is profit. Marcus Erlandsson, the CTO of KnCMiner, confirmed this when I chatted with him recently. Cool!


Active Analytics, and Auto vs. Manual Design

In  2014, "Predictive Analytics" hit the mainstream. Many people got very excited about the idea that you could take a pinch of "big data" or "data mining", add in a dash of "visualization", and get "business value". I agree. I only use the air quotes because it was framed as something novel. But this stuff has been going on for decades, (though to be fair for much of that time it was with smaller datasets). For example, go to the appendix of Friedman's famous 1991 MARS paper and you'll find data mining + visualization for new insights. And then there's statistics + Tufte-style visualization. Then you have the likes of Spotify and Tableau. We'd been doing this sort of thing at Solido since 2004, and ADA before that, to help designers get insight into designing computer chips. My PhD included "knowledge extraction." It's great to see that this tech is starting to hit the mainstream - it's incredibly useful.

What's cool is that there is state of the art beyond predictive analytics. It's basically about closing the loop, rather than working with a static dataset. Get some data, do some analysis, but then (auto) find new data and repeat. The "find new data" part can be active, i.e. you can choose which sample to take next. You could also think of it as classic optimization, but with a visual element. I call it "Active Predictive Analytics", or "Active Analytics" for short. We've been doing this with a new tool at Solido, and designers really like it as a new style of design tool. It turns out to address auto vs. manual design too..

There's been a long running debate on whether automatic or manual design is better, and both sides have had really great arguments. But what if you can get the best of both worlds, if you can reconcile manual vs. automatic design? That's what the tool turns out to do: if you want to design fully manually, i.e. you pull the design, you can. If you want fully automatic, i.e. the tool pushes the design, you can. But the cool thing is that it allows the shades of gray in between: it gives insight what designs and design regions might be good, and you can easily pull the design with a visual editor. Call it supercharged manual design, if you will. I'm quite excited about this because it has applications far beyond circuits, for everything from deep learning to business intelligence to website optimization (evolution from A/B testing to multi-armed bandit to this).

I gave an invited talk on this at the Berlin Machine Learning group in May 2014. Slides are here.


The Ultimate Bootstrap: AI & Moore's Law

People talk about a Moore's Law for gene sequencing, a Moore's Law for software, etc. But what about the Moore's Law? Transistors keep getting exponentially smaller. It's the bull that the other "Laws" ride, a "Silicon Midas Touch": once a technology gets touched by the silicon Moore's Law, that technology goes exponential. Moore's Law is a technology backbone that is driving humanity. I love that! It's a driving reason why I've spent 15+ years of my life in semiconductors, to help drive Moore's Law. I've co-created software enabling chip design on bleeding-edge process nodes. 

What's cool: it's AI-based software, which runs on the most advanced microprocessors. To design the next generation of microprocessors. For that smartphone in your pocket, for the servers powering Google, and for the companies designing the next gen of chips. Put another way: the computation drives new chip designs, and those new chip designs are used for new computations, ... ad infinitum. It's the ultimate bootstrap of silicon brains. The only thing it's clocked only by is manufacturing speed.

I've given a couple talks about this. Here's one from 2013 I gave at a singularity meetup. And here's one I gave as an invited talk to the PyData Berlin conference (and the video too).


Predicting Black Swans for Fun and Profit

I've always been a big fan of Nassim Nicholas Taleb's writing. Though not always his conclusions. In "The Black Swan: The Impact of the Highly Improbable" he describes "black swan" events, which have extremely low probability but huge impact when they do happen. Partway through, he makes an assumption that they're so hard to predict, that you should just not bother, and instead protect yourself against the downside (if a negative event) or make sure you're exposed to the upside (if a positive event). I disagree: just because something's hard doesn't mean it's impossible. It's just a challenge! And it's worth going for if the upside to prediction is high.

Case in point: designing memory chips where the chance of failure is 1 in a billion or so. The Sonys and TSMCs of the world have huge motivation to estimate that value quite precisely. What's cool: they can now estimate these "black swans" with good confidence (using tech I helped develop), and they're very happy about it. It was hard, but not impossible!

I gave a talk on this at the Berlin Algorithms group in Feb 2014. The slides are here.



Artificial Intelligence and the Future of Cognitive Enhancement

I was invited to keynote Berlin's "Data Science Day" for 2014. They asked me to give something visionary. So I talked about cognitive enhancement (CogE), a longtime pet interest of mine and related to my work at Solido. Whereas the first machine age was augmenting muscles, our second machine age is about augmenting brains, ie CogE. Today's CogE has examples like search and recommendation, and also more extreme versions that we see in designing advanced computer chips.  Future CogE will continue to be catalyzed by the positive feedback cycle of AI & the "Silicon Midas Touch", and my favorite singularity scenario (BW++).

The slides are here.



Welcome! This is my first post. I've had a buildup of things I've been meaning to blog about, so the next several posts will be a flurry of activity while I get those off my chest. Many will be based on talks I've given in the last year. PS in Saskatchewan, flurry = mild wind + medium sized snowflakes. Bigger flakes than you'd see in a tweetstorm.

Page 1 ... 1 2 3 4