Tuesday, February 28, 2017

Perceptrons, Neurons, and Learning Logic


OK, fine. I'll tell you about perceptrons.

Warning 1: These are the guys most responsible for getting me hooked on this subject. They may hook you too.

And credit where credit's due: I learned pretty much everything written below from the sections on Perceptrons and Sigmoid Neurons in Chapter 1 of Michael Nielsen's fantastic free on-line Neural Networks and Deep Learning book. In fact, if you'd rather skip my post and just go read those sections now, I won't be offended.

I will, however, insist that afterward you check out the frickin' awesome neural network playground visualization tool created by Daniel Smilkov and Shan Carter as part of this larger project. You can fiddle with various parameters and see in real time how your fiddlings affect the way that a neural net learns to separate various non-linearly separable $2$-dimensional data sets. Well worth the $\geq 20$ minutes of your life it will cost you to do so.

Warning 2: It's going to seem for a moment that we're on a different planet, but bear with me. We'll be back on Earth again soon, I swear.

Perceptrons

Suppose you want to make a Yes/No decision based on your own personal weighting of answers to a finite collection of Yes/No questions.

(I want us all to pause for a moment and appreciate the following fact: This is basically how we do make decisions.)

The example Michael Nielsen gives in his book (Attend a cheese festival: Yes or No?) is far better than anything I can come up with, so I won't try. Instead I'll keep things abstract and just refer you there if you want an example.

A perceptron is a simple mathematical gadget that models this type of decision. People usually draw perceptrons like this:

An n-input perceptron


The $n$ nodes on the left (ordered from top to bottom, say, and labeled $x_1, \ldots, x_n$) represent the inputs. Each input $x_i$ will either be a "0" ("No") or a "1" ("Yes"), so there are $2^n$ possible inputs to the perceptron.

The output to the perceptron will also be a "0" or "1." How does the perceptron choose? Well, the perceptron has $n$ weights, $w_1, \ldots, w_n \in \mathbb{R}$, assigned to the $n$ input nodes, along with an overall threshold, $b \in \mathbb{R}$. Its output is then given by:
\[\left\{\begin{array}{cl} 0 & \mbox{if } \sum_{i=1}^n w_i x_i < b,\\
1 & \mbox{if } \sum_{i=1}^n w_i x_i \geq b.
\end{array}\right.\]

In other words, if the weighted sum of the inputs achieves or exceeds the threshold, output $1$. If it doesn't, output $0$.
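
If code speaks to you more clearly than formulas, here is a minimal sketch of that rule in Python (the function name and the use of NumPy are my own illustrative choices, not any standard library's perceptron):

import numpy as np

def perceptron_output(x, w, b):
    # x and w are length-n lists/arrays of inputs and weights; b is the threshold.
    # Output 1 if the weighted sum achieves the threshold, 0 otherwise.
    return 1 if np.dot(w, x) >= b else 0

print(perceptron_output([1, 0, 1], [2.0, -1.0, 0.5], 2.0))  # prints 1, since 2.5 >= 2.0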

If you read my post about linearly separating data, this should be starting to sound eerily familiar. Explicitly, a perceptron with $n$ inputs answers the following concrete question:

"Let ${\vec x} \in \{0,1\}^n \subset \mathbb{R}^n$ be a binary $n$-tuple, regarded as an $n$-dimensional real vector. On which side of the hyperplane defined by weight vector $\vec{w} := (w_1, \ldots, w_n)$ and shift $b$ does it lie?"

But when we think about it in these terms, the first thing that should jump to our minds is: Why on earth should a perceptron accept only binary inputs? There's nothing stopping it from accepting real-valued inputs. In fact, the geometric interpretation of what a perceptron is doing is somehow more natural when we allow our inputs to be any real numbers and not just $0$s and $1$s.

OK. Hold that thought while I tell you something else completely unrelated but even cooler.

Any logical sentence (or "circuit") can be built by concatenating perceptrons:

This is what blew my mind when I first heard about these guys.

If you've ever studied elementary logic (maybe you took an elementary proofs course or something), you've probably built yourself a truth table or two. If you haven't, don't worry. I'll tell you what they are and what they do.

(And if you're interested in what they have to do with mathematical proofs, you might like these notes written by Michael Hutchings, which I often use as a first reading assignment whenever I teach Intro to Proofs at Boston College.)

At its core, elementary logic is a language or "calculus" that allows us to specify how to take a finite, ordered tuple of $0$s and $1$s (a so-called binary string), perform a collection of operations on it, and obtain another ordered tuple of $0$s and $1$s. The $4$ basic operations in this language are:

  1. $*_1$ And $*_2$, 
  2. $*_1$ Or $*_2$, 
  3. If $*_1$ Then $*_2$,
  4. Not $*$
The *s above represent inputs. Note that the first three operations take two inputs, and the last takes one.

The binary output or value of each of the statements above is determined by a so-called truth table (it's a bit like a multiplication table). It tells you, e.g., that the output of the statement "$*_1$ And $*_2$" is $0$ unless both $*_1$ and $*_2$ are $1$. This should make sense to you since a statement like "I love cheese and it's 10 AM" is only true if the statements "I love cheese" and "It's 10 AM" are both true.

So the truth table for ($*_1$ And $*_2$) looks like this:

             $*_2 = 0$   $*_2 = 1$
  $*_1 = 0$      0           0
  $*_1 = 1$      0           1

The inputs $*_1$ and $*_2$ are along the left and the top, and the outputs for the various corresponding combinations are recorded in the interior. You can easily find the truth tables for the other logical operations online (or in Hutchings' notes), so I won't record those here.

OK, great. The point now is that you can model any of these operations using a perceptron.

I'll do "And," then leave the other three as exercises. Here's the picture:



So e.g. if we let $w_1 = w_2 = 10$, and $b = 15$, then the perceptron will return $1$ iff both inputs are $1$, as desired.
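
A quick sanity check of that claim (just an illustrative snippet; the loop tries all four binary inputs):

from itertools import product

# Check that w1 = w2 = 10 and b = 15 really do compute "And".
w1, w2, b = 10, 10, 15
for x1, x2 in product([0, 1], repeat=2):
    output = 1 if w1 * x1 + w2 * x2 >= b else 0
    print(x1, "And", x2, "=", output)
# Only the input (1, 1) produces a 1, matching the truth table above.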

Note that there are (infinitely) many choices of weights $w_1, w_2$ and shift $b$ that work to model this "And" perceptron; all you need is a hyperplane separating the point $(1,1)$ from the points $(0,0), (1,0), (0,1) \in \mathbb{R}^2$. This is pretty easy to do, and also gives you an idea how to solve the exercises above once you find their truth tables online!

Once you've solved the exercises above, you have all the tinker toys you need to build any simple logical statement out of perceptrons (mathematicians also like using quantifiers like "for all" and "there exists," but let's ignore those). Connect the inputs to outputs in the appropriate way, and you've got your statement. For example, here's a perceptron model of the statement

"If ($*_1$ And $*_2$) Then ($*_3$)":



For fun, you might now imagine letting $*_1, *_2, *_3$ represent the truth values of the statements,

  1. "You're happy," 
  2. "You know it," 
  3. "You clap your hands," resp. 
(Credit to Josh Greene for this.)

Then the perceptron circuit will output a $1$ if, e.g., you're sad and you don't know it and you clap your hands, but a $0$ if you're happy and you know it and you don't clap your hands.
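
In case it helps, here is one way that circuit could be wired up in code. Fair warning: the weights I picked for the "If ... Then ..." perceptron are one choice among infinitely many, and they partially spoil one of the exercises above, so look away if you'd rather work it out yourself. The function names are mine.

def perceptron(x, w, b):
    # Output 1 if the weighted sum achieves the threshold b, 0 otherwise.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= b else 0

def happy_circuit(happy, know_it, clap):
    # The first perceptron computes (happy And know_it); its output and "clap"
    # then feed into an "If ... Then ..." perceptron.
    antecedent = perceptron([happy, know_it], [10, 10], 15)   # And
    return perceptron([antecedent, clap], [-10, 10], -5)      # If ... Then ...

print(happy_circuit(0, 0, 1))  # sad, don't know it, clap anyway: prints 1
print(happy_circuit(1, 1, 0))  # happy, know it, don't clap: prints 0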

And you can go crazy with it. You can, e.g., make a perceptron circuit that looks like this:



N.B.: It looks like our perceptrons have multiple outputs in this picture. They don't. Each perceptron can have multiple inputs but only a single output. The multiple lines emanating from each perceptron in the pic above correspond to sending that perceptron's output to multiple places.

But I hope that the picture of a perceptron circuit I drew poorly above is starting to look a leeeeeetle bit like the pictures of neural networks we've seen in the news.

And Now Back to That Thought We Were Holding:

Perceptron circuits are awesome, but they're pretty rigid. Their inputs and outputs are binary. Not to mention that the rule each perceptron uses to decide between outputting a $0$ and a $1$ is a step function: output $1$ if you're on or to one side of a hyperplane, $0$ if you're on the other side.

Perceptrons use step functions


The rigidity and jumpiness of this set-up doesn't seem ideally suited to learning, does it?

(It's not.)

BUT what if we fuzz things out a bit, like we were doing before? That is, let's allow real-valued inputs and outputs, and replace our step functions with sigmoid functions. In other words, let's replace each perceptron with a sigmoid neuron. In case you haven't already guessed, a sigmoid neuron still has associated to it an $n$-dimensional weight vector $\vec{w} \in \mathbb{R}^n$ and a threshold $b \in \mathbb{R}$. But it allows its $n$-dimensional input vector $\vec{x}$ to live anywhere in $\mathbb{R}^n$, and its output will interpolate between $0$ and $1$ using a sigmoid function. That is, its output will be $\sigma(\vec{w}\cdot \vec{x} - b) \in (0,1)$. Recall that $\sigma(t) = \frac{1}{1+e^{-t}}$.
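
Here is the same idea as a minimal sketch in Python (again, the names and numbers are just my illustrative choices):

import numpy as np

def sigmoid(t):
    # The logistic function: smoothly interpolates between 0 and 1.
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_neuron(x, w, b):
    # Same weight vector w and threshold b as a perceptron, but the inputs
    # can be any real numbers and the output lands in the open interval (0, 1).
    return sigmoid(np.dot(w, x) - b)

# Inputs near the "And" corner (1, 1) give an output close to 1:
print(sigmoid_neuron([0.9, 0.8], [10, 10], 15))  # roughly 0.88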

Sigmoid neurons use sigmoid functions

Sigmoid functions look an awful lot like step functions if you squint hard at them (increasingly so if we use a sigmoid function like $\frac{1}{1+e^{-ct}}$ and let $c \rightarrow \infty$).
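
If you'd rather see the squinting in numbers than in pictures, here's a tiny illustration (the particular inputs are arbitrary choices of mine):

import math

def scaled_sigmoid(t, c):
    # sigma(c*t): as c grows, this approaches the perceptron's step function.
    return 1.0 / (1.0 + math.exp(-c * t))

for c in (1, 10, 100):
    print(c, [round(scaled_sigmoid(t, c), 3) for t in (-0.5, -0.1, 0.1, 0.5)])
# As c grows, outputs for negative t snap toward 0 and outputs for positive t snap toward 1.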


Well, by George, we have a neural network. Moreover, it can, in principle, learn logic. At least the elementary models of logic I teach in Math 216.

All we need now is:

  1.  a reasonable cost function,
  2.  a reasonable way of computing its partial derivatives with respect to the weights and thresholds associated to all of the sigmoid neurons,
  3. buttloads of data for it to learn from.
We'll talk more about that next time.

BTW: I should thank the Isaac Newton Institute in Cambridge for its hospitality, since that's where I am at the moment!

Wednesday, February 1, 2017

Idiot's Guide (written by an actual Idiot)

I still haven't told you what a neural net is or how one works. (Don't worry. We'll get there soon.)

You may be itching to play around with one all the same.

Unfortunately (or maybe fortunately...) you can't build a neural net with paper clips and string in your living room. And I wouldn't recommend writing much code from scratch either, even if you're already an amazing programmer (I'm not).

Luckily, some actual amazing programmers have already done all the hard work for you. Unluckily, finding which people have done it best and actually understanding how to get your hands on what they've done can sometimes feel harder than just doing the damn thing yourself.

(I don't recommend doing the damn thing yourself.)

The purpose of this post is to record roughly what I did to get myself up and running. Maybe provide a link or two or three. The post is almost entirely selfish, because I know I will never remember how to do this unless I write it down somewhere. May as well write it somewhere where it has a chance of helping others too.

User beware: I have close to zero programming experience and even less experience mucking around with Unix (besides using pine as an undergrad--ugh, dates me terribly, I know...). I can't guarantee that the steps below are maximally efficient or even close. All I know is that I can do what I need to do on my computer now. And the recipe I followed is below.

Ingredient list:

1) Python: This is the programming language everything is written in. Don't know anything about this language? That's OK. There are buttloads of resources on-line for learning the syntax and basic structure, etc. When I'm writing a program and google a python syntax question, I often end up here. I also spent a good amount of time at the beginning going through this set of lecture notes. Or you could take one of the 10,000 MOOCs on it if you want more direction.
2) Jupyter (IPython) Notebook: This is a web-based environment that will allow you to write python code and execute it in the notebook, so you can troubleshoot your code while you're writing it instead of after you've done it all wrong.
3) TensorFlow: This is the machine learning package developed by Google. People seem to like it! As do I, so far!
4) Keras: This is a front end for TensorFlow (it can also work with Theano as a back end--this is an alternative to TensorFlow that I have heard good things about but never used) that is (precisely) one zillion times more user-friendly. I was dreading building a neural net using TensorFlow until someone (Mark Hughes) told me about Keras.


Instructions (for a Mac, which is what I have):

0) Open a Terminal (look in your Applications folder if you don't know what I mean)
1) Install Anaconda (a python distribution with a package manager called "conda" that can be used to install various other packages) using the link in the "Anaconda Installation instructions" found on the TensorFlow website here. Keep in mind that there are lots of different versions of python, and the syntax actually varies quite a bit (annoyingly) among them. I went ahead and got the most recent version of python 3.
2) Create a conda environment called tensorflow by typing (where you may replace "3.5" with the python version number you are using):
      $ conda create -n tensorflow python=3.5
3) Activate the tensorflow environment by typing:
$ source activate tensorflow 
(tensorflow)$  # Your prompt should change
4) Install tensorflow using conda:
# Linux/Mac OS X, Python 2.7/3.4/3.5, CPU only:
(tensorflow)$ conda install -c conda-forge tensorflow
5) At this point, I followed an amazingly helpful answer on Quora found here to the question "How can I work with Keras on a Jupyter notebook with TensorFlow as a backend?" One needs to install ipython, Jupyter, and Keras (in that order) inside the tensorflow environment by typing:
  1. (tensorflow) username$ conda install ipython
  2. (tensorflow) username$ pip install jupyter
  3. (tensorflow) username$ pip install keras
6) Now deactivate and reactivate the tensorflow environment (not sure why you need to do this):

  1. (tensorflow)username$ source deactivate tensorflow
  2. username$ source activate tensorflow

7) And open the Jupyter notebook (it will open in a browser; I think Safari is the default, at least that's what opens on my computer):
(tensorflow)username$ jupyter notebook
8)  Once the Jupyter notebook opens, navigate to whichever directory you want to use to store your ipython notebooks and click (upper right) New-->Python3 and you'll be in a Jupyter notebook. You can execute python code by doing a Shift-Return. You might try writing "print('hello, world')" to make sure you've got it. (It should output "hello, world" obvs).

9) To shut everything down, save your notebook, close the browser window, do a Ctrl-c at your terminal and answer "Y" when it asks you whether you want to shut down your Jupyter notebook.

10) You'll be back at the tensorflow prompt, at which point you deactivate the tensorflow environment:
  1. (tensorflow)username$ source deactivate tensorflow

11) Type "exit" to close your terminal and you're all done.

12) Now whenever you want to open a Jupyter Notebook and use Keras and TensorFlow, you just do:
$ source activate tensorflow 
(tensorflow)$  # Your prompt should change
 then
(tensorflow)username$ jupyter notebook
and once you're all finished with your jupyter notebook:
(tensorflow)username$ source deactivate tensorflow
13) BTW, here's the documentation for Keras. I followed this great tutorial to understand how to build an LSTM (a particular neural network architecture that is good for analyzing sequences--will write more about this later) using Keras. Really cool thing I didn't realize until I did this tutorial: you can use Keras to download interesting data sets (this tutorial uses some imdb movie reviews and classifies them into "positive" or "negative")--I haven't explored which ones.