A Neural Network Primer
The rationale of ANNs from a programmer's perspective. A gentle but in-depth introduction to the topic of ANNs.
Preface/Purpose
I chanced upon this primer on artificial neural networks from the 90s, and thought it might be a good idea to write something similar with a different flavour for a different time.
The idea is to treat the programmer-reader with respect while still being introductory reading.
The code samples are mostly in Clojure, with possibly some Python used via libpython-clj. They are mostly instructional and not written in a play-along way, as there is background setup not documented in this note. But the intention is for the code to be very much understandable in isolation.
Thinking about thinking
Biologically, the brain is known to be made up of a huge number of neurons – an average of 86 billion for humans. How it all works is probably still a mystery, but various models have been proposed. None may have been accurate, but all of them have been useful.
Neurons are considered the individual computing units of the brain – each with advanced and complex I/O capabilities. In isolation, a single neuron may not seem very exciting, but as part of the huge network of connected neurons that the brain is, the system as a whole is highly capable. Capable enough to study itself – as we are doing right now.
Artificial neural networks (ANNs) are approximations of the biological model that we feel are extremely useful for certain classes of computational tasks that may be hard to code in the traditional style.
There are various kinds of ANNs that one comes across, and each of them is but an approximation of how a biological brain might behave under specific circumstances. In other words, depending on the kind of problem being solved, a different ANN architecture may make more sense. But they did not appear in one go. It started very simple.
Our goal here is not to track the history of the evolution of ANNs but to see the roadmap for a deeper understanding.
A Simplified Neuron
Dendrites
A neuron receives input electrical signals via its dendrites.
Axons
Once a neuron decides to fire a signal, it is carried outwards to downstream dendrites via axons.
Activation Function
The summed-up input signals arriving via the dendrites are processed by the activation function that characterizes the computation of each neuron. Typically, the output of an activation function is directly correlated to the strength of the summed inputs. The exact correlation also happens to be one of the key choices to be made when defining models.
The Perceptron – Basics
The Perceptron is the oldest model of the neuron. It sums up all its inputs into a single value for the activation function and outputs a binary value: it discerns between different inputs and slots them into one of two categories.
So, let's consider the following table, where x and y are inputs, and z = f(x, y) is the output.
| x | y | z |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |
This is an OR table. It probably didn't take you long to realize that. But how did you recognize it? Thinking about how you did it can be a great teacher in understanding how ANNs may have evolved, and can help you make sense of the landscape somewhat better.
Consider x and y as two inputs, and the corresponding outputs are captured in the z column. The perceptron is in effect just the function f as specified above. How can we implement one? If the inputs are binary as in the table above, the perceptron can be a simple lookup function into the table and serve the purpose! (Think: a dictionary or a hashmap.)
We can approximately capture the essence of the above OR table in a real-valued function thus (while still using binary-valued inputs):
z = (1.5x + 1.5y ≥ 1.5)
Feeding x and y from the above table into the code below:
(let [w1 1.5
      w2 1.5
      f (fn [[x y]]
          [x y (>= (+ (* w1 x) (* w2 y)) 1.5)])]
  (map f input-table))
| 0 | 0 | false |
| 0 | 1 | true |
| 1 | 0 | true |
| 1 | 1 | true |
Let's shift the goalpost – slightly. Let us set the threshold to 1.51 from 1.5 and rerun with the same x and y inputs.
(let [w1 1.5
      w2 1.5
      f (fn [[x y]]
          [x y (>= (+ (* w1 x) (* w2 y)) 1.51)])]
  (map f input-table))
| 0 | 0 | false |
| 0 | 1 | false |
| 1 | 0 | false |
| 1 | 1 | true |
z = (1.5x + 1.5y ≥ 1.51) – Now, that's AND!
So, AND and OR are almost the same, except for the discriminant being laterally translated.
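The lateral translation can be seen directly in code. A Python sketch mirroring the two Clojure snippets above – same weights, only the threshold differs:

```python
def fires(x, y, threshold, w1=1.5, w2=1.5):
    # The neuron fires when the weighted sum clears the threshold.
    return w1 * x + w2 * y >= threshold

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([fires(x, y, 1.5) for x, y in inputs])   # OR:  [False, True, True, True]
print([fires(x, y, 1.51) for x, y in inputs])  # AND: [False, False, False, True]
```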
We now have an example of a Perceptron with the ability to compute AND and OR over real values – even if the examples above used only binary values.
A discriminant is a possibly curved line in 2D. By extension, it will be a 2D surface in 3D, and so on: some boundary that divides the space it is embedded in into two disjoint spaces.
The Perceptron – Getting Somewhat Real
What does the figure above indicate? The lines represent the exact values of our thresholds, but any combination of real-valued x and y on one side of a slanted line will always classify into a single category – true or false. Which line? That decides whether we are looking at the AND operation or the OR operation.
In real life, we expect inputs to be – well, real. Pardon the pun, but for well-defined inputs and well-known functions, we wouldn't be exploring neural networks.
The insight here is that both functions are similar and map their inputs to opposite sides of some discriminant. Only, the discriminants are translated while sharing the same slope.
Having the same slope is just incidental given our arbitrary choice of representation. The slopes can very well be different, and depend on multiple factors including what example datasets we work with.
Can we build a simple neural network to demonstrate learning AND? Indeed. Let's first create some sample data. It's synthetic – we already know the solution to the AND problem, so we randomly generate x and y values and then label their z.
(def naive-sample-source [0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0])

;; We're cheating – we're creating our sample here, you see
(defn naive-class-fn [[x y]]
  ;; label the point 1 when x + y clears the 0.5 threshold
  (if (< 0.5 (+ x y)) 1 0))

(defn naive-sample [sample-count]
  (let [f #(rand-nth naive-sample-source)
        xs (->> f repeatedly (take sample-count))
        ys (->> f repeatedly (take sample-count))
        ;; pair xs with ys; zipmap would silently drop pairs sharing an x value
        sample (map vector xs ys)]
    [sample (map naive-class-fn sample)]))

(->> #(rand-nth naive-sample-source) repeatedly (take 10))

(naive-sample 10)
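With the sample in hand, the missing piece is the training loop. Below is a sketch (in Python, for brevity) of the classic perceptron learning rule applied to the same kind of synthetic data; the seed, learning rate, and epoch count are illustrative choices, not from this note:

```python
import random

random.seed(7)  # arbitrary seed, for reproducibility

grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
sample = [(random.choice(grid), random.choice(grid)) for _ in range(200)]
labels = [1 if x + y > 0.5 else 0 for x, y in sample]  # the naive labelling rule

# Perceptron learning rule: nudge the weights toward each misclassified point.
w1 = w2 = bias = 0.0
lr = 0.1  # learning rate
for _ in range(100):  # epochs
    for (x, y), target in zip(sample, labels):
        predicted = 1 if w1 * x + w2 * y + bias >= 0 else 0
        error = target - predicted
        w1 += lr * error * x
        w2 += lr * error * y
        bias += lr * error

accuracy = sum(
    (1 if w1 * x + w2 * y + bias >= 0 else 0) == t
    for (x, y), t in zip(sample, labels)
) / len(sample)
print(accuracy)
```

Because the labelling rule is linearly separable, the perceptron convergence theorem guarantees this loop eventually classifies the whole sample correctly.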
So, what's the conclusion?
Nothing earth-shattering, except that we now have a notion of AND and OR computation for approximately-1 and approximately-0 values of x and y. And, of course, for infinitely many combinations in the entire real-numbers domain.
And, the neural network?
We've seen a neuron in action. Let's dwell further on the class of problems it solves for us.
The AND and OR functions classify their inputs into two groups. What we have above is an absolutely naïve classifier.
It's worth repeating – in the case of simple boolean functions, we understand the behaviour exactly and have precise formulae to represent them, so creating a neural network is superfluous. What makes it interesting is that we may want to deal with real-world noisy x and y values that don't fit into the theoretical bounds (err, exact boolean values) as required.
The premise of (artificial) neural networks is to save us from the trouble of finding specific and precise solutions for arbitrary problems – much as humans learn and acquire new knowledge and skills.
Imagine if we fed the above truth tables to a black box, with x and y as our input and z as the expected outcome. And in turn, this black box learnt the rules and then readied itself to respond with z values for any combination of x and y we threw at it – as long as those input values stayed within reasonable bounds. A definition of reasonable does not exist; it's subjective, and is only meant to satisfy the human(s) that approved of that black box's behaviour at some point.
Not hard, right? We could cache all combinations of the input and corresponding output, and respond with the right values. That's eminently doable for the small size of the training dataset we have. But it breaks down miserably when we unconstrain the input values (add some noise), or deal with unforeseen input values that differ from the training data by wider margins.
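A two-line sketch (hypothetical Python) of the cache breaking down on noisy input:

```python
# Cache the OR truth table, then query it with a slightly noisy input.
cache = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}

print(cache.get((0, 1)))        # 1 - an exact hit
print(cache.get((0.02, 0.97)))  # None - "almost (0, 1)" misses entirely
```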
As a first step towards creating such an entity, we can approximate the black box as a linear regression, and treat training it as the activity of solving this linear regression. The general equation looks as follows for the two-input-signal scenario:
z = (w_{x} x + w_{y} y ≥ c)
Or, making it somewhat more general and rearranging terms to be on one side (with c now absorbing the negated threshold):
z = (c + Σ w_{i} · x_{i} ≥ 0)
Which can again be rewritten more generally as
Z = (c + W^{T} · X ≥ 0)
where W and X are the weight and input vectors, respectively.
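A minimal sketch of this general form in plain Python (W and X as equal-length sequences; note that c here carries the negated threshold once that term crosses the inequality):

```python
def perceptron(W, X, c):
    """Z = (c + W^T . X >= 0): the general linear-discriminant form."""
    return c + sum(w * x for w, x in zip(W, X)) >= 0

W = [1.5, 1.5]
for X in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    # c = -1.5 recovers OR; c = -1.51 recovers AND.
    print(X, perceptron(W, X, -1.5), perceptron(W, X, -1.51))
```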
Let's write some helper code to see this in action with real inputs
(defn or' [x y]
  (or (= x 1) (= y 1)))

(defn and' [x y]
  (and (= x 1) (= y 1)))

(defn xor [x y]
  (not= x y))
| x | y |
|---|---|
| 1 | 0 |
| 1 | 1 |
| 0 | 0 |
| 0 | 1 |
(map (fn [[x y]] (list x y (or' x y))) input-table)

| 1 | 0 | true |
| 1 | 1 | true |
| 0 | 0 | false |
| 0 | 1 | true |
(map (fn [[x y]] (list x y (and' x y))) input-table)

| 1 | 0 | false |
| 1 | 1 | true |
| 0 | 0 | false |
| 0 | 1 | false |
(map (fn [[x y]] (list x y (xor x y))) input-table)

| 1 | 0 | true |
| 1 | 1 | false |
| 0 | 0 | false |
| 0 | 1 | true |
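One aside on the xor entry above (a Python sketch, not from the original note): unlike AND and OR, no single linear discriminant reproduces the XOR table – a brute-force search over candidate lines illustrates this.

```python
import itertools

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
or_truth = [False, True, True, True]
xor_truth = [False, True, True, False]

def classify(w1, w2, c):
    # Evaluate the linear discriminant on all four binary inputs.
    return [w1 * x + w2 * y >= c for x, y in inputs]

# A coarse grid of weights and thresholds: -4.0 to 4.0 in steps of 0.5.
grid = [i / 2 for i in range(-8, 9)]
candidates = list(itertools.product(grid, repeat=3))

or_found = any(classify(w1, w2, c) == or_truth for w1, w2, c in candidates)
xor_found = any(classify(w1, w2, c) == xor_truth for w1, w2, c in candidates)
print(or_found)   # True - e.g. w1 = w2 = 1.5, c = 1.5
print(xor_found)  # False - no line separates XOR's categories
```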
References

A Neural Network Primer – by David W. Croft