Mind: How to Build a Neural Network (Part One)

Steven Miller, Engineering Manager at Segment
Monday, 10 August 2015

Artificial neural networks are statistical learning models, inspired by biological neural networks (central nervous systems, such as the brain), that are used in machine learning. These networks are represented as systems of interconnected "neurons", which send messages to each other. The connections within the network can be systematically adjusted based on inputs and outputs, making them ideal for supervised learning.

Neural networks can be intimidating, especially for people with little experience in machine learning and cognitive science! However, through code, this tutorial will explain how neural networks operate. By the end, you will know how to build your own flexible, learning network, similar to Mind. The only prerequisites are a basic understanding of JavaScript, high-school calculus, and simple matrix operations. Other than that, you don't need to know anything. Have fun!

Understanding the Mind

A neural network is a collection of "neurons" with "synapses" connecting them. The collection is organized into three main parts: the input layer, the hidden layer, and the output layer. Note that you can have n hidden layers, with the term "deep" learning implying multiple hidden layers.

(Screenshot taken from this great introductory video, which trains a neural network to predict a test score based on hours spent studying and sleeping the night before.)

Hidden layers are necessary when the neural network has to make sense of something really complicated, contextual, or non-obvious, like image recognition. The term "deep" learning came from having many hidden layers. These layers are known as "hidden", since they are not visible as a network output. Read more about hidden layers here and here.

The circles represent neurons and the lines represent synapses. Synapses take the input and multiply it by a "weight" (the "strength" of the input in determining the output). Neurons add the outputs from all synapses and apply an activation function.

Training a neural network basically means calibrating all of the "weights" by repeating two key steps: forward propagation and back propagation.

Since neural networks are great for regression, the best input data are numbers (as opposed to discrete values, like colors or movie genres, whose data is better suited to statistical classification models). The output data will be a number within a range like 0 and 1 (this ultimately depends on the activation function; more on this below).

In forward propagation, we apply a set of weights to the input data and calculate an output. For the first forward propagation, the set of weights is selected randomly.

In back propagation, we measure the margin of error of the output and adjust the weights accordingly to decrease the error.

Neural networks repeat both forward and back propagation until the weights are calibrated to accurately predict an output.

Next, we'll walk through a simple example of training a neural network to function as an "Exclusive or" ("XOR") operation, to illustrate each step in the training process.

Forward Propagation

Note that all calculations will show figures truncated to the thousandths place.

The XOR function can be represented by the mapping of the below inputs and outputs, which we'll use as training data. It should provide a correct output given any input acceptable by the XOR function.

input | output
--------------
0, 0  |   0
0, 1  |   1
1, 0  |   1
1, 1  |   0
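As a quick aside, here is how that training data might be laid out in JavaScript. The { input, output } shape is a hypothetical layout for illustration, not necessarily the exact format Mind expects:

```javascript
// The XOR truth table above, expressed as an array of training
// examples, each pairing an input vector with its expected output.
const trainingData = [
  { input: [0, 0], output: [0] },
  { input: [0, 1], output: [1] },
  { input: [1, 0], output: [1] },
  { input: [1, 1], output: [0] }
];
```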
Let's use the last row from the above table, (1, 1) => 0, to demonstrate forward propagation. Note that we use a single hidden layer with only three neurons for this example.

We now assign weights to all of the synapses. Note that these weights are selected randomly (based on a Gaussian distribution) since it is the first time we're forward propagating. The initial weights will be between 0 and 1, but note that the final weights don't need to be.

We sum the product of the inputs with their corresponding set of weights to arrive at the first values for the hidden layer. You can think of the weights as measures of influence the input nodes have on the output.

1 * 0.8 + 1 * 0.2 = 1
1 * 0.4 + 1 * 0.9 = 1.3
1 * 0.3 + 1 * 0.5 = 0.8

We write these sums smaller, inside the circles, because they're not the final value.

To get the final value, we apply the activation function to the hidden layer sums. The purpose of the activation function is to transform the input signal into an output signal; activation functions are necessary for neural networks to model complex non-linear patterns that simpler models might miss.

There are many types of activation functions: linear, sigmoid, hyperbolic tangent, even step-wise. To be honest, I don't know why one function is better than another. (Table taken from this paper.)

For our example, let's use the sigmoid function for activation:

S(x) = 1 / (1 + e^-x)

And applying S(x) to the three hidden layer sums, we get:

S(1.0) = 0.73105857863
S(1.3) = 0.78583498304
S(0.8) = 0.68997448112

We add that to our neural network as the hidden layer results.

Then, we sum the product of the hidden layer results with the second set of weights (also determined at random the first time around) to determine the output sum.

0.73 * 0.3 + 0.79 * 0.5 + 0.69 * 0.9 = 1.235

..finally we apply the activation function to get the final output result.

S(1.235) = 0.7746924929149283

This is our full diagram.
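To make the walkthrough concrete, here is a minimal JavaScript sketch of this forward pass, using the same example weights and the sigmoid activation. The variable names are mine, and this is an illustration of the steps above, not Mind's actual implementation:

```javascript
// Sigmoid activation: S(x) = 1 / (1 + e^-x)
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

const input = [1, 1]; // the (1, 1) => 0 training example

// Weights from the two input neurons to the three hidden neurons.
const inputToHidden = [
  [0.8, 0.4, 0.3], // w1, w2, w3 (from input 1)
  [0.2, 0.9, 0.5]  // w4, w5, w6 (from input 2)
];

// Weights from the hidden neurons to the output neuron (w7, w8, w9).
const hiddenToOutput = [0.3, 0.5, 0.9];

// Weighted sums into each hidden neuron: [1, 1.3, 0.8]
const hiddenSums = [0, 1, 2].map(j =>
  input[0] * inputToHidden[0][j] + input[1] * inputToHidden[1][j]
);

// Apply the activation function: [0.731..., 0.785..., 0.689...]
const hiddenResults = hiddenSums.map(sigmoid);

// Weighted sum into the output neuron, then activate.
const outputSum = hiddenResults.reduce(
  (sum, h, j) => sum + h * hiddenToOutput[j], 0
);
const output = sigmoid(outputSum);

console.log(output); // ≈ 0.774 (the post truncates intermediate sums and shows 0.7746...)
```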
Because we used a random set of initial weights, the value of the output neuron is off the mark; in this case by +0.77 (since the target is 0). If we stopped here, this set of weights would be a great neural network for inaccurately representing the XOR operation. Let's fix that by using back propagation to adjust the weights and improve the network!

Back Propagation

To improve our model, we first have to quantify just how wrong our predictions are. Then, we adjust the weights accordingly so that the margin of error is decreased.

Similar to forward propagation, back propagation calculations occur at each "layer". We begin by changing the weights between the hidden layer and the output layer. Calculating the incremental change to these weights happens in two steps: 1) we find the margin of error of the output result (what we get after applying the activation function) to back out the necessary change in the output sum (we call this the delta output sum), and 2) we extract the change in weights by multiplying the delta output sum by the hidden layer results.

The output sum margin of error is the target output result minus the calculated output result:

output sum margin of error = target - calculated

And doing the math:

Target = 0
Calculated = 0.77
Target - calculated = -0.77

To calculate the necessary change in the output sum, or delta output sum, we take the derivative of the activation function and apply it to the output sum. In our example, the activation function is the sigmoid function. To refresh your memory, the activation function, sigmoid, takes the sum and returns the result:

S(sum) = 1 / (1 + e^-sum) = result

So the derivative of sigmoid, also known as sigmoid prime, will give us the rate of change (or "slope") of the activation function at the output sum:

S'(sum) = S(sum) * (1 - S(sum))

Since the output sum margin of error is the difference in the result, we can simply multiply that by the rate of change to give us the delta output sum. Conceptually, this means that the change in the output sum is the margin of error scaled by the slope of the activation function at the output sum. Doing the actual math, we get:

Delta output sum = S'(sum) * (output sum margin of error)
Delta output sum = S'(1.235) * (-0.77)
Delta output sum = -0.13439890643886018

Here is a graph of the sigmoid function to give you an idea of how we are using the derivative to move the input in the right direction. Note that this graph is not to scale.

Now that we have the proposed change in the output layer sum (-0.13), let's use this in the derivative of the output sum function to determine the new change in weights.

As a reminder, the mathematical definition of the output sum is the product of the hidden layer results and the weights between the hidden and output layer:

output sum = hidden result 1 * w7 + hidden result 2 * w8 + hidden result 3 * w9

The derivative of the output sum with respect to the weights is:

delta output sum = hidden layer results * delta weights

..which can also be represented as:

delta weights = delta output sum / hidden layer results

This relationship suggests that a greater change in the output sum yields a greater change in the weights; neurons with the biggest contribution (higher weight to the output neuron) should experience more change in the connecting synapse.

Let's do the math:

hidden result 1 = 0.73105857863
hidden result 2 = 0.78583498304
hidden result 3 = 0.68997448112

Delta weights = delta output sum / hidden layer results
Delta weights = -0.1344 / [0.73105, 0.78583, 0.69997]
Delta weights = [-0.1838, -0.1710, -0.1920]

old w7 = 0.3
old w8 = 0.5
old w9 = 0.9

new w7 = 0.1162
new w8 = 0.329
new w9 = 0.708
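Here is a short JavaScript sketch of this first stage of back propagation, following the post's formulas exactly as written (note that it divides the delta output sum by the hidden results); the names are mine:

```javascript
// Derivative of sigmoid (sigmoid prime): S'(x) = S(x) * (1 - S(x))
function sigmoidPrime(x) {
  const s = 1 / (1 + Math.exp(-x));
  return s * (1 - s);
}

const target = 0;
const calculated = 0.77; // truncated output from forward propagation
const outputSum = 1.235;

// Delta output sum = S'(output sum) * (target - calculated)
const deltaOutputSum = sigmoidPrime(outputSum) * (target - calculated);
console.log(deltaOutputSum); // ≈ -0.1344

// Delta weights = delta output sum / hidden layer results
const hiddenResults = [0.73105, 0.78583, 0.69997];
const deltaWeights = hiddenResults.map(h => deltaOutputSum / h);
console.log(deltaWeights); // ≈ [-0.1838, -0.1710, -0.1920]

// New weights = old weights + delta weights
const oldWeights = [0.3, 0.5, 0.9]; // w7, w8, w9
const newWeights = oldWeights.map((w, j) => w + deltaWeights[j]);
console.log(newWeights); // ≈ [0.1162, 0.329, 0.708]
```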
To determine the change in the weights between the input and hidden layers, we perform a similar, but notably different, set of calculations. Note that in the following calculations, we use the initial weights instead of the recently adjusted weights from the first part of the back propagation.

Remember that the relationship between the hidden result, the weights between the hidden and output layer, and the output sum is:

output sum = hidden layer results * hidden-to-output weights

Instead of deriving for the output sum, let's derive for the hidden result as a function of the output sum, to ultimately find the delta hidden sum:

delta hidden result = delta output sum / hidden-to-output weights

Also, remember that the change in the hidden result is related to the change in the hidden sum through the slope of the activation function. Multiplying by the sigmoid prime of the hidden sum, all of the pieces in the equation can be calculated, so we can determine the delta hidden sum:

Delta hidden sum = delta output sum / hidden-to-output weights * S'(hidden sum)
Delta hidden sum = -0.1344 / [0.3, 0.5, 0.9] * S'([1, 1.3, 0.8])
Delta hidden sum = [-0.448, -0.2688, -0.1493] * [0.1966, 0.1683, 0.2139]
Delta hidden sum = [-0.088, -0.0452, -0.0319]

Once we get the delta hidden sum, we calculate the change in the weights between the input and hidden layer by dividing it by the input data, (1, 1). The input data here plays the role that the hidden results played in the earlier back propagation step for the hidden-to-output weights. Here is the derivation of that relationship, similar to the one before:

delta weights = delta hidden sum / input data

Let's do the math:

input 1 = 1
input 2 = 1

Delta weights = delta hidden sum / input data
Delta weights = [-0.088, -0.0452, -0.0319] / [1, 1]
Delta weights = [-0.088, -0.0452, -0.0319, -0.088, -0.0452, -0.0319]

old w1 = 0.8
old w2 = 0.4
old w3 = 0.3
old w4 = 0.2
old w5 = 0.9
old w6 = 0.5

new w1 = 0.712
new w2 = 0.3548
new w3 = 0.2681
new w4 = 0.112
new w5 = 0.8548
new w6 = 0.4681

Here are the new weights, right next to the initial random starting weights, for comparison:

old         new
---------------------
w1: 0.8     w1: 0.712
w2: 0.4     w2: 0.3548
w3: 0.3     w3: 0.2681
w4: 0.2     w4: 0.112
w5: 0.9     w5: 0.8548
w6: 0.5     w6: 0.4681
w7: 0.3     w7: 0.1162
w8: 0.5     w8: 0.329
w9: 0.9     w9: 0.708

Once we arrive at the adjusted weights, we start again with forward propagation. When training a neural network, it is common to repeat both of these processes thousands of times (by default, Mind iterates 10,000 times).

And doing a quick forward propagation, we can see that the final output here is a little closer to the expected output. Through just one iteration of forward and back propagation, we've already improved the network!

Check out this short video for a great explanation of identifying global minima in a cost function as a way to determine necessary weight changes.

If you enjoyed learning about how neural networks work, check out Part Two of this post to learn how to build your own neural network.
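To tie the walkthrough together, here is one full iteration (forward propagation plus both stages of back propagation) as a single runnable JavaScript script. It reproduces the post's arithmetic as written, including the division-based update formulas (fine for this example's nonzero values); all names are mine, and this is a sketch of the worked example above, not Mind's actual implementation:

```javascript
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

function sigmoidPrime(x) {
  const s = sigmoid(x);
  return s * (1 - s);
}

const input = [1, 1]; // training example (1, 1) => 0
const target = 0;

// Initial random weights from the example.
let inputToHidden = [
  [0.8, 0.4, 0.3], // w1, w2, w3 (from input 1)
  [0.2, 0.9, 0.5]  // w4, w5, w6 (from input 2)
];
let hiddenToOutput = [0.3, 0.5, 0.9]; // w7, w8, w9

function iterate() {
  // Forward propagation.
  const hiddenSums = [0, 1, 2].map(j =>
    input[0] * inputToHidden[0][j] + input[1] * inputToHidden[1][j]
  );
  const hiddenResults = hiddenSums.map(sigmoid);
  const outputSum = hiddenResults.reduce(
    (sum, h, j) => sum + h * hiddenToOutput[j], 0
  );
  const output = sigmoid(outputSum);

  // Back propagation, stage one: delta output sum, then w7..w9.
  const deltaOutputSum = sigmoidPrime(outputSum) * (target - output);
  const newHiddenToOutput = hiddenToOutput.map(
    (w, j) => w + deltaOutputSum / hiddenResults[j]
  );

  // Stage two: delta hidden sum (computed with the initial,
  // unadjusted hidden-to-output weights), then w1..w6.
  const deltaHiddenSum = hiddenToOutput.map(
    (w, j) => (deltaOutputSum / w) * sigmoidPrime(hiddenSums[j])
  );
  inputToHidden = inputToHidden.map((row, i) =>
    row.map((w, j) => w + deltaHiddenSum[j] / input[i])
  );
  hiddenToOutput = newHiddenToOutput;

  return output; // the output computed before this iteration's update
}

// The post notes that Mind repeats this process 10,000 times by default;
// even a single iteration already moves the output toward the target.
console.log(iterate()); // ≈ 0.774
console.log(iterate()); // ≈ 0.69, a little closer to the target of 0
```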