Try MNIST
An attempt at an Artificial Neural Network, based on Source 1
Perceptrons
1 Original representation
- inputs: $x_1, x_2, \ldots$, the inputs to the neuron
- weights: $w_1, w_2, \ldots$, real numbers expressing the importance of the respective inputs
- threshold: decides whether the neuron is activated
$$ \text{output} = \begin{cases} 0 &\text{if}&\sum_j w_j x_j \leq \text{threshold} \newline 1 &\text{if}&\sum_j w_j x_j \gt \text{threshold} \end{cases} \tag{1} $$
2 Simplification
Move the threshold to the other side of the inequality and replace it with the bias, $b \equiv -\text{threshold}$:
$$ \text{output} = \begin{cases} 0 & \text{if}&w\cdot x + b \leq 0 \newline 1 & \text{if}&w\cdot x + b \gt 0 \end{cases} \tag {2} $$
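As a minimal sketch of the decision rule in Eq. (2) (the function name `perceptron_output` and the use of NumPy are my own choices, not from the source):

```python
import numpy as np

def perceptron_output(w, x, b):
    """Perceptron decision rule of Eq. (2): output 1 if w.x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0
```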
Generating basic logical functions
Perceptrons can be used to compute the elementary logical functions we usually think of as underlying computation, such as AND, OR, and NAND.
1 NAND
For example, suppose we have a perceptron with two inputs, each with weight $-2$, and an overall bias of $3$.
Then we see that input 00 produces output 1, since $(-2)\times 0+(-2)\times 0+3=3$ is positive.
Similar calculations show that the inputs 01 and 10 produce output 1.
But the input 11 produces output 0, since $(-2)\times 1+(-2)\times 1+3=-1$ is negative.
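A quick check of this NAND example in code (a hypothetical sketch; the variable names and the printed layout are mine):

```python
# Weights -2, -2 and bias 3, plugged into the decision rule of Eq. (2).
w1, w2, b = -2, -2, 3

for x1 in (0, 1):
    for x2 in (0, 1):
        z = w1 * x1 + w2 * x2 + b
        output = 1 if z > 0 else 0
        print(f"{x1}{x2} -> z = {z:+d}, output = {output}")
# 00 -> z = +3, output = 1
# 01 -> z = +1, output = 1
# 10 -> z = +1, output = 1
# 11 -> z = -1, output = 0   (the NAND truth table)
```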
Sigmoid function
1 Definition
$$ \sigma(z) \equiv \frac{1}{1+e^{-z}}\tag{3} $$
2 Substituting z
$$ \frac{1}{1+\exp(-\sum_j w_j x_j-b)}\tag{4} $$
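In code, a sigmoid neuron might look like the sketch below, with Eq. (3) for $\sigma$ and Eq. (4) for the output (the function names are my own, not from the source):

```python
import numpy as np

def sigmoid(z):
    """Eq. (3): sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron_output(w, x, b):
    """Eq. (4): sigma applied to the weighted input w.x + b."""
    return sigmoid(np.dot(w, x) + b)
```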
Total differential
$$ \Delta \text{output} \approx \sum_j \frac{\partial \, \text{output}}{\partial w_j}\Delta w_j + \frac{\partial \, \text{output}}{\partial b} \Delta b, \tag{5} $$
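To see the approximation in Eq. (5) numerically, one can compare the exact change in a single sigmoid neuron's output against the linear prediction from its partial derivatives. The concrete weights, inputs, and perturbations below are made-up values chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A single sigmoid neuron: output = sigma(w.x + b)
w = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
b = 0.1

z = np.dot(w, x) + b
out = sigmoid(z)
sprime = out * (1.0 - out)        # sigma'(z) = sigma(z)(1 - sigma(z))

# Small changes in the weights and bias
dw = np.array([0.01, -0.02])
db = 0.005

# Exact change in the output
exact = sigmoid(np.dot(w + dw, x) + b + db) - out

# Eq. (5): d(output)/dw_j = sigma'(z) * x_j and d(output)/db = sigma'(z)
approx = np.dot(sprime * x, dw) + sprime * db

print(exact, approx)              # the two values should be very close
```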
Cost function
To quantify how well the network is learning to approximate the desired outputs, we define a cost function, sometimes referred to as a loss or objective function:
$$ C(w,b) \equiv\frac{1}{2n} \sum_x \| y(x) - a\|^2\tag{6} $$
Here $w$ denotes all the weights in the network, $b$ all the biases, $n$ is the total number of training inputs, $a$ is the vector of outputs from the network when $x$ is input, and the sum runs over all training inputs $x$.
For example, if a particular training image, $x$, depicts a $6$, then $y(x)=(0,0,0,0,0,0,1,0,0,0)^T$ is the desired output from the network.
We’ll call $C$ the quadratic cost function; it’s also sometimes known as the mean squared error or just MSE.
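A straightforward way to compute Eq. (6) for a handful of training examples might look like the sketch below; `one_hot` and `quadratic_cost` are names I chose, and the sample activations are made up:

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Desired output y(x): a 6 becomes (0,0,0,0,0,0,1,0,0,0)."""
    y = np.zeros(num_classes)
    y[digit] = 1.0
    return y

def quadratic_cost(ys, activations):
    """Eq. (6): C = (1 / 2n) * sum over x of ||y(x) - a||^2."""
    n = len(ys)
    return sum(np.sum((y - a) ** 2) for y, a in zip(ys, activations)) / (2.0 * n)

# Two made-up network outputs for images labelled 6 and 1
ys = [one_hot(6), one_hot(1)]
activations = [np.full(10, 0.1), np.full(10, 0.1)]
print(quadratic_cost(ys, activations))
```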
1 Gradient descent
The aim of our training algorithm will be to minimize the cost $C(w,b)$ as a function of the weights and biases. In other words, we want to find a set of weights and biases which make the cost as small as possible. We’ll do that using an algorithm known as gradient descent.
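The basic update is $v \rightarrow v' = v - \eta \nabla C$ for a small positive learning rate $\eta$. Below is a minimal sketch of that loop on a toy cost, not the network training code; the toy cost $C(v) = v_1^2 + v_2^2$, the learning rate, and the step count are arbitrary choices of mine:

```python
import numpy as np

def gradient_descent(grad_C, v0, eta=0.1, steps=100):
    """Repeatedly apply the update v -> v - eta * grad_C(v)."""
    v = np.array(v0, dtype=float)
    for _ in range(steps):
        v = v - eta * grad_C(v)
    return v

# Toy cost C(v) = v1^2 + v2^2, whose gradient is 2v; minimum at the origin.
v_min = gradient_descent(lambda v: 2.0 * v, v0=[3.0, -4.0])
print(v_min)   # very close to [0, 0]
```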