Tuesday, March 17, 2026

Neural networks: derivation and solutions

The framework, not a single algorithm

A neural network is best understood as a framework for an algorithm rather than a single, fixed procedure. The “derivation” is the process of setting up a mathematical system that can learn from data, and the “solution” emerges through iterative refinement.

The building block: the artificial neuron

We start by mimicking a biological neuron mathematically. Each neuron receives multiple inputs x₁, x₂, …, xₙ, each multiplied by a corresponding weight w₁, w₂, …, wₙ that represents the connection’s strength. These products are summed together with a bias term b, producing a weighted sum. This sum is then passed through an activation function f—such as ReLU or sigmoid—which decides whether the neuron “fires” and introduces non‑linearity. The output of a single neuron is therefore f(w₁x₁ + w₂x₂ + … + wₙxₙ + b).
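The single-neuron formula above can be sketched directly in NumPy. The weights, bias, and inputs here are illustrative values, not taken from any particular network:

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z), introduces non-linearity
    return np.maximum(0.0, z)

def neuron(x, w, b, f=relu):
    # f(w1*x1 + w2*x2 + ... + wn*xn + b)
    return f(np.dot(w, x) + b)

# Example: three inputs with illustrative weights
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.1])
b = 0.2
y = neuron(x, w, b)  # relu(0.5 - 0.5 + 0.3 + 0.2) = 0.5
```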

The architecture: stacking the math

Layers of neurons are composed by treating each layer’s outputs as the inputs to the next. The input layer receives raw data, such as image pixels. Hidden layers perform successive matrix multiplications and activation operations; mathematically, layer 2 is a function of layer 1: L₂ = f(W·L₁ + B). The final output layer produces the network’s prediction, for example a probability distribution over classes. The entire network is thus a single, highly composite function F(x) that maps an input x to an output y.
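The composition L₂ = f(W·L₁ + B) can be written as a short loop: each layer is a weight matrix and bias vector, and one layer's output becomes the next layer's input. The layer sizes below are arbitrary, chosen only to show the shapes:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    # Each layer is a (W, b) pair; L_next = f(W @ L_prev + b)
    a = x
    for W, b in layers:
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
layers = [
    (rng.standard_normal((4, 3)), np.zeros(4)),  # input (3) -> hidden (4)
    (rng.standard_normal((2, 4)), np.zeros(2)),  # hidden (4) -> output (2)
]
y = forward(np.array([1.0, 0.5, -0.5]), layers)  # the composite function F(x)
```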

The goal: the loss function

To measure how well the network performs, we define a loss function L (e.g., mean squared error or cross‑entropy) that compares the predicted output y with the true target. This loss creates a high‑dimensional error surface—a landscape that depends on every weight and bias in the network. The goal of training is to find the configuration of weights that minimises this loss.
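The two loss functions named above are each a few lines of NumPy. The prediction and one-hot target below are made-up values for illustration:

```python
import numpy as np

def mse(y_pred, y_true):
    # Mean squared error: average of squared differences
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy(p_pred, y_true):
    # Cross-entropy against a one-hot target; eps avoids log(0)
    eps = 1e-12
    return -np.sum(y_true * np.log(p_pred + eps))

y_pred = np.array([0.7, 0.2, 0.1])   # predicted class probabilities
y_true = np.array([1.0, 0.0, 0.0])   # one-hot true class
mse_loss = mse(y_pred, y_true)        # (0.09 + 0.04 + 0.01) / 3
ce_loss = cross_entropy(y_pred, y_true)  # -log(0.7)
```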

The derivation: calculus and gradient descent

The key step is to determine how each weight influences the loss. Using calculus, we compute the gradient of the loss with respect to every weight—the vector of partial derivatives that points in the direction of steepest increase. By moving the weights in the opposite direction (downhill), we reduce the error. This computation is made efficient by the backpropagation algorithm, which applies the chain rule of calculus to propagate error gradients backward through the network’s layers. The gradient tells us exactly how much each weight contributed to the final mistake.
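For a single sigmoid neuron with a squared-error loss, the chain rule can be written out by hand and checked against a finite-difference estimate. This is a minimal sketch, not a full backpropagation implementation across layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(x, w, b, target):
    # Chain rule: dL/dw = (dL/dy) * (dy/dz) * (dz/dw)
    z = np.dot(w, x) + b            # weighted sum
    y = sigmoid(z)                  # activation
    L = 0.5 * (y - target) ** 2     # squared-error loss
    dL_dy = y - target              # derivative of loss w.r.t. output
    dy_dz = y * (1.0 - y)           # derivative of sigmoid
    delta = dL_dy * dy_dz           # error signal at the neuron
    return L, delta * x, delta      # dz/dw = x, dz/db = 1

x = np.array([1.0, -2.0])
w = np.array([0.3, 0.8])
b = 0.1
L, grad_w, grad_b = loss_and_grads(x, w, b, target=1.0)

# Sanity check: the analytic gradient matches a finite-difference estimate
eps = 1e-6
w_shift = w.copy(); w_shift[0] += eps
L_shift, _, _ = loss_and_grads(x, w_shift, b, target=1.0)
numeric = (L_shift - L) / eps
```

The final comparison is exactly how gradient implementations are usually verified in practice: nudge one weight, re-run the forward pass, and confirm the loss changes at the predicted rate.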

The creation of the solution: iteration

With the gradient in hand, we repeatedly perform a simple loop:

– Forward pass: run the data through the network to obtain a prediction.
– Loss calculation: measure the error using the loss function.
– Backward pass: compute the gradients via backpropagation.
– Update: adjust every weight slightly in the direction that lowers the loss (the negative gradient).

This cycle repeats thousands or millions of times, each step incrementally improving the network. Gradually, the weights settle into a configuration where the overall function F(x) maps inputs to accurate outputs. The network never follows a hand‑coded set of rules; instead, it is sculpted by mathematics and data into a solution that generalises to new examples.
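The four-step loop above can be demonstrated end to end on a toy problem: a single linear neuron fit to the line y = 2x + 1 by gradient descent on mean squared error. The data, learning rate, and step count are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=50)
Y = 2.0 * X + 1.0               # true relationship the network must learn

w, b, lr = 0.0, 0.0, 0.1        # start from arbitrary weights
losses = []
for step in range(200):
    pred = w * X + b                       # forward pass
    loss = np.mean((pred - Y) ** 2)        # loss calculation
    grad_w = np.mean(2 * (pred - Y) * X)   # backward pass (chain rule)
    grad_b = np.mean(2 * (pred - Y))
    w -= lr * grad_w                       # update: step downhill
    b -= lr * grad_b
    losses.append(loss)

# After training, w and b settle near the true values 2 and 1,
# and the loss falls far below its starting value.
```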


In essence, neural networks derive solutions by defining a flexible mathematical framework, measuring its mistakes with a loss function, and using calculus to iteratively adjust its internal parameters until the mistakes are minimised.
