softmax

#math
Softmax is a mathematical function that converts a vector of real numbers into a probability distribution. It is commonly used in machine learning and statistics, especially in tasks where you want to interpret the outputs of a model as probabilities.

Given an input vector \( z = (z_1, z_2, \ldots, z_k) \) of \( k \) real numbers, the softmax function computes the output vector \( \sigma(z) = (\sigma(z_1), \sigma(z_2), \ldots, \sigma(z_k)) \), where each component \( \sigma(z_i) \) is calculated as follows:

\[ \sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}} \]

In other words:

\[ \sigma(z_i) = \frac{e^{z_i}}{e^{z_1} + e^{z_2} + \cdots + e^{z_k}} \]

Here's a step-by-step breakdown of how softmax works:

  1. Exponentiation: Compute the exponential of each element in the input vector \( z \). This results in a new vector \( e^z = (e^{z_1}, e^{z_2}, \ldots, e^{z_k}) \).

  2. Summation: Calculate the sum of all elements in the new vector \( e^z \), denoted \( \sum_{j=1}^{k} e^{z_j} \).

  3. Normalization: For each element \( z_i \) in the input vector \( z \), divide \( e^{z_i} \) by the sum \( \sum_{j=1}^{k} e^{z_j} \) obtained in the previous step. This normalization ensures that the output vector \( \sigma(z) \) represents a valid probability distribution: each element lies between 0 and 1, and the sum of all elements is equal to 1.
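The three steps above can be sketched directly in NumPy (a minimal implementation sketch, not a library function; subtracting the maximum before exponentiating is a common numerical-stability trick that leaves the result unchanged, since it multiplies numerator and denominator by the same constant):

```python
import numpy as np

def softmax(z):
    """Compute the softmax of a 1-D vector, following the three steps above."""
    # Step 1: exponentiation (shifted by max(z) for numerical stability).
    exp_z = np.exp(z - np.max(z))
    # Step 2: summation of the exponentials.
    total = exp_z.sum()
    # Step 3: normalization into a probability distribution.
    return exp_z / total

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # each element lies in (0, 1)
print(probs.sum())  # the elements sum to 1
```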

The softmax function is useful in various machine learning tasks, including classification problems. It transforms the raw scores (logits) produced by a model into probabilities that can be interpreted as the model's confidence scores for each class. The class with the highest probability according to the softmax output is typically chosen as the predicted class.

In summary, the softmax function is defined as:

\[ \sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}} \]

for \( i = 1, 2, \ldots, k \), where \( z = (z_1, z_2, \ldots, z_k) \) is the input vector and \( \sigma(z) \) is the output vector representing probabilities.