softmax
#math
Softmax is a mathematical function that converts a vector of real numbers into a probability distribution. It is commonly used in machine learning and statistics, especially in tasks where you want to interpret the outputs of a model as probabilities.
Given an input vector $z = (z_1, z_2, \dots, z_n)$, softmax maps it to a vector $\sigma(z)$ whose $i$-th component is

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$$
Here's a step-by-step breakdown of how softmax works:
- Exponentiation: Compute the exponential of each element in the input vector $z$. This results in a new vector $(e^{z_1}, e^{z_2}, \dots, e^{z_n})$.
- Summation: Calculate the sum of all elements in the new vector, denoted $S = \sum_{j=1}^{n} e^{z_j}$.
- Normalization: For each element $z_i$ in the input vector, divide $e^{z_i}$ by the sum $S$ obtained in the previous step. This normalization ensures that the output vector represents a valid probability distribution, where each element is between 0 and 1, and the sum of all elements is equal to 1.
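The three steps above can be sketched in NumPy. The shift by the maximum is a common numerical-stability trick (not part of the definition); it leaves the result unchanged because the extra factor cancels in the ratio:

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; cancels in the ratio.
    shifted = z - np.max(z)
    # Step 1: exponentiation.
    exps = np.exp(shifted)
    # Step 2: summation.
    total = exps.sum()
    # Step 3: normalization.
    return exps / total

probs = softmax(np.array([1.0, 2.0, 3.0]))
```

Without the max shift, a large input such as `z = [1000.0, 1000.0]` would overflow `np.exp`; with it, the same call returns `[0.5, 0.5]` as expected.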
The softmax function is useful in various machine learning tasks, including classification problems. It transforms the raw scores (logits) produced by a model into probabilities that can be interpreted as the model's confidence scores for each class. The class with the highest probability according to the softmax output is typically chosen as the predicted class.
In summary, the softmax function is defined as:

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}} \quad \text{for } i = 1, \dots, n$$
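A quick numeric check with $z = (1, 2, 3)$:

$$e^{z} \approx (2.718,\ 7.389,\ 20.086), \qquad S \approx 30.193, \qquad \sigma(z) \approx (0.090,\ 0.245,\ 0.665)$$

The outputs sum to 1, and the largest input receives the largest probability.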