Thursday, August 30, 2018

data normalization

Data normalization is a very important preprocessing step, used to rescale numeric values into a specific range so that backpropagation converges better. If we don't do this, features with high magnitudes will be weighted more heavily in the cost function. Data normalization makes the numeric attributes weighted equally.
source: Data Science and Machine Learning Interview Questions

Common Normalization Functions

Z-Score

Converts all values to a z-score. The values in the column are transformed using the following formula:

    z = (x − μ) / σ

where μ is the column mean and σ the column standard deviation. Mean and standard deviation are computed for each column separately. The population standard deviation is used.
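As a minimal NumPy sketch of the formula above (the sample column is made up for illustration):

```python
import numpy as np

# Hypothetical example column.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# np.std defaults to the population standard deviation (ddof=0),
# matching the description above.
z = (x - x.mean()) / x.std()

# The resulting z-scores have mean 0 and standard deviation 1.
```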

Min-Max

The min-max normalizer linearly rescales every feature to the [0, 1] interval. This is done by shifting the values of each feature so that the minimal value is 0, and then dividing by the new maximal value (which is the difference between the original maximal and minimal values). The values in the column are transformed using the following formula:

    x' = (x − min) / (max − min)
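A short NumPy sketch of min-max scaling (the column values are made up for illustration):

```python
import numpy as np

# Hypothetical example column.
x = np.array([10.0, 20.0, 30.0, 50.0])

# Shift so the minimum becomes 0, then divide by the original range.
x_scaled = (x - x.min()) / (x.max() - x.min())
# → [0.0, 0.25, 0.5, 1.0]
```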

Logistic

The values in the column are transformed using the logistic (sigmoid) function:

    x' = 1 / (1 + e^(−x))
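A minimal NumPy sketch of the logistic transform (the input values are made up for illustration):

```python
import numpy as np

# Hypothetical example column.
x = np.array([-2.0, 0.0, 2.0])

# Logistic (sigmoid) function squashes every value into (0, 1);
# an input of 0 maps to exactly 0.5.
x_logistic = 1.0 / (1.0 + np.exp(-x))
```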

Hyperbolic Tangent

All values are converted to a hyperbolic tangent. The values in the column are transformed using the following formula:

    x' = tanh(x)
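A minimal NumPy sketch of the tanh transform (the input values are made up for illustration):

```python
import numpy as np

# Hypothetical example column.
x = np.array([-2.0, 0.0, 2.0])

# tanh squashes every value into (-1, 1); an input of 0 maps to 0.
x_tanh = np.tanh(x)
```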

source: Normalize Data (Microsoft Azure)

For real arguments, the hyperbolic tangent is an S-shaped curve that maps the whole real line into the interval (−1, 1).
