https://www.geeksforgeeks.org/what-is-data-normalization/
Data normalization is part of the preprocessing stage of any problem. It means scaling the data to be analyzed into a specific range to provide better results. Normalization disposes of various anomalies that can make analysis of the information more complicated, and it also makes the data more clustered.
https://www.youtube.com/watch?v=sxEqtjLC0aM
Scaling can be thought of as just changing the values of the data so that the ratios stay the same but the actual values are smaller.
Scale it down perfectly and you get normalization: the data sits in a smaller range and its ratios are unaffected.
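A tiny sketch of that idea (NumPy, with made-up numbers): dividing every value by the same constant shrinks the range but leaves the ratios between values untouched.

```python
import numpy as np

# Made-up toy values, just for illustration.
x = np.array([500.0, 1000.0, 2000.0])

# Pure scaling: divide everything by the same constant (here, the max).
scaled = x / x.max()

print(scaled)                  # [0.25 0.5  1.  ]
print(x[2] / x[0])             # 4.0 (ratio in the original data)
print(scaled[2] / scaled[0])   # 4.0 (the same ratio after scaling)
```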
There are a few normalization techniques:
- Min-max normalization. Formula: (x - x_min) / (x_max - x_min). The result is between 0 and 1.
- Z-score normalization / standardization. Formula: (x - x_mean) / std. Subtract the mean from x and divide by the standard deviation.
A misconception beginners have is that standardized data follows a normal distribution. This is not true! Standardization only shifts the mean to 0 and rescales the standard deviation to 1; it does not change the shape of the distribution (the sketch below shows this).
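A minimal sketch of both formulas (NumPy, with made-up skewed numbers). It also illustrates the point above: the standardized values come out with mean 0 and standard deviation 1, but the skewed shape of the data stays skewed, so it is not magically normal.

```python
import numpy as np

# Made-up, heavily skewed toy data (clearly not a normal distribution).
x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 50.0])

# Min-max normalization: (x - x_min) / (x_max - x_min), result lies in [0, 1].
min_max = (x - x.min()) / (x.max() - x.min())

# Z-score standardization: (x - mean) / std, result has mean 0 and std 1.
z = (x - x.mean()) / x.std()

print(min_max)                 # all values between 0 and 1
print(z.mean(), z.std())       # ~0.0 and 1.0
print(z)                       # still one big outlier: the shape is still skewed
```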
Standardization is important for:
- Convergence. When the model is trained on features with very different scales (one feature is huge, another is tiny), a single step size cannot suit both: a step small enough for the huge feature barely moves the weight of the tiny one, and a step big enough for the tiny feature overshoots on the huge one, causing unwanted oscillation during convergence. See the gradient-descent sketch at the end of these notes.
- Computing distances correctly. Suppose you have an algorithm that computes distances between points, but the x axis is huge relative to the y axis. The distance is then dominated by x, and differences along y barely count. If you don't scale both features to the same range, you end up comparing the data along the wrong axis/feature. See the distance sketch just below.
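A sketch of the distance point, with made-up points where feature 0 (income in dollars) has a much bigger range than feature 1 (age in years): before scaling, the income axis dominates the Euclidean distance; after min-max scaling each feature, the age differences count too.

```python
import numpy as np

# Made-up points: [income in dollars, age in years].
a = np.array([50_000.0, 25.0])
b = np.array([51_000.0, 60.0])
c = np.array([90_000.0, 26.0])

def euclidean(p, q):
    return np.linalg.norm(p - q)

# Unscaled: income dominates, so a looks far closer to b (similar income,
# very different age) than to c (similar age, very different income).
print(euclidean(a, b), euclidean(a, c))   # ~1000.6 vs ~40000.0

# Min-max scale each feature (column) to [0, 1] across the three points.
pts = np.vstack([a, b, c])
scaled = (pts - pts.min(axis=0)) / (pts.max(axis=0) - pts.min(axis=0))

# After scaling, the age difference matters as much as the income difference:
# a is now roughly equidistant from b and c.
print(euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2]))
```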
(Figure: the data plotted before scaling on the left, after scaling on the right.)
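And a rough sketch of the convergence point (NumPy, made-up data, plain batch gradient descent on a least-squares loss): with one huge-scale feature and one tiny-scale feature, a learning rate small enough to stay stable on the huge feature makes almost no progress on the tiny one, while a bigger rate overshoots and blows up; after standardizing the columns, one ordinary learning rate works for both.

```python
import numpy as np

# Made-up regression problem: one feature in the thousands, one between 0 and 1.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1000, 200),   # huge-scale feature
                     rng.uniform(0, 1, 200)])     # tiny-scale feature
y = X @ np.array([0.5, 2.0])                      # arbitrary "true" weights

def final_loss(X, y, lr, steps=500):
    """Run batch gradient descent and return the final mean squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)

# Unscaled: the learning rate must stay tiny or the update overshoots along the
# huge feature and the loss blows up (try lr=1e-5). At lr=1e-6 it is stable,
# but the weight of the tiny feature barely moves, so the error is still
# nowhere near zero after 500 steps.
print(final_loss(X, y, lr=1e-6))

# Standardize each column (and center y); the same number of steps with a
# single ordinary learning rate now drives the error essentially to zero.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print(final_loss(Xs, y - y.mean(), lr=0.1))
```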