https://www.geeksforgeeks.org/what-is-data-normalization/
Data normalization is part of the preprocessing stage of any problem. It means scaling the data to be analyzed into a specific range to provide better results. Normalization disposes of various anomalies that can make analysis of the information more complicated, and it also makes the data more clustered.
https://www.youtube.com/watch?v=sxEqtjLC0aM
Scaling can be thought of as just changing the values of the data so that the ratios stay the same but the actual values are smaller.
Scale it down perfectly and you get normalization: the data sits in a smaller range and its ratios are unaffected.
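A tiny sketch of that idea (NumPy, with made-up numbers): dividing every value by the same constant shrinks the range but leaves the ratios between values untouched.

```python
import numpy as np

# Made-up toy values, just for illustration.
x = np.array([500.0, 1000.0, 2000.0])

# Pure scaling: divide everything by the same constant (here, the max).
scaled = x / x.max()

print(scaled)                  # [0.25 0.5  1.  ]
print(x[2] / x[0])             # 4.0 (ratio in the original data)
print(scaled[2] / scaled[0])   # 4.0 (the same ratio after scaling)
```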
There are a few normalization techniques:
- Min-max normalization. Formula: (x - x_min) / (x_max - x_min). The result is between 0 and 1.
- Z-score normalization / standardization. Formula: (x - x_mean) / std. Subtract the mean from x and divide by the standard deviation.
A misconception beginners have is that standardized data follows a normal distribution. This is not true! Standardization only shifts the mean to 0 and rescales the standard deviation to 1; it does not change the shape of the distribution (the sketch below shows this).
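A minimal sketch of both formulas (NumPy, with made-up skewed numbers). It also illustrates the point above: the standardized values come out with mean 0 and standard deviation 1, but the skewed shape of the data stays skewed, so it is not magically normal.

```python
import numpy as np

# Made-up, heavily skewed toy data (clearly not a normal distribution).
x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 50.0])

# Min-max normalization: (x - x_min) / (x_max - x_min), result lies in [0, 1].
min_max = (x - x.min()) / (x.max() - x.min())

# Z-score standardization: (x - mean) / std, result has mean 0 and std 1.
z = (x - x.mean()) / x.std()

print(min_max)                 # all values between 0 and 1
print(z.mean(), z.std())       # ~0.0 and 1.0
print(z)                       # still one big outlier: the shape is still skewed
```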
Standardization is important for:
- Convergence. When the model is trained on features with very different scales (one feature is huge, another is tiny), a single step size cannot suit both: a step small enough for the huge feature barely moves the weight of the tiny one, and a step big enough for the tiny feature overshoots on the huge one, causing unwanted oscillation during convergence. See the gradient-descent sketch at the end of these notes.
- Computing distances correctly. Suppose you have an algorithm that computes distances between points, but the x axis is huge relative to the y axis. The distance is then dominated by x, and differences along y barely count. If you don't scale both features to the same range, you end up comparing the data along the wrong axis/feature. See the distance sketch just below.
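A sketch of the distance point, with made-up points where feature 0 (income in dollars) has a much bigger range than feature 1 (age in years): before scaling, the income axis dominates the Euclidean distance; after min-max scaling each feature, the age differences count too.

```python
import numpy as np

# Made-up points: [income in dollars, age in years].
a = np.array([50_000.0, 25.0])
b = np.array([51_000.0, 60.0])
c = np.array([90_000.0, 26.0])

def euclidean(p, q):
    return np.linalg.norm(p - q)

# Unscaled: income dominates, so a looks far closer to b (similar income,
# very different age) than to c (similar age, very different income).
print(euclidean(a, b), euclidean(a, c))   # ~1000.6 vs ~40000.0

# Min-max scale each feature (column) to [0, 1] across the three points.
pts = np.vstack([a, b, c])
scaled = (pts - pts.min(axis=0)) / (pts.max(axis=0) - pts.min(axis=0))

# After scaling, the age difference matters as much as the income difference:
# a is now roughly equidistant from b and c.
print(euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2]))
```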
(Figure: the data plotted before scaling on the left, after scaling on the right.)
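And a rough sketch of the convergence point (NumPy, made-up data, plain batch gradient descent on a least-squares loss): with one huge-scale feature and one tiny-scale feature, a learning rate small enough to stay stable on the huge feature makes almost no progress on the tiny one, while a bigger rate overshoots and blows up; after standardizing the columns, one ordinary learning rate works for both.

```python
import numpy as np

# Made-up regression problem: one feature in the thousands, one between 0 and 1.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1000, 200),   # huge-scale feature
                     rng.uniform(0, 1, 200)])     # tiny-scale feature
y = X @ np.array([0.5, 2.0])                      # arbitrary "true" weights

def final_loss(X, y, lr, steps=500):
    """Run batch gradient descent and return the final mean squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)

# Unscaled: the learning rate must stay tiny or the update overshoots along the
# huge feature and the loss blows up (try lr=1e-5). At lr=1e-6 it is stable,
# but the weight of the tiny feature barely moves, so the error is still
# nowhere near zero after 500 steps.
print(final_loss(X, y, lr=1e-6))

# Standardize each column (and center y); the same number of steps with a
# single ordinary learning rate now drives the error essentially to zero.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print(final_loss(Xs, y - y.mean(), lr=0.1))
```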