Statistics

Cluster Analysis

Basics

Tidyverse is a set of R packages built around a consistent approach to data management. I will use the tidyverse packages to perform cluster analysis and provide this information to other data science teams in the industry.

    library(devtools)
    install_github("kassambara/factoextra")

Introduction to R

Data Preparation and R Packages

Required Packages: dplyr, tidyr, testthat, cluster, factoextra

Data Standardization

We need the ability to transform vectors in our data frames to standard variables.
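As a minimal sketch of what that standardization might look like, the snippet below converts each numeric column to z-scores (subtract the mean, divide by the standard deviation). The helper name `standardize` and the use of the built-in `mtcars` data set are assumptions made for illustration, not taken from the post.

```r
library(dplyr)

# Hypothetical helper: rescale a numeric vector to mean 0 and sd 1
standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Apply the helper to every numeric column
# (mtcars is used here only as a stand-in data frame)
scaled <- mtcars %>%
  mutate(across(where(is.numeric), standardize))

# Each standardized column should now have mean ~0 and sd ~1
summary(scaled$mpg)
```

Base R's `scale()` performs the same centering and scaling on a matrix or data frame; a small named helper is convenient when only selected columns should be transformed inside a dplyr pipeline.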

Continue reading

Statistics: Central Limit Theorem

Overview

The Central Limit Theorem states that when samples drawn from a population are large, the sampling distribution of the sample mean approaches a normal distribution, regardless of the shape of the population from which the samples were drawn (provided the population has finite variance). This is demonstrated by the simulation below, which compares the theoretical mean of the exponential distribution with the mean of the simulated samples. The variance between the theoretical mean and the sample mean is .
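A simulation of this kind could be sketched as follows. The rate `lambda`, the sample size `n`, the number of simulations `n_sims`, and the seed are all assumed values chosen for illustration; the post's own parameters are not shown in this excerpt.

```r
set.seed(42)      # assumed seed, only for reproducibility
lambda <- 0.2     # assumed rate of the exponential distribution
n      <- 40      # assumed size of each sample
n_sims <- 1000    # assumed number of simulated samples

# Draw n_sims samples of size n and keep the mean of each one
sample_means <- replicate(n_sims, mean(rexp(n, rate = lambda)))

# Theoretical mean of the exponential distribution is 1 / lambda
theoretical_mean <- 1 / lambda
simulated_mean   <- mean(sample_means)

# Theoretical standard error of the mean is (1 / lambda) / sqrt(n)
theoretical_se <- (1 / lambda) / sqrt(n)
simulated_se   <- sd(sample_means)

# The histogram of sample means should look roughly bell-shaped,
# even though the exponential distribution itself is strongly skewed
hist(sample_means, breaks = 30,
     main = "Sampling distribution of the mean",
     xlab = "Sample mean")
abline(v = theoretical_mean, lwd = 2)
```

Comparing `simulated_mean` with `theoretical_mean`, and `simulated_se` with `theoretical_se`, gives a quick numerical check alongside the histogram.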

Continue reading