Basics
The tidyverse is a collection of R packages that share a consistent approach to data management. I will use the tidyverse packages to perform cluster analysis and share the results with other data science teams in the industry.
library(devtools)
install_github("kassambara/factoextra")
Introduction to R
Data Preparation and R Packages
Required Packages: dplyr, tidyr, testthat, cluster, factoextra
Data Standardization
We need the ability to transform the vectors in our data frames into standardized variables.
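As a minimal sketch of the standardization step, the snippet below rescales every numeric column to mean 0 and standard deviation 1 with dplyr. The helper name `standardize` and the choice of the built-in `USArrests` data set are my assumptions for illustration; the original post may use different data and a different helper.

```r
library(dplyr)

# Hypothetical helper (not from the original post):
# center a vector to mean 0 and scale it to standard deviation 1.
standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# USArrests ships with R and is used here only as example data.
standardized <- USArrests %>%
  mutate(across(where(is.numeric), standardize))

# Each column should now have a mean close to 0 and a standard deviation of 1.
summary_stats <- sapply(standardized, function(col) c(mean = mean(col), sd = sd(col)))
print(round(summary_stats, 3))
```

Standardizing first matters for cluster analysis because distance-based methods would otherwise be dominated by whichever variable has the largest scale.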
Overview The Central Limit Theorem states that when samples drawn from a population are large, the sampling distribution of the sample mean approximates a normal distribution regardless of the shape of the population from which the samples were drawn. This is borne out by the simulation below, which compares the theoretical mean of the exponential distribution with the mean of the simulated samples. The variance between the theoretical mean and the sample mean is .
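A simulation of this kind can be sketched in a few lines of base R. The rate `lambda`, the sample size `n`, the number of simulations, and the seed are all assumed values chosen for illustration; the excerpt does not state the ones used in the original analysis.

```r
set.seed(42)      # assumed seed, only for reproducibility

lambda <- 0.2     # assumed rate of the exponential distribution
n      <- 40      # assumed size of each sample
n_sims <- 1000    # assumed number of simulated samples

# Draw n_sims samples of size n and keep the mean of each one.
sample_means <- replicate(n_sims, mean(rexp(n, rate = lambda)))

# For an exponential distribution, the mean and standard deviation are both 1 / lambda.
theoretical_mean <- 1 / lambda
theoretical_se   <- (1 / lambda) / sqrt(n)

observed_mean <- mean(sample_means)
observed_se   <- sd(sample_means)

cat("Theoretical mean:", theoretical_mean, " Mean of sample means:", observed_mean, "\n")
cat("Theoretical SE:  ", theoretical_se,   " SD of sample means:  ", observed_se, "\n")
```

Although the exponential distribution is strongly skewed, the means of the simulated samples cluster around `1 / lambda` with a spread close to the theoretical standard error.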
Overview This data set comes from a 1952 experiment that demonstrates the impact of Vitamin C on the growth of guinea pigs' teeth. The response is the length of teeth in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid). The analysis below determines whether the two supplement types have different impacts on the growth of the guinea pigs' teeth.
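One way to compare the two supplement types is a two-sample t-test on tooth length. The sketch below assumes the data set is the `ToothGrowth` data that ships with R (columns `len`, `supp`, `dose`), which matches the description above. The choice of a Welch t-test is also my assumption, not necessarily the test used in the original analysis.

```r
# ToothGrowth is part of R's built-in datasets package.
data(ToothGrowth)

# Mean tooth length per supplement type (OJ = orange juice, VC = ascorbic acid).
group_means <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)
print(group_means)

# Welch two-sample t-test: does mean tooth length differ between supplements?
result <- t.test(len ~ supp, data = ToothGrowth)
print(result)
```

A fuller analysis would also repeat the comparison within each dose level, since the effect of the delivery method may depend on the dose.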
Using Natural Language Processing (NLP) to predict text is interesting. I built a small tool to test the NLP process. The video shows how to use this tool to predict text. I embedded the video in the application I wrote, which uses the NLP model I built to predict the next word after the words you have entered. I developed this during the online Data Science Capstone course from Johns Hopkins University.
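To give a feel for next-word prediction, here is a toy bigram sketch in base R. It is not the model from the application, whose details the excerpt does not describe. The three-sentence corpus and the function name `predict_next` are made up for illustration.

```r
# Toy corpus, invented for this example.
corpus <- c(
  "the cat sat on the mat",
  "the cat ate the fish",
  "the dog sat on the rug"
)

# Split each sentence into lowercase word tokens.
tokens_list <- strsplit(tolower(corpus), "\\s+")

# Build a table of adjacent word pairs (bigrams) within each sentence.
bigrams <- do.call(rbind, lapply(tokens_list, function(tokens) {
  data.frame(
    first  = head(tokens, -1),
    second = tail(tokens, -1),
    stringsAsFactors = FALSE
  )
}))

# Hypothetical helper: return the word that most often follows `word`,
# or NA if the word was never seen in the corpus.
predict_next <- function(word) {
  candidates <- bigrams$second[bigrams$first == tolower(word)]
  if (length(candidates) == 0) {
    return(NA_character_)
  }
  names(sort(table(candidates), decreasing = TRUE))[1]
}

print(predict_next("the"))
print(predict_next("sat"))
```

A practical predictor would train on a much larger corpus, use longer n-grams, and back off to shorter ones when a phrase has not been seen.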
Neo4j in the language R I have been fortunate to work with the Neo4j graph database using the language R over the last two years. Nicole White, a former data scientist at Neo4j, implemented the package RNeo4j and released it in 2015. This package enabled R developers and data scientists to access the Neo4j database from R. The package uses the Neo4j REST API and then processes the data in a data.
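A rough sketch of what querying Neo4j through RNeo4j can look like is shown below. It uses the package's `startGraph()` and `cypher()` functions. The wrapper name `run_cypher`, the URL, the credentials, and the query are placeholders I made up, and the example assumes a Neo4j server that still exposes the legacy REST endpoint.

```r
# Hypothetical wrapper (not part of RNeo4j): connect to a Neo4j server
# over its REST API and return the result of a Cypher query.
run_cypher <- function(url, username, password, query) {
  graph <- RNeo4j::startGraph(url, username = username, password = password)
  RNeo4j::cypher(graph, query)
}

# Example call, commented out because it needs a running server.
# The URL, credentials, and query are placeholders.
# people <- run_cypher(
#   url      = "http://localhost:7474/db/data/",
#   username = "neo4j",
#   password = "password",
#   query    = "MATCH (p:Person) RETURN p.name AS name LIMIT 10"
# )
# head(people)
```

Keeping the connection details in one small function makes it easier to swap servers or credentials without touching the queries themselves.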