Science

Cluster Analysis

Basics

The tidyverse is a collection of R packages for importing, tidying, and transforming data. I will use the tidyverse packages to perform cluster analysis and share the results with other data science teams in industry.

library(devtools)
install_github("kassambara/factoextra")

Introduction to R

Data Preparation and R Packages

Required packages: dplyr, tidyr, testthat, cluster, factoextra

Data Standardization

We need to be able to transform the vectors (columns) in our data frames into standardized variables.
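The standardization step, followed by a first clustering pass, might look roughly like the sketch below. It is only an illustration under assumptions: the built-in USArrests data set stands in for the post's data, and k = 4 clusters is an arbitrary example value, not a choice taken from the post. It uses base R's scale() and kmeans() plus silhouette() from the cluster package.

```r
# Assumption: USArrests (ships with base R) stands in for the post's data set.
data("USArrests")

# Drop rows with missing values before standardizing.
df <- na.omit(USArrests)

# Standardize each column to mean 0 and standard deviation 1 (z-scores).
df_scaled <- scale(df)

# Check: column means are ~0 and standard deviations are 1.
round(colMeans(df_scaled), 10)
apply(df_scaled, 2, sd)

# k-means with k = 4 (illustrative value, not taken from the post).
set.seed(123)
km <- kmeans(df_scaled, centers = 4, nstart = 25)
table(km$cluster)

# Average silhouette width as a rough measure of cluster separation.
library(cluster)
sil <- silhouette(km$cluster, dist(df_scaled))
mean(sil[, "sil_width"])
```

Standardizing first matters because k-means uses Euclidean distance, so a variable measured on a larger scale would otherwise dominate the clustering. Setting nstart = 25 runs k-means from 25 random starting configurations and keeps the best one. If factoextra is installed, fviz_cluster(km, data = df_scaled) plots the resulting clusters.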


Statistical Analysis: Effect of Vitamin C on Tooth Growth in Guinea Pigs

Overview

This data set comes from a 1952 experiment on the effect of vitamin C on tooth growth in guinea pigs. The response is tooth length, measured in 60 guinea pigs: 10 animals at each combination of three vitamin C dose levels (0.5, 1, and 2 mg) and two delivery methods (orange juice or ascorbic acid). The analysis below tests whether the two supplement types have different effects on tooth growth.
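The comparison described above can be sketched in base R with the ToothGrowth data set, which ships with R (columns len, supp with levels OJ and VC, and dose). The choice of Welch two-sample t-tests, both overall and within each dose, is an assumption for illustration; the post excerpt does not say which test the author used.

```r
# ToothGrowth ships with base R: 60 rows, columns len, supp (OJ/VC), dose.
data("ToothGrowth")
str(ToothGrowth)

# Mean tooth length for each supplement/dose combination.
aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Welch two-sample t-test: does mean length differ between OJ and VC overall?
overall <- t.test(len ~ supp, data = ToothGrowth)
overall$p.value

# Repeat the supplement comparison within each dose level.
by_dose <- sapply(split(ToothGrowth, ToothGrowth$dose), function(d)
  t.test(len ~ supp, data = d)$p.value)
by_dose
```

Testing within each dose keeps dose from confounding the supplement comparison. Because this runs several tests, a multiple-comparison adjustment such as p.adjust(by_dose, method = "bonferroni") may be worth considering.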


Applying Natural Language Processing to Predict

Using natural language processing (NLP) to predict text is an interesting problem. I built a small tool to test the NLP process, and the video shows how to use it to predict text. I embedded the video in the application, which uses the NLP model I built to predict the next word after you type a phrase. I developed this tool during the online Data Science Capstone course from Johns Hopkins University.
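Next-word prediction is often approached with n-gram frequency counts. The sketch below is a minimal bigram version in base R. It is not the author's model: the tiny corpus and the helper name predict_next are hypothetical, and a real application would train on a much larger corpus and typically back off from longer n-grams to shorter ones.

```r
# Hypothetical toy corpus; a real model would be trained on far more text.
corpus <- c("i love data science",
            "i love machine learning",
            "data science is fun",
            "i love data")

# Tokenize: lowercase each sentence and split on whitespace.
tokens <- lapply(tolower(corpus), function(s) strsplit(s, "\\s+")[[1]])

# Build bigram pairs (previous word, next word) from each sentence.
bigrams <- do.call(rbind, lapply(tokens, function(w) {
  if (length(w) < 2) return(NULL)
  data.frame(prev = head(w, -1), nxt = tail(w, -1), stringsAsFactors = FALSE)
}))

# Hypothetical helper: return the n most frequent words seen after the
# last word of the input, or character(0) if that word was never seen.
predict_next <- function(text, bigrams, n = 1) {
  words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]
  last <- tail(words, 1)
  candidates <- bigrams$nxt[bigrams$prev == last]
  if (length(candidates) == 0) return(character(0))
  counts <- sort(table(candidates), decreasing = TRUE)
  head(names(counts), n)
}

predict_next("I love", bigrams)
```

In this toy corpus, "love" is followed by "data" twice and "machine" once, so the predicted next word for "I love" is "data".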
