EDUCATION
Applied Data Science, Master of Science
University of Michigan, Dec 2022
Entrepreneurship, Bachelor of Science
University of North Carolina at Greensboro, Dec 2015
SKILLS
- Programming Languages: Python (scikit-learn, pandas, Matplotlib, TensorFlow, NLTK), R, SQL, SAS
- Analytical Software: Visual Studio, SQL Server, Power BI, Tableau, JIRA, Confluence, GitHub, Salesforce
- Proficiency: Data Mining, Data Manipulation, Data Visualization, Big Data Analysis, RESTful APIs, Machine Learning, Natural Language Processing, Project Management
PROFESSIONAL EXPERIENCE
Cybersecurity CISO Intern
IBM | RTP, NC | May 2022 - Aug 2022
IT Intern
BCBS Michigan | Remote | Jun 2021 - Oct 2021
PROJECTS
New Restaurant Location Recommender for Orlando, FL
Python, Data Visualization, EDA, Feature Engineering, KNN
- Led a team of 3 to analyze 4 datasets covering restaurant features, order history, population demographics, neighborhoods, and crime using Python and pandas.
- Conducted exploratory data analysis, imputed missing values and treated outliers, and engineered composite features.
- Visualized trends and correlations in the data with Matplotlib and derived data-driven insights for restaurant operations.
- Classified restaurants on 20+ features using the KNN algorithm, then proposed new restaurant locations and displayed the recommendations on a map.
Post-COVID Single-Family Home Value Prediction in Wake County, NC
Python, Data Mining, EDA, Clustering, Regression
- Led a team of 3, using an Agile approach, to predict the impact of COVID-19 on housing prices within 2 months.
- Explored housing sales, census, crime, COVID-19 case, housing demand, mortgage, commodity index, unemployment, and interest rate data in Python; identified 70 relevant features using correlation and descriptive plots in Plotly.
- Applied PCA for dimensionality reduction and K-Means, DBSCAN, and Agglomerative Clustering to uncover hidden patterns in the data; evaluated the clusters visually and selected DBSCAN for the feasibility and meaningfulness of its clusters.
- Built Linear Regression, Lasso, Ridge, KNN, Decision Tree, Gradient Boosted Tree, and Random Forest models with scikit-learn to predict single-family home prices; achieved the lowest RMSE (0.136) with Random Forest.
When Fact Is False and False Is Funny: Automated Detection of Fake and Satire News
Python, Text Mining, NLP, Classification
- Analyzed 3 data sources (80k+ articles) labeled Real/Fake, Real/Satire, and Fake/Satire, respectively.
- Removed English stopwords, punctuation, and non-English words with NLTK, generated unigram and bigram features from the article text with scikit-learn's TfidfVectorizer, and engineered 14 lexical categories with Empath.
- Built Logistic Regression and LinearSVC classifiers on article terms and punctuation keywords for Real/Fake news detection; selected Logistic Regression with an F1-score of 0.76.
- Developed Multinomial Naive Bayes, Logistic Regression, and Random Forest classifiers to detect satire news articles; selected Logistic Regression with 94% accuracy.
- Produced and optimized a Logistic Regression model using article terms and titles to classify Satire/Fake news with 78% accuracy.