About ME

GitHub | LinkedIn

EDUCATION

Applied Data Science, Master of Science

University of Michigan, Dec 2022

Entrepreneurship, Bachelor of Science

University of North Carolina at Greensboro, Dec 2015

SKILLS

  • Programming Languages: Python (scikit-learn, Pandas, Matplotlib, TensorFlow, NLTK), Java, C++, SQL
  • Analytical Software: Visual Studio, SQL Server, Power BI, Tableau, Jira, Confluence, GitHub, Salesforce
  • Proficiency: Data Mining, Data Manipulation, Data Visualization, Big Data Analysis, RESTful APIs, Machine Learning, Natural Language Processing, Project Management

PROFESSIONAL EXPERIENCE

Cybersecurity CISO Intern

IBM | RTP, NC | May 2022 - Aug 2022

  • Collaborated with IBM subject matter experts to research and support data extraction, data formatting, and automation development activities.
  • Independently designed and built a multi-platform synchronization and automation tool in Python for conducting and maintaining IT audit certifications.
  • Implemented 4+ custom RESTful APIs using Python and VS Code for information extraction and retrieval.
  • Presented biweekly progress updates to the entire business, including senior executives.

IT Intern

BCBS Michigan | Remote | Jun 2021 - Oct 2021

PROJECTS

New Restaurant Location Recommender for Orlando, FL

Python, Data Visualization, EDA, Feature Engineering, KNN

  • Led a team of 3 to analyze 4 datasets covering restaurant features, order history, population demographics, neighborhoods, and crime using Python and Pandas.
  • Conducted exploratory data analysis, imputed missing values, treated outliers, and engineered composite features.
  • Visualized trends and correlations in the data with Matplotlib and derived data-driven insights for restaurant operations.
  • Classified restaurants on 20+ features using the KNN algorithm, then proposed new restaurant locations and showcased them on a map.

Post-COVID Single Family Home Value Prediction in Wake County, NC

Python, Data Mining, EDA, Clustering, Regression

  • Led a team of 3 to predict the impact of COVID-19 on housing prices, using an Agile approach, within 2 months.
  • Explored housing sales, census, crime, COVID-19 case, housing demand, mortgage, commodity index, unemployment, and interest rate data in Python; identified 70 relevant features using correlation and descriptive plots built with Plotly.
  • Applied PCA for dimensionality reduction and K-Means, DBSCAN, and Agglomerative Clustering to uncover hidden patterns in the data; evaluated the clusters visually and selected DBSCAN for the feasibility and meaningfulness of its clusters.
  • Built Linear Regression, Lasso, Ridge, KNN, Decision Tree, Gradient Boosted Tree, and Random Forest models with scikit-learn to predict single-family home prices; achieved the lowest RMSE of 0.136 with Random Forest.

When fact is false and false is funny: Automated detection of fake and satire news

Python, Text Mining, NLP, Classification

  • Investigated 3 data sources (80k+ records) labeled Real/Fake, Real/Satire, and Fake/Satire, respectively.
  • Generated unigrams and bigrams from the article text using scikit-learn's TfidfVectorizer after removing English stopwords, punctuation, and non-English words with NLTK, and engineered 14 lexical categories using Empath.
  • Constructed Logistic Regression and LinearSVC classifiers based on article terms and punctuation keywords for Real/Fake news detection; selected Logistic Regression with an F1-score of 0.76.
  • Developed Multinomial Naive Bayes, Logistic Regression, and Random Forest classifiers to predict satire news articles; selected Logistic Regression with 94% accuracy.
  • Produced and optimized a Logistic Regression model using article terms and titles to classify Satire/Fake news with 78% accuracy.