Data Projects
In this section I present personal projects that are based on public information. These projects highlight my ability to work with open data and reflect my commitment to transparency and collaboration. For a deeper insight into my work, I invite you to visit my GitHub profile. If you have any questions or are interested in discussing job opportunities, contact me.
Employee Turnover - Financial Industry
This is a People Analytics project focused on employee turnover. The goal is to identify the main variables that contribute to an employee's departure.
The following steps were performed for this project:
Definition of the type of termination to be analyzed
Extraction of the variables to analyze (taking COVID-period outliers into account)
Analysis of the target variable
Logistic regression and variable importance
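The last step can be sketched as follows. This is a minimal illustration, not the project's actual code: the feature names (`tenure_years`, `commute_km`) and the synthetic data are assumptions, and the model is a plain gradient-descent logistic regression written with the standard library only. With standardized inputs, the absolute size of each coefficient gives a rough ranking of variable importance.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def standardize(rows):
    # Scale each column to mean 0 and standard deviation 1.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def fit_logistic(X, y, lr=0.1, epochs=500):
    # Batch gradient descent on the mean log-loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic data (hypothetical features): departures depend on tenure only.
random.seed(0)
features = ["tenure_years", "commute_km"]
raw, left = [], []
for _ in range(300):
    tenure = random.uniform(0, 10)
    commute = random.uniform(0, 30)
    p_leave = sigmoid(2.0 - 0.8 * tenure)
    raw.append([tenure, commute])
    left.append(1 if random.random() < p_leave else 0)

weights, bias = fit_logistic(standardize(raw), left)

# Rank variables by the absolute value of their standardized coefficient.
importance = sorted(zip(features, weights), key=lambda t: abs(t[1]), reverse=True)
for name, w in importance:
    print(f"{name}: {w:+.2f}")
```

A negative coefficient means that higher values of the variable go with a lower probability of leaving. Coefficient size is only a rough guide when predictors are correlated.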
Sentiment Analysis - Miley Cyrus Songs
Lyrics from Miley Cyrus's discography were extracted with genius, and stop words were removed with tidytext. The goal was to find the song with the lowest sentiment score.
For the development of this project, the following steps were carried out:
Extraction of song lyrics by album
Removal of stop words and creation of a list of additional words to remove
Scoring each word using tidytext
Summing up the scores for each song and album
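The scoring logic in the steps above can be sketched as follows. The project itself used R with tidytext. This Python version only illustrates the idea, and the lexicon, stop word list, album names, and lyrics are all placeholders (assumptions), not the real data.

```python
import re
from collections import defaultdict

# Tiny illustrative lexicon and stop word list (assumptions, not the real ones).
LEXICON = {"love": 3, "happy": 3, "good": 2, "sad": -2, "cry": -2, "hate": -3}
STOP_WORDS = {"i", "you", "the", "a", "and", "to", "it", "so", "me", "my"}

# Placeholder lyrics keyed by (album, song); not real song text.
LYRICS = {
    ("album_1", "song_a"): "I love you and it feels so good",
    ("album_1", "song_b"): "You hate me so I cry, sad and sad",
    ("album_2", "song_c"): "Happy to love the good days",
}

def tokenize(text):
    # Lowercase the text and keep alphabetic tokens that are not stop words.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def score_song(text):
    # Sum the lexicon score of each word; words outside the lexicon count as 0.
    return sum(LEXICON.get(w, 0) for w in tokenize(text))

# Score per song, then aggregate per album.
song_scores = {key: score_song(text) for key, text in LYRICS.items()}

album_scores = defaultdict(int)
for (album, _song), score in song_scores.items():
    album_scores[album] += score

# The song with the lowest total sentiment score.
lowest = min(song_scores, key=song_scores.get)
print(song_scores)
print(dict(album_scores))
print("Lowest-scoring song:", lowest)
```

Summing raw word scores favors longer songs in either direction. Dividing by the number of scored words is a common alternative when songs differ a lot in length.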
Turning Attrition into Retention - IBM
This project analyzes employee attrition in IBM data with the aim of informing retention. The data is balanced by combining undersampling and oversampling, and the models are then used to identify the factors most associated with employees leaving.
For this project, the following steps were carried out:
Cleaning and understanding the data to draw out initial insights
Balancing the data using oversampling and undersampling
Evaluating the models using recall and obtaining feature importance
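The balancing and evaluation steps can be sketched as follows. This is a minimal illustration with random resampling and a hand-written recall function. The helper names (`rebalance`, `recall`), the `attrition` field, and the class sizes are assumptions, and the project may have used a different resampling method.

```python
import random

def rebalance(rows, label_of, target_per_class, seed=0):
    # Randomly undersample classes above the target size and
    # oversample (with replacement) classes below it.
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    balanced = []
    for label, group in by_class.items():
        if len(group) >= target_per_class:
            balanced.extend(rng.sample(group, target_per_class))
        else:
            extra = rng.choices(group, k=target_per_class - len(group))
            balanced.extend(group + extra)
    rng.shuffle(balanced)
    return balanced

def recall(y_true, y_pred, positive=1):
    # Recall = true positives / (true positives + false negatives).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical imbalanced data: 90 employees stayed (0), 10 left (1).
rows = [{"id": i, "attrition": 0} for i in range(90)]
rows += [{"id": i, "attrition": 1} for i in range(90, 100)]

balanced = rebalance(rows, lambda r: r["attrition"], target_per_class=40)
counts = {c: sum(1 for r in balanced if r["attrition"] == c) for c in (0, 1)}
print(counts)

# Recall on a toy set of true labels vs. predictions.
example_recall = recall([1, 1, 1, 0, 0], [1, 0, 1, 0, 1])
print(example_recall)
```

Resampling would normally be applied to the training split only, so that recall is measured on data with the original class proportions.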
Vote Nullification Requests - 2021
After the 2021 elections in Peru, requests were filed to nullify votes, and the process was communicated in a confusing way. I built a dashboard to make the process easier to follow, updating it daily until almost 100% of the polling stations had been reviewed.
For this project, the following steps were carried out:
Data extraction from a Google Sheets spreadsheet maintained by Twitter users
Data cleaning to produce a tidy data table
Simple charts showing the status of the nullification requests
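The cleaning step can be sketched as follows, assuming the sheet is available as a CSV export. The column names (`mesa`, `region`, `status`) and all values are hypothetical stand-ins, not the real sheet.

```python
import csv
import io
from collections import Counter

# Stand-in for a CSV export of the shared sheet; columns and values are hypothetical.
RAW_CSV = """mesa,region,status
000001, Lima ,Rejected
000002,lima,pending
000003,Cusco, REJECTED
000003,Cusco, REJECTED
000004,Cusco,
"""

def clean_rows(text):
    # Trim whitespace, normalize case, drop rows without a status, and de-duplicate.
    seen, cleaned = set(), []
    for row in csv.DictReader(io.StringIO(text)):
        record = (row["mesa"].strip(), row["region"].strip().title(),
                  row["status"].strip().lower())
        if record[2] and record not in seen:
            seen.add(record)
            cleaned.append(dict(zip(("mesa", "region", "status"), record)))
    return cleaned

rows = clean_rows(RAW_CSV)

# Counts per status are the numbers a simple bar chart would display.
status_counts = Counter(r["status"] for r in rows)
print(status_counts)
```

Crowd-maintained sheets tend to have inconsistent casing and repeated entries, so normalizing before counting keeps the daily totals stable.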
2030 Agenda vs Government Plans - 2021 Elections
For the 2021 presidential elections, the candidates' government plans were evaluated. The UP School of Public Management created a table comparing these plans with the 2030 Agenda. To make the comparison easier to read, I developed a radar chart.
For the development of this project I used:
Manual data extraction from the PDF of the 2030 Agenda report published by UP
Comparison groups created in R (top-placed candidates, right, left, best agenda score)
Radar charts with the ggradar package in R
Final Debate Comparison - 2021 Elections
In the last debate of the 2021 presidential elections in Peru, topics such as the pandemic, security, corruption, and the economy were discussed. I created word clouds by topic and candidate to simplify the comparison of proposals and shared them on Twitter.
For the development of this project I used:
Data extraction from YouTube's automatic transcript
Data cleaning and stop word removal with R
Graphics with wordcloud2 and ggplot2
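The counting behind a word cloud can be sketched as follows. The project used R with wordcloud2 and ggplot2. This Python version only shows how word frequencies per candidate and topic might be computed, and the candidate labels, stop word list, and transcript text are placeholders (assumptions), not real quotes.

```python
import re
from collections import Counter

# Illustrative stop word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "to", "of", "we", "will", "a", "in", "for"}

# Placeholder transcript segments keyed by (candidate, topic); not real quotes.
SEGMENTS = {
    ("candidate_a", "economy"): "We will create jobs and jobs for the youth",
    ("candidate_b", "economy"): "Taxes and investment, investment in the regions",
}

def word_frequencies(text, stop_words=STOP_WORDS):
    # Lowercase, keep alphabetic tokens (including Spanish accents), drop stop words.
    words = re.findall(r"[a-záéíóúñü]+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# One frequency table per (candidate, topic); each table feeds one word cloud.
freqs = {key: word_frequencies(text) for key, text in SEGMENTS.items()}
for key, counter in freqs.items():
    print(key, counter.most_common(3))
```

Automatic transcripts usually lack punctuation and contain recognition errors, so a manual pass over the most frequent words is worthwhile before plotting.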
Other previous projects
COVID Infection Dashboard - Huancavelica ☣️ Project's repo
Comparison of educational proposals - 2021 elections 🗳️ Project's repo
Approval of Pedro Castillo's Administration 🤠 Project's repo