
Gonzalo Hernandez-Muñoz

Data Scientist


About Me

I have extensive knowledge of machine learning, data science and software engineering. I have worked on healthcare projects covering every stage of the development process, including data collection, exploratory data analysis (EDA), feature selection, feature engineering, model validation, model analysis (metrics) and MLOps.

Furthermore, I have a comprehensive understanding of machine learning models such as Random Forests, ensemble methods, SVMs, Gaussian Processes and Neural Networks.

During my time at IIC I developed the skills needed to collaborate with clients in the healthcare industry, designing and implementing data pipelines and models that allowed them to gain valuable insights from their data.

Previously, I was a machine learning research scientist at UAM, where I acquired a deep understanding of machine learning models, especially probabilistic models. This research culminated in a new probabilistic deep learning method, published at the European Conference on Machine Learning (ECML) 2020.

I moved to the US as a Green Card holder, and I am looking for challenging opportunities with a major research component that allow me to explore data and find new and exciting applications. I love taking the time to make my code and projects efficient and beautiful (I am a PEP 8 fanboy), and I enjoy automating every aspect of CI/CD pipelines.

Experience

Instituto de Ingeniería del Conocimiento (IIC)

Data Scientist

As a data scientist in the Health and Energy Predictive Analytics department, I carried out a wide range of data science and data engineering tasks. Some of the most important projects I participated in:

• Bisepro

Bisepro is a product developed in collaboration with the Public Health Service of the Balearic Government. We worked with hospitals in Spain to collect patients' EHR (electronic health record) data and build a predictive model to detect sepsis. The objective was to outperform the hospital's rule-based system, which produced a substantial number of false positives.

Furthermore, we developed an ETL pipeline capable of handling real-time data from multiple sources within the hospital. Part of the architecture (based on the ELK Stack) is publicly described, while the rest is under NDA.

The project was completed and deployed at "Hospital Universitario Son Llàtzer" (Mallorca), achieving 0.95 AUC and increasing specificity by 21% compared with the previous system. It was later expanded to other hospitals.

• Bisepro - COVID

As the pandemic hit, we started a project to detect respiratory failure due to COVID-19. This required building a model from a small amount of data, using the same sources as the original Bisepro. We built an ensemble that also included rule-based models, achieving 0.958 AUC. Due to the small amount of data, the model had high specificity but low sensitivity. The model was deployed at "Hospital Universitario 12 de Octubre" in Madrid (one of the largest hospitals in Spain).


The research and implementation were published in the paper:
Pedrera Jiménez, Miguel & Cruz-Rojo, Jaime & Martínez, Alvar & Hernandez-Munoz, Gonzalo & Serrano-Garcia, A & Calle-Romero, M & Garcia-Hinojosa, C & Serrano-Balazote, Pablo. (2022). Detección precoz de compromiso respiratorio en COVID-19 mediante IA: proceso de implantación en HCE [Early detection of respiratory compromise in COVID-19 using AI: implementation process in the EHR].
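
The combination of a high AUC with low sensitivity can be confusing at first. AUC measures how well a model ranks positives above negatives across all thresholds, while sensitivity and specificity depend on the single decision threshold chosen for deployment. The minimal sketch below uses made-up labels and scores (not project data), and the function name is hypothetical. It shows how one fixed ranking yields very different sensitivity depending on how conservative the threshold is.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Binarize scores at `threshold` and return (sensitivity, specificity).

    Sensitivity = TP / (TP + FN): fraction of real positives detected.
    Specificity = TN / (TN + FP): fraction of real negatives left unflagged.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels and scores (1 = positive case); not project data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.55, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1, 0.05]
print(sensitivity_specificity(y_true, scores, threshold=0.8))   # (0.25, 1.0)
print(sensitivity_specificity(y_true, scores, threshold=0.45))  # (0.75, 0.8333333333333334)
```

A conservative threshold (0.8) never raises a false alarm but misses three of the four positives. Lowering it trades some specificity for sensitivity without changing the model's ranking.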

• Project under NDA

This project, requested by one of the largest health insurance companies in Spain, reduced waiting times for patients requesting medical procedures. I was the lead data scientist on the project and, together with the department's technical lead, was responsible for the key decisions needed to complete it. The model was built on millions of rows of data from past authorized and denied procedures, and the project posed considerable challenges. Some of the problems we faced:

  • A highly imbalanced dataset with almost 95% of the records belonging to just one class.
  • Millions of data points (sometimes reaching 100k daily rows).
  • Most variables were categorical, and many had high cardinality, which prevented us from using common techniques such as one-hot encoding. We researched and applied creative alternatives such as variations of target encoding.
  • Data was geographically correlated: underpopulated areas of Spain had patterns that differed from those of big cities, but there was not enough data to build separate models for those areas.
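
Target encoding replaces each category with a statistic of the target for that category, so a high-cardinality column becomes a single numeric feature instead of thousands of one-hot columns. The sketch below shows one common variation, mean encoding with additive smoothing toward the global mean. It is not necessarily the variant used in the project; the function names and the smoothing strength `m` are hypothetical. Smoothing also softens the sparse-region problem above, because categories with few rows are pulled toward the global prior. In practice the encoder should be fitted on training folds only (out-of-fold), so that a row's own target does not leak into its feature.

```python
from collections import defaultdict


def fit_smoothed_target_encoder(categories, targets, m=10.0):
    """Learn a smoothed mean-target value for every category.

    Each category's mean target is shrunk toward the global mean (the prior);
    the hypothetical smoothing strength m acts like m pseudo-rows at the prior.
    """
    prior = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    # (sum + m * prior) / (n + m) == (n * mean + m * prior) / (n + m)
    encoding = {c: (sums[c] + m * prior) / (counts[c] + m) for c in counts}
    return encoding, prior


def apply_target_encoder(categories, encoding, prior):
    """Replace categories by their encoded value; unseen ones get the prior."""
    return [encoding.get(c, prior) for c in categories]


# Toy example (made-up data): category "B" has only two rows.
cats = ["A"] * 8 + ["B"] * 2
ys = [1, 1, 1, 1, 0, 0, 0, 0] + [1, 1]
enc, prior = fit_smoothed_target_encoder(cats, ys, m=2.0)
encoded = apply_target_encoder(["A", "B", "C"], enc, prior)
print(encoded)
```

Here "B" has a raw mean of 1.0 from only two rows, but it is encoded as 0.8 because it is shrunk toward the global mean of 0.6. The unseen category "C" falls back to 0.6.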

• Project under NDA

A small company developing a model for the automatic detection of early-stage colorectal cancer from blood tests requested guidance and consulting services to improve their results.

The main difficulty with the dataset was that it had too many variables (columns) relative to the number of data points (rows). Additionally, the dataset was divided into cohorts, and each cohort used a different process to handle the blood samples, which produced a different distribution for each cohort.

We wrote a report with recommendations. Most notably, we proposed new training and validation schemes in which the model was trained on some cohorts and tested on cohorts that had not been used for training. This way we could verify that the chosen model generalized properly instead of overfitting to particular cohorts (such as the one with the most data).

See also:
https://cordis.europa.eu/project/id/957099
https://amadix.com/prevecol/
https://cordis.europa.eu/article/id/435694-non-invasive-test-for-colorectal-cancer-detection
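
The cohort-based validation recommended above can be sketched as a leave-one-cohort-out split: each split holds out one entire cohort for testing and trains on all the others. This is a minimal sketch under the assumption that every sample carries a cohort label; the function name and cohort labels are hypothetical. scikit-learn provides the same splitting through `LeaveOneGroupOut` and `GroupKFold` in `sklearn.model_selection`, passing the cohort labels as `groups`.

```python
def leave_one_cohort_out(cohorts):
    """Yield (held_out, train_idx, test_idx) with one cohort held out per split.

    No row from the held-out cohort is ever used for training in that split.
    """
    for held_out in sorted(set(cohorts)):
        train_idx = [i for i, c in enumerate(cohorts) if c != held_out]
        test_idx = [i for i, c in enumerate(cohorts) if c == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical cohort labels, one per sample (row).
cohorts = ["c1", "c1", "c2", "c3", "c3", "c3"]
splits = list(leave_one_cohort_out(cohorts))
for held_out, train_idx, test_idx in splits:
    # Fit on train_idx, score on test_idx, and keep one score per cohort
    # so that a large cohort cannot dominate a pooled metric.
    print(held_out, train_idx, test_idx)
```

Reporting one score per held-out cohort, rather than a single pooled score, makes it visible when a model only works on the cohort with the most data.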

• Project under NDA

We built a collection of tools to perform named entity recognition (NER) on free-text EHR notes written by physicians. The tools extracted HPO (Human Phenotype Ontology) terms from a given text input. This helped doctors diagnose patients with rare diseases by grouping patients with similar terms, and it also enabled semantic search over their unstructured data. In addition, HPO can be linked to other resources such as OMIM and ORPHA, enabling genetics research.
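
The text does not describe how the NER tools worked internally. As a deliberately simplified illustration of the idea, the sketch below uses dictionary (gazetteer) matching to map phrases to HPO IDs, then compares patients by the overlap of their term sets (Jaccard similarity). The three-entry lexicon is hypothetical, and its IDs are shown for illustration only: check them against the official HPO release. The sketch ignores synonyms, abbreviations and negation, so "no fever" still matches. A real system would load the full ontology and use an NLP library (for example spaCy's `PhraseMatcher`) plus negation handling.

```python
import re

# Hypothetical mini-lexicon (surface form -> HPO ID) for illustration only;
# check IDs against the official HPO release before reuse.
HPO_LEXICON = {
    "seizure": "HP:0001250",
    "fever": "HP:0001945",
    "headache": "HP:0002315",
}


def extract_hpo_terms(text, lexicon=HPO_LEXICON):
    """Return the set of HPO IDs whose surface form appears as a whole word."""
    lowered = text.lower()
    found = set()
    for surface, hpo_id in lexicon.items():
        # \b avoids matches inside longer words; "s?" accepts a simple plural.
        if re.search(r"\b" + re.escape(surface) + r"s?\b", lowered):
            found.add(hpo_id)
    return found


def jaccard(a, b):
    """Similarity of two patients' term sets (1.0 = identical, 0.0 = disjoint)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


p1 = extract_hpo_terms("Patient presented with fever and two seizures.")
p2 = extract_hpo_terms("Recurrent seizures, no fever, severe headache.")
print(sorted(p1), sorted(p2), jaccard(p1, p2))
```

Note that the second note's "no fever" is counted as fever, which is exactly the kind of error that negation detection in a production pipeline must prevent.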


Some of the languages and tools I used in projects at IIC: Python, Kafka, spaCy, Flask, Docker, AWS, Elasticsearch, ELK Stack, MongoDB, PostgreSQL (and Oracle), Swagger, DVC, ClearML.

Universidad Autónoma de Madrid (UAM)

Machine Learning Research Scientist

I developed a new machine learning method for Deep Gaussian Processes (DGPs) using Monte Carlo methods and approximate inference techniques.

To make this new method suitable for big data problems, I implemented a sparse approximation for GPs that reduces the computational complexity.
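
The text does not name the sparse approximation that was used. As a hedged illustration of the general idea, the sketch below computes the predictive mean of the deterministic training conditional (DTC) approximation for a single-layer GP regression model. DTC summarizes N training points through M much smaller than N inducing inputs `z`, so the dominant cost drops from O(N^3) for an exact GP to O(NM^2). It is not the deep GP method itself. The kernel hyperparameters, noise variance and inducing-point locations are fixed assumptions here rather than learned. Libraries such as GPflow provide trained implementations (for example its `SGPR` model).

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between inputs a (n, d) and b (m, d)."""
    sq_dist = (np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def exact_gp_mean(x, y, x_test, noise_var=0.1):
    """Exact GP predictive mean: O(N^3) because of the N x N solve."""
    k_nn = rbf_kernel(x, x) + noise_var * np.eye(len(x))
    return rbf_kernel(x_test, x) @ np.linalg.solve(k_nn, y)


def sparse_gp_mean(x, y, z, x_test, noise_var=0.1, jitter=1e-8):
    """DTC sparse GP predictive mean with M inducing inputs z.

    Only an M x M system is solved; forming K_mn K_nm costs O(N M^2).
    """
    k_mm = rbf_kernel(z, z) + jitter * np.eye(len(z))
    k_mn = rbf_kernel(z, x)
    k_sm = rbf_kernel(x_test, z)
    # Sigma^{-1} = K_mm + K_mn K_nm / noise_var
    sigma_inv = k_mm + k_mn @ k_mn.T / noise_var
    return k_sm @ np.linalg.solve(sigma_inv, k_mn @ y) / noise_var


rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 5.0, size=(200, 1)), axis=0)
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(200)
z = np.linspace(0.0, 5.0, 10)[:, None]  # M = 10 inducing inputs
x_test = np.array([[0.5], [2.5], [4.2]])
diff = sparse_gp_mean(x, y, z, x_test) - exact_gp_mean(x, y, x_test)
print(np.max(np.abs(diff)))
```

When the inducing inputs coincide with the training inputs, the DTC mean reduces algebraically to the exact GP mean. With far fewer inducing points it becomes an approximation whose quality depends on how well `z` covers the input space.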

I also designed and managed the experiment pipeline used to compare it with other state-of-the-art models.


The resulting paper was accepted and published at the European Conference on Machine Learning (ECML 2020).
Another part of my research involved presenting papers on the current state of the art in probabilistic methods at group seminars.


During this time I worked with (among others) the following technologies and frameworks: Python, Sacred, MLflow, MongoDB, TensorFlow, Keras, scikit-learn, Theano, GPflow.

BBVA

Full stack developer

I designed and implemented a web-based application to analyze and monitor statistics and KPIs from the company's internal data.

I also developed a Python desktop application to analyze, parse and process information from multiple auto-generated reports from MicroStrategy.


I used these technologies: Python, JavaScript, PHP, SQL Server, Google Apps Script.

Xceed

Information security audit

I found several critical vulnerabilities in an Android mobile application related to user payments. I wrote a report recommending appropriate actions to fix the security bug and presented it to the CEO and CTO.

I performed a man-in-the-middle attack by connecting a virtualized phone (an Android Studio emulator) to my laptop through a proxy server (Burp/OWASP/Fiddler). The connection was encrypted (HTTPS), but installing self-signed certificates on the phone allowed me to decrypt the messages sent from the phone to the Xceed server. I discovered that some of the essential information needed to perform a transaction (buying concert tickets) was being stored on and sent from the phone. By editing these messages on a laptop, an attacker could buy tickets for any concert at no cost.

Fundacion Habaneras TPD

Web designer/developer

I was in charge of the front-end and back-end design and development of a WordPress (PHP) website for a non-profit organization, including testing and maintenance.

Education

Universidad Autónoma de Madrid

September 2017 - December 2018

Master of Science in Research and Innovation in ICT (I2-ICT)

Advanced and specialized training for careers that require a high level of technical skill in the ICT field, including numerical and data-intensive computing, management and direction of scientific and technological projects, machine learning and computer vision. Major: Computational Intelligence.

Universidad Autónoma de Madrid

September 2012 - February 2017

Bachelor of Science in Computer Science

Obtained a degree focused on both theoretical and practical aspects of computer science, including CPU architectures, compilers, parallel computing, programming (high and low level) and the software development cycle.

I.E.S Joan Miró

2012

Junior and Senior year, Technology and Science High School

Projects

Master's Thesis

I created a new machine learning method capable of modeling complex problems. The thesis is titled "Deep Gaussian Processes using Expectation Propagation and Monte Carlo Methods". It extends Gaussian Processes to the multi-layer case (as in neural networks). Experiments show that its performance is comparable to that of state-of-the-art methods in the field.

Thesis text (.pdf). Presentation (.pdf). Source code available on GitHub.

Cheap Flights

Application that parsed daily flights from selected airports and helped users choose the best dates to fly.
Built with Node.js + Express and AngularJS.

Bachelor Dissertation

My end-of-degree dissertation consisted of adding new functionality to an e-learning system that allows students to write, compile and test C programs online. The platform did not have a module to add and edit content, so I built that module from scratch, ensuring that it was scalable and integrated successfully. Built using PHP + Laravel (MVC design).

Internal management tool for BBVA

Designed, implemented and tested a web application for BBVA for managing contracts, work teams, budgets, projects and other internal department data.
The project involved difficult tasks including (but not limited to) handling complex data structures, parsing large data files and building a security module to manage the different access levels of the bank's employees. Built using PHP, Google Apps Script and Python.

YouTube Watch History

An attempt to improve the YouTube watch history of your account, with features such as searching, filtering and much more. Unfortunately, in 2016 Google disabled the YouTube history API, which forced me to shut down the project.

Fundacion de habaneras

My first project: a webpage for a non-profit organization, built with WordPress and custom PHP/CSS.


