AI/ML Data Engineer · PhD in Computing · Researcher
Based in Barcelona, I design and ship production-ready AI/ML systems spanning classical machine learning, Generative AI (LLMs, RAG), and full-stack MLOps. I recently completed a PhD at UPC & ULB on many-objective feature selection, fairness, and interpretability in ML. I'm passionate about turning research insights into real-world solutions, particularly in healthcare and life sciences.
What I work with
Where I've worked
What I've built
Predictive maintenance ML system for solar panel performance forecasting. Full MLOps pipeline with Airflow orchestration, MLflow experiment tracking, and containerised AWS deployment for proactive renewable energy management.
View on GitHub
GenAI application using RAG architecture with semantic search. Achieved an LLM performance score of 8.58/10 across 140 evaluated prompts. Deployed with Docker and a real-time monitoring dashboard.
View on GitHub
ReAct agent-powered assistant for healthcare professionals using multi-source RAG architecture. Integrates a Chroma vectorstore, live clinical trials API, and a web-search fallback via LangChain for comprehensive drug information retrieval.
View on GitHub
Production ML system for healthcare insurance cost prediction. Benchmarked 8 ML algorithms, then deployed the best-performing model as a Flask REST API via Docker on AWS with automated monitoring.
View on GitHub
Customer risk assessment system using an XGBoost classifier. Processed multi-table datasets with feature engineering, achieved an AUC of 0.64, and deployed as an interactive Streamlit application for real-time loan default probability prediction.
View on GitHub
Multi-stage Medallion pipeline to automate ingestion and transformation of multi-year Medicare SynPUF datasets from federal sources. Engineered a memory-efficient streaming engine to transfer ZIP files directly to GCS, bypassing local disk constraints, and developed a dbt-powered transformation layer in BigQuery to union disparate claim datasets. Culminated in a Looker dashboard analysing longitudinal patient spending and the impact of chronic conditions on medical costs across demographic segments.
View on GitHub
Research output
Academic background
Thesis: Towards Effective and Interpretable Many-Objective Feature Selection in Machine Learning
Writing
Let's connect