Fortune Uche Njoku

AI/ML Data Engineer  ·  PhD in Computing  ·  Researcher

Based in Barcelona, I design and ship production-ready AI/ML systems spanning classical machine learning, Generative AI (LLMs, RAG), and full-stack MLOps. I recently completed a PhD at UPC & ULB on many-objective feature selection, fairness, and interpretability in ML. I'm passionate about turning research insights into real-world solutions, particularly in healthcare and life sciences.

Skills & Technologies

AI & Machine Learning

Supervised ML  ·  Unsupervised ML  ·  Generative AI  ·  LLMs  ·  RAG  ·  Scikit-learn  ·  XGBoost  ·  NumPy  ·  Pandas  ·  LangChain  ·  MLxtend  ·  jMetalPy

MLOps & Data Ops

MLflow  ·  Airflow  ·  Docker  ·  Kubernetes  ·  GitLab CI/CD  ·  Terraform  ·  Git  ·  Shell Scripting  ·  Eclipse EDC

Data & Cloud

GCP  ·  AWS  ·  BigQuery  ·  HDFS  ·  MinIO  ·  Delta Lake  ·  PostgreSQL  ·  MongoDB  ·  ChromaDB  ·  Elasticsearch

Programming & Visualization

Python  ·  SQL  ·  Apache Spark  ·  FastAPI  ·  Flask  ·  Streamlit  ·  Plotly Dash  ·  Looker Studio

Work Experience

Universitat Politècnica de Catalunya  ·  Barcelona, ES
Data Engineer
Mar 2025 – Present
  • Led development of a data space demonstrator in the Agro domain using Eclipse EDC, deployed in a Kubernetes cluster with infrastructure provisioned via Terraform for secure, trusted data exchange.
  • Built a distributed data infrastructure with HDFS and MinIO for storage and Delta Lake for processing; containerised the services with Docker and exposed a REST API via FastAPI supporting 30+ participating entities.
Lecturer
Mar 2025 – Present
  • Delivered master's-level lectures in Data Warehousing and Big Data Management, covering ETL, NoSQL databases (HBase, MongoDB), MapReduce, and Apache Spark.
  • Guided students through hands-on lab exercises and assessed project deliverables.
Doctoral Researcher
Oct 2021 – Feb 2025
  • Conducted large-scale benchmarking of filter-based and many-objective feature selection algorithms across diverse OpenML datasets (finance, healthcare, criminology) using Scikit-learn, MLxtend, and jMetalPy.
  • Developed a novel many-objective feature selection framework using the NSGA-III algorithm, integrating fairness metrics (demographic parity) to reduce bias and enhance interpretability; published in Expert Systems with Applications (ESWA).
  • Validated results via paired t-tests, Wilcoxon, and sign tests across 50 participants; visualised outputs with Looker Studio dashboards.
  • Led research seminars, mentored MSc students in ML and big data, and presented findings at 3+ international conferences.
Orange  ·  Brussels, BE
Data Scientist
Apr 2023 – Dec 2023
  • Applied feature selection methods to optimise XGBoost models in Vertex AI Model Registry on GCP, reducing input features by 50% and compute costs by 30% while maintaining predictive performance (AUC).
  • Used PCA and K-Means clustering to generate compressed geo-profile features, boosting model interpretability and accuracy.
ehealth4everyone  ·  Abuja, NG
Python Data Scientist
May 2019 – Jul 2019
  • Conducted EDA on healthcare datasets using Pandas and Matplotlib to identify trends and actionable insights for health interventions.
  • Built a Plotly Dash dashboard for health KPI visualisation and real-time clinical decision support.
  • Migrated internal datasets from Google Sheets to BigQuery, improving storage scalability and analytical efficiency.
  • Automated weekly reporting with Python scripts that queried PostgreSQL and sent formatted results to stakeholders by email.

Projects

☀️ Solar Efficiency Forecast System

Predictive maintenance ML system for solar panel performance forecasting. Full MLOps pipeline with Airflow orchestration, MLflow experiment tracking, and containerised AWS deployment for proactive renewable energy management.

MLflow  ·  Airflow  ·  AWS S3  ·  Docker  ·  Streamlit

✈️ AI-Powered Travel Itinerary Planner

GenAI application using a RAG architecture with semantic search. Achieved an 8.58/10 LLM performance score across 140 evaluated prompts. Deployed with Docker and a real-time monitoring dashboard.

OpenAI GPT-4  ·  RAG  ·  Elasticsearch  ·  Docker

💊 AI-Assistant for Drug Insights

ReAct agent-powered assistant for healthcare professionals using a multi-source RAG architecture. Integrates a Chroma vector store, a live clinical trials API, and a web-search fallback via LangChain for comprehensive drug information retrieval.

LangChain  ·  OpenAI GPT-4  ·  RAG  ·  ChromaDB  ·  Streamlit

🏥 Medical Cost Prediction System

Production ML system for healthcare insurance cost prediction. Benchmarked 8 ML algorithms, then deployed the best-performing model as a Flask REST API via Docker on AWS with automated monitoring.

Random Forest  ·  Flask  ·  Docker  ·  AWS

🏦 Loan Default Predictor

Customer risk assessment system using an XGBoost classifier. Processed multi-table datasets with feature engineering, achieved 0.64 AUC, and deployed as an interactive Streamlit application for real-time loan default probability prediction.

XGBoost  ·  Scikit-learn  ·  Streamlit  ·  Feature Engineering

🏥 Medicare Provider Intelligence Hub

Multi-stage Medallion pipeline to automate ingestion and transformation of multi-year Medicare SynPUF datasets from federal sources. Engineered a memory-efficient streaming engine to transfer ZIP files directly to GCS, bypassing local disk constraints, and developed a dbt-powered transformation layer in BigQuery to union disparate claim datasets. Culminated in a Looker dashboard analysing longitudinal patient spending and the impact of chronic conditions on medical costs across demographic segments.

Airflow 3  ·  BigQuery  ·  GCS  ·  dbt  ·  Docker  ·  Python  ·  Looker

Publications

  1. Njoku, U. F., Abelló, A., Bilalli, B., & Bontempi, G. (2025). Towards fair machine learning using many-objective feature selection. Applied Soft Computing.
  2. Njoku, U. F., Abelló, A., Bilalli, B., & Bontempi, G. (2025). On many-objective feature selection and the need for interpretability. Expert Systems with Applications.
  3. Njoku, U. F., Abelló, A., Bilalli, B., & Bontempi, G. (2024). Finding relevant information in big datasets with ML. Proceedings of EDBT 2024, Paestum, Italy.
  4. Njoku, U. F., Abelló, A., Bilalli, B., & Bontempi, G. (2023). Wrapper methods for multi-objective feature selection. Proceedings of EDBT 2023, Ioannina, Greece.
  5. Njoku, U. F., Abelló, A., Bilalli, B., & Bontempi, G. (2023). A data-science pipeline to enable the interpretability of many-objective feature selection. arXiv preprint arXiv:2311.18746.
  6. Njoku, U. F., Abelló, A., Bilalli, B., & Bontempi, G. (2022). Impact of filter feature selection on classification: an empirical study. Proceedings of DOLAP 2022, co-located with EDBT/ICDT 2022, Edinburgh, UK.

Education

Ph.D. in Computing
UPC & ULB
Completed February 2026

Thesis: Towards Effective and Interpretable Many-Objective Feature Selection in Machine Learning

M.Sc. in Big Data Management & Analytics
ULB, UPC & TU/e
2021

M.Sc. in Computer Science
African University of Science and Technology
2019

B.Sc. in Computer Science & Mathematics
University of Nigeria
2014

Blog Articles

  • Introduction to Machine Learning: A Beginner's Guide (Medium)
  • An Intro to Large Language Models (LLMs) (Medium)
  • Unlocking the Power of Vector Databases for RAG Systems (Medium)
  • Demystifying MLOps: A Beginner's Guide (Medium)

Contact

Barcelona, Spain
(+34) 631-409-269