Fortune Uchechukwu Njoku
Hello! I’m a Data Science & Engineering professional with a strong foundation in machine learning, data engineering, and AI research. My journey, highlighted by a Ph.D. in Data Engineering from UPC and ULB, has been focused on developing and deploying scalable ML systems, from data ingestion and preprocessing to production deployment and monitoring. I specialize in optimizing machine learning pipelines, feature selection, and designing robust data architectures, with practical experience in tools like Python, SQL, Spark, and Google Cloud. From leading the development of data space demonstrators to applying advanced ML techniques for real-world impact, I’m dedicated to pushing the boundaries of AI, particularly in generative AI and large language models.
📍 Location: Barcelona, Spain
📧 Email: njokuuchechi@gmail.com
🌐 LinkedIn: linkedin.com/in/funjoku
Skills
- Programming Languages: Python, R
- Machine Learning & AI: Scikit-Learn, TensorFlow, PyTorch, NumPy
- Cloud & Big Data: Google Cloud Platform (GCP), BigQuery, Spark
- Databases: SQL, MongoDB, PostgreSQL
- Data Engineering: ETL pipelines, Cloud Architecture
- Tools & Libraries: Pandas, Matplotlib, Plotly, Jupyter, Git
Projects
🧠 Generative AI Model for Text Generation
Created a text-generation model using GPT-3, exploring the potential of generative AI in real-world applications such as chatbots and content creation.
- Tools Used: OpenAI GPT-3, Python, Transformers
- Key Learnings: Fine-tuned a pre-trained model, explored RAG (retrieval-augmented generation), and improved response quality for specific domains.
- View Project on GitHub
🔍 Feature Selection for Machine Learning
Implemented a variety of feature selection techniques to optimize machine learning model performance and reduce computational overhead.
- Tools Used: Scikit-Learn, Pandas, Python
- Key Learnings: Compared filter, wrapper, and embedded methods and identified the trade-offs between accuracy and complexity.
- View Project on GitHub
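As a rough illustration of the filter family mentioned above (a minimal sketch, not the project's actual code), the snippet below ranks features by their absolute Pearson correlation with the target and keeps the top k. The function names `pearson` and `filter_select` and the toy data are hypothetical, and the sketch uses plain Python rather than Scikit-Learn so that it stays self-contained.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def filter_select(features, target, k):
    """Return the names of the k features with the highest |correlation| to the target."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Toy data, made up purely for illustration.
features = {
    "signal": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, 1.0, 2.0, 1.0, 2.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.1]

selected = filter_select(features, target, k=1)
print(selected)
```

A filter like this scores each feature independently of any model, which is what makes it cheap. Wrapper and embedded methods instead involve a learner in the selection, typically trading extra computation for a better fit to that learner.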
📊 Data Pipeline for Geospatial Data Analysis
Built a scalable data pipeline for processing large geospatial datasets, integrating various dimensionality reduction methods to improve data insights.
- Tools Used: GCP, Python, Pandas, BigQuery
- Key Learnings: Created a robust pipeline that reduced computational costs by 30% through efficient feature engineering and model simplification.
- View Project on GitHub
🌐 AI-Based Water Level Prediction System
Developed an AI model using LSTMs to predict water levels and mitigate flooding risks.
- Tools Used: TensorFlow, Python, LSTM, Time Series Data
- Key Learnings: Deployed the model in a cloud environment, ensuring it could scale and provide real-time predictions.
- View Project on GitHub
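One common preprocessing step for LSTM forecasting is to frame the series as fixed-length windows of past values paired with the value to predict. The snippet below is a minimal sketch of that step under assumed settings, not the project's actual code: the function name `make_windows`, the `lookback` and `horizon` parameters, and the sample readings are all hypothetical.

```python
def make_windows(series, lookback, horizon=1):
    """Build (inputs, target) pairs: `lookback` past values predict the value `horizon` steps ahead."""
    pairs = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        pairs.append((window, target))
    return pairs


# Made-up water-level readings, for illustration only.
levels = [2.1, 2.3, 2.2, 2.6, 3.0, 3.4]

pairs = make_windows(levels, lookback=3)
for window, target in pairs:
    print(window, "->", target)
```

Each window would then be reshaped to the (samples, timesteps, features) layout that recurrent layers expect before training.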
Education
Ph.D. in Data Engineering
UPC (Spain) & ULB (Belgium)
Expected September 2025
Thesis: Towards Effective and Interpretable Many-Objective Feature Selection in Machine Learning
M.Sc. in Big Data Management and Analytics
ULB (Belgium), UPC (Spain), & TU/e (Netherlands)
Thesis: A Study on the Impact of Feature Selection on Data Analysis
M.Sc. in Computer Science
African University of Science and Technology
Thesis: Text Mining of Twitter Data: Topic Modelling (NLP)
B.Sc. in Computer Science & Mathematics
University of Nigeria
Research Interests
- Generative AI and Large Language Models (LLMs)
- Feature Selection techniques for high-dimensional data
- AI for Real-World Applications (e.g., climate change prediction, healthcare, urban planning)
- Optimization of ML models for better efficiency and scalability
Publications
- Njoku, Uchechukwu F., Abelló, A., Bilalli, B., & Bontempi, G. (2025). On many-objective feature selection and the need for interpretability. Expert Systems with Applications, 267, 126191.
- Njoku, Uchechukwu Fortune, Abelló Gamazo, A., Bilalli, B., & Bontempi, G. (2024). Finding relevant information in big datasets with ML. Proceedings 27th International Conference on Extending Database Technology (EDBT 2024): Paestum, Italy, March 25-March 28, 846–849. OpenProceedings.
- Njoku, Uchechukwu F., Abelló, A., Bilalli, B., & Bontempi, G. (2023). A data-science pipeline to enable the Interpretability of Many-Objective Feature Selection. arXiv preprint arXiv:2311.18746.
- Njoku, Uchechukwu Fortune, Abelló Gamazo, A., Bilalli, B., & Bontempi, G. (2023). Wrapper methods for multi-objective feature selection. 26th International Conference on Extending Database Technology (EDBT 2023): Ioannina, Greece, March 28-March 31: Proceedings, 697–709. OpenProceedings.
- Njoku, Uchechukwu Fortune, Abelló Gamazo, A., Bilalli, B., & Bontempi, G. (2022). Impact of filter feature selection on classification: an empirical study. Proceedings of the 24th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP): Co-Located with the 24th International Conference on Extending Database Technology and the 24th International Conference on Database Theory (EDBT/ICDT 2022): United Kingdom, March 29, 2022, 71–80. CEUR-WS.org.
- Fahland, D., Abelló, A., & Bilalli, B. (2021). A Study on the Impact of Feature Selection on Data Analysis.
- Njoku, Uchechukwu Fortune. (2019). Text Mining of Twitter Data: Topic Modelling.
Blogs
Next Steps:
- Feel free to reach out if you have any questions or want to collaborate on research projects, job opportunities, or AI initiatives!
- Keep Learning: I’m always updating my skills and exploring new technologies, so check back for future projects and collaborations!