
Samar Krimi
Applied Mathematics at Khaznadar High School, Tunis
Tunisia
Hi, I'm Samar Krimi!
Pre-selected – OSTX Green Ideation Bootcamp (2025 edition) at Open Startup
✨ Currently a Master 2 student in Big Data and Artificial Intelligence at Paris‑Dauphine PSL, with a background that includes a PhD in Signal Processing and a Data Science certification from Coding Dojo. I have also been pre‑selected for the 2nd edition of the OSTX Green Ideation Bootcamp, an experience that will strengthen my innovation skills on high‑impact environmental projects.
♻️ Passionate about applying Artificial Intelligence to environmental challenges, I specialize in Green Tech and the energy transition, with a particular focus on areas with significant environmental impact. I am convinced that data analysis and AI techniques can play a key role in implementing innovative solutions to these challenges.
🤝 I am open to any opportunity, whether on‑site, hybrid, or remote. If you have recommendations or advice, or know of companies or startups working in these fields, I would be delighted to connect with you.
🌍 Thank you in advance for your support, and I look forward to collaborating with you!
Experience
Open Startup
Pre-selected – OSTX Green Ideation Bootcamp (2025 edition)
November 2025 - December 2025
• Innovation program focused on Green Tech and the ecological transition.
• Selected for my innovative project using bio-based solutions to enhance sustainability.
• Preparing for the final phase in November 2025, where I will further develop this solution for a positive environmental impact.
Skills: Entrepreneurship · Green Tech
Education
Projects
This Kaggle competition aims to develop a predictive model that can accurately differentiate between Alzheimer’s disease (AD) patients and healthy individuals based on handwriting data.
The evaluation is based on the ROC-AUC (Receiver Operating Characteristic - Area Under the Curve). ROC-AUC measures the performance of a binary classification model by evaluating the trade-off between the true positive rate (sensitivity) and the false positive rate (1 - specificity) across various threshold settings.
- Data Preprocessing: separation of features and target, normalization of the features with StandardScaler to improve the convergence of the neural network, and conversion to PyTorch tensors.
- Modeling & Training: the model is a PyTorch neural network for binary classification, with 3 fully connected hidden layers, ReLU as the activation function, and a forward method that defines the forward pass.
- Hyperparameter Tuning with GridSearch: hyperparameters such as the learning rate, number of epochs, and batch size are specified and searched.
- Performance Evaluation: performance on the validation set is measured with the AUC metric; the training loss, the validation loss, and four metrics (Accuracy, F1 Score, ROC-AUC, Confusion Matrix) are displayed for each epoch.
- Prediction: the best hyperparameters are used to make predictions on the test set with the trained model; the accuracy score was 0.9652.
Skills: Neural Network · Deep Learning
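To make the ROC-AUC metric described above concrete, here is a minimal sketch in plain Python. It is not the competition code: the function name `roc_auc` and the sample labels and scores are illustrative assumptions. It uses the pairwise interpretation of ROC-AUC, that is, the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    `labels` holds 0/1 values, `scores` holds the model's predicted scores.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Illustrative (made-up) labels and scores, not data from the project.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly
```

This quadratic version favors readability; on large validation sets a rank-based computation or a library routine would be the usual choice.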
This project uses Kafka to retrieve AAPL stock market data (historical data) via the Yahoo Finance API and process it as a real-time stream. The data is extracted, cleaned, processed, and sent through a consumer pipeline.
- Real-Time Data Stream with Confluent Cloud: We successfully streamed real-time stock market data into Kafka hosted on Confluent Cloud, enabling a scalable and reliable flow of updated stock prices and trading volumes.
- Efficient Data Processing with Pandas: Using Pandas, we applied sliding window techniques and aggregation to efficiently process incoming stock data.
- Data Visualization: Processed data was visualized using Matplotlib, providing clear insights into stock trends and trading activity.
In conclusion, this project showcases how Kafka (via Confluent Cloud) and Pandas can work together to build a scalable, real-time financial data processing and visualization system.
Skills: Real-time Visualization · Stream processing · Deployment & Git · Confluent Cloud
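The sliding-window aggregation mentioned above can be sketched as follows. The project itself used Pandas; this sketch swaps in the standard-library `deque` so that it runs without dependencies. The function name `rolling_mean` and the sample prices are illustrative assumptions, not project code or real market data.

```python
from collections import deque


def rolling_mean(prices, window):
    """Return the mean of each full window of `window` consecutive prices.

    A running total is updated as each price arrives, which mirrors how
    messages would be handled one at a time in a streaming consumer.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")

    buffer = deque(maxlen=window)
    total = 0.0
    means = []
    for price in prices:
        if len(buffer) == window:
            total -= buffer[0]  # this value is evicted by the append below
        buffer.append(price)
        total += price
        if len(buffer) == window:
            means.append(total / window)
    return means


# Made-up prices standing in for incoming stream messages.
incoming = [10.0, 11.0, 12.0, 13.0, 14.0]
print(rolling_mean(incoming, window=3))
```

With Pandas, the equivalent aggregation would typically be written with `Series.rolling(window).mean()`.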
This project is a simulation game based on artificial intelligence algorithms, including Monte Carlo Tree Search (MCTS) and evolutionary optimization techniques, to simulate species dynamics in a changing environment.
Skills: Reinforcement Learning · Deployment & Git · Monte-Carlo Search and Games
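As an illustration of the selection step in MCTS, here is a minimal sketch of the UCB1 rule commonly used to choose which child node to explore. The project description does not state which MCTS variant was used, so the choice of UCB1, the function names `ucb1` and `select_child`, the exploration constant, and the sample statistics are all assumptions.

```python
import math


def ucb1(child_wins, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward plus an exploration bonus.

    Unvisited children get an infinite score so that each one is tried
    at least once before statistics are compared.
    """
    if child_visits == 0:
        return math.inf
    exploitation = child_wins / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration


def select_child(children, parent_visits):
    """Pick the name of the child with the highest UCB1 score.

    `children` is a list of (name, wins, visits) tuples (hypothetical layout).
    """
    best = max(children, key=lambda ch: ucb1(ch[1], ch[2], parent_visits))
    return best[0]


# Made-up statistics for three candidate moves.
children = [("move_a", 9, 10), ("move_b", 1, 10), ("move_c", 0, 0)]
print(select_child(children, parent_visits=20))  # the unvisited move is tried first
```

The constant `c` trades off exploitation against exploration; larger values push the search toward less-visited moves.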
This project is an interactive web application for the visualization and analysis of climate data. It allows users to upload tabular datasets (CSV) and generate dynamic visualizations based on questions asked in natural language. It uses Large Language Models (LLMs) via LangChain and Google Generative AI to analyze data and suggest visualizations, making climate trend exploration intuitive through artificial intelligence and data visualization.
Skills: Generative AI · Deployment & Git
This project aims to develop a generative AI solution that enables:
- The extraction and structuring of data from reports and documents related to the energy transition.
- The automatic generation of analytical reports and strategic recommendations.
- The integration of an interactive interface to visualize the results and allow users to explore the data.
The approach relies on a Retrieval-Augmented Generation (RAG) model, combining a semantic search engine and text generation to produce relevant analyses. The application will be hosted on Google Cloud Platform (GCP) and will leverage technologies such as Vertex AI, Cloud Run, and BigQuery.
Skills: Generative AI · Advanced Machine Learning · Google Cloud Platform (GCP) · Deployment & Git
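The retrieve-then-generate idea behind RAG can be sketched in a few lines. This is a toy illustration only, not the project's pipeline: it replaces the semantic search engine with a bag-of-words cosine similarity, stops at building the prompt (no Vertex AI or other model call is made), and the names `embed`, `cosine`, `retrieve`, and `build_prompt` as well as the sample documents are assumptions.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a real system would use a neural encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Assemble the retrieved context and the question into a prompt for a text generator."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Placeholder sentences, not extracted from real reports.
docs = [
    "Solar capacity grew in the region.",
    "Wind turbines need maintenance.",
]
print(build_prompt("solar capacity growth", docs))
```

In a full system, the prompt would then be passed to a language model, and the retrieval step would query an index of document embeddings rather than re-scoring every document.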
This project explores the analysis of journalistic texts on climate change using advanced Natural Language Processing (NLP) techniques. The goal is to identify and classify climate discourses through various approaches.
- Text data preprocessing (cleaning, tokenization, stemming, lemmatization).
- Sentiment analysis with multilingual BERT, revealing a predominantly negative tone on climate.
- Named Entity Recognition (NER) (countries, organizations, personalities, events) with spaCy.
- Topic modeling using BERTopic, identifying dominant themes (renewable energy, climate policies, disasters).
- Automatic classification of articles with ClimateBERT.
Languages
English
French
Italian
Skills
Applied Mathematics
Artificial Intelligence / Machine Learning Applications
Signal Processing
Applying Statistics
Green Energy