
Mounir Salam
Junior Data Engineer at Norna AB
Bioinformatics Engineering at Lebanese American University
Lebanon
Hi, I'm Mounir Salam!
Data engineer specializing in high-volume ETL and web scraping pipelines. Experienced in deploying and managing data ingestion for 200+ diverse web sources using Python, MongoDB, and PostgreSQL. Adept at improving data quality and deploying scalable cloud solutions on GCP. Committed to building resilient end-to-end pipelines that support data-informed decision-making.
Experience
Norna AB
Junior Data Engineer
February 2024 - Present
Architected and maintained scalable ETL pipelines for 200+ web sources, leveraging Python and MongoDB to ensure high-availability data ingestion.
Implemented automated data quality checks and validation logic using Tableau and SQL, reducing data discrepancies.
Standardized onboarding and training documentation for new hires, shortening team ramp-up time and keeping code consistent across scraping microservices.
Research Fellow
January 2023 - December 2023
Engineered custom Python and R scripts to automate the extraction and processing of ~250 protein sequences, identifying key structural patterns for drug design.
Managed the end-to-end data lifecycle for a research initiative, including collection, cleaning, and normalization of complex biochemical variables, and translated the resulting biological data into actionable insights.
Education
Bioinformatics Engineering
Lebanese American University
Graduated in 2023
Projects
Log Data ETL & Analytics Engine
• https://github.com/Mounir-Salam/ETL-and-Analysis-Pipeline-for-Access-Logs-using-PySpark
Developer
A Big Data processing tool built to handle large-scale server access logs. This project demonstrates the ability to transform unstructured "messy" logs into high-performance analytical formats.
Key Technical Achievements:
- Big Data Processing: Utilized PySpark to build a modular ETL tool capable of processing large-scale datasets across distributed clusters.
- Storage Optimization: Implemented Parquet serialization to reduce storage footprint and improve query performance for downstream analytics.
- Schema Evolution: Developed logic to parse unstructured log strings into structured schemas for seamless ingestion into MongoDB.
- Workflow Automation: Currently implementing Apache Airflow to manage the pipeline's DAGs, ensuring robust scheduling and monitoring.
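The log-parsing step described above can be illustrated roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the logs follow the Common Log Format, uses plain Python instead of PySpark for brevity, and the `parse_line` helper and field names are hypothetical.

```python
import re
from typing import Optional

# Assumption: lines follow the Common Log Format; the real project may use another layout.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw access-log line into a structured record, or None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Cast numeric fields so downstream storage gets typed values.
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
record = parse_line(sample)
```

Returning `None` for malformed lines lets the caller count or quarantine bad records instead of failing the whole batch.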
Scalable Legal Data Ingestion Engine
• https://github.com/Mounir-Salam/Extraction-and-Storage-of-Legal-Data-using-Scrapy-Framework
Developer
Developed a web scraping and storage system designed to handle complex legal data across hundreds of domains. The project focuses on "Configuration-over-Code" to ensure long-term maintainability.
Key Technical Achievements:
- High-Throughput Ingestion: Leveraged Scrapy and multi-threading to parallelize data collection, significantly increasing ingestion speed.
- Dynamic Scaling: Designed the system to be configuration-driven via JSON files, allowing new sources to be added without modifying the core codebase.
- Hybrid Storage Architecture: Implemented a dual-layer strategy using MongoDB for metadata and MinIO (S3-compatible) for document persistence, ensuring efficient retrieval of both structured and unstructured data.
- Modern DevOps: Fully containerized with Docker and currently migrating to GCP with Dagster for advanced orchestration.
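The "Configuration-over-Code" idea can be sketched roughly as below. This is an illustration under stated assumptions, not the project's actual schema: the `SourceConfig` fields, the `load_sources` helper, and the example URLs are all hypothetical, and the Scrapy spider that would consume these entries is omitted.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceConfig:
    """One scraping source; field names here are hypothetical."""
    name: str
    start_url: str
    item_selector: str


def load_sources(raw_json: str) -> List[SourceConfig]:
    """Build source definitions from a JSON array, so adding a source needs no code change."""
    entries = json.loads(raw_json)
    return [SourceConfig(**entry) for entry in entries]


# Assumption: in practice this text would be read from a JSON file on disk.
CONFIG = """
[
  {"name": "example_court", "start_url": "https://example.com/rulings", "item_selector": "div.ruling"},
  {"name": "example_gazette", "start_url": "https://example.org/gazette", "item_selector": "article"}
]
"""

sources = load_sources(CONFIG)
```

A generic spider could then iterate over `sources` and apply each entry's selector, which keeps per-site differences in data rather than in code.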
Languages
Arabic
Native
English
Native
Skills
Docker
Kubernetes
SQL
Machine Learning
Data Engineering
Python
Bash
Java
Apache Spark
MongoDB
Google Cloud
PostgreSQL
Scraping
R Language
Linux
ETL