Mounir Salam

Junior Data Engineer at Norna AB

Bioinformatics Engineering at Lebanese American University

Lebanon

Hi, I'm Mounir Salam!

Data engineer specializing in architecting high-volume ETL and web-scraping pipelines. Experienced in deploying and managing data ingestion for 200+ diverse web sources using Python, MongoDB, and PostgreSQL. Adept at improving data quality and deploying scalable cloud solutions on Google Cloud Platform (GCP). Committed to building resilient end-to-end pipelines that support data-informed decision-making.

Projects

Log Data ETL & Analytics Engine

• https://github.com/Mounir-Salam/ETL-and-Analysis-Pipeline-for-Access-Logs-using-PySpark

Developer

A big data processing tool built to handle large-scale server access logs. The project transforms unstructured, "messy" logs into columnar formats suited to high-performance analytics.

Key Technical Achievements:

Big Data Processing: Used PySpark to build a modular ETL tool that processes large-scale datasets across distributed clusters.
Storage Optimization: Wrote output as Parquet, a compressed columnar format, to reduce the storage footprint and improve query performance for downstream analytics.
Structured Parsing: Developed logic to parse unstructured log strings into a structured schema for ingestion into MongoDB.
Workflow Automation: Currently integrating Apache Airflow to define the pipeline as DAGs for scheduling and monitoring.
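As an illustration of the parsing step, here is a minimal sketch that turns one raw access-log line into a typed record. It assumes the Apache/Nginx "combined" log format; the regular expression, the `parse_line` name, and the field names are hypothetical and not taken from the repository. In a PySpark job, the same pattern could be applied column by column with `pyspark.sql.functions.regexp_extract` before the result is written to Parquet.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed input: Apache/Nginx "combined" log lines (referrer and user agent optional).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line: str) -> Optional[dict]:
    """Parse one raw log line into a typed record; return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    rec = match.groupdict()
    try:
        # Example timestamp: 10/Oct/2000:13:55:36 -0700
        rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None  # bracketed text was present but not a valid timestamp
    rec["status"] = int(rec["status"])
    # A "-" size means no body was sent; store it as 0 bytes.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec
```

Returning `None` for malformed lines, rather than raising, lets a batch job count or quarantine bad records without aborting the whole run.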

Scalable Legal Data Ingestion Engine

• https://github.com/Mounir-Salam/Extraction-and-Storage-of-Legal-Data-using-Scrapy-Framework

Developer

Developed a web scraping and storage system designed to collect complex legal data from hundreds of domains. The project follows a "configuration over code" approach to ensure long-term maintainability.

Key Technical Achievements:

High-Throughput Ingestion: Leveraged Scrapy's asynchronous, concurrent request handling to parallelize data collection, significantly increasing ingestion speed.
Dynamic Scaling: Made the system configuration-driven through per-source JSON files, so new sources can be added without modifying the core codebase.
Hybrid Storage Architecture: Implemented a dual-layer strategy using MongoDB for metadata and MinIO (S3-compatible) for document persistence, enabling efficient retrieval of both structured and unstructured data.
Modern DevOps: Fully containerized with Docker; currently migrating to GCP, with Dagster for orchestration.
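The configuration-driven design can be sketched as follows: each source is described in its own JSON document, and a loader validates it before a generic spider consumes it. The schema (`name`, `start_urls`, `item_selector`, `fields`, `download_documents`) and the `load_sources` helper are hypothetical, chosen for illustration rather than copied from the project.

```python
import json
from dataclasses import dataclass

# Hypothetical per-source schema: required keys plus one optional flag.
REQUIRED_KEYS = {"name", "start_urls", "item_selector", "fields"}
OPTIONAL_KEYS = {"download_documents"}


@dataclass(frozen=True)
class SourceConfig:
    name: str
    start_urls: list[str]
    item_selector: str                # CSS selector for one item on a listing page
    fields: dict[str, str]            # output field name -> CSS selector inside the item
    download_documents: bool = False  # whether to fetch linked files (e.g. PDFs)


def load_sources(raw_configs: list[str]) -> dict[str, SourceConfig]:
    """Parse and validate JSON source definitions into a name -> config registry."""
    registry: dict[str, SourceConfig] = {}
    for raw in raw_configs:
        cfg = json.loads(raw)
        if not isinstance(cfg, dict):
            raise ValueError("each source config must be a JSON object")
        missing = REQUIRED_KEYS - set(cfg)
        if missing:
            raise ValueError(f"source config missing keys: {sorted(missing)}")
        unknown = set(cfg) - REQUIRED_KEYS - OPTIONAL_KEYS
        if unknown:
            raise ValueError(f"source config has unknown keys: {sorted(unknown)}")
        if cfg["name"] in registry:
            raise ValueError(f"duplicate source name: {cfg['name']!r}")
        registry[cfg["name"]] = SourceConfig(**cfg)
    return registry
```

A generic Scrapy spider could then iterate over the registry's values, request each entry in `start_urls`, and extract `fields` with the configured selectors (Scrapy selectors accept forms such as `h2::text`). Under this layout, onboarding a new source means adding a JSON file rather than new Python code.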

Languages

Arabic

Native

English

Native

Skills

Docker

Kubernetes

SQL

Machine Learning

Data Engineering

Python

Bash

Java

Apache Spark

MongoDB

Google Cloud

PostgreSQL

Scraping

R Language

Linux

ETL