Hi, I'm Zahra

Data Scientist

About Me

I hold a Ph.D. in Physical Chemistry and have a strong background in research and statistical analysis. I enjoy coding and moved into data science to build useful applications. I also value teamwork and building strong working relationships.

Skills

Programming

Python

SQL

C

MATLAB

Data Analysis and Visualization

NumPy

pandas

matplotlib

scikit-learn

Plotly

OpenCV

OCR

Pillow

seaborn

Machine Learning

Supervised Learning

Unsupervised Learning

scikit-learn

Model Evaluation and Validation

Deep Learning

Natural Language Processing (NLP)

Large Language Model (LLM) Fine-Tuning

Transfer Learning

Convolutional Neural Networks (CNNs)

TensorFlow

Transformers

Hugging Face

Software Development and Collaboration

Code Collaboration (Git, GitHub, Bitbucket)

Web Scraping (Data Extraction, HTML Parsing, Web Data Collection, Beautiful Soup, Selenium)

Industrialization (Docker, Deployment)

My Personal Journey

Education

Data Scientist Training – FullStack

Jedha, Paris, France
04/2022 - 07/2022

Ph.D., Physical Chemistry

Kharazmi University, Tehran, Iran
2007 - 2012

M.Sc., Physical Chemistry

K. N. Toosi, Tehran, Iran
2004 - 2006

Work

Data Scientist

Luna, Paris, France
01/2023 - 09/2023

Researcher

Institute for Research in Fundamental Sciences, Tehran, Iran
2018 - 2021

Lecturer

Kharazmi University, Tehran, Iran
2012 - 2018

My Projects

Symptom Checker and Diagnosis App

NLP, AI Diagnostics, BERT Model, Medical AI
  • Developed an intuitive diagnostic platform on Hugging Face Spaces for easy symptom entry and quick feedback.

  • Achieved an average F1-score of approximately 0.97 on the diagnosis task.

  • Added an API key input that gives access to detailed disease information through GPT-3 integration.

  • View Code View App

Lung Cancer Diagnosis from CT Scan Images

Image classification, CNN, Transfer learning, Inception
  • Developed a lung cancer diagnosis tool for chest CT scans, using a CNN with transfer learning from an Inception model to distinguish healthy lungs from cancerous ones.

  • Addressed the critical need for early lung cancer detection, with the model achieving 99.4% sensitivity, 96.3% specificity, and 80% accuracy in identifying cancer types.

  • View Code View App
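
For readers unfamiliar with the metrics quoted above, here is a minimal sketch of how sensitivity and specificity are computed from binary predictions. The function name and the labels below are made up for illustration; they are not the project's code or data.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN): share of actual positives that were caught.
    Specificity = TN / (TN + FP): share of actual negatives correctly cleared.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Illustrative labels only: 1 = cancerous, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a screening setting, sensitivity is usually the figure to watch, since a missed cancer case (a false negative) tends to cost more than a false alarm.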

Detecting Uber Pickup Hotspots in New York

Unsupervised Machine Learning, DBSCAN, K-means, Plotly
  • Identified pickup hotspots where Uber drivers in New York could wait for rides.

  • Accounted for the effect of day of the week and hour of the day on hotspot locations.

  • Found significant differences between hotspots on Saturday at midnight and on Monday at midnight.

  • View Code
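
As a rough illustration of the clustering step, the sketch below runs scikit-learn's DBSCAN on a handful of invented coordinates. The points, `eps`, and `min_samples` are assumptions chosen for the example, not the project's data or settings, and plain Euclidean distance on raw degrees is a simplification.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical pickup coordinates (latitude, longitude); not the project's data.
pickups = np.array([
    [40.7500, -73.9900],
    [40.7510, -73.9910],
    [40.7490, -73.9890],
    [40.7505, -73.9905],
    [40.6400, -73.7800],
    [40.6410, -73.7810],
    [40.6390, -73.7790],
    [40.6405, -73.7805],
    [40.9000, -73.5000],  # isolated pickup, expected to be treated as noise
])

# eps is expressed in degrees here (roughly 550 m of latitude); an assumed value.
model = DBSCAN(eps=0.005, min_samples=3)
labels = model.fit_predict(pickups)

# Summarize each hotspot by the mean position of its members; label -1 is noise.
hotspots = {
    int(label): pickups[labels == label].mean(axis=0)
    for label in sorted(set(labels.tolist()))
    if label != -1
}

for label, center in hotspots.items():
    print(f"hotspot {label}: lat={center[0]:.4f}, lon={center[1]:.4f}")
```

To study the effect of weekday and hour, one option would be to filter the pickups to a single (weekday, hour) slice before clustering and compare the resulting hotspots across slices.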

Scraping Booking.com

BeautifulSoup, Plotly, Scrapy, Boto3, SQLAlchemy
  • Developed a travel suggestion system for 35 French cities, selecting 5 based on weather conditions and recommending top-rated hotels from Booking.com for each.

  • Displayed selected cities, hotel details (addresses, prices), and local temperatures on a map for an intuitive travel planning experience.

  • View Code

Natural Language Processing with Disaster Tweets

TensorFlow, spaCy, NLP, RNN, LSTM, GRU
  • Developed a model that distinguishes tweets about actual disasters from tweets that are not, using recurrent neural network architectures.

  • The project covers three phases: exploratory data analysis (EDA), pre-processing the data for deep learning models, and training at least two models for the prediction task.

  • View Code

Predict Walmart Sales

Supervised Machine Learning, scikit-learn, pandas, NumPy
  • Built a machine learning model to predict Walmart's weekly sales, helping to explain how economic factors influence sales trends and to inform marketing strategy.

  • The project starts with exploratory data analysis (EDA) and data preprocessing, followed by a baseline linear regression model for sales forecasting.

  • View Code

Getaround

scikit-learn, FastAPI, Streamlit, Uvicorn, MLflow, Heroku, Docker
  • Analyzed rental data to understand the impact of canceled rentals on the Getaround car-sharing platform, with the aim of improving service efficiency and user experience.

  • Developed a model that recommends optimal rental prices, presented through a Streamlit dashboard and deployed with Docker and Heroku. An API connects the model with AWS and MLflow.

  • View Code