Author

Fathurrahman Syarief

Data Scientist

About

I’m Fathur, a recent Data Science graduate with strong analytical and problem-solving skills. I have delivered end-to-end data science projects, from descriptive to prescriptive analytics, on a company-wide scale. My communication and teamwork skills complement my technical expertise, enabling effective collaboration and impactful results.

Working Experience

Jun - Dec 2020

Backend Web Developer

PT Padma Pramata Indopersada

  • Integrated UI elements with server-side logic, enhancing app interactivity and user experience
  • Increased user satisfaction and global accessibility by adding multi-language support to the payroll system UI
  • Tracked KPIs showing improved engagement, fewer language barriers, and higher adoption rates
Jan - Jul 2022

Assistant Lecturer & Researcher

Universitas Airlangga

  • Assisted in teaching the Algorithmic Programming course
  • Contributed to a published data science paper indexed in Scopus with a score of 3.0
Aug - Dec 2022

Data Scientist

PT Central Artificial Intelligence

  • Developed a Facebook scraper script using Selenium to collect profile data, including posts, media, URLs, responses, timestamps, and friends' details
  • Labeled 10,000 BPOM Indonesian news texts and 5,000 Indonesian Facebook captions
  • Achieved 85% accuracy on BPOM Indonesian news texts and 94% on Facebook posts by fine-tuning IndoBERT, improving Central AI's sentiment analysis capabilities by 17%
  • Built sentiment analysis model APIs for Facebook posts and BPOM Indonesian news (predict, update, and create models) using Python Flask and deployed the models on Google Cloud Run
  • Achieved a 15% cost saving by forecasting Central AI's cloud compute usage with an ARIMA model, enabling more efficient resource allocation
Aug - Dec 2023

Bangkit Academy Cohort

Led by Google, Tokopedia, Gojek, & Traveloka

  • Joined the Cloud Computing learning path
  • Achieved 92% accuracy in food image classification by developing CNN models, serving them as APIs with FastAPI, and deploying them on Google Cloud Run with Docker for efficient, scalable model serving
  • Certified as an Associate Cloud Engineer by Google

Projects

2020
Banking Marketing Targets

Data Analysis

  • Conducted end-to-end data analysis of banking marketing targets, surfacing insights to inform strategic decisions
  • Performed exploratory data analysis and data visualization, providing an intuitive understanding of underlying patterns and trends
  • Implemented various machine learning algorithms, achieving an average accuracy of ~81%
  • Evaluated the models for effectiveness and reliability and selected the best-performing model for predictive tasks
2021
Tokopedia Flash Sale Catalogue Scraper

Data Mining

  • Built a fast Tokopedia Flash Sale Catalogue Scraper
  • Leveraged Selenium & BeautifulSoup for efficient data mining
  • Optimized for fast data collection from e-commerce flash sales
2022
Tweetoxicity

Natural Language Processing

  • Tweetoxicity is a program that uses machine learning to analyze Twitter/X users' sentiment through their recent tweets or retweets; it can also analyze topics or hashtags
  • Scraped over 20,000 tweets across diverse topics using Selenium, collecting tweet text, usernames, likes, retweet counts, and quotes to build a text dataset
  • Labeled the 20,000 tweets by combining human annotation with two pre-trained Indonesian transformer sentiment analysis models (IndoBERT and IndoRoBERTa), with the final sentiment label determined by a voting mechanism
  • Achieved 93% accuracy in sentiment classification (negative, neutral, positive) by fine-tuning Distilled IndoBERT on the labeled Indonesian tweets, yielding 3x faster inference and a 27% smaller model compared to other BERT models
2023
TRxNSLATE

Computer Vision

  • TRxNSLATE reads handwritten medical prescriptions, converts them into digital text, and feeds the text into a large language model that explains their meaning to a general audience
  • Trained a YOLOv10 model that detects drug names, dosages, and instructions with 85% accuracy, and a TrOCR model that converts the detected handwritten medical text to digital text with 98% accuracy
  • Fine-tuned the MedLlama2 LLM to generate prescription explanations
2023
nitter-harvest: Twitter Scraper Modules

Python Modules

  • Fast Twitter data scraper that includes user, topic, and hashtag scrapers
  • No login/authentication is required; simply query what you need, and the program will return the output
  • Built with the Selenium and bs4 libraries

Education

2017 - 2020

SMA Negeri 34 Jakarta

State School

2020 - 2024

Bachelor's Degree in Data Science Technology

Universitas Airlangga

  • GPA: 3.7/4.0
  • Graduated cum laude

Organization

Mar - Dec 2022

Badan Pengurus Mahasiswa Teknologi Sains Data

Universitas Airlangga

  • Scientific Research Staff
  • Appointed as the Head of the Competition Division for the national Data Driven Analytics Competition, involving more than 35 teams.
Jan - Dec 2023

Himpunan Mahasiswa Teknologi Sains Data

Universitas Airlangga

  • Head of the Competition and Achievement Department, Research and Scientific Division
  • Appointed as the Person in Charge of DATAQUEST, a national-scale data science competition involving over 100 teams
  • Successfully implemented a tutoring program that assisted fellow college students in preparing for midterm and final exams
Sep - Dec 2023

Academic Core Team

Google Developer Student Club UNAIR

  • Speaker at Fireside Chat: Python 101
  • Developed and managed the machine learning path curriculum

Skills

A/B Testing

ANOVA

ARIMA

Computer Vision

Data Mining

Data Visualization

Data Preprocessing

Data Wrangling

Deep Learning

Descriptive Statistics

ETL

Git

Inferential Statistics

Large Language Models

MLOps

Model Architecture

NLP

SQL

Supervised ML

Unsupervised ML

Tools

BigQuery

Docker

Flask

FastAPI

GitHub/GitLab

Google Cloud Platform

Hadoop

Hugging Face Transformers

Kubeflow

MATLAB

MongoDB

MySQL

PostgreSQL

Power BI

Prophet

PyTorch

SPSS

Spark

Tableau

TensorFlow

Vertex AI

Programming Languages

Python

R

MATLAB

JS

HTML

CSS

Contact