Fathurrahman Syarief
Data Scientist
About
I’m Fathur, a recent Data Science graduate with strong analytical and problem-solving skills. I have experience delivering end-to-end data science projects, from descriptive to prescriptive analytics, on a company-wide scale. My strong communication and teamwork abilities complement my technical expertise, enabling effective collaboration and impactful results.
Work Experience
Backend Web Developer
PT Padma Pramata Indopersada
- Integrated UI elements with server-side logic, enhancing app interactivity and user experience
- Added multi-language support to the payroll system UI, increasing user satisfaction and global accessibility
- Tracked KPIs showing improved engagement, reduced language barriers, and increased adoption rates
Assistant Lecturer & Researcher
Universitas Airlangga
- Assisted in teaching the Algorithmic Programming subject
- Contributed to a published data science paper indexed in Scopus with a score of 3.0
Data Scientist
PT Central Artificial Intelligence
- Developed a Facebook scraper script using Selenium to collect profile data, including posts, media, URLs, responses, timestamps, and friends' details
- Labeled 10,000 Indonesian BPOM news texts and 5,000 Indonesian Facebook captions
- Fine-tuned the IndoBERT model to achieve 85% accuracy on Indonesian BPOM news texts and 94% on Facebook posts, enhancing Central AI's sentiment analysis capabilities by 17%
- Built sentiment analysis model APIs (predict, update, and create models) for Facebook posts and Indonesian BPOM news using Python Flask, and deployed the models on Google Cloud Run
- Achieved a 15% cost saving by forecasting Central AI's cloud compute usage with an ARIMA model, enabling more efficient resource allocation
Bangkit Academy Cohort
Led by Google, Tokopedia, Gojek, & Traveloka
- Joined the Cloud Computing learning path
- Achieved 92% accuracy in food image classification by developing CNN models, serving them as APIs with FastAPI, and deploying them on Google Cloud Run with Docker for efficient, scalable model serving
- Certified as an Associate Cloud Engineer by Google
Projects
Data Analysis
- Conducted end-to-end data analysis on banking marketing targets, highlighting insights for strategic decisions
- Performed exploratory data analysis and data visualization, providing an intuitive understanding of underlying patterns and trends
- Implemented various machine learning algorithms, achieving an average accuracy of ~81%
- Evaluated models for effectiveness and reliability, and selected the best-performing model for predictive tasks
Data Mining
- Built a fast Tokopedia Flash Sale catalogue scraper
- Leveraged Selenium & BeautifulSoup for efficient data mining
- Optimized for fast data collection during e-commerce sales
Natural Language Processing
- Tweetoxicity is a program that uses machine learning to analyze Twitter/X users' sentiment through their recent tweets or retweets; it can also analyze topics or hashtags
- Scraped over 20,000 tweets across diverse topics using Selenium, including likes, retweet counts, quotes, tweet text, and usernames, to create a text dataset
- Labeled the 20,000 tweets using human annotation and two pre-trained Indonesian transformer sentiment analysis models (IndoBERT and IndoRoBERTa), with the final sentiment label determined by a voting mechanism
- Achieved 93% accuracy in sentiment classification (negative, neutral, positive) by fine-tuning Distilled IndoBERT on the labeled Indonesian tweets, with 3x faster inference and a 27% smaller model size compared to other BERT models
Computer Vision
- TRxNSLATE is a program that reads handwritten medical prescriptions, converts them into digital text, and feeds that text into a large language model to explain their meaning to a general audience
- Trained a YOLOv10 model that detects drug names, dosages, and instructions with 85% accuracy, and a TrOCR model that converts the detected handwritten medical text to digital text with 98% accuracy
- Fine-tuned MedLlama2 LLM to generate prescription explanations
Python Modules
- Fast Twitter data scraper with user, topic, and hashtag scraping modes
- No login/authentication is required; simply query what you need, and the program will return the output
- Built with the Selenium and bs4 libraries
Education
SMA Negeri 34 Jakarta
State School
Bachelor's Degree in Data Science Technology
Universitas Airlangga
- GPA: 3.7/4.0
- Graduated cum laude
Organization
Badan Pengurus Mahasiswa Teknologi Sains Data
Universitas Airlangga
- Scientific Research Staff
- Appointed Head of the Competition Division for the national Data Driven Analytics Competition, involving more than 35 teams
Himpunan Mahasiswa Teknologi Sains Data
Universitas Airlangga
- Head of the Competition and Achievement Department of Research and Scientific
- Appointed Person in Charge of DATAQUEST, a national-scale data science competition involving over 100 teams
- Successfully implemented a tutoring program that assisted fellow college students in preparing for midterm and final exams
Academic Core Team
Google Developer Student Club UNAIR
- Speaker at Fireside Chat: Python 101
- Developed and managed the machine learning path curriculum
Skills
A/B Testing
ANOVA
ARIMA
Computer Vision
Data Mining
Data Visualization
Data Preprocessing
Data Wrangling
Deep Learning
Descriptive Statistics
ETL
Git
Inferential Statistics
Large Language Models
MLOps
Model Architecture
NLP
SQL
Supervised ML
Unsupervised ML
Tools
BigQuery
Docker
Flask
FastAPI
GitHub/GitLab
Google Cloud Platform
Hadoop
HuggingFace Transformers
Kubeflow
MATLAB
MongoDB
MySQL
PostgreSQL
Power BI
Prophet
PyTorch
SPSS
Spark
Tableau
TensorFlow
Vertex AI
Programming Languages
Python
R
MATLAB
JS
HTML
CSS