Data Science & ML — IIT Madras

I build the pipeline before the prediction.

Tushar Kumar Singh — Data Science undergraduate at IIT Madras with experience in Python, SQL, Machine Learning, predictive modeling, and data analytics. I approach every problem as a structured pipeline: ingest → clean → engineer → model → evaluate. Currently expanding into AI Engineering, LLM APIs, Prompt Engineering, and Retrieval-Augmented Generation (RAG).

Stage 02 — Process

About & education

The cleaning stage: where raw background turns into something structured enough to build on.

I'm a Data Science undergraduate at IIT Madras (B.Sc. in Programming & Data Science, Machine Learning & Data Science track), building a solid base in Python, SQL, and applied machine learning.

Most of my project work follows the same shape: ingest the data, clean and explore it, engineer features, select and tune a model, and evaluate it honestly — with fixed seeds so results are reproducible. That same workflow discipline is what I'm now applying to LLM APIs, prompt design, and Retrieval-Augmented Generation, with self-directed work on the OpenAI, Gemini, and Claude APIs.

I'm comfortable with relational databases and structured data (schema design, joins, window functions, query optimization), which I see as the data layer that AI applications and chatbots ultimately query.

English proficiency: C1+.

B.Sc. in Programming & Data Science
Indian Institute of Technology Madras — Online Degree Programme
2022 – Present · Expected Graduation: 2027
  • Data Structures & Algorithms (Python)
  • Database Management Systems (SQL)
  • Machine Learning Foundations
  • Statistics for Data Science
  • Linear Algebra & Calculus

Stage 03 — Features

Skills & tools

What goes into the model: the languages, libraries, and habits I draw on for every project.

Languages

Python SQL

ML & Data Science

Pandas NumPy SciPy Scikit-learn Matplotlib Seaborn Regression Classification Clustering Feature Engineering

Databases

MySQL SQLite Schema Design Window Functions Query Optimization

Tools

Jupyter Git / GitHub Google Colab Excel Tableau

⚡ Currently building (self-directed)

LLM APIs — OpenAI Gemini Claude Prompt Design RAG Vector Databases

Stage 03.5 — Validation

Certifications & Experience

Verified results and practical experience, in coursework and beyond.

2025 Forage

Deloitte Australia Data Analytics Job Simulation

Completed data analysis and forensic technology tasks, created interactive dashboards using Tableau, classified business data in Excel, and delivered data-driven business recommendations.

Excel Tableau Data Analytics Business Intelligence

2024 IIT Madras

Programming in Python — A Grade

Achieved an A grade in Programming in Python through IIT Madras coursework, covering data structures, algorithms, problem solving, and software fundamentals.

Python Algorithms Problem Solving

Stage 04 — Model

Projects

Where the pipeline produces something testable — four projects, end to end.

2024 Personal project

Customer Churn Prediction

An end-to-end ML pipeline from data ingestion through cleaning, EDA, feature engineering, model selection, and evaluation. Trained Logistic Regression and Random Forest classifiers, with random seeds fixed so results are reproducible.

Pandas Scikit-learn Seaborn Matplotlib

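For illustration, here is a minimal, stdlib-only sketch of the fixed-seed habit described above. It is not the project's code (which used Scikit-learn); the helper name `train_test_split_indices` and the 80/20 split are my own illustrative assumptions. A seeded shuffle yields the same train/test partition on every run.

```python
import random

def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Return sorted (train_idx, test_idx) lists for a shuffled split.

    Hypothetical helper: a private random.Random(seed) means the same seed
    always gives the same split, and the global random state is untouched.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])

run_a = train_test_split_indices(100, seed=42)
run_b = train_test_split_indices(100, seed=42)
print(run_a == run_b)  # True: identical partitions on every run
```

In Scikit-learn the same lever is the `random_state` parameter, available on `train_test_split` and on estimators such as `RandomForestClassifier`.
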
2023 DBMS course project

Database Design & Query Optimization

Designed a normalized (3NF) relational schema for a simulated e-commerce system covering customers, orders, and inventory. Wrote advanced SQL with multi-table joins, subqueries, aggregations, and window functions (RANK, LAG, LEAD), then analyzed query performance on large synthetic datasets and added indexes to improve it.

MySQL SQL Indexing Schema Design

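A small sketch of the window functions named above, run against an in-memory SQLite database rather than MySQL so it is self-contained. The two-table schema, the index, and the rows are made-up toy data, not the course project's schema; SQLite needs version 3.25 or later for window functions.

```python
import sqlite3

# Hypothetical toy schema: customers and orders kept in separate tables,
# with a composite index that can help the join and per-customer date ordering.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT NOT NULL,
    amount      REAL NOT NULL
);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
INSERT INTO customers VALUES (1, 'Customer A'), (2, 'Customer B');
INSERT INTO orders VALUES
    (1, 1, '2023-01-05', 120.0),
    (2, 1, '2023-02-10',  80.0),
    (3, 2, '2023-01-20', 200.0),
    (4, 1, '2023-03-01', 150.0);
""")

# RANK orders each customer's purchases by amount (largest first);
# LAG/LEAD read the previous/next order of the same customer by date.
query = """
SELECT c.name,
       o.order_date,
       o.amount,
       RANK()         OVER (PARTITION BY o.customer_id ORDER BY o.amount DESC) AS amount_rank,
       LAG(o.amount)  OVER (PARTITION BY o.customer_id ORDER BY o.order_date) AS prev_amount,
       LEAD(o.amount) OVER (PARTITION BY o.customer_id ORDER BY o.order_date) AS next_amount
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
ORDER BY c.name, o.order_date;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first and last orders of each customer get `None` from LAG and LEAD respectively, since there is no earlier or later row in that partition.
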
2023 Course project

DSA Problem Solver

Implemented 20+ algorithms in Python — sorting, BFS/DFS, Dijkstra's shortest path, and dynamic programming — each with complexity analysis and deterministic, fixed-seed test cases for validation.

Python Algorithms Complexity Analysis

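As an example of the kind of algorithm in this set, here is a minimal heap-based Dijkstra sketch with a deterministic test graph. It is an illustrative version, not the project's own code; the adjacency-dict input format is an assumption, and edge weights must be non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source in a graph with non-negative weights.

    graph: dict mapping node -> list of (neighbor, weight) pairs.
    Nodes unreachable from source are absent from the result.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (distance found so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path to node was already settled
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

graph = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

With a binary heap this runs in O((V + E) log V) time; skipping stale heap entries avoids needing a decrease-key operation.
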
2023 – Present Ongoing

Data Science Competitions

Prediction competitions on structured, tabular data, running the full pipeline from EDA through feature engineering to model tuning and submission. Ensemble methods such as Random Forest and Gradient Boosting are tuned with cross-validation and hyperparameter search (GridSearchCV, RandomizedSearchCV).

Pandas Seaborn Scikit-learn Ensembles

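To show what a grid search with k-fold cross-validation does under the hood, here is a stdlib-only sketch. It swaps in a toy 1-D k-nearest-neighbour classifier instead of Random Forest or Gradient Boosting so it runs without Scikit-learn; the helper names, the grid, and the synthetic data are all illustrative assumptions.

```python
import random
from statistics import mean

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; the shuffle is seeded so folds are reproducible."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

def knn_predict(train_x, train_y, x, n_neighbors):
    """Hypothetical toy 1-D k-NN classifier: majority vote of the closest points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    votes_for_one = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes_for_one > len(nearest) else 0

def grid_search_cv(xs, ys, grid, k=5, seed=0):
    """Score each candidate n_neighbors by mean k-fold accuracy; return (best, scores)."""
    scores = {}
    for n_neighbors in grid:
        fold_accuracies = []
        # Same seed for every candidate, so all candidates see identical folds.
        for train_idx, val_idx in k_fold_indices(len(xs), k, seed):
            train_x = [xs[i] for i in train_idx]
            train_y = [ys[i] for i in train_idx]
            correct = sum(knn_predict(train_x, train_y, xs[i], n_neighbors) == ys[i]
                          for i in val_idx)
            fold_accuracies.append(correct / len(val_idx))
        scores[n_neighbors] = mean(fold_accuracies)
    best = max(scores, key=scores.get)  # ties go to the earliest grid value
    return best, scores

# Synthetic, seeded toy data: label is 1 when a noisy copy of x exceeds 5.
data_rng = random.Random(7)
xs = [data_rng.uniform(0, 10) for _ in range(60)]
ys = [1 if x + data_rng.gauss(0, 1.5) > 5 else 0 for x in xs]

best, scores = grid_search_cv(xs, ys, grid=[1, 3, 5, 7])
print("best n_neighbors:", best)
print("mean CV accuracy per candidate:", scores)
```

Scikit-learn's `GridSearchCV` packages the same loop: an estimator, a `param_grid`, and a `cv` splitter, with `best_params_` exposing the winning candidate.
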
Stage 05 — Deploy

Ready to put this to work on a real team.

I'm actively seeking Data Science, Machine Learning, and AI Engineering internship opportunities where I can contribute to real-world projects while continuing to grow my technical expertise. Feel free to connect through LinkedIn or email.

Location
Delhi, India