Tushar Kumar Singh — Data Science undergraduate at IIT Madras with experience in Python, SQL, Machine Learning, predictive modeling, and data analytics. I approach every problem as a structured pipeline: ingest → clean → engineer → model → evaluate. Currently expanding into AI Engineering, LLM APIs, Prompt Engineering, and Retrieval-Augmented Generation (RAG).
The cleaning stage: where raw background turns into something structured enough to build on.
I'm a Data Science undergraduate at IIT Madras (B.Sc. in Programming & Data Science, Machine Learning & Data Science track), building a solid base in Python, SQL, and applied machine learning.
Most of my project work follows the same shape: ingest the data, clean and explore it, engineer features, select and tune a model, and evaluate it honestly, with fixed seeds so results are reproducible. I'm now applying that same workflow discipline to LLM APIs, prompt design, and Retrieval-Augmented Generation through self-directed work with the OpenAI, Gemini, and Claude APIs.
I'm comfortable with relational databases and structured data (schema design, joins, window functions, query optimization), which I think of as the backend that AI applications and chatbots ultimately query.
English proficiency: C1+.
What goes into the model: the languages, libraries, and habits I draw on for every project.
Verified learning and practical experience, in coursework and beyond.
Completed data analysis and forensic technology tasks, created interactive dashboards using Tableau, classified business data in Excel, and delivered data-driven business recommendations.
Achieved an A grade in Programming in Python through IIT Madras coursework, covering data structures, algorithms, problem solving, and software fundamentals.
Where the pipeline produces something testable — four projects, end to end.
An end-to-end ML pipeline covering data ingestion, cleaning, EDA, feature engineering, model selection, and evaluation. Trained Logistic Regression and Random Forest classifiers, fixing random seeds so results are reproducible.
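The pipeline code isn't reproduced here. As a minimal standard-library sketch of what fixing a seed buys, the hypothetical `seeded_split` helper below shuffles with its own `random.Random(seed)` instance, so the same seed always yields the same train/test split. In scikit-learn the equivalent is passing `random_state` to `train_test_split` and to estimators such as `RandomForestClassifier`.

```python
import random

def seeded_split(rows, test_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle a copy of rows with a private RNG, then split."""
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

train_a, test_a = seeded_split(range(10), seed=7)
train_b, test_b = seeded_split(range(10), seed=7)
print(train_a == train_b and test_a == test_b)  # True: same seed, same split
```

Using a private `Random` instance rather than calling `random.seed()` keeps the split reproducible even if other code draws from the global generator in between.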
Designed a normalized (3NF) relational schema for a simulated e-commerce system covering customers, orders, and inventory. Wrote advanced SQL with multi-table joins, subqueries, aggregations, and window functions (RANK, LAG, LEAD), then analyzed query performance and added indexes for large synthetic datasets.
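As an illustration of the window functions named above (not the project's actual schema or data), here is a small sketch using Python's standard-library `sqlite3` module with an in-memory database. The two tables and four rows are made-up toy values; SQLite supports `RANK`, `LAG`, and `LEAD` from version 3.25 onward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT NOT NULL,
    amount      REAL NOT NULL
);
-- composite index matching the per-customer, date-ordered access pattern below
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO orders VALUES
    (1, 1, '2024-01-05', 120.0),
    (2, 1, '2024-02-10', 80.0),
    (3, 2, '2024-01-20', 200.0),
    (4, 1, '2024-03-01', 150.0);
""")

query = """
SELECT c.name,
       o.order_date,
       o.amount,
       RANK() OVER (PARTITION BY o.customer_id ORDER BY o.amount DESC)   AS amount_rank,
       LAG(o.amount)  OVER (PARTITION BY o.customer_id ORDER BY o.order_date) AS prev_amount,
       LEAD(o.amount) OVER (PARTITION BY o.customer_id ORDER BY o.order_date) AS next_amount
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
ORDER BY c.name, o.order_date;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

`PARTITION BY o.customer_id` restarts the ranking and the LAG/LEAD look-back for each customer, so Ravi's single order has no previous or next amount.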
Implemented 20+ algorithms in Python — sorting, BFS/DFS, Dijkstra's shortest path, and dynamic programming — each with complexity analysis and deterministic, fixed-seed test cases for validation.
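Dijkstra's algorithm is one of the listed implementations. The original code isn't shown here, so this is a generic sketch of the standard binary-heap version using Python's `heapq`, assuming an adjacency-list dict and non-negative edge weights. Stale heap entries are skipped rather than decreased in place, which keeps the running time at O((V + E) log V).

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source; graph maps node -> list of (neighbor, weight)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path to this node was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist  # unreachable nodes are simply absent

graph = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

The route A → C → B → D (cost 4) beats both direct edges out of A, which is the kind of case a deterministic test should pin down.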
Entered structured-prediction and tabular-data competitions, running the full pipeline from EDA through feature engineering to model tuning and submission. Used cross-validation and hyperparameter search (GridSearchCV, RandomizedSearchCV) with ensemble methods such as Random Forest and Gradient Boosting.
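GridSearchCV and RandomizedSearchCV wrap a cross-validation loop around scikit-learn estimators. To show the mechanism without that dependency, here is a standard-library sketch with hypothetical names (`kfold_indices`, `grid_search_cv`, `knn_accuracy`) and a toy 1-D k-NN model standing in for the real estimators: every combination in the grid is scored on each seeded fold, and the best mean score wins. RandomizedSearchCV differs by sampling a fixed number of combinations instead of trying them all.

```python
import random
from itertools import product

def kfold_indices(n, k=5, seed=0):
    """Shuffle indices with a fixed seed, then yield (train, validation) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering every index
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def grid_search_cv(param_grid, score_fn, n, k=5, seed=0):
    """Try every combination in param_grid; keep the one with the best mean fold score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, tr, va) for tr, va in kfold_indices(n, k, seed)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Toy data and model (hypothetical): 1-D points labelled 1 when x >= 10.
xs = list(range(20))
ys = [int(x >= 10) for x in xs]

def knn_accuracy(params, train, val):
    """Majority vote of the n_neighbors closest training points (odd n avoids ties)."""
    correct = 0
    for v in val:
        nearest = sorted(train, key=lambda t: abs(xs[t] - xs[v]))[: params["n_neighbors"]]
        vote = round(sum(ys[t] for t in nearest) / len(nearest))
        correct += vote == ys[v]
    return correct / len(val)

best_params, best_score = grid_search_cv({"n_neighbors": [1, 3, 5]}, knn_accuracy, n=len(xs))
print(best_params, best_score)
```

Because the folds depend only on `seed`, every parameter combination is evaluated on identical splits, so differences in mean score reflect the parameters rather than the luck of the split.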
I'm actively seeking Data Science, Machine Learning, and AI Engineering internship opportunities where I can contribute to real-world projects while continuing to grow my technical expertise. Feel free to connect through LinkedIn or email.