Data Science Bootcamp Curriculum
Curriculum Highlights
Basics of Data Science
Python for AI
Machine Learning
NLP and LLMs
Computer Vision
Curriculum Overview
- Data Literacy
- Statistical Foundations
- Descriptive Statistics
- Statistical Inference and Sampling Techniques
- Regression Analysis
- Basic Data and Statistical Concepts
- Data Cleaning, Preparation, and Management
- Data Processing
- Importing data from diverse sources
- Creating a basic report with various visuals
- Data analysis, manipulation and filtering in Power BI
- Creating measures and calculated columns
- Filtering data in a report
- Using slicers for dynamic filtering of a report
- Introduction to DAX
- Using DAX to solve complex data problems
- Visualizing cross sections of data
- Introduction to SQL
- Filtering, Sorting, and Aggregating Data
- Table Joins and Case Statements
- Advanced Query Techniques
- Window Functions and Data Modification
- Creating New Tables
- Advanced Topics and Artificial Intelligence
- Date Variables and Artificial Intelligence
- Basics of Prompt Engineering
- Definition and importance of prompt engineering
- The relationship between prompts and AI outputs
- Applications in Data Science and Machine Learning
- Case studies illustrating the impact of prompt engineering in these fields
- Introduction to spatial analysis
- Introduction to QGIS
- Creating shapefiles (point, line, and polygon)
- Learning basic cartography
- Learning basic vector analysis tools
- Creating heatmaps
- Using Google Maps to create maps in QGIS
- Introduction to raster files
- Real-world examples of GIS in various industries and fields
- Linear algebra
- Probability
- Statistics
- Introduction to Python: history, features, and advantages
- Expressions and operators: arithmetic, assignment, comparison, and logical
- Understanding the type() function and type inference
- Introduction to data structures: lists, tuples, and dictionaries
- Recap of Python basics
- Working with arithmetic operators: addition, subtraction, multiplication, division, modulus, and exponentiation
- Using comparison operators: equal to, not equal to, greater than, less than, etc.
- Logical operators: and, or, and not
- Exploring advanced data types: sets and string manipulation
- Evaluating expressions: operator precedence and associativity
- Introduction to conditional statements: if, elif, and else
- Executing code based on conditionals
- Understanding the flow of control in conditional statements
- Iteration using the for loop: range(), iteration over lists and strings
- Working with the while loop: syntax, conditions, and examples
- Combining loops and conditionals
- Using the break statement to exit loops prematurely
- Utilizing the continue statement to skip iterations
- Implementing nested loops for complex iterations
- Introduction to functions: purpose, advantages, and best practices
- Defining and calling user-defined functions
- Parameters and arguments: positional, keyword, and default values
- Return statement and function output
- Variable scope and lifetime
- Function documentation and code readability
- Understanding exceptions: errors, exceptions, and exception hierarchy
- Handling exceptions using try-except blocks: handling specific exceptions, multiple exceptions, and else and finally clauses
- Raising exceptions and creating custom exception classes
- File handling in Python: opening, reading, writing, and closing files
- Working with different file modes and file objects
- Introduction to the NumPy module: features and applications
- Working with multidimensional arrays: creation, indexing, slicing, and reshaping
- Performing element-wise operations: arithmetic, logical, and statistical
- Overview of the Matplotlib module: data visualization and plotting
- Customizing plots: line properties, markers, colors, labels, and legends
- Introduction to the Kaggle platform: features and benefits
- Leveraging Kaggle for real-life datasets: data exploration, analysis, and visualization
- Introduction to machine learning modules on Kaggle: scikit-learn, TensorFlow, and PyTorch
- Overview of running machine learning experiments on Kaggle
- Resources for further learning and exploration
- Introduction to Exploratory Data Analysis (EDA)
- Importance of EDA in data analysis
- Steps involved in EDA
- Handling missing values: identification, analysis, and treatment strategies
- Imputation techniques for missing values
- Data consistency checks using fuzzy logic
- Binning and discretization techniques for continuous variables
- Outlier detection and analysis methods
- Handling outliers: techniques for treatment or removal
- Importance of feature selection in EDA
- Feature selection techniques: filter methods, wrapper methods, and embedded methods
- Data wrangling: cleaning and transforming data for analysis
- Handling categorical variables: encoding techniques
- Inference and hypothesis testing in EDA
- Common statistical tests: t-test, chi-square test, ANOVA, etc.
- Visualization techniques for EDA: histograms, box plots, scatter plots, etc.
- Hands-on practical session for complete EDA using a dataset
- Evaluation metrics for classification problems: accuracy, precision, recall, F1 score, etc.
- Introduction to the Naive Bayes algorithm and its applications
- Implementing Naive Bayes for classification tasks
- Logistic Regression: theory, interpretation, and applications
- Support Vector Machines (SVM): concepts, kernels, and use cases
- Decision Trees: construction, pruning, and interpretability
- Random Forests: ensemble learning and feature importance
- Bagging and Boosting: techniques for improving model performance
- Hyperparameter tuning techniques: grid search, random search, and Bayesian optimization
- Principal Component Analysis (PCA): dimensionality reduction and feature extraction
- Singular Value Decomposition (SVD): applications in matrix factorization and data compression
- Introduction to clustering: an unsupervised learning technique
- Partitioning algorithms: K-means, K-medoids
- Hierarchical clustering: agglomerative and divisive approaches
- Density-based clustering: DBSCAN, OPTICS
- Cluster evaluation metrics: silhouette coefficient, Davies-Bouldin index
- Introduction to regression analysis
- Linear regression: assumptions, interpretation, and model evaluation
- Evaluation metrics for regression: mean squared error, R-squared, etc.
- Other regression methods: polynomial regression, ridge regression, lasso regression
- Concepts and characteristics of time series data
- Time series components: trend, seasonality, and noise
- Popular time series forecasting models: ARIMA, SARIMA, exponential smoothing
- Implementing time series forecasting models
- Evaluation metrics for time series forecasting: mean absolute error, mean absolute percentage error, etc.
- Cross-validation techniques for time series data
- Hyperparameter tuning for time series models
- Overview of Natural Language Processing (NLP)
- Evolution of Large Language Models (LLMs)
- Importance and Applications of NLP and LLMs
- Linguistic Concepts
- Tokenization and Text Preprocessing
- Part-of-Speech (POS) Tagging
- Named Entity Recognition (NER)
- Sentiment Analysis
- Text Classification
- Word Embeddings and Language Representations
- The Transformer Architecture
- Attention Mechanisms
- GPT, BERT, and Other Key Models
- Pretraining and Fine-Tuning Techniques
- Evaluation Metrics and Benchmarks
- Chatbots and Conversational AI
- Text Summarization
- Machine Translation
- Content Generation and Creative Writing
- Question Answering Systems
- Semantic Search and Text Mining
- Bias and Fairness
- Privacy and Security
- Model Interpretability and Explainability
- Environmental Impact and Computational Requirements
- Getting Started with NLP Libraries (spaCy, NLTK, Hugging Face Transformers)
- Building a Simple Text Classifier
- Fine-Tuning a Large Language Model for a Specific Task
- Evaluating Model Performance and Error Analysis
- Multimodal Models and Human-AI Interaction
- Low-Resource Languages and Transfer Learning
- Knowledge-Enhanced Language Models
- Efficient Training and Deployment Techniques
- Cascade and HOG classifiers to detect faces
- Face detection using the OpenCV and Dlib libraries
- Detect other objects using OpenCV, such as cars, clocks, eyes, and full human bodies
- KCF and CSRT algorithms to perform object tracking
- Convolutional neural networks and their implementation using Python and TensorFlow
- Detect objects in images and videos using YOLO, a widely used real-time object detection algorithm
- Recognize gestures and actions in videos using OpenCV
- Create hallucinogenic images with Deep Dream
- Create images that don’t exist in the real world with GANs (Generative Adversarial Networks)
- Fundamentals of Reinforcement Learning
- Sample-based Learning Methods
- Prediction and Control with Function Approximation
- Fundamentals of Diffusion Models
- Stable Diffusion in Practice
- Methods, Jobs and Tools of Stable Diffusion
- GitHub Actions
- Airflow
- Kubernetes
- MLflow
- ML System Design
- API Building (Flask/FastAPI)
- Cloud Services (AWS/Azure)
- Weights & Biases (WandB)
- 10+ projects in 6 months
- International speakers and mentors for guided projects
- Industry-level datasets and projects
- Continuous practice with real-world case studies in data analytics
- Skills demonstration on data cleaning, data analysis, data visualization
- Email writing
- Logic and critical thinking
- Report writing
- LinkedIn optimisation
- Presentation and visual communication
- Resume, CV and cover letter writing
- Acing interviews
- Personal branding
- Global market understanding
- One-on-one mentorship