UCS654: Predictive Analytics using Statistics
(Jan to June 2026 - EVEN2526)
Join WhatsApp Group | Click Here
Table of Contents
01 - Syllabus
02 - Lecture Resources
03 - Lecture Delivery Plan (LDP)
04 - Lab Experiments
05 - Evaluation Scheme
06 - Kaggle Hack
Recommended Books
- Peter Dalgaard, Introductory Statistics with R, Second Edition, Springer.
- Brett Lantz, Machine Learning with R, Second Edition, Packt Publishing (www.PacktPub.com).
Reference Books
- Online Resource | Click Here
- Introduction to Machine Learning in R | Click Here
01 - Syllabus
UCS654 (Scheme-2023) | Syllabus | Link
Topics
- Probability, conditional probability, random variables, PDF, PMF, joint distributions, statistical independence, variance, covariance, correlation, different distribution functions, Bayes' theorem, central limit theorem. [will be asked in MST]
- Sampling distributions, parameter estimation, hypothesis testing, two-population tests, regression and correlation, univariate analysis, multivariate analysis, ANOVA. [will be asked in MST]
- Mathematical modeling of regression (linear, non-linear, multiple), understanding error in model training (loss, bias, variance, overfitting, underfitting), maximum likelihood estimation to solve regression, transformation of classification to regression, ensembling. [will be asked in MST]
- Basics of neural networks, different loss functions, validation and regularization, multilayer networks, parameter optimization methods. [will be asked in EST]
- Data generation using modeling and simulation, association mining, ECLAT, measuring data similarity and dissimilarity, and TOPSIS. [will be asked in EST]
Course Learning Outcomes (CLOs) / Course Objectives (COs)
CO1: Demonstrate the ability to use basic probability concepts with descriptive statistics. [Covered before MST]
CO2: Visualize the patterns in the data. [Covered before MST]
CO3: Demonstrate the use of statistical methods to estimate characteristics of the data. [Covered before MST]
CO4: Explain and demonstrate the use of predictive analytics in the field of data science. [Covered before EST]
Instructions
- The MST exam will be a blend of programming and theory questions.
- The exam will contain questions with an equal distribution of marks. The maximum marks may be 30.
- There will be negative marking if answers are not attempted in sequence.
- Struck-out answers, and any answer written in pencil, will be awarded zero marks.
- Answers will be evaluated with reference to the ideal solution.
02 - Lecture Resources
Unit 01: - Probability Theory | Link
Unit 02: - Advanced Statistics | Link
Unit 03: - Regression and Statistics in ML | Link
Unit 04: - Neural Network | Link
Unit 05: - Data Generation | Link
03 - Lecture Delivery Plan [till MST]
Week 1: Data Distribution & Identity [CODE Link]
L1: Probability, PMF, and PDF. Application: Using PDFs to detect data drift and to justify normalization of input data.
L2: Random Variables & Different Distributions (Gaussian, Bernoulli). Application: Initializing Neural Network weights.
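A minimal Python sketch tying L1 and L2 together (assuming NumPy and SciPy are available; all numbers are illustrative, and the He-style scale sqrt(2/fan_in) is one common convention, not one prescribed by this course):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # L1: PDF of a continuous variable vs. PMF of a discrete one
    x = np.linspace(-4, 4, 200)
    gauss_pdf = stats.norm.pdf(x, loc=0, scale=1)   # Gaussian density values
    bern_pmf = stats.bernoulli.pmf([0, 1], p=0.3)   # P(X=0)=0.7, P(X=1)=0.3

    # L2: Gaussian weight initialization for one layer (illustrative scale)
    fan_in, fan_out = 64, 32
    W = rng.normal(loc=0.0, scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    print(W.std())  # close to sqrt(2/64) ~= 0.177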
Week 2: Relationships between Features
L3: Joint Distribution & Statistical Independence. Application: Identifying redundant features in a dataset to reduce model complexity.
L4: Variance, Co-variance, and Correlation. Application: Constructing correlation heatmaps to prevent Multi-collinearity in training.
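As a sketch of L4 (synthetic data; feature names f1, f2, f3 are hypothetical, and Matplotlib is assumed), a correlation heatmap that exposes a redundant feature:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    # Three features; f2 is deliberately redundant (nearly a copy of f1)
    f1 = rng.normal(size=500)
    f2 = f1 + 0.05 * rng.normal(size=500)
    f3 = rng.normal(size=500)
    X = np.column_stack([f1, f2, f3])

    corr = np.corrcoef(X, rowvar=False)   # 3x3 Pearson correlation matrix
    plt.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
    plt.colorbar(label="Pearson correlation")
    plt.xticks([0, 1, 2], ["f1", "f2", "f3"])
    plt.yticks([0, 1, 2], ["f1", "f2", "f3"])
    plt.title("Correlation heatmap: f1 and f2 are near-duplicates")
    plt.show()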
Week 3: The Logic of Inference
L5: Conditional Probability & Bayes Theorem. Application: The Naive Bayes classifier and how models update beliefs with new data.
L6: Central Limit Theorem (CLT). Application: Why mini-batch statistics (as in Batch Norm) help stabilize gradients in deep learning.
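A short sketch connecting L5 and L6 (the test characteristics and the skewed population are hypothetical illustrations, not course-mandated numbers):

    import numpy as np
    from scipy import stats

    # L5 (Bayes' theorem): posterior = likelihood * prior / evidence
    # Hypothetical test: 99% sensitivity, 5% false-positive rate, 1% prevalence
    prior, sens, fpr = 0.01, 0.99, 0.05
    evidence = sens * prior + fpr * (1 - prior)
    posterior = sens * prior / evidence
    print(f"P(condition | positive test) = {posterior:.3f}")   # ~0.167

    # L6 (CLT): means of mini-batches from a skewed population are near-Gaussian
    rng = np.random.default_rng(2)
    population = rng.exponential(scale=1.0, size=100_000)
    batch_means = rng.choice(population, size=(2_000, 64)).mean(axis=1)
    print(stats.skew(population), stats.skew(batch_means))  # large skew vs. ~0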
Week 4: Sampling and Estimation
L7: Sampling Distributions & CLT in practice. Application: Why a Validation Set must be a representative sample of the Test Set.
L8: Parameter Estimation (Point vs. Interval). Application: Estimating the confidence of a model’s prediction (Uncertainty).
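A sketch of L8's point vs. interval estimation (the 25 "evaluation scores" are simulated for illustration; in practice they might come from cross-validation runs):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    scores = rng.normal(loc=0.82, scale=0.03, size=25)   # simulated model scores

    point_estimate = scores.mean()
    # 95% t-interval around the mean (population variance unknown)
    low, high = stats.t.interval(0.95, df=len(scores) - 1,
                                 loc=point_estimate,
                                 scale=stats.sem(scores))
    print(f"point = {point_estimate:.3f}, 95% CI = ({low:.3f}, {high:.3f})")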
Week 5: Hypothesis Testing (Model Comparison)
L9: Hypothesis Testing. Application: Determining whether a new DL model is actually better than an old one or just lucky.
L10: Two-population Tests. Application: A/B testing in ML deployments: comparing performance across two different user groups.
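A sketch of L9-L10 using Welch's two-sample t-test (the per-run accuracies of the two models are simulated; real scores would come from repeated evaluation runs or two user groups):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    old_model = rng.normal(loc=0.80, scale=0.02, size=30)  # simulated accuracies
    new_model = rng.normal(loc=0.82, scale=0.02, size=30)

    # Welch's t-test: is the difference in mean performance significant?
    t_stat, p_value = stats.ttest_ind(new_model, old_model, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("Reject H0: the new model's mean score differs significantly.")
    else:
        print("Fail to reject H0: the difference could be luck.")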
Week 6: Analyzing Complex Data
L11: Univariate vs. Multivariate Analysis. Application: Handling high-dimensional data inputs for Neural Networks.
L12: ANOVA (Analysis of Variance). Application: Feature selection: determining which input features significantly impact the model's output.
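A sketch of one-way ANOVA for feature screening (the three groups are simulated; group_c's shifted mean stands in for a feature level that actually matters):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    # Target values grouped by the three levels of a categorical feature
    group_a = rng.normal(loc=1.0, scale=0.3, size=40)
    group_b = rng.normal(loc=1.0, scale=0.3, size=40)
    group_c = rng.normal(loc=1.4, scale=0.3, size=40)  # shifted level

    # One-way ANOVA: do the group means differ more than chance allows?
    f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p: feature matters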
Regression, Errors, and Optimization
Week 7: Mathematical Modeling of Regression
L13: Linear, Non-linear, and Multiple Regression. Application: Building the simplest Neural Network (a single neuron with a linear activation).
L14: Regression & Correlation as a Predictive Tool. Application: Predicting continuous values and measuring Goodness of Fit.
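A sketch of L13-L14 with scikit-learn (assumed available; the data are generated from y = 3x + 2 plus noise): fitting a linear model, recovering its coefficients, and measuring goodness of fit with R^2:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(6)
    X = rng.uniform(0, 10, size=(100, 1))
    y = 3 * X.ravel() + 2 + rng.normal(scale=2.0, size=100)

    # The simplest "single linear neuron": ordinary least squares
    model = LinearRegression().fit(X, y)
    print(model.coef_[0], model.intercept_)   # close to 3 and 2
    print(r2_score(y, model.predict(X)))      # goodness of fit (R^2)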
Week 8: Error Function
L15: Understanding Error: Loss, Bias, and Variance. Application: The Bias-Variance Tradeoff: why a perfect training score leads to failure.
L16: Overfitting vs. Underfitting. Application: Diagnosing learning curves to decide if you need more data or a smaller model.
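A sketch of the L15-L16 diagnosis (all data simulated): the same noisy data fit with polynomial models of increasing degree; training error keeps falling while test error eventually rises, the signature of overfitting:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(7)
    X = rng.uniform(-3, 3, size=(60, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.2, size=60)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

    for degree in (1, 4, 15):   # underfit, reasonable, overfit
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_tr, y_tr)
        tr = mean_squared_error(y_tr, model.predict(X_tr))
        te = mean_squared_error(y_te, model.predict(X_te))
        print(f"degree={degree:2d}  train MSE={tr:.3f}  test MSE={te:.3f}")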
04 - Lab Experiments
Experiment-1: Probability Distribution and PDF Fitting [Click Here]
- Objective: To generate data and fit a probability density function in order to understand data assumptions.
- Theory: PDFs describe continuous random variables. Many ML models assume the data follows a Gaussian distribution for stable learning.
- Tasks to be performed:
  - Generate synthetic data
  - Plot a histogram
  - Fit a Gaussian PDF
- Expected Output: Histogram with an overlaid Gaussian PDF (a sketch follows this list).
- Data Visualization and Statistical Analysis using Python [Guided Project]: Select one numerical feature from a real dataset, analyze its distribution, fit a Gaussian PDF, and justify whether normalization is required before applying a Machine Learning model.
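A minimal sketch of the three experiment tasks (illustrative only: the mean, standard deviation, and sample size below are hypothetical, not values required by the lab manual):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(42)
    # Task 1: generate synthetic data (hypothetical parameters)
    data = rng.normal(loc=50, scale=8, size=1_000)

    # Task 2: plot a density-normalized histogram
    plt.hist(data, bins=30, density=True, alpha=0.5, label="data")

    # Task 3: fit a Gaussian PDF (MLE estimates of mean and std) and overlay it
    mu, sigma = stats.norm.fit(data)
    x = np.linspace(data.min(), data.max(), 200)
    plt.plot(x, stats.norm.pdf(x, mu, sigma), "r-",
             label=f"fit: mu={mu:.1f}, sigma={sigma:.1f}")
    plt.legend()
    plt.title("Histogram with fitted Gaussian PDF")
    plt.show()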

Instructions
- Guided projects must be of average to high difficulty level. The project completion certificate must be submitted through the link provided by the instructor.
- Guided projects must be submitted only during the scheduled lab session. Any submission after the scheduled date and time will be awarded zero marks.
- Lab experiment details and the manual will be shared through the course page.
- Each lab experiment must be performed within the first 30 minutes of the lab session. A viva voce may be conducted during the evaluation to assess conceptual understanding.
- Independent work is mandatory. Students must be able to explain their approach, logic, and results. Inability to justify the work may result in reduced or zero marks, even if the output is correct.
- Use of AI tools, code generators, or online assistance during lab evaluation is strictly prohibited, unless explicitly permitted by the instructor.
- Students absent from the lab, irrespective of the reason, will not be awarded marks for that lab experiment.
- Latecomers will not be permitted to attend the lab under any circumstances.
- Any form of unethical practice (plagiarism, copying, proxy attendance, or impersonation) will be dealt with strictly as per institute academic policies.
- Evaluation will emphasize conceptual clarity, methodology, and interpretation, not merely producing correct results.
- Students are advised to come well prepared by reviewing the lab manual and relevant theory in advance to ensure meaningful learning during the lab.
06 - Kaggle Hack
1. Kaggle-Hack-Lab-Exam-1 | Due Date: 20 Jan 2026 07:59:59
2. Kaggle-Hack-Lab-Exam-2 | Due Date: 27 Jan 2026 07:59:59
