
UCS654: Predictive Analytics using Statistics
(Jan to June 26 - EVEN2526)

Table of Contents                                                                   Join WhatsApp Group | Click Here

01 - Syllabus

02 - Lecture Resources

03 - LDP

04 - Lab Experiments

05 - Evaluation Scheme

06 - Kaggle Hack

Recommended Books

  • Peter Dalgaard, Introductory Statistics with R, Springer, Second Edition

  • Brett Lantz, Machine Learning with R, Packt Publishing (www.PacktPub.com), Second Edition

 

Reference Books


01 - Syllabus

UCS654 (Scheme-2023) | Syllabus | Link

Topics

  • Probability, conditional probability, random variable, PDF, PMF, joint distribution, statistical independence, variance, co-variance, correlation, different distribution functions, Bayes theorem, central limit theorem. [will be asked in MST]

​​

  • Sampling distributions, parameter estimation, hypothesis testing, two-population tests, regression and correlation, univariate analysis, multivariate analysis, ANOVA. [will be asked in MST]

 

  • Mathematical modeling of regression (linear, non-linear, multiple), understanding error in model training (loss, bias, variance, overfitting, underfitting), maximum likelihood estimation to solve regression, transformation of classification to regression, ensembling. [will be asked in MST]

 

  • Basics of Neural Networks, different loss functions, validation and regularization, multilayer networks, parameter optimization methods. [will be asked in EST]

 

  • Data generation using modeling and simulation, Association mining, ECLAT, Measuring data similarity and dissimilarity, and TOPSIS. [will be asked in EST]


Course Learning Outcomes (CLOs) / Course Objectives (COs)

       CO1: Demonstrate the ability to use basic probability concepts with descriptive statistics. [Covered before MST]
       CO2: Visualize the patterns in the data. [Covered before MST]
       CO3: Demonstrate the use of statistical methods to estimate characteristics of the data. [Covered before MST]
       CO4: Explain and demonstrate the use of predictive analytics in the field of data science. [Covered before EST]

Instruction(s)

  • The MST exam will be a blend of programming and theory questions.

  • The exam will contain questions with an equal distribution of marks. The maximum marks may be 30.

  • There will be negative marking if answers are not written in sequence.

  • Answers with cutting (struck-out work), or any answer written in pencil, will be awarded zero.

  • The answers will be evaluated with reference to the ideal solution.

02 - Lecture Resources

Unit 01: - Probability Theory | Link

Unit 02: - Advanced Statistics | Link

Unit 03: - Regression and Statistics in ML | Link

Unit 04: - Neural Network | Link

Unit 05: - Data Generation | Link

03 - Lecture Delivery Plan [till MST]

Week 1: Data Distribution & Identity [CODE Link]

L1: Probability, PMF, and PDF. Application: Using PDFs to detect data drift and to justify normalization of input data (see the sketch after this week's plan).

L2: Random Variables & Different Distributions (Gaussian, Bernoulli). Application: Initializing Neural Network weights.
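
A minimal sketch of the Week 1 ideas (not the linked course CODE; the synthetic feature values, sample sizes, and the 0.05 significance threshold are assumptions): fit a Gaussian PDF to a reference feature and use a two-sample Kolmogorov-Smirnov test as a rough data-drift check.

import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
reference = rng.normal(loc=50.0, scale=5.0, size=1000)    # training-time feature (assumed values)
new_batch = rng.normal(loc=53.0, scale=5.0, size=1000)    # production-time feature (assumed drifted)
mu, sigma = stats.norm.fit(reference)                     # maximum-likelihood Gaussian fit
print(f"Fitted PDF: mean={mu:.2f}, std={sigma:.2f}")
ks_stat, p_value = stats.ks_2samp(reference, new_batch)   # compare the two samples
print(f"KS statistic={ks_stat:.3f}, p-value={p_value:.4f}")
if p_value < 0.05:                                         # assumed significance level
    print("Distributions differ: possible data drift; revisit input normalization.")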

 

Week 2: Relationships between Features

L3: Joint Distribution & Statistical Independence. Application: Identifying redundant features in a dataset to reduce model complexity.

L4: Variance, Co-variance, and Correlation. Application: Constructing correlation heatmaps to prevent Multi-collinearity in training.
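
A minimal sketch for L4 (the synthetic features and the 0.9 cut-off are assumptions): build a correlation matrix and flag near-duplicate features before training.

import numpy as np
import pandas as pd
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
df = pd.DataFrame({
    "x1": x1,
    "x2": 0.95 * x1 + rng.normal(scale=0.1, size=500),   # nearly a copy of x1 (redundant)
    "x3": rng.normal(size=500),                           # independent feature
})
corr = df.corr()                                          # Pearson correlation matrix
print(corr.round(2))
threshold = 0.9                                           # assumed redundancy cut-off
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if abs(corr.loc[a, b]) > threshold:
            print(f"{a} and {b} are highly correlated; consider dropping one.")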

 

Week 3: The Logic of Inference

L5: Conditional Probability & Bayes Theorem. Application: The Naive Bayes classifier and how models update beliefs with new data.

L6: Central Limit Theorem (CLT). Application: Why mini-batch statistics (as used in Batch Norm) are stable enough to help stabilize gradients in Deep Learning.
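
A worked Bayes-theorem example for L5 above, with made-up numbers for a toy spam filter, showing how a prior belief is updated by evidence.

p_spam = 0.20                   # prior P(spam), assumed
p_word_given_spam = 0.60        # likelihood P(word | spam), assumed
p_word_given_ham = 0.05         # P(word | not spam), assumed
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)   # law of total probability
p_spam_given_word = p_word_given_spam * p_spam / p_word                 # Bayes' theorem
print(f"Posterior P(spam | word) = {p_spam_given_word:.2f}")            # 0.75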

 

Week 4: Sampling and Estimation

L7: Sampling Distributions & CLT in practice. Application: Why a Validation Set must be a representative sample of the Test Set.

L8: Parameter Estimation (Point vs. Interval). Application: Estimating the confidence of a model’s prediction (Uncertainty).
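
A minimal sketch for L8 (the accuracy of 0.87 and test-set size of 500 are assumed numbers): a 95% confidence interval for a model's accuracy using the normal approximation to the sampling distribution of a proportion.

import math
accuracy = 0.87                 # assumed point estimate from a test set
n = 500                         # assumed number of test examples
z = 1.96                        # 95% two-sided normal quantile
std_err = math.sqrt(accuracy * (1 - accuracy) / n)
lower, upper = accuracy - z * std_err, accuracy + z * std_err
print(f"Accuracy = {accuracy:.2f}, 95% CI = [{lower:.3f}, {upper:.3f}]")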

 

Week 5: Hypothesis Testing (Model Comparison)

L9: Hypothesis Testing & Tests. Application: Determining if a new DL model is actually better than an old one or just lucky.

L10: Two-population Tests. Application: A/B testing in ML deployments: comparing performance across two different user groups.
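
A minimal sketch for L9 and L10 (the per-fold scores are assumed numbers): Welch's two-sample t-test to check whether model B's improvement over model A is statistically significant or just noise.

from scipy import stats
model_a = [0.81, 0.83, 0.80, 0.82, 0.84, 0.81, 0.83, 0.82]   # assumed cross-validation scores
model_b = [0.84, 0.86, 0.85, 0.83, 0.87, 0.85, 0.86, 0.84]   # assumed cross-validation scores
t_stat, p_value = stats.ttest_ind(model_a, model_b, equal_var=False)   # Welch's t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the improvement is unlikely to be luck.")
else:
    print("Fail to reject H0: the difference may just be noise.")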

 

Week 6: Analyzing Complex Data

L11: Univariate vs. Multi-Variate Analysis. Application: Handling high-dimensional data inputs for Neural Networks.

L12: ANOVA (Analysis of Variance). Application: Feature selection: determining which input features significantly impact the model's output.
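
A minimal sketch for L12 (the synthetic class groups are assumptions): one-way ANOVA as a quick screen of whether a feature's mean differs across class groups, i.e. whether the feature is likely informative.

import numpy as np
from scipy import stats
rng = np.random.default_rng(3)
class_0 = rng.normal(loc=5.0, scale=1.0, size=60)   # feature values within class 0
class_1 = rng.normal(loc=5.2, scale=1.0, size=60)   # class 1, slightly shifted
class_2 = rng.normal(loc=7.0, scale=1.0, size=60)   # class 2, clearly shifted
f_stat, p_value = stats.f_oneway(class_0, class_1, class_2)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
if p_value < 0.05:
    print("Group means differ: the feature is likely informative about the label.")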

 

Week 7: Mathematical Modeling of Regression

L13: Linear, Non-linear, and Multiple Regression. Application: Building the simplest Neural Network (a single neuron with a linear activation).

L14: Regression & Correlation as a Predictive Tool. Application: Predicting continuous values and measuring Goodness of Fit.
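
A minimal sketch for L13 and L14 (the synthetic data and the true relation y = 3x + 2 are assumptions): least-squares linear regression and its R^2 as a goodness-of-fit measure; this is the same mapping a single linear neuron learns.

import numpy as np
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(scale=2.0, size=200)   # assumed true relation plus noise
w, b = np.polyfit(x, y, deg=1)                         # closed-form least squares: y = w*x + b
y_hat = w * x + b
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot                      # goodness of fit
print(f"w = {w:.2f}, b = {b:.2f}, R^2 = {r_squared:.3f}")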

 

Week 8: Error Function

L15: Understanding Error: Loss, Bias, and Variance. Application: The Bias-Variance Tradeoff: why a perfect training score can still lead to failure on unseen data.

L16: Overfitting vs. Underfitting. Application: Diagnosing learning curves to decide if you need more data or a smaller model.
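
A minimal sketch for L15 and L16 (synthetic data; polynomial degree stands in for model complexity): compare training and validation error to diagnose underfitting versus overfitting.

import numpy as np
rng = np.random.default_rng(5)
x = rng.uniform(-3, 3, size=120)
y = np.sin(x) + rng.normal(scale=0.3, size=120)        # assumed true signal plus noise
x_train, y_train = x[:80], y[:80]
x_val, y_val = x[80:], y[80:]
for degree in (1, 3, 10):                              # low, moderate, high model complexity
    coeffs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    mse_val = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree:2d}: train MSE = {mse_train:.3f}, val MSE = {mse_val:.3f}")
# Degree 1 underfits (both errors high); a high degree drives training error down
# while validation error stays high or rises, the signature of overfitting.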

04 - Lab Experiments

Experiment-1: Probability Distribution and PDF Fitting [Click Here]

  • Objective: To generate data and fit a probability density function in order to understand distributional assumptions about the data (a minimal sketch follows this experiment's description).

  • Theory: PDFs describe continuous random variables. Many ML models assume the data to be Gaussian-distributed for stable learning.

  • Tasks to be performed

  1. Generate synthetic data

  2. Plot histogram

  3. Fit Gaussian PDF

  • Expected Output: Histogram with overlaid Gaussian PDF.

  • Data Visualization and Statistical Analysis using Python [Guided Project]: Select one numerical feature from a real dataset, analyze its distribution, fit a Gaussian PDF, and justify whether normalization is required before applying a Machine Learning model.
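
A minimal illustrative sketch of Tasks 1-3 above (not the official lab solution; the distribution parameters are assumptions): generate synthetic data, plot its histogram, and overlay a fitted Gaussian PDF.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
rng = np.random.default_rng(42)
data = rng.normal(loc=10.0, scale=2.5, size=1000)                  # Task 1: synthetic data (assumed parameters)
plt.hist(data, bins=30, density=True, alpha=0.6, label="data")     # Task 2: histogram
mu, sigma = stats.norm.fit(data)                                   # Task 3: fit a Gaussian PDF
xs = np.linspace(data.min(), data.max(), 200)
plt.plot(xs, stats.norm.pdf(xs, mu, sigma), "r-", label="fitted Gaussian PDF")
plt.legend()
plt.title("Histogram with overlaid Gaussian PDF")
plt.show()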


Instructions

  • Guided projects must be of average to high difficulty level. The project completion certificate must be submitted through the link provided by the instructor.

  • Guided projects must be submitted only during the scheduled lab session. Any submission after the scheduled date and time will be awarded zero marks.

  • Lab experiment details and the manual will be shared through the course page.

  • Each lab experiment must be performed within the first 30 minutes of the lab session. A viva voce may be conducted during the evaluation to assess conceptual understanding.

  • Independent work is mandatory. Students must be able to explain their approach, logic, and results. Inability to justify the work may result in reduced or zero marks, even if the output is correct.

  • Use of AI tools, code generators, or online assistance during lab evaluation is strictly prohibited, unless explicitly permitted by the instructor.

  • Students absent from the lab, irrespective of the reason, will not be awarded marks for the lab experiment.

  • Latecomers will not be permitted to attend the lab under any circumstances.

  • Any form of unethical practice (plagiarism, copying, proxy attendance, or impersonation) will be dealt with strictly as per institute academic policies.

  • Evaluation will emphasize conceptual clarity, methodology, and interpretation, not merely producing correct results.

  • Students are advised to come well prepared by reviewing the lab manual and relevant theory in advance to ensure meaningful learning during the lab.

06 - Kaggle Hack

1. Kaggle-Hack-Lab-Exam-1 | Due Date: 20 Jan 2026 07:59:59

2. Kaggle-Hack-Lab-Exam-2 | Due Date: 27 Jan 2026 07:59:59


Visual and Signal Information Processing Research Group

© 2021 Suresh Raikwar
