Course title Advanced statistical methods
Assessment method Exam Hours per semester 60 Lect. Exercises Lab. Project
ECTS 4 Hours/week 2 2
Prerequisites
Basics of statistics and probability.
Course description

The course covers advanced statistical methods, with special attention to those frequently used in data mining. In particular, it treats inferential methods in regression, analysis of variance and generalized linear models, as well as methods for modelling nominal data. A recurring subject is checking model adequacy and assessing the accuracy of model-based prediction. Existing software implementations of the relevant methods are discussed alongside the theoretical material.

Course objectives
Ability to detect dependence between attributes, fit a linear model to data, assess its goodness of fit and modify the model when it proves inadequate. Knowledge of ANOVA methods and logistic models and of their applications, such as credit scoring.
Grading
Project: 30% of the total score. Oral exam: 40%. Exercises: 30%.
Reference Texts and Software
  1. J. Koronacki, J. Mielniczuk, Statystyka dla studentów kierunków technicznych i przyrodniczych, WNT 2006
  2. J. Faraway, Linear models with R, Chapman & Hall/CRC 2004
  3. J. Faraway, Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models, Chapman & Hall/CRC 2006
Lecture Schedule
1. Quantitative and qualitative attributes and measuring their dependence: correlation coefficient, Kendall and Spearman correlation.
2. Analysis of qualitative data: contingency tables, chi-square test. Testing the hypothesis of independence.
3. Multiple regression model: Least Squares and ML estimators, their properties, total variance decomposition and coefficient of determination.
4. Tests of significance and goodness-of-fit in linear models, formal and graphical methods of assessing goodness-of-fit, plots of residuals and partial residuals.
5. When the linear model fails: detection of outliers and influential observations, transformation of predictors, Box-Cox method, accounting for heteroscedasticity, collinearity.
6. Prediction in linear models and assessing its precision. Predictive measures of goodness of fit. Feature selection.
7. Beyond least squares: robust estimation methods, ridge, lasso and LARS estimators. Partial least squares method.
8. Logistic regression: fitting the model, its diagnostics and tests for coefficients, applications in credit scoring.
9. Poisson regression and rate models. Actuarial applications. Loglinear modelling and independence testing.
10. One-way and two-way ANOVA. Interactions. Decomposition of variance. Testing the significance of interactions and factors.
11. Problem of multiple testing, Bonferroni and Tukey approaches. Repeated measures.
12. Parametric nonlinear regression. Nonlinear least squares.
13. Overview of nonparametric approaches to regression estimation.
14. Monte Carlo methods: generation of parametric random variables.
15. Monte Carlo-based testing and parametric estimation. Bootstrap.
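
As an illustration of the resampling ideas in the last lecture, a percentile bootstrap confidence interval for a sample mean might be sketched as follows. This is a minimal sketch and not course material: the function name bootstrap_mean_ci, its parameters and the data values are all hypothetical, chosen only for the example, and Python is used here simply because it needs no packages beyond the standard library.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles of those means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up measurements, used purely to exercise the function.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.6, 4.9]
low, high = bootstrap_mean_ci(data)
print(f"sample mean = {statistics.fmean(data):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The percentile interval is only one of several bootstrap intervals; it is used here because it is the shortest to write down, not because it is the preferred choice in every setting.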