Course title  Advanced statistical methods  
Assessment method  Exam  Hours per semester  60  Lect.  Exercises  Lab.  Project  
ECTS  4  Hours/week  2  2  
Prerequisites  
Basics of statistics and probability.  
Course description  
Advanced statistical methods are discussed, with special attention to those frequently used in data mining. In particular, the course covers inferential methods in regression, analysis of variance, and generalized linear models, as well as methods for modelling nominal data. A recurring subject is checking model adequacy and assessing the accuracy of model-based predictions. Discussion of existing software implementations of the relevant methods accompanies the theoretical material. 

Course objectives  
Ability to detect dependence between attributes, fit a linear model to data, assess its goodness of fit, and modify the model when it is inadequate. Knowledge of ANOVA methods and logistic models and their applications, such as credit scoring.  
Grading  
Project worth 30% of total score. Oral exam worth 40%. Exercises 30%.  
Reference Texts and Software  


Lecture Schedule  
1.  Quantitative and qualitative attributes and measuring their dependence: correlation coefficient, Kendall and Spearman correlation.  
2.  Analysis of qualitative data: contingency tables, chi-square test. Testing the hypothesis of independence.  
3.  Multiple regression model: least squares and maximum likelihood (ML) estimators, their properties, total variance decomposition, and coefficient of determination.  
4.  Tests of significance and goodness of fit in linear models, formal and graphical methods of assessing goodness of fit, plots of residuals and partial residuals.  
5.  When the linear model fails: detection of outliers and influential observations, transformation of predictions, Box-Cox method, accounting for heteroscedasticity, collinearity.  
6.  Prediction in linear models and assessing its precision. Predictive measures of goodness of fit. Feature selection.  
7.  Beyond least squares: robust methods of estimation; ridge, lasso, and LARS estimators. Partial least squares method.  
8.  Logistic regression: fitting the model, its diagnostics and tests for coefficients, applications in credit scoring.  
9.  Poisson regression and rate models. Actuarial applications. Log-linear modelling and independence testing.  
10.  One-way and two-way ANOVA. Interactions. Decomposition of variance. Testing the significance of interactions and factors.  
11.  The problem of multiple testing: Bonferroni and Tukey approaches. Repeated measures.  
12.  Parametric nonlinear regression, nonlinear least squares.  
13.  Overview of nonparametric approaches to regression estimation.  
14.  Monte Carlo methods: generation of parametric random variables.  
15.  Monte Carlo-based testing and parametric estimation. Bootstrap. 