Course title Advanced Data Analysis Software Development with R
Assessment method Exam Hours/semester 60 Lect. Exercises Lab. Project
ECTS Year Hours/week 2 2
Prerequisites
Basic knowledge of algorithms, data structures, and R programming, at the level of introductory classes in these topics, is assumed.
Course description
R is a de facto standard language and environment for statistical computing, data analysis, and graphics. This course exposes students to the depth and breadth of advanced, state-of-the-art R programming practice.
Course objectives
Students' theoretical knowledge of data analysis, machine learning, and other computational methods often does not go hand in hand with the ability to implement such algorithms on their own. The main aim of this module is to fill that gap: students should acquire the skills necessary to develop high-quality software for their own scientific or other purposes, and to share it with the user community via peer-reviewed R package repositories such as CRAN or Bioconductor.
Skills

By completing the course, the students should be able to:

  • understand some general, advanced programming concepts,
  • analyze a problem and determine how to represent it with R language elements and which algorithms and data structures to use to obtain the most effective solution,
  • automate and optimize data processing tasks,
  • write complex R applications, especially by composing them from simpler parts (if they are available),
  • properly and efficiently implement data analysis, machine learning, or any other in-memory computational methods,
  • debug, test, benchmark, and profile their code.
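
As a rough illustration of the last skill, the sketch below tests and times two implementations of the same task in base R. The function names `running_mean_loop` and `running_mean_vec` are hypothetical and were chosen only for this example; they are not part of the course materials.

```r
# Hypothetical example: the running mean of a numeric vector, written two ways.

# Explicit loop version.
running_mean_loop <- function(x) {
  out <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    out[i] <- total / i
  }
  out
}

# Vectorized version built on cumsum().
running_mean_vec <- function(x) cumsum(x) / seq_along(x)

# Test: both versions must agree with a hand-computed result.
x <- c(2, 4, 6, 8)
stopifnot(isTRUE(all.equal(running_mean_loop(x), c(2, 3, 4, 5))))
stopifnot(isTRUE(all.equal(running_mean_vec(x), running_mean_loop(x))))

# Test: an empty input yields an empty result.
stopifnot(length(running_mean_loop(numeric(0))) == 0)
stopifnot(length(running_mean_vec(numeric(0))) == 0)

# Benchmark: elapsed wall-clock time on a larger input.
big <- runif(1e5)
t_loop <- system.time(running_mean_loop(big))["elapsed"]
t_vec  <- system.time(running_mean_vec(big))["elapsed"]
```

Comparing `t_loop` with `t_vec` gives a first, coarse impression of the cost of each approach; a single `system.time()` call is noisy, so repeated measurements would be needed for a reliable benchmark.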
Grading

Six homework assignments, each consisting of a couple of tasks (60%)

Final exam, written (40%)

More than 50% is required to pass.

Reference Texts and Software

Books:

[1] Gągolewski M., Programowanie w języku R, Wydawnictwo Naukowe PWN, 2014 (in Polish).

[2] Chambers J.M., Programming with Data, Springer, 1998.

[3] Chambers J.M., Software for Data Analysis. Programming with R, Springer, 2008.

[4] Venables W.N., Ripley B.D., S Programming, Springer, 2000.

[5] Eddelbuettel D., Seamless R and C++ Integration with Rcpp, Springer, 2013.

[6] Wickham H., Advanced R, Chapman and Hall, 2014.

[7] Matloff N., The Art of R Programming, No Starch Press, 2011.

Software:

  • R (version 3.1 or higher),
  • RStudio (version 0.98 or higher),
  • Rtools (Windows users), Xcode (OS X users), or a C++11-compliant compiler such as GCC, clang, or Oracle Solaris Studio (UNIX/Linux users),
  • git,
  • various R packages available on CRAN (Rcpp, knitr, RSQLite, stringi, roxygen2, etc.).
Lecture Schedule
1. R basics (part I)

  • Getting started with R
  • Getting started with RStudio
  • Classification of R data types. Basic atomic types
  • Basic vector operations
2. R basics (part II)

  • Lists a.k.a. generalized vectors
  • Functions (closures)
  • Unit testing. Debugging. Exception handling
3. R basics (part III)

  • Attributes
  • Compound types
  • Control flow expressions
  • Run-time measurement and estimation
4. Character string processing

  • String representation
  • Basic string processing and searching tasks
  • Regular expressions
  • Date and time
5. File processing

  • Basic operations on files and directories
  • Text files and connections
  • Common file formats
  • Literate programming and reproducible report generation
6. Advanced R programming

  • Environments
  • Computing on the language
  • Object-oriented programming
  • UNIX-like command line
  • Collaborative software development with Git
  • Writing R packages
7. Rcpp (part I)

  • Rcpp introduction
  • C/C++ primer
  • R basic atomic types in Rcpp
  • Basic functions from the R/C API
8. Rcpp (part II)

  • R non-basic types in Rcpp
  • C++11
  • C++ Standard Library
  • Rcpp in R packages