Course title Java for science
Assessment method Project + Exam Hours per semester 60 Lect. Exercises Lab. Project
ECTS 4 Hours/week 2 2
Prerequisites: intermediate knowledge of Java (at least one university course completed), basic knowledge of algorithms.
Course description

The first part of the course reinforces knowledge of Java. Students are assumed to be familiar with the Java programming language at an intermediate level or higher, so the material can be presented in the form of a discussion. Topics include the outcomes of executing different Java programs, possible errors, misinterpretations, and pitfalls of the language. This course also prepares students for international Java certification examinations.

The second part of the course demonstrates the use of Java in various scientific areas. Topics are taught as step-by-step solutions to specific problems. Students are then introduced to further possibilities for applying Java and are shown where to find more information. At the end of each class, example problems are presented to inspire students to develop their own applications.

Course objectives
Better knowledge of Java as a tool in business and scientific projects.
Grading: project (60%) + exam (40%), or exam (100%).
Reference Texts and Software
  1. James Gosling, Bill Joy, Guy Steele, Gilad Bracha, Alex Buckley: The Java Language Specification, Java SE 7 Edition
  4. Bruce Eckel: Thinking in Java
  5. Sean Owen, Robin Anil, Ted Dunning, Ellen Friedman: Mahout in Action
  6. Ron Bekkerman, Mikhail Bilenko, John Langford (eds.): Scaling up Machine Learning: Parallel and Distributed Approaches, Cambridge University Press 2011
  7. A. Rajaraman, J.D. Ullman: Mining of Massive Datasets, Cambridge University Press 2011
  8. Tom White: Hadoop: The Definitive Guide, O'Reilly Media
Lecture Schedule
1. Basic structures and operations
  • primitive types
  • numerical and logical operators
  • operator precedence
  • common mistakes with ==
  • accelerating comparison methods
  • flow control
  • IO streams
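Illustrative sketch for topic 1: one of the common mistakes with == is using it to compare object contents or floating-point results. The example below contrasts reference comparison, equals(), and a tolerance-based comparison. The names EqualityPitfalls, sameReference, sameValue and nearlyEqual, as well as the tolerance 1e-9, are assumptions chosen for this example and are not taken from the course material.

```java
// Illustrative sketch only; all names here are hypothetical.
class EqualityPitfalls {

    // Reference comparison: true only if both variables point to the same object.
    static boolean sameReference(Object a, Object b) {
        return a == b;
    }

    // Value comparison via equals(), with null handled explicitly.
    static boolean sameValue(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }

    // Floating-point comparison with an absolute tolerance instead of ==.
    static boolean nearlyEqual(double x, double y, double eps) {
        return Math.abs(x - y) <= eps;
    }

    public static void main(String[] args) {
        String literal = "java";
        String built = new String("java");   // deliberately a distinct object

        System.out.println(sameReference(literal, built)); // false
        System.out.println(sameValue(literal, built));     // true

        // Rounding error makes the exact comparison fail.
        System.out.println(0.1 + 0.2 == 0.3);                  // false
        System.out.println(nearlyEqual(0.1 + 0.2, 0.3, 1e-9)); // true
    }
}
```

The same caution applies to boxed numbers: == on two Integer objects compares references, and identical references are only guaranteed for small cached values (at least -128 to 127), so equals() is the safer choice.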
2. Threads
  • creation and running
  • deadlocks
  • wait, sleep, notify
  • working in multi-core systems
  • OS implementation
  • limits and costs
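Illustrative sketch for topic 2: thread creation, running, and use of several cores can be shown by splitting a sum over worker threads. This is a minimal example under stated assumptions, not the course's reference solution; the names ParallelSumSketch and parallelSum are hypothetical, and nThreads is assumed to be at least 1.

```java
// Illustrative sketch only; all names here are hypothetical.
class ParallelSumSketch {

    // Sums the array using nThreads worker threads (nThreads must be >= 1).
    static long parallelSum(int[] data, int nThreads) {
        long[] partial = new long[nThreads];
        Thread[] workers = new Thread[nThreads];
        int chunk = (data.length + nThreads - 1) / nThreads; // ceiling division

        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                int from = Math.min(id * chunk, data.length);
                int to = Math.min(from + chunk, data.length);
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                partial[id] = sum; // each worker writes only its own slot
            });
            workers[t].start();
        }

        long total = 0;
        try {
            for (int t = 0; t < nThreads; t++) {
                workers[t].join();   // after join(), the worker's write is visible here
                total += partial[t];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(parallelSum(data, 3)); // 55
    }
}
```

Because every worker writes to a separate array slot and the main thread reads only after join(), no locks are needed here. For small inputs the cost of creating threads usually outweighs the gain, which relates to the "limits and costs" item above.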
3. Regular expressions
  • patterns
  • data validation
  • finding data
  • performance
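Illustrative sketch for topic 3: the difference between validating a whole string and finding matches inside a longer text can be shown with java.util.regex. The class name RegexSketch, the method names, and the simplified date pattern are assumptions made for this example; the pattern checks only the digit layout, not whether the date exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; all names here are hypothetical.
class RegexSketch {

    // Compiled once and reused, since compiling a pattern is comparatively costly.
    private static final Pattern ISO_DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Validation: the whole input must match the pattern.
    static boolean looksLikeIsoDate(String input) {
        return ISO_DATE.matcher(input).matches();
    }

    // Finding: collect every substring that matches the pattern.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ISO_DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIsoDate("2011-12-30")); // true
        System.out.println(looksLikeIsoDate("2011-12-3"));  // false
        System.out.println(findDates("from 2011-12-30 to 2012-01-15"));
        // [2011-12-30, 2012-01-15]
    }
}
```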
4. Working with methods, encapsulation and inheritance
  • overloading
  • narrowing return types
  • data access protection
  • polymorphism
  • virtual methods, overriding, hiding
  • interfaces and defaults
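Illustrative sketch for topic 4: overloading is resolved at compile time from the declared type, overriding is resolved at run time from the actual object, and static methods are hidden rather than overridden. The example also shows a narrowed (covariant) return type. The classes Shape and Circle and all method names are hypothetical and chosen only for this example.

```java
// Illustrative sketch only; all names here are hypothetical.
class DispatchSketch {

    static class Shape {
        static String kind() { return "shape"; }   // static: can only be hidden
        String name() { return "shape"; }          // instance: can be overridden
        Shape copy() { return new Shape(); }
    }

    static class Circle extends Shape {
        static String kind() { return "circle"; }         // hides Shape.kind()
        @Override String name() { return "circle"; }      // overrides Shape.name()
        @Override Circle copy() { return new Circle(); }  // narrowed return type
    }

    // Two overloads; the compiler picks one from the declared argument type.
    static String describe(Shape s) { return "describe(Shape)"; }
    static String describe(Circle c) { return "describe(Circle)"; }

    public static void main(String[] args) {
        Shape s = new Circle();
        System.out.println(s.name());       // circle (run-time dispatch)
        System.out.println(describe(s));    // describe(Shape) (compile-time choice)
        System.out.println(Shape.kind());   // shape
        System.out.println(Circle.kind());  // circle

        Circle c = new Circle().copy();     // no cast needed
        System.out.println(describe(c));    // describe(Circle)
    }
}
```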
5. Debugging and optimization
  • code and state manipulation at runtime
  • Java profiler, JConsole
  • Java memory
  • object lifecycle
  • garbage collector
6. Handling exceptions
  • checked and unchecked exceptions
  • assert
  • catching exceptions
  • cost of throwing and catching exceptions
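Illustrative sketch for topic 6: the example below shows an unchecked exception (NumberFormatException) being translated into a checked one (IOException), and a caller that is forced by the compiler to handle it. The names ExceptionSketch, parsePort and parsePortOrDefault are hypothetical, and wrapping in IOException is a choice made only for demonstration.

```java
import java.io.IOException;

// Illustrative sketch only; all names here are hypothetical.
class ExceptionSketch {

    // Checked exception: callers must either catch IOException or declare it.
    static int parsePort(String text) throws IOException {
        try {
            // Integer.parseInt signals bad input with an unchecked exception.
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            // Keep the original exception as the cause for later diagnosis.
            throw new IOException("not a port number: " + text, e);
        }
    }

    // The caller handles the checked exception and falls back to a default.
    static int parsePortOrDefault(String text, int fallback) {
        try {
            return parsePort(text);
        } catch (IOException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePortOrDefault(" 8080 ", 80)); // 8080
        System.out.println(parsePortOrDefault("http", 80));   // 80
    }
}
```

Regarding cost: most of the expense of throwing typically comes from capturing the stack trace when the exception object is created, so using exceptions for ordinary control flow in hot loops is generally discouraged.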
7. XML
  • DOM, SAX, StAX
  • XSL, XSLT, XPath
  • data serialization
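Illustrative sketch for topic 7: a small XML fragment can be parsed into a DOM tree and queried with XPath using only the JDK's built-in XML packages. The class name XPathSketch, the method name evaluate, and the sample measurement document are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrative sketch only; all names here are hypothetical.
class XPathSketch {

    // Parses the XML text into a DOM document and returns the string value
    // of the given XPath expression.
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (ParserConfigurationException | SAXException
                 | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("cannot evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<measurement><value unit=\"K\">293.15</value></measurement>";
        System.out.println(evaluate(xml, "/measurement/value"));       // 293.15
        System.out.println(evaluate(xml, "/measurement/value/@unit")); // K
    }
}
```

DOM loads the whole document into memory, which is convenient for small inputs; for large files the streaming APIs listed above (SAX, StAX) are usually the better fit.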
8. Developer's tools
  • Ant
  • Maven
  • git
  • JUnit
  • JavaDoc
9. Language improvements
  • lambda expressions
  • streams
  • generics
  • annotations
  • language integration (JNI, Nashorn, Jython)
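Illustrative sketch for topic 9: lambda expressions, streams, and generics often appear together. The example below defines a generic mapping helper and a small numeric pipeline. The names StreamSketch, mapAll and meanOfPositives are hypothetical, and returning NaN for an empty selection is an assumption made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative sketch only; all names here are hypothetical.
class StreamSketch {

    // Generic method: applies f to every element and collects the results.
    static <T, R> List<R> mapAll(List<T> items, Function<T, R> f) {
        return items.stream().map(f).collect(Collectors.toList());
    }

    // Stream pipeline: keep positive samples and average them.
    // Returns NaN if no sample is positive.
    static double meanOfPositives(List<Double> samples) {
        return samples.stream()
                .mapToDouble(Double::doubleValue)
                .filter(x -> x > 0)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(mapAll(words, String::length)); // [1, 2, 3]

        List<Double> samples = Arrays.asList(1.0, -2.0, 3.0);
        System.out.println(meanOfPositives(samples));      // 2.0
    }
}
```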
10. Gathering web data for data mining
  • connecting to a web server
  • connecting to web services
  • DDoS protection issues
11. Introduction to Data Science
  • Introduction to the problems of Machine Learning
  • supervised learning: regression, classification
  • unsupervised learning: clustering
  • Data Science = Applied, Scalable Machine Learning
12. Introduction to Map-Reduce processing
  • What is parallel computing?
  • Scalable algorithms for web-scale problems
  • The Map-Reduce paradigm
  • A Map-Reduce example
  • Map-Reduce applications
  • Apache Hadoop Library
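Illustrative sketch for topic 12: the three phases of the Map-Reduce paradigm (map, shuffle/group by key, reduce) can be imitated in plain single-process Java with the classic word-count task. This deliberately does not use the Apache Hadoop API; the names WordCountSketch, map, reduce and run are hypothetical and only mirror the roles that a mapper and a reducer play in a real framework.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; plain Java, not the Hadoop API. Names are hypothetical.
class WordCountSketch {

    // Map phase: one input line produces a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: all values collected for one key are combined into a sum.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: runs map on every line, groups values by key (the "shuffle"),
    // then runs reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("to be or not", "To be")));
        // {be=2, not=1, or=1, to=2}
    }
}
```

In a distributed framework the map calls run in parallel on different input splits and the grouping step moves data between machines; the sketch keeps only the data flow, which is the part the paradigm defines.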
13. Distributed File Systems
  • Map-Reduce input and output
  • Introduction to the concept of Distributed File Systems
14. Distributed Databases
  • Modern database technology: SQL vs NoSQL processing
  • Bigtable and HBase
15. Machine Learning in the Hadoop environment
  • Mahout Library
  • examples: classification, regression, clustering, dimensionality reduction