Data Science
Faculty of Mathematics and Computer Science
Courses offered in 2023/2024, semester I
Last updated: 02.10.2023
INTRODUCTORY MEETING: 2023, October 2 (Monday), at 13:30 in room 119 (Institute of Computer Science)
INFORMATION FOR DATA SCIENCE STUDENTS (Version: 28.09.2023):
2023_information_for_students.pdf
 Courses with titles in English are offered in English. Courses with Polish titles _may_ be offered in English upon request.
Tutoring: Paweł Rychlikowski
Mandatory Courses:
 Machine Learning (ECTS: 6), Marek Adamczyk
Elective Core Courses:
Mathematical Institute:
 Modele liniowe (Linear models) (ECTS: 6), Liudmyla Zaitseva
 Simulations and algorithmic applications of Markov chains (Symulacje i algorytmiczne zastosowania łańcuchów Markowa) (ECTS: 6), Paweł Lorek
 Theoretical foundations of the analysis of large data sets (ECTS: 6), Liudmyla Zaitseva
 Wprowadzenie do symulacji i metod Monte Carlo (Introduction to simulations and Monte Carlo methods) (ECTS: 6), Tomasz Rolski
Institute of Computer Science:
 Algorytmy ewolucyjne (Evolutionary algorithms) (ECTS: 6), Piotr Wnuk-Lipiński
 Artificial Intelligence for Games: A Bit of Classics (ECTS: 6), Jakub Kowalski
 Introduction to cloud computing (ECTS: 6), Piotr Wieczorek
 Introduction to Linear Optimization (ECTS: 6), Martin Böhm
 Kurs: Obliczenia równoległe na kartach graficznych CUDA (Course: Parallel computing on CUDA graphics cards) (Q2) (ECTS: 3), Andrzej Łukaszewski
 Obliczeniowa teoria uczenia się (Computational learning theory) (ECTS: 6), Jan Otop
 Projekt: silnik szachowy (Project: chess engine) (ECTS: 4), Marek Adamczyk
Elective Courses:
Mathematical Institute:

All lectures listed in "Plan zajęć" (the schedule) that are of type M (advanced courses) count as Elective Courses. Some of the most relevant:
 Statistics and linear models (ECTS: 6), Grzegorz Wyłupek
 Statystyka (Statistics) (ECTS: 7), Grzegorz Wyłupek
 Szeregi czasowe (Time series) (ECTS: 6), Wojciech Cygan
 E-learning: Computer intensive methods (ECTS: 3) (Hasselt)
 E-learning: Semiparametric regression (ECTS: 4), Jarosław Harezlak
Institute of Computer Science:

All lectures listed in "Zapisy" (the enrollment system) that are of type I.2 or K2 count as Elective Courses. Some of the most relevant:
 Metody stochastyczne w analizie algorytmów (Stochastic methods in the analysis of algorithms) (ECTS: 6), Filip Zagórski
 Kody korekcyjne (Error-correcting codes) (ECTS: 6), Artur Jeż
 Scheduling theory (ECTS: 6), Łukasz Jeż
Review seminars:
Institute of Computer Science:
 Seminar: Graph Neural Networks and Applications (ECTS: 3), Piotr Wnuk-Lipiński
 Seminar: Game Botting (ECTS: 3), Jakub Kowalski
 Seminar: Kompetytywna Sztuczna Inteligencja (Competitive Artificial Intelligence) (ECTS: 3), Jakub Kowalski
 Seminar: Wielkie modele językowe / Large Language Models (ECTS: 3), Paweł Rychlikowski
Team projects:
 Projekt: silnik szachowy (Project: chess engine) (ECTS: 4), Marek Adamczyk
Mandatory Core Courses
These three core courses are mandatory for all students. Their role is to give a basic toolbox for future data scientists and provide solid mathematical foundations that enable you to take more advanced and applied courses. We expect students to take them in the first two semesters.
Numerical Optimization
This course is a detailed survey of optimization from both a computational and a theoretical perspective. Theoretical topics include convex sets, convex functions, optimization problems, least squares, linear and quadratic programs, optimality conditions, and duality theory. Special emphasis is put on scalable numerical methods for analyzing and solving linear programs (e.g. simplex), general smooth unconstrained problems (e.g. first-order and second-order methods), quadratic programs (e.g. linear least squares), general smooth constrained problems (e.g. interior-point methods), as well as a family of nonsmooth problems (e.g. the ADMM method). Applications in data science, such as machine learning, model fitting, and image processing, will be discussed. The computational part covers the following algorithms: the gradient method, quasi-Newton methods, the proximal gradient method, Nesterov's accelerated gradient method, the augmented Lagrangian method, the alternating direction method of multipliers, the block coordinate descent method, and the stochastic gradient descent method. Students complete hands-on exercises using high-level numerical software.
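To give a flavour of the first-order methods the course starts from, here is a minimal pure-Python sketch of the gradient method applied to a one-dimensional least-squares objective (the function name and data are illustrative only; the course treats far more sophisticated variants such as quasi-Newton methods and ADMM):

```python
# Minimize f(w) = 0.5 * sum_i (w * x_i - y_i)^2 by plain gradient descent.
# Illustrative sketch; step size and data are chosen by hand.

def gradient_descent(xs, ys, lr=0.01, steps=500):
    """First-order update: w <- w - lr * f'(w)."""
    w = 0.0
    for _ in range(steps):
        grad = sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # exact fit at w = 2
w = gradient_descent(xs, ys)
```

For this convex quadratic the iteration contracts toward the unique minimizer, which is the kind of convergence behaviour the theoretical part of the course analyzes.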
Machine Learning
This course provides the fundamentals of designing programs whose behavior is data-driven rather than hand-crafted. The course provides a gentle introduction to the topic, but strives to provide enough details and intuitions to explain state-of-the-art ML approaches: ensembles of decision trees (boosted trees, random forests) and neural networks. Starting with simple linear and Bayesian models, we proceed to the concepts of trainable models, selecting the best model based on data, practical and theoretical ways of estimating model performance on new data, and the difference between discriminative and generative training. The course introduces mainstream algorithms for classification and regression, including linear models, naive Bayes, trees, ensembles, and matrix factorizations for recommendation systems. Practical sessions provide hands-on experience with the methods.
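One idea mentioned above, estimating model performance on new data, can be sketched in a few lines: a toy nearest-neighbour classifier evaluated on a held-out test set (all names and the tiny data set are illustrative, not course material):

```python
import math

def nearest_neighbor_predict(train, point):
    """1-NN classifier: return the label of the closest training point."""
    _, label = min(train, key=lambda t: math.dist(t[0], point))
    return label

def accuracy(train, test):
    """Estimate performance on held-out data the model has never seen."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return hits / len(test)

train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
test  = [((0.05, 0.1), "a"), ((1.05, 0.95), "b")]
acc = accuracy(train, test)
```

Measuring accuracy on data kept out of training is the simplest instance of the performance-estimation techniques the course formalizes.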
Statistical Learning
This course is mainly devoted to the analysis of "fat" data sets with a large number of variables. In this situation, effective analysis requires techniques of dimensionality reduction. We discuss classical and modern methods of dimensionality reduction in the context of supervised and unsupervised learning. Specifically, we consider principal component analysis, subspace clustering, and Gaussian graphical models (unsupervised learning), and different penalized methods for building predictive models (supervised learning), including ridge regression, LASSO, and SLOPE. The emphasis is placed on understanding the statistical properties of the discussed methodology through theoretical results, simulation studies, and the analysis of real data.
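The shrinkage effect of penalization can be seen already in one dimension: for y ≈ w·x, the ridge estimate has the closed form w = Σxy / (Σx² + λ), so a larger penalty λ pulls the coefficient toward zero. A minimal sketch (function name and data are illustrative):

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge estimate for y ≈ w*x; the penalty lam shrinks w toward 0."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w_ols   = ridge_1d(xs, ys, 0.0)    # lam = 0 recovers ordinary least squares
w_ridge = ridge_1d(xs, ys, 14.0)   # positive lam gives a shrunk estimate
```

LASSO and SLOPE replace the squared penalty with sparsity-inducing ones; their statistical properties are a central topic of the course.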
Elective Core Courses
To gain more specialized knowledge in different areas of data science, students have to take at least four of the following fundamental elective courses. Each course is taught at least once every two years (the semester of the next edition is given in the course description). Topics are subject to slight changes and updates, reflecting the evolution of the data science field and the varying requirements of the job market.
Methods of classification and dimensionality reduction
The course provides a survey of dimensionality reduction (feature extraction) and classification methods. Dimensionality reduction enhances the performance of computer vision and machine-learning-based approaches, allows the data to be represented more efficiently, and makes it possible to visualise high-dimensional data. Among others, we study principal component analysis (PCA), non-negative matrix factorization (NMF), independent component analysis (ICA), and t-distributed Stochastic Neighbour Embedding (t-SNE). Concerning classification methods, we study many classical "shallow-learning" classifiers, e.g., nearest neighbours, naive Bayes, support vector machines (SVM), linear and quadratic discriminant analysis (LDA and QDA), and decision trees. Though all details are provided for most methods, we put a strong emphasis on intuition and practical applications: in lab classes we apply the acquired knowledge to various practical problems, e.g., classification of multidimensional data (including time series, images, and texts), image compression, topic recovery, and recommendation systems.
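As a small taste of PCA, the leading principal component can be found by power iteration on the sample covariance matrix of centered data. A pure-Python sketch in low dimension (the function name and toy data are illustrative, not part of the course materials):

```python
def first_principal_component(data, iters=200):
    """Power iteration on the sample covariance matrix of centered data;
    returns a unit vector along the leading PCA direction."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Points lying on the line y = x: the leading direction is (1, 1) / sqrt(2).
v = first_principal_component([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [-1.0, -1.0]])
```

Production code would use a library eigensolver, but the iteration above is the underlying idea.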
Simulations and algorithmic applications of Markov chains
The course is devoted to discrete-time Markov chains with a finite state space. We gently start with fundamentals (stationary distributions, transition-matrix-based simulations, reversibility) and go through Markov chain Monte Carlo methods (MCMC, a class of algorithms providing one of the currently most popular ways of simulating complicated stochastic systems); rate-of-convergence methods ("how many times should we shuffle a deck of cards?"; we study coupling methods, strong stationary times, strong stationary duality, and inequalities (Cheeger and Poincaré) for bounding the second-largest eigenvalue of a transition matrix); the coupling from the past (CFTP) algorithm (an improvement over standard MCMC that allows one to obtain an unbiased sample from a given distribution on a huge state space, e.g., the Ising model); estimating winning probabilities in gambler's-ruin-like problems (first-step analysis and Siegmund duality); simulated annealing (a widely used randomized algorithm for various optimization problems); basics of hidden Markov models (HMM, a popular machine learning technique used, e.g., in speech recognition); and randomized polynomial-time approximation schemes (MCMC-based algorithms for approximating answers to NP-hard problems, e.g., graph coloring).
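The core MCMC idea can be sketched in a few lines: a Metropolis chain with a symmetric random-walk proposal whose stationary distribution is proportional to given weights (the function name and target are illustrative; the course covers the theory of why and how fast this converges):

```python
import random

def metropolis_sample(weights, steps=10000, seed=0):
    """Random-walk Metropolis on states 0..n-1 arranged in a cycle,
    targeting the distribution proportional to `weights`.
    Returns empirical visit frequencies."""
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    counts = [0] * n
    for _ in range(steps):
        proposal = (state + rng.choice([-1, 1])) % n   # symmetric proposal
        # Metropolis acceptance rule preserves the target distribution.
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        counts[state] += 1
    return [c / steps for c in counts]

freqs = metropolis_sample([1.0, 2.0, 1.0, 2.0], steps=200000)
# Empirical frequencies approach the target (1/6, 1/3, 1/6, 1/3).
```

How quickly the empirical frequencies approach the target is precisely the rate-of-convergence question studied in the course.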
Natural language processing
The aim of the course is to discuss methods used in the analysis and processing of natural language texts, with particular emphasis on results that can be translated into effective implementations. We consider both classical methods of language modelling (Hidden Markov Models, (Probabilistic) Context-Free Grammars, Finite-State Transducers) and modern, neural-network-based approaches: RNNs, LSTMs, Convolutional Neural Networks, and Transformers. We show several applications of these methods, including POS tagging, dependency parsing, Named Entity Recognition, Machine Translation, and Natural Language Generation.
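The simplest instance of classical language modelling is a count-based bigram model; a minimal sketch (function names and the three-sentence corpus are illustrative only):

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count-based bigram model: estimate P(w2 | w1) from raw counts."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            counts[w1][w2] += 1
    return counts

def prob(counts, w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1 *)."""
    total = sum(counts[w1].values())
    return counts[w1][w2] / total if total else 0.0

model = train_bigram(["the cat sat", "the dog sat", "the cat ran"])
p = prob(model, "the", "cat")   # 2 of the 3 continuations of "the" are "cat"
```

HMMs and neural language models studied in the course replace these raw counts with hidden states and learned representations, respectively.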
Text mining
In this course, we cover basic and more advanced information retrieval techniques. We also discuss data mining methods applied to texts. We discuss how to implement from scratch efficient systems gathering information from large text corpora. We analyze in detail several variants of word embeddings and their use in computing text similarity. We discuss text classification methods, flat and hierarchical clustering, automatic summarisation, text comprehension methods, and question answering.
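A baseline notion of text similarity, which embedding-based methods refine, is the cosine of the angle between bag-of-words vectors; a minimal sketch (the function name is illustrative):

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity of bag-of-words count vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Identical texts score 1.0 and texts with no shared words score 0.0; word embeddings improve on this by also detecting similarity between different words.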
Advanced Data Mining
This course focuses on advanced data mining algorithms for processing big, complex, and unstructured data. It mainly concerns recommendation systems, dimensionality reduction with neighborhood embedding, temporal data mining, and decision support systems. In recommendation systems, various approaches from simple collaborative filtering to advanced matrix factorization are presented and discussed in the context of their practical relevance, concerning not only the popular MSE or MAE measures, but also the coverage, diversity, and novelty of recommendations. In temporal data mining, besides the analysis of regular time series with machine learning methods such as Support Vector Regression and Neural Networks, unstructured temporal data are studied. Student projects concern unstructured datasets, such as irregular multidimensional time series, GPS tracks, or medical images.
Tools and methods in big data processing
This course covers the technical background useful in processing large amounts of data in a distributed environment. We discuss cloud computing basics and then introduce the Hadoop Distributed File System (HDFS) architecture and the MapReduce programming paradigm. The course studies Apache Spark and its ecosystem in depth, including Spark programming with Scala, Spark SQL, the Spark graph processing framework GraphX, as well as TinkerPop and Gremlin traversals. Finally, some time is spent on stream-processing technologies such as Apache Kafka and Spark Streaming.
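The MapReduce paradigm can be illustrated without a cluster: the classic word count splits into a map phase emitting (key, value) pairs and a reduce phase aggregating values per key. A single-process pure-Python sketch (a real Hadoop or Spark job distributes both phases across machines; function names here are illustrative):

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word, as in MapReduce word count."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Reducer: sum values grouped by key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

docs = ["big data tools", "big data methods"]
counts = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
```

Because mappers are independent and reducers only see values for their own keys, both phases parallelize naturally, which is exactly what HDFS and Spark exploit.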
Theory of Analysis of Large Data Sets
In this course we will provide the mathematical theory that explains statistical problems in the analysis of high-dimensional data. In the first part of the lecture we will concentrate on the generic problem of estimating the vector of means of a multivariate normal distribution. We will discuss testing the global hypothesis that this vector is equal to zero and show a variety of methods that are optimal under different scenarios concerning the sparsity of this vector. We will also discuss the detectability curve, which provides the relationship between the sparsity and the magnitude of the elements of the vector of means under which the signal can be identified by testing procedures. The course will also cover different procedures for multiple testing, the Stein identity, and the James-Stein estimator of the vector of means. Finally, we will discuss the topic of selective inference and the application of the above ideas in the context of high-dimensional multiple regression.
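The simplest multiple-testing procedure of the kind studied here is the Bonferroni correction: with m hypotheses, reject H_i only when p_i ≤ α/m, which controls the family-wise error rate at level α. A minimal sketch (the function name and p-values are illustrative):

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: reject H_i iff p_i <= alpha / m,
    controlling the family-wise error rate at level alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# With 3 tests the per-test threshold is 0.05 / 3, so only the first rejects.
rejected = bonferroni([0.001, 0.04, 0.20])
```

The course compares such conservative corrections with more powerful procedures tailored to sparse high-dimensional settings.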
Neural Networks
The Neural Networks course provides an in-depth understanding of artificial neural networks and deep learning methods, from both a theoretical and a practical standpoint. The course will cover neural network fundamentals and provide details on the most commonly used models: convolutional architectures used for image processing, recurrent networks for sequential data, and attention-based models used in language processing. We will also study modern data generation methods, such as variational autoencoders and generative adversarial networks, concentrating on learning features that are useful in downstream data processing tasks. Accompanying lab sessions will provide hands-on experience with the material.
Numerical programming tools and methods
Numerical programming tools and methods is an introduction to the programming languages and frameworks used in data science and deep learning. Concentrating on the Python ecosystem, the course teaches numpy internals, data wrangling with pandas, and efficient usage of hardware accelerators through deep learning frameworks. Techniques are illustrated with practical projects involving physical simulations, digital signal processing, and real-world data analysis and modeling using machine learning.
Analysis of Complex Data. Jarosław Harezlak
Semiparametric regression. Jarosław Harezlak
Seminar: probabilistic graphical models
Team Project: TBA
Additional courses
Students may enrich their competences by taking additional courses that are not directly related to data science but could give them a competitive edge in their future careers, providing them, for example, with unique skills in optimization or algorithms. While students may freely choose any master's-level courses taught at the department, the list below contains courses that are particularly well suited to data science students.
 Algorithmic game theory. Jarosław Byrka
 Algorithms on strings. Paweł Gawrychowski
 Artificial intelligence in games. Jakub Kowalski
 Approximation algorithms. Katarzyna Paluch
 Category theory. Maciej Piróg
 Combinatorial optimization. Katarzyna Paluch
 Combinatorics. Grzegorz Stachowiak
 Computational complexity. Krzysztof Loryś
 Computational geometry. Paweł Gawrychowski
 Cryptography. Grzegorz Stachowiak
 Data compression. Tomasz Jurdziński
 Distributed algorithms. Tomasz Jurdziński
 Interactive theorem proving in Coq. Małgorzata Biernacka and Filip Sieczkowski
 Introduction to simulations and Monte Carlo methods. Tomasz Rolski, Paweł Lorek
 Online algorithms. Marcin Bieńkowski
 Photorealistic computer graphics. Andrzej Łukaszewski
 Program analysis. Witold Charatonik
 Randomized algorithms. Marek Piotrów
 Semantics of programming languages. Dariusz Biernacki and Filip Sieczkowski
 Theory of linear and integer programming. Jarosław Byrka
 UNIX kernel structure. Krystian Bacławski
 Verification of programs. Małgorzata Biernacka and Witold Charatonik
 Word equations. Artur Jeż
Contact
Faculty of Mathematics and Computer Science, University of Wrocław
ul. Joliot-Curie 15
50-384 Wrocław
Program related questions: datascience@uwr.edu.pl
Administration/recruitment procedure related questions: international@uwr.edu.pl