Data Science

Faculty of Mathematics and Computer Science

Courses offered in 2025/2026, semester I

Last updated: 01.10.2025

(Information for students from September 2024: 2024_information_for_students.pdf)

Courses with English titles are offered in English. Courses with Polish titles _may_ be offered in English upon request.

Mandatory Courses:

Elective Core Courses:

Mathematical Institute:

Probabilistyczne podstawy AI (ECTS: 6), Dariusz Buraczewski
Simulations and algorithmic applications of Markov chains (ECTS: 6), Paweł Lorek
Theoretical foundations of the analysis of large data sets (ECTS: 6), Liudmyla Zaitseva
Wprowadzenie do symulacji i metod Monte Carlo (ECTS: 6), Paweł Lorek

Institute of Computer Science:

Artificial Intelligence <3 Games: Procedural Content Generation (ECTS: 6), Jakub Kowalski
Introduction to Linear Optimization (ECTS: 6), Martin Böhm
Optymalizacja głębokich sieci neuronowych na urządzenia IoT (ECTS: 6), Filip Chudy
Project: Uncertainty in AI (ECTS: 4), Klaudia Dynak
Warsztat AI (ECTS: 3), Maria Szlasa

Elective Courses:

Mathematical Institute:

"Plan zajęć" (schedule)

Numerical methods (ECTS: 6), Michael Hecht
Modele liniowe (ECTS: 6), Michał Kos
Szeregi czasowe (ECTS: 6), Wojciech Cygan
Wnioskowanie statystyczne (ECTS: 6), Grzegorz Wyłupek

Institute of Computer Science:

"Zapisy" (enrollment)

Digital signal processing in telecommunications (ECTS: 6), Filip Chudy
AI i społeczeństwo (ECTS: 6), Małgorzata Biernacka

Review seminars:

Mathematical Institute:

Kernel methods, graphical models and approximate inference (ECTS: 2), Małgorzata Bogdan

Institute of Computer Science:

Seminar: Generative AI (ECTS: 3), Rafał Nowak
Seminar: Graph Neural Networks and Applications (ECTS: 3), Piotr Wnuk-Lipiński

Team projects:

Project: eXplainable Artificial Intelligence (also Elective Core Course) (ECTS: 4), Klaudia Balcer
Projekt: Deep Learning (also Elective Core Course) (ECTS: 6), Rafał Nowak
Projekt: Machine Learning for Temporal Data Mining (also Elective Core Course) (ECTS: 4), Piotr Wnuk-Lipiński

Mandatory Core Courses

These three core courses are mandatory for all students. Their role is to give a basic toolbox for future data scientists and provide solid mathematical foundations that enable you to take more advanced and applied courses. We expect students to take them in the first two semesters.

Numerical Optimization

This course is a detailed survey of optimization from both a computational and a theoretical perspective. Theoretical topics include convex sets, convex functions, optimization problems, least-squares, linear and quadratic programs, optimality conditions, and duality theory. Special emphasis is put on scalable numerical methods for analyzing and solving linear programs (e.g. simplex), general smooth unconstrained problems (e.g. first-order and second-order methods), quadratic programs (e.g. linear least squares), general smooth constrained problems (e.g. interior-point methods), as well as a family of non-smooth problems (e.g. the ADMM method). Applications in data science, such as machine learning, model fitting, and image processing, will be discussed. The computational part covers the following algorithms: gradient method, quasi-Newton methods, proximal gradient method, Nesterov's accelerated gradient method, augmented Lagrangian method, alternating direction method of multipliers, block coordinate descent method and stochastic gradient descent method. Students complete hands-on exercises using high-level numerical software.

Waldemar Hebisch

Rafał Nowak

Machine Learning

This course provides the fundamentals of designing programs that implement data-driven, rather than hand-implemented, behavior. The course provides a gentle introduction to the topic, but strives to provide enough details and intuitions to explain state-of-the-art ML approaches: ensembles of Decision Trees (Boosted Trees, Random Forests) and Neural Networks. Starting with simple linear and Bayesian models, we proceed to learn the concepts of trainable models, selecting the best model based on data, practical and theoretical ways of estimating model performance on new data, and the difference between discriminative and generative training. The course introduces mainstream algorithms for classification and regression including linear models, Naive Bayes, trees, ensembles, and matrix factorizations for recommendation systems. Practical sessions provide hands-on experience with the methods.

Jan Chorowski

Statistical Learning

This course is mainly devoted to the analysis of "fat" data sets with a large number of variables. In this situation, effective analysis requires dimensionality reduction techniques. We discuss classical and modern methods of dimensionality reduction in the context of supervised and unsupervised learning. Specifically, we consider principal component analysis, subspace clustering and Gaussian graphical models (unsupervised learning) and different penalized methods for building predictive models (supervised learning) including ridge regression, LASSO and SLOPE. The emphasis is placed on understanding the statistical properties of the discussed methodology through theoretical results, simulation studies and analysis of real data.

Małgorzata Bogdan

Elective Core Courses

To gain more specialized knowledge in different areas of data science, students have to take at least four of the following fundamental elective courses. Each course is taught at least once every two years (the semester of the next edition is given in the course description). Topics are subject to slight changes and updates, which reflect the evolution of the data science field and varying requirements from the job market.

Methods of classification and dimensionality reduction

The course provides a survey of dimensionality reduction (feature extraction) and classification methods. Dimensionality reduction enhances the performance of computer vision and machine learning-based approaches, allows the data to be represented more efficiently, and makes it possible to visualise high-dimensional data. Among others, we study principal component analysis (PCA), non-negative matrix factorization (NMF), independent component analysis (ICA), and t-distributed Stochastic Neighbour Embedding (t-SNE). Concerning classification methods, we study many classical "shallow-learning" classifiers, e.g., nearest neighbours, naive Bayes, support vector machine (SVM), linear and quadratic discriminant analysis (LDA and QDA), and decision trees. Though all details are provided for most methods, we put a strong emphasis on intuition and practical applications: we discuss various practical problems, and apply the acquired knowledge to them in lab classes, e.g., classification of multidimensional data (including time series, images and texts), image compression, topic recovery, and recommendation systems.

Paweł Lorek

Simulations and algorithmic applications of Markov chains

The course is devoted to discrete-time Markov chains with finite state space. We gently start with fundamentals (stationary distributions, transition-matrix based simulations, reversibility) and go through Markov chain Monte Carlo methods (MCMC, a class of algorithms providing one of the currently most popular methods for simulating complicated stochastic systems); rate of convergence methods ("how many times should we shuffle a deck of cards?"; we study coupling methods, strong stationary times, strong stationary duality, and inequalities (Cheeger and Poincaré) for bounding the second-largest eigenvalue of a transition matrix); the coupling from the past (CFTP) algorithm (an improvement of standard MCMC that allows one to obtain an unbiased sample from a given distribution on a huge state space, e.g., the Ising model); estimating winning probabilities in gambler's ruin-like problems (first step analysis and Siegmund duality); simulated annealing (a widely used randomized algorithm for various optimization problems); basics of hidden Markov models (HMM, a popular machine learning model, e.g., for speech recognition); randomized polynomial time approximation schemes (MCMC-based algorithms for approximating "the answer" to problems related to NP-hard ones, e.g., graph coloring).

Paweł Lorek

Natural language processing

The aim of the course is to discuss the methods used in the analysis and processing of texts in natural languages, with particular emphasis on results that can be translated into effective implementation. We consider both classical methods of language modelling (Hidden Markov Models, (Probabilistic) Context Free Grammars, Finite State Transducers) and modern, neural-network-based approaches: RNN, LSTM, Convolutional Neural Networks and Transformer. We show several applications of these methods, including POS-tagging, dependency parsing, Named Entity Recognition, Machine Translation, and Natural Language Generation.

Paweł Rychlikowski

Text mining

In this course, we cover basic and more advanced information retrieval techniques. We also discuss data mining methods applied to texts. We discuss how to implement from scratch efficient systems that gather information from large text corpora. We analyze in detail several variants of word embeddings and their use in computing text similarity. We discuss text classification methods, flat and hierarchical clustering, automatic summarisation, text comprehension methods and question answering.

Paweł Rychlikowski

Advanced Data Mining

This course focuses on advanced data mining algorithms for processing big, complex and unstructured data. It mainly concerns recommendation systems, dimensionality reduction with neighborhood embedding, temporal data mining and decision support systems. In recommendation systems, various approaches from simple collaborative filtering to advanced matrix factorization are presented and discussed in the context of their practical relevance, concerning not only the popular MSE or MAE measures, but also the coverage, diversity, and novelty of recommendations. In temporal data mining, besides the analysis of regular time series with machine learning methods, such as Support Vector Regression and Neural Networks, unstructured temporal data are studied. Student projects concern unstructured datasets, such as irregular multidimensional time series, GPS tracks or medical images.

Piotr Wnuk-Lipiński

Tools and methods in big data processing

This course covers the technical background useful in processing large amounts of data in a distributed environment. We discuss cloud computing basics and then introduce the Hadoop Distributed File System (HDFS) architecture and the MapReduce programming paradigm. The course studies Apache Spark and its ecosystem in depth, including Spark programming with Scala, Spark SQL, and the Spark graph processing framework GraphX, as well as TinkerPop and Gremlin traversals. Finally, some time is spent on stream processing technologies such as Apache Kafka and Spark Streaming.

Piotr Wieczorek

Theory of Analysis of Large Data Sets

In this course we will provide the mathematical theory that explains statistical problems in the analysis of high-dimensional data. In the first part of the lecture we will concentrate on the generic problem of estimating the vector of means of a multivariate normal distribution. We will discuss testing the global hypothesis that this vector is equal to zero and show a variety of methods which are optimal under different scenarios concerning the sparsity of this vector. We will also discuss the detectability curve, which provides the relationship between the sparsity and the magnitude of the elements of the vector of means, so that the signal can be identified with testing procedures. The course will also cover different procedures for multiple testing, Stein's identity and the James-Stein estimator of the vector of means. Finally, we will discuss the topic of selective inference and the application of the above ideas in the context of high-dimensional multiple regression.

Małgorzata Bogdan

Neural Networks

The Neural Networks course provides an in-depth understanding of Artificial Neural Networks and Deep Learning methods, from both a theoretical and a practical standpoint. The course will cover neural network fundamentals and provide details on the most commonly used models: convolutional architectures used for image processing, recurrent networks for sequential data, and attention-based models used in language processing. We will also study modern data generation methods, such as variational auto-encoders and generative adversarial networks, concentrating on learning features that are useful in downstream data processing tasks. Accompanying lab sessions will provide hands-on experience with the material.

Jan Chorowski

Numerical programming tools and methods

Numerical programming tools and methods is an introduction to programming languages and frameworks used in data science and deep learning. Concentrating on the Python ecosystem, the course teaches NumPy internals, data wrangling with pandas, and efficient usage of hardware accelerators through deep learning frameworks. Techniques are illustrated using practical projects that involve physical simulations, digital signal processing, real-world data analysis and modeling using machine learning.

Jan Chorowski

Analysis of Complex Data. Jarosław Harezlak

Semiparametric regression. Jarosław Harezlak

Seminar: probabilistic graphical models

Team Project: TBA

Additional courses

Students may enrich their competences by taking additional courses that are not directly related to data science but could give them a competitive edge in their future careers, providing them, for example, with unique skills in optimization or algorithms. While students may freely choose any master-level courses taught at the department, the list below contains courses that are particularly suited to data science students.

Algorithmic game theory. Jarosław Byrka

Algorithms on strings. Paweł Gawrychowski

Artificial intelligence in games. Jakub Kowalski

Approximation algorithms. Katarzyna Paluch

Category theory. Maciej Piróg

Combinatorial optimization. Katarzyna Paluch

Combinatorics. Grzegorz Stachowiak

Computational complexity. Krzysztof Loryś

Computational geometry. Paweł Gawrychowski

Cryptography. Grzegorz Stachowiak

Data compression. Tomasz Jurdziński

Distributed algorithms. Tomasz Jurdziński

Interactive theorem proving in Coq. Małgorzata Biernacka and Filip Sieczkowski

Introduction to simulations and Monte Carlo methods. Tomasz Rolski, Paweł Lorek

Online algorithms. Marcin Bieńkowski

Photorealistic computer graphics. Andrzej Łukaszewski

Program analysis. Witold Charatonik

Randomized algorithms. Marek Piotrów

Semantics of programming languages. Dariusz Biernacki and Filip Sieczkowski

Theory of linear and integer programming. Jarosław Byrka

UNIX kernel structure. Krystian Bacławski

Verification of programs. Małgorzata Biernacka and Witold Charatonik

Word equations. Artur Jeż

Contact

Department of Mathematics and Computer Science, University of Wrocław.

ul. Joliot-Curie 15

50-384 Wrocław

Program-related questions: datascience@uwr.edu.pl

Questions about administration or the recruitment procedure: international@uwr.edu.pl