Submodularity in Data Science


Andreas Krause and I gave a 2-day long tutorial on Submodularity.


Many problems in Machine Learning and Data Science require solving large-scale discrete optimization problems under uncertainty. In the recent years, a fundamental problem structure has emerged as extremely useful for addressing such problems: Submodularity is an intuitive diminishing returns property, closely related to convexity, with similarly important implications for optimisation. Submodularity naturally occurs in numerous applications such as data summarisation, information retrieval, network analysis, active learning and experimental design, sparse modelling and inference in probabilistic models. Exploiting submodularity allows to devise efficient algorithms with strong theoretical guarantees. In this tutorial, I will give an introduction to the concept of submodularity and discuss basic algorithms for minimising and maximising submodular functions. I will also discuss current research directions, such as tackling sequential decision problems via online and adaptive submodular optimisation, learning submodular functions from data, and solving large-scale submodular optimisation problems in distributed and streaming settings, as well as applications in probabilistic inference and deep learning. A particular emphasis is on demonstrating their usefulness in solving concrete data science problems.