Guido Montúfar
UCLA Department of Mathematics and Department of Statistics
Los Angeles, CA, USA, 90095-1555


Short CV
Research Group Leader, ERC Project Deep Learning Theory, Max Planck Institute for Mathematics in the Sciences.
Postdoc, Max Planck Institute for Mathematics in the Sciences, Information Theory of Cognitive Systems Group.
(06/2013 - 09/2017)
PhD in Mathematics, MPI MIS, Leipzig University
(10/2012).
Research Assistant, Institute for Theoretical Physics, TU Berlin
(03/2008 - 12/2008)
Diplom Physiker, TU Berlin
(12/2008)
Teaching Assistant, Institute for Mathematics, TU Berlin
(03/2006 - 02/2008)
Diplom Mathematiker, TU Berlin
(08/2007)

Research Interests
Deep Learning Theory
Neural Networks
Graphical Models
Information Geometry
Algebraic Statistics
Wasserstein Information Geometry
Events
AIM Workshop Boltzmann Machines Sep 2018


Optimization Theory for ReLU Neural Networks Trained with Normalization Layers.
Yonatan Dukler, Quanquan Gu, Guido Montufar. To be presented at the Thirty-seventh International Conference on Machine Learning (ICML 2020). Preprint [arXiv:2006.06878].
Haar Graph Pooling.
Yu Guang Wang, Ming Li, Zheng Ma, Guido Montufar, Xiaosheng Zhuang, Yanan Fan. To be presented at the Thirty-seventh International Conference on Machine Learning (ICML 2020). Preprint [arXiv:1909.11580].
Kernelized Wasserstein Natural Gradient.
Michael Arbel, Arthur Gretton, Wuchen Li, Guido Montufar. In International Conference on Learning Representations 2020 (ICLR 2020). Preprint [arXiv:1910.09652].
Optimal Transport to a Variety.
T. O. Celik, A. Jamneshan, G. Montufar, B. Sturmfels, L. Venturello. Mathematical Aspects of Computer and Information Sciences 2019 (MACIS 2019). Preprint [arXiv:1909.11716].
Wasserstein of Wasserstein loss for learning generative models.
Y. Dukler, W. Li, A. Lin, and G. Montufar. Proceedings of ICML 36, PMLR 97:1716-1725, 2019. [BibTex]. Preprint [MPI MIS preprint].
Affine Natural Proximal Learning.
Wuchen Li, Alex Lin and Guido Montufar. In Geometric Science of Information. GSI 2019. Lecture Notes in Computer Science, vol 11712, pp 705-714, 2019. Preprint [RG].
Natural Gradient via Optimal Transport.
W. Li and G. Montufar. Information Geometry, vol 1, issue 2, pp 181-214, 2018. Preprint [arXiv 1803.07033].
Restricted Boltzmann Machines: Introduction and Review.
G. Montufar. Information geometry and its applications, pp 75-115, 2018. [BibTex]. Preprint [arXiv 1806.07066].
Mixtures and Products in two Graphical Models.
A. Seigal and G. Montufar. Journal of Algebraic Statistics, vol 9 no 1, 2018. Preprint [arXiv 1709.05276].
Computing the Unique Information.
P. K. Banerjee, J. Rauh, and G. Montufar. IEEE International Symposium on Information Theory (ISIT), pages 141-145, 2018. Preprint [arXiv 1709.07487].
Geometry of Policy Improvement.
G. Montufar and J. Rauh. In Geometric Science of Information 2017, LNCS vol 10589, pages 282-290, Springer, 2017. [BibTex]. Preprint [arXiv 1704.01785].
Morphological Computation: The good, the bad, and the ugly.
K. Ghazi-Zahedi, R. Deimel, G. Montufar, V. Wall, and O. Brock. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 464-469, 2017. [BibTex]. Preprint [RG].
Dimension of Marginals of Kronecker Product Models.
G. Montufar and J. Morton. SIAM Journal on Applied Algebra and Geometry (SIAGA), 1(1):126-151, 2017. [BibTex]. Preprint [MPI MIS 75/2015], [arXiv 1511.03570]. Supplement [JacobianKronecker.m].
Hierarchical Models as Marginals of Hierarchical Models.
G. Montufar and J. Rauh. International Journal of Approximate Reasoning, 88:531-546, 2017. [BibTeX]. Workshop version WUPES 2015, pp 131-145. Preprint [MPI MIS 27/2016], [arXiv 1508.03606]. Supplement [starcover.m].
Mode Poset Probability Polytopes.
G. Montufar and J. Rauh. Journal of Algebraic Statistics, 7(1):1-13, 2016. [BibTeX]. Workshop version WUPES 2015, pp 147-154. Preprint [MPI MIS 22/2015], [arXiv 1503.00572]
Evaluating Morphological Computation in Muscle and DC-motor Driven Models of Hopping Movements.
K. Ghazi-Zahedi, D. Haeufle, G. Montufar, S. Schmitt, and N. Ay. Frontiers in Robotics and AI 3(42):frobt.2016.00042, 2016. [BibTeX]. Preprint [arXiv 1512.00250]
A Theory of Cheap Control in Embodied Systems.
G. Montufar, K. Zahedi, and N. Ay. PLoS Comput Biol 11(9):e1004427, 2015. [BibTeX]. Preprint [MPI MIS 70/2014], [arXiv 1407.6836]
Geometry and Expressive Power of Conditional Restricted Boltzmann Machines.
G. Montufar, N. Ay, and K. Zahedi. JMLR 16(Dec):2405-2436, 2015. [BibTeX]. Preprint [MPI MIS 16/2014], [arXiv 1402.3346]
Discrete Restricted Boltzmann Machines.
G. Montufar and J. Morton. JMLR 16(Apr):653-672, 2015. [BibTeX]. Conference version ICLR 2013. Preprint [MPI MIS 106/2014], [arXiv 1301.3529]
When Does a Mixture of Products Contain a Product of Mixtures?
G. Montufar and J. Morton. SIAM Journal on Discrete Mathematics (SIDMA), 29(1):321-347, 2015. [BibTeX]. Preprint [MPI MIS 98/2014], [arXiv 1206.0387]
Deep Narrow Boltzmann Machines are Universal Approximators.
G. Montufar. In Third International Conference on Learning Representations (ICLR 2015). [BibTeX]. Preprint [MPI MIS 113/2014], [arXiv 1411.3784]
On the Number of Linear Regions of Deep Neural Networks.
G. Montufar, R. Pascanu, K. Cho, and Y. Bengio. NIPS 27, pp. 2924-2932, 2014. [BibTeX]. Preprint [MPI MIS 73/2014], [arXiv 1402.1869]
On the Number of Response Regions of Deep Feedforward Networks with Piecewise Linear Activations.
R. Pascanu, G. Montufar, and Y. Bengio. In Second International Conference on Learning Representations (ICLR 2014). [BibTeX]. Preprint [MPI MIS 72/2014], [arXiv 1312.6098]
On the Fisher Information Metric of Conditional Probability Polytopes.
G. Montufar, J. Rauh, and N. Ay. Entropy 16(6):3207-3233, 2014. [BibTeX]. Preprint [MPI MIS 87/2014], [arXiv 1404.0198]
Scaling of Model Approximation Errors and Expected Entropy Distances.
G. Montufar and J. Rauh. Kybernetika 50(2):234-245, 2014. [BibTeX]. Workshop version WUPES 2012, pp. 137-148. Preprint [arXiv 1207.3399]
Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units.
G. Montufar. Neural Computation 26(7):1386-1407, 2014. [BibTeX]. Preprint [MPI MIS 74/2014], [arXiv 1303.7461]
Universally Typical Sets for Ergodic Sources of Multidimensional Data.
T. Krüger, G. Montufar, R. Seiler, and R. Siegmund-Schultze. Kybernetika 49(6):868-882, 2013. [BibTeX]. Preprint [MPI MIS 20/2011], [arXiv 1105.0393]
Mixture Decompositions of Exponential Families Using a Decomposition of their Sample Spaces.
G. Montufar. Kybernetika 49(1):23-39, 2013. [BibTeX]. Preprint [MPI MIS 39/2010], [arXiv 1008.0204]
Maximal Information Divergence from Statistical Models defined by Neural Networks.
G. Montufar, J. Rauh, and N. Ay. In Geometric Science of Information LNCS Vol. 8085, pp 759-766, 2013. [BibTeX]. Preprint [MPI MIS 31/2013], [arXiv 1303.0268]
Selection Criteria for Neuromanifolds of Stochastic Dynamics.
N. Ay, G. Montufar, J. Rauh. In Advances in Cognitive Neurodynamics (III), pp 147-154, 2013. [BibTeX]. Preprint [MPI MIS 15/2011]
Expressive Power and Approximation Errors of Restricted Boltzmann Machines.
G. Montufar, J. Rauh, and N. Ay. NIPS 24, pp. 415-423, 2011. [BibTeX]. Preprint [MPI MIS 27/2011], [arXiv 1406.3140]
Refinements of Universal Approximation Results for Restricted Boltzmann Machines and Deep Belief Networks.
G. Montufar and N. Ay. Neural Computation 23(5):1306-1319, 2011. [BibTeX]. Preprint [MPI MIS 23/2010], [arXiv 1005.1593]

Wasserstein Diffusion Tikhonov Regularization.
Alex Tong Lin, Yonatan Dukler, Wuchen Li, and Guido Montufar. Presented at Optimal Transport and Machine Learning Workshop, NeurIPS 2019. Preprint [arXiv:1909.06860], [RG].
Task-Agnostic Constraining in Average Reward POMDPs.
G. Montufar, J. Rauh, N. Ay. Presented at Task-Agnostic Reinforcement Learning Workshop, ICLR 2019. Preprint [RG].
A continuity result for optimal memoryless planning in POMDPs.
J. Rauh, N. Ay, G. Montufar. Presented at The 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM) 2019. Preprint [RG].
Uncertainty and Stochasticity of Optimal Policies.
G. Montufar, J. Rauh, N. Ay. Proceedings of the 11th Workshop on Uncertainty Processing (WUPES), pp 133-140, 2018. Preprint [RG].
Notes on the number of linear regions of deep neural networks.
G. Montufar. Sampling Theory and Applications (SampTA), 2017. Preprint [RG].
Stochasticity of optimal policies for POMDPs.
G. Montufar, K. Ghazi-Zahedi, N. Ay. Reinforcement Learning and Decision Making (RLDM), 2017.
A Comparison of Neural Network Architectures.
G. Montufar. Deep Learning Workshop, ICML 2015. [pdf]. Preprint [RG].
Kernels and Submodels of Deep Belief Networks.
G. Montufar and J. Morton. Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2012. Preprint [arXiv 1211.0932]
Mixture Models and Representational Power of RBMs, DBNs and DBMs.
G. Montufar. Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2010. [pdf]. Preprint [RG]. 

Implicit bias of gradient descent for mean squared error regression with wide neural networks.
H. Jin and G. Montufar. Preprint [arXiv:2006.07356].
Stochastic Feedforward Neural Networks: Universal Approximation.
T. Merkh and G. Montufar. Preprint [arXiv:1910.09763], [RG preprint].
How Well Do WGANs Estimate the Wasserstein Metric?
Anton Mallasto, Guido Montufar, Augusto Gerolin. Preprint [arXiv:1910.03875].
Factorized Mutual Information Maximization.
T. Merkh and G. Montufar. Preprint [arXiv:1906.05460], [RG preprint].
Wasserstein proximal of GANs.
A. Lin, W. Li, S. Osher, and G. Montufar. Preprint [CAM report 1853], [RG.2.2.30713.52320]. Poster [pdf].
Information Theoretically Aided Reinforcement Learning for Embodied Agents.
G. Montufar, K. Ghazi-Zahedi, and N. Ay. Preprint [arXiv:1605.09735].
Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes.
G. Montufar, K. Ghazi-Zahedi, and N. Ay. Preprint [MPI MIS 22/2016], [arXiv:1503.07206].
Sequential Recurrence-Based Multidimensional Universal Source Coding of Lempel-Ziv Type.
T. Krueger, G. Montufar, R. Seiler, and R. Siegmund-Schultze. Preprint [MPI MIS 86/2014], [arXiv:1408.4433].
Universal Approximation of Markov Kernels by Shallow Stochastic Feedforward Networks.
G. Montufar. Preprint [MPI MIS 23/2015], [arXiv:1503.07211]. 

On the Expressive Power of Discrete Mixture Models, Restricted Boltzmann Machines, and Deep Belief Networks—A Unified Mathematical Treatment.
PhD Thesis, Leipzig University, October 2012. Supervisor: N. Ay. [pdf] (14.4 MB, 155 pages, 30 figures)
Theory of Transport and Photon-Statistics in a Biased Nanostructure.
German Diplom in Physics, Institute for Theoretical Physics, TU Berlin, December 2008. Supervisors: A. Knorr and T. Brandes.
Q-Sanov Theorem for d ≥ 2.
German Diplom in Mathematics, Institute for Mathematics, TU Berlin, August 2007. Supervisors: R. Seiler and J.-D. Deuschel.

Nina Otter. CAM Adjunct Assistant Professor UCLA. Personal.
Pradeep Banerjee. Research Scientist MPI MIS. Google Scholar.
Yu Guang Wang. Research Scientist MPI MIS. Personal.
Thomas Merkh. PhD candidate UCLA Math. Personal.
Yonatan Dukler. PhD candidate UCLA Math. Personal.
Hui Jin. PhD candidate UCLA Math. Personal.
Hanna Tseran. PhD candidate MPI MIS.
Johannes Mueller. PhD candidate MPI MIS.


Stats 200A - Applied Probability, Fall 2020.
Stats 231C - Theories of Machine Learning, Spring 2020.
Stats 200A - Applied Probability, Fall 2019.
Stats 231C - Theories of Machine Learning, Spring 2019.
Math 164 - Optimization, Fall 2018.
Stats 270 - Mathematical Machine Learning, Spring 2018.
Math 285J - Applied Mathematics Seminar - Deep Learning Topics, Winter 2018.
Introduction to the Theory of Neural Networks, MS/PhD Lecture, Summer Term 2016, Leipzig University and MPI MIS.
Geometric Aspects of Graphical Models and Neural Networks, with N. Ay, [Abstract], MS/PhD Lecture, Winter Term 2014/2015, Leipzig University and MPI MIS.


Speaker at Algebraic Statistics Session at COMPSTAT2020, Bologna, August 2020. (Postponed due to COVID-19 crisis)
Speaker at Workshop Optimal Transport, Topological Data Analysis and Applications to Shape and Machine Learning, MBI, Ohio State University, USA, July 2020. (Online)
Keynote Speaker at Algebraic Statistics 2020, University of Hawai'i at Manoa, Honolulu, HI, USA, June 2020. (Postponed due to COVID-19 crisis)
Keynote Speaker at The 5th International IEEE CVPR Workshop on Differential Geometry in Computer Vision and Machine Learning, June 2020. (Online)
Invited talk at Summer School and Workshop on the Foundations of Graph and Deep Learning, Mathematical Institute for Data Science (MINDS), Johns Hopkins University, Baltimore MD, May 2020. (Postponed due to COVID-19 crisis)
Invited minicourse (5 days) at Spring School on Mathematics of Data, Hanoi, Vietnam, March 2020. (Canceled due to COVID-19 crisis)
Invited Speaker at UseData 2019 Data Science Conference, Moscow, Russia, September 2019.
Plenary Speaker at 2019 Workshop in memory of Frantisek Matus, Institute of Information Theory and Automation Academy of Sciences of the Czech Republic, Prague, Czech Republic, August 2019.
Wasserstein Information Geometry for Learning from Data, Optimal Transport for Nonlinear Problems, ICIAM, Valencia, Spain, July 2019.
Markov Kernels with Deep Graphical Models, Latent Graphical Models, SIAM AG, Bern, July 2019.
Computing the Unique Information, 1st Workshop on Semantic Information, CVPR 2019, Long Beach, June 2019.
Wasserstein Information Geometry for Learning from Data, Tutorial, Geometry and Learning from Data in 3D and Beyond, IPAM, LA, March 2019. [talk at GLTUT IPAM].
Representation, Approximation, Optimization advances for Restricted Boltzmann Machines, 7th International conference on computational harmonic analysis, Vanderbilt University, May 2018.
A theory of cheap control in embodied systems, Random Structures in Neuroscience and Biology, Herrsching, Germany, March 2018.
Graphical models with hidden variables, Systematic Approaches to Deep Learning Methods for Audio, Vienna, Austria, September 2017.
On the Fisher metric of conditional probability polytopes, Geometry of Information for Neural Networks, Machine Learning, Artificial Intelligence, Topological and Geometrical Structures of Information, CIRM Marseille, France, August 2017.
Notes on the number of linear regions of deep neural networks, Mathematics of Deep Learning, Special Session at International Conference on Sampling Theory, Tallinn, Estonia, July 2017.
Learning with neural networks, Tutorial, Training Networks, Signal Processing with Adaptive Sparse Structured Representations, Lisbon, Portugal, June 2017.
Geometric and Combinatorial Perspectives on Deep Neural Networks, Theory of Deep Learning, ICML 2016, New York, USA, June 2016.
Mode Poset Probability Polytopes, WUPES'15, Moninec, Czech Republic, September 18, 2015.
Hierarchical models as marginals of hierarchical models, WUPES'15, Moninec, Czech Republic, September 17, 2015.
Confining bipartite graphical models by simple classes of inequalities, Special Topics Session Algebraic and Geometric Approaches to Graphical Models, 60th World Statistics Congress - ISI 2015, Rio de Janeiro, Brazil, July 31, 2015.
Information Divergence from Statistical Models Defined by Neural Networks, Workshop: Information Geometry for Machine Learning, RIKEN BSI, Japan, December 2014.
Geometry of Hidden-Visible Products of Statistical Models, Joint Workshop on Limit Theorems and Algebraic Statistics, UTIA, Prague, August 25-29, 2014.
Maximal Information Divergence from Statistical Models defined by Neural Networks, GSI 2013, Mines ParisTech, Paris, France, August 29, 2013.
Discrete Restricted Boltzmann Machines, ICLR2013, Scottsdale, AZ, USA, May 2, 2013.
When Does a Mixture of Products Contain a Product of Mixtures?, Tensor network states and algebraic geometry, ISI Foundation, Torino, Italy, November 06-08, 2012.
Scaling of Model Approximation Errors and Expected Entropy Distances, WUPES'12, Mariánské Lázně, Czech Republic, September 13, 2012.
Simplex packings of marginal polytopes and mixtures of exponential families, SIAM Conference on Discrete Mathematics (DM 2012), Dalhousie University, Halifax, Nova Scotia, Canada, June 18-21, 2012.
On Secants of Exponential Families, Algebraic Statistics in the Alleghenies, Penn State, PA, USA, June 08-15, 2012.
Geometry of Restricted Boltzmann Machines Towards Geometry of Deep Belief Networks, RIKEN Workshop on Information Geometry, RIKEN BSI, Japan, August 31, 2011.
Selection Criteria for Neuromanifolds of Stochastic Dynamics, The 3rd International Conference on Cognitive Neurodynamics, Niseko Village, Hokkaido, Japan, June 12, 2011.
Information Geometry of Mean-Field Methods, Fall School on Statistical Mechanics and 5th annual PhD Student Conference in Probability, MPI MIS, Leipzig, Germany, September 07-12, 2009.
Quantum-Sanov-Theorem for correlated States in multidimensional Grids, Dies Mathematicus, TU Berlin, Germany, February 2008.
Quanten-Sanov-Theorem im mehrdimensionalen Fall, Workshop on Complexity and Information Theory, MPI MIS, Leipzig, Germany, October 2007.


On the dynamics and convergence of weight normalization training of neural networks, Deep Learning Seminar, University of Vienna, Austria, February 2020.
On the dynamics and convergence of weight normalization training of neural networks, Mathematisches Kolloquium, Bielefeld University, Germany, January 2020.
Information Geometry in Optimization and Regularization of Generative and Discriminative Models, Renyi Institute, Budapest, Hungary, August 2019.
Mixtures and products in two graphical models, USC Probability and Statistics Seminar, USC, April 2018.
Mode poset probability polytopes, Combinatorics Seminar (Igor Pak), UCLA, February 2018.
Uncertainty and Stochasticity in POMDPs, Machine Learning Seminar (Wilfrid Gangbo), UCLA, February 2018.
Mixtures and products in two graphical models, Level Set Collective (Stanley Osher), IPAM, November 2017.
Neural networks for cheap control of embodied behavior, Peking University (Jinchao Xu), Beijing, China, July 2017.
Selected Topics in Deep Learning, Short Course, Beijing Institute for Scientific Computing, Beijing, China, July 2017.
Dimension of Marginals of Kronecker Product Models, Seminar on Non-Linear Algebra, TU Berlin, Germany, November 2016.
Artificial Intelligence Overview, LikBez Seminar, MPI MIS, January 2016.
Geometric Approaches to the Design of Embodied Learning Systems, Special Symposium on Intelligent Systems, MPI for Intelligent Systems, Tuebingen, Germany, March 2016.
A Theory of Cheap Control in Embodied Systems, Montreal Institute for Learning Algorithms (MILA), University of Montreal, Canada, December 2015.
Dimension of restricted Boltzmann machines, Department of Mathematics & Statistics, York University, Toronto, Canada, December 2015.
Sequential Recurrence-Based Multidimensional Universal Source Coding, Dynamical Systems Seminar, MPI MIS, November 2015.
Cheap Control of Embodied Systems, Aalto Science Institute, Espoo, Finland, November 2015.
On the Number of Linear Regions of Deep Neural Networks, Montreal Institute for Learning Algorithms (MILA), Université de Montréal, Montreal, Canada, December 15, 2014.
Geometry of Deep Neural Networks and Cheap Design for Autonomous Learning, Google DeepMind, London, UK, October 2014.
How size and architecture determine the learning capacity of neural networks, SFI Seminar, Santa Fe, NM, USA, October 23, 2013.
Naive Bayes models, Seminario de Postgrado en Ingenieria de Sistemas, Universidad del Valle, Santiago de Cali, Colombia, May 30, 2013.
On the Expressive Power of Discrete Mixture Models, Restricted Boltzmann Machines, and Deep Belief Networks—A Unified Mathematical Treatment, PhD thesis defense, Leipzig University, October 17, 2012.
Scaling of model approximation errors and expected entropy distances, Stochastic Modelling and Computational Statistics Seminar (Murali Haran), Penn State, PA, USA, October 11, 2012.
Universally typical sets for ergodic sources of multidimensional data, Seminar on probability and its applications (Manfred Denker), Penn State, PA, USA, October 05, 2012.
Multivalued Restricted Boltzmann Machines, [Abstract], MPI MIS, Leipzig, Germany, September 19, 2012.
Approximation Errors of Deep Belief Networks, Applied Algebraic Statistics Seminar, Penn State, PA, USA, February 08, 2012.
Submodels of Deep Belief Networks, [Abstract], Berkeley Algebraic Statistics Seminar, UC Berkeley, CA, USA, December 07, 2011.
Geometry and Approximation Errors of Restricted Boltzmann Machines, The 5th Statistical Machine Learning Seminar, Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan, September 02, 2011.
On Exponential Families and the Expressive Power of Related Generative Models, [Abstract], Laboratoire d'Informatique des Systèmes Adaptatifs (LISA), Université de Montréal, Canada, March 14, 2011.
Mixtures from Exponential Families, Neuronale Netze und Kognitive Systeme Seminar, MPI MIS, Leipzig, Germany, March 02, 2011.
Universal approximation results for Restricted Boltzmann Machines and Deep Belief Networks, Neuronale Netze und Kognitive Systeme Seminar, MPI MIS, Leipzig, Germany, February 16, 2011.
Necessary conditions for RBM universal approximators, Meeting of the Department of Decision-Making Theory - Institute of Information Theory and Automation UTIA, Marianska, Czech Republic, January 18, 2011.


A comparison of neural network architectures, Deep Learning Workshop, ICML 2015.
Mode Poset Probability Polytopes, [pdf], Algebraic Statistics 2015, Department of Mathematics University of Genoa, Italy, June 8-11, 2015.
A Framework for Cheap Universal Approximation in Embodied Systems, Autonomous Learning: 3. Symposium DFG Priority Programme 1527, Berlin, September 8-9, 2014.
Geometry of hidden-visible products of statistical models, [pdf], Algebraic Statistics at IIT, Chicago, IL, 2014.
When Does a Mixture of Products Contain a Product of Mixtures, [Abstract], NIPS 2012 - Deep Learning and Unsupervised Feature Learning Workshop.
Kernels and Submodels of Deep Belief Networks, [Abstract], NIPS 2012 - Deep Learning and Unsupervised Feature Learning Workshop.
Mixture Models and Representational Power of RBMs, DBNs and DBMs, NIPS 2010 - Deep Learning and Unsupervised Feature Learning Workshop, Whistler, Canada.
Faces of the probability simplex contained in the closure of an exponential family and minimal mixture representations, Information Geometry and its Applications III, Leipzig, Germany, 2010.


Johannes Rauh and Pradeep Banerjee are giving talks on our joint works at the WIAS Workshop on Mathematics of Deep Learning, Berlin, December 2019.
Together with Wuchen Li we are organizing the Wasserstein Information Geometry special session at GSI 2019, Toulouse, France, August 2019.
I am participating in the National Workshop on Data Science Education, UC Berkeley, CA, USA, June 2019.
Together with Joan Bruna, Yu Guang Wang, Nina Otter, and Zheng Ma, we had the ICERM Collaborate Group Geometry of Data and Networks, Institute for Computational and Experimental Research in Mathematics, Providence, RI, USA, June 2019.
I am a co-organizer of the Geometry of Data and Learning in 3D and Beyond, IPAM Long Program, Institute for Pure and Applied Mathematics, Los Angeles, CA, USA, March - June 2019.
We had a fantastic Deep Learning Theory Kickoff Meeting at the Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany, March 2019.
Asja Fischer, Jason Morton, and I are organizing the AIM Workshop Boltzmann Machines, American Institute of Mathematics, San Jose, CA, USA, September 2018.
Together with Christiane Goergen, Nihat Ay, and Andre Uschmajew, I am a co-initiator of the Math of Data Initiative, Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
I am giving a talk at the SIAM Annual Meeting 2018: Statistics Minisymposium, Portland, Oregon, July 2018.
Transylvanian Machine Learning Summer School, Cluj-Napoca, Romania, 16-22 July 2018.
Asja Fischer and I are organizing the Theory of Deep Learning Workshop, DALI 2018, Lanzarote, Spain, April 2018.
Random Structures in Neuroscience and Biology, Herrsching, March 26-29, 2018.
NIPS 2017, Long Beach, CA, December 2017.
Geometric Science of Information, Paris, November 2017.
Systematic Approaches to Deep Learning Methods for Audio, Vienna, Austria, September 2017.
ICML 2017, Principled Approaches to Deep Learning, Program Committee, Sydney, Australia, August 2017.
Geometry of Information for Neural Networks, Machine Learning, Artificial Intelligence, Topological and Geometrical Structures of Information, CIRM Marseille, France, August 2017.
Mathematics of Deep Learning, Special Session at International Conference on Sampling Theory, Tallinn, Estonia, July 2017.
Training Networks, Signal Processing with Adaptive Sparse Structured Representations, Lisbon, Portugal, June 2017.
Deep Neural Networks: Theory and Application, Minisymposium at Applied Inverse Problems, Hangzhou, China, May 2017.
Oberwolfach Workshop Algebraic Statistics, Mathematisches Forschungsinstitut Oberwolfach, Germany, April 2017.
Santa Fe Institute, Visit for Research Collaboration (Nihat Ay), Santa Fe, NM, USA, October 2016.
ICML 2016, New York, USA.
IGAIA IV 2016, Liblice, Czech Republic.
NIPS 2015, Montréal, Canada.
WUPES'15, Monínec, Czech Republic.
60th World Statistics Congress - ISI 2015, Rio de Janeiro, Brazil.
ICML 2015, Lille, France.
Algebraic Statistics 2015, Genova, Italy.
NIPS 2014, Montréal, Canada.
Workshop: Information Geometry for Machine Learning, RIKEN BSI, Japan, December 2014.
Santa Fe Institute, Visit for Research Collaboration (Nihat Ay), Santa Fe, NM, USA, October 15 - November 15, 2014.
Information Geometry in Learning and Optimization, University of Copenhagen, September 22-26, 2014.
Autonomous Learning: 3. Symposium DFG Priority Programme 1527, Magnus Haus Berlin, Germany, September 08-09, 2014.
Autonomous Learning: Summer School, MPI MIS, September 01-04, 2014.
Joint Workshop on Limit Theorems and Algebraic Statistics, UTIA, Prague, Czech Republic, August 25-29, 2014.
Algebraic Statistics at IIT, Chicago, IL, USA, May 2014.
Santa Fe Institute, Visit for Research Collaboration (Nihat Ay), October 1-27, 2013.
SFI Working Group "Information Theory of Sensorimotor Loops", Santa Fe Institute, Santa Fe, NM, USA, October 8-11, 2013.
Pennsylvania State University, Visit for Research Collaboration (Jason Morton), PA, USA, September 2013.
GSI 2013, Paris, August 28-30, 2013.
ICLR2013, Scottsdale, AZ, USA, May 2-4, 2013.
NIPS 2012, Deep Learning Workshop, Lake Tahoe, NV, USA, December 7-8, 2012.
Algebraic Statistics in Europe, IST Austria, September 28-30, 2012.
WUPES'12, Marianske Lazne, Czech Republic, September 12-15, 2012.
Graduate Summer School: Deep Learning, Feature Learning, IPAM - UCLA, Los Angeles, CA, USA, July 9-27, 2012.
SIAM Conference on Discrete Mathematics 2012, Dalhousie University, Halifax, Nova Scotia, Canada, June 18-21, 2012.
Algebraic Statistics in the Alleghenies, Penn State, PA, USA, June 08-15, 2012.
Singular Learning Theory, AIM Workshop, American Institute of Mathematics, Palo Alto, CA, USA, December 12-16, 2011.
RIKEN BSI, Laboratory for Mathematical Neuroscience (Prof. S. Amari), Internship, Hirosawa, Wako, Saitama, Japan, August-October 2011.
SFI Complex Systems Summer School (CSSS11), Saint John's College, Santa Fe, NM, USA, June 8 - July 1, 2011.
Third International Conference of Cognitive Neurodynamics (ICCN 2011), Niseko Village, Hokkaido, Japan, May 2011.
Information Geometry and its Applications (IGAIA III), Leipzig University, Germany, August 2010.

When Does a Mixture of Products Contain a Product of Mixtures?
SIAM Journal on Discrete Mathematics, 29(1):321-347, 2015.
Ordinary and natural gradient on a 2dim surface of stochastic matrices.
Selection Criteria for Neuromanifolds of Stochastic Dynamics, 2011. 
A Theory of Cheap Control in Embodied Systems
PLOS CB, September 1, 2015. 