Conventionally, reinforcement learning algorithms are designed to search for a single optimal policy given a reward function. When this policy is used to generate new samples, it tends to produce only the highest-reward samples, resulting in a lack of diversity. To overcome this issue, Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates. In this talk, I will introduce the application of GFlowNets to generating molecular graphs, and their theoretical foundation based on Markovian flows.
Constrained optimization problems involve a constraint set and an objective function. In traditional convex optimization, we hope that the objective function is convex and smooth so that we can apply algorithms like gradient descent. However, this is not the case in some problem formulations (e.g., maximum likelihood estimation for the Poisson inverse problem). In optimization on manifolds, we view the constraint set as a Riemannian manifold, so that we can exploit its geometric structure to show that the objective function is "geodesically" convex or smooth. In this presentation, I will give a brief introduction to optimization on manifolds, and present a convergence-rate analysis of the "manifold version" of gradient descent.
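As a small illustration of the "manifold version" of gradient descent, here is a minimal sketch on the unit sphere: the Euclidean gradient is projected onto the tangent space, a step is taken, and the iterate is retracted back onto the sphere by renormalization. The objective (a Rayleigh quotient, whose minimizer is the eigenvector of the smallest eigenvalue) and all constants are illustrative choices, not from the talk.

```python
import numpy as np

def riemannian_gd_sphere(A, x0, lr=0.1, iters=500):
    """Minimize f(x) = x^T A x over the unit sphere by Riemannian gradient descent."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2 * A @ x                 # Euclidean gradient of f
        rgrad = egrad - (x @ egrad) * x   # project onto the tangent space at x
        x = x - lr * rgrad                # step along the (negative) Riemannian gradient
        x = x / np.linalg.norm(x)         # retraction: back onto the sphere
    return x

A = np.diag([3.0, 2.0, 0.5])
x = riemannian_gd_sphere(A, np.array([1.0, 1.0, 1.0]))
# x converges to (plus/minus) the eigenvector of the smallest eigenvalue, 0.5
```

The projection-plus-retraction step is the simplest instance of the general scheme; convergence-rate analyses replace the Euclidean smoothness/convexity constants with their geodesic counterparts.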
Many practical applications of reinforcement learning require agents to learn from a fixed batch of data that has already been gathered. Due to extrapolation error, common off-policy algorithms such as DQN are ineffective in this fixed-batch setting. In this presentation, I will introduce the work Off-Policy Deep Reinforcement Learning without Exploration. It proposes a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space to force the agent towards behaving close to on-policy with respect to a subset of the given data.
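To make the action-space restriction concrete, here is a minimal tabular sketch (not the paper's deep BCQ algorithm): the max in the Bellman backup ranges only over actions actually observed in the batch at the next state, so the agent never bootstraps from unseen state-action pairs. The toy batch and constants are hypothetical.

```python
import numpy as np
from collections import defaultdict

def batch_constrained_q(batch, n_states, n_actions, gamma=0.9, lr=0.1, sweeps=500):
    """Tabular batch-constrained Q-learning: bootstrap only from observed actions."""
    seen = defaultdict(set)
    for s, a, r, s2 in batch:
        seen[s].add(a)                 # actions the batch contains at state s
    Q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for s, a, r, s2 in batch:
            if seen[s2]:
                target = r + gamma * max(Q[s2, a2] for a2 in seen[s2])
            else:
                target = r             # no observed actions at s2: no bootstrapping
            Q[s, a] += lr * (target - Q[s, a])
    return Q

# hypothetical 2-state, 2-action batch of (state, action, reward, next_state)
batch = [(0, 0, 0.0, 0), (0, 1, 1.0, 1)]
Q = batch_constrained_q(batch, n_states=2, n_actions=2)
```

Unconstrained Q-learning would instead take the max over all actions at the next state, including ones the batch never shows, which is exactly where extrapolation error enters.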
Many problems require decision making that entails an exploration-exploitation trade-off over a large (and possibly infinite) action space. Such problems can be formulated as kernel-based bandits, with ubiquitous applications in parameter tuning, scientific and industrial design, recommender systems and sensor networks. The seminal work of Srinivas et al. (2010) — recipient of the ICML 2020 Test of Time award — provided a finite-time analysis of an upper confidence bound algorithm for the kernel-based bandit problem. Inspired by their analysis, several subsequent works provided regret bounds for other algorithms. In addition to these fundamental results, there are recent advances establishing order-optimal regret bounds. In this talk we will overview both the fundamentals and recent advances. We will also hint at several emerging research directions on kernel-based bandits in conjunction with sparse approximation methods, neural networks, and reinforcement learning.
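The upper confidence bound algorithm of Srinivas et al. can be sketched as follows: maintain a Gaussian-process posterior over the unknown reward and query the point maximizing posterior mean plus a multiple of the posterior standard deviation. The kernel, grid, reward function and the constant beta below are illustrative assumptions, not the paper's tuned choices.

```python
import numpy as np

def rbf(x, y, ls=0.2):
    """Squared-exponential kernel on scalar inputs."""
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / (2 * ls ** 2))

def gp_ucb(f, grid, rounds=30, beta=2.0, noise=1e-3, seed=0):
    """Query the grid point maximizing posterior mean + beta * posterior std."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for _ in range(rounds):
        if not xs:
            idx = int(rng.integers(len(grid)))       # first query: random
        else:
            X = np.array(xs)
            K = rbf(X, X) + noise * np.eye(len(X))   # kernel matrix + jitter
            Kinv = np.linalg.inv(K)
            ks = rbf(grid, X)                        # cross-covariances to the grid
            mu = ks @ Kinv @ np.array(ys)            # posterior mean
            var = 1.0 - np.einsum('ij,jk,ik->i', ks, Kinv, ks)
            idx = int(np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0))))
        x = grid[idx]
        xs.append(x)
        ys.append(f(x) + noise * rng.standard_normal())
    return xs, ys

grid = np.linspace(0.0, 1.0, 51)
xs, ys = gp_ucb(lambda x: -(x - 0.7) ** 2, grid)     # unknown reward peaking at 0.7
```

The finite-time regret analysis bounds the cumulative gap to the optimum in terms of the kernel's maximal information gain.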
I will give a gentle introduction to the convergence of stochastic processes, followed by a brief description of SGD convergence: the classical settings, assumptions, and results. I will briefly mention what one can obtain with slightly stronger assumptions or weaker requirements on the speed of convergence. We will then look at how these results apply to prove a weak result for MAML. I will outline how MAML updates violate some of the previously stated assumptions, and discuss how one can still recover a result.
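For reference, the classical SGD setting looks like this in code: unbiased noisy gradients of a strongly convex objective, with the textbook 1/(mu*t) step-size schedule. The quadratic objective and noise level are illustrative assumptions.

```python
import numpy as np

def sgd(grad_est, x0, mu=1.0, iters=2000, seed=0):
    """SGD with the classical 1/(mu*t) step size for a mu-strongly-convex objective."""
    rng = np.random.default_rng(seed)
    x = x0
    for t in range(1, iters + 1):
        g = grad_est(x, rng)           # unbiased stochastic gradient
        x = x - (1.0 / (mu * t)) * g   # decaying step size
    return x

# toy objective f(x) = 0.5 * ||x - b||^2; stochastic gradient = (x - b) + noise
b = np.array([1.0, -2.0])
x_hat = sgd(lambda x, rng: (x - b) + 0.1 * rng.standard_normal(2), np.zeros(2))
```

Under these assumptions the expected squared error decays at rate O(1/t); the MAML discussion in the talk concerns what breaks when the gradient estimator no longer satisfies them.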
The contextual bandit problem is a generalization of the multi-armed bandit problem. At each iteration, the agent receives a d-dimensional feature vector that is related to the rewards of the arms. The agent learns to choose an arm based on this feature vector so as to maximize the cumulative reward. In this presentation, I will introduce the work Contextual Bandits with Linear Payoff Functions, which proposes an algorithm for contextual bandits together with its regret bound.
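A minimal sketch of a LinUCB-style algorithm for linear payoffs, in the spirit of that work: each arm keeps a ridge-regression estimate of its payoff vector and is scored by estimated payoff plus an exploration bonus. The environment, dimensions, and the constant alpha are hypothetical choices for illustration.

```python
import numpy as np

def linucb(context_fn, reward_fn, d, n_arms, rounds=2000, alpha=0.5, seed=0):
    """Pick the arm maximizing theta_a^T x + alpha * sqrt(x^T A_a^{-1} x)."""
    rng = np.random.default_rng(seed)
    A = [np.eye(d) for _ in range(n_arms)]   # regularized per-arm Gram matrices
    b = [np.zeros(d) for _ in range(n_arms)]
    total = 0.0
    for _ in range(rounds):
        x = context_fn(rng)                  # observed d-dimensional feature vector
        scores = []
        for a in range(n_arms):
            Ainv = np.linalg.inv(A[a])
            theta = Ainv @ b[a]              # ridge estimate of the payoff vector
            scores.append(theta @ x + alpha * np.sqrt(x @ Ainv @ x))
        a = int(np.argmax(scores))
        r = reward_fn(a, x, rng)
        A[a] += np.outer(x, x)               # update the chosen arm's statistics
        b[a] += r * x
        total += r
    return total / rounds

# hypothetical environment: two arms with unknown payoff vectors plus noise
theta_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
avg = linucb(lambda rng: rng.random(3),
             lambda a, x, rng: theta_true[a] @ x + 0.1 * rng.standard_normal(),
             d=3, n_arms=2)
```

With uniform contexts the best achievable average reward here is 2/3, versus 1/2 for a uniformly random policy; the learned policy should land close to the former.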
Most theoretical insights about neural networks are based on the study of the asymptotic limit in which the networks can be described as Gaussian processes; in physics, this corresponds to non-interacting field theories. Moving away from the asymptotic limit yields a non-Gaussian process, which corresponds to turning on particle interactions in the field theory and allows the computation of correlation functions of neural network outputs with Feynman diagrams. This enables a theoretical understanding of neural networks in terms of Wilsonian effective field theory.
Conditional Generative Adversarial Networks (cGANs) are implicit generative models that allow sampling from class-conditional distributions. Existing cGANs are based on a wide range of discriminator designs and training objectives. One popular design in earlier works is to include a classifier during training, with the assumption that a good classifier can help eliminate samples generated with wrong classes. Nevertheless, including classifiers in cGANs often comes with the side effect of generating only easy-to-classify samples. Recently, some representative cGANs have avoided this shortcoming and reached state-of-the-art performance without classifiers. It remains unanswered, however, whether classifiers can be resurrected to design better cGANs. In this work, we demonstrate that classifiers can be properly leveraged to improve cGANs. We start by using the decomposition of the joint probability distribution to connect the goals of cGANs and classification in a unified framework. The framework, along with a classic energy model to parameterize distributions, justifies the use of classifiers for cGANs in a principled manner. It explains several popular cGAN variants, such as ACGAN, ProjGAN, and ContraGAN, as special cases with different levels of approximation, which provides a unified view and brings new insights to understanding cGANs. Experimental results demonstrate that the design inspired by the proposed framework outperforms state-of-the-art cGANs on multiple benchmark datasets, especially on the most challenging ImageNet. The code is available at this https URL.
To protect networks from adversarial examples, verifying the robustness properties of neural networks has become a critical problem in AI security. In recent years, previous works have mainly focused on verifying ReLU-based CNNs. However, many network architectures for image classification (e.g., AlexNet, LeNet, VGG) contain maxpool layers. In this talk, I will first review how the dual approach solves the verification problem. Then, I will present a novel decomposition idea for handling the maxpool function, which further allows us to verify maxpool-based CNNs. Finally, I will show the performance of our proposed method in terms of precision and time efficiency.
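One standard way to see why maxpool can be reduced to operations a ReLU verifier already handles is the identity max(a, b) = b + relu(a - b), which turns a pooling window into a cascade of pairwise maxes. This is a generic illustration of that decomposition idea, not necessarily the exact construction used in the talk.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max_via_relu(a, b):
    # identity: max(a, b) = b + relu(a - b)
    return b + relu(a - b)

def maxpool_via_relu(window):
    """Rewrite a maxpool window as a cascade of pairwise ReLU-based maxes."""
    m = window[0]
    for v in window[1:]:
        m = max_via_relu(m, v)
    return m

w = np.array([0.3, -1.2, 2.5, 0.7])   # one pooling window
m = maxpool_via_relu(w)               # equals np.max(w)
```

Once the pooling layer is expressed through ReLUs and affine operations, existing dual or bound-propagation relaxations apply directly.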
In this talk I will describe the machinery of overparametrized neural networks, explain what the Neural Tangent Kernel is, and provide mathematical context for the notion that "as the number of parameters in a network grows to infinity, the training behavior resembles kernel methods." This will be followed by a brief overview of approaches to error estimates for kernel methods and the implied results for Neural Tangent Kernels, including (if time permits) recent work by MediaTek Research.
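The NTK can be computed empirically as the Gram matrix of parameter gradients, K(x, x') = <grad_theta f(x), grad_theta f(x')>. The sketch below does this for a hypothetical one-hidden-layer ReLU network with scalar inputs and shows the "kernel regime" concentration: at large width, two independent random initializations yield nearly the same kernel.

```python
import numpy as np

def empirical_ntk(xs, width, seed):
    """Empirical NTK of f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j * x)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(width)
    a = rng.standard_normal(width)
    J = []
    for x in xs:
        pre = w * x
        act = np.maximum(pre, 0.0)
        d_a = act / np.sqrt(width)                  # df/da_j
        d_w = a * (pre > 0) * x / np.sqrt(width)    # df/dw_j via relu'(pre)
        J.append(np.concatenate([d_a, d_w]))        # full parameter gradient
    J = np.array(J)
    return J @ J.T                                  # Gram matrix of gradients

xs = [0.5, -1.0, 2.0]
K1 = empirical_ntk(xs, width=20000, seed=0)
K2 = empirical_ntk(xs, width=20000, seed=1)
# at large width the two kernels nearly coincide: the NTK is deterministic in the limit
```

In the infinite-width limit this kernel also stays constant during training, which is what licenses the comparison with kernel regression.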
Model-free reinforcement learning algorithms, e.g., Q-learning and policy gradient, are widely adopted in numerous applications. While these algorithms enjoy good empirical results, theoretical guarantees on their performance remain to be investigated. In this presentation, we will introduce the first work to prove that Q-learning achieves sublinear regret. This analytical result provides a better understanding of why and how Q-learning converges.
The essence of many online learning problems is manifested in the exploration-exploitation tradeoff: exploring actions to learn the underlying reward models while capitalizing on the information gathered so far to exploit the best actions. Bandits are a classic mathematical abstraction of this setting with tractable analysis. Over the past decades, there have been several fundamental results in the performance analysis of various bandit algorithms, such as epsilon-greedy, UCB and Thompson sampling, under different modelling assumptions of finite and infinite action sets (linear, convex, Lipschitz, or kernelized models). We overview these results at the Trends in AI Theory seminar series.
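As a concrete reference point for the finite-action setting, here is the classic UCB1 rule on Bernoulli arms: pull the arm maximizing empirical mean plus sqrt(2 ln t / n_a), so that rarely pulled arms keep a large exploration bonus. The arm means below are an illustrative toy instance.

```python
import numpy as np

def ucb1(means, rounds=5000, seed=0):
    """UCB1: pull argmax of empirical mean + sqrt(2 ln t / n_a)."""
    rng = np.random.default_rng(seed)
    K = len(means)
    n = np.zeros(K)        # pull counts
    s = np.zeros(K)        # cumulative rewards
    for t in range(1, rounds + 1):
        if t <= K:
            a = t - 1                                   # pull each arm once
        else:
            bonus = np.sqrt(2.0 * np.log(t) / n)        # exploration bonus
            a = int(np.argmax(s / n + bonus))
        r = float(rng.random() < means[a])              # Bernoulli reward
        n[a] += 1
        s[a] += r
    return n, s

means = [0.3, 0.5, 0.7]
n, s = ucb1(means)
# the best arm (index 2) accumulates the vast majority of the pulls
```

Epsilon-greedy replaces the bonus with uniformly random pulls at rate epsilon, and Thompson sampling replaces the argmax with a draw from the posterior over the arm means; the tradeoff all three navigate is the same.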