The BONSAI Lab :: Project Pitches

Project ideas for prospective students

This page contains a loosely-curated set of project ideas. Prospective students interested in doing a (1-2 semester long) project in the lab may pick something on this list OR propose a new project based on things we in the lab are interested in.

Extensions to a hybrid MCMC sampling + Variational Inference method

Background: Probabilistic inference is the heart of statistics and much of machine learning. Crack open any statistics or probabilistic machine learning book, and it will tell you that there are two rather different families of approaches to doing inference in practice: Markov Chain Monte Carlo (MCMC) methods and variational inference (VI) methods. In Dr. Lange's past work $^{[1]}$ , he and colleagues questioned the dichotomy between MCMC and VI and derived a continuum of inference algorithms which provably "interpolate" between MCMC-like behavior and VI-like behavior.

We would like for some motivated students to pick up the proverbial torch and extend this work in a variety of ways:

Project idea #1: for the more theoretically inclined, there are bounds in the 2022 paper which are conjectured but unproven. There is also some speculation on how they could be sharpened.
Project idea #2: a more empirical project would be to optimize the implementation of the theory using state of the art sampling techniques like Kinetic Langevin.
Project idea #3: making the method more 'wall time' efficient rather than just more 'sample efficient' may require adapting some stochastic-likelihood sampling techniques$^{[2]$.

Suggested background reading:

Identifying the sampling/variational tradeoff in neural data

In this Nature paper, Walker et al used neural networks to nonparametrically model the probabilistic decoding of stimulus information from a non-human primate brain. Their analysis assumed a static (variational) representation of proabability over time.

Separately, Dr. Lange's work on probabilistic inference (see also the previous blurb) suggests that there is a continuum of inference algorithms between stochastic time-varying algorithms (MCMC) and static optimizers (variational inference). This leads us to wonder: what happens if we go back and re-analyze Walker et al's data but allow for time-varying encoding? Could we quantify the extent to which the primate brain is 'more like sampling' or 'more like variational inference'?

Learning to learn from counterfactuals

In the causality literature, there is a distinction between passive observation, modeling interventions, and counterfactual reasoning (some information on this in this blog post by Dr. Lange). Imagine you're designing the optimal reinforcement-learning agent who will operate in a simple POMDP. Do you give the agent the ability to plan? This would require some intervention-level model so that the agent can reason about the future consequences of its actions. There are clear well-established benefits of planning in model-based reinforcement learning (MBRL). Do you also give the agent the ability to store episodic memories? Storage and replay of past events is also a well-established strategy for training reinforcement learning agents.

To a first approximation, we can think of counterfactual reasoning as a form of planning that is initialized from past episodic experiences, so the agent imagines hypothetical alternate outcomes of past events. We hypothesize that there are two high-level strategies than an 'optimal' agent could be designed: first, an agent might never store episodic memories. If planning is relatively cheap and they're a fast and effective learner, an agent could be entirely forward-looking in this sense. Second, an agent might actually benefit from learning from revisiting past experiences counterfactually. We suspect this happens when memory storage is cheap and the agent's model of the world is sufficiently wrong.

The project would be this: set up an agent in a simple reinforcement learning environment. Also give the agent meta-actions with associated meta-costs for storing episodic memory, planning, and pausing to reflect on the past counterfactually. Use classic reinforcement learning techniques to learn the optimal meta-policy over which meta-action to take. Then, test the hypothesis that different 'strategies' emerge depending on model accuracy and memory.

Inference in Reinforcement Learning with Stochastic Predictive Coding

This project would be co-advised by Dr. Alex Ororbia and Dr. Richard Lange.

Predictive coding is a mathematical framework for information processing in brains that emphasizes the kind bi-directional information flow seen in brains (unlike deep feedforward neural nets). Alex's past work has integrated PC with reinforcement learning. However, PC computes an approximate MAP value for each variable, which should in theory be worse than computing a full posterior over latent variables. This paper shows that adding just the right amount of 'noise' to predictive coding turns it into Langevin sampling, which is a form of posterior inference.

We want to know: what happens when Langevin dynamics are used in the reinforcement learning setting, combining the above two ideas?

Assessing self-consistency and miscalibration in tree-search planners

One of the strongest modern approaches to general reinforcement-learning problems involves Monte Carlo Tree Search (MCTS) deployed with a learned probabilistic policy and a learned value function. The role of the policy is to predict useful actions in the current state or $π (a | s)$ . The role of MCTS is to refine the policy through search and simulation. The result of running $N$ searches through the tree is an updated action distribution $π_{N} (a | s)$ . The open-question to be explored is on the relationship between $π$ and $π_{N}$ . Ideally (we conjecture), you could re-run the $N$ searches infinitely many times and get the self-consistency or calibration property that $π (a | s) = E [π_{N} (a | s)]$ . This at least seems like an intuitively useful property, and part of this project will be to explore whether or not that is true. Students might take different angles depending on whether they are more interested in mathematical proofs or coding and data analysis:

Theoretical angle: a more theoretically-inclinde student could make a project out of deriving regret bounds and decomposing those bounds into terms such as the state-action information and the miscalibration (although the exact form of these bounds and what terms would be involved would be part of the question).

Empirical angle: construct a simple turn-based environment and deliberately simple policy model family. Systematically vary the complexity of the policy model (e.g. number of parameters) and train each to convergence. Across many such models, empirically compute terms like state-action information and self-consistency through tree-search. Analyze the performance of these models at different search depths.

Improved single-camera 3D pose estimation with reprojection and filtering

Note: this project is taken by a Spring 26 capstone student

We'd like to improve on state-of-the-art Computer Vision systems for human pose estimation. Pose estimation is the problem of determining where different body eypoints (elbows, knees, eyes, etc) are in space. There are some freely available models like OpenPose and MediaPipe which you can download and use, but all such systems are known to make systematic errors.

The idea of this project would be to take an off-the-shelf pretrained keypoint detection system --- any available online which produces a heatmap for each keypoint --- and generate better poses from it by simply being a bit more clever about how model outputs are post-processed. While the 2D->3D problem is notoriously under-constrained in general, there are some simple constraints that should make pose inference in videos feasible. For instance, bones don't change length over time, so if the elbow appears to move closer to the shoulder, a system should in theory "know" that the arm is actually pointing towards (or away from) the camera.

The specific proposal is to combine the concept of "reprojection" with a constrained skeleton and heatmap-style keypoint models. Project sketch:

download and benchmark some open-source 3D pose estimation system (e.g. MediaPipe)
create or download a "biophysically realistic" set of constraints for how keypoints fit together. In other words, create a data structure which represents the contstraints of the human body like fixed bone lengths.
create the forward model which "reprojects" 3D skeleton points down to 2D
create an optimization or online inference algorithm (aka a "filtering" algorithm) to update 3D positions such that (i) biophysical constraints are not violated and (ii) reprojected points maximize likelihood according to the heatmap

Exploration of neural stitching

A common problem in neuroscience and neuro-AI is to compare the internal representations across brains and models (does your brain represent things similar to mine? similar to a deep neural network? are all DNNs similar to each other for that matter?). There are many tools to do this kind of comparison $^{[1, 2]}$ . Neural Stitching $^{[3]}$ is a relatively new family of methods that we're pretty excited about; it uses the downstream part of one network to interpret the upstream part of another.

Limited studies have been done exploring various aspects of stitching. The most complete study to date is perhaps [4]. There is a lot of work to do just playing in this space and seeing what stitching can do. Some things we're thinking about:

some studies stitch A to B, creating a AB model hybrid, then maximize the performance of the hybrid. None (to our knowledge) seeks to maximize the 'match' to the original behavior of A or B. We have code to do this but have not systematically evaluated a suite of models.
what happens when you stitch together models that were originally trained on different tasks?
what happens when the 'stitching layer' itself is constrained or regularized? (some of this done by [4] already)
can we speed up the stitching optimization problem? This is currently an experimental bottleneck.
how do we stitch across architectures, e.g. convolutional models into transformer models and vice versa?
can recurrent models or models with feedback connections be stitched? how?
can we create an easy-to-use and well-documented toolkit for performing these kinds of experiments? See here for our current system which is functional but tricky to work with and not well documented yet.

References:

Klabunde, Max, Tobias Schumacher, Markus Strohmaier, and Florian Lemmerich. 2023. “Similarity of Neural Network Models: A Survey of Functional and Representational Measures.” arXiv:2305.06329. Preprint, arXiv, August 6. https://doi.org/10.48550/arXiv.2305.06329.
Sucholutsky, Ilia, Lukas Muttenthaler, Adrian Weller, et al. 2023. “Getting Aligned on Representational Alignment.” arXiv:2310.13018. Preprint, arXiv, October 18. https://doi.org/10.48550/arXiv.2310.13018.
Bansal, Yamini, Preetum Nakkiran, and Boaz Barak. 2021. “Revisiting Model Stitching to Compare Neural Representations.” arXiv:2106.07682. Preprint, arXiv, June 14. https://doi.org/10.48550/arXiv.2106.07682.
Csiszárik, Adrián, Péter Kőrösi-Szabó, Ákos K. Matszangosz, Gergely Papp, and Dániel Varga. 2021. “Similarity and Matching of Neural Network Representations.” Paper presented at Advances in Neural Information Processing Systems. November 9. https://openreview.net/forum?id=aedFIIRRfXr.

Stochastic Variational AutoEncoders (v2.0)

Former MS student Shounak Desai wrote his thesis doing some initial exploration of Stochastic Variational AutoEncoders, a fundamental extension of VAEs to use stochastic recognition models, leveraging the hybrid MCMC+VI approaches mentioned in blurbs above. This is promising initial work but we need another student to pick up where Shounak left off.