Hello, and welcome to MDAnalysis!

Please read our blog post for important official information.

Please see [[our Google Summer of Code wiki page|Google Summer of Code]] for general information, including advice on application writing, and [[our GSoC FAQ|GSoC-FAQ]] for commonly asked questions.

If you just found out about the MDAnalysis Python package from the GSoC website, you can watch the MDAnalysis 2021 Trailer to get an overview of the scope of the MDAnalysis package.

Prerequisites

MDAnalysis is a Python library for the analysis of computer simulations of many-body systems at the molecular scale, spanning use cases from interactions of drugs with proteins to novel materials. For Google Summer of Code, we are also collaborating with other organizations and software projects that use MDAnalysis. Our GSoC projects generally require basic knowledge of, and hands-on experience in, specific areas, so please check each project description carefully for the associated desirable skills. Broadly speaking, we have found that applicants with experience in molecular dynamics (MD) simulations and the associated analyses (or equivalent experience in simulations and modeling of molecular systems in physics, biophysics, chemistry, or materials science) tend to be very successful.

To Prospective Applicants

If you are interested in taking part, please get in touch on the GSoC with MDAnalysis Discussion Forum. Given the GSoC program structure (small, medium, and large projects), it will be very helpful if you let us know early that you intend to apply and become acquainted with the project early.

To Prospective Mentors

MDAnalysis welcomes new mentors; please get in touch on the developer forum if you are interested in taking part. We typically expect mentors to be familiar with our development process, as evidenced by contributions to the code base and interactions on the developer forum.

Overview

See below for a list of project ideas for [[Google Summer of Code 2026|Google-Summer-Of-Code]].

The currently proposed projects are:

Dashboard for tracking MD simulation progress with the new streaming interface
Better interfacing of Blender and MDAnalysis
Benchmarking and performance optimization
Lazy trajectory loading and indexing
Dashboard for tracking WESTPA simulation progress
Interface for post-simulation analysis ("crawling") of WESTPA simulations

Or work on your own idea! Get in contact with us to propose an idea and we will work with you to flesh it out into a full project. Contact us via the GSoC with MDAnalysis Discussion Forum (or if your project is a specific feature you'd want to add, raise an issue in the Issue Tracker).

See the list of all available MDAnalysis mentors to find potential mentors for your project. Please send all communications to the discussion forum, and don't contact mentors privately. You can certainly ask for the opinion of a specific mentor if you know that their expertise is particularly suitable for your project.

Collaborations

For GSoC, MDAnalysis collaborates with other projects that have direct links to MDAnalysis and where it is especially useful to draw on their combined mentoring expertise.

Molecular Nodes

Built as an add-on for the popular and industry-leading 3D software Blender, Molecular Nodes (MN) enables the import and visualization of complex molecular datasets inside Blender. Many formats are supported, including static structures, electron density maps, electron microscopy (EM) tomography data and, importantly, molecular dynamics simulation trajectories, powered by MDAnalysis. Blender is primarily intended for use by artists via a GUI, but scripting via the Python API is also possible. This opens many potential avenues for automated animation and rendering.

A great overview of the project is the talk given at the Blender Conference in 2022. The MN documentation includes a lot of information about how to get started, including written and video tutorials, with one specific to MD trajectory import. Workshop materials are also publicly available for an online Introduction to MDAnalysis and Molecular Nodes workshop held in February 2024, which includes an interactive tutorial for visualizing imported MDAnalysis data in Molecular Nodes.

It's important to first familiarize yourself with using Blender and Molecular Nodes via a GUI and some of the quirks that go along with it, before trying to write code for it.

WESTPA

WESTPA (The Weighted Ensemble Simulation Toolkit with Parallelization and Analysis) is a high-performance Python framework for applying the weighted ensemble (WE) path sampling strategy. WE enables the simulation of processes on timescales that are orders of magnitude longer than those of the individual simulations. A WE simulation is an iterative process: many MD simulations run in parallel and are periodically evaluated, and each is then replicated or terminated based on a set of WE resampling criteria. To apply WE resampling rigorously, the MD simulations are analyzed at run time with tools such as MDAnalysis to determine the state of the simulated system.
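The following simplified sketch makes the replicate/terminate step more concrete. It shows split/merge resampling inside a single WE bin. It is not WESTPA's resampler: the `Walker` type and `resample_bin` function are hypothetical. Real WE schemes also handle bin assignment, weight thresholds, and parent/child bookkeeping. The essential invariant is that splitting and merging change the number of walkers but conserve total statistical weight.

```python
import random
from dataclasses import dataclass


@dataclass
class Walker:
    """One trajectory segment and its statistical weight (hypothetical type)."""
    state: str
    weight: float


def resample_bin(walkers, target, rng):
    """Split/merge the walkers of a single bin until exactly `target` remain.

    Splitting replaces the heaviest walker by two copies carrying half its
    weight each. Merging combines the two lightest walkers into one survivor,
    chosen with probability proportional to weight. Total weight is conserved.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    walkers = list(walkers)  # work on a copy; input list is left untouched
    if not walkers:
        return walkers  # an empty bin has nothing to resample
    while len(walkers) < target:
        walkers.sort(key=lambda w: w.weight)
        heavy = walkers.pop()  # heaviest walker is last after ascending sort
        half = heavy.weight / 2
        walkers += [Walker(heavy.state, half), Walker(heavy.state, half)]
    while len(walkers) > target:
        walkers.sort(key=lambda w: w.weight)
        a, b = walkers.pop(0), walkers.pop(0)  # the two lightest walkers
        total = a.weight + b.weight
        survivor = a if rng.random() < a.weight / total else b
        walkers.append(Walker(survivor.state, total))
    return walkers


rng = random.Random(0)
bin_walkers = [Walker("a", 0.5), Walker("b", 0.3), Walker("c", 0.2)]
split = resample_bin(bin_walkers, target=5, rng=rng)
merged = resample_bin(bin_walkers, target=2, rng=rng)
print(len(split), len(merged))  # 5 2
```

Choosing the survivor with probability proportional to weight is what keeps merging statistically unbiased. Splitting the heaviest walker spreads sampling effort more evenly.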

Read the WE Overview; install the WESTPA package; and work through tutorials 7.1 and 7.5 of our tutorials suite to start learning more about WESTPA.

Project summary

The table summarizes the project ideas; longer descriptions follow the table (or click on the links under each project name). The difficulty is a somewhat subjective ranking:

- easy means that we know pretty much what needs to be done;
- medium requires some additional research into the best solutions as part of the project;
- hard is high risk/high reward, where we think a solution exists but will have to work with the student to find and implement it.

Projects are either 90 h (small), 175 h (medium), or 350 h (large).

Each project has one primary mentor assigned and potentially multiple additional mentors. Each primary mentor will only mentor a single GSoC project, even if listed as a potential mentor for multiple projects in the table. Mentor availability will be taken into account during the project selection process.

project	name	difficulty	project size	description	skills	mentors
1	Dashboard for tracking MD simulation progress with the new streaming interface	easy/medium	175/350 hours	Create a web-based dashboard for real-time monitoring and analysis of MD simulations	Python (frontend UI, multiprocessing), Networking (TCP/IP)	@HeydenLab @amrutesht @orbeckst
2	Better interfacing of Blender and MDAnalysis	medium	350 hours	Improve how Blender and Molecular Nodes interface with MDAnalysis to import and animate MD trajectories	Python, MDAnalysis, Blender (and programming via its Python API)	@bradyajohnston @nilay-v3rma
3	Benchmarking and performance optimization	easy/medium	90/175/350 hours	Write benchmarks for automated performance analysis and address performance bottlenecks	Python/ASV, Cython	@orbeckst @yuxuanzhuang @talagayev
4	Lazy trajectory loading and indexing	medium	175/350 hours	Improve performance of trajectory reading by implementing lazy indexing	Python, trajectory I/O, performance optimization	@yuxuanzhuang @orbeckst @talagayev
5	Dashboard for tracking WESTPA simulation progress	easy	90 hours	Create a graphical user interface to report MD trajectory progress	Python (frontend UI, multiprocessing), Networking (TCP/IP)	@jeremyleung521 @ltchong @nilay-v3rma
6	Interface for post-simulation analysis ("crawling") of WESTPA simulations	easy	90 hours	Create an interface for reading, analyzing, and writing post-simulation data from WESTPA HDF5 Framework	Python (frontend UI, multiprocessing), HDF5 Format (h5py, hdf5)	@jeremyleung521 @ltchong

Project 1: Dashboard for tracking MD simulation progress with the new streaming interface

Summary

This project will develop a browser-based, real-time dashboard for molecular dynamics simulations using the new IMDv3 streaming protocol, enabling live monitoring, analysis, and visualization of running simulations. The dashboard will leverage imdclient and MDAnalysis to receive a data stream from a running simulation.

Detailed Description

Modern molecular dynamics (MD) simulations can run for days or weeks, yet most analysis workflows remain strictly post hoc: trajectory files must be written to disk and can only be analyzed after the simulation completes. Recently, we introduced IMDv3, a TCP/IP-based streaming protocol implemented in major MD engines such as LAMMPS, NAMD3, and GROMACS. We also released a Python package, imdclient, that simplifies consuming these live data streams. In parallel, the MDAnalysis project now includes a reader (IMDReader) that can access IMDv3 streams as if they were conventional trajectory files.

The goal of this Google Summer of Code project is to build a browser-based dashboard that connects to a running MD simulation via imdclient/MDAnalysis and provides real-time feedback to users. At its most basic level, the dashboard will display simulation progress (e.g., current timestep or frame number) and connection status. Beyond this, the project will focus on interactive, live analysis: users should be able to select atoms or groups using familiar MDAnalysis selection syntax and compute properties on the fly as new frames arrive.
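The sketch below shows a frame-local analysis computed on the fly. In MDAnalysis itself, this would typically be an `AtomGroup` from `Universe.select_atoms(...)` with `AtomGroup.radius_of_gyration()` called inside a loop over `u.trajectory`. To stay self-contained, the sketch uses plain NumPy arrays as stand-ins for streamed frames and an index array in place of a selection. The generator name `live_radius_of_gyration` is hypothetical. It is not part of imdclient or MDAnalysis.

```python
import numpy as np


def radius_of_gyration(positions, masses):
    """Mass-weighted radius of gyration of one frame."""
    com = np.average(positions, axis=0, weights=masses)  # center of mass
    sq_dist = np.sum((positions - com) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))


def live_radius_of_gyration(frames, selection, masses):
    """Yield (frame_index, Rg) for the selected atoms as each frame arrives.

    `frames` is any iterable of (n_atoms, 3) coordinate arrays (a stand-in
    for a live stream); `selection` is an index array (a stand-in for an
    MDAnalysis AtomGroup).
    """
    for i, positions in enumerate(frames):
        yield i, radius_of_gyration(positions[selection], masses[selection])


# Simulated stream: a two-atom "molecule" that stretches each frame, plus one solvent atom
base = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [9.0, 9.0, 9.0]])
stream = (base * (k + 1) for k in range(3))
results = list(live_radius_of_gyration(stream, np.array([0, 1]), np.ones(3)))
print(results)  # [(0, 1.0), (1, 2.0), (2, 3.0)]
```

Because the generator consumes frames lazily, the same loop works whether frames come from a file or from a stream that only produces them as the simulation advances.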

The project will explore both simple, frame-local analyses (such as distances, angles, or radius of gyration) and more advanced, time-dependent analyses that require buffering and processing historical data (e.g., autocorrelations or lag-time dependent observables). Results will be visualized live in the browser using plots and indicators. As an advanced extension, the project may integrate real-time 3D visualization via Blender and Molecular Nodes, enabling users to view the evolving molecular structure while analyses run in parallel.
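Time-dependent analyses need a buffer of past values. The minimal sketch below keeps a fixed-size window of a per-frame observable in a `collections.deque` and recomputes the normalized autocorrelation C(lag) over that window. The class name and the choice of estimator (mean over all overlapping pairs at each lag) are assumptions for illustration, not a design the project has settled on.

```python
from collections import deque

import numpy as np


class RollingAutocorrelation:
    """Keep the most recent `window` values of a per-frame observable and
    compute its normalized autocorrelation C(lag) for lag = 0..max_lag."""

    def __init__(self, window, max_lag):
        if not 0 <= max_lag < window:
            raise ValueError("need 0 <= max_lag < window")
        self.buffer = deque(maxlen=window)  # oldest values are dropped automatically
        self.max_lag = max_lag

    def update(self, value):
        self.buffer.append(float(value))

    def autocorrelation(self):
        """Return C(0..max_lag), or None if there is too little (or constant) data."""
        x = np.asarray(self.buffer)
        if len(x) <= self.max_lag:
            return None
        dx = x - x.mean()
        var = np.mean(dx * dx)
        if var == 0.0:
            return None  # undefined for a constant signal
        n = len(dx)
        return np.array(
            [np.mean(dx[: n - lag] * dx[lag:]) / var for lag in range(self.max_lag + 1)]
        )


acf = RollingAutocorrelation(window=10, max_lag=2)
for t in range(20):
    acf.update((-1) ** t)  # an observable that flips sign every frame
print(acf.autocorrelation().tolist())  # [1.0, -1.0, 1.0]
```

Recomputing over the whole window is simple but costs O(window × max_lag) per call. For long windows, an incremental or FFT-based estimator would be the natural next step.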

Finally, the dashboard may include basic event detection and warning mechanisms, such as flagging unusual structural changes or simulation instabilities. The end result will be a flexible foundation for interactive, remote, and collaborative monitoring of MD simulations, tightly integrated with the Python MDAnalysis ecosystem.
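One simple form of event detection flags values that deviate strongly from recent behavior. The sketch below is a threshold on the number of standard deviations from a running mean, with the mean and variance updated online via Welford's algorithm. The class name, threshold, and warm-up length are hypothetical defaults, not recommendations for any particular observable.

```python
import math


class DeviationAlarm:
    """Flag values that deviate from the running mean by more than
    `threshold` sample standard deviations (Welford's online algorithm)."""

    def __init__(self, threshold=4.0, warmup=10):
        self.threshold = threshold
        self.warmup = max(2, warmup)  # frames to observe before any alarm can fire
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def check(self, value):
        """Return True if `value` looks anomalous, then add it to the statistics."""
        alarm = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            alarm = std > 0 and abs(value - self.mean) > self.threshold * std
        # Welford update of the running mean and sum of squared deviations
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return alarm


monitor = DeviationAlarm(threshold=4.0, warmup=10)
normal = [monitor.check(1.0 + 0.1 * (i % 2)) for i in range(20)]
print(any(normal), monitor.check(5.0))  # False True
```

Here flagged values are still folded into the statistics, which makes the alarm less sensitive right after an event. Excluding flagged values, or using a sliding window, are alternatives worth comparing.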

Expected Outcomes

A web-based dashboard that can connect to running MD simulations via IMDv3 and imdclient
Live display of simulation status and progress information
An interactive GUI for defining and running real-time analyses using MDAnalysis
Support for simple per-frame observables and buffered, time-dependent analyses
Live visualization of analysis results (plots, indicators, status messages)
(Optional/advanced) Live 3D molecular visualization using Blender and Molecular Nodes
(Optional) Basic event detection and warning system for problematic simulation behavior
Well-documented, open-source code suitable for long-term integration into the MDAnalysis ecosystem

Relevant Skills

Python (frontend UI, multiprocessing)
Networking (TCP/IP)
Web development (for browser-based dashboard)

Possible Mentors

@HeydenLab @amrutesht @orbeckst

Expected Size of Project

Medium/Large (175/350 hours, depending on targeted features)

Difficulty Rating

Easy/Medium

Project 2: Better interfacing of Blender and MDAnalysis

Summary

This project will improve how Blender and Molecular Nodes interface with MDAnalysis, which powers the import and animation of MD trajectories inside Blender. Simple import is currently available through the Blender GUI. However, there is still a lot of potential for improvement in scriptability, automated rendering, and using Blender as an analysis tool for MD trajectories.

Detailed Description

Blender is industry-leading 3D modeling and animation software. Through the Molecular Nodes add-on, MDAnalysis Universes can be imported into the 3D scene. This enables advanced rendering of molecular dynamics trajectories that is not possible in any other molecular viewer. Scripting and automating this rendering is possible but limited, with lots of room for improvement in visualizing many common MD datasets. Blender also provides a great platform for a potential GUI that enables interactive analysis of MD trajectories with stunning visuals, all powered by MDAnalysis under the hood.

This project focuses on:

Improving the Molecular Nodes ↔ MDAnalysis integration
Improving the Molecular Nodes notebook rendering API
Visualizing particular analysis results inside Blender / using Molecular Nodes, rather than just structural information

Expected Outcomes

Prototype an improved API for scripting and working with Molecular Nodes from Jupyter Notebooks or similar environments
Prototype common analysis and visualization tasks that could be performed from within Blender via the GUI

Relevant Skills

Proficiency with Python
Working knowledge of MDAnalysis
Familiarity with Blender and programming via its Python API

Related issues/PRs/etc.:

https://github.com/BradyAJohnston/MolecularNodes/pull/719

Possible Mentors

@bradyajohnston @nilay-v3rma

Expected Size of Project

Large (350 hours)

Difficulty Rating

Medium

Project 3: Benchmarking and performance optimization

Summary

The goal of this project is to increase the performance assessment coverage (using the existing ASV framework), identify code that should be improved, and optimize code.

Detailed Description

The MDAnalysis Roadmap emphasizes performance improvement. The performance of the MDAnalysis library is assessed by automated nightly benchmarks with ASV (see https://github.com/MDAnalysis/benchmarks/wiki), but coverage of the code base is low. The goal of this project is to substantially increase performance assessment coverage, identify code that should be improved, and possibly implement performance optimizations.
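ASV benchmarks are plain Python classes. ASV runs `setup` before timing, times each method whose name starts with `time_`, and runs the benchmark once per entry in `params` (named by `param_names`). The class below is a hypothetical example of that shape. Its workload is a plain NumPy distance matrix used as a stand-in. A real MDAnalysis benchmark would time library code such as `MDAnalysis.lib.distances.distance_array` instead.

```python
import numpy as np


class DistanceMatrixBench:
    """ASV-style benchmark class (hypothetical example).

    ASV calls `setup` before timing and times every method whose name starts
    with `time_`; `params`/`param_names` run the benchmark once per value.
    """

    params = [100, 1000]
    param_names = ["n_atoms"]

    def setup(self, n_atoms):
        # Fixed seed so every commit is timed on identical input
        rng = np.random.default_rng(42)
        self.coords = rng.random((n_atoms, 3)) * 50.0

    def time_distance_matrix(self, n_atoms):
        # Stand-in workload: a real MDAnalysis benchmark would time library code,
        # e.g. MDAnalysis.lib.distances.distance_array, instead of plain NumPy.
        diff = self.coords[:, None, :] - self.coords[None, :, :]
        np.sqrt(np.sum(diff * diff, axis=-1))
```

Once added to the benchmark suite, ASV would discover such a class and record timings per commit (for example via `asv run`, or `asv continuous` to compare two commits).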

Expected Outcomes

Write ASV benchmark cases for all major functionality in the core library
Write ASV benchmark cases for often-used analysis tools
Analyze performance history and generate a priority list of code that should be improved
Document how to write benchmarks in a short tutorial
Optional: Optimize performance for at least one discovered performance bottleneck
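Generating a priority list from performance history could start as simply as ranking benchmarks by how much slower their latest timing is than their best earlier timing. The sketch below assumes a hypothetical input format: a dict mapping benchmark names to chronological timings. That format is not ASV's own results format, and the benchmark names and numbers are made up.

```python
def rank_regressions(history, min_ratio=1.1):
    """Rank benchmarks by how much slower the latest run is than the best
    earlier run.

    `history` maps a benchmark name to its timings (seconds) in commit order
    (an assumed format). Returns (name, slowdown_ratio) pairs, worst first.
    """
    ranked = []
    for name, timings in history.items():
        if len(timings) < 2:
            continue  # no baseline to compare against
        ratio = timings[-1] / min(timings[:-1])
        if ratio >= min_ratio:
            ranked.append((name, ratio))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up benchmark names and timings, for illustration only
history = {
    "read_xtc": [1.0, 1.0, 2.0],
    "rmsd": [1.0, 1.5],
    "select": [1.0, 0.9],
}
print(rank_regressions(history))  # [('read_xtc', 2.0), ('rmsd', 1.5)]
```

Comparing against the best earlier run, rather than the previous run, catches slow, gradual regressions, at the cost of being sensitive to one unusually fast outlier.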

Relevant Skills

Python/ASV
Cython

Possible Mentors

@orbeckst @yuxuanzhuang @talagayev

Expected Size of Project

Small/Medium/Large (90/175/350 hours)

(This project can be tailored in scope to the desired length. Discuss the length and scope with us while you are writing your application.)

Difficulty Rating

Easy/Medium

Project 4: Lazy trajectory loading and indexing

Summary

This project aims to improve MDAnalysis's trajectory reading performance by implementing lazy indexing for trajectory formats that currently build a complete frame index on first file open, which can take hours for large files.

Detailed Description

A long-standing assumption in MDAnalysis is that a trajectory reader can access arbitrary frames in a trajectory file, corresponding to the MDAnalysis usage ts = u.trajectory[frame]. However, many trajectory formats do not natively support random access. To get around this problem, some MDAnalysis trajectory readers (in particular those for the XTC and TRR formats) always build a hidden frame index when the file is first opened, by rapidly scanning the whole file. (MDAnalysis calls this frame index "offsets".) For large files, this "rapid scan" can still take hours, during which the user has no idea what is happening.
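The sketch below shows why offsets enable random access and why building them costs a full pass over the file. It uses a hypothetical toy format in which each frame is a 4-byte size header followed by a variable-length payload. This is not XTC. It only mimics the property that matters: frames, like XTC's compressed frames, vary in size, so frame positions cannot be computed arithmetically. The scan reads only headers and seeks past payloads, but it still has to visit every frame.

```python
import io
import struct

# Hypothetical toy format: each frame is a uint32 payload size followed by the payload.
HEADER = struct.Struct("<I")


def write_frames(payloads):
    """Create an in-memory toy trajectory from a list of byte strings."""
    buf = io.BytesIO()
    for payload in payloads:
        buf.write(HEADER.pack(len(payload)))
        buf.write(payload)
    buf.seek(0)
    return buf


def build_offsets(fh):
    """Rapid scan: read only the frame headers and record where each frame starts."""
    offsets = []
    fh.seek(0)
    while True:
        start = fh.tell()
        header = fh.read(HEADER.size)
        if len(header) < HEADER.size:
            return offsets  # end of file
        (size,) = HEADER.unpack(header)
        offsets.append(start)
        fh.seek(size, io.SEEK_CUR)  # skip the payload without reading it


def read_frame(fh, offsets, i):
    """Random access: jump straight to frame i using the stored offsets."""
    fh.seek(offsets[i])
    (size,) = HEADER.unpack(fh.read(HEADER.size))
    return fh.read(size)


fh = write_frames([b"a", b"bbb", b"cc"])
offsets = build_offsets(fh)
print(offsets, read_frame(fh, offsets, 1))  # [0, 5, 12] b'bbb'
```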

This project proposes implementing "lazy trajectory loading" whereby we eschew index building if we are only iterating forward and instead resort to frame-by-frame loading (or possibly use the header-seeking trick from the index building for larger step sizes). In a way, this treats these files more like streams than random-access trajectories.

While reading, we could then start building the index and possibly improve it iteratively as additional frames are read. The index would be built right away if fancy indexing is used.

The behavior could potentially be made user-configurable at the Universe level with a kwarg like index_trajectory="always" | "never" | "lazy".
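The three proposed modes could look roughly like the sketch below, which reuses the idea of a toy size-prefixed frame format. Everything here is hypothetical: `ToyReader`, the format, and the exact semantics of each mode are illustrations. They are not the MDAnalysis XTC/TRR reader API. The sketch assumes the following behavior for each mode:

- "always" scans on open.
- "never" keeps no index, so random access walks forward from the start.
- "lazy" records offsets during forward iteration and indexes only the unscanned remainder when random access reaches past what it has seen.

```python
import io
import struct

# Hypothetical toy format: each frame is a uint32 payload size followed by the payload.
HEADER = struct.Struct("<I")


def toy_file(payloads):
    buf = io.BytesIO(b"".join(HEADER.pack(len(p)) + p for p in payloads))
    return buf


class ToyReader:
    """Hypothetical reader sketching index_trajectory="always" | "never" | "lazy"."""

    def __init__(self, fh, index_trajectory="lazy"):
        if index_trajectory not in ("always", "never", "lazy"):
            raise ValueError(f"unknown index_trajectory={index_trajectory!r}")
        self.fh = fh
        self.mode = index_trajectory
        self.offsets = []  # byte offsets of the frames indexed so far
        self.complete = False  # True once offsets cover every frame
        if self.mode == "always":
            self._index_remaining()  # eager: full scan on open

    def _read_at(self, pos):
        """Return (payload, next_pos) for the frame at byte `pos`, or None at EOF."""
        self.fh.seek(pos)
        header = self.fh.read(HEADER.size)
        if len(header) < HEADER.size:
            return None
        (size,) = HEADER.unpack(header)
        return self.fh.read(size), pos + HEADER.size + size

    def _index_remaining(self):
        """Scan headers from the last indexed frame to the end of the file."""
        pos = 0
        if self.offsets:
            last = self.offsets[-1]
            self.fh.seek(last)
            (size,) = HEADER.unpack(self.fh.read(HEADER.size))
            pos = last + HEADER.size + size  # first byte not yet indexed
        while True:
            self.fh.seek(pos)
            header = self.fh.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            self.offsets.append(pos)
            pos += HEADER.size + HEADER.unpack(header)[0]  # skip payload unread
        self.complete = True

    def __iter__(self):
        """Forward iteration needs no index; lazy mode records offsets on the way."""
        pos, i = 0, 0
        while (frame := self._read_at(pos)) is not None:
            if self.mode == "lazy" and i == len(self.offsets):
                self.offsets.append(pos)
            payload, pos = frame
            i += 1
            yield payload
        if self.mode == "lazy":
            self.complete = True

    def __getitem__(self, i):
        if self.mode == "never":
            # No index is ever stored: walk forward from the first frame each time
            for j, payload in enumerate(self):
                if j == i:
                    return payload
            raise IndexError(i)
        if not self.complete and not 0 <= i < len(self.offsets):
            self._index_remaining()  # random access beyond known range: finish the index
        return self._read_at(self.offsets[i])[0]


frames = [b"a", b"bbb", b"cc", b"dddd"]
lazy = ToyReader(toy_file(frames), index_trajectory="lazy")
it = iter(lazy)
next(it)
next(it)
print(lazy.offsets)  # [0, 5]
print(lazy[3])  # b'dddd'
print(lazy.offsets)  # [0, 5, 12, 18]
```

A real implementation would also have to decide how "never" should handle negative indices and fancy indexing. This sketch simply raises IndexError for anything it cannot reach by walking forward.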

Expected Outcomes

Implementation of lazy indexing mode for XTC/TRR readers
Progressive index building during forward iteration
User-configurable indexing behavior via Universe kwarg
Performance benchmarks comparing lazy vs eager indexing
Documentation for users and developers

Relevant Skills

Python
Understanding of trajectory I/O and file formats
Performance optimization

Possible Mentors

@yuxuanzhuang @orbeckst @talagayev

Expected Size of Project

Medium/Large (175/350 hours)

Difficulty Rating

Medium

Project 5: Dashboard for tracking WESTPA simulation progress

Summary

WESTPA simulations involve running multiple MD trajectories in parallel, which makes it hard to track progress. This project aims to create a graphical user interface that exploits MDAnalysis’s streaming ability and WESTPA’s work managers to monitor the progress of a WESTPA simulation.

Detailed Description

While WESTPA simulations report their status at regular intervals, each iteration can last minutes to hours, leaving users unsure of the intermediate progress or the estimated time to completion. The task is to create a graphical user interface that reports trajectory progress and completion-time estimates. It will use MDAnalysis's streaming abilities and extract relevant information from WESTPA's work managers (ZMQ, Python multiprocessing) and data managers.
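Aggregating per-trajectory progress into an iteration-level estimate could look like the minimal sketch below. It assumes a constant frame rate per segment and that all segments of an iteration run concurrently. That assumption does not hold when there are more segments than workers. The `SegmentProgress` record and `summarize_iteration` function are hypothetical. How the frame counts are obtained (from a stream or from WESTPA's work managers) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class SegmentProgress:
    """Progress of one WE segment (one MD trajectory); a hypothetical record type."""
    seg_id: int
    frames_done: int
    frames_total: int
    elapsed_s: float

    def eta_s(self):
        """Seconds remaining, assuming the current frame rate stays constant."""
        if self.frames_done >= self.frames_total:
            return 0.0
        if self.frames_done == 0 or self.elapsed_s <= 0:
            return float("inf")  # no rate estimate yet
        rate = self.frames_done / self.elapsed_s
        return (self.frames_total - self.frames_done) / rate


def summarize_iteration(segments):
    """Aggregate per-segment progress into one iteration-level status."""
    done = sum(s.frames_done for s in segments)
    total = sum(s.frames_total for s in segments)
    return {
        "fraction_complete": done / total if total else 1.0,
        # Assumes all segments run concurrently, so the slowest one sets the ETA
        "eta_s": max((s.eta_s() for s in segments), default=0.0),
    }


status = summarize_iteration([
    SegmentProgress(0, frames_done=50, frames_total=100, elapsed_s=10.0),
    SegmentProgress(1, frames_done=25, frames_total=100, elapsed_s=10.0),
    SegmentProgress(2, frames_done=100, frames_total=100, elapsed_s=20.0),
])
print(status["eta_s"])  # 30.0
```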

Expected Outcomes

New CLI tool for tracking WESTPA simulation progress
MDAnalysis module for aggregating/tracking multiple simulations

Relevant Skills

Python (frontend UI, multiprocessing)
Networking (TCP/IP)

Related issues/PRs/etc.:

Not applicable

Possible Mentors

@jeremyleung521 @ltchong @nilay-v3rma

Expected Size of Project

Small (90 hours)

Difficulty Rating

Easy

Project 6: Interface for post-simulation analysis ("crawling") of WESTPA simulations

Summary

WESTPA simulations involve running multiple MD trajectories in parallel. This makes post-simulation extraction of features and observables that were not saved during the simulation somewhat cumbersome. This project aims to create a simpler method for analyzing and extracting data from WESTPA simulations saved using the HDF5 Trajectory Storage Framework.

Detailed Description

Users who wish to extract more data from their trajectories/WE simulations can use w_crawl, but it requires writing and testing custom Python code to get it right. The task here is to simplify this process for users who have already saved their trajectory data with WESTPA's HDF5 Trajectory Storage Framework ("HDF5 Framework"), so that they only need to load their west.h5 file. The resulting code would read in west.h5 as an MDAnalysis Universe object. Users could then run analyses promptly and save the results back into west.h5 as auxiliary data. Code that translates the topology in the HDF5 Framework to that of MDAnalysis has already been written and is included in the source code of WESTPA v2022.13.
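Saving results "back into west.h5 as auxiliary data" might look like the h5py sketch below. It assumes a per-iteration group layout of `iterations/iter_XXXXXXXX` (zero-padded to eight digits) with an `auxdata` subgroup holding arrays shaped (n_segments, n_timepoints, ...). We believe this matches WESTPA's convention, but it should be checked against real west.h5 files. The functions `write_auxdata` and `read_auxdata` are hypothetical and not part of WESTPA or MDAnalysis. The demo writes to a scratch file rather than a real simulation file.

```python
import os
import tempfile

import h5py
import numpy as np


def write_auxdata(h5_path, n_iter, name, values):
    """Store a post-simulation analysis result as auxiliary data for one iteration.

    Assumed layout: iterations/iter_00000001/auxdata/<name>, with `values`
    shaped (n_segments, n_timepoints, ...). An existing dataset is replaced.
    """
    with h5py.File(h5_path, "a") as f:  # "a": read/write, create file if missing
        iterations = f.require_group("iterations")
        aux = iterations.require_group(f"iter_{n_iter:08d}").require_group("auxdata")
        if name in aux:
            del aux[name]
        aux.create_dataset(name, data=np.asarray(values))


def read_auxdata(h5_path, n_iter, name):
    """Read back one auxiliary dataset as a NumPy array."""
    with h5py.File(h5_path, "r") as f:
        return f["iterations"][f"iter_{n_iter:08d}"]["auxdata"][name][()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "west.h5")  # scratch file standing in for a real west.h5
    write_auxdata(path, 1, "rgyr", np.full((4, 11), 12.5))
    print(read_auxdata(path, 1, "rgyr").shape)  # (4, 11)
```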

Expected Outcomes

New CLI/Python tool for analyzing trajectory data saved with WESTPA HDF5 Trajectory Storage Framework
New MDAnalysis/MDAKit parser for WESTPA west.h5 files (turns west.h5 + related iteration files into an MDAnalysis Universe object)

Relevant Skills

Python (frontend UI, multiprocessing)
HDF5 (h5py, hdf5)

Related issues/PRs/etc.:

WESTPA 2.0 Paper
Tutorial 7.5 from WESTPA 2.0 Tutorials
HDF5 Framework update https://github.com/westpa/westpa/pull/484

Possible Mentors

@jeremyleung521 @ltchong

Expected Size of Project

Small (90 hours)

Difficulty Rating

Easy
