About
GSoC 2026 Explorer is an open-source platform that brings together all 185 organizations and their project ideas from Google Summer of Code 2026 into a single, searchable, and filterable interface.
Why use this instead of the official site?
The official GSoC website lists organizations but links out to external pages for project ideas — scattered across Google Docs, GitHub wikis, GitLab issues, and various websites. This makes it hard to search, compare, and explore what's available.
GSoC 2026 Explorer solves this by:
- Full-text search across everything — press Cmd+K (or Ctrl+K) to search organization names, descriptions, technologies, topics, and the full text of project ideas at once
- Advanced combined filters — filter by technology and topic tags simultaneously, with shareable URLs (e.g. ?tech=python&topic=web)
- All ideas in one place — every ideas page has been scraped and rendered in a consistent format, with a table of contents for easy navigation
- LLM-ready structured data — the repository contains all data in JSON and Markdown, ready to be fed to language models for analyzing opportunities, matching skills, or generating summaries
- Fast and responsive — static site with instant filtering, dark mode, and no page reloads
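As a rough illustration of how a shareable filter URL such as ?tech=python&topic=web could map to a combined filter, here is a minimal Python sketch. It is not the site's actual code: the record fields (`technologies`, `topics`), the sample organizations, and the choice that every requested tag must match are all assumptions made for the example.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical records; the field names are assumptions, not the real schema.
ORGS = [
    {"name": "Org A", "technologies": ["python", "django"], "topics": ["web"]},
    {"name": "Org B", "technologies": ["rust"], "topics": ["web", "compilers"]},
    {"name": "Org C", "technologies": ["python"], "topics": ["science"]},
]


def parse_filters(query: str) -> dict[str, set[str]]:
    """Parse a query string like '?tech=python&topic=web' into tag sets."""
    params = parse_qs(query.lstrip("?"))
    return {
        "tech": {value.lower() for value in params.get("tech", [])},
        "topic": {value.lower() for value in params.get("topic", [])},
    }


def apply_filters(orgs: list[dict], filters: dict[str, set[str]]) -> list[dict]:
    """Keep organizations that carry every requested tech and topic tag.

    Requiring all tags (AND semantics) is an assumption of this sketch.
    """
    return [
        org
        for org in orgs
        if filters["tech"] <= {t.lower() for t in org["technologies"]}
        and filters["topic"] <= {t.lower() for t in org["topics"]}
    ]


def share_url(filters: dict[str, set[str]]) -> str:
    """Serialize the active filters back into a shareable query string."""
    pairs = [("tech", t) for t in sorted(filters["tech"])]
    pairs += [("topic", t) for t in sorted(filters["topic"])]
    return "?" + urlencode(pairs)


filters = parse_filters("?tech=python&topic=web")
matches = apply_filters(ORGS, filters)
```

Keeping the filter state entirely in the query string means a link reproduces the same view for whoever opens it, which is what makes the URLs shareable.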
Data overview
- 185 organizations
- 180 ideas pages successfully scraped
- 261 unique technologies
- 510 unique topics
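Counts like these could be derived from the structured data with a few lines of Python. The sketch below uses made-up records and hypothetical field names (`technologies`, `topics`, `ideas_markdown`); the JSON schema in the repository may differ.

```python
# Hypothetical records; the real JSON schema in the repository may differ.
sample_orgs = [
    {"name": "Org A", "technologies": ["Python", "django"], "topics": ["web"],
     "ideas_markdown": "# Ideas\n..."},
    {"name": "Org B", "technologies": ["python", "rust"], "topics": ["web", "compilers"],
     "ideas_markdown": None},  # ideas page could not be scraped
]


def overview(orgs: list[dict]) -> dict[str, int]:
    """Summarize a list of organization records into headline counts."""
    # Normalize case and whitespace so "Python" and "python" count once.
    techs = {t.strip().lower() for org in orgs for t in org.get("technologies", [])}
    topics = {t.strip().lower() for org in orgs for t in org.get("topics", [])}
    scraped = sum(1 for org in orgs if org.get("ideas_markdown"))
    return {
        "organizations": len(orgs),
        "ideas_pages_scraped": scraped,
        "unique_technologies": len(techs),
        "unique_topics": len(topics),
    }


stats = overview(sample_orgs)
```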
How the data was collected
A Python scraper queries the GSoC public API to get all participating organizations, then fetches each organization's ideas page using multiple strategies:
- GitHub blob/wiki/gist URLs converted to raw URLs for clean Markdown
- Google Docs exported as HTML or plain text
- GitHub and GitLab issues fetched via CLI/API
- HTML pages extracted via trafilatura or markdownify
- Playwright headless browser for JavaScript-rendered pages
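The strategy selection above can be sketched as a small URL classifier. This is an illustration, not the scraper's actual code: the function name `plan_fetch`, the strategy labels, and the regular expressions are assumptions, and only the GitHub blob and Google Docs rewrites are shown. The fallback to a headless browser is omitted because it depends on the fetched page, not on the URL alone.

```python
import re
from urllib.parse import urlparse

# Path patterns used to recognize each kind of ideas page (assumed shapes).
GITHUB_BLOB = re.compile(r"^/([^/]+)/([^/]+)/blob/(.+)$")
GOOGLE_DOC = re.compile(r"^/document/d/([^/]+)")
ISSUE_PATH = re.compile(r"/(?:-/)?issues(?:/|$)")


def plan_fetch(url: str) -> tuple[str, str]:
    """Return a (strategy, url_to_fetch) pair for an ideas-page URL."""
    parts = urlparse(url)
    host = parts.netloc.lower()

    if host == "github.com":
        match = GITHUB_BLOB.match(parts.path)
        if match:
            owner, repo, rest = match.groups()
            # A blob page becomes the raw file, which is plain Markdown.
            raw = f"https://raw.githubusercontent.com/{owner}/{repo}/{rest}"
            return "raw_markdown", raw

    if host == "docs.google.com":
        match = GOOGLE_DOC.match(parts.path)
        if match:
            # Publicly shared documents can be exported as HTML.
            export = f"https://docs.google.com/document/d/{match.group(1)}/export?format=html"
            return "google_doc_export", export

    if host in {"github.com", "gitlab.com"} and ISSUE_PATH.search(parts.path):
        # Issue trackers are better read through the CLI or API than as HTML.
        return "issues_api", url

    # Everything else goes through generic HTML extraction.
    return "html_extract", url
```

For example, `plan_fetch("https://github.com/example/project/blob/main/docs/ideas.md")` would select the raw Markdown strategy and point at the corresponding `raw.githubusercontent.com` URL.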
Source code
Both the scraper and this website are open source. The repository also contains the raw scraped data in Markdown and JSON formats.
View on GitHub