GSoC 2026 Explorer

Building the Amharic DBpedia Language Chapter with Large Language Models (LLMs)

Description

DBpedia is a collaborative initiative that extracts structured information from Wikipedia and publishes it as Linked Open Data. This project continues the work done in GSoC 2024 and GSoC 2025, in which we successfully integrated Amharic parsers and extractors into the DBpedia chapter. Due to time constraints, however, we could not build a complete automated system for extracting and building the artifacts. In this year’s GSoC, we would like to build on last year’s progress.

Goal

The primary goal of this project is to enhance the existing Amharic DBpedia chapter:

Integrate an automated extraction and mapping framework that applies LLMs
Class/Property/Relation prediction
Build a demo page
Update the home page
Make the knowledge graph available to end users via a web page.
Create documentation for processes, tools, and techniques used for sustainable development, following FAIR principles.
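
As a rough illustration of the LLM-based mapping and property-prediction goal above, the following minimal Python sketch builds a prompt that asks a model to map one Amharic infobox key to a DBpedia ontology property, then validates the reply against a fixed candidate list. Everything here is an assumption made for illustration: the function names, the prompt wording, the JSON reply format, the template name, and the short candidate list are hypothetical and are not part of the existing Amharic DBpedia code. No LLM is called; the model reply is passed in as a string.

```python
import json

# Hypothetical candidate properties for illustration only; a real pipeline
# would load candidates from the DBpedia ontology instead of hard-coding them.
CANDIDATE_PROPERTIES = ["dbo:birthDate", "dbo:birthPlace", "dbo:populationTotal"]


def build_mapping_prompt(template_name, infobox_key, candidates):
    """Build a prompt asking an LLM to pick one candidate property for an infobox key."""
    options = "\n".join(f"- {c}" for c in candidates)
    return (
        f"Amharic Wikipedia infobox template: {template_name}\n"
        f"Infobox key: {infobox_key}\n"
        "Choose the single best matching DBpedia ontology property from:\n"
        f"{options}\n"
        'Answer as JSON: {"property": "<one option or null>"}'
    )


def parse_mapping_response(raw_response, candidates):
    """Return the predicted property, or None if the reply is invalid or not a candidate."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    prop = data.get("property")
    # Reject anything outside the candidate list to guard against invented properties.
    return prop if prop in candidates else None


# Example with a stubbed model reply (the template name and key are illustrative).
prompt = build_mapping_prompt("Infobox person", "የትውልድ ቀን", CANDIDATE_PROPERTIES)
prediction = parse_mapping_response('{"property": "dbo:birthDate"}', CANDIDATE_PROPERTIES)
```

Restricting the answer to a closed candidate list is one possible design choice: it keeps predictions inside the ontology and makes malformed replies easy to discard before a mapping is written.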

Impact

Enable users to access and utilize structured data in Amharic DBpedia more effectively.
This will promote linguistic diversity and support research, education, and applications that rely on multilingual knowledge graphs.
Downstream NLP tasks: Apply knowledge graphs from DBpedia to NLP applications such as machine translation and sentiment analysis.
Community engagement: Encourage the community to contribute and collaborate to sustain and expand Amharic DBpedia.

Warmup Tasks

Read the documentation for Amharic DBpedia at https://github.com/AmharicDBpedia/AmharicDBpediaChapter/wiki

Amharic Wikipedia

Skills Required

A good understanding of Java and Python
Optionally, good knowledge of SPARQL, RDF, and other Semantic Web technologies
Machine Learning
Good documentation and communication skills

Project Size

350 hours

Mentors

Keywords

Amharic DBpedia, Semantic Web, Extraction Framework
