Building the Amharic DBpedia Language Chapter with Large Language Models (LLMs)
Description
DBpedia is a collaborative initiative that extracts structured information from Wikipedia and publishes it as Linked Open Data. This project continues the work of GSoC 2024 and GSoC 2025, during which we integrated Amharic parsers and extractors into the Amharic DBpedia chapter. However, due to time constraints, we could not build a complete system that automates extracting and building the artifacts. In this year’s GSoC, we would like to build on last year’s progress.
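To make the extraction step more concrete, the following Python sketch shows how a single infobox could be turned into RDF triples in N-Triples syntax through a mapping table. It is a simplified illustration, not the DBpedia extraction framework itself. The `http://am.dbpedia.org/resource/` namespace, the infobox parameter names, and the helpers `resource_iri`, `escape_literal`, and `infobox_to_ntriples` are hypothetical. Only the `dbo:` ontology namespace and `rdfs:label` are standard.

```python
from urllib.parse import quote

# Assumed resource namespace for the Amharic chapter (hypothetical, for illustration).
RESOURCE_NS = "http://am.dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical mapping: infobox parameter -> (predicate IRI, object kind).
MAPPING = {
    "name": (RDFS_LABEL, "literal"),
    "country": (DBO + "country", "resource"),
}


def resource_iri(title: str) -> str:
    """Build a resource IRI from a page title (spaces -> underscores, percent-encoded)."""
    return RESOURCE_NS + quote(title.replace(" ", "_"))


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and newlines for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def infobox_to_ntriples(title: str, infobox: dict[str, str]) -> list[str]:
    """Emit one N-Triples line per mapped infobox parameter; unmapped ones are skipped."""
    subject = f"<{resource_iri(title)}>"
    triples = []
    for key, value in infobox.items():
        if key not in MAPPING:
            continue
        predicate, kind = MAPPING[key]
        if kind == "resource":
            obj = f"<{resource_iri(value)}>"
        else:
            obj = f'"{escape_literal(value)}"@am'  # Amharic language tag
        triples.append(f"{subject} <{predicate}> {obj} .")
    return triples


for line in infobox_to_ntriples("አዲስ አበባ", {"name": "አዲስ አበባ", "country": "ኢትዮጵያ", "map": "x.png"}):
    print(line)
```

Unmapped parameters such as `map` produce no triples. Real DBpedia mappings handle much more, such as datatypes, units, and template variants.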
Goal
The primary goal of this project is to enhance the existing Amharic DBpedia chapter:
- Integrate an automatic extraction framework and generate mappings using LLMs.
- Predict classes, properties, and relations.
- Build a demo page.
- Update the home page.
- Deploy the knowledge graph and make it available to end users via a web page.
- Create documentation of the processes, tools, and techniques used, following the FAIR principles, to support sustainable development.
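One possible approach to the class-prediction goal is to ask an LLM to pick an ontology class for an article and to accept the answer only if it appears in a fixed candidate list. The Python sketch below assumes this design. The LLM is abstracted as any function from prompt text to response text, so no specific provider API is implied. The prompt wording, the four-class candidate list, and the `build_prompt` and `predict_class` helpers are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical, deliberately small candidate list; the DBpedia ontology has far more classes.
CANDIDATE_CLASSES = ("dbo:City", "dbo:Country", "dbo:Person", "dbo:River")


def build_prompt(title: str, infobox: dict[str, str], candidates: tuple[str, ...]) -> str:
    """Assemble a classification prompt from the article title and infobox fields."""
    fields = "\n".join(f"{key}: {value}" for key, value in infobox.items())
    return (
        "Choose the DBpedia ontology class that best describes this Amharic "
        "Wikipedia article. Reply with exactly one class from the list.\n"
        f"Candidates: {', '.join(candidates)}\n"
        f"Title: {title}\n"
        f"Infobox:\n{fields}"
    )


def predict_class(
    title: str,
    infobox: dict[str, str],
    llm: Callable[[str], str],
    candidates: tuple[str, ...] = CANDIDATE_CLASSES,
) -> Optional[str]:
    """Query the LLM and keep its answer only if it is one of the candidates."""
    answer = llm(build_prompt(title, infobox, candidates)).strip()
    # Rejecting out-of-list answers guards against hallucinated class names.
    return answer if answer in candidates else None


# Stub LLMs stand in for a real model in this example.
print(predict_class("አዲስ አበባ", {"country": "ኢትዮጵያ"}, lambda prompt: "dbo:City"))  # dbo:City
print(predict_class("አዲስ አበባ", {"country": "ኢትዮጵያ"}, lambda prompt: "dbo:Town"))  # None
```

The same pattern could be extended to property and relation prediction by replacing the candidate list with ontology properties. In practice, a confidence threshold or a human review step would likely still be needed before the predictions enter the knowledge graph.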
Impact
- Enable users to access and utilize structured data in Amharic DBpedia more effectively.
- Promote linguistic diversity and support research, education, and applications that rely on multilingual knowledge graphs.
- Downstream NLP tasks: Apply knowledge graphs from DBpedia to NLP applications such as machine translation and sentiment analysis.
- Community engagement: Encourage the community to contribute and collaborate to sustain and expand Amharic DBpedia.
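As a small illustration of the machine-translation use case, the Amharic and English labels of the same resource can be paired into a bilingual lexicon. This Python sketch assumes the triples are already loaded as `(subject, predicate, text, language)` tuples and that both labels are attached to the same subject. In practice, the Amharic and English resources may be separate and linked through interlanguage links or `owl:sameAs`, so an extra join step would be needed. The `bilingual_lexicon` function and the example resource IRIs are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def bilingual_lexicon(
    triples: Iterable[tuple[str, str, str, str]], src: str = "am", tgt: str = "en"
) -> dict[str, str]:
    """Map source-language labels to target-language labels of the same subject."""
    labels: dict[str, dict[str, str]] = defaultdict(dict)
    for subject, predicate, text, lang in triples:
        if predicate == RDFS_LABEL and lang in (src, tgt):
            labels[subject][lang] = text
    # Keep only subjects that have a label in both languages.
    return {
        by_lang[src]: by_lang[tgt]
        for by_lang in labels.values()
        if src in by_lang and tgt in by_lang
    }


example = [
    ("http://am.dbpedia.org/resource/ኢትዮጵያ", RDFS_LABEL, "ኢትዮጵያ", "am"),
    ("http://am.dbpedia.org/resource/ኢትዮጵያ", RDFS_LABEL, "Ethiopia", "en"),
    ("http://am.dbpedia.org/resource/አዲስ_አበባ", RDFS_LABEL, "አዲስ አበባ", "am"),
]
print(bilingual_lexicon(example))  # {'ኢትዮጵያ': 'Ethiopia'}
```

Dropping subjects that have a label in only one language keeps the lexicon precise at the cost of coverage.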
Warmup Tasks
Read the documentation for Amharic DBpedia at https://github.com/AmharicDBpedia/AmharicDBpediaChapter/wiki
Skills Required
- A good understanding of Java and Python
- Optionally, good knowledge of SPARQL, RDF, and other Semantic Web technologies
- Machine learning
- Good documentation and communication skills
Project Size
350 hours
Mentors
Keywords
Amharic DBpedia, Semantic Web, Extraction Framework