WorldKG: World-Scale Completion of Geographic Knowledge (DFG, SPP VGIscience, 2019-2022), PI.

OpenStreetMap (OSM) is a rich source of openly available volunteered geographic information. However, representations of geographic entities in OSM are highly diverse and incomplete. Recently emerged knowledge graphs (i.e., graph-based knowledge repositories) such as Wikidata, EventKG, and DBpedia provide a rich source of contextual information about geographic entities and support semantic queries. Although knowledge graphs provide a wide range of complementary semantic information for geographic entities, which is highly useful for Web applications, identity links between OSM and knowledge graphs are still rare and are mainly defined manually by volunteers. The main goal of the WorldKG project is to facilitate world-scale interlinking of OSM datasets describing different geographic regions with knowledge graphs such as Wikidata, EventKG, and DBpedia, as well as the completion of spatial knowledge in the knowledge graphs using OSM data.

smashHit: Smart Dispatcher for Secure and Controlled Sharing of Distributed Personal and Industrial Data (EU, H2020, 01/2020-12/2022), PI.

The objective of smashHit is to assure trusted and secure sharing of data streams from both personal and industrial platforms, needed to build sectorial and cross-sectorial services, by establishing a Framework for processing data owner consent and legal rules and for effective contracting, as well as joint security and privacy-preserving mechanisms. The vision of smashHit is to overcome obstacles in the rapidly growing Data Economy, which is characterized by heterogeneous technical designs and proprietary implementations that lock business opportunities due to inconsistent consent and legal rules among the actors and operators of different data-sharing platforms. The Framework will provide methods and tools, such as the Smart Data Dispatcher, to assure common consent over shared data using semantic models of consent and legal rules. The new tools include traceability of data use, data fingerprinting, and automatic contracting among data owners, data providers, service providers, and users.

d-E-mand – Forecasting Charging Demand in Electric Mobility as a Business Enabler (BMWi, 01/2020-12/2022), PI.

A central prerequisite for the transition to electric mobility is the development of a comprehensive infrastructure and data-based services for all types of electric vehicles. The d-E-mand project develops solutions to systematically forecast and address the demand for mobile and stationary charging infrastructure, in particular during load peaks caused by large events and mass movements (e.g., the start of holidays).

CampaNeo. Platform for Real-Time Vehicle Data Campaigns (BMWi, 07/2019-06/2022), PI.

The goal of the project is to build a platform on which private and public institutions can collect and analyze vehicle data in real time on a campaign basis, and to implement intelligent use cases based on the campaign data. The focus is in particular on the data ownership of the vehicle owners and on the traceability of data processing.

Related publications: see bibsonomy.

Cleopatra. Cross-lingual Event-centric Open Analytics Research Academy (H2020-MSCA-ITN-2018). 01/2019-12/2022. PI, Project coordinator.

With a rapidly increasing degree of integration among the European countries, a rising number of events and topics strongly impact the European community and the European digital economy across language and country borders. This development results in a vast amount of event-centric multilingual information available from different communities and heterogeneous sources. This information can differ across sources with respect to its localization, potentially reflecting specific community-relevant aspects and containing cultural references and opinions as well as possibly incomplete or biased data. The main research objective of the Cleopatra ITN is to enable effective and efficient analytics of event-centric multilingual information spread across heterogeneous sources and to deliver analytics results in a way that is meaningful to users, with a particular focus on journalists, digital humanities researchers, and memory institutions.

Related publications: see bibsonomy.

Simple-ML. Big Data Machine Learning Workflows Made Easy (BMBF). 08/2018-07/2021. PI, Project coordinator.

The efficient application of current machine learning (ML) methods requires a very high degree of expert knowledge, which hinders the widespread use of ML approaches, especially by small and medium-sized enterprises. The goal of the Simple-ML project is therefore to significantly improve the usability of ML methods and make them more accessible to a broad range of users. As its central contribution, the project defines a domain-specific language (DSL) that holistically describes ML workflows and their components and can be specified through textual and graphical editors. Furthermore, the project contributes to the robustness of the created ML workflows, the explainability and transparency of the learned models, the efficiency and scalability of the resulting applications, and the reusability of the developed solutions. This is achieved by applying semantic technologies, advancing symbolic ML methods, and building on scalable ML frameworks. The results of the Simple-ML project are validated in the application scenarios "Urban Mobility" and "Logistics" together with users from industry.

Related publications: see bibsonomy.

Multilingual Data Analytics: since 2015. PI.

The amount of multilingual information regarding contemporary and historical events of global importance, such as Brexit, the 2018 Winter Olympics, and the Syrian Civil War, constantly grows on the Web, in the news sources, and within social media. Efficiently accessing and analyzing large-scale event-centric and temporal information is crucial for a variety of real-world applications in the fields of Semantic Web, NLP, and Digital Humanities. In this project, we address various aspects of multilingual data analytics, including cross-lingual text alignment by MultiWiki, creation of event-centric multilingual knowledge graphs such as EventKG, generation of cross-lingual timelines by EventKG+TL, and many others.

Related publications: see bibsonomy.

Data4UrbanMobility: Data Analytics for Mobility Services in Smart Cities (BMBF). 03/2017 – 02/2020. Project coordinator.

Data4UrbanMobility focuses on facilitating innovative mobility services and mobility-related infrastructure development in smart cities through comprehensive data analytics. The stakeholders of the Data4UrbanMobility project are city councils, mobility service providers, and city inhabitants. The methods and tools developed in the Data4UrbanMobility project aim to provide insights into the mobility demand in smart cities, facilitate efficient use of the existing mobility services and infrastructure, support the development of innovative mobility-related services as well as facilitate effective planning of the mobility-relevant city infrastructure. To achieve this goal, the Data4UrbanMobility platform will interlink and enrich heterogeneous data sources including regional data collections, open data, and social media data using targeted Information Extraction, data integration, and machine learning methods. The methods and tools developed in the Data4UrbanMobility project will be validated in pilot projects in Region Hannover and Wolfsburg.

Related publications: see bibsonomy.

WDAqua ITN: Answering Questions using Web Data (H2020-MSCA-ITN-2014). Project manager (2016). Advisory Board member (2017-2018).

WDAqua, a Marie Sklodowska Curie Innovative Training Network (ITN) running 2015 – 2018, involves six academic partners and 15 Ph.D. students in total. WDAqua focuses on analyzing data on the Web and using this data to enable better services, in particular in the context of data-driven Question Answering. The key objective is to deliver precise and comprehensive answers to natural language questions, primarily by making better use of heterogeneous multilingual data on the Web. In this context, my research focuses on multilingual data analytics to facilitate a better overview of the data available in different languages and enable its interlingual alignment and comparison.

Related publications: see bibsonomy.

Completed Projects:

KEYSTONE COST Action IC1302 (2013-2017): semantic KEYword-based Search on sTructured data sOurcEs. Co-PI. Management Committee member and co-chair of WG2 “Keyword search”. 

KEYSTONE is a network of researchers, practitioners, and application domain specialists from over 25 European countries that coordinates collaboration to enable joint research activities and technology transfer in the area of keyword-based search over structured data sources.

Related publications: see bibsonomy.

ALEXANDRIA (2014 – 2015): Analytics of Web archives. PostDoc research fellow.

The ALEXANDRIA project aims to develop the models, tools, and techniques necessary to explore and analyze Web archives in a meaningful way. Web archives are invaluable sources for following the traces of past events, in particular for researchers in the Digital Humanities, journalists, and historians. On the one hand, the large size of the data and their distributed nature make their analysis daunting, especially for non-computer scientists. On the other hand, most research questions only require a smaller relevant subset of Web archives, such as the snapshots of Web pages describing one particular event. For example, these sub-collections can reflect the Fukushima nuclear disaster in 2011, the German federal election in 2009, or the FIFA World Cup 2006. In this context, my research is focused on the development of efficient methods to create meaningful event-centric sub-collections from large-scale Web archives using flexible re-crawling methods coupled with topical and temporal relevance estimation and light-weight indexing.

Related publications: see bibsonomy.

FID-Mathematik (2015): Citation of scientific software in the mathematical domain. In collaboration with TIB-Hannover and SUB-Göttingen. PostDoc research fellow.

In the context of FID-Mathematik, TIB-Hannover develops a software citation framework including metadata and metrics to reflect the citation impact in coordination with the mathematical community by the exploitation of participation instruments. This framework can facilitate efficient software reuse as well as significantly enhance the visibility of the author’s contributions within the mathematical community.

Related publications: see bibsonomy.

iCrawl (01.2014 – 12.2014): Focused collection and analysis of Web content. PI.

Researchers in the Digital Humanities and journalists need to monitor, collect, and analyze fresh online content regarding current events, such as the Ebola outbreak or the Ukraine crisis, on demand. However, existing focused crawling approaches only consider topical aspects while ignoring temporal aspects and therefore cannot achieve thematically coherent and fresh Web collections. Social Media in particular provides a rich source of fresh content, which is not used by state-of-the-art focused crawlers. In the iCrawl project, we developed methods and tools that enable the collection of fresh and relevant Web and Social Web content for a topic of interest through seamless integration of Web and Social Media in a novel integrated focused crawler. The crawler collects Web and Social Media content in a single system and exploits the stream of fresh Social Media content to guide the crawl.

Related publications: see bibsonomy.


ARCOMEM FP7 IP (2012 – 2013): Community-driven focused collection of Web content. WP Leader of the core technical WP, including 12 partners.

Community memories largely revolve around events, as well as entities, topics, and opinions related to these events. These may be unique events, such as the first landing on the moon or a natural disaster, or regularly occurring events, such as elections or TV serials. In this context, the main logical concepts considered in ARCOMEM extraction and enrichment activities are entities, topics, opinions, and events (ETOEs). To create incrementally-enriched web archives that allow access to all sorts of web content in a structured and semantically meaningful way, extraction, enrichment, and consolidation of ETOEs are of crucial importance. In this project, my research work focused on the enrichment of the archived content with semantic information.

Related publications: see bibsonomy.

LivingKnowledge FP7 (2011): Diversity on the Web. Research Fellow.

Knowledge and its articulations are strongly influenced by diversity, e.g., cultural backgrounds, schools of thought, and geographical contexts. Judgments, assessments, and opinions, which play a crucial role in many areas of democratic societies, including politics and economics, reflect this diversity in perspective and goals. For the information on the Web (including, e.g., news and blogs), diversity – implied by the ever-increasing multitude of information providers – is the reason for diverging viewpoints and conflicts. The vision inspiring LivingKnowledge is to consider diversity as an asset and to make it traceable, understandable, and exploitable, with the goal of improving navigation and search in very large multimodal datasets (e.g., the Web itself). In this context, my research focused on the development of methods that enable users to obtain a better overview of the available information.

Related publications: see bibsonomy.

OKKAM FP7 (2008-2010): Creation of a global entity repository. Research Fellow.

The aim of OKKAM is to deliver a secure and privacy-aware open infrastructure to manage entity references. Just as the WWW enables a global decentralized network of documents, connected by hyperlinks, OKKAM provides a global digital space for publishing and managing information about entities, where every entity is uniquely identified, entities can be reused across digital resources and links between entities can be explicitly specified and exploited in a variety of scenarios. My research focus in OKKAM was to develop search and retrieval methods that enable intuitive access to structured entity repositories for the end-users.

Related publications: see bibsonomy.

TENCOMPETENCE FP6 IP (2006-2007): Building the European network for lifelong competence development. Research Fellow.

TENCompetence supports individuals, groups, and organizations in Europe in lifelong competence development by establishing the most appropriate technical and organizational infrastructure, using open-source, standards-based, sustainable, and innovative technology. Within the TENCompetence project, we developed and integrated models and tools into an open-source infrastructure for the creation, storage, and exchange of learning objects, suitable knowledge resources as well as learning experiences.

Related publications: see bibsonomy.

Organized Events

Cleopatra workshop series starting @ESWC 2020.

The theme of the CLEOPATRA workshop – event-centric multilingual analytics – includes a variety of interdisciplinary challenges related to analysis, interaction with, and interpretation of vast amounts of event-centric textual, semantic and visual information in multiple languages originating from different communities. The objective of the workshop is to bring together researchers and practitioners interested in the development of methods for analyzing event-centric multilingual information.

PROFILES workshop series @ISWC 2019, @WWW 2018, @ISWC 2017, @ESWC 2014 – 2016.

The web of data has seen tremendous growth recently. New forms of structured data have emerged, such as web markup and web tables. Exploiting these rich, heterogeneous, and evolving data sources has become increasingly important for many different types of applications, including dataset search, question answering, and fact verification. The objective of the PROFILES workshop series is to bring together researchers and practitioners interested in the development of data search techniques, data profiling, and dataset retrieval on the web.

  • The 6th International Workshop on Dataset PROFILing and Search (PROFILES 2019) @ISWC 2019.
  • International Workshop on Profiling and Searching Data on the Web (PROFILES & DATA:SEARCH 2018) @WWW 2018.
  • The 4th International Workshop on Dataset PROFIling and fEderated Search for Web Data (PROFILES 2017) @ISWC 2017.
  • The 3rd International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES 2016) @ESWC 2016.
  • The 2nd International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES 2015) @ESWC 2015.
  • The 1st International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES 2014) @ESWC 2014.

Journal Reviews

PC Membership (Conferences)

PC Membership (Workshops)

  • Scalable Question Answering Open Challenge (SQA) 2018.
  • Question Answering over Linked Data Challenge (QALD 2017).
  • Fifth International Workshop on Querying Graph Structured Data (GraphQ 2016).
  • The 5th International Workshop on Semantic Digital Archives (SDA 2015).
  • USEWOD – Usage Analysis and the Web of Data (USEWOD 2016, USEWOD 2015).
  • The 1st International Workshop on Knowledge Diversity on the Web (DiversiWeb 2011).