New Research Projects (acquired, pending)

Cleopatra. Cross-lingual Event-centric Open Analytics Research Academy (H2020-MSCA-ITN-2018). 2018/2019-2022. PI, Project coordinator.

With the rapidly increasing degree of integration among European countries, a growing number of events and topics strongly impact the European community and the European digital economy across language and country borders. This development results in a vast amount of event-centric multilingual information available from different communities and heterogeneous sources. This information can differ across sources with respect to its localisation: it may reflect community-specific aspects, contain cultural references and opinions, and be incomplete or biased. The main research objective of the Cleopatra ITN is to enable effective and efficient analytics of event-centric multilingual information spread across heterogeneous sources, and to deliver analytics results in a way that is meaningful to users, with a particular focus on journalists, digital humanities researchers and memory institutions.

Simple-ML. Big Data Machine Learning Workflows leicht gemacht ("Big Data Machine Learning Workflows Made Easy"; BMBF). 08/2018-07/2021. PI, Project coordinator.

The efficient application of current machine learning (ML) methods requires a very high level of expert knowledge, which stands in the way of the widespread use of ML approaches, especially by small and medium-sized enterprises. The goal of the Simple-ML project is therefore to significantly improve the usability of ML methods in order to make them more easily accessible to a broad range of users. As the central contribution of the project, a domain-specific language (DSL) will be defined that holistically describes ML workflows and their components and that can be specified using textual and graphical editors. Furthermore, the project contributes to the robustness of the created ML workflows, the explainability and transparency of the learned models, the efficiency and scalability of the resulting applications, and the reusability of the developed solutions. This is achieved by applying semantic technologies, further developing symbolic ML methods and building on scalable ML frameworks. The results of the Simple-ML project will be validated together with industry users in the application scenarios "Mobility in the City" and "Logistics".

Running Research Projects

Multilingual Data Analytics: since 2015. PI
The amount of multilingual information about contemporary and historical events of global importance, such as Brexit, the 2018 Winter Olympics and the Syrian Civil War, is constantly growing on the Web, in news sources and in social media. Efficiently accessing and analyzing large-scale event-centric and temporal information is crucial for a variety of real-world applications in the fields of Semantic Web, NLP and Digital Humanities. In this project we address various aspects of multilingual data analytics, including cross-lingual text alignment with MultiWiki, the creation of event-centric multilingual knowledge graphs such as EventKG, and the generation of cross-lingual timelines with EventKG+TL, among others.

Related publications: see bibsonomy.

Data4UrbanMobility: Data Analytics for Mobility Services in Smart Cities (BMBF). 03/2017 – 02/2020. Project coordinator.

Data4UrbanMobility focuses on facilitating innovative mobility services and mobility-related infrastructure development in smart cities through comprehensive data analytics. The stakeholders of the Data4UrbanMobility project are city councils, mobility service providers and city inhabitants. The methods and tools developed in the project aim to provide insights into mobility demand in smart cities, facilitate efficient use of existing mobility services and infrastructure, support the development of innovative mobility-related services, and facilitate effective planning of mobility-relevant city infrastructure. To achieve this goal, the Data4UrbanMobility platform will interlink and enrich heterogeneous data sources, including regional data collections, open data and social media data, using targeted information extraction, data integration and machine learning methods. The methods and tools developed in the Data4UrbanMobility project will be validated in pilot projects in the Region Hannover and in Wolfsburg.

Related publications: see bibsonomy.

WDAqua ITN: Answering Questions using Web Data (H2020-MSCA-ITN-2014). Project manager (2016). Advisory Board member (2017-2018).

WDAqua, a Marie Skłodowska-Curie Innovative Training Network (ITN) running from 2015 to 2018, involves six academic partners and 15 PhD students in total. WDAqua focuses on analysing data on the Web and using this data to enable better services, in particular in the context of data-driven question answering. The key objective is to deliver precise and comprehensive answers to natural language questions, primarily by making better use of heterogeneous multilingual data on the Web. In this context, my research focuses on multilingual data analytics to provide a better overview of the data available in different languages and to enable its interlingual alignment and comparison.

Related publications: see bibsonomy.

Completed Projects

KEYSTONE COST Action IC1302 (2013-2017): semantic KEYword-based Search on sTructured data sOurcEs. Co-PI. Management Committee member and co-chair of WG2 “Keyword search”. 

KEYSTONE is a network of researchers, practitioners and application domain specialists from over 25 European countries that coordinates collaboration to enable joint research activities and technology transfer in the area of keyword-based search over structured data sources.

Related publications: see bibsonomy.

ALEXANDRIA (2014 – 2015): Analytics of Web archives. PostDoc research fellow.

The ALEXANDRIA project aims to develop the models, tools and techniques necessary to explore and analyse Web archives in a meaningful way. Web archives are invaluable sources for following the traces of past events, in particular for researchers in the Digital Humanities, journalists and historians. On the one hand, the large size of the data and its distributed nature make its analysis daunting, especially for non-computer scientists. On the other hand, most research questions require only a smaller relevant subset of a Web archive, such as the snapshots of Web pages describing one particular event. For example, such sub-collections can cover the Fukushima nuclear disaster in 2011, the German federal election in 2009, or the FIFA World Cup 2006. In this context, my research focused on developing efficient methods to create meaningful event-centric sub-collections from large-scale Web archives, using flexible re-crawling methods coupled with topical and temporal relevance estimation and light-weight indexing.

Related publications: see bibsonomy.

FID-Mathematik (2015): Citation of scientific software in the mathematical domain. In collaboration with TIB-Hannover and SUB-Göttingen. PostDoc research fellow.

In the context of FID-Mathematik, TIB-Hannover develops, in coordination with the mathematical community and through participation instruments, a software citation framework that includes metadata and metrics reflecting citation impact. This framework can facilitate efficient software reuse and significantly enhance the visibility of authors' contributions within the mathematical community.

Related publications: see bibsonomy.

iCrawl (01.2014 – 12.2014): Focused collection and analysis of Web content. PI.

Researchers in the Digital Humanities and journalists need to monitor, collect and analyse fresh online content about current events, such as the Ebola outbreak or the Ukraine crisis, on demand. However, existing focused crawling approaches consider only topical aspects and ignore temporal ones, and therefore cannot build thematically coherent and fresh Web collections. Social media in particular provide a rich source of fresh content, which state-of-the-art focused crawlers do not use. In the iCrawl project we developed methods and tools that enable the collection of fresh and relevant Web and Social Web content on a topic of interest through the seamless integration of Web and social media in a novel integrated focused crawler. The crawler collects Web and social media content in a single system and exploits the stream of fresh social media content to guide the crawl.

Related publications: see bibsonomy.

ARCOMEM FP7 IP (2012 – 2013): Community-driven focused collection of Web content. Leader of the core technical work package (WP), involving 12 partners.

Community memories largely revolve around events, as well as the entities, topics and opinions related to these events. These may be unique events, such as the first Moon landing or a natural disaster, or regularly occurring events, such as elections or TV series. In this context, the main logical concepts considered in the ARCOMEM extraction and enrichment activities are entities, topics, opinions and events (ETOEs). To create incrementally enriched Web archives that provide access to all sorts of Web content in a structured and semantically meaningful way, the extraction, enrichment and consolidation of ETOEs are of crucial importance. In this project, my research focused on enriching the archived content with semantic information.

Related publications: see bibsonomy.

LivingKnowledge FP7 (2011): Diversity on the Web. Research Fellow.

Knowledge and its articulations are strongly influenced by diversity in, e.g., cultural backgrounds, schools of thought and geographical contexts. Judgements, assessments and opinions, which play a crucial role in many areas of democratic societies, including politics and economics, reflect this diversity in perspectives and goals. For information on the Web (including, e.g., news and blogs), diversity, implied by the ever-increasing multitude of information providers, is the source of diverging viewpoints and conflicts. The vision inspiring LivingKnowledge is to consider diversity an asset and to make it traceable, understandable and exploitable, with the goal of improving navigation and search in very large multimodal datasets (e.g., the Web itself). In this context, my research focused on developing methods that provide a better overview of the available information.

Related publications: see bibsonomy.

OKKAM FP7 (2008-2010): Creation of a global entity repository. Research Fellow.

The aim of OKKAM is to deliver a secure and privacy-aware open infrastructure for managing entity references. Just as the WWW enables a global decentralised network of documents connected by hyperlinks, OKKAM provides a global digital space for publishing and managing information about entities, where every entity is uniquely identified, entities can be reused across digital resources, and links between entities can be explicitly specified and exploited in a variety of scenarios. My research in OKKAM focused on developing search and retrieval methods that give end users intuitive access to structured entity repositories.

Related publications: see bibsonomy.

TENCompetence FP6 IP (2006-2007): Building the European network for lifelong competence development. Research Fellow.

TENCompetence supports individuals, groups and organisations in Europe in lifelong competence development by establishing the most appropriate technical and organizational infrastructure, using open-source, standards-based, sustainable and innovative technology. Within the TENCompetence project we developed and integrated models and tools into an open source infrastructure for the creation, storage and exchange of learning objects, suitable knowledge resources as well as learning experiences.

Related publications: see bibsonomy.

Organised Events

PROFILES workshop series @WWW 2018, @ISWC 2017, @ESWC 2014 – 2016.

The web of data has seen tremendous growth recently. New forms of structured data have emerged in the form of web markup and web tables. Exploiting these rich, heterogeneous and evolving data sources has become increasingly important for many different types of applications, including (federated) search, question answering and fact verification. The objective of the 2018 edition, PROFILES & DATA:SEARCH, is to bring together researchers and practitioners interested in the development of data search techniques, data profiling and dataset retrieval on the web.

  • International Workshop on Profiling and Searching Data on the Web (PROFILES & DATA:SEARCH 2018) @WWW 2018.
  • 4th International Workshop on Dataset PROFIling and fEderated Search for Web Data (PROFILES 2017) @ISWC 2017.
  • 3rd International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES 2016) @ESWC 2016.
  • 2nd International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES 2015) @ESWC 2015.
  • 1st International Workshop on Dataset PROFIling and fEderated Search for Linked Data (PROFILES 2014) @ESWC 2014.

Journal Reviews

PC Membership (Conferences)

PC Membership (Workshops)

  • Scalable Question Answering Open Challenge (SQA) 2018.
  • Question Answering over Linked Data Challenge (QALD 2017).
  • Fifth International Workshop on Querying Graph Structured Data (GraphQ 2016).
  • The 5th International Workshop on Semantic Digital Archives (SDA 2015).
  • USEWOD – Usage Analysis and the Web of Data (USEWOD 2016, USEWOD 2015).
  • The 1st International Workshop on Knowledge Diversity on the Web (DiversiWeb 2011).