
Team photo, 23.11.2017 (Photo: Universität Paderborn, Astrid van Kimmenade)

Prof. Dr. Michaela Geierhos


Digital Cultural Studies (until 2020)

Former staff

Phone:
089/6004-7340
Research Talks
  • 01/2018:
    Semantische Informationsverarbeitung im Web 2.0: Bewertungskriterien, Widersprüche und Privatsphärebrüche sichtbar machen
    (Data Science Kolloquium, Universität der Bundeswehr München)
     
  • 01/2018:
    Digital Humanities und Semantische Informationstechnologien
    (KI Workshop @ Universität Paderborn)
     
  • 10/2017:
    Digital Humanities: Mixed Methods for Non-Standard Data
    (Ringvorlesung Digitale Kultur- und Geisteswissenschaften, Universität Paderborn)
     
  • 06/2017:
    Semantische Informationsverarbeitung
    (Diskussionsforum CPSS, Universität Paderborn)
     
  • 05/2017:
    Text Mining im Web 2.0: Auf der Spur der Rezensenten: Automatische Identifikation von Bewertungskriterien, Stimmungen und Widersprüchen
    (Friedrich-Alexander Universität Erlangen-Nürnberg)
     
  • 05/2017:
    Semantische Textanalyse im Personalmanagement - Wie Anforderungen und Qualifikationen zusammengebracht werden
(Impulse talk at the acatech Themennetzwerk Informations- und Kommunikationstechnologien, TU Darmstadt)
     
  • 02/2017:
    Von der Pressemeldung zum Firmendossier: Über die Perspektiviertheit der Textanalyse
(Inaugural lecture at the Fakultät für Sprach- und Literaturwissenschaften, Ludwig-Maximilians-Universität München)
     
  • 07/2016:
    Vernetzung semantischer Informationen zur intuitiven Suche
    (LIS Round Table, Universität Paderborn)
     
  • 05/2016:
    Analyse natürlichsprachlicher Anforderungsbeschreibungen im On-The-Fly Computing
    (Informatik-Kolloquium, Karlsruher Institut für Technologie)
     
  • 05/2016:
    Text Mining im Web 2.0
    (Universität Leipzig)
     
  • 11/2015:
    Understanding the Patient 2.0: Gaining Insight into Patients' Rating Behavior by User-generated Physician Review Mining
(9th International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT 2015), Larnaca, Cyprus)
     
  • 10/2015:
    Social Media als zusätzlicher Kommunikationskanal im Customer Interaction Management
    (Unified Communications @ R+V Versicherung, Wiesbaden)
     
  • 07/2015:
    Semantische Informationsverarbeitung im Web 2.0 - Auf der Spur der Rezensenten: Automatische Identifikation von Bewertungskriterien, Stimmungen und Widersprüchen
(Inaugural lecture at Universität Paderborn)
     
  • 05/2015:
I grade what I get but write what I think. Inconsistency Analysis in Patients' Reviews
    (23rd European Conference on Information Systems (ECIS 2015), Münster)
     
  • 09/2014:
    The same but not the same – Challenges in Comparing Patient Opinions
    (International Conference on Consumer Research (ICCR 2014), Bonn)
     
  • 09/2014:
    Was beobachtet die Forschungsethik? Eine interdisziplinäre Diskussion zwischen Computerlinguistik und qualitativ-konstruktivistischer Sozialforschung
    (Tagung für Forschungsethik in der qualitativen und quantitativen Sozialforschung, München)
     
  • 06/2014:
    Creation and Development of Local Grammars
    (NLP Lunch Talk, Centrum für Informations- und Sprachverarbeitung München)
     
  • 01/2014:
    Identifying perceived service quality dimensions in patients’ online reviews
    (1. Passauer Digital-Marketing-Konferenz)
Committee Work
  • since 03/2014:
    Elected member of the advisory board of the „Deutsche Biographie“
    (Historische Kommission bei der Bayerischen Akademie der Wissenschaften)
     
  • 02/2014 - 09/2017:
    Member of the steering group „Frauen gestalten die Informationsgesellschaft“
    (Universität Paderborn)
     
  • 11/2013 - 09/2017:
    Elected member of the scientific advisory board of the Zentrum für Sprachlehre
    (Universität Paderborn)
     
  • 10/2013 - 09/2017:
    Elected member of the faculty council of the Fakultät für Wirtschaftswissenschaften
    (Universität Paderborn)
     
  • 04/2012 - 09/2012:
    Deputy women's representative of the Fakultät für Sprach- und Literaturwissenschaften
    (Ludwig-Maximilians-Universität München)
Vita
01.04.2020 - present

Full Professor of Data Science

Forschungsinstitut CODE
Fakultät für Informatik
Universität der Bundeswehr München

28.07.2014 - present

Lecturer

Fakultät für Sprach- und Literaturwissenschaften
Ludwig-Maximilians-Universität München

07.11.2017 - 31.03.2020

Full Professor of Digital Cultural Studies

Institut für Anglistik und Amerikanistik
Fakultät für Kulturwissenschaften
Universität Paderborn

01.10.2017 - 06.11.2017

Interim Professor of Digital Cultural Studies

Institut für Anglistik und Amerikanistik
Fakultät für Kulturwissenschaften
Universität Paderborn

01.01.2013 - 30.09.2017

Assistant Professor (Juniorprofessorin) of Business Information Systems, esp. Semantic Information Processing

Heinz Nixdorf Institut
Universität Paderborn

07.12.2011 - 09.11.2016

Habilitation

Venia legendi in Computational Linguistics
Fakultät für Sprach- und Literaturwissenschaften
Ludwig-Maximilians-Universität München

01.10.2012 - 31.12.2012

BGF habilitation scholarship

Ludwig-Maximilians-Universität München

01.04.2012 - 30.09.2012

Junior Researcher in Residence

Center for Advanced Studies
Ludwig-Maximilians-Universität München

01.08.2006 - 30.09.2012

Research associate with teaching duties

Centrum für Informations- und Sprachverarbeitung
Ludwig-Maximilians-Universität München

07/2010

Doctorate (Dr. phil.) in Computational Linguistics (summa cum laude)

Fakultät für Sprach- und Literaturwissenschaften
Ludwig-Maximilians-Universität München

07/2006

Magister Artium (M.A.) in Computational Linguistics, Computer Science, and Phonetics and Speech Communication (very good)

Ludwig-Maximilians-Universität München


Publications

2022

Chatbot-Enhanced Requirements Resolution for Automated Service Compositions

J. Kersting, M. Ahmed, M. Geierhos, in: HCI International 2022 Posters, Springer International Publishing, 2022, pp. 419--426

This work addresses the automatic resolution of software requirements. In the vision of On-The-Fly Computing, software services should be composed on demand, based solely on natural language input from human users. To enable this, we build a chatbot solution that works with human-in-the-loop support to receive, analyze, correct, and complete users' software requirements. The chatbot is equipped with a natural language processing pipeline and a large knowledge base, as well as sophisticated dialogue management skills to enhance the user experience. Previous solutions have focused on analyzing software requirements to point out errors such as vagueness, ambiguity, or incompleteness. Our work shows how apps can collaborate with users to efficiently produce correct requirements. We developed and compared three different chatbot apps that can work with built-in knowledge. We rely on ChatterBot, DialoGPT and Rasa for this purpose. While DialoGPT provides its own knowledge base, Rasa is the best system to combine the text mining and knowledge solutions at our disposal. The evaluation shows that users accept 73% of the suggested answers from Rasa, while they accept only 63% from DialoGPT or even 36% from ChatterBot.


2021

In Other Words: A Naive Approach to Text Spinning

F.S. Bäumer, J. Kersting, S. Denisov, M. Geierhos, in: Proceedings of the International Conferences on WWW/Internet 2021 and Applied Computing 2021, IADIS, 2021, pp. 221--225

Content is the new oil. Users consume billions of terabytes a day while surfing news sites or blogs, posting on social media, and sending chat messages around the globe. While web content is heterogeneous, its dominant form is text. There are situations where more diversity needs to be introduced into text content, for example, to reuse it on websites or to allow a chatbot to base its models on the information conveyed rather than on the language used. To achieve this, paraphrasing techniques have been developed: one example is text spinning, a technique that automatically paraphrases text while leaving the intent intact. This makes it easier to reuse content and to make the language generated by a bot sound more human. One method for modifying texts is a combination of translation and back-translation. This paper presents NATTS, a naive approach that uses transformer-based translation models to create diversified text, combining translation steps in one model. An advantage of this approach is that it can be fine-tuned and can handle technical language.
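The round-trip idea mentioned in the abstract (translate into a pivot language, then back) can be sketched in a few lines. This is an illustrative toy, not the NATTS model: the word-map "translators" and the names `back_translate`, `en_to_de` and `de_to_en` are invented stand-ins for real transformer-based MT models.

```python
# Toy sketch of translation + back-translation paraphrasing.
# Real systems would plug transformer-based MT models into the two
# translation callables; simple word maps stand in here so the
# example is self-contained.

def back_translate(text, to_pivot, from_pivot):
    """Paraphrase text by translating to a pivot language and back."""
    return from_pivot(to_pivot(text))

# Invented stand-in dictionaries (English <-> German pivot).
EN_DE = {"content": "Inhalt", "is": "ist", "king": "König"}
DE_EN = {"Inhalt": "material", "ist": "is", "König": "king"}

def en_to_de(text):
    return " ".join(EN_DE.get(w, w) for w in text.split())

def de_to_en(text):
    return " ".join(DE_EN.get(w, w) for w in text.split())

# The round trip changes the wording while keeping the intent:
print(back_translate("content is king", en_to_de, de_to_en))
# -> material is king
```

Because forward and backward translation rarely invert each other exactly, the round trip yields a paraphrase rather than the original text.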


Towards Aspect Extraction and Classification for Opinion Mining with Deep Sequence Networks

J. Kersting, M. Geierhos, in: Natural Language Processing in Artificial Intelligence -- NLPinAI 2020, Springer, 2021, pp. 163--189

This chapter concentrates on aspect-based sentiment analysis, a form of opinion mining where algorithms detect sentiments expressed about features of products, services, etc. We especially focus on novel approaches for aspect phrase extraction and classification trained on feature-rich datasets. Here, we present two new datasets, which we gathered from the linguistically rich domain of physician reviews, as other investigations have mainly concentrated on commercial reviews and social media reviews so far. To give readers a better understanding of the underlying datasets, we describe the annotation process and inter-annotator agreement in detail. In our research, we automatically assess implicit mentions or indications of specific aspects. To do this, we propose and utilize neural network models that perform the here-defined aspect phrase extraction and classification task, achieving F1-score values of about 80% and accuracy values of more than 90%. As we apply our models to a comparatively complex domain, we obtain promising results.


Well-being in Plastic Surgery: Deep Learning Reveals Patients' Evaluations

J. Kersting, M. Geierhos, in: Proceedings of the 10th International Conference on Data Science, Technology and Applications (DATA 2021), SCITEPRESS, 2021, pp. 275--284


Human Language Comprehension in Aspect Phrase Extraction with Importance Weighting

J. Kersting, M. Geierhos, in: Natural Language Processing and Information Systems, Springer, 2021, pp. 231--242

In this study, we describe a text processing pipeline that transforms user-generated text into structured data. To do this, we train neural and transformer-based models for aspect-based sentiment analysis. As most research deals with explicit aspects from product or service data, we extract and classify implicit and explicit aspect phrases from German-language physician review texts. Patients often rate on the basis of perceived friendliness or competence. The vocabulary is difficult, the topic sensitive, and the data user-generated. The aspect phrases come with various wordings using insertions and are not noun-based, which makes the presented case equally relevant and reality-based. To find complex, indirect aspect phrases, up-to-date deep learning approaches must be combined with supervised training data. We describe three aspect phrase datasets, one of them new, as well as a newly annotated aspect polarity dataset. Alongside this, we build an algorithm to rate the aspect phrase importance. All in all, we train eight transformers on the new raw data domain, compare 54 neural aspect extraction models and, based on this, create eight aspect polarity models for our pipeline. These models are evaluated by using Precision, Recall, and F-Score measures. Finally, we evaluate our aspect phrase importance measure algorithm.


2020

Tag Me If You Can: Insights into the Challenges of Supporting Unrestricted P2P News Tagging

F.S. Bäumer, J. Kersting, B. Buff, M. Geierhos, in: Information and Software Technologies, Springer, 2020, pp. 368--382

Peer-to-Peer news portals allow Internet users to write news articles and make them available online to interested readers. Despite the fact that authors are free in their choice of topics, there are a number of quality characteristics that an article must meet before it is published. In addition to meaningful titles, comprehensibly written texts and meaningful images, relevant tags are an important criterion for the quality of such news. In this case study, we discuss the challenges and common mistakes that Peer-to-Peer reporters face when tagging news and how incorrect information can be corrected through the orchestration of existing Natural Language Processing services. Lastly, we use this illustrative example to give insight into the challenges of dealing with bottom-up taxonomies.


Aspect Phrase Extraction in Sentiment Analysis with Deep Learning

J. Kersting, M. Geierhos, in: Proceedings of the 12th International Conference on Agents and Artificial Intelligence (ICAART 2020) -- Special Session on Natural Language Processing in Artificial Intelligence (NLPinAI 2020), SCITEPRESS, 2020, pp. 391--400

This paper deals with aspect phrase extraction and classification in sentiment analysis. We summarize current approaches and datasets from the domain of aspect-based sentiment analysis. This domain detects sentiments expressed for individual aspects in unstructured text data. So far, mainly commercial user reviews for products or services such as restaurants were investigated. We here present our dataset consisting of German physician reviews, a sensitive and linguistically complex field. Furthermore, we describe the annotation process of a dataset for supervised learning with neural networks. Moreover, we introduce our model for extracting and classifying aspect phrases in one step, which obtains an F1-score of 80%. By applying it to a more complex domain, our approach and results outperform previous approaches.


Detection of Privacy Disclosure in the Medical Domain: A Survey

B. Buff, J. Kersting, M. Geierhos, in: Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2020), SCITEPRESS, 2020, pp. 630--637

When it comes to increased digitization in the health care domain, privacy is a relevant topic nowadays. This relates to patient data, electronic health records or physician reviews published online, for instance. There exist different approaches to the protection of individuals’ privacy, which focus on the anonymization and masking of personal information subsequent to their mining. In the medical domain in particular, measures to protect the privacy of patients are of high importance due to the amount of sensitive data that is involved (e.g. age, gender, illnesses, medication). While privacy breaches in structured data can be detected more easily, disclosure in written texts is more difficult to find automatically due to the unstructured nature of natural language. Therefore, we take a detailed look at existing research on areas related to privacy protection. Likewise, we review approaches to the automatic detection of privacy disclosure in different types of medical data. We provide a survey of several studies concerned with privacy breaches in the medical domain with a focus on Physician Review Websites (PRWs). Finally, we briefly develop implications and directions for further research.


Neural Learning for Aspect Phrase Extraction and Classification in Sentiment Analysis

J. Kersting, M. Geierhos, in: Proceedings of the 33rd International Florida Artificial Intelligence Research Symposium (FLAIRS) Conference, AAAI, 2020, pp. 282--285


What Reviews in Local Online Labour Markets Reveal about the Performance of Multi-Service Providers

J. Kersting, M. Geierhos, in: Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods, SCITEPRESS, 2020, pp. 263--272

This paper deals with online customer reviews of local multi-service providers. While many studies investigate product reviews and online labour markets with service providers delivering intangible products “over the wire”, we focus on websites where providers offer multiple distinct services that can be booked, paid and reviewed online but are performed locally offline. This type of service providers has so far been neglected in the literature. This paper analyses reviews and applies sentiment analysis. It aims to gain new insights into local multi-service providers’ performance. There is a broad literature range presented with regard to the topics addressed. The results show, among other things, that providers with good ratings continue to perform well over time. We find that many positive reviews seem to encourage sales. On average, quantitative star ratings and qualitative ratings in the form of review texts match. Further results are also achieved in this study.


2019

Requirements Engineering in OTF-Computing

F.S. Bäumer, M. Geierhos, in: encyclopedia.pub, MDPI, 2019


Natural Language Processing in OTF Computing: Challenges and the Need for Interactive Approaches

F.S. Bäumer, J. Kersting, M. Geierhos, Computers (2019), 8(1), 22

The vision of On-the-Fly (OTF) Computing is to compose and provide software services ad hoc, based on requirement descriptions in natural language. Since non-technical users write their software requirements themselves and in unrestricted natural language, deficits occur such as inaccuracy and incompleteness. These deficits are usually met by natural language processing methods, which have to face special challenges in OTF Computing because maximum automation is the goal. In this paper, we present current automatic approaches for solving inaccuracies and incompletenesses in natural language requirement descriptions and elaborate open challenges. In particular, we will discuss the necessity of domain-specific resources and show why, despite far-reaching automation, an intelligent and guided integration of end users into the compensation process is required. In this context, we present our idea of a chat bot that integrates users into the compensation process depending on the given circumstances.


Potentielle Privatsphäreverletzungen aufdecken und automatisiert sichtbar machen

F.S. Bäumer, B. Buff, M. Geierhos, in: DHd 2019 Digital Humanities: multimedial & multimodal, Mainz and Frankfurt am Main, Germany, 2019


In Reviews We Trust: But Should We? Experiences with Physician Review Websites

J. Kersting, F.S. Bäumer, M. Geierhos, in: Proceedings of the 4th International Conference on Internet of Things, Big Data and Security, SCITEPRESS, 2019, pp. 147-155

The ability to openly evaluate products, locations and services is an achievement of the Web 2.0. It has never been easier to inform oneself about the quality of products or services and possible alternatives. Forming one's own opinion based on the impressions of other people can lead to better experiences. However, this presupposes trust in one's fellows as well as in the quality of the review platforms. In previous work on physician reviews and the corresponding websites, we observed faulty behavior by some reviewers as well as noteworthy differences in the technical implementation of the portals and in the efforts of site operators to maintain high-quality reviews. These experiences raise new questions regarding what trust means on review platforms, how trust arises and how easily it can be destroyed.


2018

How to Deal with Inaccurate Service Descriptions in On-The-Fly Computing: Open Challenges

F.S. Bäumer, M. Geierhos, in: Proceedings of the 23rd International Conference on Natural Language and Information Systems, Springer, 2018, pp. 509-513

The vision of On-The-Fly Computing is an automatic composition of existing software services. Based on natural language software descriptions, end users will receive compositions tailored to their needs. For this reason, the quality of the initial software service description strongly determines whether a software composition really meets the expectations of end users. In this paper, we expose open NLP challenges that must be faced for service composition in On-The-Fly Computing.


Improving Classifiers for Semantic Annotation of Software Requirements with Elaborate Syntactic Structure

Y. Kim, S. Lee, M. Dollmann, M. Geierhos, International Journal of Advanced Science and Technology (2018), 112, pp. 123-136

A user generally writes software requirements in ambiguous and incomplete form using natural language; therefore, a software developer may have difficulty clearly understanding what is meant. To solve this problem with automation, we propose a classifier for semantic annotation with manually pre-defined semantic categories. To improve our classifier, we carefully designed syntactic features extracted by constituency and dependency parsers. Even with a small dataset and a large number of classes, our proposed classifier records an accuracy of 0.75, which outperforms the previous model, REaCT.


Freiraum zur individuellen Reflexion gemeinsamer Werte

M. Geierhos, in: Integration und Toleranz, 1st ed., Klemm+Oelschläger, 2018, pp. 288-292


Rate Your Physician: Findings from a Lithuanian Physician Rating Website

F.S. Bäumer, J. Kersting, V. Kuršelis, M. Geierhos, in: Communications in Computer and Information Science, Springer, 2018, pp. 43-58

Physician review websites are known around the world. Patients review the subjectively experienced quality of medical services supplied to them and publish an overall rating on the Internet, where quantitative grades and qualitative texts come together. On the one hand, these new possibilities reduce the imbalance of power between health care providers and patients, but on the other hand, they can also damage the usually very intimate relationship between health care providers and patients. Review websites must meet these requirements with a high level of responsibility and service quality. In this paper, we look at the situation in Lithuania: Especially, we are interested in the available possibilities of evaluation and interaction, and the quality of a particular review website measured against the available data. We thereby identify quality weaknesses and lay the foundation for future research.


NLP in OTF Computing: Current Approaches and Open Challenges

F.S. Bäumer, M. Geierhos, in: Proceedings of the 24th International Conference on Information and Software Technologies (ICIST 2018), Springer, 2018, pp. 559-570

On-The-Fly Computing is the vision of covering software needs of end users by fully-automatic compositions of existing software services. End users will receive so-called service compositions tailored to their very individual needs, based on natural language software descriptions. This everyday language may contain inaccuracies and incompleteness, which are well-known challenges in requirements engineering. In addition to existing approaches that try to automatically identify and correct these deficits, there are also new trends to involve users more in the elaboration and refinement process. In this paper, we present the relevant state of the art in the field of automated detection and compensation of multiple inaccuracies in natural language service descriptions and name open challenges that need to be tackled in NL-based software service composition.


Text Broom: A ML-based Tool to Detect and Highlight Privacy Breaches in Physician Reviews: An Insight into Our Current Work

F.S. Bäumer, M. Geierhos, in: European Conference on Data Analysis (ECDA 2018), Paderborn, Germany, 2018


Towards a Multi-Stage Approach to Detect Privacy Breaches in Physician Reviews

F.S. Bäumer, J. Kersting, M. Orlikowski, M. Geierhos, in: Proceedings of the Posters and Demos Track of the 14th International Conference on Semantic Systems co-located with the 14th International Conference on Semantic Systems (SEMANTiCS 2018), CEUR-WS.org, 2018

Physician Review Websites allow users to evaluate their experiences with health services. As these evaluations are regularly contextualized with facts from users’ private lives, they often accidentally disclose personal information on the Web. This poses a serious threat to users’ privacy. In this paper, we report on early work in progress on “Text Broom”, a tool to detect privacy breaches in user-generated texts. For this purpose, we conceptualize a pipeline which combines methods of Natural Language Processing such as Named Entity Recognition, linguistic patterns and domain-specific Machine Learning approaches which have the potential to recognize privacy violations with wide coverage. A prototypical web application is openly accessible.
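The linguistic-pattern component of such a pipeline can be pictured with a few regular expressions. This is a hand-rolled sketch, not the actual Text Broom implementation (which additionally combines Named Entity Recognition and trained ML models); the pattern names and example review are invented.

```python
import re

# Minimal pattern-based detector for potential privacy disclosures
# in review text. Each named pattern flags one category of
# personal detail; matches are returned with their category.

PATTERNS = {
    "age": re.compile(r"\b\d{1,3} years? old\b", re.IGNORECASE),
    "date": re.compile(r"\b\d{1,2}[./]\d{1,2}[./]\d{2,4}\b"),
    "medication_dose": re.compile(r"\b\d+\s?mg\b", re.IGNORECASE),
}

def find_disclosures(review):
    """Return (category, matched span) pairs for potential breaches."""
    hits = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(review):
            hits.append((category, m.group()))
    return hits

review = "My 67 years old mother got 50 mg of X on 12.03.2018."
print(find_disclosures(review))
# -> [('age', '67 years old'), ('date', '12.03.2018'),
#     ('medication_dose', '50 mg')]
```

In a full pipeline, such pattern hits would be merged with NER output and ML predictions before anything is highlighted to the user.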


Flexible Ambiguity Resolution and Incompleteness Detection in Requirements Descriptions via an Indicator-based Configuration of Text Analysis Pipelines

F.S. Bäumer, M. Geierhos, in: Proceedings of the 51st Hawaii International Conference on System Sciences, 2018, pp. 5746-5755

Natural language software requirements descriptions enable end users to formulate their wishes and expectations for a future software product without much prior knowledge in requirements engineering. However, these descriptions are susceptible to linguistic inaccuracies such as ambiguities and incompleteness that can harm the development process. There are a number of software solutions that can detect deficits in requirements descriptions and partially solve them, but they are often hard to use and not suitable for end users. For this reason, we develop a software system that helps end users to create unambiguous and complete requirements descriptions by combining existing expert tools and controlling them using automatic compensation strategies. In order to recognize the necessity of individual compensation methods in the descriptions, we have developed linguistic indicators, which we present in this paper. Based on these indicators, the whole text analysis pipeline is ad-hoc configured and thus adapted to the individual circumstances of a requirements description.
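The indicator idea can be pictured as a mapping from surface cues to compensation stages that is evaluated per requirement description. The cues and stage names below are invented examples for illustration, not the indicators developed in the paper.

```python
# Indicator-driven pipeline configuration sketch: each indicator
# pairs a list of surface cues with the compensation stage it
# triggers. Only stages whose indicators fire are put into the
# pipeline for a given requirement description.

INDICATORS = {
    "vague_quantifier": (["some", "several", "fast"], "vagueness_resolver"),
    "open_list": (["etc", "and so on"], "incompleteness_detector"),
}

def configure_pipeline(requirement):
    """Select compensation stages whose indicators fire on the text."""
    words = requirement.lower().replace(".", "").split()
    stages = []
    for name, (cues, stage) in INDICATORS.items():
        if any(cue in words for cue in cues):
            stages.append(stage)
    return stages

req = "The app should load fast and support PDF, DOCX etc."
print(configure_pipeline(req))
# -> ['vagueness_resolver', 'incompleteness_detector']
```

A description without such cues would yield an empty stage list, so no compensation step runs unnecessarily.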


Unschärfe bei der Interpretation natürlichsprachlicher Anforderungsbeschreibungen

M. Geierhos, in: Unschärfe - Der Umgang mit fehlender Eindeutigkeit, 1st ed., Ferdinand Schöningh, 2018, pp. 111-128

Precision is no accident. It is brought about by humans, either by striving for agreement with a standard or an accepted value, or by requiring the reproducibility of experiments to be as high as possible. But what to do when precision cannot be established for lack of available information? How do science and art then deal with this missing unambiguity? From the perspective of their respective disciplines, the authors of this edited volume shed light on the opportunities of taking fuzziness into account in their research and art. Because fuzziness is reality.


How to Deal with Inaccurate Service Requirements? Insights in Our Current Approach and New Ideas

F.S. Bäumer, M. Geierhos, in: Joint Proceedings of REFSQ-2018 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 23rd International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2018), CEUR-WS.org, 2018

The main idea in On-The-Fly Computing is to automatically compose existing software services according to the wishes of end-users. However, since user requirements are often ambiguous, vague and incomplete, the selection and composition of suitable software services is a challenging task. In this paper, we present our current approach to improve requirement descriptions before they are used for software composition. This procedure is fully automated, but also has limitations, for example, if necessary information is missing. In addition, and in response to the limitations, we provide insights into our above-mentioned current work that combines the existing optimization approach with a chatbot solution.


CORDULA: Software Requirements Extraction Utilizing Chatbot as Communication Interface

E. Friesen, F.S. Bäumer, M. Geierhos, in: Joint Proceedings of REFSQ-2018 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 23rd International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2018), CEUR-WS.org, 2018

Natural language requirement descriptions are often unstructured, contradictory and incomplete and are therefore challenging for automatic processing. Although many of these deficits can be compensated by means of Natural Language Processing, there still remain cases where interaction with end-users is necessary for clarification. In this paper, we present our idea of using chatbot technology to establish end-user communication in order to support the automatic compensation of some deficits in natural language requirement descriptions.


Back to Basics: Extracting Software Requirements with a Syntactic Approach

M. Caron, F.S. Bäumer, M. Geierhos, in: Joint Proceedings of REFSQ-2018 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 23rd International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2018), CEUR-WS.org, 2018

As our world grows in complexity, companies and employees alike need, more than ever before, solutions tailored to their exact needs. Since such tools cannot always be purchased off-the-shelf and need to be designed from the ground up, developers rely on software requirements. In this paper, we present our vision of a syntactic rule-based extraction tool for software requirements specification documents. In contrast to other methods, our tool will allow stakeholders to express their needs and wishes in unfiltered natural language, which we believe is essential for non-expert users.


Supporting the Cognitive Process in Annotation Tasks

N. Seemann, M. Geierhos, M. Merten, D. Tophinke, M.D. Wever, E. Hüllermeier, in: Postersession Computerlinguistik der 40. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, Stuttgart, Germany, 2018


2017

Privacy Matters: Detecting Nocuous Patient Data Exposure in Online Physician Reviews

F.S. Bäumer, N. Grote, J. Kersting, M. Geierhos, in: Proceedings of the 23rd International Conference on Information and Software Technologies, Communications in Computer and Information Science, Springer International Publishing, 2017, pp. 77-89


Using Sentiment Analysis on Local Up-to-the-Minute News: An Integrated Approach

J. Kersting, M. Geierhos, in: Proceedings of the 23rd International Conference on Information and Software Technologies, Communications in Computer and Information Science, Springer International Publishing, 2017, pp. 528-538


Guesswork? Resolving Vagueness in User-Generated Software Requirements

M. Geierhos, F.S. Bäumer, in: Partiality and Underspecification in Information, Languages, and Knowledge, Cambridge Scholars Publishing, 2017, pp. 65-108


Studying Software Descriptions in SourceForge and App Stores for a better Understanding of real-life Requirements

F.S. Bäumer, M. Dollmann, M. Geierhos, in: Proceedings of the 2nd ACM SIGSOFT International Workshop on App Market Analytics, ACM, 2017, pp. 19-25


Internet of Things Architecture for Handling Stream Air Pollution Data

J. Kersting, M. Geierhos, H. Jung, T. Kim, in: Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security, SCITEPRESS, 2017, pp. 117-124

In this paper, we present an IoT architecture which handles stream sensor data of air pollution. Particle pollution is known as a serious threat to human health. Along with developments in the use of wireless sensors and the IoT, we propose an architecture that flexibly measures and processes stream data collected in real-time by movable and low-cost IoT sensors. Thus, it enables a widespread network of wireless sensors that can follow changes in human behavior. Apart from stating reasons for the need for such a development and its requirements, we provide a conceptual design as well as a technological design of such an architecture. The technological design consists of Kaa and Apache Storm, which can collect air pollution information in real-time and solve various data-processing problems such as missing data and synchronization. This enables us to add a simulation in which we examine issues that might come up when our architecture is in use. Together with these issues, we state reasons for choosing specific modules among candidates. Our architecture combines wireless sensors with the Kaa IoT framework, an Apache Kafka pipeline and an Apache Storm Data Stream Management System, among others. We even provide open-government data sets that are freely available.
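One preprocessing problem named in the abstract is missing data in the sensor stream. As a minimal illustration only (the gap-filling policy and data layout here are assumptions, not the architecture's actual design), interior gaps in a particulate-matter time series can be closed by linear interpolation:

```python
# Illustrative sketch: fill interior None gaps in a PM10 reading series
# by linear interpolation between the nearest known neighbours.
# This is not the paper's implementation, just one plausible policy.

def fill_missing(readings):
    """Replace interior None gaps in a time series by linear interpolation."""
    filled = list(readings)
    for i, value in enumerate(filled):
        if value is None:
            # nearest known values to the left and right of the gap
            left = next(j for j in range(i - 1, -1, -1) if filled[j] is not None)
            right = next(j for j in range(i + 1, len(filled)) if filled[j] is not None)
            weight = (i - left) / (right - left)
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled

print(fill_missing([20.0, None, None, 50.0]))  # [20.0, 30.0, 40.0, 50.0]
```

Gaps at the start or end of the series would need a separate policy (e.g. carrying the nearest value), which this sketch deliberately leaves out.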


Annotation Challenges for Reconstructing the Structural Elaboration of Middle Low German

N. Seemann, M. Merten, M. Geierhos, D. Tophinke, E. Hüllermeier, in: Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Association for Computational Linguistics (ACL), 2017, pp. 40-45

In this paper, we present the annotation challenges we have encountered when working on a historical language that was undergoing elaboration processes. We especially focus on syntactic ambiguity and gradience in Middle Low German, which causes uncertainty to some extent. Since current annotation tools consider construction contexts and the dynamics of the grammaticalization only partially, we plan to extend CorA – a web-based annotation tool for historical and other non-standard language data – to capture elaboration phenomena and annotator unsureness. Moreover, we seek to interactively learn morphological as well as syntactic annotations.


Using Sentiment Analysis on Local Up-to-the-Minute News: An Integrated Approach

J. Kersting, M. Geierhos, in: Information and Software Technologies: 23rd International Conference, ICIST 2017, Druskininkai, Lithuania, October 12–14, 2017, Proceedings, Springer, 2017, pp. 528-538

In this paper, we present a search solution that makes local news information easily accessible. In the era of fake news, we provide an approach for accessing news information through opinion mining. This enables users to view news on the same topics from different web sources. By applying sentiment analysis on social media posts, users can better understand how issues are captured and see people’s reactions. Therefore, we provide a local search service that first localizes news articles, then visualizes their occurrence according to the frequency of mentioned topics on a heatmap and even shows the sentiment score for each text.


Privacy Matters: Detecting Nocuous Patient Data Exposure in Online Physician Reviews

F.S. Bäumer, N. Grote, J. Kersting, M. Geierhos, in: Information and Software Technologies: 23rd International Conference, ICIST 2017, Druskininkai, Lithuania, October 12–14, 2017, Proceedings, Springer, 2017, pp. 77-89

Consulting a physician was long regarded as an intimate and private matter. The physician-patient relationship was perceived as sensitive and trustful. Nowadays, there is a change, as medical procedures and physician consultations are reviewed like other services on the Internet. To allay users' privacy doubts, physician review websites assure anonymity and the protection of private data. However, there are hundreds of reviews that reveal private information and hence enable physicians or the public to identify patients. Thus, we draw attention to the cases when de-anonymization is possible. We therefore introduce an approach that highlights private information in physician reviews for users to avoid an accidental disclosure. For this reason, we combine established natural-language-processing techniques such as named entity recognition as well as handcrafted patterns to achieve a high detection accuracy. That way, we can help websites to increase privacy protection by recognizing and uncovering apparently uncritical information in user-generated texts.


Guesswork? Resolving Vagueness in User-Generated Software Requirements

M. Geierhos, F.S. Bäumer, in: Partiality and Underspecification in Information, Languages, and Knowledge, 1st ed., Cambridge Scholars Publishing, 2017, pp. 65-108

In recent years, there has been a proliferation of technological developments that incorporate processing of human language. Hardware and software can be specialized for designated subject areas, and computational devices are designed for a widening variety of applications. At the same time, new areas and applications are emerging by demanding intelligent technology enhanced by the processing of human language. These new applications often perform tasks which handle information, and they have a capacity to reason, using both formal and human language. Many sub-areas of Artificial Intelligence demand integration of Natural Language Processing, at least to some degree. Furthermore, technologies require coverage of known as well as unknown agents, and tasks with potential variations. All of this takes place in environments with unknown factors. The book covers theoretical work, advanced applications, approaches, and techniques for computational models of information, reasoning systems, and presentation in language. The book promotes work on intelligent natural language processing and related models of information, thought, reasoning, and other cognitive processes. The topics covered by the chapters prompt further research and developments of advanced systems in the areas of logic, computability, computational linguistics, cognitive science, neuroscience of language, robotics, and artificial intelligence, among others.


Studying Software Descriptions in SourceForge and App Stores for a better Understanding of real-life Requirements

F.S. Bäumer, M. Dollmann, M. Geierhos, in: Proceedings of the 2nd ACM SIGSOFT International Workshop on App Market Analytics, ACM, 2017, pp. 19-25

Users prefer natural language software requirements because of their usability and accessibility. Many approaches exist to elaborate these requirements and to support the users during the elicitation process. But there is a lack of adequate resources, which are needed to train and evaluate approaches for requirement refinement. We are trying to close this gap by using online available software descriptions from SourceForge and app stores. Thus, we present two real-life requirements collections based on online-available software descriptions. Our goal is to show the domain-specific characteristics of content words describing functional requirements. On the one hand, we created a semantic role-labeled requirements set, which we use for requirements classification. On the other hand, we enriched software descriptions with linguistic features and dependencies to provide evidence for the context-awareness of software functionalities.


From User Demand to Software Service: Using Machine Learning to Automate the Requirements Specification Process

L. van Rooijen, F.S. Bäumer, M.C. Platenius, M. Geierhos, H. Hamann, G. Engels, in: 2017 IEEE 25th International Requirements Engineering Conference Workshops (REW), IEEE, 2017, pp. 379-385

Bridging the gap between informal, imprecise, and vague user requirements descriptions and precise formalized specifications is the main task of requirements engineering. Techniques such as interviews or storytelling are used when requirements engineers try to identify a user's needs. The requirements specification process is typically done in a dialogue between users, domain experts, and requirements engineers. In our research, we aim at automating the specification of requirements. The idea is to distinguish between untrained users and trained users, and to exploit domain knowledge learned from previous runs of our system. We let untrained users provide unstructured natural language descriptions, while we allow trained users to provide examples of behavioral descriptions. In both cases, our goal is to synthesize formal requirements models similar to statecharts. From requirements specification processes with trained users, behavioral ontologies are learned which are later used to support the requirements specification process for untrained users. Our research method is original in combining natural language processing and search-based techniques for the synthesis of requirements specifications. Our work is embedded in a larger project that aims at automating the whole software development and deployment process in envisioned future software service markets.


Semantic Annotation of Software Requirements with Language Frame

Y. Kim, S. Lee, M. Dollmann, M. Geierhos, International Journal of Software Engineering for Smart Device (2017), 4(2), pp. 1-6

An end user generally writes down software requirements in ambiguous expressions using natural language; hence, a software developer attuned to programming languages finds it difficult to understand the meaning of the requirements. To solve this problem, we define semantic categories for disambiguation and classify/annotate the requirements into these categories by using machine-learning models. We extensively use a language frame closely related to such categories for designing features to overcome the problem of insufficient training data compared to the large number of classes. Our proposed model obtained a micro-average F1-score of 0.75, outperforming the previous model, REaCT.



2016


Wie verhalten sich Aktionäre bei Unternehmenszusammenschlüssen? Modellierung sprachlicher Muster zur Analyse treibender Faktoren bei der Berichterstattung

S. Stotz, M. Geierhos, in: DHd 2016: Modellierung - Vernetzung - Visualisierung. Die Digital Humanities als fächerübergreifendes Forschungsparadigma. Konferenzabstracts, Universität Leipzig, 7. bis 12. März 2016, Nisaba-Verlag, 2016, pp. 378-381

Which information about company mergers is conveyed in newspaper reports, and how can this information be extracted automatically? This is examined using the example of shareholder behavior during a merger. To this end, the most important statements about the shareholders' vote are analyzed linguistically with a view to automatic recognition. The focus is on reports about shareholder votes regarding the acceptance or rejection of a takeover offer.


Webmonitoring

M. Geierhos, in: Enzyklopädie der Wirtschaftsinformatik, 9th ed., GITO-Verlag, 2016


Sentimentanalyse

M. Geierhos, in: Enzyklopädie der Wirtschaftsinformatik, 9th ed., GITO-Verlag, 2016


Text Mining

M. Geierhos, in: Enzyklopädie der Wirtschaftsinformatik, 9th ed., GITO-Verlag, 2016


Crawler (fokussiert / nicht fokussiert)

M. Geierhos, in: Enzyklopädie der Wirtschaftsinformatik, 9th ed., GITO-Verlag, 2016


On- and Off-Topic Classification and Semantic Annotation of User-Generated Software Requirements

M. Dollmann, M. Geierhos, in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics (ACL), 2016, pp. 1807-1816

Users prefer natural language software requirements because of their usability and accessibility. When they describe their wishes for software development, they often provide off-topic information. We therefore present an automated approach for identifying and semantically annotating the on-topic parts of the given descriptions. It is designed to support requirement engineers in the requirement elicitation process on detecting and analyzing requirements in user-generated content. Since no lexical resources with domain-specific information about requirements are available, we created a corpus of requirements written in controlled language by instructed users and uncontrolled language by uninstructed users. We annotated these requirements regarding predicate-argument structures, conditions, priorities, motivations and semantic roles and used this information to train classifiers for information extraction purposes. The approach achieves an accuracy of 92% for the on- and off-topic classification task and an F1-measure of 72% for the semantic annotation.
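The first stage described in the abstract, separating on-topic from off-topic parts of a description, can be illustrated with a deliberately naive keyword filter. The cue-word list and scoring below are hypothetical; they stand in for, and are far simpler than, the trained classifiers the paper actually uses:

```python
# Illustrative on-/off-topic filter for user-generated requirement
# descriptions. Cue words and threshold are assumptions for this sketch,
# not the classifier described in the paper.

REQUIREMENT_CUES = {"want", "need", "should", "must", "search", "display", "store"}

def is_on_topic(sentence, min_cues=1):
    """Treat a sentence as on-topic if it contains enough requirement cue words."""
    tokens = {t.strip(".,!?").lower() for t in sentence.split()}
    return len(tokens & REQUIREMENT_CUES) >= min_cues

description = [
    "I have been using computers since 1995.",            # off-topic background
    "The app should display my appointments in a list.",  # on-topic requirement
]
on_topic = [s for s in description if is_on_topic(s)]
print(on_topic)  # keeps only the second sentence
```

A learned classifier replaces the fixed cue list with features derived from annotated training data, which is what allows the reported 92% on-/off-topic accuracy.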


How to Complete Customer Requirements: Using Concept Expansion for Requirement Refinement

M. Geierhos, F.S. Bäumer, in: Proceedings of the 21st International Conference on Applications of Natural Language to Information Systems (NLDB), Springer, 2016, pp. 37-47

One purpose of requirement refinement is that higher-level requirements have to be translated into something usable by developers. Since customer requirements are often written in natural language by end users, they lack precision, completeness and consistency. Although user stories are often used in the requirement elicitation process in order to describe the possibilities of how to interact with the software, there is always something unspoken. Here, we present techniques to automatically refine vague software descriptions. Thus, we can bridge the gap by first revising natural language utterances from higher-level to more detailed customer requirements, before functionality matters. We therefore focus on the resolution of semantically incomplete user-generated sentences (i.e. non-instantiated arguments of predicates) and provide ontology-based gap-filling suggestions on how to complete unverbalized information in the user's demand.


Running out of Words: How Similar User Stories Can Help to Elaborate Individual Natural Language Requirement Descriptions

F.S. Bäumer, M. Geierhos, in: Proceedings of the 22nd International Conference on Information and Software Technologies (ICIST), Springer, 2016, pp. 549-558

While requirements focus on how the user interacts with the system, user stories concentrate on the purpose of software features. But in practice, functional requirements are also described in user stories. For this reason, requirements clarification is needed, especially when they are written in natural language and do not stick to any templates (e.g., "as an X, I want Y so that Z ..."). However, there is a lot of implicit knowledge that is not expressed in words. As a result, natural language requirements descriptions may suffer from incompleteness. Existing approaches try to formalize natural language or focus only on entirely missing and not on deficient requirements. In this paper, we therefore present an approach to detect knowledge gaps in user-generated software requirements for interactive requirement clarification: We provide tailored suggestions to the users in order to get more precise descriptions. For this purpose, we identify not fully instantiated predicate argument structures in requirements written in natural language and use context information to realize what was meant by the user.


2015


Filtering Reviews by Random Individual Error

M. Geierhos, F.S. Bäumer, S. Schulze, V. Stuß, in: Proceedings of the 28th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE 2015), Springer, 2015, pp. 305-315

Opinion mining from physician rating websites depends on the quality of the extracted information. Sometimes reviews are user-error prone and the assigned stars or grades contradict the associated content. We therefore aim at detecting random individual error within reviews. Such errors comprise the disagreement in polarity of review texts and the respective ratings. The challenges that thereby arise are (1) the content and sentiment analysis of the review texts and (2) the removal of the random individual errors contained therein. To solve these tasks, we assign polarities to automatically recognized opinion phrases in reviews and then check for divergence in rating and text polarity. The novelty of our approach is that we improve user-generated data quality by excluding error-prone reviews on German physician websites from average ratings.
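The divergence check sketched in the abstract (text polarity vs. assigned grade) can be illustrated minimally. The lexicon, the grade mapping, and the threshold below are assumptions for this sketch, not the paper's actual resources or parameters:

```python
# Illustrative sketch of detecting "random individual error": a review whose
# text polarity contradicts its numerical grade. Lexicon, grade scale mapping
# and threshold are hypothetical.

POLARITY_LEXICON = {
    "freundlich": 1.0,       # "friendly"
    "kompetent": 1.0,        # "competent"
    "unfreundlich": -1.0,    # "unfriendly"
    "lange wartezeit": -1.0  # "long waiting time"
}

def text_polarity(opinion_phrases):
    """Average polarity of the recognized opinion phrases in a review."""
    scores = [POLARITY_LEXICON[p] for p in opinion_phrases if p in POLARITY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def grade_polarity(grade):
    """Map a German school grade (1 = best, 6 = worst) onto [-1, 1]."""
    return (3.5 - grade) / 2.5

def is_inconsistent(opinion_phrases, grade, threshold=1.0):
    """Flag a review whose text polarity diverges strongly from its grade."""
    return abs(text_polarity(opinion_phrases) - grade_polarity(grade)) > threshold

# A review praising the physician but graded 6 (worst) is flagged:
print(is_inconsistent(["freundlich", "kompetent"], grade=6))  # True
print(is_inconsistent(["freundlich", "kompetent"], grade=1))  # False
```

Flagged reviews would then be excluded from average-rating computation, as the abstract describes.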


Identifikation kognitiver Effekte in Online-Bewertungen

V. Stuß, M. Geierhos, in: DHd 2015: Book of Abstracts, ZIM-ACDH, 2015, pp. 239-243



Der zufriedene Patient 2.0: Analyse anonymer Arztbewertungen zur Generierung eines Patientenstimmungsbildes

M. Geierhos, S. Schulze, ForschungsForum Paderborn (2015), 18, pp. 14-19

Nowadays, patients increasingly exchange their experiences on the Internet. Rating portals such as jameda, DocInsider or imedo.de allow patients and their relatives to voice complaints or make recommendations anonymously. At the same time, these hundreds of thousands of individual experiences make it possible to survey patient satisfaction and to examine popular beliefs, e.g. that privately insured patients get appointments faster and spend less time in the waiting room. The analysis of anonymous online physician reviews can only succeed if the interpretation of patient experience reports takes into account that factors independent of treatment quality influence subjective ratings and complaint behavior. A novel approach is therefore to identify significant indicators of patient satisfaction in the Web 2.0 in order to generate a detailed picture of patient experience and sentiment, taking demographic and regional influences into account.


"I grade what I get but write what I think." Inconsistency Analysis in Patients' Reviews

M. Geierhos, F.S. Bäumer, S. Schulze, V. Stuß, in: ECIS 2015 Completed Research Papers, Elsevier, 2015

Received medical services are increasingly discussed and recommended on physician rating websites (PRWs). The reviews and ratings on these platforms are valuable sources of information for patient opinion mining. In this paper, we have tackled three issues that come along with inconsistency analysis on PRWs: (1) Natural language processing of user-generated reviews, (2) the disagreement in polarity of review text and its corresponding numerical ratings (individual inconsistency) and (3) the differences in patients’ rating behavior for the same service category (e.g. ‘treatment’) expressed by varying grades on the entire data set (collective inconsistency). Thus, the basic idea is first to identify relevant opinion phrases that describe service categories and to determine their polarity. Subsequently, the particular phrase has to be assigned to its corresponding numerical rating category before checking the (dis-)agreement of polarity values. For this purpose, several local grammars for the pattern-based analysis as well as domain-specific dictionaries for the recognition of entities, aspects and polarity were applied on 593,633 physician reviews from both German PRWs jameda.de and docinsider.de. Furthermore, our research contributes to content quality improvement of PRWs because we provide a technique to detect inconsistent reviews that could be ignored for the computation of average ratings.


Der zufriedene Patient 2.0: Analyse anonymer Arztbewertungen im Web 2.0

M. Geierhos, S. Schulze, F.S. Bäumer, Verbraucherzentrale NRW/Kompetenzzentrum Verbraucherforschung NRW, 2015, pp. 18

Patients increasingly exchange their experiences via physician rating portals. The anonymity of the web enables largely honest complaint behavior that leaves the sensitive physician-patient relationship of trust undamaged. In this contribution, anonymous physician reviews in the Web 2.0 were analyzed automatically in order to determine factors influencing the complaint behavior of German patients and to clear up "patient myths" supposedly established in society. Uncovering such misconceptions and satisfaction indicators is intended, in the long term, to help interpret patient statements in a more differentiated way and thus contribute to a lasting improvement of the physician-patient relationship.


Find a Physician by Matching Medical Needs described in your Own Words

F.S. Bäumer, M. Dollmann, M. Geierhos, in: The 6th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2015) / The 5th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare (ICTH-2015) / Affiliated Workshops, Elsevier, 2015, pp. 417-424

The individual search for information about physicians on Web 2.0 platforms can affect almost all aspects of our lives. People can directly access physician rating websites via web browsers or use any search engine to find physician reviews and ratings filtered by location or specialty. However, sometimes keyword search does not meet user needs because the common terms users query for symptoms disagree with the widespread medical terminology. In this paper, we present the prototype of a specialised search engine that overcomes this by indexing user-generated content (i.e., review texts) for physician discovery and provides automatic suggestions as well as an appropriate visualisation. On the one hand, we consider the available numeric physician ratings as sorting criterion for the ranking of query results. On the other hand, we extended existing ranking algorithms with respect to domain-specific types and physician ratings. We gathered more than 860,000 review texts and collected more than 213,000 physician records. A random test shows that about 19.7% of 5,100 different words in total are health-related and partly belong to consumer health vocabularies. Our evaluation results show that the query results fit users' particular health issues when searching for physicians.


A System for Uncovering Latent Connectivity of Health Care Providers in Online Reviews

F.S. Bäumer, M. Geierhos, S. Schulze, in: Information and Software Technologies. 21st International Conference, ICIST 2015, Druskininkai, Lithuania, October 15-16, 2015. Proceedings, Springer, 2015, pp. 3-15

The contacts a health care provider (HCP), like a physician, has to other HCPs are perceived as a quality characteristic by patients. So far, only the German physician rating website jameda.de gives information about the interconnectedness of HCPs in business networks. However, this network has to be maintained manually and is thus incomplete. We therefore developed a system for uncovering latent connectivity of HCPs in online reviews to provide users with more valuable information about their HCPs. The overall goal of this approach is to extend already existing business networks of HCPs by integrating connections that are newly discovered by our system. Our most recent evaluation results are promising: 70.8% of the connections extracted from the review texts were correctly identified, and in total 3,788 relations were recognized that had not been displayed in jameda.de's network before.


Understanding the Patient 2.0: Gaining Insight into Patients' Rating Behavior by User-generated Physician Review Mining

M. Geierhos, F.S. Bäumer, S. Schulze, C. Klotz, in: Modeling and Using Context. 9th International and Interdisciplinary Conference, CONTEXT 2015, Larnaca, Cyprus, November 2-6, 2015. Proceedings, Springer, 2015, pp. 159-171

Patients 2.0 increasingly inform themselves about the quality of medical services on physician rating websites. However, little is known about whether the reviews and ratings on these websites truly reflect the quality of services or whether the ratings on these websites are rather influenced by patients’ individual rating behavior. Therefore, we investigate more than 790,000 physician reviews on Germany’s most used physician rating website jameda.de. Our results show that patients’ ratings do not only reflect treatment quality but are also influenced by treatment quality independent factors like age and complaint behavior. Hence, we provide evidence that users should be well aware of user specific rating distortions when intending to make their physician choice based on these ratings.


What did you mean? Facing the Challenges of User-generated Software Requirements

M. Geierhos, S. Schulze, F.S. Bäumer, in: Proceedings of the 7th International Conference on Agents and Artificial Intelligence (ICAART), Special Session on Partiality, Underspecification, and Natural Language Processing (PUaNLP 2015), SciTePress - Science and Technology Publications, 2015, pp. 277-283

Existing approaches towards service composition demand requirements of the customers in terms of service templates, service query profiles, or partial process models. However, addressed non-expert customers may be unable to fill in the slots of service templates as requested or to describe, for example, pre- and postconditions, or even have difficulties in formalizing their requirements. Thus, our idea is to provide non-experts with suggestions on how to complete or clarify their requirement descriptions written in natural language. Two main issues have to be tackled: (1) partial or full inability (incapacity) of non-experts to specify their requirements correctly in formal and precise ways, and (2) problems in text analysis due to fuzziness in natural language. We present ideas how to face these challenges by means of requirement disambiguation and completion. Therefore, we conduct ontology-based requirement extraction and similarity retrieval based on requirement descriptions that are gathered from App marketplaces. The innovative aspect of our work is that we support users without expert knowledge in writing their requirements by simultaneously resolving ambiguity, vagueness, and underspecification in natural language.


2014


SentiBA: Lexicon-based Sentiment Analysis on German Product Reviews

M. Dollmann, M. Geierhos, in: Konvens 2014 Workshop Proceedings, 2014, pp. 185-191


Linked Open Data System for Scientific Data Sets

F.S. Baeumer, J. Gim, D. Jeong, M. Geierhos, H. Jung, in: Proceedings of the First International Workshop on Patent Mining and Its Applications (IPAMIN) 2014, 2014




Towards a Local Grammar-based Persondata Generator for Wikipedia Biographies

M. Geierhos, in: Penser le Lexique-Grammaire, Honoré Champion, 2014, pp. 411-420

Finding information about people in the World Wide Web is one of the most common activities of Internet users. It is now impossible to manually analyze all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. The Wikipedia community still puts much effort in manually adding structured data to biographical articles, the so-called {{Persondata}} template. Thanks to this kind of metadata, semantically-enriched information concerning the biographee (e.g. name, date of birth, place of birth) can be extracted and processed by search engines. But it is a rather time-consuming task and users quite often forget to add this template: some biographies contain persondata, others do not. There is considerably less work done on developing approaches to automatically enhance English Wikipedia biographies with persondata and therefore improve the quality of structured user contributions. Within this paper, we describe our method to automatically generate persondata from biographical information in Wikipedia articles.


Comparative study on disambiguating acronyms in the scientific papers using the open knowledge base

D. Jeong, J. Gim, H. Jung, M. Geierhos, F.S. Bäumer, in: Conference Proceedings of the 9th Asia Pacific International Conference on Information Science and Technology (APIC-IST 2014), 2014, pp. 369-371

In this paper, we focus on acronym representation, i.e. the abbreviation of major terminology. To this end, we try to find the most efficient method to disambiguate the sense of an acronym. Comparing the various feature types, we found that single-noun (NN) features overwhelmingly outperformed the noun-phrase (NP) baseline. Moreover, the results also showed that collocation information (CL) did not enhance performance enough to justify the substantial extra data processing. We expect to apply the open knowledge base Wikipedia to scholarly services to enhance the quality of the local knowledge base and to develop value-added services.
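The single-noun feature comparison described here can be sketched as a bag-of-nouns overlap against candidate expansions. The candidate senses and the context nouns below are invented examples, not data from the paper:

```python
# Invented candidate expansions for the acronym "CA", each with a set of
# characteristic context nouns (the NN feature type from the abstract).
SENSES = {
    "Conceptual Analysis": {"philosophy", "concept", "argument"},
    "Correspondence Analysis": {"matrix", "statistics", "variance"},
}

def disambiguate(context_nouns):
    """Pick the expansion whose context nouns overlap most with the input."""
    context = set(context_nouns)
    return max(SENSES, key=lambda sense: len(context & SENSES[sense]))

print(disambiguate(["matrix", "variance", "data"]))
```

In practice the sense inventory would be harvested from an open knowledge base such as Wikipedia rather than hand-written.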


The same but not the same - Challenges in comparing patient opinions

M. Geierhos, S. Schulze, in: Challenges for Consumer Research and Consumer Policy in Europe, Bonn, Germany, 2014


System Thinking: Crafting Scenarios for Prescriptive Analytics

J. Weber, C. Minhee, M. Lee, S. Song, M. Geierhos, H. Jung, in: Proceedings of the First International Workshop on Patent Mining and Its Applications (IPaMin 2014) co-located with Konvens 2014, CEUR-WS.org, 2014

This paper focuses on the first step in combining prescriptive analytics with scenario techniques in order to provide strategic development after the use of InSciTe, a prescriptive data analytics application. InSciTe supports the improvement of researchers' individual performance by recommending new research directions. Standardized influential factors are presented as a foundation for automated scenario modelling, such as the prototypical report generation function of InSciTe. Additionally, a use case is shown which validates the potential of the standardized influential factors for raw scenario development.


SentiBA: Lexicon-based Sentiment Analysis on German Product Reviews

M. Dollmann, M. Geierhos, in: Workshop Proceedings of the 12th Edition of the KONVENS Conference, Universitätsverlag Hildesheim, 2014, pp. 185-191

In this paper, we describe our system developed for the GErman SenTiment AnaLysis shared Task (GESTALT) for participation in Maintask 2: Subjective Phrase and Aspect Extraction from Product Reviews. We present a tool which identifies subjective and aspect phrases in German product reviews. For the recognition of subjective phrases, we pursue a lexicon-based approach. For the extraction of aspect phrases from the reviews, we consider two possible ways: besides the subjectivity and aspect look-up, we also implemented a method to establish which subjective phrase belongs to which aspect. The system achieves better results for the recognition of aspect phrases than for the identification of subjective phrases.
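The lexicon-based look-up described in this abstract can be sketched as follows. The toy lexicon, aspect list, and review are invented for illustration and are unrelated to the actual GESTALT resources (which are German):

```python
# Toy subjectivity lexicon and aspect list (illustrative only).
SUBJECTIVE = {"great": "positive", "terrible": "negative", "flimsy": "negative"}
ASPECTS = {"battery", "display", "case"}

def tag_review(tokens):
    """Mark each token as subjective phrase, aspect phrase, or neither (O)."""
    tags = []
    for token in tokens:
        word = token.lower()
        if word in SUBJECTIVE:
            tags.append((token, "SUBJECTIVE:" + SUBJECTIVE[word]))
        elif word in ASPECTS:
            tags.append((token, "ASPECT"))
        else:
            tags.append((token, "O"))
    return tags

print(tag_review("The battery is great but the case feels flimsy".split()))
```

Linking each subjective phrase to the aspect it describes, the harder subtask mentioned above, would require an additional step on top of this look-up.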


Linked Open Data System for Scientific Data Sets

F.S. Bäumer, J. Gim, D. Jeong, M. Geierhos, H. Jung, in: Proceedings of the First International Workshop on Patent Mining and Its Applications (IPaMin 2014) co-located with Konvens 2014, CEUR-WS.org, 2014

In this paper, we present a system which makes scientific data available following the linked open data principle using standards like RDF and URI as well as the popular D2R server (D2R) and the customizable D2RQ mapping language. Our scientific data sets include acronym data and expansions, as well as researcher data such as author name, affiliation, coauthors, and abstracts. The system can easily be extended to other records. Regarding this, a domain adaptation to patent mining seems possible. For this reason, obvious similarities and differences are presented here. The data set is collected from several different providers like publishing houses and digital libraries, which follow different standards in data format and structure. Most of them are not supporting semantic web technologies, but the legacy HTML standard. The integration of these large amounts of scientific data into the Semantic Web is challenging and it needs flexible data structures to access this information and interlink them. Based on these data sets, we will be able to derive a general technology trend as well as the individual research domain for each researcher. The goal of our Linked Open Data System for scientific data is to provide access to this data set for other researchers using the Web of Linked Data. Furthermore, we implemented an application for visualization, which allows us to explore the relations between single data sets.
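The publication step of such a system can be illustrated with a minimal N-Triples serialization. The URIs and the researcher record below are invented stand-ins; the actual system relies on a D2R server with D2RQ mappings rather than hand-rolled serialization:

```python
def to_ntriples(subject_uri: str, properties: dict) -> str:
    """Serialize a flat property map as N-Triples (one RDF triple per line)."""
    lines = []
    for predicate_uri, value in properties.items():
        lines.append('<%s> <%s> "%s" .' % (subject_uri, predicate_uri, value))
    return "\n".join(lines)

# Hypothetical researcher record with made-up URIs.
triples = to_ntriples(
    "http://example.org/researcher/42",
    {
        "http://xmlns.com/foaf/0.1/name": "Jane Doe",
        "http://example.org/vocab/affiliation": "Example University",
    },
)
print(triples)
```

Exposing legacy relational or HTML-scraped records this way is exactly what D2RQ automates via declarative mappings.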


2012

Scientific Literature Retrieval based on Terminological Paraphrases using Predicate Argument Tuple

S. Choi, S. Song, H. Jung, M. Geierhos, S.H. Myaeng, in: Information Science and Industrial Applications: Proceedings, International Conference, ISI 2012, Cebu, Philippines, May 2012, SERSC, 2012, pp. 371-378

The conceptual condensability of technical terms permits us to use them as effective queries to search scientific databases. However, authors often employ alternative expressions to represent the meanings of specific terms, in other words, Terminological Paraphrases (TPs) in the literature for certain reasons. In this paper, we propose an effective way to retrieve “de facto relevance documents” which only contain those TPs and cannot be searched by conventional models in an environment with only controlled vocabularies by adapting Predicate Argument Tuple (PAT). The experiment confirms that PAT-based document retrieval is an effective and promising method to search those kinds of documents and to improve terminology-based scientific information access models.


A Proof-of-Concept of D³ Record Mining using Domain-Dependent Data

Y.S. Lee, M. Geierhos, S. Song, H. Jung, in: Software Technology: Proceedings, International Conference, SoftTech 2012, Cebu, Philippines, May 2012, SERSC, 2012, pp. 134-139

Our purpose is to perform data record extraction from online event calendars exploiting sublanguage and domain characteristics. We therefore use so-called domain-dependent data (D³) completely based on language-specific key expressions and HTML patterns to recognize every single event given on the investigated web page. One of the most remarkable advantages of our method is that it does not require any additional classification steps based on machine learning algorithms or keyword extraction methods; it is a so-called one-step mining technique. Moreover, another important criterion is that our system is robust to DOM and layout modifications made by web designers. Thus, preliminary experimental results are provided to demonstrate the proof of concept of such an approach tested on websites in the German opera domain. Furthermore, we could show that our proposed technique outperforms other data record mining applications run on event sites.
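The one-step, pattern-based record mining idea can be sketched in miniature. The HTML snippet and the date pattern are invented examples; the actual system combines many language-specific key expressions with HTML patterns:

```python
import re

# Invented pattern: an event record as '<li>TITLE - DD.MM.YYYY</li>'.
EVENT_PATTERN = re.compile(
    r"<li>(?P<title>[^<]+?)\s*-\s*(?P<date>\d{2}\.\d{2}\.\d{4})</li>"
)

def mine_events(html: str):
    """Extract (title, date) records directly from list-item patterns,
    without any separate classification step."""
    return [(m.group("title").strip(), m.group("date"))
            for m in EVENT_PATTERN.finditer(html)]

page = "<ul><li>Die Zauberfloete - 12.05.2012</li><li>Carmen - 19.05.2012</li></ul>"
print(mine_events(page))
```

Because the extraction keys on local text patterns rather than on absolute DOM positions, moderate layout changes leave it unaffected, which is the robustness property the abstract claims.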


Customer Interaction Management goes Social: Getting Business Processes plugged in Social Networks

M. Geierhos, M. Ebrahim, in: Computational Social Networks: Tools, Perspectives and Applications, Springer, 2012, pp. 367-389

Within this chapter, we will describe a novel technical service dealing with the integration of social networking channels into existing business processes. Since many businesses are moving to online communities as a means of communicating directly with their customers, social media has to be explored as an additional communication channel between individuals and companies. While the English-speaking consumers on Facebook are more likely to respond to communication rather than to initiate communication with an organisation, some German companies already have regularly updated Facebook pages for customer service and support, e.g. Telekom. Therefore, the idea of classifying and evaluating public comments addressed to German companies is based on an existing demand. In order to maintain an active Facebook wall, the consumer posts have to be categorised and then automatically assigned to the corresponding business processes (e.g. the technical service, shipping, marketing, accounting, etc.). This service works like an issue tracking system sending e-mails to the corresponding person in charge of customer service and support. That way, business process management systems which are already used to e-mail communication can benefit from social media. This allows the company to follow general trends in customer opinions on the Internet; moreover, it facilitates the recording of two-sided communication for customer relationship management and the company’s response will be delivered through the consumer’s preferred medium: Facebook.


2011

A Social Media Customer Service

M. Geierhos, Y.S. Lee, J. Schuster, D. Kobothanassi, M. Bargel, in: Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, ACM, 2011


SCM - A Simple, Modular and Flexible Customer Interaction Management System

J. Schuster, Y.S. Lee, D. Kobothanassi, M. Bargel, M. Geierhos, in: International Conference on Information Society (i-Society 2011), IEEE, 2011, pp. 153-158

SCM is a simple, modular and flexible system for web monitoring and customer interaction management. In our view, its main advantages are the following: It is completely web based. It combines all technologies, data, software agents and human agents involved in the monitoring and customer interaction process. It can be used for messages written in any natural language. Although the prototype of SCM is designed for classifying and processing messages about mobile-phone related problems in social networks, SCM can easily be adapted to other text types such as discussion board posts, blogs or emails. Unlike comparable systems, SCM uses linguistic technologies to classify messages and recognize paraphrases of product names. For two reasons, product name paraphrasing plays a major role in SCM: First, product names typically have many, sometimes hundreds or thousands of intralingual paraphrases. Secondly, product names have interlingual paraphrases: The same products are often called or spelt differently in different countries and/or languages. By mapping product name variants to an international canonical form, SCM allows for answering questions like Which statements are made about this mobile phone in which languages/in which social networks/in which countries/...? The SCM product name paraphrasing engine is designed in such a way that standard variants are assigned automatically, regular variants are assigned semiautomatically and idiosyncratic variants can be added manually. With this and similar features we try to realize our philosophy of simplicity, modularity and flexibility: Whatever can be done automatically is done automatically. But manual intervention is always possible and easy and it does not conflict in any way with the automatic functions of SCM.
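The canonicalization of product name variants described above can be sketched as a normalization table. The variant spellings and canonical names below are invented examples; SCM itself assigns standard variants automatically and regular variants semiautomatically:

```python
# Invented variant table mapping spellings to an international canonical form.
CANONICAL = {
    "iphone4": "Apple iPhone 4",
    "i-phone 4": "Apple iPhone 4",
    "galaxy s2": "Samsung Galaxy S II",
    "galaxys2": "Samsung Galaxy S II",
}

def canonicalize(mention: str) -> str:
    """Map a product name mention to its canonical form; unknown
    (idiosyncratic) variants pass through unchanged until added manually."""
    key = mention.strip().lower()
    return CANONICAL.get(key, mention)

print(canonicalize("GalaxyS2"))
```

Once all mentions share a canonical form, queries like "which statements are made about this phone in which countries?" reduce to a simple group-by over the canonical name.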


Buy, Sell, or Hold? Information Extraction from Stock Analyst Reports

Y.S. Lee, M. Geierhos, in: Modeling and Using Context: 7th International and Interdisciplinary Conference, CONTEXT 2011, Karlsruhe, Germany, September 26-30, 2011, Proceedings, Springer, 2011, pp. 173-184

This paper presents a novel linguistic information extraction approach exploiting analysts’ stock ratings for statistical decision making. Over a period of one year, we gathered German stock analyst reports in order to determine market trends. Our goal is to provide business statistics over time to illustrate market trends for a user-selected company. We therefore recognize named entities within the very short stock analyst reports such as organization names (e.g. BASF, BMW, Ericsson), analyst houses (e.g. Gartner, Citigroup, Goldman Sachs), ratings (e.g. buy, sell, hold, underperform, recommended list) and price estimations by using lexicalized finite-state graphs, so-called local grammars. Then, company names and their acronyms have to be cross-checked against the data the analysts provide. Finally, all extracted values are compared and presented in charts with different views depending on the evaluation criteria (e.g. by time line). Thanks to this approach, it will be easier and more convenient in the future to follow analysts’ buy/sell signals without reading all their reports.
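The rating extraction step can be illustrated with a toy pattern. The rating vocabulary is taken from the examples in the abstract, but the regular expression and the sample sentence are invented simplifications of the paper's lexicalized finite-state graphs:

```python
import re

# Rating vocabulary from the abstract's examples (toy approximation of
# the local grammars used in the paper).
RATINGS = ("buy", "sell", "hold", "underperform", "recommended list")

def extract_rating(report: str):
    """Return the first analyst rating mentioned in a short report, if any."""
    pattern = re.compile(r"\b(" + "|".join(RATINGS) + r")\b", re.IGNORECASE)
    match = pattern.search(report)
    return match.group(1).lower() if match else None

print(extract_rating("Goldman Sachs keeps BASF on hold with a target of 52 EUR"))
```

A real local grammar would additionally anchor the rating to the analyst house and the rated company so that the values can be aggregated per company over time.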


Processing Multilingual Customer Contacts via Social Media

M. Geierhos, Y.S. Lee, M. Bargel, in: Multilingual Resources, Multilingual Applications: Proceedings of the Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2011, University of Hamburg, 2011, pp. 219-222

Within this paper, we will describe a new approach to customer interaction management by integrating social networking channels into existing business processes. Until now, contact center agents still read these messages and forward them to the responsible persons in the company. But with the introduction of Web 2.0 and social networking, clients are more likely to communicate with companies via Facebook and Twitter instead of filling in contact forms or sending e-mail requests. In order to maintain an active communication with international clients via social media, the multilingual consumer contacts have to be categorized and then automatically assigned to the corresponding business processes (e.g. technical service, shipping, marketing, and accounting). This allows the company to follow general trends in customer opinions on the Internet, but also to record two-sided communication for customer relationship management.
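The assignment of consumer contacts to business processes can be sketched as a simple keyword-based router. The process names follow the abstract's examples, but the keyword sets and the fallback process are illustrative assumptions, not the system's actual classifier:

```python
# Illustrative keyword sets per business process (invented).
PROCESS_KEYWORDS = {
    "shipping": {"delivery", "package", "shipped"},
    "technical service": {"broken", "crash", "repair"},
    "accounting": {"invoice", "refund", "charged"},
}

def route_post(post: str) -> str:
    """Assign a consumer post to the business process with most keyword hits;
    fall back to marketing when nothing matches (an assumed default)."""
    tokens = set(post.lower().split())
    best, hits = "marketing", 0
    for process, keywords in PROCESS_KEYWORDS.items():
        overlap = len(tokens & keywords)
        if overlap > hits:
            best, hits = process, overlap
    return best

print(route_post("Where is my package delivery"))
```

A multilingual deployment would hold one keyword set (or classifier) per language and normalize the post's language before routing.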


Towards Multilingual Biographical Event Extraction

M. Geierhos, J. Bouraoui, P. Watrin, in: Multilingual Resources, Multilingual Applications: Proceedings of the Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2011, University of Hamburg, 2011, pp. 45-50

Within this paper, we describe the special requirements of a semantic annotation scheme used for biographical event extraction in the framework of the European collaborative research project Biographe. This annotation scheme supports interlingual search for people due to its multilingual support covering four languages: English, German, French and Dutch.


Customer Interaction 2.0: Adopting Social Media as Customer Service Channel

M. Geierhos, Journal of Advances in Information Technology (2011), 2(4), pp. 222-233

Since customers first share their problems with a social networking community before directly addressing a company, social networking sites such as Facebook, Twitter, MySpace or Foursquare will be the interface between customer and company. For this reason, it is assumed that social networks will evolve into a common communication channel – not only between individuals but also between customers and companies. However, social networking has not yet been integrated into customer interaction management (CIM) tools. In general, a CIM application is used by the agents in a contact centre while communicating with the customers. Such systems handle communication across multiple different channels, such as e-mail, telephone, instant messaging, letters, etc. We now integrate social networking into CIM applications by adding another communication channel. This allows the company to follow general trends in customer opinions on the Internet, but also to record two-sided communication for customer service management; the company’s response will be delivered through the customer’s preferred social networking site.



The maximum number of publications to display has been reached. All publications can be found in the Research Information System.

Chair holder from 11/2017 to 03/2020

Now working at:
Universität der Bundeswehr München
