Team photo from 23 November 2017 (Photo: Universität Paderborn, Astrid van Kimmenade)

Dr. Frederik Simon Bäumer

Digital Cultural Studies

Head - Postdoc - Head of the Semantic Information Processing research group

+49 5251 60-5666
+49 5251 60-5672
Mersinweg 3
33100 Paderborn
Warburger Str. 100
33098 Paderborn


Postdoctoral researcher at the Chair of Digital Cultural Studies, Semantic Information Processing research group

Head of the "Semantic Information Processing" research group
Universität Paderborn

01.08.2017 - 01.12.2017

Postdoctoral researcher at the Junior Professorship for Business Information Systems, esp. Semantic Information Processing

Heinz Nixdorf Institut
Universität Paderborn

01.08.2014 - 31.07.2017

Doctoral researcher at the Junior Professorship for Business Information Systems, esp. Semantic Information Processing

Heinz Nixdorf Institut
Universität Paderborn


Natural Language Processing in OTF Computing: Challenges and the Need for Interactive Approaches

F.S. Bäumer, J. Kersting, M. Geierhos, Computers (2019), 8(1)

The vision of On-the-Fly (OTF) Computing is to compose and provide software services ad hoc, based on requirement descriptions in natural language. Since non-technical users write their software requirements themselves and in unrestricted natural language, deficits occur such as inaccuracy and incompleteness. These deficits are usually met by natural language processing methods, which have to face special challenges in OTF Computing because maximum automation is the goal. In this paper, we present current automatic approaches for resolving inaccuracy and incompleteness in natural language requirement descriptions and elaborate on open challenges. In particular, we will discuss the necessity of domain-specific resources and show why, despite far-reaching automation, an intelligent and guided integration of end users into the compensation process is required. In this context, we present our idea of a chatbot that integrates users into the compensation process depending on the given circumstances.

Potentielle Privatsphäreverletzungen aufdecken und automatisiert sichtbar machen

F.S. Bäumer, B. Buff, M. Geierhos. Potentielle Privatsphäreverletzungen aufdecken und automatisiert sichtbar machen. 2019.

Requirements Engineering in OTF-Computing

F.S. Bäumer, M. Geierhos, MDPI, 2019

In Reviews We Trust: But Should We? Experiences with Physician Review Websites

J. Kersting, F.S. Bäumer, M. Geierhos, in: Proceedings of the 4th International Conference on Internet of Things, Big Data and Security, SciTePress - Science and Technology Publications, 2019, pp. 147-155

The ability to openly evaluate products, locations and services is an achievement of the Web 2.0. It has never been easier to inform oneself about the quality of products or services and possible alternatives. Forming one’s own opinion based on the impressions of other people can lead to better experiences. However, this presupposes trust in one’s fellows as well as in the quality of the review platforms. In previous work on physician reviews and the corresponding websites, it was observed that some reviewers exhibit faulty behavior and that there are noteworthy differences in the technical implementation of the portals and in the efforts of site operators to maintain high-quality reviews. These experiences raise new questions regarding what trust means on review platforms, how trust arises and how easily it can be destroyed.


Towards a Multi-Stage Approach to Detect Privacy Breaches in Physician Reviews

F.S. Bäumer, J. Kersting, M. Orlikowski, M. Geierhos, in: Proceedings of the Posters and Demos Track of the 14th International Conference on Semantic Systems co-located with the 14th International Conference on Semantic Systems (SEMANTiCS 2018), 2018

Physician Review Websites allow users to evaluate their experiences with health services. As these evaluations are regularly contextualized with facts from users’ private lives, they often accidentally disclose personal information on the Web. This poses a serious threat to users’ privacy. In this paper, we report on early work in progress on “Text Broom”, a tool to detect privacy breaches in user-generated texts. For this purpose, we conceptualize a pipeline which combines methods of Natural Language Processing such as Named Entity Recognition, linguistic patterns and domain-specific Machine Learning approaches which have the potential to recognize privacy violations with wide coverage. A prototypical web application is openly accessible.

How to Deal with Inaccurate Service Requirements? Insights in Our Current Approach and New Ideas

F.S. Bäumer, M. Geierhos, in: Joint Proceedings of REFSQ-2018 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 23rd International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2018), 2018

The main idea in On-The-Fly Computing is to automatically compose existing software services according to the wishes of end-users. However, since user requirements are often ambiguous, vague and incomplete, the selection and composition of suitable software services is a challenging task. In this paper, we present our current approach to improve requirement descriptions before they are used for software composition. This procedure is fully automated, but also has limitations, for example, if necessary information is missing. In addition, and in response to these limitations, we provide insights into our ongoing work that combines the existing optimization approach with a chatbot solution.

Rate Your Physician: Findings from a Lithuanian Physician Rating Website

F.S. Bäumer, J. Kersting, V. Kuršelis, M. Geierhos, in: Communications in Computer and Information Science, Springer, 2018, pp. 43-58

Physician review websites are known around the world. Patients review the subjectively experienced quality of medical services supplied to them and publish an overall rating on the Internet, where quantitative grades and qualitative texts come together. On the one hand, these new possibilities reduce the imbalance of power between health care providers and patients, but on the other hand, they can also damage the usually very intimate relationship between health care providers and patients. Review websites must meet these requirements with a high level of responsibility and service quality. In this paper, we look at the situation in Lithuania: In particular, we are interested in the available possibilities of evaluation and interaction, and the quality of a particular review website measured against the available data. We thereby identify quality weaknesses and lay the foundation for future research.

Flexible Ambiguity Resolution and Incompleteness Detection in Requirements Descriptions via an Indicator-based Configuration of Text Analysis Pipelines

F.S. Bäumer, M. Geierhos, in: Proceedings of the 51st Hawaii International Conference on System Sciences, 2018, pp. 5746-5755

Natural language software requirements descriptions enable end users to formulate their wishes and expectations for a future software product without much prior knowledge in requirements engineering. However, these descriptions are susceptible to linguistic inaccuracies such as ambiguities and incompleteness that can harm the development process. There are a number of software solutions that can detect deficits in requirements descriptions and partially solve them, but they are often hard to use and not suitable for end users. For this reason, we develop a software system that helps end-users to create unambiguous and complete requirements descriptions by combining existing expert tools and controlling them using automatic compensation strategies. In order to recognize the necessity of individual compensation methods in the descriptions, we have developed linguistic indicators, which we present in this paper. Based on these indicators, the whole text analysis pipeline is configured ad hoc and thus adapted to the individual circumstances of a requirements description.

CORDULA: Software Requirements Extraction Utilizing Chatbot as Communication Interface

E. Friesen, F.S. Bäumer, M. Geierhos, in: Joint Proceedings of REFSQ-2018 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 23rd International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2018), 2018

Natural language requirement descriptions are often unstructured, contradictory and incomplete and are therefore challenging for automatic processing. Although many of these deficits can be compensated by means of Natural Language Processing, there still remain cases where interaction with end-users is necessary for clarification. In this paper, we present our idea of using chatbot technology to establish end-user communication in order to support the automatic compensation of some deficits in natural language requirement descriptions.

Back to Basics: Extracting Software Requirements with a Syntactic Approach

M. Caron, F.S. Bäumer, M. Geierhos, in: Joint Proceedings of REFSQ-2018 Workshops, Doctoral Symposium, Live Studies Track, and Poster Track co-located with the 23rd International Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2018), 2018

As our world grows in complexity, companies and employees alike need, more than ever before, solutions tailored to their exact needs. Since such tools cannot always be purchased off-the-shelf and need to be designed from the ground up, developers rely on software requirements. In this paper, we present our vision of a syntactic rule-based extraction tool for software requirements specification documents. In contrast to other methods, our tool will allow stakeholders to express their needs and wishes in unfiltered natural language, which we believe is essential for non-expert users.

NLP in OTF Computing: Current Approaches and Open Challenges

F.S. Bäumer, M. Geierhos, in: Proceedings of the 24th International Conference on Information and Software Technologies (ICIST 2018), Springer, 2018, pp. 559-570

On-The-Fly Computing is the vision of covering software needs of end users by fully-automatic compositions of existing software services. End users will receive so-called service compositions tailored to their very individual needs, based on natural language software descriptions. This everyday language may contain inaccuracies and incompleteness, which are well-known challenges in requirements engineering. In addition to existing approaches that try to automatically identify and correct these deficits, there are also new trends to involve users more in the elaboration and refinement process. In this paper, we present the relevant state of the art in the field of automated detection and compensation of multiple inaccuracies in natural language service descriptions and name open challenges that need to be tackled in NL-based software service composition.

Text Broom: A ML-based Tool to Detect and Highlight Privacy Breaches in Physician Reviews: An Insight into Our Current Work

F.S. Bäumer, M. Geierhos. Text Broom: A ML-based Tool to Detect and Highlight Privacy Breaches in Physician Reviews: An Insight into Our Current Work. 2018.

How to Deal with Inaccurate Service Descriptions in On-The-Fly Computing: Open Challenges

F.S. Bäumer, M. Geierhos, in: Proceedings of the 23rd International Conference on Natural Language and Information Systems, Springer, 2018, pp. 509-513

The vision of On-The-Fly Computing is an automatic composition of existing software services. Based on natural language software descriptions, end users will receive compositions tailored to their needs. For this reason, the quality of the initial software service description strongly determines whether a software composition really meets the expectations of end users. In this paper, we expose open NLP challenges that need to be faced for service composition in On-The-Fly Computing.


Privacy Matters: Detecting Nocuous Patient Data Exposure in Online Physician Reviews

F.S. Bäumer, N. Grote, J. Kersting, M. Geierhos, in: Information and Software Technologies: 23rd International Conference, ICIST 2017, Druskininkai, Lithuania, October 12–14, 2017, Proceedings, Springer, 2017, pp. 77-89

Consulting a physician was long regarded as an intimate and private matter. The physician-patient relationship was perceived as sensitive and trustful. Nowadays, there is a change, as medical procedures and physician consultations are reviewed like other services on the Internet. To allay users’ privacy doubts, physician review websites assure anonymity and the protection of private data. However, there are hundreds of reviews that reveal private information and hence enable physicians or the public to identify patients. Thus, we draw attention to the cases when de-anonymization is possible. We therefore introduce an approach that highlights private information in physician reviews for users to avoid an accidental disclosure. For this reason, we combine established natural-language-processing techniques such as named entity recognition as well as handcrafted patterns to achieve a high detection accuracy. That way, we can help websites to increase privacy protection by recognizing and uncovering apparently uncritical information in user-generated texts.

From User Demand to Software Service: Using Machine Learning to Automate the Requirements Specification Process

L. van Rooijen, F.S. Bäumer, M.C. Platenius, M. Geierhos, H. Hamann, G. Engels, in: 2017 IEEE 25th International Requirements Engineering Conference Workshops (REW), IEEE, 2017, pp. 379-385

Bridging the gap between informal, imprecise, and vague user requirements descriptions and precise formalized specifications is the main task of requirements engineering. Techniques such as interviews or story telling are used when requirements engineers try to identify a user's needs. The requirements specification process is typically done in a dialogue between users, domain experts, and requirements engineers. In our research, we aim at automating the specification of requirements. The idea is to distinguish between untrained users and trained users, and to exploit domain knowledge learned from previous runs of our system. We let untrained users provide unstructured natural language descriptions, while we allow trained users to provide examples of behavioral descriptions. In both cases, our goal is to synthesize formal requirements models similar to statecharts. From requirements specification processes with trained users, behavioral ontologies are learned which are later used to support the requirements specification process for untrained users. Our research method is original in combining natural language processing and search-based techniques for the synthesis of requirements specifications. Our work is embedded in a larger project that aims at automating the whole software development and deployment process in envisioned future software service markets.

Guesswork? Resolving Vagueness in User-Generated Software Requirements

M. Geierhos, F.S. Bäumer, in: Partiality and Underspecification in Information, Languages, and Knowledge, 1st ed., Cambridge Scholars Publishing, 2017, pp. 65-108

In recent years, there has been a proliferation of technological developments that incorporate processing of human language. Hardware and software can be specialized for designated subject areas, and computational devices are designed for a widening variety of applications. At the same time, new areas and applications are emerging by demanding intelligent technology enhanced by the processing of human language. These new applications often perform tasks which handle information, and they have a capacity to reason, using both formal and human language. Many sub-areas of Artificial Intelligence demand integration of Natural Language Processing, at least to some degree. Furthermore, technologies require coverage of known as well as unknown agents, and tasks with potential variations. All of this takes place in environments with unknown factors. The book covers theoretical work, advanced applications, approaches, and techniques for computational models of information, reasoning systems, and presentation in language. The book promotes work on intelligent natural language processing and related models of information, thought, reasoning, and other cognitive processes. The topics covered by the chapters prompt further research and developments of advanced systems in the areas of logic, computability, computational linguistics, cognitive science, neuroscience of language, robotics, and artificial intelligence, among others.

Indikatorbasierte Erkennung und Kompensation von ungenauen und unvollständig beschriebenen Softwareanforderungen

F.S. Bäumer, Universität Paderborn, 2017

This dissertation was written within the Collaborative Research Centre 901: On-The-Fly Computing (also: OTF Computing). The vision of OTF Computing is that, in the future, the individual software needs of end users will be covered by the automatic composition of existing software services. The focus is on natural language software requirements that end users formulate and submit to OTF providers as requirement descriptions. These serve as the sole basis for composition, but they can be inaccurate and incomplete. Up to now, software developers have identified and corrected these deficits in a bidirectional consolidation process. However, this kind of quality assurance is no longer provided for in OTF Computing; the classic consolidation process is dropped. This is where the dissertation picks up, dealing with the inaccuracies of freely formulated requirement descriptions in software design. To this end, the CORDULA (Compensation of Requirements Descriptions Using Linguistic Analysis) system is developed, which recognizes and compensates for linguistic deficiencies (ambiguity, vagueness and incompleteness) in the formulations of inexperienced end users. CORDULA supports the search for suitable software services for composition by transferring requirement descriptions into canonical core functionalities. From a methodological point of view, this work thus contributes to the holistic recording and improvement of linguistic deficiencies in user-generated requirement descriptions by dealing, for the first time, with ambiguity, incompleteness and vagueness in parallel and in sequence.
Only the use of linguistic indicators makes it possible to optimize individual text quality in a data-driven and demand-oriented manner by departing from the classic text analysis pipeline: the ad hoc configuration of the compensation pipeline, triggered by the deficits detected on the fly in end users' requirement descriptions, is a unique feature.

Studying Software Descriptions in SourceForge and App Stores for a better Understanding of real-life Requirements

F.S. Bäumer, M. Dollmann, M. Geierhos, in: Proceedings of the 2nd ACM SIGSOFT International Workshop on App Market Analytics, ACM, 2017, pp. 19-25

Users prefer natural language software requirements because of their usability and accessibility. Many approaches exist to elaborate these requirements and to support the users during the elicitation process. But there is a lack of adequate resources, which are needed to train and evaluate approaches for requirement refinement. We are trying to close this gap by using online available software descriptions from SourceForge and app stores. Thus, we present two real-life requirements collections based on online-available software descriptions. Our goal is to show the domain-specific characteristics of content words describing functional requirements. On the one hand, we created a semantic role-labeled requirements set, which we use for requirements classification. On the other hand, we enriched software descriptions with linguistic features and dependencies to provide evidence for the context-awareness of software functionalities.

Indikatorbasierte Erkennung und Kompensation von ungenauen und unvollständig beschriebenen Softwareanforderungen

F.S. Bäumer, Universität Paderborn, 2017

The vision of OTF Computing is to have the software needs of end users in the future covered by an automatic composition of existing software services. Here we focus on natural language software requirements that end users formulate and submit to OTF providers as requirement specifications. These requirements serve as the sole foundation for the composition of software; but they can be inaccurate and incomplete. Up to now, software developers have identified and corrected these deficits by using a bidirectional consolidation process. However, this type of quality assurance is no longer included in OTF Computing - the classic consolidation process is dropped. This is where this work picks up, dealing with the inaccuracies of freely formulated software design requirements. To do this, we developed the CORDULA (Compensation of Requirements Descriptions Using Linguistic Analysis) system that recognizes and compensates for language deficiencies (e.g., ambiguity, vagueness and incompleteness) in requirements written by inexperienced end users. CORDULA supports the search for suitable software services that can be combined in a composition by transferring requirement specifications into canonical core functionalities. This dissertation provides the first-ever method for holistically recording and improving language deficiencies in user-generated requirement specifications by dealing with ambiguity, incompleteness and vagueness in parallel and in sequence.


How to Complete Customer Requirements: Using Concept Expansion for Requirement Refinement

M. Geierhos, F.S. Bäumer, in: Proceedings of the 21st International Conference on Applications of Natural Language to Information Systems (NLDB), Springer, 2016, pp. 37-47

One purpose of requirement refinement is that higher-level requirements have to be translated to something usable by developers. Since customer requirements are often written in natural language by end users, they lack precision, completeness and consistency. Although user stories are often used in the requirement elicitation process in order to describe the possible ways of interacting with the software, there is always something unspoken. Here, we present techniques for automatically refining vague software descriptions. Thus, we can bridge the gap by first revising natural language utterances from higher-level to more detailed customer requirements, before functionality matters. We therefore focus on the resolution of semantically incomplete user-generated sentences (i.e. non-instantiated arguments of predicates) and provide ontology-based gap-filling suggestions for completing unverbalized information in the user’s demand.

Running out of Words: How Similar User Stories Can Help to Elaborate Individual Natural Language Requirement Descriptions

F.S. Bäumer, M. Geierhos, in: Proceedings of the 22nd International Conference on Information and Software Technologies (ICIST), Springer, 2016, pp. 549-558

While requirements focus on how the user interacts with the system, user stories concentrate on the purpose of software features. But in practice, functional requirements are also described in user stories. For this reason, requirements clarification is needed, especially when they are written in natural language and do not stick to any templates (e.g., "as an X, I want Y so that Z ..."). However, there is a lot of implicit knowledge that is not expressed in words. As a result, natural language requirements descriptions may suffer from incompleteness. Existing approaches try to formalize natural language or focus only on entirely missing and not on deficient requirements. In this paper, we therefore present an approach to detect knowledge gaps in user-generated software requirements for interactive requirement clarification: We provide tailored suggestions to the users in order to get more precise descriptions. For this purpose, we identify not fully instantiated predicate argument structures in requirements written in natural language and use context information to realize what was meant by the user.


Erfahrungsberichte aus zweiter Hand: Erkenntnisse über die Autorschaft von Arztbewertungen in Online-Portalen

M. Geierhos, F.S. Bäumer, in: DHd 2015: Book of Abstracts, ZIM-ACDH, 2015, pp. 69-72

Der zufriedene Patient 2.0: Analyse anonymer Arztbewertungen im Web 2.0

M. Geierhos, S. Schulze, F.S. Bäumer, Verbraucherzentrale NRW/Kompetenzzentrum Verbraucherforschung NRW, 2015, pp. 18

Patients increasingly share their experiences via physician rating portals. The anonymity of the Web enables largely honest complaint behavior that leaves the sensitive relationship of trust between physician and patient undamaged. In this contribution, anonymous physician reviews on the Web 2.0 were evaluated automatically in order to determine the factors that influence the complaint behavior of German patients and to clear up "patient myths" that are supposedly established in society. In the longer term, uncovering misconceptions and satisfaction indicators is intended to help interpret patient statements in a more differentiated way and thus contribute to a lasting improvement of the physician-patient relationship.

Find a Physician by Matching Medical Needs described in your Own Words

F.S. Bäumer, M. Dollmann, M. Geierhos, in: The 6th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2015) / The 5th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare (ICTH-2015) / Affiliated Workshops, Elsevier, 2015, pp. 417-424

The individual search for information about physicians on Web 2.0 platforms can affect almost all aspects of our lives. People can directly access physician rating websites via web browsers or use any search engine to find physician reviews and ratings filtered by location or specialty. However, keyword search sometimes does not meet user needs because of the mismatch between the common terms users employ in queries for symptoms and the widespread medical terminology. In this paper, we present the prototype of a specialised search engine that overcomes this by indexing user-generated content (i.e., review texts) for physician discovery and provides automatic suggestions as well as an appropriate visualisation. On the one hand, we consider the available numeric physician ratings as a sorting criterion for the ranking of query results. On the other hand, we extended existing ranking algorithms with respect to domain-specific types and physician ratings. We gathered more than 860,000 review texts and collected more than 213,000 physician records. A random test shows that about 19.7% of 5,100 different words in total are health-related and partly belong to consumer health vocabularies. Our evaluation results show that the query results fit users' particular health issues when seeking physicians.

Understanding the Patient 2.0: Gaining Insight into Patients' Rating Behavior by User-generated Physician Review Mining

M. Geierhos, F.S. Bäumer, S. Schulze, C. Klotz, in: Modeling and Using Context. 9th International and Interdisciplinary Conference, CONTEXT 2015, Lanarca, Cyprus, November 2-6, 2015. Proceedings, Springer, 2015, pp. 159-171

Patients 2.0 increasingly inform themselves about the quality of medical services on physician rating websites. However, little is known about whether the reviews and ratings on these websites truly reflect the quality of services or whether the ratings on these websites are rather influenced by patients’ individual rating behavior. Therefore, we investigate more than 790,000 physician reviews on Germany’s most used physician rating website. Our results show that patients’ ratings do not only reflect treatment quality but are also influenced by factors independent of treatment quality, like age and complaint behavior. Hence, we provide evidence that users should be well aware of user-specific rating distortions when intending to make their physician choice based on these ratings.

A System for Uncovering Latent Connectivity of Health Care Providers in Online Reviews

F.S. Bäumer, M. Geierhos, S. Schulze, in: Information and Software Technologies. 21st International Conference, ICIST 2015, Druskininkai, Lithuania, October 15-16, 2015. Proceedings, Springer, 2015, pp. 3-15

The contacts a health care provider (HCP), like a physician, has to other HCPs are perceived as a quality characteristic by patients. So far, only the German physician rating website gives information about the interconnectedness of HCPs in business networks. However, this network has to be maintained manually and is thus incomplete. We therefore developed a system for uncovering latent connectivity of HCPs in online reviews to provide users with more valuable information about their HCPs. The overall goal of this approach is to extend already existing business networks of HCPs by integrating connections that are newly discovered by our system. Our most recent evaluation results are promising: 70.8 % of the connections extracted from the review texts were correctly identified and in total 3,788 relations were recognized that had not been displayed in the website’s network before.

"I grade what I get but write what I think." Inconsistency Analysis in Patients' Reviews

M. Geierhos, F.S. Bäumer, S. Schulze, V. Stuß, in: ECIS 2015 Completed Research Papers, Elsevier, 2015

Received medical services are increasingly discussed and recommended on physician rating websites (PRWs). The reviews and ratings on these platforms are valuable sources of information for patient opinion mining. In this paper, we have tackled three issues that come along with inconsistency analysis on PRWs: (1) Natural language processing of user-generated reviews, (2) the disagreement in polarity of review text and its corresponding numerical ratings (individual inconsistency) and (3) the differences in patients’ rating behavior for the same service category (e.g. ‘treatment’) expressed by varying grades on the entire data set (collective inconsistency). Thus, the basic idea is first to identify relevant opinion phrases that describe service categories and to determine their polarity. Subsequently, the particular phrase has to be assigned to its corresponding numerical rating category before checking the (dis-)agreement of polarity values. For this purpose, several local grammars for the pattern-based analysis as well as domain-specific dictionaries for the recognition of entities, aspects and polarity were applied to 593,633 physician reviews from two German PRWs. Furthermore, our research contributes to content quality improvement of PRWs because we provide a technique to detect inconsistent reviews that could be ignored for the computation of average ratings.

What did you mean? Facing the Challenges of User-generated Software Requirements

M. Geierhos, S. Schulze, F.S. Bäumer, in: Proceedings of the 7th International Conference on Agents and Artificial Intelligence (ICAART), Special Session on Partiality, Underspecification, and Natural Language Processing (PUaNLP 2015), SciTePress - Science and Technology Publications, 2015, pp. 277-283

Existing approaches towards service composition demand requirements of the customers in terms of service templates, service query profiles, or partial process models. However, the non-expert customers addressed may be unable to fill in the slots of service templates as requested or to describe, for example, pre- and postconditions, or may even have difficulties in formalizing their requirements. Thus, our idea is to provide non-experts with suggestions on how to complete or clarify their requirement descriptions written in natural language. Two main issues have to be tackled: (1) partial or full inability (incapacity) of non-experts to specify their requirements correctly in formal and precise ways, and (2) problems in text analysis due to fuzziness in natural language. We present ideas on how to face these challenges by means of requirement disambiguation and completion. Therefore, we conduct ontology-based requirement extraction and similarity retrieval based on requirement descriptions that are gathered from App marketplaces. The innovative aspect of our work is that we support users without expert knowledge in writing their requirements by simultaneously resolving ambiguity, vagueness, and underspecification in natural language.

Filtering Reviews by Random Individual Error

M. Geierhos, F.S. Bäumer, S. Schulze, V. Stuß, in: Proceedings of the 28th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE 2015), Springer, 2015, pp. 305-315

Opinion mining from physician rating websites depends on the quality of the extracted information. Sometimes reviews are user-error prone and the assigned stars or grades contradict the associated content. We therefore aim at detecting random individual error within reviews. Such errors comprise the disagreement in polarity of review texts and the respective ratings. The challenges that thereby arise are (1) the content and sentiment analysis of the review texts and (2) the removal of the random individual errors contained therein. To solve these tasks, we assign polarities to automatically recognized opinion phrases in reviews and then check for divergence in rating and text polarity. The novelty of our approach is that we improve user-generated data quality by excluding error-prone reviews on German physician websites from average ratings.
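
The effect of excluding such reviews from the average can be illustrated with a few lines of code. This sketch assumes that each review already carries a text polarity (+1, 0 or -1) from an earlier analysis step and a star rating from 1 to 5; the field names, the star scale and the thresholds are hypothetical.

```python
from statistics import mean


def rating_polarity(stars):
    """Assumed 1-5 star scale: 4-5 positive, 1-2 negative, 3 neutral."""
    return 1 if stars >= 4 else -1 if stars <= 2 else 0


def is_random_individual_error(review):
    """True when text polarity and rating polarity point in opposite directions."""
    return review["text_polarity"] * rating_polarity(review["stars"]) == -1


def filtered_average(reviews):
    """Average star rating over reviews whose text and rating do not contradict."""
    kept = [r["stars"] for r in reviews if not is_random_individual_error(r)]
    return mean(kept) if kept else None


reviews = [
    {"stars": 5, "text_polarity": 1},
    {"stars": 4, "text_polarity": 1},
    {"stars": 1, "text_polarity": 1},  # praise in the text, worst rating: likely a slip
]
print(mean(r["stars"] for r in reviews))  # plain average over all reviews
print(filtered_average(reviews))          # average without the contradictory review
```

Neutral reviews are kept in this sketch, since a neutral text or rating does not contradict anything; only clear opposites are dropped.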


Comparative study on disambiguating acronyms in the scientific papers using the open knowledge base

D. Jeong, J. Gim, H. Jung, M. Geierhos, F.S. Bäumer, in: Conference Proceedings of the 9th Asia Pacific International Conference on Information Science and Technology (APIC-IST 2014), 2014, pp. 369-371

In this paper, we focus on acronym representation, the concept of abbreviating major terminology. To this end, we try to find the most efficient method to disambiguate the sense of an acronym. Comparing the various feature types, we found that using single nouns (NN) overwhelmingly outperformed using noun phrases (NP) as a basis. Moreover, the results also showed that collocation information (CL) was not efficient for enhancing performance, considering the large amount of extra data processing it requires. We expect to apply the open knowledge base Wikipedia to scholarly services to enhance the quality of the local knowledge base and to develop value-added services.
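
As a rough illustration of sense selection with single-noun features, the sketch below picks the expansion whose noun profile overlaps most with the nouns around the acronym. It assumes that nouns have already been extracted by a part-of-speech tagger; the candidate expansions and their noun profiles are invented for the example and come neither from the paper nor from Wikipedia.

```python
# Hypothetical sense inventory: expansion -> nouns typical of its context.
SENSES = {
    "SVM": {
        "support vector machine": {"classifier", "kernel", "margin", "training"},
        "shared virtual memory": {"page", "processor", "cache", "address"},
    }
}


def disambiguate(acronym, context_nouns, senses=SENSES):
    """Return the expansion sharing the most nouns with the context, or None."""
    context = set(context_nouns)
    best, best_overlap = None, 0
    for expansion, profile in senses.get(acronym, {}).items():
        overlap = len(profile & context)
        if overlap > best_overlap:
            best, best_overlap = expansion, overlap
    return best


print(disambiguate("SVM", ["kernel", "classifier", "accuracy"]))
```

When no candidate shares a noun with the context, or the acronym is unknown, the function returns `None` instead of guessing.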

Linked Open Data System for Scientific Data Sets

F.S. Bäumer, J. Gim, D. Jeong, M. Geierhos, H. Jung, in: Proceedings of the First International Workshop on Patent Mining and Its Applications (IPaMin 2014) co-located with Konvens 2014, 2014

In this paper, we present a system which makes scientific data available following the linked open data principle using standards like RDF and URI as well as the popular D2R server (D2R) and the customizable D2RQ mapping language. Our scientific data sets include acronym data and expansions, as well as researcher data such as author name, affiliation, coauthors, and abstracts. The system can easily be extended to other records. In this regard, a domain adaptation to patent mining seems possible, and we therefore present the obvious similarities and differences here. The data set is collected from several different providers such as publishing houses and digital libraries, which follow different standards in data format and structure. Most of them do not support Semantic Web technologies but only the legacy HTML standard. Integrating these large amounts of scientific data into the Semantic Web is challenging and requires flexible data structures to access this information and interlink it. Based on these data sets, we will be able to derive a general technology trend as well as the individual research domain of each researcher. The goal of our Linked Open Data System for scientific data is to provide access to this data set for other researchers using the Web of Linked Data. Furthermore, we implemented an application for visualization, which allows us to explore the relations between single data sets.
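
To make the linked-data idea concrete, the following sketch serializes one researcher record as RDF triples in N-Triples syntax. It is not the D2RQ mapping used in the paper: the `http://example.org/` base URI, the record fields and the choice of FOAF and Dublin Core properties are assumptions made for illustration.

```python
BASE = "http://example.org/"          # hypothetical namespace for this example
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"


def literal(value):
    """Quote a string as an N-Triples literal (escapes only backslash and quote)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def researcher_triples(record):
    """Yield N-Triples lines for a researcher, their coauthors and their papers."""
    person = f"<{BASE}researcher/{record['id']}>"
    yield f"{person} <{FOAF}name> {literal(record['name'])} ."
    for coauthor_id in record.get("coauthors", []):
        yield f"{person} <{FOAF}knows> <{BASE}researcher/{coauthor_id}> ."
    for paper in record.get("papers", []):
        paper_uri = f"<{BASE}paper/{paper['id']}>"
        yield f"{paper_uri} <{DCT}creator> {person} ."
        yield f"{paper_uri} <{DCT}abstract> {literal(paper['abstract'])} ."


record = {
    "id": "r1",
    "name": "Jane Doe",
    "coauthors": ["r2"],
    "papers": [{"id": "p1", "abstract": "A \"linked\" data set."}],
}
for line in researcher_triples(record):
    print(line)
```

Because every researcher and paper gets its own URI, records from different providers can be interlinked simply by emitting further triples that point at the same URIs.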

