This page lists publications about PM4Py, as well as publications describing approaches that use PM4Py. We first list the works describing PM4Py itself, after which we present the work building on top of it (in chronological order).

Publications about PM4Py

Process Mining for Python (PM4Py): Bridging the Gap between Process- and Data Science
Berti, Alessandro, Sebastiaan J. van Zelst, and Wil van der Aalst
Proceedings of the ICPM Demo Track 2019, co-located with 1st International Conference on Process Mining (ICPM 2019)
2019
Process mining, i.e., a sub-field of data science focusing on the analysis of event data generated during the execution of (business) processes, has seen a tremendous change over the past two decades. Starting off in the early 2000s with limited to no tool support, nowadays several software tools exist, both open-source, e.g., ProM and Apromore, and commercial, e.g., Disco, Celonis, ProcessGold, etc. [...]
PM4Py Web Services: Easy Development, Integration and Deployment of Process Mining Features in any Application Stack
Berti, Alessandro, Sebastiaan J. van Zelst, and Wil van der Aalst
BPM Demo Track
2019
In recent years, process mining emerged as a set of techniques to analyze process data, supported by different open-source and commercial solutions. Process mining tools aim to discover process models from the data, perform conformance checking, predict the future behavior of the process and/or provide other analyses that enhance the overall process knowledge. [...]

Publications using PM4Py

Is your publication missing? Just let us know via the following form:

Send us your publication

To have your publication added, please use the form
(or drop an email at pm4py@fit.fraunhofer.de).
A Novel Token-Based Replay Technique to Speed Up Conformance Checking and Process Enhancement
Alessandro Berti, Wil van der Aalst
ToPNoC (Transactions on Petri Nets and Other Models of Concurrency)
2020
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment based), token-based replay got abandoned. However, despite decomposition approaches and heuristics to speed-up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., "token flooding") and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the PM4Py process mining library. We will show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics.
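For readers unfamiliar with the technique, the core token-based replay idea can be sketched in a few lines of Python. This is a simplified illustration on a hypothetical toy net, not the optimized PM4Py implementation; the net encoding and all names are made up for the example. Replay counts produced (p), consumed (c), missing (m) and remaining (r) tokens, and computes the classic fitness 0.5(1 − m/c) + 0.5(1 − r/p):

```python
# Minimal sketch of token-based replay on a toy Petri net (illustrative only).

def token_replay(trace, transitions, initial, final):
    """Replay `trace` on a net given as activity -> (input places, output
    places); return (produced, consumed, missing, remaining, fitness)."""
    marking = dict(initial)          # place -> token count
    p = sum(initial.values())        # tokens produced by initialization
    c = 0
    m = 0
    for activity in trace:
        inputs, outputs = transitions[activity]
        for place in inputs:
            if marking.get(place, 0) == 0:
                m += 1               # token is missing: insert it artificially
                marking[place] = 1
            marking[place] -= 1      # consume
            c += 1
        for place in outputs:
            marking[place] = marking.get(place, 0) + 1  # produce
            p += 1
    # consume the tokens of the final marking
    for place, k in final.items():
        have = marking.get(place, 0)
        missing = max(0, k - have)
        m += missing
        c += k
        marking[place] = have + missing - k
    r = sum(v for v in marking.values() if v > 0)       # leftover tokens
    fitness = 0.5 * (1 - m / c) + 0.5 * (1 - r / p)
    return p, c, m, r, fitness

# Toy sequential net: start -> a -> p1 -> b -> end
net = {"a": (["start"], ["p1"]), "b": (["p1"], ["end"])}
print(token_replay(["a", "b"], net, {"start": 1}, {"end": 1}))  # fitness 1.0
```

A trace skipping `a` would incur one missing and one remaining token, lowering the fitness accordingly; the paper's contribution lies in making this kind of replay fast and its diagnostics reliable on large, real-life inputs.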
Process Mining for Production Processes in the Automotive Industry
Merih Seran Uysal, Sebastiaan van Zelst, Tobias Brockhoff, Anahita Farhang Ghahfarokhi, Mahsa Pourbafrani, Ruben Schumacher, Sebastian Junglas, Guenther Schuh and Wil van der Aalst
Industrial Track, International Conference on Business Process Management
2020
The increasing digitization of organizations leads to unprecedented amounts of data capturing the behavior of operational processes. On the basis of such data, process mining techniques allow us to obtain a holistic picture of the execution of a company’s processes, and their related events. In particular, production companies aiming at reducing the production cycle time and ensuring a high product quality show an increased interest in utilizing process mining in order to identify deviations and bottlenecks in their production processes. In this paper, we present a use case study in which we rigorously investigate how process mining techniques can successfully be applied to real-world data of the car production company e.GO Mobile AG. Furthermore, we present our results facilitating more transparency and valuable insights into the real processes of the company.
Online Process Monitoring Using Incremental State-Space Expansion: An Exact Algorithm
Daniel Schuster, Sebastiaan van Zelst
Main track, International Conference on Business Process Management
2020
The execution of (business) processes generates valuable traces of event data in the information systems employed within companies. Recently, approaches for monitoring the correctness of the execution of running processes have been developed in the area of process mining, i.e., online conformance checking. The advantages of monitoring a process’ conformity during its execution are clear, i.e., deviations are detected as soon as they occur and countermeasures can immediately be initiated to reduce the possible negative effects caused by process deviations. Existing work in online conformance checking only allows for obtaining approximations of non-conformity, e.g., overestimating the actual severity of the deviation. In this paper, we present an exact, parameter-free, online conformance checking algorithm that computes conformance checking results on the fly. Our algorithm exploits the fact that the conformance checking problem can be reduced to a shortest path problem, by incrementally expanding the search space and reusing previously computed intermediate results. Our experiments show that our algorithm is able to outperform comparable state-of-the-art approximation algorithms.
PRIPEL: Privacy-Preserving Event Log Publishing Including Contextual Information
Stephan A. Fahrenkrog-Petersen, Han van der Aa, Matthias Weidlich
International Conference on Business Process Management
2020
Event logs capture the execution of business processes in terms of executed activities and their execution context. Since logs contain potentially sensitive information about the individuals involved in the process, they should be pre-processed before being published to preserve the individuals' privacy. However, existing techniques for such pre-processing are limited to a process' control-flow and neglect contextual information, such as attribute values and durations. This thus precludes any form of process analysis that involves contextual factors. To bridge this gap, we introduce PRIPEL, a framework for privacy-aware event log publishing. Compared to existing work, PRIPEL takes a fundamentally different angle and ensures privacy on the level of individual cases instead of the complete log. This way, contextual information as well as the long tail process behaviour are preserved, which enables the application of a rich set of process analysis techniques. We demonstrate the feasibility of our framework in a case study with a real-world event log.
Optimized SAT encoding of conformance checking artefacts
Mathilde Boltenhagen, Thomas Chatain, Josep Carmona
Computing (Journal)
2020
Conformance checking is a growing discipline that aims at assisting organizations in monitoring their processes. At its core, conformance checking relies on the computation of particular artefacts which enable reasoning on the relation between observed and modeled behavior. It is widely acknowledged that the computation of these artefacts is the lion's share of conformance checking techniques. This paper shows how important conformance artefacts like alignments, anti-alignments or multi-alignments, defined over the Levenshtein edit distance, can be efficiently computed by encoding the problem as an optimized SAT instance. From a general perspective, the work advocates for a unified family of techniques that can compute conformance artefacts in the same way. The implementation of the techniques presented in this paper shows capabilities for dealing with both synthetic and real-life instances, which may open the door for a fresh way of applying conformance checking in the near future.
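The Levenshtein edit distance over which these artefacts are defined can be sketched with the standard dynamic program below. This is only the underlying distance, not the paper's SAT encoding; the example traces are made up:

```python
# Sketch of the Levenshtein edit distance used to define alignment cost
# between an observed trace and a model run (illustrative baseline).

def levenshtein(trace, run):
    """Classic DP edit distance: each insertion, deletion or
    substitution of an activity costs 1."""
    n, m = len(trace), len(run)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i                 # delete all remaining trace events
    for j in range(m + 1):
        dp[0][j] = j                 # insert all remaining model moves
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if trace[i - 1] == run[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match / substitution
    return dp[n][m]

print(levenshtein(["a", "b", "d"], ["a", "b", "c", "d"]))  # one insertion -> 1
```

An alignment then corresponds to a model run minimizing this distance to the observed trace; anti- and multi-alignments optimize related objectives over the same distance.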
Process mining-based approach for investigating malicious login events
Sofiane Lagraa, Radu State
NOMS 2020
2020
A large body of research has been accomplished on prevention and detection of malicious events, attacks, threats, or botnets. However, there is a lack of automatic and sophisticated methods for investigating malicious events/users, understanding the root cause of attacks, and discovering what is really happening before an attack. In this paper, we propose an attack model discovery approach for investigating and mining malicious authentication events across user accounts. The approach is based on process mining techniques applied to event logs leading up to attacks, in order to extract the behavior of malicious users. The evaluation is performed on a large, publicly available dataset, from which we extract models of the behavior of malicious users via authentication events. The results are useful for security experts to improve defense tools, by making them more robust, and to develop attack simulations.
Incremental Discovery of Hierarchical Process Models
Daniel Schuster, Sebastiaan van Zelst, Wil van der Aalst
RCIS 2020
2020
Many of today’s information systems record the execution of (business) processes in great detail. Process mining utilizes such data and aims to extract valuable insights. Process discovery, a key research area in process mining, deals with the construction of process models based on recorded process behavior. Existing process discovery algorithms aim to provide a “push-button-technology”, i.e., the algorithms discover a process model in a completely automated fashion. However, real data often contain noisy and/or infrequent complex behavioral patterns. As a result, the incorporation of all behavior leads to very imprecise or overly complex process models. At the same time, data pre-processing techniques have been shown to be able to improve the precision of process models, i.e., without explicitly using domain knowledge. Yet, to obtain superior process discovery results, human input is still required. Therefore, we propose a discovery algorithm that allows a user to incrementally extend a process model by new behavior. The proposed algorithm is designed to localize and repair nonconforming process model parts by exploiting the hierarchical structure of the given process model. The evaluation shows that the process models obtained with our algorithm, which allows for incremental extension of a process model, have, in many cases, superior characteristics in comparison to process models obtained by using existing process discovery and model repair techniques.
Truncated Trace Classifier. Removal of Incomplete Traces from Event Logs
Gael Bernard, Periklis Andritsos
BPMDS-EMMSAD 2020
2020
We consider truncated traces, which are incomplete sequences of events. This typically happens when dealing with streaming data or when the event log extraction process cuts the end of the trace. The existence of truncated traces in event logs and their negative impacts on process mining outcomes have been widely acknowledged in the literature. Still, there is a lack of research on algorithms to detect them. We propose the Truncated Trace Classifier (TTC), an algorithm that distinguishes truncated traces from the ones that are not truncated. We benchmark 5 TTC implementations that use either LSTM or XGBOOST on 13 real-life event logs. Accurate TTCs have great potential. In fact, filtering truncated traces before applying a process discovery algorithm greatly improves the precision of the discovered process models, by 9.1%. Moreover, we show that TTCs increase the accuracy of a next event prediction algorithm by up to 7.5%.
A Generic Framework for Attribute-Driven Hierarchical Trace Clustering
Sebastiaan van Zelst, Yukun Cao
BPI Workshop 2020, International Conference on Business Process Management
2020
The execution of business processes often entails a specific process execution context, e.g., a customer, service or product. Often, the corresponding event data records indicators of such an execution context, e.g., a customer type (bronze, silver, gold or platinum). Typically, variations in the execution of a process exist for the different execution contexts of a process. To gain a better understanding of the global process execution, it is interesting to study the behavioral (dis)similarity between different execution contexts of a process. However, in real business settings, the exact number of execution contexts might be too large to analyze manually. At the same time, current trace clustering techniques do not take process type information into account, i.e., they are solely behaviorally driven. Hence, in this paper, we present a hierarchical data-attribute-driven trace clustering framework that allows us to compare the behavior of different groups of traces. Our evaluation shows that the incorporation of data-attributes in trace clustering yields interesting novel process insights.
Efficient Construction of Behavior Graphs for Uncertain Event Data
Marco Pegoraro, Merih Seran Uysal, Wil van der Aalst
BIS 2020
2020
The discipline of process mining deals with analyzing execution data of operational processes, extracting models from event data, checking the conformance between event data and normative models, and enhancing all aspects of processes. Recently, new techniques have been developed to analyze event data containing uncertainty; these techniques strongly rely on representing uncertain event data through graph-based models capturing uncertainty. In this paper we present a novel approach to efficiently compute a graph representation of the behavior contained in an uncertain process trace. We present our new algorithm, analyze its time complexity, and report experimental results showing order-of-magnitude performance improvements for behavior graph construction.
Evaluating the Effectiveness of Interactive Process Discovery in Healthcare: A Case Study
Elisabetta Benevento, Prabhakar M. Dixit, Mohammadreza Fani Sani, Davide Aloini, Wil van der Aalst
PODS4H Workshop, International Conference on Business Process Management
2019
This work aims at investigating the effectiveness and suitability of Interactive Process Discovery, an innovative Process Mining technique, to model healthcare processes in a data-driven manner. Interactive Process Discovery allows the analyst to interactively discover the process model, exploiting his domain knowledge along with the event log. In so doing, a comparative evaluation against the traditional automated discovery techniques is carried out to assess the potential benefits that domain knowledge brings in improving both the quality and the understandability of the process model. The comparison is performed by using a real dataset from an Italian Hospital, in collaboration with the medical staff. Preliminary results show that Interactive Process Discovery allows to obtain an accurate and fully compliant with clinical guidelines process model with respect to the automated discovery techniques. Discovering an accurate and comprehensible process model is an important starting point for subsequent process analysis and improvement steps, especially in complex environments, such as healthcare.
Automated Generation of Business Process Models using Constraint Logic Programming in Python
Tymoteusz Paszun, Piotr Wiśniewski, Krzysztof Kluza, Antoni Ligęza
FedCSIS 2019
2019
High complexity of business processes in real-life organizations is a constantly rising issue. In consequence, modeling a workflow is a challenge for process stakeholders. Yet, to facilitate this task, new methods can be implemented to automate the phase of process design. As a main contribution of this paper, we propose an approach to generate process models based on activities performed by the participants, where the exact order of execution does not need to be specified. Nevertheless, the goal of our method is to generate artificial workflow traces of a process using Constraint Programming and a set of predefined rules. As a final step, the approach was implemented as a dedicated tool and evaluated on a set of test examples that prove that our method is capable of creating correct process models.
Cherry-Picking from Spaghetti: Multi-range Filtering of Event Logs
Maxim Vidgof, Djordje Djurica, Saimir Bala, Jan Mendling
BPMDS-EMMSAD 2020
2020
Mining real-life event logs results in process models which provide little value to the process analyst without support for handling complexity. Filtering techniques are specifically helpful to tackle this problem. These techniques have been focusing on leaving out infrequent aspects of the process, which are considered outliers. However, it is exactly in these outliers where it is possible to gather important insights on the process. This paper addresses this problem by defining multi-range filtering. Our technique not only allows combining both frequent and non-frequent aspects of the process, but also supports any user-defined intervals of frequency of activities and variants. We evaluate our approach through a prototype based on the PM4Py library and show its benefits in comparison to existing filtering techniques.
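The multi-range idea can be sketched as follows, keeping variants whose frequency falls in any user-defined interval. This is a minimal illustration under assumed names, not the authors' PM4Py-based prototype:

```python
# Sketch of multi-range filtering on trace variants (illustrative only).
from collections import Counter

def multi_range_filter(log, ranges):
    """`log` is a list of traces (sequences of activities); `ranges` is a
    list of (low, high) variant-frequency intervals, inclusive."""
    freq = Counter(tuple(t) for t in log)
    keep = {v for v, f in freq.items()
            if any(lo <= f <= hi for lo, hi in ranges)}
    return [t for t in log if tuple(t) in keep]

log = [("a", "b")] * 5 + [("a", "c")] * 2 + [("a", "d")]
# keep both the most frequent variant and the rare outlier, drop the middle
filtered = multi_range_filter(log, [(1, 1), (5, 10)])
print(len(filtered))  # 6 traces survive
```

Unlike classic "keep the top-k variants" filters, a second low-frequency interval lets the analyst retain exactly the outliers the paper argues are worth inspecting.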
DeepAlign: Alignment-Based Process Anomaly Correction Using Recurrent Neural Networks
Timo Nolle, Alexander Seeliger, Nils Thoma, Max Mühlhäuser
CAiSE 2020: Advanced Information Systems Engineering, pp. 319-333
2020
In this paper, we propose DeepAlign, a novel approach to multi-perspective process anomaly correction, based on recurrent neural networks and bidirectional beam search. At the core of the DeepAlign algorithm are two recurrent neural networks trained to predict the next event. One is reading sequences of process executions from left to right, while the other is reading the sequences from right to left. By combining the predictive capabilities of both neural networks, we show that it is possible to calculate sequence alignments, which are used to detect and correct anomalies. DeepAlign utilizes the case-level and event-level attributes to closely model the decisions within a process. We evaluate the performance of our approach on an elaborate data corpus of 252 realistic synthetic event logs and compare it to three state-of-the-art conformance checking methods. DeepAlign produces better corrections than the rest of the field reaching an overall F1 score of 0.9572 across all datasets, whereas the best comparable state-of-the-art method reaches 0.6411.
An interdisciplinary comparison of sequence modeling methods for next-element prediction
Niek Tax, Irene Teinemaa, Sebastiaan van Zelst
Software and Systems Modeling
2020
Data of sequential nature arise in many application domains in the form of, e.g., textual data, DNA sequences, and software execution traces. Different research disciplines have developed methods to learn sequence models from such datasets: (i) In the machine learning field methods such as (hidden) Markov models and recurrent neural networks have been developed and successfully applied to a wide range of tasks, (ii) in process mining process discovery methods aim to generate human-interpretable descriptive models, and (iii) in the grammar inference field the focus is on finding descriptive models in the form of formal grammars. Despite their different focuses, these fields share a common goal: learning a model that accurately captures the sequential behavior in the underlying data. Those sequence models are generative, i.e., they are able to predict what elements are likely to occur after a given incomplete sequence. So far, these fields have developed mainly in isolation from each other and no comparison exists. This paper presents an interdisciplinary experimental evaluation that compares sequence modeling methods on the task of next-element prediction on four real-life sequence datasets. The results indicate that machine learning methods, which generally do not aim at model interpretability, tend to outperform methods from the process mining and grammar inference fields in terms of accuracy.
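The simplest kind of sequence model entering such a comparison can be sketched as a first-order Markov predictor: count successor frequencies and predict the most frequent successor of the last observed element. This is an illustrative baseline with made-up data, not a method evaluated in the paper:

```python
# Sketch of a first-order Markov next-element predictor (illustrative).
from collections import Counter, defaultdict

def fit_markov(sequences):
    """Learn successor counts: element -> Counter of next elements."""
    succ = defaultdict(Counter)
    for seq in sequences:
        for x, y in zip(seq, seq[1:]):
            succ[x][y] += 1
    return succ

def predict_next(succ, prefix):
    """Predict the most frequent successor of the last element, if known."""
    counts = succ.get(prefix[-1])
    return counts.most_common(1)[0][0] if counts else None

model = fit_markov([["a", "b", "c"], ["a", "b", "d"], ["x", "b", "c"]])
print(predict_next(model, ["a", "b"]))  # 'c' follows 'b' most often
```

Recurrent neural networks, process discovery, and grammar inference all generalize this idea by conditioning on more than the single last element, which is what the paper's evaluation compares.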
Automated simulation and verification of process models discovered by process mining
Ivona Zakarija, Frano Škopljanac-Mačina, Bruno Blašković
Automatika, volume 61, pages 312-324
2020
This paper presents a novel approach for automated analysis of process models discovered using process mining techniques. Process mining explores underlying processes hidden in the event data generated by various devices. Our proposed inductive machine learning method was used to build business process models based on actual event log data obtained from a hotel’s Property Management System (PMS). The PMS can be considered a Multi Agent System (MAS) because it is integrated with a variety of external systems and IoT devices. The collected event log combines data on guests’ stays recorded by hotel staff, as well as data streams captured from the telephone exchange and other external IoT devices. Next, we performed automated analysis of the discovered process models using formal methods. The Spin model checker was used to simulate process model executions and automatically verify the process model. We propose an algorithm for the automatic transformation of the discovered process model into a verification model. Additionally, we developed a generator of positive and negative examples. In the verification stage, we also used Linear Temporal Logic (LTL) to define the requested system specifications. We find that the analysis results are well suited for process model repair.
Adversarial System Variant Approximation to Quantify Process Model Generalization
Julian Theis, Houshang Darabi
arXiv preprint arXiv:2003.12168
2020
In process mining, process models are extracted from event logs using process discovery algorithms and are commonly assessed using multiple quality metrics. While the metrics that measure the relationship of an extracted process model to its event log are well-studied, quantifying the level by which a process model can describe the unobserved behavior of its underlying system falls short in the literature. In this paper, a novel deep learning-based methodology called Adversarial System Variant Approximation (AVATAR) is proposed to overcome this issue. Sequence Generative Adversarial Networks are trained on the variants contained in an event log with the intention to approximate the underlying variant distribution of the system behavior. Unobserved realistic variants are sampled either directly from the Sequence Generative Adversarial Network or by leveraging the Metropolis-Hastings algorithm. The degree by which a process model relates to its underlying unknown system behavior is then quantified based on the realistic observed and estimated unobserved variants using established process model quality metrics. Significant performance improvements in revealing realistic unobserved variants are demonstrated in a controlled experiment on 15 ground truth systems. Additionally, the proposed methodology is experimentally tested and evaluated to quantify the generalization of 60 discovered process models with respect to their systems.
Discovering Process Models from Uncertain Event Data
Pegoraro, Marco, Merih Seran Uysal, and Wil van der Aalst
BPI Workshop 2019, International Conference on Business Process Management. Springer, Cham
2019
Modern information systems are able to collect event data in the form of event logs. Process mining techniques allow to discover a model from event data, to check the conformance of an event log against a reference model, and to perform further process-centric analyses. In this paper, we consider uncertain event logs, where data is recorded together with explicit uncertainty information. We describe a technique to discover a directly-follows graph from such event data which retains information about the uncertainty in the process. We then present experimental results of performing inductive mining over the directly-follows graph to obtain models representing the certain and uncertain part of the process.
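The certain-data baseline that the paper extends to uncertain logs is directly-follows graph discovery, which can be sketched in a few lines. This is the plain, uncertainty-free version with example data, not the authors' technique for uncertain event data:

```python
# Sketch of directly-follows graph (DFG) discovery from an ordinary event
# log: count how often activity x is directly followed by activity y.
from collections import Counter

def discover_dfg(log):
    dfg = Counter()
    for trace in log:
        for x, y in zip(trace, trace[1:]):   # consecutive event pairs
            dfg[(x, y)] += 1
    return dfg

log = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
print(discover_dfg(log))
```

With uncertain event data, the directly-follows relation is no longer a simple count; the paper shows how to discover a DFG that retains the uncertainty information, over which inductive mining can then be applied.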
Extracting Multiple Viewpoint Models from Relational Databases
Alessandro Berti, Wil van der Aalst
SIMPDA 2018 postproceedings
2019
Much time in process mining projects is spent on finding and understanding data sources and extracting the event data needed. As a result, only a fraction of time is spent actually applying techniques to discover, control and predict the business process. Moreover, current process mining techniques assume a single case notion. However, in real-life processes often different case notions are intertwined. For example, events of the same order handling process may refer to customers, orders, order lines, deliveries, and payments. Therefore, we propose to use Multiple Viewpoint (MVP) models that relate events through objects and that relate activities through classes. The required event data are much closer to existing relational databases. MVP models provide a holistic view on the process, but also allow for the extraction of classical event logs using different viewpoints. This way existing process mining techniques can be used for each viewpoint without the need for new data extractions and transformations. We provide a toolchain allowing for the discovery of MVP models (annotated with performance and frequency information) from relational databases. Moreover, we demonstrate that classical process mining techniques can be applied to any selected viewpoint.
Increasing Scalability of Process Mining using Event Dataframes: How Data Structure Matters
Berti, Alessandro
arXiv preprint arXiv:1907.12817
2019
Process Mining is a branch of Data Science that aims to extract process-related information from the event data contained in information systems, which is steadily increasing in amount. Many algorithms, and a general-purpose open-source framework (ProM 6), have been developed in recent years for process discovery, conformance checking, and machine learning on event data. However, in very few cases has scalability been a target, prioritizing the quality of the output over execution speed and the optimization of resources. This makes it progressively more difficult to apply process mining with mainstream workstations to real-life event data with any open-source process mining framework. Hence, exploring more scalable storage techniques, in-memory data structures, and more performant algorithms is a pressing need. In this paper, we propose the usage of mainstream columnar storages and dataframes to increase the scalability of process mining. These can replace the classic event log structures in most tasks, but require completely different implementations with respect to mainstream process mining algorithms. Dataframes will be defined, some algorithms on such structures will be presented, and their complexity will be calculated.
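The columnar idea can be illustrated with a small sketch: keep the log as flat parallel columns (case, activity, timestamp) and derive the directly-follows relation by comparing each row with the next, instead of materializing per-case trace objects. This is a hand-rolled illustration of the principle, not PM4Py's dataframe implementation:

```python
# Sketch of DFG computation over a columnar event log (illustrative only).
from collections import Counter

def dfg_from_columns(case_col, activity_col, time_col):
    # sort row indices by (case, timestamp) in one pass over flat columns
    order = sorted(range(len(case_col)),
                   key=lambda i: (case_col[i], time_col[i]))
    dfg = Counter()
    for prev, cur in zip(order, order[1:]):
        if case_col[prev] == case_col[cur]:   # successive rows of one case
            dfg[(activity_col[prev], activity_col[cur])] += 1
    return dfg

cases      = ["c1", "c2", "c1", "c2", "c1"]
activities = ["a",  "a",  "b",  "c",  "c"]
times      = [1,    1,    2,    2,    3]
print(dfg_from_columns(cases, activities, times))
```

In an actual dataframe library, the sort and the pairwise comparison become vectorized column operations, which is where the scalability gains discussed in the paper come from.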
Integrated, Ubiquitous and Collaborative Process Mining with Chat Bots
Burattin, Andrea
BPI 2019 Demos; 17th Int. Conference on Business Process Management. CEUR-WS
2019
Within the process mining field we are witnessing a tremendous growth of applications and development frameworks available to perform data analyses. Such growth, which is very positive and desirable, comes with the cost of learning each new tool and difficulties in integrating different systems in order to complement the analyses. In addition, we are noticing the lack of tools enabling collaboration among the users involved in a project. Finally, we think it would be highly beneficial to enable ubiquitous processing of data. This paper proposes a solution to all these issues by presenting a chat bot which can be included in discussions to enable the execution of process mining directly from the chat.
Reviving Token-based Replay: Increasing Speed While Improving Diagnostics
Berti, Alessandro, and Wil van der Aalst
Algorithms & Theories for the Analysis of Event Data (ATAED’2019)
2019
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment based), token-based replay got abandoned. However, despite decomposition approaches and heuristics to speed-up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., “token flooding”) and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the PM4Py process mining library. We will show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics.
Anti-Alignments – Measuring The Precision of Process Models and Event Logs
Chatain, Thomas and Boltenhagen, Mathilde and Carmona, Josep
arXiv preprint arXiv:1912.05907
2019
Processes are a crucial artefact in organizations, since they coordinate the execution of activities so that products and services are provided. The use of models to analyse the underlying processes is a well-known practice. However, due to the complexity and continuous evolution of their processes, organizations need an effective way of analysing the relation between processes and models. Conformance checking techniques assess the suitability of a process model in representing an underlying process, observed through a collection of real executions. One important metric in conformance checking is to assess the precision of the model with respect to the observed executions, i.e., characterize the ability of the model to produce behavior unrelated to the one observed. In this paper we present the notion of anti-alignment as a concept to help unveiling runs in the model that may deviate significantly from the observed behavior. Using anti-alignments, a new metric for precision is proposed. In contrast to existing metrics, anti-alignment based precision metrics satisfy most of the required axioms highlighted in a recent publication. Moreover, a complexity analysis of the problem of computing anti-alignments is provided, which sheds light on the practicability of using anti-alignments to estimate precision. Experiments are provided that witness the validity of the concepts introduced in this paper.