This page lists publications about PM4Py, as well as publications describing approaches that use PM4Py. We first list the works describing PM4Py itself, after which we present the work building on top of it (in chronological order).

Publications about PM4Py

Process Mining for Python (PM4Py): Bridging the Gap between Process- and Data Science
Berti, Alessandro, Sebastiaan J. van Zelst, and Wil van der Aalst
Proceedings of the ICPM Demo Track 2019, co-located with 1st International Conference on Process Mining (ICPM 2019)
2019
Process mining, i.e., a sub-field of data science focusing on the analysis of event data generated during the execution of (business) processes, has seen a tremendous change over the past two decades. Starting off in the early 2000s with limited to no tool support, nowadays several software tools exist, both open-source, e.g., ProM and Apromore, and commercial, e.g., Disco, Celonis, ProcessGold, etc. [...]
PM4Py Web Services: Easy Development, Integration and Deployment of Process Mining Features in any Application Stack
Berti, Alessandro, Sebastiaan J. van Zelst, and Wil van der Aalst
BPM Demo Track
2019
In recent years, process mining emerged as a set of techniques to analyze process data, supported by different open-source and commercial solutions. Process mining tools aim to discover process models from the data, perform conformance checking, predict the future behavior of the process and/or provide other analyses that enhance the overall process knowledge. [...]

Publications using PM4Py

Is your publication missing? Just let us know via the following form:

Send us your publication

To have your publication added, please use the form
(or drop an email at pm4py@fit.fraunhofer.de).
A Novel Token-Based Replay Technique to Speed Up Conformance Checking and Process Enhancement
Alessandro Berti, Wil van der Aalst
ToPNoC (Transactions on Petri Nets and Other Models of Concurrency)
2020
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment based), token-based replay got abandoned. However, despite decomposition approaches and heuristics to speed-up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., "token flooding") and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the PM4Py process mining library. We will show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics.
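For readers unfamiliar with the technique, the core token-based replay idea can be sketched in a few lines of Python. This is a simplified illustration on a hypothetical toy net, not the optimized PM4Py implementation; the net encoding and all names are made up for the example. Replay counts produced (p), consumed (c), missing (m) and remaining (r) tokens, and computes the classic fitness 0.5(1 − m/c) + 0.5(1 − r/p):

```python
# Minimal sketch of token-based replay on a toy Petri net (illustrative only).

def token_replay(trace, transitions, initial, final):
    """Replay `trace` on a net given as activity -> (input places, output
    places); return (produced, consumed, missing, remaining, fitness)."""
    marking = dict(initial)          # place -> token count
    p = sum(initial.values())        # tokens produced by initialization
    c = 0
    m = 0
    for activity in trace:
        inputs, outputs = transitions[activity]
        for place in inputs:
            if marking.get(place, 0) == 0:
                m += 1               # token is missing: insert it artificially
                marking[place] = 1
            marking[place] -= 1      # consume
            c += 1
        for place in outputs:
            marking[place] = marking.get(place, 0) + 1  # produce
            p += 1
    # consume the tokens of the final marking
    for place, k in final.items():
        have = marking.get(place, 0)
        missing = max(0, k - have)
        m += missing
        c += k
        marking[place] = have + missing - k
    r = sum(v for v in marking.values() if v > 0)       # leftover tokens
    fitness = 0.5 * (1 - m / c) + 0.5 * (1 - r / p)
    return p, c, m, r, fitness

# Toy sequential net: start -> a -> p1 -> b -> end
net = {"a": (["start"], ["p1"]), "b": (["p1"], ["end"])}
print(token_replay(["a", "b"], net, {"start": 1}, {"end": 1}))  # fitness 1.0
```

A trace skipping `a` would incur one missing and one remaining token, lowering the fitness accordingly; the paper's contribution lies in making this kind of replay fast and its diagnostics reliable on large, real-life inputs.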
Process Mining for Production Processes in the Automotive Industry
Merih Seran Uysal, Sebastiaan van Zelst, Tobias Brockhoff, Anahita Farhang Ghahfarokhi, Mahsa Pourbafrani, Ruben Schumacher, Sebastian Junglas, Guenther Schuh and Wil van der Aalst
Industrial Track, International Conference on Business Process Management
2020
The increasing digitization of organizations leads to unprecedented amounts of data capturing the behavior of operational processes. On the basis of such data, process mining techniques allow us to obtain a holistic picture of the execution of a company’s processes, and their related events. In particular, production companies aiming at reducing the production cycle time and ensuring a high product quality show an increased interest in utilizing process mining in order to identify deviations and bottlenecks in their production processes. In this paper, we present a use case study in which we rigorously investigate how process mining techniques can successfully be applied to real-world data of the car production company e.GO Mobile AG. Furthermore, we present our results facilitating more transparency and valuable insights into the real processes of the company.
Online Process Monitoring Using Incremental State-Space Expansion: An Exact Algorithm
Daniel Schuster, Sebastiaan van Zelst
Main track, International Conference on Business Process Management
2020
The execution of (business) processes generates valuable traces of event data in the information systems employed within companies. Recently, approaches for monitoring the correctness of the execution of running processes have been developed in the area of process mining, i.e., online conformance checking. The advantages of monitoring a process’ conformity during its execution are clear, i.e., deviations are detected as soon as they occur and countermeasures can immediately be initiated to reduce the possible negative effects caused by process deviations. Existing work in online conformance checking only allows for obtaining approximations of non-conformity, e.g., overestimating the actual severity of the deviation. In this paper, we present an exact, parameter-free, online conformance checking algorithm that computes conformance checking results on the fly. Our algorithm exploits the fact that the conformance checking problem can be reduced to a shortest path problem, by incrementally expanding the search space and reusing previously computed intermediate results. Our experiments show that our algorithm is able to outperform comparable state-of-the-art approximation algorithms.
PRIPEL: Privacy-Preserving Event Log Publishing Including Contextual Information
Stephan A. Fahrenkrog-Petersen, Han van der Aa, Matthias Weidlich
International Conference on Business Process Management
2020
Event logs capture the execution of business processes in terms of executed activities and their execution context. Since logs contain potentially sensitive information about the individuals involved in the process, they should be pre-processed before being published to preserve the individuals' privacy. However, existing techniques for such pre-processing are limited to a process' control-flow and neglect contextual information, such as attribute values and durations. This thus precludes any form of process analysis that involves contextual factors. To bridge this gap, we introduce PRIPEL, a framework for privacy-aware event log publishing. Compared to existing work, PRIPEL takes a fundamentally different angle and ensures privacy on the level of individual cases instead of the complete log. This way, contextual information as well as the long tail process behaviour are preserved, which enables the application of a rich set of process analysis techniques. We demonstrate the feasibility of our framework in a case study with a real-world event log.
Optimized SAT encoding of conformance checking artefacts
Mathilde Boltenhagen, Thomas Chatain, Josep Carmona
Computing (Journal)
2020
Conformance checking is a growing discipline that aims at assisting organizations in monitoring their processes. At its core, conformance checking relies on the computation of particular artefacts which enable reasoning on the relation between observed and modeled behavior. It is widely acknowledged that the computation of these artefacts is the lion's share of conformance checking techniques. This paper shows how important conformance artefacts like alignments, anti-alignments or multi-alignments, defined over the Levenshtein edit distance, can be efficiently computed by encoding the problem as an optimized SAT instance. From a general perspective, the work advocates for a unified family of techniques that can compute conformance artefacts in the same way. The implementation of the techniques presented in this paper shows capabilities for dealing with both synthetic and real-life instances, which may open the door for a fresh way of applying conformance checking in the near future.
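The Levenshtein edit distance over which these artefacts are defined can be sketched with the standard dynamic program below. This is only the underlying distance, not the paper's SAT encoding; the example traces are made up:

```python
# Sketch of the Levenshtein edit distance used to define alignment cost
# between an observed trace and a model run (illustrative baseline).

def levenshtein(trace, run):
    """Classic DP edit distance: each insertion, deletion or
    substitution of an activity costs 1."""
    n, m = len(trace), len(run)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i                 # delete all remaining trace events
    for j in range(m + 1):
        dp[0][j] = j                 # insert all remaining model moves
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if trace[i - 1] == run[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match / substitution
    return dp[n][m]

print(levenshtein(["a", "b", "d"], ["a", "b", "c", "d"]))  # one insertion -> 1
```

An alignment then corresponds to a model run minimizing this distance to the observed trace; anti- and multi-alignments optimize related objectives over the same distance.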
Process mining-based approach for investigating malicious login events
Sofiane Lagraa, Radu State
NOMS 2020
2020
A large body of research has been accomplished on prevention and detection of malicious events, attacks, threats, or botnets. However, there is a lack of automatic and sophisticated methods for investigating malicious events/users, understanding the root cause of attacks, and discovering what is really happening before an attack. In this paper, we propose an attack model discovery approach for investigating and mining malicious authentication events across user accounts. The approach is based on process mining techniques applied to event logs leading up to attacks, in order to extract the behavior of malicious users. The evaluation is performed on a large, publicly available dataset, from which we extract models of the behavior of malicious users via authentication events. The results are useful for security experts to improve defense tools, by making them more robust, and to develop attack simulations.
Incremental Discovery of Hierarchical Process Models
Daniel Schuster, Sebastiaan van Zelst, Wil van der Aalst
RCIS 2020
2020
Many of today’s information systems record the execution of (business) processes in great detail. Process mining utilizes such data and aims to extract valuable insights. Process discovery, a key research area in process mining, deals with the construction of process models based on recorded process behavior. Existing process discovery algorithms aim to provide a “push-button-technology”, i.e., the algorithms discover a process model in a completely automated fashion. However, real data often contain noisy and/or infrequent complex behavioral patterns. As a result, the incorporation of all behavior leads to very imprecise or overly complex process models. At the same time, data pre-processing techniques have been shown to be able to improve the precision of process models, i.e., without explicitly using domain knowledge. Yet, to obtain superior process discovery results, human input is still required. Therefore, we propose a discovery algorithm that allows a user to incrementally extend a process model by new behavior. The proposed algorithm is designed to localize and repair nonconforming process model parts by exploiting the hierarchical structure of the given process model. The evaluation shows that the process models obtained with our algorithm, which allows for incremental extension of a process model, have, in many cases, superior characteristics in comparison to process models obtained by using existing process discovery and model repair techniques.
Truncated Trace Classifier. Removal of Incomplete Traces from Event Logs
Gael Bernard, Periklis Andritsos
BPMDS-EMMSAD 2020
2020
We consider truncated traces, which are incomplete sequences of events. This typically happens when dealing with streaming data or when the event log extraction process cuts the end of the trace. The existence of truncated traces in event logs and their negative impacts on process mining outcomes have been widely acknowledged in the literature. Still, there is a lack of research on algorithms to detect them. We propose the Truncated Trace Classifier (TTC), an algorithm that distinguishes truncated traces from the ones that are not truncated. We benchmark 5 TTC implementations that use either LSTM or XGBOOST on 13 real-life event logs. Accurate TTCs have great potential. In fact, filtering truncated traces before applying a process discovery algorithm greatly improves the precision of the discovered process models, by 9.1%. Moreover, we show that TTCs increase the accuracy of a next event prediction algorithm by up to 7.5%.
A Generic Framework for Attribute-Driven Hierarchical Trace Clustering
Sebastiaan van Zelst, Yukun Cao
BPI Workshop 2020, International Conference on Business Process Management
2020
The execution of business processes often entails a specific process execution context, e.g., a customer, service or product. Often, the corresponding event data records indicators of such an execution context, e.g., a customer type (bronze, silver, gold or platinum). Typically, variations in the execution of a process exist for the different execution contexts of a process. To gain a better understanding of the global process execution, it is interesting to study the behavioral (dis)similarity between different execution contexts of a process. However, in real business settings, the exact number of execution contexts might be too large to analyze manually. At the same time, current trace clustering techniques do not take process type information into account, i.e., they are solely behaviorally driven. Hence, in this paper, we present a hierarchical data-attribute-driven trace clustering framework that allows us to compare the behavior of different groups of traces. Our evaluation shows that the incorporation of data-attributes in trace clustering yields interesting novel process insights.
Efficient Construction of Behavior Graphs for Uncertain Event Data
Marco Pegoraro, Merih Seran Uysal, Wil van der Aalst
BIS 2020
2020
The discipline of process mining deals with analyzing execution data of operational processes, extracting models from event data, checking the conformance between event data and normative models, and enhancing all aspects of processes. Recently, new techniques have been developed to analyze event data containing uncertainty; these techniques strongly rely on representing uncertain event data through graph-based models capturing uncertainty. In this paper we present a novel approach to efficiently compute a graph representation of the behavior contained in an uncertain process trace. We present our new algorithm, analyze its time complexity, and report experimental results showing order-of-magnitude performance improvements for behavior graph construction.
Evaluating the Effectiveness of Interactive Process Discovery in Healthcare: A Case Study
Elisabetta Benevento, Prabhakar M. Dixit, Mohammadreza Fani Sani, Davide Aloini, Wil van der Aalst
PODS4H Workshop, International Conference on Business Process Management
2019
This work aims at investigating the effectiveness and suitability of Interactive Process Discovery, an innovative Process Mining technique, to model healthcare processes in a data-driven manner. Interactive Process Discovery allows the analyst to interactively discover the process model, exploiting his domain knowledge along with the event log. In so doing, a comparative evaluation against the traditional automated discovery techniques is carried out to assess the potential benefits that domain knowledge brings in improving both the quality and the understandability of the process model. The comparison is performed by using a real dataset from an Italian Hospital, in collaboration with the medical staff. Preliminary results show that Interactive Process Discovery allows to obtain an accurate and fully compliant with clinical guidelines process model with respect to the automated discovery techniques. Discovering an accurate and comprehensible process model is an important starting point for subsequent process analysis and improvement steps, especially in complex environments, such as healthcare.
Automated Generation of Business Process Models using Constraint Logic Programming in Python
Tymoteusz Paszun, Piotr Wiśniewski, Krzysztof Kluza, Antoni Ligęza
FedCSIS 2019
2019
High complexity of business processes in real-life organizations is a constantly rising issue. In consequence, modeling a workflow is a challenge for process stakeholders. Yet, to facilitate this task, new methods can be implemented to automate the phase of process design. As a main contribution of this paper, we propose an approach to generate process models based on activities performed by the participants, where the exact order of execution does not need to be specified. Nevertheless, the goal of our method is to generate artificial workflow traces of a process using Constraint Programming and a set of predefined rules. As a final step, the approach was implemented as a dedicated tool and evaluated on a set of test examples that prove that our method is capable of creating correct process models.
Cherry-Picking from Spaghetti: Multi-range Filtering of Event Logs
Maxim Vidgof, Djordje Djurica, Saimir Bala, Jan Mendling
BPMDS-EMMSAD 2020
2020
Mining real-life event logs results in process models which provide little value to the process analyst without support for handling complexity. Filtering techniques are specifically helpful to tackle this problem. These techniques have been focusing on leaving out infrequent aspects of the process, which are considered outliers. However, it is exactly in these outliers where it is possible to gather important insights on the process. This paper addresses this problem by defining multi-range filtering. Our technique not only allows combining both frequent and non-frequent aspects of the process, but also supports any user-defined intervals of frequency of activities and variants. We evaluate our approach through a prototype based on the PM4Py library and show its benefits in comparison to existing filtering techniques.
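The multi-range idea can be sketched as follows, keeping variants whose frequency falls in any user-defined interval. This is a minimal illustration under assumed names, not the authors' PM4Py-based prototype:

```python
# Sketch of multi-range filtering on trace variants (illustrative only).
from collections import Counter

def multi_range_filter(log, ranges):
    """`log` is a list of traces (sequences of activities); `ranges` is a
    list of (low, high) variant-frequency intervals, inclusive."""
    freq = Counter(tuple(t) for t in log)
    keep = {v for v, f in freq.items()
            if any(lo <= f <= hi for lo, hi in ranges)}
    return [t for t in log if tuple(t) in keep]

log = [("a", "b")] * 5 + [("a", "c")] * 2 + [("a", "d")]
# keep both the most frequent variant and the rare outlier, drop the middle
filtered = multi_range_filter(log, [(1, 1), (5, 10)])
print(len(filtered))  # 6 traces survive
```

Unlike classic "keep the top-k variants" filters, a second low-frequency interval lets the analyst retain exactly the outliers the paper argues are worth inspecting.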
DeepAlign: Alignment-Based Process Anomaly Correction Using Recurrent Neural Networks
Timo Nolle, Alexander Seeliger, Nils Thoma, Max Mühlhäuser
CAiSE 2020: Advanced Information Systems Engineering, pp. 319-333
2020
In this paper, we propose DeepAlign, a novel approach to multi-perspective process anomaly correction, based on recurrent neural networks and bidirectional beam search. At the core of the DeepAlign algorithm are two recurrent neural networks trained to predict the next event. One is reading sequences of process executions from left to right, while the other is reading the sequences from right to left. By combining the predictive capabilities of both neural networks, we show that it is possible to calculate sequence alignments, which are used to detect and correct anomalies. DeepAlign utilizes the case-level and event-level attributes to closely model the decisions within a process. We evaluate the performance of our approach on an elaborate data corpus of 252 realistic synthetic event logs and compare it to three state-of-the-art conformance checking methods. DeepAlign produces better corrections than the rest of the field reaching an overall F1 score of 0.9572 across all datasets, whereas the best comparable state-of-the-art method reaches 0.6411.
An interdisciplinary comparison of sequence modeling methods for next-element prediction
Niek Tax, Irene Teinemaa, Sebastiaan van Zelst
Software and Systems Modeling
2020
Data of sequential nature arise in many application domains in the form of, e.g., textual data, DNA sequences, and software execution traces. Different research disciplines have developed methods to learn sequence models from such datasets: (i) In the machine learning field methods such as (hidden) Markov models and recurrent neural networks have been developed and successfully applied to a wide range of tasks, (ii) in process mining process discovery methods aim to generate human-interpretable descriptive models, and (iii) in the grammar inference field the focus is on finding descriptive models in the form of formal grammars. Despite their different focuses, these fields share a common goal: learning a model that accurately captures the sequential behavior in the underlying data. Those sequence models are generative, i.e., they are able to predict what elements are likely to occur after a given incomplete sequence. So far, these fields have developed mainly in isolation from each other and no comparison exists. This paper presents an interdisciplinary experimental evaluation that compares sequence modeling methods on the task of next-element prediction on four real-life sequence datasets. The results indicate that machine learning methods, which generally do not aim at model interpretability, tend to outperform methods from the process mining and grammar inference fields in terms of accuracy.
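The simplest kind of sequence model entering such a comparison can be sketched as a first-order Markov predictor: count successor frequencies and predict the most frequent successor of the last observed element. This is an illustrative baseline with made-up data, not a method evaluated in the paper:

```python
# Sketch of a first-order Markov next-element predictor (illustrative).
from collections import Counter, defaultdict

def fit_markov(sequences):
    """Learn successor counts: element -> Counter of next elements."""
    succ = defaultdict(Counter)
    for seq in sequences:
        for x, y in zip(seq, seq[1:]):
            succ[x][y] += 1
    return succ

def predict_next(succ, prefix):
    """Predict the most frequent successor of the last element, if known."""
    counts = succ.get(prefix[-1])
    return counts.most_common(1)[0][0] if counts else None

model = fit_markov([["a", "b", "c"], ["a", "b", "d"], ["x", "b", "c"]])
print(predict_next(model, ["a", "b"]))  # 'c' follows 'b' most often
```

Recurrent neural networks, process discovery, and grammar inference all generalize this idea by conditioning on more than the single last element, which is what the paper's evaluation compares.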
Automated simulation and verification of process models discovered by process mining
Ivona Zakarija, Frano Škopljanac-Mačina, Bruno Blašković
Automatika, volume 61, pages 312-324
2020
This paper presents a novel approach for automated analysis of process models discovered using process mining techniques. Process mining explores underlying processes hidden in the event data generated by various devices. Our proposed inductive machine learning method was used to build business process models based on actual event log data obtained from a hotel’s Property Management System (PMS). The PMS can be considered a Multi Agent System (MAS) because it is integrated with a variety of external systems and IoT devices. The collected event log combines data on guests’ stays recorded by hotel staff, as well as data streams captured from the telephone exchange and other external IoT devices. Next, we performed automated analysis of the discovered process models using formal methods. The Spin model checker was used to simulate process model executions and automatically verify the process model. We propose an algorithm for the automatic transformation of the discovered process model into a verification model. Additionally, we developed a generator of positive and negative examples. In the verification stage, we also used Linear Temporal Logic (LTL) to define the requested system specifications. We find that the analysis results are well suited for process model repair.
Adversarial System Variant Approximation to Quantify Process Model Generalization
Julian Theis, Houshang Darabi
arXiv preprint arXiv:2003.12168
2020
In process mining, process models are extracted from event logs using process discovery algorithms and are commonly assessed using multiple quality metrics. While the metrics that measure the relationship of an extracted process model to its event log are well-studied, quantifying the level by which a process model can describe the unobserved behavior of its underlying system falls short in the literature. In this paper, a novel deep learning-based methodology called Adversarial System Variant Approximation (AVATAR) is proposed to overcome this issue. Sequence Generative Adversarial Networks are trained on the variants contained in an event log with the intention to approximate the underlying variant distribution of the system behavior. Unobserved realistic variants are sampled either directly from the Sequence Generative Adversarial Network or by leveraging the Metropolis-Hastings algorithm. The degree by which a process model relates to its underlying unknown system behavior is then quantified based on the realistic observed and estimated unobserved variants using established process model quality metrics. Significant performance improvements in revealing realistic unobserved variants are demonstrated in a controlled experiment on 15 ground truth systems. Additionally, the proposed methodology is experimentally tested and evaluated to quantify the generalization of 60 discovered process models with respect to their systems.
Discovering Process Models from Uncertain Event Data
Pegoraro, Marco, Merih Seran Uysal, and Wil van der Aalst
BPI Workshop 2019, International Conference on Business Process Management. Springer, Cham
2019
Modern information systems are able to collect event data in the form of event logs. Process mining techniques allow to discover a model from event data, to check the conformance of an event log against a reference model, and to perform further process-centric analyses. In this paper, we consider uncertain event logs, where data is recorded together with explicit uncertainty information. We describe a technique to discover a directly-follows graph from such event data which retains information about the uncertainty in the process. We then present experimental results of performing inductive mining over the directly-follows graph to obtain models representing the certain and uncertain part of the process.
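The certain-data baseline that the paper extends to uncertain logs is directly-follows graph discovery, which can be sketched in a few lines. This is the plain, uncertainty-free version with example data, not the authors' technique for uncertain event data:

```python
# Sketch of directly-follows graph (DFG) discovery from an ordinary event
# log: count how often activity x is directly followed by activity y.
from collections import Counter

def discover_dfg(log):
    dfg = Counter()
    for trace in log:
        for x, y in zip(trace, trace[1:]):   # consecutive event pairs
            dfg[(x, y)] += 1
    return dfg

log = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
print(discover_dfg(log))
```

With uncertain event data, the directly-follows relation is no longer a simple count; the paper shows how to discover a DFG that retains the uncertainty information, over which inductive mining can then be applied.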
Extracting Multiple Viewpoint Models from Relational Databases
Alessandro Berti, Wil van der Aalst
SIMPDA 2018 postproceedings
2019
Much time in process mining projects is spent on finding and understanding data sources and extracting the event data needed. As a result, only a fraction of time is spent actually applying techniques to discover, control and predict the business process. Moreover, current process mining techniques assume a single case notion. However, in real-life processes often different case notions are intertwined. For example, events of the same order handling process may refer to customers, orders, order lines, deliveries, and payments. Therefore, we propose to use Multiple Viewpoint (MVP) models that relate events through objects and that relate activities through classes. The required event data are much closer to existing relational databases. MVP models provide a holistic view on the process, but also allow for the extraction of classical event logs using different viewpoints. This way existing process mining techniques can be used for each viewpoint without the need for new data extractions and transformations. We provide a toolchain allowing for the discovery of MVP models (annotated with performance and frequency information) from relational databases. Moreover, we demonstrate that classical process mining techniques can be applied to any selected viewpoint.
Increasing Scalability of Process Mining using Event Dataframes: How Data Structure Matters
Berti, Alessandro
arXiv preprint arXiv:1907.12817
2019
Process Mining is a branch of Data Science that aims to extract process-related information from the event data contained in information systems, which is steadily increasing in amount. Many algorithms, and a general-purpose open-source framework (ProM 6), have been developed in recent years for process discovery, conformance checking, and machine learning on event data. However, in very few cases has scalability been a target, prioritizing the quality of the output over execution speed and the optimization of resources. This makes it progressively more difficult to apply process mining with mainstream workstations to real-life event data with any open-source process mining framework. Hence, exploring more scalable storage techniques, in-memory data structures, and more performant algorithms is a pressing need. In this paper, we propose the usage of mainstream columnar storages and dataframes to increase the scalability of process mining. These can replace the classic event log structures in most tasks, but require completely different implementations with respect to mainstream process mining algorithms. Dataframes will be defined, some algorithms on such structures will be presented, and their complexity will be calculated.
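The columnar idea can be illustrated with a small sketch: keep the log as flat parallel columns (case, activity, timestamp) and derive the directly-follows relation by comparing each row with the next, instead of materializing per-case trace objects. This is a hand-rolled illustration of the principle, not PM4Py's dataframe implementation:

```python
# Sketch of DFG computation over a columnar event log (illustrative only).
from collections import Counter

def dfg_from_columns(case_col, activity_col, time_col):
    # sort row indices by (case, timestamp) in one pass over flat columns
    order = sorted(range(len(case_col)),
                   key=lambda i: (case_col[i], time_col[i]))
    dfg = Counter()
    for prev, cur in zip(order, order[1:]):
        if case_col[prev] == case_col[cur]:   # successive rows of one case
            dfg[(activity_col[prev], activity_col[cur])] += 1
    return dfg

cases      = ["c1", "c2", "c1", "c2", "c1"]
activities = ["a",  "a",  "b",  "c",  "c"]
times      = [1,    1,    2,    2,    3]
print(dfg_from_columns(cases, activities, times))
```

In an actual dataframe library, the sort and the pairwise comparison become vectorized column operations, which is where the scalability gains discussed in the paper come from.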
Integrated, Ubiquitous and Collaborative Process Mining with Chat Bots
Burattin, Andrea
BPI 2019 Demos; 17th Int. Conference on Business Process Management. CEUR-WS
2019
Within the process mining field we are witnessing a tremendous growth of applications and development frameworks available to perform data analyses. Such growth, which is very positive and desirable, comes with the cost of learning each new tool and difficulties in integrating different systems in order to complement the analyses. In addition, we are noticing the lack of tools enabling collaboration among the users involved in a project. Finally, we think it would be highly beneficial to enable ubiquitous processing of data. This paper proposes a solution to all these issues by presenting a chat bot which can be included in discussions to enable the execution of process mining directly from the chat.
Reviving Token-based Replay: Increasing Speed While Improving Diagnostics
Berti, Alessandro, and Wil van der Aalst
Algorithms & Theories for the Analysis of Event Data (ATAED’2019)
2019
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment based), token-based replay got abandoned. However, despite decomposition approaches and heuristics to speed-up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., “token flooding”) and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the PM4Py process mining library. We will show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics.
Anti-Alignments – Measuring The Precision of Process Models and Event Logs
Chatain, Thomas and Boltenhagen, Mathilde and Carmona, Josep
arXiv preprint arXiv:1912.05907
2019
Processes are a crucial artefact in organizations, since they coordinate the execution of activities so that products and services are provided. The use of models to analyse the underlying processes is a well-known practice. However, due to the complexity and continuous evolution of their processes, organizations need an effective way of analysing the relation between processes and models. Conformance checking techniques assess the suitability of a process model in representing an underlying process, observed through a collection of real executions. One important metric in conformance checking is to assess the precision of the model with respect to the observed executions, i.e., characterize the ability of the model to produce behavior unrelated to the one observed. In this paper we present the notion of anti-alignment as a concept to help unveiling runs in the model that may deviate significantly from the observed behavior. Using anti-alignments, a new metric for precision is proposed. In contrast to existing metrics, anti-alignment based precision metrics satisfy most of the required axioms highlighted in a recent publication. Moreover, a complexity analysis of the problem of computing anti-alignments is provided, which sheds light on the practicability of using anti-alignments to estimate precision. Experiments are provided that witness the validity of the concepts introduced in this paper.