This page lists publications about PM4Py, as well as publications describing approaches that use PM4Py. We first list works describing PM4Py itself, then present works building on top of it, in chronological order.

Publications about PM4Py

Process Mining for Python (PM4Py): Bridging the Gap between Process- and Data Science
Berti, Alessandro, Sebastiaan J. van Zelst, and Wil van der Aalst
Proceedings of the ICPM Demo Track 2019, co-located with 1st International Conference on Process Mining (ICPM 2019)
Process mining, i.e., a sub-field of data science focusing on the analysis of event data generated during the execution of (business) processes, has seen tremendous change over the past two decades. Starting off in the early 2000s with limited to no tool support, nowadays several software tools exist, both open-source, e.g., ProM and Apromore, and commercial, e.g., Disco, Celonis, ProcessGold, etc. [...]
PM4Py Web Services: Easy Development, Integration and Deployment of Process Mining Features in any Application Stack
Berti, Alessandro, Sebastiaan J. van Zelst, and Wil van der Aalst
BPM Demo Track
In recent years, process mining emerged as a set of techniques to analyze process data, supported by different open-source and commercial solutions. Process mining tools aim to discover process models from the data, perform conformance checking, predict the future behavior of the process and/or provide other analyses that enhance the overall process knowledge. [...]

Publications using PM4Py

Is your publication missing? Just let us know via the following form:

Send us your publication

Automated simulation and verification of process models discovered by process mining
Ivona Zakarija, Frano Škopljanac-Mačina, Bruno Blašković
Automatika, volume 61, pages 312-324
This paper presents a novel approach for the automated analysis of process models discovered using process mining techniques. Process mining explores the underlying processes hidden in the event data generated by various devices. Our proposed inductive machine learning method was used to build business process models based on actual event log data obtained from a hotel's Property Management System (PMS). The PMS can be considered a Multi-Agent System (MAS) because it is integrated with a variety of external systems and IoT devices. The collected event log combines data on guests' stays recorded by hotel staff with data streams captured from the telephone exchange and other external IoT devices. Next, we performed automated analysis of the discovered process models using formal methods. The Spin model checker was used to simulate process model executions and automatically verify the process model. We propose an algorithm for the automatic transformation of the discovered process model into a verification model. Additionally, we developed a generator of positive and negative examples. In the verification stage, we also used Linear Temporal Logic (LTL) to define the requested system specifications. We find that the analysis results are well suited for process model repair.
Adversarial System Variant Approximation to Quantify Process Model Generalization
Julian Theis, Houshang Darabi
arXiv preprint arXiv:2003.12168
In process mining, process models are extracted from event logs using process discovery algorithms and are commonly assessed using multiple quality metrics. While the metrics that measure the relationship of an extracted process model to its event log are well-studied, quantifying the level by which a process model can describe the unobserved behavior of its underlying system falls short in the literature. In this paper, a novel deep learning-based methodology called Adversarial System Variant Approximation (AVATAR) is proposed to overcome this issue. Sequence Generative Adversarial Networks are trained on the variants contained in an event log with the intention to approximate the underlying variant distribution of the system behavior. Unobserved realistic variants are sampled either directly from the Sequence Generative Adversarial Network or by leveraging the Metropolis-Hastings algorithm. The degree by which a process model relates to its underlying unknown system behavior is then quantified based on the realistic observed and estimated unobserved variants using established process model quality metrics. Significant performance improvements in revealing realistic unobserved variants are demonstrated in a controlled experiment on 15 ground truth systems. Additionally, the proposed methodology is experimentally tested and evaluated to quantify the generalization of 60 discovered process models with respect to their systems.
Discovering Process Models from Uncertain Event Data
Pegoraro, Marco, Merih Seran Uysal, and Wil M.P. van der Aalst
International Conference on Business Process Management. Springer, Cham
Modern information systems are able to collect event data in the form of event logs. Process mining techniques make it possible to discover a model from event data, to check the conformance of an event log against a reference model, and to perform further process-centric analyses. In this paper, we consider uncertain event logs, where data is recorded together with explicit uncertainty information. We describe a technique to discover a directly-follows graph from such event data which retains information about the uncertainty in the process. We then present experimental results of performing inductive mining over the directly-follows graph to obtain models representing the certain and uncertain parts of the process.
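The directly-follows graph at the core of this work can be illustrated with a deliberately minimal sketch (plain Python on certain, i.e. non-uncertain, traces; this is not the authors' implementation, which additionally propagates uncertainty information):

```python
from collections import Counter

def discover_dfg(traces):
    """Count directly-follows pairs (a, b): activity b immediately
    follows activity a in some trace of the event log."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

# A toy event log: each trace is a sequence of activity labels.
log = [
    ["register", "check", "decide", "notify"],
    ["register", "check", "check", "decide", "notify"],
]

dfg = discover_dfg(log)
print(dfg[("register", "check")])  # 2
print(dfg[("check", "check")])     # 1
```

The arc weights of such a graph are what discovery algorithms like the Inductive Miner consume in the subsequent mining step described in the abstract.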
Increasing Scalability of Process Mining using Event Dataframes: How Data Structure Matters
Berti, Alessandro
arXiv preprint arXiv:1907.12817
Process Mining is a branch of Data Science that aims to extract process-related information from event data contained in information systems, the amount of which is steadily increasing. Many algorithms, as well as a general-purpose open source framework (ProM 6), have been developed in recent years for process discovery, conformance checking, and machine learning on event data. However, in very few cases has scalability been a target: the quality of the output has been prioritized over execution speed and the optimization of resources. This makes it progressively more difficult to apply process mining with mainstream workstations to real-life event data using any open source process mining framework. Hence, exploring more scalable storage techniques, in-memory data structures, and more performant algorithms is a pressing need. In this paper, we propose the usage of mainstream columnar storages and dataframes to increase the scalability of process mining. These can replace the classic event log structures in most tasks, but require completely different implementations compared to mainstream process mining algorithms. Dataframes are defined, some algorithms on such structures are presented, and their complexity is calculated.
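The core idea, computing process-mining relations directly on columnar data, can be sketched without any dataframe library: store the log as parallel columns, sort by case and timestamp, and compare each row with the next row of the same case (the dataframe equivalent of a group-by followed by a shift). A plain-Python illustration, not the paper's implementation:

```python
def directly_follows_columnar(case_ids, activities, timestamps):
    """Compute directly-follows counts from three parallel columns,
    the way a dataframe library would with a sort plus a shift."""
    # Sort row indices by (case, timestamp), as a dataframe sort would.
    order = sorted(range(len(case_ids)),
                   key=lambda i: (case_ids[i], timestamps[i]))
    counts = {}
    # Compare each row with its successor; a pair only counts when
    # both rows belong to the same case.
    for i, j in zip(order, order[1:]):
        if case_ids[i] == case_ids[j]:
            pair = (activities[i], activities[j])
            counts[pair] = counts.get(pair, 0) + 1
    return counts

# Toy columnar event log (rows deliberately out of order).
cases = ["c1", "c2", "c1", "c2", "c1"]
acts  = ["A",  "A",  "B",  "B",  "C"]
times = [1,    1,    2,    2,    3]

print(directly_follows_columnar(cases, acts, times))
# {('A', 'B'): 2, ('B', 'C'): 1}
```

The sort dominates the cost at O(n log n); everything after it is a single linear pass, which is what makes the columnar formulation attractive for large logs.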
Integrated, Ubiquitous and Collaborative Process Mining with Chat Bots
Burattin, Andrea
17th Int. Conference on Business Process Management. CEUR-WS
Within the process mining field we are witnessing a tremendous growth of applications and development frameworks available to perform data analyses. Such growth, which is very positive and desirable, comes with the cost of learning each new tool and difficulties in integrating different systems in order to complement the analyses. In addition, we are noticing the lack of tools enabling collaboration among the users involved in a project. Finally, we think it would be highly recommended to enable ubiquitous processing of data. This paper proposes a solution to all these issues by presenting a chat bot which can be included in discussions to enable the execution of process mining directly from the chat.
Evaluating the Effectiveness of Interactive Process Discovery in Healthcare: A Case Study
Benevento, E., Dixit, P. M., Sani, M. F., Aloini, D., & van der Aalst, W. M.
International Conference on Business Process Management, Springer, Cham
This work aims at investigating the effectiveness and suitability of Interactive Process Discovery, an innovative process mining technique, to model healthcare processes in a data-driven manner. Interactive Process Discovery allows the analyst to interactively discover the process model, exploiting domain knowledge along with the event log. A comparative evaluation against traditional automated discovery techniques is carried out to assess the potential benefits that domain knowledge brings in improving both the accuracy and the understandability of the process model. The comparison is performed using a real dataset from an Italian hospital, in collaboration with the medical staff. Preliminary results show that, compared to automated discovery techniques, Interactive Process Discovery yields an accurate process model that is fully compliant with clinical guidelines. Discovering an accurate and understandable process model is an important starting point for subsequent process analysis and improvement steps, especially in complex environments such as healthcare.
Reviving Token-based Replay: Increasing Speed While Improving Diagnostics
Berti, Alessandro, and Wil van der Aalst
Algorithms & Theories for the Analysis of Event Data (ATAED’2019)
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment-based), token-based replay was abandoned. However, despite decomposition approaches and heuristics to speed up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and more scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., “token flooding”) and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the PM4Py process mining library. We show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics.
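The idea behind token-based replay can be illustrated with a stripped-down sketch (plain Python, not PM4Py's actual implementation, which additionally handles invisible transitions, duplicate labels, and the improved diagnostics described above). The model is given as a map from activities to the input and output places of the corresponding Petri net transition, and fitness uses the classic formula over produced, consumed, missing, and remaining tokens:

```python
def token_replay(trace, transitions, initial_marking, final_marking):
    """Minimal token-based replay on a Petri net given as
    {activity: (input_places, output_places)}.  Returns the classic
    fitness 0.5*(1 - m/c) + 0.5*(1 - r/p), where m/r are missing/
    remaining tokens and c/p are consumed/produced tokens."""
    marking = dict(initial_marking)
    produced = sum(initial_marking.values())  # initial tokens count as produced
    consumed = missing = 0
    for act in trace:
        ins, outs = transitions[act]
        for p in ins:                         # consume, inserting missing tokens
            if marking.get(p, 0) == 0:
                missing += 1
                marking[p] = 1
            marking[p] -= 1
            consumed += 1
        for p in outs:                        # produce
            marking[p] = marking.get(p, 0) + 1
            produced += 1
    for p, n in final_marking.items():        # consume the final marking
        have = marking.get(p, 0)
        if have < n:
            missing += n - have
            have = n
        marking[p] = have - n
        consumed += n
    remaining = sum(marking.values())
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)

# A strictly sequential net: p0 -A-> p1 -B-> p2 -C-> p3
net = {"A": (["p0"], ["p1"]), "B": (["p1"], ["p2"]), "C": (["p2"], ["p3"])}
print(token_replay(["A", "B", "C"], net, {"p0": 1}, {"p3": 1}))   # 1.0
print(token_replay(["A", "C"], net, {"p0": 1}, {"p3": 1}) < 1.0)  # True
```

A fitting trace leaves no missing or remaining tokens and scores 1.0; deviations insert missing tokens or leave tokens behind, which is exactly the information the diagnostics in the paper build on.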
Anti-Alignments – Measuring The Precision of Process Models and Event Logs
Chatain, Thomas and Boltenhagen, Mathilde and Carmona, Josep
arXiv preprint arXiv:1912.05907
Processes are a crucial artefact in organizations, since they coordinate the execution of activities so that products and services are provided. The use of models to analyse the underlying processes is a well-known practice. However, due to the complexity and continuous evolution of their processes, organizations need an effective way of analysing the relation between processes and models. Conformance checking techniques assess the suitability of a process model in representing an underlying process, observed through a collection of real executions. One important metric in conformance checking is to assess the precision of the model with respect to the observed executions, i.e., to characterize the ability of the model to produce behavior unrelated to the one observed. In this paper we present the notion of anti-alignment as a concept to help unveil runs in the model that may deviate significantly from the observed behavior. Using anti-alignments, a new metric for precision is proposed. In contrast to existing metrics, anti-alignment-based precision metrics satisfy most of the required axioms highlighted in a recent publication. Moreover, a complexity analysis of the problem of computing anti-alignments is provided, which sheds light on the practicability of using anti-alignments to estimate precision. Experiments are provided that witness the validity of the concepts introduced in this paper.