PM4Py implements state-of-the-art process mining algorithms, i.e., 'fresh off the academic press'. Here you can find an overview of the different algorithms implemented in PM4Py and the papers describing them.

Process Discovery

Inductive Miner


Discovering block-structured process models from event logs - a constructive approach
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
International Conference on Applications and Theory of Petri Nets and Concurrency
2013
Process discovery is the problem of, given a log of observed behaviour, finding a process model that ‘best’ describes this behaviour. A large variety of process discovery algorithms has been proposed. However, no existing algorithm guarantees to return a fitting model (i.e., able to reproduce all observed behaviour) that is sound (free of deadlocks and other anomalies) in finite time. We present an extensible framework to discover from any given log a set of block-structured process models that are sound and fit the observed behaviour. In addition we characterise the minimal information required in the log to rediscover a particular process model. We then provide a polynomial-time algorithm for discovering a sound, fitting, block-structured model from any given log; we give sufficient conditions on the log for which our algorithm returns a model that is language-equivalent to the process model underlying the log, including unseen behaviour. The technique is implemented in a prototypical tool.
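
In PM4Py, the Inductive Miner is available through the simplified interface. A minimal sketch, assuming an event log stored at the hypothetical path log.xes:

```python
import pm4py

# load an event log (hypothetical path)
log = pm4py.read_xes("log.xes")

# discover a sound, block-structured model as a Petri net ...
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# ... or as the underlying process tree
tree = pm4py.discover_process_tree_inductive(log)

pm4py.view_petri_net(net, initial_marking, final_marking)
```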

Inductive Miner Infrequent


Discovering block-structured process models from event logs containing infrequent behaviour
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
International Conference on Business Process Management
2013
Given an event log describing observed behaviour, process discovery aims to find a process model that ‘best’ describes this behaviour. A large variety of process discovery algorithms has been proposed. However, no existing algorithm returns a sound model in all cases (free of deadlocks and other anomalies), handles infrequent behaviour well and finishes quickly. We present a technique able to cope with infrequent behaviour and large event logs, while ensuring soundness. The technique has been implemented in ProM and we compare the technique with existing approaches in terms of quality and performance.
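
In PM4Py, the infrequent variant is selected by passing a noise threshold to the same discovery call; a sketch under the same assumptions (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# noise_threshold > 0 activates the infrequent variant (IMf), which
# filters behaviour below the threshold before constructing the model
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(
    log, noise_threshold=0.2)
```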

Inductive Miner Directly-Follows


Scalable process discovery and conformance checking
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
Software & Systems Modeling 17.2
2018
Considerable amounts of data, including process events, are collected and stored by organisations nowadays. Discovering a process model from such event data and verification of the quality of discovered models are important steps in process mining. Many discovery techniques have been proposed, but none of them combines scalability with strong quality guarantees. We would like such techniques to handle billions of events or thousands of activities, to produce sound models (without deadlocks and other anomalies), and to guarantee that the underlying process can be rediscovered when sufficient information is available. In this paper, we introduce a framework for process discovery that ensures these properties while passing over the log only once and introduce three algorithms using the framework. To measure the quality of discovered models for such large logs, we introduce a model–model and model–log comparison framework that applies a divide-and-conquer strategy to measure recall, fitness, and precision. We experimentally show that these discovery and measuring techniques sacrifice little compared to other algorithms, while gaining the ability to cope with event logs of 100,000,000 traces and processes of 10,000 activities on a standard computer.
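
The directly-follows variant (IMd) operates on the directly-follows abstraction of the log, which can be computed in a single pass. A sketch (hypothetical log.xes path; how the IMd variant is selected differs across PM4Py versions):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# the single-pass abstraction that IMd consumes
dfg, start_activities, end_activities = pm4py.discover_dfg(log)

# discovery itself goes through the same entry point as the other
# inductive miner variants
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
```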

Heuristics Miner


Process mining with the heuristics miner-algorithm
Weijters, A. J. M. M., Wil MP van der Aalst, and A. K. Alves de Medeiros
Technische Universiteit Eindhoven, Tech. Rep. WP 166
2006
The basic idea of process mining is to extract knowledge from event logs recorded by an information system. Until recently, the information in these event logs was rarely used to analyze the underlying processes. Process mining aims at improving this by providing techniques and tools for discovering process, organizational, social, and performance information from event logs. Fuelled by the omnipresence of event logs in transactional information systems (cf. WFM, ERP, CRM, SCM, and B2B systems), process mining has become a vivid research area [1, 2]. In this paper we introduce the challenging process mining domain and discuss a heuristics driven process mining algorithm; the so-called “HeuristicsMiner” in detail. HeuristicsMiner is a practical applicable mining algorithm that can deal with noise, and can be used to express the main behavior (i.e. not all details and exceptions) registered in an event log. In the experimental section of this paper we introduce benchmark material (12.000 different event logs) and measurements by which the performance of process mining algorithms can be measured.
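
In PM4Py, the Heuristics Miner exposes its noise-handling thresholds as parameters; a minimal sketch (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# discover a heuristics net; the dependency threshold controls how much
# infrequent directly-follows behaviour is kept
heu_net = pm4py.discover_heuristics_net(log, dependency_threshold=0.99)
pm4py.view_heuristics_net(heu_net)

# alternatively, obtain a Petri net directly
net, initial_marking, final_marking = pm4py.discover_petri_net_heuristics(
    log, dependency_threshold=0.99)
```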

Correlation Miner


Correlation miner: mining business process models and event correlations without case identifiers
Pourmirza, Shaya, Remco Dijkman, and Paul Grefen
International Journal of Cooperative Information Systems 26.02
2017
Process discovery algorithms aim to capture process models from event logs. These algorithms have been designed for logs in which the events that belong to the same case are related to each other — and to that case — by means of a unique case identifier. However, in service-oriented systems, these case identifiers are rarely stored beyond request-response pairs, which makes it hard to relate events that belong to the same case. This is known as the correlation challenge. This paper addresses the correlation challenge by introducing a technique, called the correlation miner, that facilitates discovery of business process models when events are not associated with a case identifier. It extends previous work on the correlation miner, by not only enabling the discovery of the process model, but also detecting which events belong to the same case. Experiments performed on both synthetic and real-world event logs show the applicability of the correlation miner. The resulting technique enables us to observe a service-oriented system and determine — with high accuracy — which request-response pairs sent by different communicating parties are related to each other.
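
The Correlation Miner in PM4Py works on a plain table of activities and timestamps, without case identifiers. A sketch, assuming the module path and default column names of recent PM4Py versions (the toy data is invented):

```python
import pandas as pd
from pm4py.algo.discovery.correlation_mining import algorithm as correlation_miner

# a "log" without case identifiers: only activity names and timestamps
df = pd.DataFrame({
    "concept:name": ["A", "B", "A", "B"],
    "time:timestamp": pd.to_datetime([
        "2021-01-01 10:00", "2021-01-01 10:05",
        "2021-01-01 11:00", "2021-01-01 11:04"]),
})

# estimates a frequency DFG and a performance DFG without knowing
# which events belong to the same case
frequency_dfg, performance_dfg = correlation_miner.apply(df)
```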

Conformance Checking

Token-based Replay


A Novel Token-Based Replay Technique to Speed Up Conformance Checking and Process Enhancement
Alessandro Berti, Wil van der Aalst
ToPNoC (Transactions on Petri Nets and Other Models of Concurrency)
2020
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment based), token-based replay got abandoned. However, despite decomposition approaches and heuristics to speed-up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., "token flooding") and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the PM4Py process mining library. We will show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics.
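
Token-based replay is directly available in PM4Py's simplified interface; a minimal sketch (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# per-trace replay diagnostics: token counts (produced/consumed/missing/
# remaining), whether the trace fits, and the activated transitions
replay_results = pm4py.conformance_diagnostics_token_based_replay(
    log, net, initial_marking, final_marking)
print(replay_results[0])
```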

Alignments


Conformance checking using cost-based fitness analysis
Adriansyah, Arya, Boudewijn F. van Dongen, and Wil van der Aalst
2011 IEEE 15th International Enterprise Distributed Object Computing Conference
2011
The growing complexity of processes in many organizations stimulates the adoption of business process analysis techniques. Typically, such techniques are based on process models and assume that the operational processes in reality conform to these models. However, experience shows that reality often deviates from hand-made models. Therefore, the problem of checking to what extent the operational process conforms to the process model is important for process management, process improvement, and compliance. In this paper, we present a robust replay analysis technique that is able to measure the conformance of an event log for a given process model. The approach quantifies conformance and provides intuitive diagnostics (skipped and inserted activities). Our technique has been implemented in the ProM 6 framework. Comparative evaluations show that the approach overcomes many of the limitations of existing conformance checking techniques.
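
Alignments are likewise available in the simplified interface; a sketch (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# optimal alignments per trace: each result carries the sequence of moves
# (synchronous, log-only, model-only) plus its cost and fitness
aligned_traces = pm4py.conformance_diagnostics_alignments(
    log, net, initial_marking, final_marking)
print(aligned_traces[0]["alignment"])
```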

Decomposed/Recomposed Alignments


Recomposing conformance: Closing the circle on decomposed alignment-based conformance checking in process mining
Lee, Wai Lam Jonathan, et al.
Information Sciences 466
2018
In the area of process mining, efficient conformance checking is one of the main challenges. Several process mining vendors are in the process of implementing conformance checking in their tools to allow the user to check how well a model fits an event log. Current approaches for conformance checking are monolithic and compute exact fitness values but this may take excessive time. Alternatively, one can use a decomposition approach, which runs much faster but does not always compute an exact fitness value. This paper introduces a recomposition approach that takes the best of both: it returns the exact fitness value by using the decomposition approach in an iterative manner. Results show that similar speedups can be obtained as by using the decomposition approach, but now the exact fitness value is guaranteed. Even better, this approach supports a configurable time-bound: “Give me the best fitness estimation you can find within 10 min.” In such a case, the approach returns an interval that contains the exact fitness value. If such an interval is sufficiently narrow, there is no need to spend unnecessary time to compute the exact value.
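
Decomposition-based alignments are exposed at module level in PM4Py; a sketch, assuming the module path of recent versions (hypothetical log.xes path):

```python
import pm4py
from pm4py.algo.conformance.alignments.decomposed import algorithm as decomp_alignments

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# alignments computed on decomposed fragments of the net and recomposed,
# typically much faster than monolithic alignments on large models
results = decomp_alignments.apply(log, net, initial_marking, final_marking)
```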

Log Skeleton


Log skeletons: A classification approach to process discovery
Verbeek, H. M. W., and R. Medeiros de Carvalho
arXiv preprint arXiv:1806.08247
2018
To test the effectiveness of process discovery algorithms, a Process Discovery Contest (PDC) has been set up. This PDC uses a classification approach to measure this effectiveness: The better the discovered model can classify whether or not a new trace conforms to the event log, the better the discovery algorithm is supposed to be. Unfortunately, even the state-of-the-art fully-automated discovery algorithms score poorly on this classification. Even the best of these algorithms, the Inductive Miner, scored only 147 correct classified traces out of 200 traces on the PDC of 2017. This paper introduces the rule-based log skeleton model, which is closely related to the Declare constraint model, together with a way to classify traces using this model. This classification using log skeletons is shown to score better on the PDC of 2017 than state-of-the-art discovery algorithms: 194 out of 200. As a result, one can argue that the fully-automated algorithm to construct (or: discover) a log skeleton from an event log outperforms existing state-of-the-art fully-automated discovery algorithms.
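
A sketch of log-skeleton discovery and trace classification in PM4Py (hypothetical log.xes path; function names as in recent versions of the simplified interface):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# discover the log skeleton: declarative constraints such as equivalence,
# always-before/after, never-together, and activity-count bounds
skeleton = pm4py.discover_log_skeleton(log, noise_threshold=0.0)

# classify traces against the skeleton's constraints
diagnostics = pm4py.conformance_log_skeleton(log, skeleton)
```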

Log-Model Evaluation

Fitness (Token-based Replay)


A Novel Token-Based Replay Technique to Speed Up Conformance Checking and Process Enhancement
Alessandro Berti, Wil van der Aalst
ToPNoC (Transactions on Petri Nets and Other Models of Concurrency)
2020
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment based), token-based replay got abandoned. However, despite decomposition approaches and heuristics to speed-up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., "token flooding") and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the PM4Py process mining library. We will show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics.
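
A sketch of log-level fitness via token-based replay (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# returns a dictionary with, among others, the average trace fitness
# and the percentage of fitting traces
fitness = pm4py.fitness_token_based_replay(
    log, net, initial_marking, final_marking)
print(fitness)
```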

Fitness (Alignments)


Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity
Buijs, Joos CAM, Boudewijn F. van Dongen, and Wil MP van der Aalst
International Journal of Cooperative Information Systems 23.01
2014
Process discovery algorithms typically aim at discovering process models from event logs that best describe the recorded behavior. Often, the quality of a process discovery algorithm is measured by quantifying to what extent the resulting model can reproduce the behavior in the log, i.e. replay fitness. At the same time, there are other measures that compare a model with recorded behavior in terms of the precision of the model and the extent to which the model generalizes the behavior in the log. Furthermore, many measures exist to express the complexity of a model irrespective of the log. In this paper, we first discuss several quality dimensions related to process discovery. We further show that existing process discovery algorithms typically consider at most two out of the four main quality dimensions: replay fitness, precision, generalization and simplicity. Moreover, existing approaches cannot steer the discovery process based on user-defined weights for the four quality dimensions. This paper presents the ETM algorithm which allows the user to seamlessly steer the discovery process based on preferences with respect to the four quality dimensions. We show that all dimensions are important for process discovery. However, it only makes sense to consider precision, generalization and simplicity if the replay fitness is acceptable.
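
A sketch of alignment-based fitness (hypothetical log.xes path); exact, but more expensive than token-based replay:

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# log-level fitness computed from optimal alignments
fitness = pm4py.fitness_alignments(log, net, initial_marking, final_marking)
print(fitness)
```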

ETConformance precision


A fresh look at precision in process conformance
Muñoz-Gama, Jorge, and Josep Carmona
International Conference on Business Process Management
2010
Process Conformance is a crucial step in the area of Process Mining: the adequacy of a model derived from applying a discovery algorithm to a log must be certified before making further decisions that affect the system under consideration. Among the different conformance dimensions, in this paper we propose a novel measure for precision, based on the simple idea of counting these situations were the model deviates from the log. Moreover, a log-based traversal of the model that avoids inspecting its whole behavior is presented. Experimental results show a significant improvement when compared to current approaches for the same task. Finally, the detection of the shortest traces in the model that lead to discrepancies is presented.
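
A sketch of ETConformance-style precision in PM4Py, computed on top of token-based replay (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# precision via ETConformance (token-based replay underneath)
precision = pm4py.precision_token_based_replay(
    log, net, initial_marking, final_marking)
print(precision)
```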

Align-ETConformance precision


Measuring precision of modeled behavior
Adriansyah, Arya, et al.
Information systems and e-Business Management 13.1
2015
Conformance checking techniques compare observed behavior (i.e., event logs) with modeled behavior for a variety of reasons. For example, discrepancies between a normative process model and recorded behavior may point to fraud or inefficiencies. The resulting diagnostics can be used for auditing and compliance management. Conformance checking can also be used to judge a process model automatically discovered from an event log. Models discovered using different process discovery techniques need to be compared objectively. These examples illustrate just a few of the many use cases for aligning observed and modeled behavior. Thus far, most conformance checking techniques focused on replay fitness, i.e., the ability to reproduce the event log. However, it is easy to construct models that allow for lots of behavior (including the observed behavior) without being precise. In this paper, we propose a method to measure precision of process models, given their event logs by first aligning the logs to the models. This way, the measurement is not sensitive to non-fitting executions and more accurate values can be obtained for non-fitting logs. Furthermore, we introduce several variants of the technique to deal better with incomplete logs and reduce possible bias due to behavioral property of process models. The approach has been implemented in the ProM 6 framework and tested against both artificial and real-life cases. Experiments show that the approach is robust to noise and applicable to handle logs and models of real-life complexity.
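
A sketch of the alignment-based precision variant (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# precision computed on aligned traces, robust to non-fitting behaviour
precision = pm4py.precision_alignments(
    log, net, initial_marking, final_marking)
print(precision)
```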

Generalization


Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity
Buijs, Joos CAM, Boudewijn F. van Dongen, and Wil MP van der Aalst
International Journal of Cooperative Information Systems 23.01
2014
Process discovery algorithms typically aim at discovering process models from event logs that best describe the recorded behavior. Often, the quality of a process discovery algorithm is measured by quantifying to what extent the resulting model can reproduce the behavior in the log, i.e. replay fitness. At the same time, there are other measures that compare a model with recorded behavior in terms of the precision of the model and the extent to which the model generalizes the behavior in the log. Furthermore, many measures exist to express the complexity of a model irrespective of the log. In this paper, we first discuss several quality dimensions related to process discovery. We further show that existing process discovery algorithms typically consider at most two out of the four main quality dimensions: replay fitness, precision, generalization and simplicity. Moreover, existing approaches cannot steer the discovery process based on user-defined weights for the four quality dimensions. This paper presents the ETM algorithm which allows the user to seamlessly steer the discovery process based on preferences with respect to the four quality dimensions. We show that all dimensions are important for process discovery. However, it only makes sense to consider precision, generalization and simplicity if the replay fitness is acceptable.
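
Generalization is exposed at module level in PM4Py; a sketch, assuming the module path of recent versions (hypothetical log.xes path):

```python
import pm4py
from pm4py.algo.evaluation.generalization import algorithm as generalization_evaluator

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# generalization of the model with respect to the log
generalization = generalization_evaluator.apply(
    log, net, initial_marking, final_marking)
print(generalization)
```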

Simplicity


ProDiGen: Mining complete, precise and minimal structure process models with a genetic algorithm
Vázquez-Barreiros, Borja, Manuel Mucientes, and Manuel Lama
Information Sciences 294
2015
Process discovery techniques automatically extract the real workflow of a process by analyzing the events that are collected and stored in log files. Although in the last years several process discovery algorithms have been presented, none of them guarantees to find complete, precise and simple models for all the given logs. In this paper we address the problem of process discovery through a genetic algorithm with a new fitness function that takes into account both completeness, precision and simplicity. ProDiGen (Process Discovery through a Genetic algorithm) includes new definitions for precision and simplicity, and specific crossover and mutation operators. The proposal has been validated with 39 process models and several noise levels, giving a total of 111 different logs. We have compared our approach with the state of the art algorithms; non-parametric statistical tests show that our algorithm outperforms the other approaches, and that the difference is statistically significant.
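
Simplicity in PM4Py is a property of the model alone; a sketch, assuming the module path of recent versions:

```python
import pm4py
from pm4py.algo.evaluation.simplicity import algorithm as simplicity_evaluator

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# simplicity needs no log; the default variant is based on the
# arc degree of the Petri net
simplicity = simplicity_evaluator.apply(net)
print(simplicity)
```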

Other Approaches

Process Tree Generation


PTandLogGenerator: A Generator for Artificial Event Data
Jouck, Toon, and Benoît Depaire
BPM (Demos) 1789
2016
The empirical analysis of process discovery algorithms has recently gained more attention. An important step within such an analysis is the acquisition of the appropriate test event data, i.e. event logs and reference models. This requires an implemented framework that supports the random and automated generation of event data based on user specifications. This paper presents a tool for generating artificial process trees and event logs that can be used to study and compare the empirical workings of process discovery algorithms. It extends current tools by giving users full control over an extensive set of process control-flow constructs included in the final models and event logs. Additionally, it is integrated within the ProM framework that offers a plethora of process discovery algorithms and evaluation metrics which are required during empirical analysis.
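
A sketch of generating an artificial process tree and playing it out into an event log (function names as in PM4Py's simplified interface; default generation parameters):

```python
import pm4py

# generate a random process tree (PTandLogGenerator-style);
# the generation parameters can be customized, defaults are used here
tree = pm4py.generate_process_tree()

# play out the tree to obtain an artificial event log
simulated_log = pm4py.play_out(tree)
```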

Decision Mining


A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs
De Leoni, Massimiliano, Wil MP van der Aalst, and Marcus Dees
Information Systems 56
2016
Process mining can be viewed as the missing link between model-based process analysis and data-oriented analysis techniques. Lion׳s share of process mining research has been focusing on process discovery (creating process models from raw data) and replay techniques to check conformance and analyze bottlenecks. These techniques have helped organizations to address compliance and performance problems. However, for a more refined analysis, it is essential to correlate different process characteristics. For example, do deviations from the normative process cause additional delays and costs? Are rejected cases handled differently in the initial phases of the process? What is the influence of a doctor׳s experience on treatment process? These and other questions may involve process characteristics related to different perspectives (control-flow, data-flow, time, organization, cost, compliance, etc.). Specific questions (e.g., predicting the remaining processing time) have been investigated before, but a generic approach was missing thus far. The proposed framework unifies a number of approaches for correlation analysis proposed in literature, proposing a general solution that can perform those analyses and many more. The approach has been implemented in ProM and combines process and data mining techniques. In this paper, we also demonstrate the applicability using a case study conducted with the UWV (Employee Insurance Agency), one of the largest “administrative factories” in The Netherlands.
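
Decision mining is exposed at module level in PM4Py; a sketch, assuming the module path of recent versions (log.xes is a hypothetical path and the decision point "p_10" a hypothetical place name):

```python
import pm4py
from pm4py.algo.decision_mining import algorithm as decision_mining

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# train a decision tree explaining which branch is taken at a given
# decision point (place) of the net; "p_10" is a hypothetical name
clf, feature_names, classes = decision_mining.get_decision_tree(
    log, net, initial_marking, final_marking, decision_point="p_10")
```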

Soundness Checking (WOFLAN)


Diagnosing workflow processes using Woflan
Verbeek, Henricus MW, Twan Basten, and Wil MP van der Aalst
The computer journal 44.4
2001
Workflow management technology promises a flexible solution for business-process support facilitating the easy creation of new business processes and modification of existing processes. Unfortunately, today's workflow products have no support for workflow verification. Errors made at design-time are not detected and result in very costly failures at run-time. This paper presents the verification tool Woflan. Woflan analyzes workflow process definitions downloaded from commercial workflow products using state-of-the-art Petri-net-based analysis techniques. This paper describes the functionality of Woflan emphasizing diagnostics to locate the source of a design error. Woflan is evaluated via two case studies, one involving 20 groups of students designing a complex workflow process and one involving an industrial workflow process designed by Staffware Benelux. The results are encouraging and show that Woflan guides the user in finding and correcting errors in the design of workflows.
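
A sketch of the WOFLAN-based soundness check in PM4Py, assuming the module path and parameter names of recent versions:

```python
import pm4py
from pm4py.algo.analysis.woflan import algorithm as woflan

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# run the full WOFLAN diagnosis instead of stopping at the first violation
is_sound = woflan.apply(net, initial_marking, final_marking, parameters={
    woflan.Parameters.RETURN_ASAP_WHEN_NOT_SOUND: False,
    woflan.Parameters.PRINT_DIAGNOSTICS: True,
    woflan.Parameters.RETURN_DIAGNOSTICS: False,
})
```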

Social Network Analysis


Discovering social networks from event logs
Van Der Aalst, Wil MP, Hajo A. Reijers, and Minseok Song
Computer Supported Cooperative Work 14.6
2005
Process mining techniques allow for the discovery of knowledge based on so-called “event logs”, i.e., a log recording the execution of activities in some business process. Many information systems provide such logs, e.g., most WFM, ERP, CRM, SCM, and B2B systems record transactions in a systematic way. Process mining techniques typically focus on performance and control-flow issues. However, event logs typically also log the performer, e.g., the person initiating or completing some activity. This paper focuses on mining social networks using this information. For example, it is possible to build a social network based on the hand-over of work from one performer to the next. By combining concepts from workflow management and social network analysis, it is possible to discover and analyze social networks. This paper defines metrics, presents a tool, and applies these to a real event log within the setting of a large Dutch organization.
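
A sketch of the handover-of-work metric from this paper, as exposed in PM4Py's simplified interface (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# handover-of-work network: who passes work to whom; related metrics
# such as working-together are exposed by analogous functions
hw_network = pm4py.discover_handover_of_work_network(log)
pm4py.view_sna(hw_network)
```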

Roles Discovery


Business models enhancement through discovery of roles
Burattin, Andrea, Alessandro Sperduti, and Marco Veluscek
CIDM
2013
The term process mining refers to a family of techniques which does not involve only the control flow discovery. This work proposes an approach to enhance a business process models with information on roles. Specifically, the identification of those roles is based on the detection of handover of roles. In this paper we first discuss the general problem of grouping activities in roles, and then we propose metrics for the identification of handover. An approach to automatically enumerate all the significant partitionings and measures for the evaluation are proposed too. The entire contribution has been implemented in the ProM toolkit, and experiments on several logs are reported.
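
A sketch of roles discovery in PM4Py's simplified interface (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# group resources into roles based on the similarity of the sets of
# activities they perform
roles = pm4py.discover_organizational_roles(log)
for role in roles:
    print(role)
```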