PM4Py implements state-of-the-art process mining algorithms, i.e., 'fresh off the academic press'. Here you can find an overview of the different algorithms implemented in PM4Py and the papers describing them.

Process Discovery

Inductive Miner


Discovering block-structured process models from event logs - a constructive approach
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
International Conference on Applications and Theory of Petri Nets and Concurrency
2013
Process discovery is the problem of, given a log of observed behaviour, finding a process model that ‘best’ describes this behaviour. A large variety of process discovery algorithms has been proposed. However, no existing algorithm guarantees to return a fitting model (i.e., able to reproduce all observed behaviour) that is sound (free of deadlocks and other anomalies) in finite time. We present an extensible framework to discover from any given log a set of block-structured process models that are sound and fit the observed behaviour. In addition we characterise the minimal information required in the log to rediscover a particular process model. We then provide a polynomial-time algorithm for discovering a sound, fitting, block-structured model from any given log; we give sufficient conditions on the log for which our algorithm returns a model that is language-equivalent to the process model underlying the log, including unseen behaviour. The technique is implemented in a prototypical tool.
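
In PM4Py, the Inductive Miner is available through the simplified interface. A minimal sketch, assuming an event log stored at the hypothetical path log.xes:

```python
import pm4py

# load an event log (hypothetical path)
log = pm4py.read_xes("log.xes")

# discover a sound, block-structured model as a Petri net ...
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# ... or as the underlying process tree
tree = pm4py.discover_process_tree_inductive(log)

pm4py.view_petri_net(net, initial_marking, final_marking)
```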

Inductive Miner Infrequent


Discovering block-structured process models from event logs containing infrequent behaviour
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
International Conference on Business Process Management
2013
Given an event log describing observed behaviour, process discovery aims to find a process model that ‘best’ describes this behaviour. A large variety of process discovery algorithms has been proposed. However, no existing algorithm returns a sound model in all cases (free of deadlocks and other anomalies), handles infrequent behaviour well and finishes quickly. We present a technique able to cope with infrequent behaviour and large event logs, while ensuring soundness. The technique has been implemented in ProM and we compare the technique with existing approaches in terms of quality and performance.
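
In PM4Py, the infrequent variant is selected by passing a noise threshold to the same discovery call; a sketch under the same assumptions (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# noise_threshold > 0 activates the infrequent variant (IMf), which
# filters behaviour below the threshold before constructing the model
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(
    log, noise_threshold=0.2)
```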

Inductive Miner Directly-Follows


Scalable process discovery and conformance checking
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
Software & Systems Modeling 17.2
2018
Considerable amounts of data, including process events, are collected and stored by organisations nowadays. Discovering a process model from such event data and verification of the quality of discovered models are important steps in process mining. Many discovery techniques have been proposed, but none of them combines scalability with strong quality guarantees. We would like such techniques to handle billions of events or thousands of activities, to produce sound models (without deadlocks and other anomalies), and to guarantee that the underlying process can be rediscovered when sufficient information is available. In this paper, we introduce a framework for process discovery that ensures these properties while passing over the log only once and introduce three algorithms using the framework. To measure the quality of discovered models for such large logs, we introduce a model–model and model–log comparison framework that applies a divide-and-conquer strategy to measure recall, fitness, and precision. We experimentally show that these discovery and measuring techniques sacrifice little compared to other algorithms, while gaining the ability to cope with event logs of 100,000,000 traces and processes of 10,000 activities on a standard computer.
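
The directly-follows variant (IMd) operates on the directly-follows abstraction of the log, which can be computed in a single pass. A sketch (hypothetical log.xes path; how the IMd variant is selected differs across PM4Py versions):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# the single-pass abstraction that IMd consumes
dfg, start_activities, end_activities = pm4py.discover_dfg(log)

# discovery itself goes through the same entry point as the other
# inductive miner variants
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
```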

Heuristics Miner


Process mining with the heuristics miner-algorithm
Weijters, A. J. M. M., Wil MP van der Aalst, and A. K. Alves de Medeiros
Technische Universiteit Eindhoven, Tech. Rep. WP 166
2006
The basic idea of process mining is to extract knowledge from event logs recorded by an information system. Until recently, the information in these event logs was rarely used to analyze the underlying processes. Process mining aims at improving this by providing techniques and tools for discovering process, organizational, social, and performance information from event logs. Fuelled by the omnipresence of event logs in transactional information systems (cf. WFM, ERP, CRM, SCM, and B2B systems), process mining has become a vivid research area [1, 2]. In this paper we introduce the challenging process mining domain and discuss a heuristics driven process mining algorithm; the so-called “HeuristicsMiner” in detail. HeuristicsMiner is a practical applicable mining algorithm that can deal with noise, and can be used to express the main behavior (i.e. not all details and exceptions) registered in an event log. In the experimental section of this paper we introduce benchmark material (12.000 different event logs) and measurements by which the performance of process mining algorithms can be measured.
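
In PM4Py, the Heuristics Miner exposes its noise-handling thresholds as parameters; a minimal sketch (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# discover a heuristics net; the dependency threshold controls how much
# infrequent directly-follows behaviour is kept
heu_net = pm4py.discover_heuristics_net(log, dependency_threshold=0.99)
pm4py.view_heuristics_net(heu_net)

# alternatively, obtain a Petri net directly
net, initial_marking, final_marking = pm4py.discover_petri_net_heuristics(
    log, dependency_threshold=0.99)
```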

Correlation Miner


Correlation miner: mining business process models and event correlations without case identifiers
Pourmirza, Shaya, Remco Dijkman, and Paul Grefen
International Journal of Cooperative Information Systems 26.02
2017
Process discovery algorithms aim to capture process models from event logs. These algorithms have been designed for logs in which the events that belong to the same case are related to each other — and to that case — by means of a unique case identifier. However, in service-oriented systems, these case identifiers are rarely stored beyond request-response pairs, which makes it hard to relate events that belong to the same case. This is known as the correlation challenge. This paper addresses the correlation challenge by introducing a technique, called the correlation miner, that facilitates discovery of business process models when events are not associated with a case identifier. It extends previous work on the correlation miner, by not only enabling the discovery of the process model, but also detecting which events belong to the same case. Experiments performed on both synthetic and real-world event logs show the applicability of the correlation miner. The resulting technique enables us to observe a service-oriented system and determine — with high accuracy — which request-response pairs sent by different communicating parties are related to each other.
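
The Correlation Miner in PM4Py works on a plain table of activities and timestamps, without case identifiers. A sketch, assuming the module path and default column names of recent PM4Py versions (the toy data is invented):

```python
import pandas as pd
from pm4py.algo.discovery.correlation_mining import algorithm as correlation_miner

# a "log" without case identifiers: only activity names and timestamps
df = pd.DataFrame({
    "concept:name": ["A", "B", "A", "B"],
    "time:timestamp": pd.to_datetime([
        "2021-01-01 10:00", "2021-01-01 10:05",
        "2021-01-01 11:00", "2021-01-01 11:04"]),
})

# estimates a frequency DFG and a performance DFG without knowing
# which events belong to the same case
frequency_dfg, performance_dfg = correlation_miner.apply(df)
```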

Conformance Checking

Token-based Replay


A Novel Token-Based Replay Technique to Speed Up Conformance Checking and Process Enhancement
Alessandro Berti, Wil van der Aalst
ToPNoC (Transactions on Petri Nets and Other Models of Concurrency)
2020
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment based), token-based replay got abandoned. However, despite decomposition approaches and heuristics to speed-up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., "token flooding") and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the PM4Py process mining library. We will show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics.
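
Token-based replay is directly available in PM4Py's simplified interface; a minimal sketch (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# per-trace replay diagnostics: token counts (produced/consumed/missing/
# remaining), whether the trace fits, and the activated transitions
replay_results = pm4py.conformance_diagnostics_token_based_replay(
    log, net, initial_marking, final_marking)
print(replay_results[0])
```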

Alignments


Conformance checking using cost-based fitness analysis
Adriansyah, Arya, Boudewijn F. van Dongen, and Wil van der Aalst
2011 IEEE 15th International Enterprise Distributed Object Computing Conference
2011
The growing complexity of processes in many organizations stimulates the adoption of business process analysis techniques. Typically, such techniques are based on process models and assume that the operational processes in reality conform to these models. However, experience shows that reality often deviates from hand-made models. Therefore, the problem of checking to what extent the operational process conforms to the process model is important for process management, process improvement, and compliance. In this paper, we present a robust replay analysis technique that is able to measure the conformance of an event log for a given process model. The approach quantifies conformance and provides intuitive diagnostics (skipped and inserted activities). Our technique has been implemented in the ProM 6 framework. Comparative evaluations show that the approach overcomes many of the limitations of existing conformance checking techniques.
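
Alignments are likewise available in the simplified interface; a sketch (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# optimal alignments per trace: each result carries the sequence of moves
# (synchronous, log-only, model-only) plus its cost and fitness
aligned_traces = pm4py.conformance_diagnostics_alignments(
    log, net, initial_marking, final_marking)
print(aligned_traces[0]["alignment"])
```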

Decomposed/Recomposed Alignments


Recomposing conformance: Closing the circle on decomposed alignment-based conformance checking in process mining
Lee, Wai Lam Jonathan, et al.
Information Sciences 466
2018
In the area of process mining, efficient conformance checking is one of the main challenges. Several process mining vendors are in the process of implementing conformance checking in their tools to allow the user to check how well a model fits an event log. Current approaches for conformance checking are monolithic and compute exact fitness values but this may take excessive time. Alternatively, one can use a decomposition approach, which runs much faster but does not always compute an exact fitness value. This paper introduces a recomposition approach that takes the best of both: it returns the exact fitness value by using the decomposition approach in an iterative manner. Results show that similar speedups can be obtained as by using the decomposition approach, but now the exact fitness value is guaranteed. Even better, this approach supports a configurable time-bound: “Give me the best fitness estimation you can find within 10 min.” In such a case, the approach returns an interval that contains the exact fitness value. If such an interval is sufficiently narrow, there is no need to spend unnecessary time to compute the exact value.
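
Decomposition-based alignments are exposed at module level in PM4Py; a sketch, assuming the module path of recent versions (hypothetical log.xes path):

```python
import pm4py
from pm4py.algo.conformance.alignments.decomposed import algorithm as decomp_alignments

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# alignments computed on decomposed fragments of the net and recomposed,
# typically much faster than monolithic alignments on large models
results = decomp_alignments.apply(log, net, initial_marking, final_marking)
```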

Log Skeleton


Log skeletons: A classification approach to process discovery
Verbeek, H. M. W., and R. Medeiros de Carvalho
arXiv preprint arXiv:1806.08247
2018
To test the effectiveness of process discovery algorithms, a Process Discovery Contest (PDC) has been set up. This PDC uses a classification approach to measure this effectiveness: The better the discovered model can classify whether or not a new trace conforms to the event log, the better the discovery algorithm is supposed to be. Unfortunately, even the state-of-the-art fully-automated discovery algorithms score poorly on this classification. Even the best of these algorithms, the Inductive Miner, scored only 147 correct classified traces out of 200 traces on the PDC of 2017. This paper introduces the rule-based log skeleton model, which is closely related to the Declare constraint model, together with a way to classify traces using this model. This classification using log skeletons is shown to score better on the PDC of 2017 than state-of-the-art discovery algorithms: 194 out of 200. As a result, one can argue that the fully-automated algorithm to construct (or: discover) a log skeleton from an event log outperforms existing state-of-the-art fully-automated discovery algorithms.
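
A sketch of log-skeleton discovery and trace classification in PM4Py (hypothetical log.xes path; function names as in recent versions of the simplified interface):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# discover the log skeleton: declarative constraints such as equivalence,
# always-before/after, never-together, and activity-count bounds
skeleton = pm4py.discover_log_skeleton(log, noise_threshold=0.0)

# classify traces against the skeleton's constraints
diagnostics = pm4py.conformance_log_skeleton(log, skeleton)
```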

Log-Model Evaluation

Fitness (Token-based Replay)


A Novel Token-Based Replay Technique to Speed Up Conformance Checking and Process Enhancement
Alessandro Berti, Wil van der Aalst
ToPNoC (Transactions on Petri Nets and Other Models of Concurrency)
2020
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment based), token-based replay got abandoned. However, despite decomposition approaches and heuristics to speed-up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., "token flooding") and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the PM4Py process mining library. We will show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics.
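
A sketch of log-level fitness via token-based replay (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# returns a dictionary with, among others, the average trace fitness
# and the percentage of fitting traces
fitness = pm4py.fitness_token_based_replay(
    log, net, initial_marking, final_marking)
print(fitness)
```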

Fitness (Alignments)


Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity
Buijs, Joos CAM, Boudewijn F. van Dongen, and Wil MP van der Aalst
International Journal of Cooperative Information Systems 23.01
2014
Process discovery algorithms typically aim at discovering process models from event logs that best describe the recorded behavior. Often, the quality of a process discovery algorithm is measured by quantifying to what extent the resulting model can reproduce the behavior in the log, i.e. replay fitness. At the same time, there are other measures that compare a model with recorded behavior in terms of the precision of the model and the extent to which the model generalizes the behavior in the log. Furthermore, many measures exist to express the complexity of a model irrespective of the log. In this paper, we first discuss several quality dimensions related to process discovery. We further show that existing process discovery algorithms typically consider at most two out of the four main quality dimensions: replay fitness, precision, generalization and simplicity. Moreover, existing approaches cannot steer the discovery process based on user-defined weights for the four quality dimensions. This paper presents the ETM algorithm which allows the user to seamlessly steer the discovery process based on preferences with respect to the four quality dimensions. We show that all dimensions are important for process discovery. However, it only makes sense to consider precision, generalization and simplicity if the replay fitness is acceptable.
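
A sketch of alignment-based fitness (hypothetical log.xes path); exact, but more expensive than token-based replay:

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# log-level fitness computed from optimal alignments
fitness = pm4py.fitness_alignments(log, net, initial_marking, final_marking)
print(fitness)
```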

ETConformance precision


A fresh look at precision in process conformance
Muñoz-Gama, Jorge, and Josep Carmona
International Conference on Business Process Management
2010
Process Conformance is a crucial step in the area of Process Mining: the adequacy of a model derived from applying a discovery algorithm to a log must be certified before making further decisions that affect the system under consideration. Among the different conformance dimensions, in this paper we propose a novel measure for precision, based on the simple idea of counting these situations were the model deviates from the log. Moreover, a log-based traversal of the model that avoids inspecting its whole behavior is presented. Experimental results show a significant improvement when compared to current approaches for the same task. Finally, the detection of the shortest traces in the model that lead to discrepancies is presented.
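
A sketch of ETConformance-style precision in PM4Py, computed on top of token-based replay (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# precision via ETConformance (token-based replay underneath)
precision = pm4py.precision_token_based_replay(
    log, net, initial_marking, final_marking)
print(precision)
```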

Align-ETConformance precision


Measuring precision of modeled behavior
Adriansyah, Arya, et al.
Information systems and e-Business Management 13.1
2015
Conformance checking techniques compare observed behavior (i.e., event logs) with modeled behavior for a variety of reasons. For example, discrepancies between a normative process model and recorded behavior may point to fraud or inefficiencies. The resulting diagnostics can be used for auditing and compliance management. Conformance checking can also be used to judge a process model automatically discovered from an event log. Models discovered using different process discovery techniques need to be compared objectively. These examples illustrate just a few of the many use cases for aligning observed and modeled behavior. Thus far, most conformance checking techniques focused on replay fitness, i.e., the ability to reproduce the event log. However, it is easy to construct models that allow for lots of behavior (including the observed behavior) without being precise. In this paper, we propose a method to measure precision of process models, given their event logs by first aligning the logs to the models. This way, the measurement is not sensitive to non-fitting executions and more accurate values can be obtained for non-fitting logs. Furthermore, we introduce several variants of the technique to deal better with incomplete logs and reduce possible bias due to behavioral property of process models. The approach has been implemented in the ProM 6 framework and tested against both artificial and real-life cases. Experiments show that the approach is robust to noise and applicable to handle logs and models of real-life complexity.
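
A sketch of the alignment-based precision variant (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# precision computed on aligned traces, robust to non-fitting behaviour
precision = pm4py.precision_alignments(
    log, net, initial_marking, final_marking)
print(precision)
```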

Generalization


Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity
Buijs, Joos CAM, Boudewijn F. van Dongen, and Wil MP van der Aalst
International Journal of Cooperative Information Systems 23.01
2014
Process discovery algorithms typically aim at discovering process models from event logs that best describe the recorded behavior. Often, the quality of a process discovery algorithm is measured by quantifying to what extent the resulting model can reproduce the behavior in the log, i.e. replay fitness. At the same time, there are other measures that compare a model with recorded behavior in terms of the precision of the model and the extent to which the model generalizes the behavior in the log. Furthermore, many measures exist to express the complexity of a model irrespective of the log. In this paper, we first discuss several quality dimensions related to process discovery. We further show that existing process discovery algorithms typically consider at most two out of the four main quality dimensions: replay fitness, precision, generalization and simplicity. Moreover, existing approaches cannot steer the discovery process based on user-defined weights for the four quality dimensions. This paper presents the ETM algorithm which allows the user to seamlessly steer the discovery process based on preferences with respect to the four quality dimensions. We show that all dimensions are important for process discovery. However, it only makes sense to consider precision, generalization and simplicity if the replay fitness is acceptable.
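
Generalization is exposed at module level in PM4Py; a sketch, assuming the module path of recent versions (hypothetical log.xes path):

```python
import pm4py
from pm4py.algo.evaluation.generalization import algorithm as generalization_evaluator

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# generalization of the model with respect to the log
generalization = generalization_evaluator.apply(
    log, net, initial_marking, final_marking)
print(generalization)
```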

Simplicity


ProDiGen: Mining complete, precise and minimal structure process models with a genetic algorithm
Vázquez-Barreiros, Borja, Manuel Mucientes, and Manuel Lama
Information Sciences 294
2015
Process discovery techniques automatically extract the real workflow of a process by analyzing the events that are collected and stored in log files. Although in the last years several process discovery algorithms have been presented, none of them guarantees to find complete, precise and simple models for all the given logs. In this paper we address the problem of process discovery through a genetic algorithm with a new fitness function that takes into account both completeness, precision and simplicity. ProDiGen (Process Discovery through a Genetic algorithm) includes new definitions for precision and simplicity, and specific crossover and mutation operators. The proposal has been validated with 39 process models and several noise levels, giving a total of 111 different logs. We have compared our approach with the state of the art algorithms; non-parametric statistical tests show that our algorithm outperforms the other approaches, and that the difference is statistically significant.
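
Simplicity in PM4Py is a property of the model alone; a sketch, assuming the module path of recent versions:

```python
import pm4py
from pm4py.algo.evaluation.simplicity import algorithm as simplicity_evaluator

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# simplicity needs no log; the default variant is based on the
# arc degree of the Petri net
simplicity = simplicity_evaluator.apply(net)
print(simplicity)
```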

Other Approaches

Process Tree Generation


PTandLogGenerator: A Generator for Artificial Event Data
Jouck, Toon, and Benoît Depaire
BPM (Demos) 1789
2016
The empirical analysis of process discovery algorithms has recently gained more attention. An important step within such an analysis is the acquisition of the appropriate test event data, i.e. event logs and reference models. This requires an implemented framework that supports the random and automated generation of event data based on user specifications. This paper presents a tool for generating artificial process trees and event logs that can be used to study and compare the empirical workings of process discovery algorithms. It extends current tools by giving users full control over an extensive set of process control-flow constructs included in the final models and event logs. Additionally, it is integrated within the ProM framework that offers a plethora of process discovery algorithms and evaluation metrics which are required during empirical analysis.
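
A sketch of generating an artificial process tree and playing it out into an event log (function names as in PM4Py's simplified interface; default generation parameters):

```python
import pm4py

# generate a random process tree (PTandLogGenerator-style);
# the generation parameters can be customized, defaults are used here
tree = pm4py.generate_process_tree()

# play out the tree to obtain an artificial event log
simulated_log = pm4py.play_out(tree)
```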

Decision Mining


A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs
De Leoni, Massimiliano, Wil MP van der Aalst, and Marcus Dees
Information Systems 56
2016
Process mining can be viewed as the missing link between model-based process analysis and data-oriented analysis techniques. Lion׳s share of process mining research has been focusing on process discovery (creating process models from raw data) and replay techniques to check conformance and analyze bottlenecks. These techniques have helped organizations to address compliance and performance problems. However, for a more refined analysis, it is essential to correlate different process characteristics. For example, do deviations from the normative process cause additional delays and costs? Are rejected cases handled differently in the initial phases of the process? What is the influence of a doctor׳s experience on treatment process? These and other questions may involve process characteristics related to different perspectives (control-flow, data-flow, time, organization, cost, compliance, etc.). Specific questions (e.g., predicting the remaining processing time) have been investigated before, but a generic approach was missing thus far. The proposed framework unifies a number of approaches for correlation analysis proposed in literature, proposing a general solution that can perform those analyses and many more. The approach has been implemented in ProM and combines process and data mining techniques. In this paper, we also demonstrate the applicability using a case study conducted with the UWV (Employee Insurance Agency), one of the largest “administrative factories” in The Netherlands.
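
Decision mining is exposed at module level in PM4Py; a sketch, assuming the module path of recent versions (log.xes is a hypothetical path and the decision point "p_10" a hypothetical place name):

```python
import pm4py
from pm4py.algo.decision_mining import algorithm as decision_mining

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# train a decision tree explaining which branch is taken at a given
# decision point (place) of the net; "p_10" is a hypothetical name
clf, feature_names, classes = decision_mining.get_decision_tree(
    log, net, initial_marking, final_marking, decision_point="p_10")
```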

Soundness Checking (WOFLAN)


Diagnosing workflow processes using Woflan
Verbeek, Henricus MW, Twan Basten, and Wil MP van der Aalst
The computer journal 44.4
2001
Workflow management technology promises a flexible solution for business-process support facilitating the easy creation of new business processes and modification of existing processes. Unfortunately, today's workflow products have no support for workflow verification. Errors made at design-time are not detected and result in very costly failures at run-time. This paper presents the verification tool Woflan. Woflan analyzes workflow process definitions downloaded from commercial workflow products using state-of-the-art Petri-net-based analysis techniques. This paper describes the functionality of Woflan emphasizing diagnostics to locate the source of a design error. Woflan is evaluated via two case studies, one involving 20 groups of students designing a complex workflow process and one involving an industrial workflow process designed by Staffware Benelux. The results are encouraging and show that Woflan guides the user in finding and correcting errors in the design of workflows.
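
A sketch of the WOFLAN-based soundness check in PM4Py, assuming the module path and parameter names of recent versions:

```python
import pm4py
from pm4py.algo.analysis.woflan import algorithm as woflan

log = pm4py.read_xes("log.xes")  # hypothetical path
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# run the full WOFLAN diagnosis instead of stopping at the first violation
is_sound = woflan.apply(net, initial_marking, final_marking, parameters={
    woflan.Parameters.RETURN_ASAP_WHEN_NOT_SOUND: False,
    woflan.Parameters.PRINT_DIAGNOSTICS: True,
    woflan.Parameters.RETURN_DIAGNOSTICS: False,
})
```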

Social Network Analysis


Discovering social networks from event logs
Van Der Aalst, Wil MP, Hajo A. Reijers, and Minseok Song
Computer Supported Cooperative Work 14.6
2005
Process mining techniques allow for the discovery of knowledge based on so-called “event logs”, i.e., a log recording the execution of activities in some business process. Many information systems provide such logs, e.g., most WFM, ERP, CRM, SCM, and B2B systems record transactions in a systematic way. Process mining techniques typically focus on performance and control-flow issues. However, event logs typically also log the performer, e.g., the person initiating or completing some activity. This paper focuses on mining social networks using this information. For example, it is possible to build a social network based on the hand-over of work from one performer to the next. By combining concepts from workflow management and social network analysis, it is possible to discover and analyze social networks. This paper defines metrics, presents a tool, and applies these to a real event log within the setting of a large Dutch organization.
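
A sketch of the handover-of-work metric from this paper, as exposed in PM4Py's simplified interface (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# handover-of-work network: who passes work to whom; related metrics
# such as working-together are exposed by analogous functions
hw_network = pm4py.discover_handover_of_work_network(log)
pm4py.view_sna(hw_network)
```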

Roles Discovery


Business models enhancement through discovery of roles
Burattin, Andrea, Alessandro Sperduti, and Marco Veluscek
CIDM
2013
The term process mining refers to a family of techniques which does not involve only the control flow discovery. This work proposes an approach to enhance a business process models with information on roles. Specifically, the identification of those roles is based on the detection of handover of roles. In this paper we first discuss the general problem of grouping activities in roles, and then we propose metrics for the identification of handover. An approach to automatically enumerate all the significant partitionings and measures for the evaluation are proposed too. The entire contribution has been implemented in the ProM toolkit, and experiments on several logs are reported.
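
A sketch of roles discovery in PM4Py's simplified interface (hypothetical log.xes path):

```python
import pm4py

log = pm4py.read_xes("log.xes")  # hypothetical path

# group resources into roles based on the similarity of the sets of
# activities they perform
roles = pm4py.discover_organizational_roles(log)
for role in roles:
    print(role)
```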