PM4Py implements state-of-the-art process mining algorithms, fresh off the academic press. Below is an overview of the algorithms implemented in PM4Py and the papers that describe them.
Inductive Miner
Discovering block-structured process models from event logs - a constructive approach
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
International conference on applications and theory of Petri nets and concurrency
Process discovery is the problem of, given a log of observed behaviour, finding a process model that ‘best’ describes this behaviour. A large variety of process discovery algorithms has been proposed. However, no existing algorithm guarantees to return a fitting model (i.e., able to reproduce all observed behaviour) that is sound (free of deadlocks and other anomalies) in finite time. We present an extensible framework to discover from any given log a set of block-structured process models that are sound and fit the observed behaviour. In addition we characterise the minimal information required in the log to rediscover a particular process model. We then provide a polynomial-time algorithm for discovering a sound, fitting, block-structured model from any given log; we give sufficient conditions on the log for which our algorithm returns a model that is language-equivalent to the process model underlying the log, including unseen behaviour. The technique is implemented in a prototypical tool.
Inductive Miner Infrequent
Discovering block-structured process models from event logs containing infrequent behaviour
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
International conference on business process management
Given an event log describing observed behaviour, process discovery aims to find a process model that ‘best’ describes this behaviour. A large variety of process discovery algorithms has been proposed. However, no existing algorithm returns a sound model in all cases (free of deadlocks and other anomalies), handles infrequent behaviour well and finishes quickly. We present a technique able to cope with infrequent behaviour and large event logs, while ensuring soundness. The technique has been implemented in ProM and we compare the technique with existing approaches in terms of quality and performance.
Inductive Miner Directly-Follows
Scalable process discovery and conformance checking
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
Software & Systems Modeling 17.2
Considerable amounts of data, including process events, are collected and stored by organisations nowadays. Discovering a process model from such event data and verification of the quality of discovered models are important steps in process mining. Many discovery techniques have been proposed, but none of them combines scalability with strong quality guarantees. We would like such techniques to handle billions of events or thousands of activities, to produce sound models (without deadlocks and other anomalies), and to guarantee that the underlying process can be rediscovered when sufficient information is available. In this paper, we introduce a framework for process discovery that ensures these properties while passing over the log only once and introduce three algorithms using the framework. To measure the quality of discovered models for such large logs, we introduce a model–model and model–log comparison framework that applies a divide-and-conquer strategy to measure recall, fitness, and precision. We experimentally show that these discovery and measuring techniques sacrifice little compared to other algorithms, while gaining the ability to cope with event logs of 100,000,000 traces and processes of 10,000 activities on a standard computer.
Heuristics Miner
Process mining with the heuristics miner-algorithm
Weijters, A. J. M. M., Wil MP van Der Aalst, and AK Alves De Medeiros
Technische Universiteit Eindhoven, Tech. Rep. WP 166
The basic idea of process mining is to extract knowledge from event logs recorded by an information system. Until recently, the information in these event logs was rarely used to analyze the underlying processes. Process mining aims at improving this by providing techniques and tools for discovering process, organizational, social, and performance information from event logs. Fuelled by the omnipresence of event logs in transactional information systems (cf. WFM, ERP, CRM, SCM, and B2B systems), process mining has become a vivid research area [1, 2]. In this paper we introduce the challenging process mining domain and discuss a heuristics-driven process mining algorithm, the so-called "HeuristicsMiner", in detail. HeuristicsMiner is a practically applicable mining algorithm that can deal with noise, and can be used to express the main behavior (i.e. not all details and exceptions) registered in an event log. In the experimental section of this paper we introduce benchmark material (12,000 different event logs) and measurements by which the performance of process mining algorithms can be measured.
Correlation Miner
Correlation miner: mining business process models and event correlations without case identifiers
Pourmirza, Shaya, Remco Dijkman, and Paul Grefen
International Journal of Cooperative Information Systems 26.02
Process discovery algorithms aim to capture process models from event logs. These algorithms have been designed for logs in which the events that belong to the same case are related to each other — and to that case — by means of a unique case identifier. However, in service-oriented systems, these case identifiers are rarely stored beyond request-response pairs, which makes it hard to relate events that belong to the same case. This is known as the correlation challenge. This paper addresses the correlation challenge by introducing a technique, called the correlation miner, that facilitates discovery of business process models when events are not associated with a case identifier. It extends previous work on the correlation miner, by not only enabling the discovery of the process model, but also detecting which events belong to the same case. Experiments performed on both synthetic and real-world event logs show the applicability of the correlation miner. The resulting technique enables us to observe a service-oriented system and determine — with high accuracy — which request-response pairs sent by different communicating parties are related to each other.
Token-Based Replay
A Novel Token-Based Replay Technique to Speed Up Conformance Checking and Process Enhancement
Alessandro Berti, Wil van der Aalst
ToPNoC (Transactions on Petri Nets and Other Models of Concurrency)
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment based), token-based replay got abandoned. However, despite decomposition approaches and heuristics to speed-up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., "token flooding") and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the PM4Py process mining library. We will show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics.
Alignments
Conformance checking using cost-based fitness analysis
Adriansyah, Arya, Boudewijn F. van Dongen, and Wil van der Aalst
2011 IEEE 15th International Enterprise Distributed Object Computing Conference
The growing complexity of processes in many organizations stimulates the adoption of business process analysis techniques. Typically, such techniques are based on process models and assume that the operational processes in reality conform to these models. However, experience shows that reality often deviates from hand-made models. Therefore, the problem of checking to what extent the operational process conforms to the process model is important for process management, process improvement, and compliance. In this paper, we present a robust replay analysis technique that is able to measure the conformance of an event log for a given process model. The approach quantifies conformance and provides intuitive diagnostics (skipped and inserted activities). Our technique has been implemented in the ProM 6 framework. Comparative evaluations show that the approach overcomes many of the limitations of existing conformance checking techniques.
Recomposing conformance: Closing the circle on decomposed alignment-based conformance checking in process mining
Lee, Wai Lam Jonathan, et al.
Information Sciences 466
In the area of process mining, efficient conformance checking is one of the main challenges. Several process mining vendors are in the process of implementing conformance checking in their tools to allow the user to check how well a model fits an event log. Current approaches for conformance checking are monolithic and compute exact fitness values but this may take excessive time. Alternatively, one can use a decomposition approach, which runs much faster but does not always compute an exact fitness value. This paper introduces a recomposition approach that takes the best of both: it returns the exact fitness value by using the decomposition approach in an iterative manner. Results show that similar speedups can be obtained as by using the decomposition approach, but now the exact fitness value is guaranteed. Even better, this approach supports a configurable time-bound: “Give me the best fitness estimation you can find within 10 min.” In such a case, the approach returns an interval that contains the exact fitness value. If such an interval is sufficiently narrow, there is no need to spend unnecessary time to compute the exact value.
Log Skeleton
Log skeletons: A classification approach to process discovery
Verbeek, H. M. W., and R. Medeiros de Carvalho
arXiv preprint arXiv:1806.08247
To test the effectiveness of process discovery algorithms, a Process Discovery Contest (PDC) has been set up. This PDC uses a classification approach to measure this effectiveness: The better the discovered model can classify whether or not a new trace conforms to the event log, the better the discovery algorithm is supposed to be. Unfortunately, even the state-of-the-art fully-automated discovery algorithms score poorly on this classification. Even the best of these algorithms, the Inductive Miner, correctly classified only 147 out of 200 traces on the PDC of 2017. This paper introduces the rule-based log skeleton model, which is closely related to the Declare constraint model, together with a way to classify traces using this model. This classification using log skeletons is shown to score better on the PDC of 2017 than state-of-the-art discovery algorithms: 194 out of 200. As a result, one can argue that the fully-automated algorithm to construct (or: discover) a log skeleton from an event log outperforms existing state-of-the-art fully-automated discovery algorithms.
Alignment Approximation for Process Trees
Daniel Schuster, Sebastiaan J. van Zelst, et al.
Process Querying, Manipulation, and Intelligence Workshop
Comparing observed behavior (event data generated during process executions) with modeled behavior (process models), is an essential step in process mining analyses. Alignments are the de-facto standard technique for calculating conformance checking statistics. However, the calculation of alignments is computationally complex since a shortest path problem must be solved on a state space which grows non-linearly with the size of the model and the observed behavior, leading to the well-known state space explosion problem. In this paper, we present a novel framework to approximate alignments on process trees by exploiting their hierarchical structure. Process trees are an important process model formalism used by state-of-the-art process mining techniques such as the inductive mining approaches. Our approach exploits structural properties of a given process tree and splits the alignment computation problem into smaller sub-problems. Finally, sub-results are composed to obtain an alignment. Our experiments show that our approach provides a good balance between accuracy and computation time.
Temporal Conformance Checking at Runtime based on Time-infused Process Models
Stertz, Florian, Jürgen Mangler, and Stefanie Rinderle-Ma
arXiv preprint arXiv:2008.07262
Conformance checking quantifies the deviations between a set of traces in a given process log and a set of possible traces defined by a process model. Current approaches mostly focus on added or missing events. Lately, multi-perspective mining has provided means to check for conformance with time and resource constraints encoded as data elements. This paper presents an approach for quantifying temporal deviations in conformance checking based on infusing the input process model with a temporal profile. The temporal profile is calculated based on an associated process log considering task durations and the temporal distance between events. Moreover, a simple semantic annotation on tasks in the process model signifies their importance with respect to time. During runtime, deviations between an event stream and the process model with the temporal profile are quantified through a cost function for temporal deviations. The evaluation of the approach shows that the results for two real-world data sets from the financial and a manufacturing domain hold the promise to improve runtime process monitoring and control capabilities.
Extended Marking Equation
Efficiently computing alignments: using the extended marking equation
Boudewijn F. van Dongen
International Conference on Business Process Management
Conformance checking is considered to be anything where observed behaviour needs to be related to already modelled behaviour. Fundamental to conformance checking are alignments which provide a precise relation between a sequence of activities observed in an event log and an execution sequence of a model. However, computing alignments is a complex task, both in time and memory, especially when models contain large amounts of parallelism. When computing alignments for Petri nets, (Integer) Linear Programming problems based on the marking equation are typically used to guide the search. Solving such problems is the main driver for the time complexity of alignments. In this paper, we adopt existing work in such a way that (a) the extended marking equation is used rather than the marking equation and (b) the number of linear problems that is solved is kept at a minimum. To do so, we exploit fundamental properties of the Petri nets and we show that we are able to compute optimal alignments for models for which this was previously infeasible. Furthermore, using a large collection of benchmark models, we empirically show that we improve on the state-of-the-art in terms of time and memory complexity.