pm4py implements state-of-the-art process mining algorithms, fresh off the academic press. Below is an overview of the algorithms implemented in pm4py and the papers describing them.
Process Discovery
Inductive Miner
Discovering block-structured process models from event logs - a constructive approach
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
International Conference on Applications and Theory of Petri Nets and Concurrency
2013
Process discovery is the problem of, given a log of observed behaviour,
finding a process model that ‘best’ describes this behaviour. A large
variety of process discovery algorithms has been proposed. However, no
existing algorithm guarantees to return a fitting model (i.e., able to
reproduce all observed behaviour) that is sound (free of deadlocks and other
anomalies) in finite time. We present an extensible framework to discover
from any given log a set of block-structured process models that are sound
and fit the observed behaviour. In addition, we characterise the minimal
information required in the log to rediscover a particular process model. We
then provide a polynomial-time algorithm for discovering a sound, fitting,
block-structured model from any given log; we give sufficient conditions on
the log for which our algorithm returns a model that is language-equivalent
to the process model underlying the log, including unseen behaviour. The
technique is implemented in a prototypical tool.
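Below, a minimal sketch of invoking the inductive miner through pm4py's simplified interface; the file name is a placeholder:

import pm4py

# load an event log (placeholder path) and discover a sound,
# block-structured model as an accepting Petri net
log = pm4py.read_xes("example.xes")
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# the same discovery can also return the underlying process tree
tree = pm4py.discover_process_tree_inductive(log)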
Inductive Miner Infrequent
Discovering block-structured process models from event logs containing infrequent behaviour
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
International Conference on Business Process Management
2013
Given an event log describing observed behaviour, process discovery aims to
find a process model that ‘best’ describes this behaviour. A large variety
of process discovery algorithms has been proposed. However, no existing
algorithm returns a sound model (free of deadlocks and other anomalies) in
all cases, handles infrequent behaviour well, and finishes quickly. We
present a technique able to cope with infrequent behaviour and large event
logs, while ensuring soundness. The technique has been implemented in ProM
and we compare the technique with existing approaches in terms of quality
and performance.
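In pm4py, the infrequent variant is reached through the same discovery entry point by setting a noise threshold; a sketch, with 0.2 as an arbitrary illustrative value:

import pm4py

log = pm4py.read_xes("example.xes")  # placeholder path
# a noise_threshold above 0 activates the infrequent variant (IMf),
# filtering behaviour whose relative frequency falls below the threshold
net, im, fm = pm4py.discover_petri_net_inductive(log, noise_threshold=0.2)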
Inductive Miner Directly-Follows
Scalable process discovery and conformance checking
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
Software & Systems Modeling 17.2
2018
Considerable amounts of data, including process events, are collected and
stored by organisations nowadays. Discovering a process model from such
event data and verification of the quality of discovered models are
important steps in process mining. Many discovery techniques have been
proposed, but none of them combines scalability with strong quality
guarantees. We would like such techniques to handle billions of events or
thousands of activities, to produce sound models (without deadlocks and
other anomalies), and to guarantee that the underlying process can be
rediscovered when sufficient information is available. In this paper, we
introduce a framework for process discovery that ensures these properties
while passing over the log only once and introduce three algorithms using
the framework. To measure the quality of discovered models for such large
logs, we introduce a model–model and model–log comparison framework that
applies a divide-and-conquer strategy to measure recall, fitness, and
precision. We experimentally show that these discovery and measuring
techniques sacrifice little compared to other algorithms, while gaining the
ability to cope with event logs of 100,000,000 traces and processes of
10,000 activities on a standard computer.
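The directly-follows variant (IMd) recurses on the directly-follows graph alone, which is what allows a single pass over the log; a sketch using the algorithm-level interface (variant names and return types can differ across pm4py versions):

import pm4py
from pm4py.algo.discovery.inductive import algorithm as inductive_miner

log = pm4py.read_xes("example.xes")  # placeholder path
# IMd only consults the directly-follows abstraction of the log
tree = inductive_miner.apply(log, variant=inductive_miner.Variants.IMd)
net, im, fm = pm4py.convert_to_petri_net(tree)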
Heuristics Miner
Process mining with the heuristics miner-algorithm
Weijters, A. J. M. M., Wil MP van der Aalst, and A. K. Alves de Medeiros
Technische Universiteit Eindhoven, Tech. Rep. WP 166
2006
The basic idea of process mining is to extract knowledge
from event logs recorded by an information system. Until recently, the
information in these event logs was rarely used to analyze the underlying
processes. Process mining aims at improving this by providing techniques and
tools for discovering process, organizational, social, and performance
information from event logs. Fuelled by the omnipresence of
event logs in transactional information systems (cf. WFM, ERP, CRM,
SCM, and B2B systems), process mining has become a vivid research
area [1, 2]. In this paper we introduce the challenging process mining
domain and discuss a heuristics-driven process mining algorithm, the
so-called "HeuristicsMiner", in detail. HeuristicsMiner is a practically
applicable mining algorithm that can deal with noise, and can be used to
express the main behavior (i.e., not all details and exceptions) registered
in an event log. In the experimental section of this paper we introduce
benchmark material (12,000 different event logs) and measurements by
which the performance of process mining algorithms can be assessed.
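A sketch of running the heuristics miner in pm4py; the dependency threshold is the kind of noise-filtering knob the paper discusses, and the value below is illustrative:

import pm4py

log = pm4py.read_xes("example.xes")  # placeholder path
# arcs whose dependency measure falls below the threshold are dropped,
# which is how the miner abstracts from noise and exceptional behaviour
heu_net = pm4py.discover_heuristics_net(log, dependency_threshold=0.9)
# the model can also be obtained directly as a Petri net
net, im, fm = pm4py.discover_petri_net_heuristics(log, dependency_threshold=0.9)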
Correlation Miner
Correlation miner: mining business process models and event correlations without case identifiers
Pourmirza, Shaya, Remco Dijkman, and Paul Grefen
International Journal of Cooperative Information Systems 26.02
2017
Process discovery algorithms aim to capture process models from event logs.
These algorithms have been designed for logs in which the events that belong
to the same case are related to each other — and to that case — by means of
a unique case identifier. However, in service-oriented systems, these case
identifiers are rarely stored beyond request-response pairs, which makes it
hard to relate events that belong to the same case. This is known as the
correlation challenge. This paper addresses the correlation challenge by
introducing a technique, called the correlation miner, that facilitates
discovery of business process models when events are not associated with a
case identifier. It extends previous work on the correlation miner by not
only enabling the discovery of the process model, but also detecting which
events belong to the same case.
Experiments performed on both synthetic and real-world event logs show the
applicability of the correlation miner. The resulting technique enables us
to observe a service-oriented system and determine — with high accuracy —
which request-response pairs sent by different communicating parties are
related to each other.
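In pm4py, the correlation miner is applied to a dataframe that carries activities and timestamps but no case identifier, and it returns a frequency DFG and a performance DFG; a sketch assuming the standard column names 'concept:name' and 'time:timestamp':

import pandas as pd
from pm4py.algo.discovery.correlation_mining import algorithm as correlation_miner

# an event table without case identifiers (placeholder file)
dataframe = pd.read_csv("events.csv")
dataframe["time:timestamp"] = pd.to_datetime(dataframe["time:timestamp"])
frequency_dfg, performance_dfg = correlation_miner.apply(
    dataframe,
    parameters={"pm4py:param:activity_key": "concept:name",
                "pm4py:param:timestamp_key": "time:timestamp"})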
Prefix Tree
Skeletal algorithms in process mining
Przybylek, Michal R
Springer, Berlin, Heidelberg
2013
This paper studies sample applications of skeletal algorithms to process mining and automata discovery. The basic idea behind a skeletal algorithm is to express a problem in terms of congruences on a structure, build an initial set of congruences, and improve it by taking limited unions/intersections, until a suitable condition is reached. Skeletal algorithms naturally arise in the context of process mining and automata discovery, where the skeleton is the "free" structure on initial data and a congruence corresponds to similarities in data. In such a context, skeletal algorithms come equipped with fitness functions measuring the complexity of a model. We examine two fitness functions for our sample problem: one based on the Minimum Description Length principle, and the other based on Bayesian interpretation.
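The "free" structure that skeletal algorithms start from can be pictured as a plain prefix tree over the traces, which congruences then fold together; a generic, library-independent sketch:

def build_prefix_tree(traces):
    # each node is a dict mapping an activity label to its child node;
    # traces sharing a prefix share the corresponding path in the tree
    root = {}
    for trace in traces:
        node = root
        for activity in trace:
            node = node.setdefault(activity, {})
    return root

# example: three traces sharing the prefix ("a", "b")
tree = build_prefix_tree([("a", "b", "c"), ("a", "b", "d"), ("a", "e")])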
Conformance Checking
Token-based Replay
A Novel Token-Based Replay Technique to Speed Up Conformance Checking and Process Enhancement
Alessandro Berti, Wil van der Aalst
Transactions on Petri Nets and Other Models of Concurrency (ToPNoC)
2020
Token-based replay used to be the standard way to conduct conformance
checking. With the uptake of more advanced techniques (e.g., alignment
based), token-based replay got abandoned. However, despite decomposition
approaches and heuristics to speed up computation, the more advanced
conformance checking techniques have limited scalability, especially when
traces get longer and process models more complex. This paper presents an
improved token-based replay approach that is much faster and scalable.
Moreover, the approach provides more accurate diagnostics that avoid known
problems (e.g., "token flooding") and help to pinpoint compliance problems.
The novel token-based replay technique has been implemented in the pm4py
process mining library. We will show that the replay technique outperforms
state-of-the-art techniques in terms of speed and/or diagnostics.
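A sketch of the replay as exposed in pm4py's simplified interface, run against a model discovered from the same (placeholder) log:

import pm4py

log = pm4py.read_xes("example.xes")  # placeholder path
net, im, fm = pm4py.discover_petri_net_inductive(log)

# per-trace diagnostics: consumed/produced/missing/remaining tokens,
# trace fitness, and the transitions that could not fire
diagnostics = pm4py.conformance_diagnostics_token_based_replay(log, net, im, fm)
# aggregated, log-level fitness
fitness = pm4py.fitness_token_based_replay(log, net, im, fm)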
Alignments
Conformance checking using cost-based fitness analysis
Adriansyah, Arya, Boudewijn F. van Dongen, and Wil van der Aalst
2011 IEEE 15th International Enterprise Distributed Object Computing Conference
2011
The growing complexity of processes in many organizations stimulates the
adoption of business process analysis techniques. Typically, such techniques
are based on process models and assume that the operational processes in
reality conform to these models. However, experience shows that reality
often deviates from hand-made models. Therefore, the problem of checking to
what extent the operational process conforms to the process model is
important for process management, process improvement, and compliance. In
this paper, we present a robust replay analysis technique that is able to
measure the conformance of an event log for a given process model. The
approach quantifies conformance and provides intuitive diagnostics (skipped
and inserted activities). Our technique has been implemented in the ProM 6
framework. Comparative evaluations show that the approach overcomes many of
the limitations of existing conformance checking techniques.
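A sketch of computing alignments through pm4py's simplified interface; the model is discovered from the same (placeholder) log:

import pm4py

log = pm4py.read_xes("example.xes")  # placeholder path
net, im, fm = pm4py.discover_petri_net_inductive(log)

# one optimal alignment per trace, each with its cost and fitness;
# model moves and log moves in the alignment expose the deviations
alignments = pm4py.conformance_diagnostics_alignments(log, net, im, fm)
fitness = pm4py.fitness_alignments(log, net, im, fm)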
Decomposed/Recomposed Alignments
Recomposing conformance: Closing the circle on decomposed alignment-based conformance checking in process mining
Lee, Wai Lam Jonathan, et al.
Information Sciences 466
2018
In the area of process mining, efficient conformance checking is one of the
main challenges. Several process mining vendors are in the process of
implementing conformance checking in their tools to allow the user to check
how well a model fits an event log. Current approaches for conformance
checking are monolithic and compute exact fitness values but this may take
excessive time. Alternatively, one can use a decomposition approach, which
runs much faster but does not always compute an exact fitness value.
This paper introduces a recomposition approach that takes the best of both:
it returns the exact fitness value by using the decomposition approach in an
iterative manner. Results show that similar speedups can be obtained as by
using the decomposition approach, but now the exact fitness value is
guaranteed. Even better, this approach supports a configurable time-bound:
“Give me the best fitness estimation you can find within 10 min.” In such a
case, the approach returns an interval that contains the exact fitness
value. If such an interval is sufficiently narrow, there is no need to spend
unnecessary time to compute the exact value.
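pm4py exposes the decomposition-based alignment computation at algorithm level; a sketch, with the module path taken from recent pm4py versions:

import pm4py
from pm4py.algo.conformance.alignments.decomposed import algorithm as decomp_alignments

log = pm4py.read_xes("example.xes")  # placeholder path
net, im, fm = pm4py.discover_petri_net_inductive(log)

# alignments are computed per model fragment and then recomposed
results = decomp_alignments.apply(log, net, im, fm)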
Log Skeleton
Log skeletons: A classification approach to process discovery
Verbeek, H. M. W., and R. Medeiros de Carvalho
arXiv preprint arXiv:1806.08247
2018
To test the effectiveness of process discovery algorithms, a Process
Discovery Contest (PDC) has been set up. This PDC uses a classification
approach to measure this effectiveness: The better the discovered model can
classify whether or not a new trace conforms to the event log, the better
the discovery algorithm is supposed to be. Unfortunately, even the
state-of-the-art fully-automated discovery algorithms score poorly on this
classification. Even the best of these algorithms, the Inductive Miner,
scored only 147 correctly classified traces out of 200 on the PDC of
2017. This paper introduces the rule-based log skeleton model, which is
closely related to the Declare constraint model, together with a way to
classify traces using this model. This classification using log skeletons is
shown to score better on the PDC of 2017 than state-of-the-art discovery
algorithms: 194 out of 200. As a result, one can argue that the
fully-automated algorithm to construct (or: discover) a log skeleton from an
event log outperforms existing state-of-the-art fully-automated discovery
algorithms.
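A sketch of discovering a log skeleton and classifying traces against it, using pm4py's algorithm-level interface:

import pm4py
from pm4py.algo.discovery.log_skeleton import algorithm as lsk_discovery
from pm4py.algo.conformance.log_skeleton import algorithm as lsk_conformance

log = pm4py.read_xes("example.xes")  # placeholder path
# the skeleton collects declarative constraints (equivalence, always-before,
# always-after, never-together, activity frequencies) observed in the log
skeleton = lsk_discovery.apply(log)
# per-trace verdicts on whether each trace satisfies the constraints
results = lsk_conformance.apply(log, skeleton)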
Alignments Approximation
Alignment Approximation for Process Trees
Daniel Schuster, Sebastiaan J. van Zelst, et al.
Process Querying, Manipulation, and Intelligence Workshop
2020
Comparing observed behavior (event data generated during
process executions) with modeled behavior (process models) is an essential step in process mining analyses. Alignments are the de facto standard
technique for calculating conformance checking statistics. However, the
calculation of alignments is computationally complex since a shortest
path problem must be solved on a state space which grows non-linearly
with the size of the model and the observed behavior, leading to the
well-known state space explosion problem. In this paper, we present a
novel framework to approximate alignments on process trees by exploiting their hierarchical structure. Process trees are an important process
model formalism used by state-of-the-art process mining techniques such
as the inductive mining approaches. Our approach exploits structural
properties of a given process tree and splits the alignment computation
problem into smaller sub-problems. Finally, sub-results are composed to
obtain an alignment. Our experiments show that our approach provides
a good balance between accuracy and computation time.
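The approximation operates on process trees; as an exact baseline for comparison, a discovered tree can be translated to a Petri net and aligned with the standard technique (a sketch using only the stable simplified API; in recent pm4py versions the tree-based approximation itself lives under the algorithm-level alignments package):

import pm4py

log = pm4py.read_xes("example.xes")  # placeholder path
tree = pm4py.discover_process_tree_inductive(log)

# exact baseline: translate the tree into a Petri net and align
net, im, fm = pm4py.convert_to_petri_net(tree)
exact = pm4py.conformance_diagnostics_alignments(log, net, im, fm)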
Temporal Profile
Temporal Conformance Checking at Runtime based on Time-infused Process Models
Stertz, Florian, Jürgen Mangler, and Stefanie Rinderle-Ma
arXiv preprint arXiv:2008.07262
2020
Conformance checking quantifies the deviations between a set of traces in a given process log and a set of possible traces defined by a process model. Current approaches mostly focus on added or missing events. Lately, multi-perspective mining has provided means to check for conformance with time and resource constraints encoded as data elements. This paper presents an approach for quantifying temporal deviations in conformance checking based on infusing the input process model with a temporal profile. The temporal profile is calculated based on an associated process log, considering task durations and the temporal distance between events. Moreover, a simple semantic annotation on tasks in the process model signifies their importance with respect to time. During runtime, deviations between an event stream and the process model with the temporal profile are quantified through a cost function for temporal deviations. The evaluation of the approach shows that the results for two real-world data sets, one from the financial and one from the manufacturing domain, hold promise to improve runtime process monitoring and control capabilities.
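A sketch of discovering a temporal profile and checking a log against it, using pm4py's algorithm-level interface:

import pm4py
from pm4py.algo.discovery.temporal_profile import algorithm as temporal_profile_discovery
from pm4py.algo.conformance.temporal_profile import algorithm as temporal_profile_conformance

log = pm4py.read_xes("example.xes")  # placeholder path
# average and standard deviation of the times between pairs of activities
temporal_profile = temporal_profile_discovery.apply(log)
# flags event pairs whose temporal distance deviates from the profile
results = temporal_profile_conformance.apply(log, temporal_profile)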
Extended Marking Equation
Efficiently computing alignments: using the extended marking equation
Boudewijn F. van Dongen
International Conference on Business Process Management
2018
Conformance checking is considered to be anything where observed behaviour needs to be related to already modelled behaviour. Fundamental to conformance checking are alignments, which provide a precise relation between a sequence of activities observed in an event log and an execution sequence of a model. However, computing alignments is a complex task, both in time and memory, especially when models contain large amounts of parallelism.
When computing alignments for Petri nets, (integer) linear programming problems based on the marking equation are typically used to guide the search. Solving such problems is the main driver of the time complexity of alignments. In this paper, we adapt existing work such that (a) the extended marking equation is used rather than the marking equation and (b) the number of linear problems that is solved is kept to a minimum.
To do so, we exploit fundamental properties of Petri nets, and we show that we are able to compute optimal alignments for models for which this was previously infeasible. Furthermore, using a large collection of benchmark models, we empirically show that we improve on the state-of-the-art in terms of time and memory complexity.
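In pm4py, marking-equation-based heuristics guide the A* search behind the algorithm-level alignment variant sketched below; whether the plain or the extended marking equation is used internally depends on the version:

import pm4py
from pm4py.algo.conformance.alignments.petri_net import algorithm as alignments

log = pm4py.read_xes("example.xes")  # placeholder path
net, im, fm = pm4py.discover_petri_net_inductive(log)

# the state-equation variant solves (integer) linear relaxations of the
# marking equation to underestimate the remaining cost during the search
aligned = alignments.apply(log, net, im, fm,
                           variant=alignments.Variants.VERSION_STATE_EQUATION_A_STAR)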