pm4py implements state-of-the-art process mining algorithms, i.e., 'fresh off the academic press'. Here, you can find an overview of the different algorithms implemented in pm4py and the papers describing them.

Process Discovery

Inductive Miner


Discovering block-structured process models from event logs-a constructive approach
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
International conference on applications and theory of Petri nets and concurrency
2013
Process discovery is the problem of, given a log of observed behaviour, finding a process model that ‘best’ describes this behaviour. A large variety of process discovery algorithms has been proposed. However, no existing algorithm guarantees to return a fitting model (i.e., able to reproduce all observed behaviour) that is sound (free of deadlocks and other anomalies) in finite time. We present an extensible framework to discover from any given log a set of block-structured process models that are sound and fit the observed behaviour. In addition we characterise the minimal information required in the log to rediscover a particular process model. We then provide a polynomial-time algorithm for discovering a sound, fitting, block-structured model from any given log; we give sufficient conditions on the log for which our algorithm returns a model that is language-equivalent to the process model underlying the log, including unseen behaviour. The technique is implemented in a prototypical tool.
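
A minimal sketch of how the inductive miner can be invoked through pm4py's simplified interface (assuming a recent pm4py version; the XES file name is only a placeholder):

```python
import pm4py

# placeholder event log; replace with your own XES file
log = pm4py.read_xes("running-example.xes")

# discover an accepting Petri net with the inductive miner
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
pm4py.view_petri_net(net, initial_marking, final_marking)
```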

Inductive Miner Infrequent


Discovering block-structured process models from event logs containing infrequent behaviour
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
International conference on business process management
2013
Given an event log describing observed behaviour, process discovery aims to find a process model that ‘best’ describes this behaviour. A large variety of process discovery algorithms has been proposed. However, no existing algorithm returns a sound model in all cases (free of deadlocks and other anomalies), handles infrequent behaviour well and finishes quickly. We present a technique able to cope with infrequent behaviour and large event logs, while ensuring soundness. The technique has been implemented in ProM and we compare the technique with existing approaches in terms of quality and performance.
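
A hedged sketch of how the infrequent variant is typically reached in pm4py: the same discovery call, with a non-zero noise threshold (the log file and threshold value are placeholders):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log

# a noise_threshold above 0.0 switches to the infrequent variant (IMf),
# filtering infrequent behaviour before cuts are detected
net, im, fm = pm4py.discover_petri_net_inductive(log, noise_threshold=0.2)
```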

Inductive Miner Directly-Follows


Scalable process discovery and conformance checking
Leemans, Sander JJ, Dirk Fahland, and Wil MP van der Aalst
Software & Systems Modeling 17.2
2018
Considerable amounts of data, including process events, are collected and stored by organisations nowadays. Discovering a process model from such event data and verification of the quality of discovered models are important steps in process mining. Many discovery techniques have been proposed, but none of them combines scalability with strong quality guarantees. We would like such techniques to handle billions of events or thousands of activities, to produce sound models (without deadlocks and other anomalies), and to guarantee that the underlying process can be rediscovered when sufficient information is available. In this paper, we introduce a framework for process discovery that ensures these properties while passing over the log only once and introduce three algorithms using the framework. To measure the quality of discovered models for such large logs, we introduce a model–model and model–log comparison framework that applies a divide-and-conquer strategy to measure recall, fitness, and precision. We experimentally show that these discovery and measuring techniques sacrifice little compared to other algorithms, while gaining the ability to cope with event logs of 100,000,000 traces and processes of 10,000 activities on a standard computer.

Heuristics Miner


Process mining with the heuristics miner-algorithm
Weijters, A. J. M. M., Wil MP van Der Aalst, and AK Alves De Medeiros
Technische Universiteit Eindhoven, Tech. Rep. WP 166
2006
The basic idea of process mining is to extract knowledge from event logs recorded by an information system. Until recently, the information in these event logs was rarely used to analyze the underlying processes. Process mining aims at improving this by providing techniques and tools for discovering process, organizational, social, and performance information from event logs. Fuelled by the omnipresence of event logs in transactional information systems (cf. WFM, ERP, CRM, SCM, and B2B systems), process mining has become a vivid research area [1, 2]. In this paper we introduce the challenging process mining domain and discuss in detail a heuristics-driven process mining algorithm, the so-called “HeuristicsMiner”. HeuristicsMiner is a practically applicable mining algorithm that can deal with noise, and can be used to express the main behavior (i.e. not all details and exceptions) registered in an event log. In the experimental section of this paper we introduce benchmark material (12,000 different event logs) and measurements by which the performance of process mining algorithms can be measured.
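
A minimal sketch of the heuristics miner via pm4py's simplified interface (the log file and threshold value are placeholders):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log

# heuristics net (dependency graph annotated with frequencies)
heu_net = pm4py.discover_heuristics_net(log, dependency_threshold=0.5)
pm4py.view_heuristics_net(heu_net)

# alternatively, obtain the result directly as an accepting Petri net
net, im, fm = pm4py.discover_petri_net_heuristics(log, dependency_threshold=0.5)
```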

Correlation Miner


Correlation miner: mining business process models and event correlations without case identifiers
Pourmirza, Shaya, Remco Dijkman, and Paul Grefen
International Journal of Cooperative Information Systems 26.02
2017
Process discovery algorithms aim to capture process models from event logs. These algorithms have been designed for logs in which the events that belong to the same case are related to each other — and to that case — by means of a unique case identifier. However, in service-oriented systems, these case identifiers are rarely stored beyond request-response pairs, which makes it hard to relate events that belong to the same case. This is known as the correlation challenge. This paper addresses the correlation challenge by introducing a technique, called the correlation miner, that facilitates discovery of business process models when events are not associated with a case identifier. It extends previous work on the correlation miner, by not only enabling the discovery of the process model, but also detecting which events belong to the same case. Experiments performed on both synthetic and real-world event logs show the applicability of the correlation miner. The resulting technique enables us to observe a service-oriented system and determine — with high accuracy — which request-response pairs sent by different communicating parties are related to each other.
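
A rough, hedged sketch of the correlation miner as exposed in pm4py's module-level API (the CSV file and its columns are hypothetical, and the exact parameter handling varies between versions):

```python
import pandas as pd
from pm4py.algo.discovery.correlation_mining import algorithm as correlation_miner

# hypothetical event table WITHOUT case identifiers: one activity and timestamp per row;
# depending on the version, column names may have to be supplied via a parameters dict
df = pd.read_csv("events_without_case_id.csv")

# the correlation miner estimates a frequency DFG and a performance DFG from such data
frequency_dfg, performance_dfg = correlation_miner.apply(df)
```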

Prefix Tree


Skeletal algorithms in process mining
Przybylek, Michal R
Springer, Berlin, Heidelberg
2013
This paper studies sample applications of skeletal algorithm to process mining and automata discovery. The basic idea behind the skeletal algorithm is to express a problem in terms of congruences on a structure, build an initial set of congruences, and improve it by taking limited unions/intersections, until a suitable condition is reached. Skeletal algorithms naturally arise in the context of process mining and automata discovery, where the skeleton is the “free” structure on initial data and a congruence corresponds to similarities in data. In such a context, skeletal algorithms come equipped with fitness functions measuring the complexity of a model. We examine two fitness functions for our sample problem — one based on Minimum Description Length Principle, and the other based on Bayesian Interpretation.

Conformance Checking

Token-based Replay


A Novel Token-Based Replay Technique to Speed Up Conformance Checking and Process Enhancement
Alessandro Berti, Wil van der Aalst
ToPNoC (Transactions on Petri Nets and other models of Concurrency)
2020
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment based), token-based replay got abandoned. However, despite decomposition approaches and heuristics to speed-up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., "token flooding") and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the pm4py process mining library. We will show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics. Moreover, a revision of an existing precision measure (ETConformance) will be proposed through integration with the token-based replayer.
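
A minimal, hedged sketch of token-based replay through pm4py's simplified interface (the log file is a placeholder and the discovered model only serves as an example reference model):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log
net, im, fm = pm4py.discover_petri_net_inductive(log)

# per-trace replay diagnostics (trace fitness, missing/remaining tokens, ...)
replayed_traces = pm4py.conformance_diagnostics_token_based_replay(log, net, im, fm)
print(replayed_traces[0])
```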

Alignments


Conformance checking using cost-based fitness analysis
Adriansyah, Arya, Boudewijn F. van Dongen, and Wil van der Aalst
2011 IEEE 15th International Enterprise Distributed Object Computing Conference
2011
The growing complexity of processes in many organizations stimulates the adoption of business process analysis techniques. Typically, such techniques are based on process models and assume that the operational processes in reality conform to these models. However, experience shows that reality often deviates from hand-made models. Therefore, the problem of checking to what extent the operational process conforms to the process model is important for process management, process improvement, and compliance. In this paper, we present a robust replay analysis technique that is able to measure the conformance of an event log for a given process model. The approach quantifies conformance and provides intuitive diagnostics (skipped and inserted activities). Our technique has been implemented in the ProM 6 framework. Comparative evaluations show that the approach overcomes many of the limitations of existing conformance checking techniques.
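
A minimal sketch of alignment-based conformance checking in pm4py (placeholder log; the discovered model merely acts as the reference model):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log
net, im, fm = pm4py.discover_petri_net_inductive(log)

# optimal alignments between each trace and the model, with cost and fitness per trace
aligned_traces = pm4py.conformance_diagnostics_alignments(log, net, im, fm)
print(aligned_traces[0]["alignment"])
```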

Decomposed/Recomposed Alignments


Recomposing conformance: Closing the circle on decomposed alignment-based conformance checking in process mining
Lee, Wai Lam Jonathan, et al.
Information Sciences 466
2018
In the area of process mining, efficient conformance checking is one of the main challenges. Several process mining vendors are in the process of implementing conformance checking in their tools to allow the user to check how well a model fits an event log. Current approaches for conformance checking are monolithic and compute exact fitness values but this may take excessive time. Alternatively, one can use a decomposition approach, which runs much faster but does not always compute an exact fitness value. This paper introduces a recomposition approach that takes the best of both: it returns the exact fitness value by using the decomposition approach in an iterative manner. Results show that similar speedups can be obtained as by using the decomposition approach, but now the exact fitness value is guaranteed. Even better, this approach supports a configurable time-bound: “Give me the best fitness estimation you can find within 10 min.” In such a case, the approach returns an interval that contains the exact fitness value. If such an interval is sufficiently narrow, there is no need to spend unnecessary time to compute the exact value.

Log Skeleton


Log skeletons: A classification approach to process discovery
Verbeek, H. M. W., and R. Medeiros de Carvalho
arXiv preprint arXiv:1806.08247
2018
To test the effectiveness of process discovery algorithms, a Process Discovery Contest (PDC) has been set up. This PDC uses a classification approach to measure this effectiveness: The better the discovered model can classify whether or not a new trace conforms to the event log, the better the discovery algorithm is supposed to be. Unfortunately, even the state-of-the-art fully-automated discovery algorithms score poorly on this classification. Even the best of these algorithms, the Inductive Miner, scored only 147 correctly classified traces out of 200 traces on the PDC of 2017. This paper introduces the rule-based log skeleton model, which is closely related to the Declare constraint model, together with a way to classify traces using this model. This classification using log skeletons is shown to score better on the PDC of 2017 than state-of-the-art discovery algorithms: 194 out of 200. As a result, one can argue that the fully-automated algorithm to construct (or: discover) a log skeleton from an event log outperforms existing state-of-the-art fully-automated discovery algorithms.
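
A hedged sketch using pm4py's module-level log-skeleton API (placeholder log; in recent versions equivalent helpers may also be available directly on the pm4py namespace):

```python
import pm4py
from pm4py.algo.discovery.log_skeleton import algorithm as log_skeleton_discovery
from pm4py.algo.conformance.log_skeleton import algorithm as log_skeleton_conformance

log = pm4py.read_xes("running-example.xes")  # placeholder log

# discover the log skeleton (equivalence, always-before/after, never-together, ...)
skeleton = log_skeleton_discovery.apply(log)

# classify the traces of a (possibly different) log against the skeleton
results = log_skeleton_conformance.apply(log, skeleton)
```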

Alignments Approximation


Alignment Approximation for Process Trees
Daniel Schuster, Sebastiaan J. van Zelst, et al.
Process Querying, Manipulation, and Intelligence Workshop
2020
Comparing observed behavior (event data generated during process executions) with modeled behavior (process models) is an essential step in process mining analyses. Alignments are the de-facto standard technique for calculating conformance checking statistics. However, the calculation of alignments is computationally complex since a shortest path problem must be solved on a state space which grows non-linearly with the size of the model and the observed behavior, leading to the well-known state space explosion problem. In this paper, we present a novel framework to approximate alignments on process trees by exploiting their hierarchical structure. Process trees are an important process model formalism used by state-of-the-art process mining techniques such as the inductive mining approaches. Our approach exploits structural properties of a given process tree and splits the alignment computation problem into smaller sub-problems. Finally, sub-results are composed to obtain an alignment. Our experiments show that our approach provides a good balance between accuracy and computation time.

Temporal Profile


Temporal Conformance Checking at Runtime based on Time-infused Process Models
Stertz, Florian, Jürgen Mangler, and Stefanie Rinderle-Ma
arXiv preprint arXiv:2008.07262
2020
Conformance checking quantifies the deviations between a set of traces in a given process log and a set of possible traces defined by a process model. Current approaches mostly focus on added or missing events. Lately, multi-perspective mining has provided means to check for conformance with time and resource constraints encoded as data elements. This paper presents an approach for quantifying temporal deviations in conformance checking based on infusing the input process model with a temporal profile. The temporal profile is calculated based on an associated process log considering task durations and the temporal distance between events. Moreover, a simple semantic annotation on tasks in the process model signifies their importance with respect to time. During runtime, deviations between an event stream and the process model with the temporal profile are quantified through a cost function for temporal deviations. The evaluation of the approach shows that the results for two real-world data sets from the financial and a manufacturing domain hold the promise to improve runtime process monitoring and control capabilities.
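
A hedged sketch of temporal-profile discovery and conformance as exposed in pm4py's module-level API (placeholder log; the handling of the deviation threshold may differ across versions):

```python
import pm4py
from pm4py.algo.discovery.temporal_profile import algorithm as temporal_profile_discovery
from pm4py.algo.conformance.temporal_profile import algorithm as temporal_profile_conformance

log = pm4py.read_xes("running-example.xes")  # placeholder log

# average and standard deviation of the times between pairs of activities
temporal_profile = temporal_profile_discovery.apply(log)

# report deviations of the log from the temporal profile
results = temporal_profile_conformance.apply(log, temporal_profile)
```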

Extended Marking Equation


Efficiently computing alignments: using the extended marking equation
Boudewijn F. van Dongen
International Conference on Business Process Management
2018
Conformance checking is considered to be anything where observed behaviour needs to be related to already modelled behaviour. Fundamental to conformance checking are alignments which provide a precise relation between a sequence of activities observed in an event log and an execution sequence of a model. However, computing alignments is a complex task, both in time and memory, especially when models contain large amounts of parallelism. When computing alignments for Petri nets, (Integer) Linear Programming problems based on the marking equation are typically used to guide the search. Solving such problems is the main driver for the time complexity of alignments. In this paper, we adopt existing work in such a way that (a) the extended marking equation is used rather than the marking equation and (b) the number of linear problems that is solved is kept at a minimum. To do so, we exploit fundamental properties of the Petri nets and we show that we are able to compute optimal alignments for models for which this was previously infeasible. Furthermore, using a large collection of benchmark models, we empirically show that we improve on the state-of-the-art in terms of time and memory complexity.

Evaluation Log-Model

Fitness (Token-based Replay)


A Novel Token-Based Replay Technique to Speed Up Conformance Checking and Process Enhancement
Alessandro Berti, Wil van der Aalst
ToPNoC (Transactions on Petri Nets and other models of Concurrency)
2020
Token-based replay used to be the standard way to conduct conformance checking. With the uptake of more advanced techniques (e.g., alignment based), token-based replay got abandoned. However, despite decomposition approaches and heuristics to speed-up computation, the more advanced conformance checking techniques have limited scalability, especially when traces get longer and process models more complex. This paper presents an improved token-based replay approach that is much faster and scalable. Moreover, the approach provides more accurate diagnostics that avoid known problems (e.g., "token flooding") and help to pinpoint compliance problems. The novel token-based replay technique has been implemented in the pm4py process mining library. We will show that the replay technique outperforms state-of-the-art techniques in terms of speed and/or diagnostics. Moreover, a revision of an existing precision measure (ETConformance) will be proposed through integration with the token-based replayer.
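
A minimal sketch of computing log-model fitness via token-based replay in pm4py (placeholder log):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log
net, im, fm = pm4py.discover_petri_net_inductive(log)

# dictionary with, among others, the average trace fitness and the share of fitting traces
fitness = pm4py.fitness_token_based_replay(log, net, im, fm)
print(fitness)
```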

Fitness (Alignments)


Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity
Buijs, Joos CAM, Boudewijn F. van Dongen, and Wil MP van der Aalst
International Journal of Cooperative Information Systems 23.01
2014
Process discovery algorithms typically aim at discovering process models from event logs that best describe the recorded behavior. Often, the quality of a process discovery algorithm is measured by quantifying to what extent the resulting model can reproduce the behavior in the log, i.e. replay fitness. At the same time, there are other measures that compare a model with recorded behavior in terms of the precision of the model and the extent to which the model generalizes the behavior in the log. Furthermore, many measures exist to express the complexity of a model irrespective of the log. In this paper, we first discuss several quality dimensions related to process discovery. We further show that existing process discovery algorithms typically consider at most two out of the four main quality dimensions: replay fitness, precision, generalization and simplicity. Moreover, existing approaches cannot steer the discovery process based on user-defined weights for the four quality dimensions. This paper presents the ETM algorithm which allows the user to seamlessly steer the discovery process based on preferences with respect to the four quality dimensions. We show that all dimensions are important for process discovery. However, it only makes sense to consider precision, generalization and simplicity if the replay fitness is acceptable.
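
A minimal sketch of the alignment-based fitness computation in pm4py (placeholder log):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log
net, im, fm = pm4py.discover_petri_net_inductive(log)

# alignment-based replay fitness of the log against the model
fitness = pm4py.fitness_alignments(log, net, im, fm)
print(fitness)
```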

ETConformance precision


A fresh look at precision in process conformance
Muñoz-Gama, Jorge, and Josep Carmona
International Conference on Business Process Management
2010
Process Conformance is a crucial step in the area of Process Mining: the adequacy of a model derived from applying a discovery algorithm to a log must be certified before making further decisions that affect the system under consideration. Among the different conformance dimensions, in this paper we propose a novel measure for precision, based on the simple idea of counting these situations where the model deviates from the log. Moreover, a log-based traversal of the model that avoids inspecting its whole behavior is presented. Experimental results show a significant improvement when compared to current approaches for the same task. Finally, the detection of the shortest traces in the model that lead to discrepancies is presented.
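
A minimal sketch of the ETConformance-style precision measure in pm4py, which relies on token-based replay (placeholder log):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log
net, im, fm = pm4py.discover_petri_net_inductive(log)

# precision of the model with respect to the log, based on token-based replay
precision = pm4py.precision_token_based_replay(log, net, im, fm)
print(precision)
```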

Align-ETConformance precision


Measuring precision of modeled behavior
Adriansyah, Arya, et al.
Information systems and e-Business Management 13.1
2015
Conformance checking techniques compare observed behavior (i.e., event logs) with modeled behavior for a variety of reasons. For example, discrepancies between a normative process model and recorded behavior may point to fraud or inefficiencies. The resulting diagnostics can be used for auditing and compliance management. Conformance checking can also be used to judge a process model automatically discovered from an event log. Models discovered using different process discovery techniques need to be compared objectively. These examples illustrate just a few of the many use cases for aligning observed and modeled behavior. Thus far, most conformance checking techniques focused on replay fitness, i.e., the ability to reproduce the event log. However, it is easy to construct models that allow for lots of behavior (including the observed behavior) without being precise. In this paper, we propose a method to measure precision of process models, given their event logs by first aligning the logs to the models. This way, the measurement is not sensitive to non-fitting executions and more accurate values can be obtained for non-fitting logs. Furthermore, we introduce several variants of the technique to deal better with incomplete logs and reduce possible bias due to behavioral property of process models. The approach has been implemented in the ProM 6 framework and tested against both artificial and real-life cases. Experiments show that the approach is robust to noise and applicable to handle logs and models of real-life complexity.
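
A minimal sketch of the alignment-based precision measure in pm4py (placeholder log):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log
net, im, fm = pm4py.discover_petri_net_inductive(log)

# precision of the model with respect to the log, based on alignments
precision = pm4py.precision_alignments(log, net, im, fm)
print(precision)
```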

Generalization


Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity
Buijs, Joos CAM, Boudewijn F. van Dongen, and Wil MP van der Aalst
International Journal of Cooperative Information Systems 23.01
2014
Process discovery algorithms typically aim at discovering process models from event logs that best describe the recorded behavior. Often, the quality of a process discovery algorithm is measured by quantifying to what extent the resulting model can reproduce the behavior in the log, i.e. replay fitness. At the same time, there are other measures that compare a model with recorded behavior in terms of the precision of the model and the extent to which the model generalizes the behavior in the log. Furthermore, many measures exist to express the complexity of a model irrespective of the log. In this paper, we first discuss several quality dimensions related to process discovery. We further show that existing process discovery algorithms typically consider at most two out of the four main quality dimensions: replay fitness, precision, generalization and simplicity. Moreover, existing approaches cannot steer the discovery process based on user-defined weights for the four quality dimensions. This paper presents the ETM algorithm which allows the user to seamlessly steer the discovery process based on preferences with respect to the four quality dimensions. We show that all dimensions are important for process discovery. However, it only makes sense to consider precision, generalization and simplicity if the replay fitness is acceptable.
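
A hedged sketch of the generalization measure via pm4py's module-level evaluation API (placeholder log; module paths may shift between versions):

```python
import pm4py
from pm4py.algo.evaluation.generalization import algorithm as generalization_evaluator

log = pm4py.read_xes("running-example.xes")  # placeholder log
net, im, fm = pm4py.discover_petri_net_inductive(log)

# generalization of the model with respect to the log
generalization = generalization_evaluator.apply(log, net, im, fm)
print(generalization)
```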

Simplicity


ProDiGen: Mining complete, precise and minimal structure process models with a genetic algorithm
Vázquez-Barreiros, Borja, Manuel Mucientes, and Manuel Lama
Information Sciences 294
2015
Process discovery techniques automatically extract the real workflow of a process by analyzing the events that are collected and stored in log files. Although in the last years several process discovery algorithms have been presented, none of them guarantees to find complete, precise and simple models for all the given logs. In this paper we address the problem of process discovery through a genetic algorithm with a new fitness function that takes into account completeness, precision and simplicity. ProDiGen (Process Discovery through a Genetic algorithm) includes new definitions for precision and simplicity, and specific crossover and mutation operators. The proposal has been validated with 39 process models and several noise levels, giving a total of 111 different logs. We have compared our approach with the state of the art algorithms; non-parametric statistical tests show that our algorithm outperforms the other approaches, and that the difference is statistically significant.
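
A hedged sketch of the simplicity measure via pm4py's module-level evaluation API (placeholder log; the measure is computed on the model alone):

```python
import pm4py
from pm4py.algo.evaluation.simplicity import algorithm as simplicity_evaluator

log = pm4py.read_xes("running-example.xes")  # placeholder log
net, im, fm = pm4py.discover_petri_net_inductive(log)

# structural simplicity of the discovered Petri net
simplicity = simplicity_evaluator.apply(net)
print(simplicity)
```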

Anti-Alignments


Anti-alignments—measuring the precision of process models and event logs
Chatain, Thomas, Mathilde Boltenhagen, and Josep Carmona
Information Systems 98
2021
Processes are a crucial artefact in organizations, since they coordinate the execution of activities so that products and services are provided. The use of models to analyse the underlying processes is a well-known practice. However, due to the complexity and continuous evolution of their processes, organizations need an effective way of analysing the relation between processes and models. Conformance checking techniques assess the suitability of a process model in representing an underlying process, observed through a collection of real executions. One important metric in conformance checking is to assess the precision of the model with respect to the observed executions, i.e., characterize the ability of the model to produce behavior unrelated to the one observed. In this paper we present the notion of anti-alignment as a concept to help unveiling runs in the model that may deviate significantly from the observed behavior. Using anti-alignments, a new metric for precision is proposed. The proposed anti-alignment based precision metric satisfies most of the required axioms highlighted in a recent publication. Moreover, a complexity analysis of the problem of computing anti-alignments is provided, which sheds light into the practicability of using anti-alignment to estimate precision. Experiments are provided that witness the validity of the concepts introduced in this paper.

Multi-Alignments


Optimized SAT encoding of conformance checking artefacts
Boltenhagen, Mathilde, Thomas Chatain, and Josep Carmona
Computing 103.1
2021
Conformance checking is a growing discipline that aims at assisting organizations in monitoring their processes. On its core, conformance checking relies on the computation of particular artefacts which enable reasoning on the relation between observed and modeled behavior. It is widely acknowledged that the computation of these artifacts is the lion’s share of conformance checking techniques. This paper shows how important conformance artefacts like alignments, anti-alignments or multi-alignments, defined over the Levenshtein edit distance, can be efficiently computed by encoding the problem as an optimized SAT instance. From a general perspective, the work advocates for a unified family of techniques that can compute conformance artefacts in the same way. The implementation of the techniques presented in this paper show capabilities for dealing with both synthetic and real-life instances, which may open the door for a fresh way of applying conformance checking in the near future.

Object-Centric Process Mining

OC-DFG Discovery


OC-PM: analyzing object-centric event logs and process models
Alessandro Berti and Wil van der Aalst
International Journal on Software Tools for Technology Transfer
2022
Object-centric process mining is a novel branch of process mining that aims to analyze event data from mainstream information systems (such as SAP) more naturally, without being forced to form mutually exclusive groups of events with the specification of a case notion. The development of object-centric process mining is related to exploiting object-centric event logs, which includes exploring and filtering the behavior contained in the logs and constructing process models which can encode the behavior of different classes of objects and their interactions (which can be discovered from object-centric event logs). This paper aims to provide a broad look at the exploration and processing of object-centric event logs to discover information related to the lifecycle of the different objects composing the event log. Also, comprehensive tool support (OC-PM) implementing the proposed techniques is described in the paper.
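
A hedged sketch of object-centric DFG discovery in pm4py (the OCEL file name is a placeholder; OCEL support requires a recent pm4py version):

```python
import pm4py

# placeholder object-centric event log (JSON-OCEL or XML-OCEL)
ocel = pm4py.read_ocel("example.jsonocel")

# discover and view an object-centric directly-follows graph (OC-DFG)
ocdfg = pm4py.discover_ocdfg(ocel)
pm4py.view_ocdfg(ocdfg)
```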

OC-Petri Nets Discovery


Discovering Object-Centric Petri Nets
Wil van der Aalst and Alessandro Berti
Fundamenta informaticae
2020
Techniques to discover Petri nets from event data assume precisely one case identifier per event. These case identifiers are used to correlate events, and the resulting discovered Petri net aims to describe the life-cycle of individual cases. In reality, there is not one possible case notion, but multiple intertwined case notions. For example, events may refer to mixtures of orders, items, packages, customers, and products. A package may refer to multiple items, multiple products, one order, and one customer. Therefore, we need to assume that each event refers to a collection of objects, each having a type (instead of a single case identifier). Such object-centric event logs are closer to data in real-life information systems. From an object-centric event log, we want to discover an object-centric Petri net with places that correspond to object types and transitions that may consume and produce collections of objects of different types. Object-centric Petri nets visualize the complex relationships among objects from different types. This paper discusses a novel process discovery approach implemented in PM4Py. As will be demonstrated, it is indeed feasible to discover holistic process models that can be used to drill-down into specific viewpoints if needed.
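
A hedged sketch of object-centric Petri net discovery in pm4py (placeholder OCEL; the visualization helper name may differ across versions):

```python
import pm4py

ocel = pm4py.read_ocel("example.jsonocel")  # placeholder object-centric log

# discover an object-centric Petri net, with one subnet per object type
ocpn = pm4py.discover_oc_petri_net(ocel)
pm4py.view_ocpn(ocpn)
```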

Other Approaches

Process Tree Generation


PTandLogGenerator: A Generator for Artificial Event Data
Jouck, Toon, and Benoît Depaire
BPM (Demos) 1789
2016
The empirical analysis of process discovery algorithms has recently gained more attention. An important step within such an analysis is the acquisition of the appropriate test event data, i.e. event logs and reference models. This requires an implemented framework that supports the random and automated generation of event data based on user specifications. This paper presents a tool for generating artificial process trees and event logs that can be used to study and compare the empirical workings of process discovery algorithms. It extends current tools by giving users full control over an extensive set of process control-flow constructs included in the final models and event logs. Additionally, it is integrated within the ProM framework that offers a plethora of process discovery algorithms and evaluation metrics which are required during empirical analysis.
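
A hedged sketch of the PTandLogGenerator-based process tree generator bundled with pm4py (default generator settings; module path as in recent 2.x versions):

```python
import pm4py
from pm4py.algo.simulation.tree_generator import algorithm as tree_generator

# generate a random process tree with the default generator settings
tree = tree_generator.apply()
pm4py.view_process_tree(tree)
```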

Decision Mining


A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs
De Leoni, Massimiliano, Wil MP van der Aalst, and Marcus Dees
Information Systems 56
2016
Process mining can be viewed as the missing link between model-based process analysis and data-oriented analysis techniques. Lion's share of process mining research has been focusing on process discovery (creating process models from raw data) and replay techniques to check conformance and analyze bottlenecks. These techniques have helped organizations to address compliance and performance problems. However, for a more refined analysis, it is essential to correlate different process characteristics. For example, do deviations from the normative process cause additional delays and costs? Are rejected cases handled differently in the initial phases of the process? What is the influence of a doctor's experience on treatment process? These and other questions may involve process characteristics related to different perspectives (control-flow, data-flow, time, organization, cost, compliance, etc.). Specific questions (e.g., predicting the remaining processing time) have been investigated before, but a generic approach was missing thus far. The proposed framework unifies a number of approaches for correlation analysis proposed in literature, proposing a general solution that can perform those analyses and many more. The approach has been implemented in ProM and combines process and data mining techniques. In this paper, we also demonstrate the applicability using a case study conducted with the UWV (Employee Insurance Agency), one of the largest “administrative factories” in The Netherlands.

Soundness Checking (WOFLAN)


Diagnosing workflow processes using Woflan
Verbeek, Henricus MW, Twan Basten, and Wil MP van der Aalst
The computer journal 44.4
2001
Workflow management technology promises a flexible solution for business-process support facilitating the easy creation of new business processes and modification of existing processes. Unfortunately, today's workflow products have no support for workflow verification. Errors made at design-time are not detected and result in very costly failures at run-time. This paper presents the verification tool Woflan. Woflan analyzes workflow process definitions downloaded from commercial workflow products using state-of-the-art Petri-net-based analysis techniques. This paper describes the functionality of Woflan emphasizing diagnostics to locate the source of a design error. Woflan is evaluated via two case studies, one involving 20 groups of students designing a complex workflow process and one involving an industrial workflow process designed by Staffware Benelux. The results are encouraging and show that Woflan guides the user in finding and correcting errors in the design of workflows.
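
A minimal, hedged sketch of WOFLAN-based soundness checking through pm4py's simplified interface (placeholder log; the exact shape of the returned diagnostics may differ across versions):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log
net, im, fm = pm4py.discover_petri_net_inductive(log)

# WOFLAN-based soundness check of the accepting Petri net
result = pm4py.check_soundness(net, im, fm)
print(result)
```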

Trace Clustering


A Generic Framework for Attribute-Driven Hierarchical Trace Clustering
Sebastiaan J. van Zelst, Yukun Cao
Business Process Intelligence Workshop
2020
The execution of business processes often entails a specific process execution context, e.g. a customer, service or product. Often, the corresponding event data logs indicators of such an execution context, e.g., a customer type (bronze, silver, gold or platinum). Typically, variations in the execution of a process exist for the different execution context of a process. To gain a better understanding of the global process execution, it is interesting to study the behavioral (dis)similarity between different execution contexts of a process. However, in real business settings, the exact number of execution contexts might be too large to analyze manually. At the same time, current trace clustering techniques do not take process type information into account, i.e., they are solely behaviorally driven. Hence, in this paper, we present a hierarchical data-attribute-driven trace clustering framework that allows us to compare the behavior of different groups of traces. Our evaluation shows that the incorporation of data-attributes in trace clustering yields interesting novel process insights.

Performance Spectrum


The Performance Spectrum Miner: Visual Analytics for Fine-Grained Performance Analysis of Processes
Vadim Denisov, Elena Belkina, et al.
BPM (Industry)
2018
We present the Performance Spectrum Miner, a ProM plugin, which implements a new technique for fine-grained performance analysis of processes. The technique uses the performance spectrum as a simple model, that maps all observed flows between two process steps together regarding their performance over time, and can be applied for event logs of any kinds of processes. The tool computes and visualizes performance spectra of processes, and provides rich functionality to explore various performance aspects. The demo is aimed to make process mining practitioners familiar with the technique and tool, and engage them into applying this tool for solving their daily process mining-related tasks.
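
A hedged sketch of the performance spectrum visualization in pm4py (the log and the list of consecutive activities are placeholders; available in recent versions of the simplified interface):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log

# visualize the performance spectrum over a hypothetical list of consecutive activities
pm4py.view_performance_spectrum(log, ["register request", "decide"], format="svg")
```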

LTL Checker


Process Mining and Verification of Properties: An Approach Based on Temporal Logic
Wil M. P. van der Aalst, et al.
CoopIS
2005
Information systems are facing conflicting requirements. On the one hand, systems need to be adaptive and self-managing to deal with rapidly changing circumstances. On the other hand, legislation such as the Sarbanes-Oxley Act, is putting increasing demands on monitoring activities and processes. As processes and systems become more flexible, both the need for, and the complexity of monitoring increases. Our earlier work on process mining has primarily focused on process discovery, i.e., automatically constructing models describing knowledge extracted from event logs. In this paper, we focus on a different problem complementing process discovery. Given an event log and some property, we want to verify whether the property holds. For this purpose we have developed a new language based on Linear Temporal Logic (LTL) and we combine this with a standard XML format to store event logs. Given an event log and an LTL property, our LTL Checker verifies whether the observed behavior matches the (un)expected/(un)desirable behavior.

Workflow Nets


Translating Workflow Nets to Process Trees: An Algorithmic Approach
Sebastiaan J. van Zelst, Sander J. J. Leemans
Algorithms
2020
Since their introduction, process trees have been frequently used as a process modeling formalism in many process mining algorithms. A process tree is a (mathematical) tree-based model of a process, in which internal vertices represent behavioral control-flow relations and leaves represent process activities. Translation of a process tree into a sound workflow net is trivial. However, the reverse is not the case. Simultaneously, an algorithm that translates a WF-net into a process tree is of great interest, e.g., the explicit knowledge of the control-flow hierarchy in a WF-net allows one to reason on its behavior more easily. Hence, in this paper, we present such an algorithm, i.e., it detects whether a WF-net corresponds to a process tree, and, if so, constructs it. We prove that, if the algorithm finds a process tree, the language of the process tree is equal to the language of the original WF-net. The experiments conducted show that the algorithm’s corresponding implementation has a quadratic time complexity in the size of the WF-net. Furthermore, the experiments show strong evidence of process tree rediscoverability.
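
A minimal, hedged sketch of the WF-net-to-process-tree translation in pm4py (placeholder log; the conversion raises an error when the net does not correspond to a process tree):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log
net, im, fm = pm4py.discover_petri_net_inductive(log)

# attempt to translate the (sound) workflow net back into a process tree
tree = pm4py.convert_to_process_tree(net, im, fm)
pm4py.view_process_tree(tree)
```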

Social Network Analysis


Discovering social networks from event logs
Van Der Aalst, Wil MP, Hajo A. Reijers, and Minseok Song
Computer Supported Cooperative Work
2005
Process mining techniques allow for the discovery of knowledge based on so-called “event logs”, i.e., a log recording the execution of activities in some business process. Many information systems provide such logs, e.g., most WFM, ERP, CRM, SCM, and B2B systems record transactions in a systematic way. Process mining techniques typically focus on performance and control-flow issues. However, event logs typically also log the performer, e.g., the person initiating or completing some activity. This paper focuses on mining social networks using this information. For example, it is possible to build a social network based on the hand-over of work from one performer to the next. By combining concepts from workflow management and social network analysis, it is possible to discover and analyze social networks. This paper defines metrics, presents a tool, and applies these to a real event log within the setting of a large Dutch organization.
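
A hedged sketch of the handover-of-work metric from this line of work, as exposed in pm4py's simplified interface (placeholder log; metrics such as working-together and subcontracting are exposed through similar functions):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log

# handover-of-work social network between resources
hw_network = pm4py.discover_handover_of_work_network(log)
pm4py.view_sna(hw_network)
```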

Roles Discovery


Business models enhancement through discovery of roles.
Burattin, Andrea, Alessandro Sperduti, and Marco Veluscek
CIDM
2013
The term process mining refers to a family of techniques which does not involve only the control flow discovery. This work proposes an approach to enhance business process models with information on roles. Specifically, the identification of those roles is based on the detection of handover of roles. In this paper we first discuss the general problem of grouping activities in roles, and then we propose metrics for the identification of handover. An approach to automatically enumerate all the significant partitionings and measures for the evaluation are proposed too. The entire contribution has been implemented in the ProM toolkit, and experiments on several logs are reported.
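
A minimal, hedged sketch of role discovery in pm4py (placeholder log; each discovered role groups activities with the resources that execute them):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")  # placeholder log

# group activities and originators into roles
roles = pm4py.discover_organizational_roles(log)
print(roles)
```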

Resource Profiles


Mining resource profiles from event logs
Anastasiia Pika, Michael Leyer et al.
ACM Trans. Manag. Inf. Syst.
2017
In most business processes, several activities need to be executed by human resources and cannot be fully automated. To evaluate resource performance and identify best practices as well as opportunities for improvement, managers need objective information about resource behaviors. Companies often use information systems to support their processes, and these systems record information about process execution in event logs. We present a framework for analyzing and evaluating resource behavior through mining such event logs. The framework provides (1) a method for extracting descriptive information about resource skills, utilization, preferences, productivity, and collaboration patterns; (2) a method for analyzing relationships between different resource behaviors and outcomes; and (3) a method for evaluating the overall resource productivity, tracking its changes over time, and comparing it to the productivity of other resources. To demonstrate the applicability of our framework, we apply it to analyze employee behavior in an Australian company and evaluate its usefulness by a survey among industry managers.

Organizational Mining


OrgMining 2.0: a Novel Framework for Organizational Model Mining from Event Logs
Jing Yang, Chun Ouyang, et al.
arXiv abs/2011.12445
2020
Providing appropriate structures around human resources can streamline operations and thus facilitate the competitiveness of an organization. To achieve this goal, modern organizations need to acquire an accurate and timely understanding of human resource grouping while faced with an ever-changing environment. The use of process mining offers a promising way to help address the need through utilizing event log data stored in information systems. By extracting knowledge about the actual behavior of resources participating in business processes from event logs, organizational models can be constructed, which facilitate the analysis of the de facto grouping of human resources relevant to process execution. Nevertheless, open research gaps remain to be addressed when applying the state-of-the-art process mining to analyze resource grouping. For one, the discovery of organizational models has only limited connections with the context of process execution. For another, a rigorous solution that evaluates organizational models against event log data is yet to be proposed. In this paper, we aim to tackle these research challenges by developing a novel framework built upon a richer definition of organizational models coupling resource grouping with process execution knowledge. By introducing notions of conformance checking for organizational models, the framework allows effective evaluation of organizational models, and therefore provides a foundation for analyzing and improving resource grouping based on event logs. We demonstrate the feasibility of this framework by proposing an approach underpinned by the framework for organizational model discovery, and also conduct experiments on real-life event logs to discover and evaluate organizational models.

Differential Privacy


PRIPEL: privacy-preserving event log publishing including contextual information
Fahrenkrog-Petersen, Stephan A., Han van der Aa, and Matthias Weidlich
International Conference on Business Process Management
2020
Event logs capture the execution of business processes in terms of executed activities and their execution context. Since logs contain potentially sensitive information about the individuals involved in the process, they should be pre-processed before being published to preserve the individuals’ privacy. However, existing techniques for such pre-processing are limited to a process’ control-flow and neglect contextual information, such as attribute values and durations. This thus precludes any form of process analysis that involves contextual factors. To bridge this gap, we introduce PRIPEL, a framework for privacy-aware event log publishing. Compared to existing work, PRIPEL takes a fundamentally different angle and ensures privacy on the level of individual cases instead of the complete log. This way, contextual information as well as the long tail process behaviour are preserved, which enables the application of a rich set of process analysis techniques. We demonstrate the feasibility of our framework in a case study with a real-world event log.

Batch Detection


Batch processing: definition and event log identification
Martin, Niels, et al.
SIMPDA
2015
A resource typically executes a particular activity on a series of cases. When a resource performs an activity on several cases simultaneously, (quasi-) sequentially or concurrently, this is referred to as batch processing. Given its influence on process performance, batch processing needs to be taken into account when modeling business processes for performance evaluation purposes. This paper suggests event logs as an information source to gain insight in batching behavior. It marks a first step towards a more thorough support for the retrieval of batch processing knowledge from an event log by (i) identifying different types of batch processing and (ii) briefly outlining a method to generate event log insights.

Temporal Feature Extraction


Supporting automatic system dynamics model generation for simulation in the context of process mining
Pourbafrani, Mahsa, Sebastiaan J. van Zelst, and Wil MP van der Aalst
BIS
2020
Using process mining, actionable insights can be extracted from the event data stored in information systems. The analysis of event data may reveal many performance and compliance problems, and generate ideas for performance improvements. This is valuable; however, process mining techniques tend to be backward-looking and provide little support for forward-looking approaches since potential process interventions are not assessed. System dynamics complements process mining since it aims to capture the relationships between different factors at a higher abstraction level, and uses simulation to predict the effects of process improvement actions. In this paper, we propose a new approach to support the design of system dynamics models using event data. We extract a variety of performance parameters from the current state of the process using historical execution data and provide an interactive platform for modeling the performance metrics as system dynamics models. The generated models are able to answer “what-if” questions. Our experiments, using event logs including different relationships between parameters, show that our approach is able to generate valid models and uncover the underlying relations.