pm4py package

Submodules

pm4py.analysis module

This file is part of PM4Py (More Info: https://pm4py.fit.fraunhofer.de).

PM4Py is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

PM4Py is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with PM4Py. If not, see <https://www.gnu.org/licenses/>.

pm4py.analysis.check_is_workflow_net(net: pm4py.objects.petri_net.obj.PetriNet) bool[source]

Checks if the input Petri net satisfies the WF-net conditions:

  1. unique source place

  2. unique sink place

  3. every node is on a path from the source to the sink

Parameters

net – PetriNet

Returns

True iff the input net is a WF-net.

Return type

bool
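The three WF-net conditions can be illustrated without pm4py on a plain directed graph. The sketch below is not pm4py's implementation: the function name `is_workflow_net` and the representation of the net (sets of place and transition names plus a set of arcs) are assumptions made for illustration only.

```python
from collections import defaultdict


def _reachable(start, successors):
    """Return every node reachable from `start` by following `successors`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_workflow_net(places, transitions, arcs):
    """Check the three WF-net conditions on a net given as plain sets.

    `arcs` is a set of (source, target) pairs between places and transitions.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for src, tgt in arcs:
        succ[src].add(tgt)
        pred[tgt].add(src)

    sources = [p for p in places if not pred[p]]  # places without incoming arcs
    sinks = [p for p in places if not succ[p]]    # places without outgoing arcs
    if len(sources) != 1 or len(sinks) != 1:
        return False

    nodes = set(places) | set(transitions)
    forward = _reachable(sources[0], succ)  # nodes reachable from the source
    backward = _reachable(sinks[0], pred)   # nodes from which the sink is reachable
    return nodes <= forward and nodes <= backward


# source -> t1 -> p1 -> t2 -> sink satisfies all three conditions.
ok = is_workflow_net(
    {"source", "p1", "sink"},
    {"t1", "t2"},
    {("source", "t1"), ("t1", "p1"), ("p1", "t2"), ("t2", "sink")},
)

# t3 consumes from p1 but produces nothing, so it is on no source-to-sink path.
broken = is_workflow_net(
    {"source", "p1", "sink"},
    {"t1", "t2", "t3"},
    {("source", "t1"), ("t1", "p1"), ("p1", "t2"), ("t2", "sink"), ("p1", "t3")},
)
```

Condition 3 is checked with two traversals: one forward from the source and one backward from the sink. A node lies on a source-to-sink path exactly when it appears in both results.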

pm4py.analysis.check_soundness(petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking) bool[source]

Check if a given Petri net is a sound WF-net. A Petri net is a WF-net iff:

  • it has a unique source place

  • it has a unique sink place

  • every element in the WF-net is on a path from the source to the sink place

A WF-net is sound iff:
  • it contains no live-locks

  • it contains no deadlocks

  • the final marking can always be reached

For a formal definition of sound WF-net, consider: http://www.padsweb.rwth-aachen.de/wvdaalst/publications/p628.pdf

Parameters
  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

Returns

Soundness

Return type

boolean

pm4py.analysis.construct_synchronous_product_net(trace: pm4py.objects.log.obj.Trace, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking) Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking][source]

Constructs the synchronous product net between a trace and a Petri net process model.

Parameters
  • trace – Trace of an event log

  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

Returns

  • sync_net – Synchronous product net

  • sync_im – Initial marking of the sync net

  • sync_fm – Final marking of the sync net

pm4py.analysis.insert_artificial_start_end(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Inserts the artificial start/end activities in an event log / Pandas dataframe

Parameters

log – Event log / Pandas dataframe

Returns

Event log / Pandas dataframe with artificial start / end activities

Return type

log
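As a rough illustration of the effect on a log, the sketch below treats each trace as a plain list of activity names. The helper name and the labels `"START"` and `"END"` are hypothetical; the labels pm4py actually inserts are not stated in this reference.

```python
def add_artificial_start_end(traces, start_label="START", end_label="END"):
    """Return new traces with an artificial first and last activity added.

    The input traces are left unmodified.
    """
    return [[start_label, *trace, end_label] for trace in traces]


log = [["a", "b"], ["a", "c", "b"]]
extended = add_artificial_start_end(log)
```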

pm4py.analysis.solve_extended_marking_equation(trace: pm4py.objects.log.obj.Trace, sync_net: pm4py.objects.petri_net.obj.PetriNet, sync_im: pm4py.objects.petri_net.obj.Marking, sync_fm: pm4py.objects.petri_net.obj.Marking, split_points: Optional[List[int]] = None) float[source]

Gets a heuristic value (an underestimate of the cost of an alignment) between a trace and a synchronous product net using the extended marking equation with the standard cost function (e.g. sync moves get cost equal to 0, invisible moves get cost equal to 1, other move on model / move on log get cost equal to 10000), with an optimal provisioning of the split points.

Parameters
  • trace – Trace

  • sync_net – Synchronous product net

  • sync_im – Initial marking (of the sync net)

  • sync_fm – Final marking (of the sync net)

  • split_points – If specified, the indexes of the events of the trace to be used as split points. If not specified, the split points are identified automatically

Returns

Heuristic value computed by solving the marking equation

Return type

h_value

pm4py.analysis.solve_marking_equation(petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking, cost_function: Optional[Dict[pm4py.objects.petri_net.obj.PetriNet.Transition, float]] = None) float[source]

Solves the marking equation of a Petri net. The marking equation is solved as an ILP problem. An optional transition-based cost function to minimize can be provided as well.

Parameters
  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

  • cost_function – optional cost function to use when solving the marking equation.

Returns

Heuristic value computed by solving the marking equation

Return type

h_value
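The marking equation asks for a non-negative integer firing-count vector x such that initial marking + C·x = final marking, where C is the incidence matrix of the net. The sketch below is an illustration only: it replaces the ILP solver with an exhaustive search over small firing counts, and the function name, the matrix encoding, and the `max_firings` bound are all assumptions.

```python
from itertools import product


def solve_marking_equation_bruteforce(incidence, m_init, m_final, costs, max_firings=3):
    """Find a minimum-cost firing-count vector x with m_init + C.x == m_final.

    `incidence[p][t]` is the net token change of place p when transition t fires.
    Exhaustive search over x in {0..max_firings}^T; only suitable for tiny nets.
    """
    n_places, n_trans = len(incidence), len(costs)
    best = None
    for x in product(range(max_firings + 1), repeat=n_trans):
        reached = [
            m_init[p] + sum(incidence[p][t] * x[t] for t in range(n_trans))
            for p in range(n_places)
        ]
        if reached == list(m_final):
            cost = sum(c * k for c, k in zip(costs, x))
            if best is None or cost < best[0]:
                best = (cost, x)
    return best  # None when no solution exists within the search bounds


# Places: p0 (source), p1, p2 (sink).
# Transitions: t0 moves a token p0 -> p1, t1 moves p1 -> p2, t2 moves p0 -> p2.
C = [
    [-1, 0, -1],  # p0
    [1, -1, 0],   # p1
    [0, 1, 1],    # p2
]
result = solve_marking_equation_bruteforce(
    C, [1, 0, 0], [0, 0, 1], costs=[1.0, 1.0, 5.0]
)
```

Here two firing vectors satisfy the equation: firing t0 and t1 once each (cost 2.0) or firing t2 once (cost 5.0). The search returns the cheaper one. A solution of the marking equation does not guarantee that a matching firing sequence exists, which is why the value is used as an underestimate.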

pm4py.conformance module

pm4py.conformance.check_is_fitting(*args, activity_key='concept:name')[source]

Checks if a trace object fits a process model

Parameters
  • trace – Trace object (trace / variant)

  • model – Model (process tree, Petri net, BPMN, …)

  • activity_key – Activity key (optional)

Returns

Boolean value (True if the trace fits; False if the trace does not)

Return type

is_fit

pm4py.conformance.conformance_alignments(log: pm4py.objects.log.obj.EventLog, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking) List[Dict[str, Any]][source]

Deprecated since version 2.2.2: This will be removed in 2.4.0. conformance_alignments is deprecated, use conformance_diagnostics_alignments

pm4py.conformance.conformance_diagnostics_alignments(log: pm4py.objects.log.obj.EventLog, *args, multi_processing: bool = False) List[Dict[str, Any]][source]

Applies the alignments algorithm between a log and a process model. The method returns the full alignment diagnostics.

Parameters
  • log – Event log

  • args – Specification of the process model

  • multi_processing – Boolean value that enables the multiprocessing (default: False)

Returns

A list of alignments for each trace of the log (in the same order as the traces in the event log)

Return type

aligned_traces

pm4py.conformance.conformance_diagnostics_footprints(*args) Union[List[Dict[str, Any]], Dict[str, Any]][source]

Provide conformance checking diagnostics using footprints

Parameters

args – Provided arguments:

  • The first argument is supposed to be an event log (or the footprints discovered from the event log)

  • The other arguments are supposed to be the process model (or the footprints discovered from the process model)

Returns

Footprints of the event log / process model

Return type

fps

pm4py.conformance.conformance_diagnostics_token_based_replay(log: pm4py.objects.log.obj.EventLog, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking) List[Dict[str, Any]][source]

Applies token-based replay for conformance checking analysis. The method returns the full token-based-replay diagnostics.

Parameters
  • log – Event log

  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

Returns

A list of replay results for each trace of the log (in the same order as the traces in the event log)

Return type

replay_results

pm4py.conformance.conformance_tbr(log: pm4py.objects.log.obj.EventLog, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking) List[Dict[str, Any]][source]

Deprecated since version 2.2.2: This will be removed in 2.4.0. conformance_tbr is deprecated, use conformance_diagnostics_token_based_replay

pm4py.conformance.evaluate_fitness_alignments(log: pm4py.objects.log.obj.EventLog, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking) Dict[str, float][source]

Deprecated since version 2.2.2: This will be removed in 2.4.0. evaluate_fitness_alignments is deprecated, use fitness_alignments

pm4py.conformance.evaluate_fitness_tbr(log: pm4py.objects.log.obj.EventLog, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking) Dict[str, float][source]

Deprecated since version 2.2.2: This will be removed in 2.4.0. evaluate_fitness_tbr is deprecated, use fitness_token_based_replay

pm4py.conformance.evaluate_precision_alignments(log: pm4py.objects.log.obj.EventLog, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking) float[source]

Deprecated since version 2.2.2: This will be removed in 2.4.0. evaluate_precision_alignments is deprecated, use precision_alignments

pm4py.conformance.evaluate_precision_tbr(log: pm4py.objects.log.obj.EventLog, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking) float[source]

Deprecated since version 2.2.2: This will be removed in 2.4.0. evaluate_precision_tbr is deprecated, use precision_token_based_replay

pm4py.conformance.fitness_alignments(log: pm4py.objects.log.obj.EventLog, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking, multi_processing: bool = False) Dict[str, float][source]

Calculates the fitness using alignments

Parameters
  • log – Event log

  • petri_net – Petri net object

  • initial_marking – Initial marking

  • final_marking – Final marking

  • multi_processing – Boolean value that enables the multiprocessing (default: False)

Returns

dictionary describing average fitness (key: average_trace_fitness) and the percentage of fitting traces (key: percentage_of_fitting_traces)

Return type

fitness_dictionary

pm4py.conformance.fitness_footprints(*args) Dict[str, float][source]

Calculates fitness using footprints

Parameters

args – Provided arguments:

  • The first argument is supposed to be an event log (or the footprints discovered from the event log)

  • The other arguments are supposed to be the process model (or the footprints discovered from the process model)

Returns

A dictionary containing two keys:

  • perc_fit_traces => percentage of fit traces (over the log)

  • log_fitness => the fitness value over the log

Return type

fitness_dict

pm4py.conformance.fitness_token_based_replay(log: pm4py.objects.log.obj.EventLog, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking) Dict[str, float][source]

Calculates the fitness using token-based replay. The fitness is calculated on a log-based level.

Parameters
  • log – Event log

  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

Returns

dictionary describing average fitness (key: average_trace_fitness) and the percentage of fitting traces (key: percentage_of_fitting_traces)

Return type

fitness_dictionary

pm4py.conformance.precision_alignments(log: pm4py.objects.log.obj.EventLog, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking, multi_processing: bool = False) float[source]

Calculates the precision of the model w.r.t. the event log using alignments

Parameters
  • log – Event log

  • petri_net – Petri net object

  • initial_marking – Initial marking

  • final_marking – Final marking

  • multi_processing – Boolean value that enables the multiprocessing (default: False)

Returns

float representing the precision value

Return type

precision

pm4py.conformance.precision_footprints(*args) float[source]

Calculates precision using footprints

Parameters

args – Provided arguments:

  • The first argument is supposed to be an event log (or the footprints discovered from the event log)

  • The other arguments are supposed to be the process model (or the footprints discovered from the process model)

Returns

The precision of the process model (as a number between 0 and 1)

Return type

precision

pm4py.conformance.precision_token_based_replay(log: pm4py.objects.log.obj.EventLog, petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking) float[source]

Calculates the precision using token-based replay

Parameters
  • log – Event log

  • petri_net – Petri net object

  • initial_marking – Initial marking

  • final_marking – Final marking

Returns

float representing the precision value

Return type

precision

pm4py.convert module

pm4py.convert.convert_to_bpmn(*args: Union[Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking], pm4py.objects.process_tree.obj.ProcessTree]) pm4py.objects.bpmn.obj.BPMN[source]

Converts an object to a BPMN diagram

Parameters

*args – Object (accepting Petri net, process tree)

Returns

BPMN diagram

Return type

bpmn_diagram

pm4py.convert.convert_to_dataframe(obj: Union[pm4py.objects.log.obj.EventStream, pm4py.objects.log.obj.EventLog]) pandas.core.frame.DataFrame[source]

Converts a log object to a dataframe

Parameters

obj – Log object

Returns

Dataframe

Return type

df

pm4py.convert.convert_to_event_log(obj: Union[pandas.core.frame.DataFrame, pm4py.objects.log.obj.EventStream]) pm4py.objects.log.obj.EventLog[source]

Converts a log object to an event log

Parameters

obj – Log object

Returns

Event log object

Return type

log

pm4py.convert.convert_to_event_stream(obj: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) pm4py.objects.log.obj.EventStream[source]

Converts a log object to an event stream

Parameters

obj – Log object

Returns

Event stream object

Return type

stream

pm4py.convert.convert_to_petri_net(*args: Union[pm4py.objects.bpmn.obj.BPMN, pm4py.objects.process_tree.obj.ProcessTree, pm4py.objects.heuristics_net.obj.HeuristicsNet, dict]) Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking][source]

Converts an object to an (accepting) Petri net

Parameters

*args – Object (process tree, BPMN)

Returns

  • net – Petri net

  • im – Initial marking

  • fm – Final marking

pm4py.convert.convert_to_process_tree(*args: Union[Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking], pm4py.objects.bpmn.obj.BPMN]) pm4py.objects.process_tree.obj.ProcessTree[source]

Converts an object to a process tree

Parameters

*args – Object (Petri net, BPMN)

Returns

Process tree (when the model is block-structured)

Return type

tree

pm4py.discovery module

pm4py.discovery.derive_minimum_self_distance(log: Union[pandas.core.frame.DataFrame, pm4py.objects.log.obj.EventLog, pm4py.objects.log.obj.EventStream]) Dict[str, int][source]

This algorithm computes the minimum self-distance for each activity observed in an event log. The self distance of a in <a> is infinity, of a in <a,a> is 0, in <a,b,a> is 1, etc. The activity key ‘concept:name’ is used.

Parameters

log – event log (either pandas.DataFrame, EventLog or EventStream)

Returns

dict mapping an activity to its self-distance, if it exists, otherwise it is not part of the dict.

Return type

Dict[str, int]
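The definition above (0 for <a,a>, 1 for <a,b,a>, no entry for an activity that never repeats) can be sketched on traces given as plain lists of activity names. This is an illustration of the definition, not pm4py's code; the function name is hypothetical.

```python
def minimum_self_distances(traces):
    """Map each activity to its minimum self-distance over all traces.

    The self-distance is the number of events strictly between two consecutive
    occurrences of the same activity. Activities that never repeat within a
    trace are omitted from the result.
    """
    result = {}
    for trace in traces:
        last_seen = {}  # activity -> index of its most recent occurrence
        for index, activity in enumerate(trace):
            if activity in last_seen:
                distance = index - last_seen[activity] - 1
                if activity not in result or distance < result[activity]:
                    result[activity] = distance
            last_seen[activity] = index
    return result


msd = minimum_self_distances(
    [["a"], ["a", "a"], ["b", "c", "b"], ["c", "d", "e", "c"]]
)
```

Only consecutive occurrences need to be compared, because any pair of non-consecutive occurrences is at least as far apart.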

pm4py.discovery.discover_bpmn_inductive(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], noise_threshold: float = 0.0) pm4py.objects.bpmn.obj.BPMN[source]

Discovers a BPMN using the Inductive Miner algorithm

Parameters
  • log – Event log

  • noise_threshold – Noise threshold (default: 0.0)

Returns

BPMN diagram

Return type

bpmn_diagram

pm4py.discovery.discover_dfg(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) Tuple[dict, dict, dict][source]

Discovers a DFG from a log

Parameters

log – Event log

Returns

  • dfg – DFG

  • start_activities – Start activities

  • end_activities – End activities
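To make the three returned dictionaries concrete, the sketch below counts directly-follows pairs, start activities, and end activities on traces given as plain lists. It only illustrates the shape of the result; the function name is hypothetical and it does not call pm4py.

```python
from collections import Counter


def directly_follows(traces):
    """Count directly-follows pairs, start activities and end activities."""
    dfg, starts, ends = Counter(), Counter(), Counter()
    for trace in traces:
        if not trace:
            continue  # an empty trace contributes nothing
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        for a, b in zip(trace, trace[1:]):  # consecutive event pairs
            dfg[(a, b)] += 1
    return dict(dfg), dict(starts), dict(ends)


dfg, start_activities, end_activities = directly_follows(
    [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
)
```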

pm4py.discovery.discover_directly_follows_graph(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) Tuple[dict, dict, dict][source]

pm4py.discovery.discover_eventually_follows_graph(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) Dict[Tuple[str, str], int][source]

Gets the eventually follows graph from a log object

Parameters

log – Log object

Returns

Dictionary of tuples of activities that eventually follows each other; along with the number of occurrences

Return type

eventually_follows_graph
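The eventually-follows relation pairs each activity with every activity that occurs later in the same trace. A minimal sketch on plain lists follows; the function name is hypothetical, and counting every pair of positions once is an assumption, since the exact counting convention used by pm4py is not stated here.

```python
from collections import Counter


def eventually_follows(traces):
    """Count pairs (a, b) where b occurs somewhere after a in the same trace."""
    efg = Counter()
    for trace in traces:
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:  # every later event in the trace
                efg[(a, b)] += 1
    return dict(efg)


efg = eventually_follows([["a", "b", "c"], ["a", "c"]])
```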

pm4py.discovery.discover_footprints(*args: Union[pm4py.objects.log.obj.EventLog, Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking], pm4py.objects.process_tree.obj.ProcessTree]) Union[List[Dict[str, Any]], Dict[str, Any]][source]

Discovers the footprints out of the provided event log / process model

Parameters

args – Event log / process model

pm4py.discovery.discover_heuristics_net(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], dependency_threshold: float = 0.5, and_threshold: float = 0.65, loop_two_threshold: float = 0.5) pm4py.objects.heuristics_net.obj.HeuristicsNet[source]

Discovers a heuristics net

Parameters
  • log – Event log

  • dependency_threshold – Dependency threshold (default: 0.5)

  • and_threshold – AND threshold (default: 0.65)

  • loop_two_threshold – Loop two threshold (default: 0.5)

Returns

Heuristics net

Return type

heu_net

pm4py.discovery.discover_oc_petri_net(ocel: pm4py.objects.ocel.obj.OCEL) Dict[str, Any][source]

Discovers an object-centric Petri net from the provided object-centric event log.

Reference paper: van der Aalst, Wil MP, and Alessandro Berti. “Discovering object-centric Petri nets.” Fundamenta informaticae 175.1-4 (2020): 1-40.

Parameters

ocel – Object-centric event log

Returns

Object-centric Petri net

Return type

ocpn

pm4py.discovery.discover_ocdfg(ocel: pm4py.objects.ocel.obj.OCEL, business_hours=False, worktiming=[7, 17], weekends=[6, 7]) Dict[str, Any][source]

Discovers an OC-DFG from an object-centric event log.

Reference paper: Berti, Alessandro, and Wil van der Aalst. “Extracting multiple viewpoint models from relational databases.” Data-Driven Process Discovery and Analysis. Springer, Cham, 2018. 24-51.

Parameters
  • ocel – Object-centric event log

  • business_hours – Boolean value that enables the usage of the business hours

  • worktiming – (if business hours are in use) work timing during the day (default: [7, 17])

  • weekends – (if business hours are in use) weekends (default: [6, 7])

Returns

Object-centric directly-follows graph

Return type

ocdfg

pm4py.discovery.discover_performance_dfg(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], business_hours: bool = False, worktiming: List[int] = [7, 17], weekends: List[int] = [6, 7], workcalendar=None) Tuple[dict, dict, dict][source]

Discovers a performance directly-follows graph from an event log

Parameters
  • log – Event log

  • business_hours – Enables/disables the computation based on the business hours (default: False)

  • worktiming – (If the business hours are enabled) The hour range in which the resources of the log are working (default: 7 to 17)

  • weekends – (If the business hours are enabled) The weekend days (default: Saturday (6), Sunday (7))

Returns

  • performance_dfg – Performance DFG

  • start_activities – Start activities

  • end_activities – End activities

pm4py.discovery.discover_petri_net_alpha(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking][source]

Discovers a Petri net using the Alpha Miner

Parameters

log – Event log

Returns

  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

pm4py.discovery.discover_petri_net_alpha_plus(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking][source]

Discovers a Petri net using the Alpha+ algorithm

Parameters

log – Event log

Returns

  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

pm4py.discovery.discover_petri_net_heuristics(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], dependency_threshold: float = 0.5, and_threshold: float = 0.65, loop_two_threshold: float = 0.5) Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking][source]

Discovers a Petri net using the Heuristics Miner

Parameters
  • log – Event log

  • dependency_threshold – Dependency threshold (default: 0.5)

  • and_threshold – AND threshold (default: 0.65)

  • loop_two_threshold – Loop two threshold (default: 0.5)

Returns

  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

pm4py.discovery.discover_petri_net_inductive(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], noise_threshold: float = 0.0) Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking][source]

Discovers a Petri net using the IMDFc algorithm

Parameters
  • log – Event log

  • noise_threshold – Noise threshold (default: 0.0)

Returns

  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

pm4py.discovery.discover_process_tree_inductive(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], noise_threshold: float = 0.0) pm4py.objects.process_tree.obj.ProcessTree[source]

Discovers a process tree using the IM algorithm

Parameters
  • log – Event log

  • noise_threshold – Noise threshold (default: 0.0)

Returns

Process tree object

Return type

process_tree

pm4py.discovery.discover_tree_inductive(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], noise_threshold: float = 0.0) pm4py.objects.process_tree.obj.ProcessTree[source]

Deprecated since version 2.2.2: This will be removed in 2.4.0. discover_tree_inductive is deprecated, use discover_process_tree_inductive

pm4py.filtering module

pm4py.filtering.filter_activities_rework(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], activity: str, min_occurrences: int = 2) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filters the event log, keeping the cases where the specified activity occurs at least min_occurrences times.

Parameters
  • log – Event log / Pandas dataframe

  • activity – Activity

  • min_occurrences – Minimum desired number of occurrences

Returns

Log with cases having at least min_occurrences occurrences of the given activity

Return type

filtered_log
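The selection rule can be sketched on traces given as plain lists of activity names. The function name is hypothetical and the code only mirrors the description above.

```python
def keep_rework_cases(traces, activity, min_occurrences=2):
    """Keep the traces in which `activity` occurs at least `min_occurrences` times."""
    return [t for t in traces if t.count(activity) >= min_occurrences]


reworked = keep_rework_cases(
    [["a", "b", "a"], ["a", "b"], ["b", "b", "b"]], "a"
)
```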

pm4py.filtering.filter_attribute_values(log, attribute_key, values, level='case', retain=True)[source]

Deprecated since version 2.1.4: This will be removed in 2.4.0. Filtering method will be removed due to fuzzy naming. Use: filter_event_attribute_values

pm4py.filtering.filter_between(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], act1: str, act2: str) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Finds all the sub-cases leading from an event with activity “act1” to an event with activity “act2” in the log, and returns a log containing only them.

Example:

Log:

  • A B C D E F

  • A B E F C

  • A B F C B C B E F C

act1 = B, act2 = C

Returned sub-cases:

  • B C (from the first case)

  • B E F C (from the second case)

  • B F C (from the third case)

  • B C (from the third case)

  • B E F C (from the third case)

Parameters
  • log – Event log / Pandas dataframe

  • act1 – Source activity

  • act2 – Target activity

Returns

Log containing all the subcases

Return type

filtered_log
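The example above can be reproduced with a short scan over each trace. This is a sketch with a hypothetical function name, not pm4py's implementation; in particular, restarting the sub-case when act1 occurs again before act2 is an assumption that the example does not exercise.

```python
def sub_cases_between(traces, act1, act2):
    """Extract every sub-case that starts at `act1` and ends at the next `act2`."""
    result = []
    for trace in traces:
        current = None  # sub-case being collected, or None when outside one
        for activity in trace:
            if activity == act1:
                current = [activity]  # (re)start a sub-case at act1
            elif current is not None:
                current.append(activity)
                if activity == act2:
                    result.append(current)  # sub-case complete
                    current = None
    return result


log = [
    "A B C D E F".split(),
    "A B E F C".split(),
    "A B F C B C B E F C".split(),
]
subs = sub_cases_between(log, "B", "C")
```

A sub-case that starts at act1 but never reaches act2 is discarded.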

pm4py.filtering.filter_case_performance(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], min_performance: float, max_performance: float) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filters the event log, keeping the cases having a duration (the timestamp of the last event minus the timestamp of the first event) included between min_performance and max_performance

Parameters
  • log – Event log / Pandas dataframe

  • min_performance – Minimum allowed case duration

  • max_performance – Maximum allowed case duration

Returns

Log with cases having a duration in the specified range

Return type

filtered_log
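The duration test can be sketched with the standard library, using a dict that maps a case identifier to its chronologically sorted event timestamps. The function name, the data layout, the use of seconds as the unit, and the inclusive bounds are all assumptions made for this illustration.

```python
from datetime import datetime


def keep_cases_by_duration(cases, min_seconds, max_seconds):
    """Keep the cases whose duration lies in [min_seconds, max_seconds].

    `cases` maps a case id to a chronologically sorted list of timestamps;
    the duration is the last timestamp minus the first one.
    """
    kept = {}
    for case_id, timestamps in cases.items():
        duration = (timestamps[-1] - timestamps[0]).total_seconds()
        if min_seconds <= duration <= max_seconds:
            kept[case_id] = timestamps
    return kept


cases = {
    "c1": [datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 9, 0)],  # 1 hour
    "c2": [datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 3, 8, 0)],  # 2 days
}
within_a_day = keep_cases_by_duration(cases, 0, 86400)
```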

pm4py.filtering.filter_case_size(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], min_size: int, max_size: int) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filters the event log, keeping the cases having a length (number of events) included between min_size and max_size

Parameters
  • log – Event log / Pandas dataframe

  • min_size – Minimum allowed number of events

  • max_size – Maximum allowed number of events

Returns

Log with cases having the desired number of events.

Return type

filtered_log

pm4py.filtering.filter_directly_follows_relation(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], relations: List[str], retain: bool = True) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Retains traces that contain any of the specified 'directly follows' relations. For example, if relations == [('a','b'),('a','c')] and the log is [<a,b,c>,<a,c,b>,<a,d,b>], the resulting log will contain traces describing [<a,b,c>,<a,c,b>].

Parameters
  • log – Log object

  • relations – List of activity name pairs, which are allowed/forbidden paths

  • retain – Parameter that says whether the paths should be kept/removed

Returns

Filtered log object

Return type

filtered_log
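The example in the description can be checked with a small sketch on plain lists. The function name is hypothetical, and treating `retain=False` as "drop every trace that contains one of the relations" is an assumption based on the parameter description.

```python
def filter_directly_follows(traces, relations, retain=True):
    """Keep (or drop) the traces containing any of the given directly-follows pairs."""
    relations = set(relations)

    def matches(trace):
        # zip(trace, trace[1:]) yields each pair of consecutive events
        return any(pair in relations for pair in zip(trace, trace[1:]))

    return [t for t in traces if matches(t) == retain]


log = [list("abc"), list("acb"), list("adb")]
kept = filter_directly_follows(log, [("a", "b"), ("a", "c")])
dropped = filter_directly_follows(log, [("a", "b"), ("a", "c")], retain=False)
```

The trace <a,d,b> is excluded from `kept` because neither b nor c directly follows a in it.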

pm4py.filtering.filter_end_activities(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], activities: Union[Set[str], List[str]], retain: bool = True) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filter cases having an end activity in the provided list

Parameters
  • log – Log object

  • activities – List of admitted end activities

  • retain – If True, retains the traces ending with one of the given activities; if False, drops them

Returns

Filtered log object

Return type

filtered_log

pm4py.filtering.filter_event_attribute_values(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], attribute_key: str, values: Union[Set[str], List[str]], level: str = 'case', retain: bool = True) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filter a log object on the values of some event attribute

Parameters
  • log – Log object

  • attribute_key – Attribute to filter

  • values – Admitted (or forbidden) values

  • level – Specifies how the filter should be applied (‘case’ filters the cases in which at least one matching event occurs, ‘event’ filters the individual events, possibly trimming the cases)

  • retain – Specifies whether the values should be kept or removed

Returns

Filtered log object

Return type

filtered_log

pm4py.filtering.filter_eventually_follows_relation(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], relations: List[str], retain: bool = True) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Retains traces that contain any of the specified 'eventually follows' relations. For example, if relations == [('a','b'),('a','c')] and the log is [<a,b,c>,<a,c,b>,<a,d,b>], the resulting log will contain traces describing [<a,b,c>,<a,c,b>,<a,d,b>].

Parameters
  • log – Log object

  • relations – List of activity name pairs, which are allowed/forbidden paths

  • retain – Parameter that says whether the paths should be kept/removed

Returns

Filtered log object

Return type

filtered_log
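In contrast to the directly-follows variant, a pair (a, b) matches here whenever b occurs anywhere after a in the trace. A minimal sketch on plain lists, with a hypothetical function name and the same assumption about `retain=False` dropping the matching traces:

```python
def filter_eventually_follows(traces, relations, retain=True):
    """Keep (or drop) the traces containing any of the given eventually-follows pairs."""
    relations = set(relations)

    def matches(trace):
        return any(
            (a, b) in relations
            for i, a in enumerate(trace)
            for b in trace[i + 1:]  # every event after position i
        )

    return [t for t in traces if matches(t) == retain]


log = [list("abc"), list("acb"), list("adb")]
kept = filter_eventually_follows(log, [("a", "b"), ("a", "c")])
```

The trace <a,d,b> is kept because b eventually follows a, even though d sits between them.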

pm4py.filtering.filter_log_relative_occurrence_event_attribute(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], min_relative_stake: float, attribute_key: str = 'concept:name', level='cases') Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filters the event log keeping only the events having an attribute value which occurs:

  • in at least the specified (min_relative_stake) percentage of events, when level="events"

  • in at least the specified (min_relative_stake) percentage of cases, when level="cases"

Parameters
  • log – Event log / Pandas dataframe

  • min_relative_stake – Minimum percentage of cases (expressed as a number between 0 and 1) in which the attribute should occur.

  • attribute_key – The attribute to filter

  • level – The level of the filter (if level="events", then events / if level="cases", then cases)

Returns

Filtered event log

Return type

filtered_log
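The two levels differ only in what is counted: cases containing a value, or events carrying it. The sketch below works on traces given as plain lists of attribute values; the function name is hypothetical, and keeping traces that become empty after filtering is an assumption.

```python
from collections import Counter


def filter_by_relative_occurrence(traces, min_relative_stake, level="cases"):
    """Keep only the events whose value is frequent enough at the given level."""
    if level == "cases":
        # count each value at most once per case
        counts = Counter(value for t in traces for value in set(t))
        total = len(traces)
    elif level == "events":
        counts = Counter(value for t in traces for value in t)
        total = sum(len(t) for t in traces)
    else:
        raise ValueError("level must be 'cases' or 'events'")

    frequent = {v for v, n in counts.items() if n >= min_relative_stake * total}
    return [[v for v in t if v in frequent] for t in traces]


log = [["a", "b"], ["a", "c"], ["a", "b"], ["a", "d"]]
by_cases = filter_by_relative_occurrence(log, 0.5, level="cases")
by_events = filter_by_relative_occurrence(log, 0.5, level="events")
```

With a threshold of 0.5, b survives at the case level (2 of 4 cases) but not at the event level (2 of 8 events).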

pm4py.filtering.filter_ocel_cc_object(ocel: pm4py.objects.ocel.obj.OCEL, object_id: str) pm4py.objects.ocel.obj.OCEL[source]

Returns the connected component of the object-centric event log to which the object with the provided identifier belongs.

Parameters
  • ocel – object-centric event log

  • object_id – object identifier

Return type

OCEL

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_cc_object(ocel, 'order1')

pm4py.filtering.filter_ocel_end_events_per_object_type(ocel: pm4py.objects.ocel.obj.OCEL, object_type: str) pm4py.objects.ocel.obj.OCEL[source]

Filters the events in which an object for the given object type terminates its lifecycle. (E.g. an event with activity “Pay Order” might terminate an order).

Parameters
  • ocel – Object-centric event log

  • object_type – Object type to consider

Returns

Filtered object-centric event log

Return type

filtered_ocel

pm4py.filtering.filter_ocel_event_attribute(ocel: pm4py.objects.ocel.obj.OCEL, attribute_key: str, attribute_values: Collection[Any], positive: bool = True) pm4py.objects.ocel.obj.OCEL[source]

Filters the object-centric event log on the provided event attributes values

Parameters
  • ocel – Object-centric event log

  • attribute_key – Attribute at the event level

  • attribute_values – Attribute values

  • positive – Decides if the values should be kept (positive=True) or removed (positive=False)

Returns

Filtered object-centric event log

Return type

filtered_ocel

pm4py.filtering.filter_ocel_events(ocel: pm4py.objects.ocel.obj.OCEL, event_identifiers: Collection[str], positive: bool = True) pm4py.objects.ocel.obj.OCEL[source]

Filters the event identifiers of an object-centric event log.

Parameters
  • ocel – object-centric event log

  • event_identifiers – event identifiers to keep/remove

  • positive – boolean value (True=keep, False=remove)

Return type

OCEL

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_events(ocel, ['e1'])

pm4py.filtering.filter_ocel_events_timestamp(ocel: pm4py.objects.ocel.obj.OCEL, min_timest: Union[datetime.datetime, str], max_timest: Union[datetime.datetime, str], timestamp_key: str = 'ocel:timestamp') pm4py.objects.ocel.obj.OCEL[source]

Filters the object-centric event log keeping events in the provided timestamp range

Parameters
  • ocel – Object-centric event log

  • min_timest – Left extreme of the allowed timestamp interval (provided in the format: YYYY-mm-dd HH:MM:SS)

  • max_timest – Right extreme of the allowed timestamp interval (provided in the format: YYYY-mm-dd HH:MM:SS)

  • timestamp_key – The attribute to use as timestamp (default: ocel:timestamp)

Returns

Filtered object-centric event log

Return type

filtered_ocel

pm4py.filtering.filter_ocel_object_attribute(ocel: pm4py.objects.ocel.obj.OCEL, attribute_key: str, attribute_values: Collection[Any], positive: bool = True) pm4py.objects.ocel.obj.OCEL[source]

Filters the object-centric event log on the provided object attributes values

Parameters
  • ocel – Object-centric event log

  • attribute_key – Attribute at the object level

  • attribute_values – Attribute values

  • positive – Decides if the values should be kept (positive=True) or removed (positive=False)

Returns

Filtered object-centric event log

Return type

filtered_ocel

pm4py.filtering.filter_ocel_object_per_type_count(ocel: pm4py.objects.ocel.obj.OCEL, min_num_obj_type: Dict[str, int]) pm4py.objects.ocel.obj.OCEL[source]

Filters the events of the object-centric logs which are related to at least the specified amount of objects per type.

E.g. pm4py.filter_ocel_object_per_type_count(ocel, {“order”: 1, “element”: 2})

Would keep the following events:

   ocel:eid  ocel:timestamp  ocel:activity  ocel:type:element  ocel:type:order
0  e1        1980-01-01      Create Order   [i4, i1, i3, i2]   [o1]
1  e11       1981-01-01      Create Order   [i6, i5]           [o2]
2  e14       1981-01-04      Create Order   [i8, i7]           [o3]

Parameters
  • ocel – Object-centric event log

  • min_num_obj_type – Minimum number of objects per type

Returns

Filtered object-centric event log

Return type

filtered_event_log
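
The per-type threshold can be sketched on plain dictionaries. This is an illustration only and does not use pm4py: the event layout (an "objects" mapping from object type to identifiers), the function name `filter_events_by_object_count` and the event "e2" are hypothetical.

```python
from typing import Dict, List


def filter_events_by_object_count(events: List[dict],
                                  min_num_obj_type: Dict[str, int]) -> List[dict]:
    """Keep events related to at least the required number of objects per type."""
    kept = []
    for event in events:
        related = event["objects"]
        if all(len(related.get(obj_type, [])) >= minimum
               for obj_type, minimum in min_num_obj_type.items()):
            kept.append(event)
    return kept


events = [
    {"id": "e1", "activity": "Create Order",
     "objects": {"element": ["i4", "i1", "i3", "i2"], "order": ["o1"]}},
    # Hypothetical event with no related "element" objects.
    {"id": "e2", "activity": "Pay Order", "objects": {"order": ["o1"]}},
    {"id": "e11", "activity": "Create Order",
     "objects": {"element": ["i6", "i5"], "order": ["o2"]}},
]
kept_ids = [e["id"] for e in
            filter_events_by_object_count(events, {"order": 1, "element": 2})]
```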

pm4py.filtering.filter_ocel_object_types(ocel: pm4py.objects.ocel.obj.OCEL, obj_types: Collection[str], positive: bool = True) pm4py.objects.ocel.obj.OCEL[source]

Filters the object types of an object-centric event log.

Parameters
  • ocel – object-centric event log

  • obj_types – object types to keep/remove

  • positive – boolean value (True=keep, False=remove)

Return type

OCEL

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_object_types(ocel, ['order'])

pm4py.filtering.filter_ocel_object_types_allowed_activities(ocel: pm4py.objects.ocel.obj.OCEL, correspondence_dict: Dict[str, Collection[str]]) pm4py.objects.ocel.obj.OCEL[source]

Filters an object-centric event log keeping only the specified object types with the specified activity set (filters out the rest).

Parameters
  • ocel – Object-centric event log

  • correspondence_dict – Dictionary containing, for every object type of interest, a collection of allowed activities. Example:

    {“order”: [“Create Order”], “element”: [“Create Order”, “Create Delivery”]}

    Keeps only the object types “order” and “element”. For the “order” object type, only the activity “Create Order” is kept. For the “element” object type, only the activities “Create Order” and “Create Delivery” are kept.

Returns

Filtered object-centric event log

Return type

filtered_ocel
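
The effect of correspondence_dict can be sketched on a flat list of event-to-object relations. This is a plain-Python illustration, not the pm4py implementation; the tuple layout (event id, activity, object id, object type) and the name `filter_relations` are assumptions.

```python
from typing import Collection, Dict, List, Tuple

Relation = Tuple[str, str, str, str]  # (event id, activity, object id, object type)


def filter_relations(relations: List[Relation],
                     correspondence_dict: Dict[str, Collection[str]]) -> List[Relation]:
    """Keep relations whose object type is listed and whose activity is allowed for it."""
    return [
        (event_id, activity, object_id, object_type)
        for event_id, activity, object_id, object_type in relations
        if activity in correspondence_dict.get(object_type, ())
    ]


relations = [
    ("e1", "Create Order", "o1", "order"),
    ("e1", "Create Order", "i1", "element"),
    ("e2", "Create Delivery", "i1", "element"),
    ("e2", "Create Delivery", "d1", "delivery"),
    ("e3", "Pay Order", "o1", "order"),
]
correspondence = {"order": ["Create Order"],
                  "element": ["Create Order", "Create Delivery"]}
kept = filter_relations(relations, correspondence)
```

The "delivery" relation is dropped because its object type is not a key of the dictionary, and the "Pay Order" relation is dropped because that activity is not allowed for "order".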

pm4py.filtering.filter_ocel_objects(ocel: pm4py.objects.ocel.obj.OCEL, object_identifiers: Collection[str], positive: bool = True, level: int = 1) pm4py.objects.ocel.obj.OCEL[source]

Filters the object identifiers of an object-centric event log.

Parameters
  • ocel – object-centric event log

  • object_identifiers – object identifiers to keep/remove

  • positive – boolean value (True=keep, False=remove)

  • level – recursively expand the set of object identifiers until the specified level

Return type

OCEL

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_objects(ocel, ['o1'], level=1)

pm4py.filtering.filter_ocel_start_events_per_object_type(ocel: pm4py.objects.ocel.obj.OCEL, object_type: str) pm4py.objects.ocel.obj.OCEL[source]

Filters the events in which a new object for the given object type is spawned. (E.g. an event with activity “Create Order” might spawn new orders).

Parameters
  • ocel – Object-centric event log

  • object_type – Object type to consider

Returns

Filtered object-centric event log

Return type

filtered_ocel

pm4py.filtering.filter_paths(log, allowed_paths, retain=True)[source]

Deprecated since version 2.1.3.1: This will be removed in 2.4.0. Use filter_directly_follows_relation

pm4py.filtering.filter_paths_performance(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], path: Tuple[str, str], min_performance: float, max_performance: float, keep=True) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filters the event log, either:
  • (keep=True) keeping the cases having the specified path (tuple of 2 activities) with a duration included between min_performance and max_performance

  • (keep=False) discarding the cases having the specified path with a duration included between min_performance and max_performance

Parameters
  • log – Event log

  • path – Tuple of two activities (source_activity, target_activity)

  • min_performance – Minimum allowed performance (of the path)

  • max_performance – Maximum allowed performance (of the path)

  • keep – Keep/discard the cases having the specified path with a duration included between min_performance and max_performance

Returns

Filtered log with the desired behavior

Return type

filtered_log

pm4py.filtering.filter_prefixes(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], activity: str, strict=True, first_or_last='first')[source]

Filters the log, keeping the prefixes to a given activity. E.g., for a log with traces:

A,B,C,D
A,B,Z,A,B,C,D
A,B,C,D,C,E,C,F

The prefixes to “C” are respectively:

A,B
A,B,Z,A,B
A,B

Parameters
  • log – Event log / Pandas dataframe

  • activity – Target activity of the filter

  • strict – Applies the filter strictly (cuts the occurrences of the selected activity).

  • first_or_last – Decides if the first or last occurrence of an activity should be selected as baseline for the filter.

Returns

Filtered log / dataframe

Return type

filtered_log
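
The prefix example above can be reproduced with list slicing. This is a sketch on plain lists that does not call pm4py; the helper `prefix_to` is hypothetical, it only covers the strict behaviour (the selected activity itself is cut), and returning None for traces that do not contain the activity is an assumption.

```python
from typing import List, Optional


def prefix_to(trace: List[str], activity: str,
              first_or_last: str = "first") -> Optional[List[str]]:
    """Return the events before the first (or last) occurrence of `activity`."""
    positions = [i for i, name in enumerate(trace) if name == activity]
    if not positions:
        return None  # assumption: the trace has no prefix to the activity
    cut = positions[0] if first_or_last == "first" else positions[-1]
    return trace[:cut]


traces = [
    "A,B,C,D".split(","),
    "A,B,Z,A,B,C,D".split(","),
    "A,B,C,D,C,E,C,F".split(","),
]
prefixes_first = [",".join(prefix_to(t, "C")) for t in traces]
prefixes_last = [",".join(prefix_to(t, "C", first_or_last="last")) for t in traces]
```

With first_or_last='last', only the third trace changes: its prefix becomes A,B,C,D,C,E, since the cut moves to the last occurrence of “C”.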

pm4py.filtering.filter_start_activities(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], activities: Union[Set[str], List[str]], retain: bool = True) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filter cases having a start activity in the provided list

Parameters
  • log – Log object

  • activities – List of start activities

  • retain – if True, we retain the traces starting with one of the given activities; if False, we drop those traces

Returns

Filtered log object

Return type

filtered_log

pm4py.filtering.filter_suffixes(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], activity: str, strict=True, first_or_last='first')[source]

Filters the log, keeping the suffixes from a given activity. E.g., for a log with traces:

A,B,C,D
A,B,Z,A,B,C,D
A,B,C,D,C,E,C,F

The suffixes from “C” are respectively:

D
D
D,C,E,C,F

Parameters
  • log – Event log / Pandas dataframe

  • activity – Target activity of the filter

  • strict – Applies the filter strictly (cuts the occurrences of the selected activity).

  • first_or_last – Decides if the first or last occurrence of an activity should be selected as baseline for the filter.

Returns

Filtered log / dataframe

Return type

filtered_log
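
The suffix example above can be reproduced in the same way with list slicing. Again this is only a sketch on plain lists, not the pm4py implementation: `suffix_from` is a hypothetical helper covering the strict behaviour, and returning None when the activity is absent is an assumption.

```python
from typing import List, Optional


def suffix_from(trace: List[str], activity: str,
                first_or_last: str = "first") -> Optional[List[str]]:
    """Return the events after the first (or last) occurrence of `activity`."""
    positions = [i for i, name in enumerate(trace) if name == activity]
    if not positions:
        return None  # assumption: the trace has no suffix from the activity
    cut = positions[0] if first_or_last == "first" else positions[-1]
    return trace[cut + 1:]


traces = [
    "A,B,C,D".split(","),
    "A,B,Z,A,B,C,D".split(","),
    "A,B,C,D,C,E,C,F".split(","),
]
suffixes_first = [",".join(suffix_from(t, "C")) for t in traces]
suffixes_last = [",".join(suffix_from(t, "C", first_or_last="last")) for t in traces]
```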

pm4py.filtering.filter_time_range(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], dt1: str, dt2: str, mode='events') Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filter a log on a time interval

Parameters
  • log – Log object

  • dt1 – Left extreme of the interval

  • dt2 – Right extreme of the interval

  • mode – Modality of filtering (events, traces_contained, traces_intersecting):
    • events: any event that fits the time frame is retained

    • traces_contained: any trace completely contained in the time frame is retained

    • traces_intersecting: any trace intersecting with the time frame is retained

Returns

Filtered log

Return type

filtered_log
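
The three modes can be compared with a small sketch in which every trace is a sorted list of event timestamps. It does not use pm4py; the function name `filter_time_range_sketch`, the inclusive interval bounds and the removal of traces left empty in ‘events’ mode are assumptions.

```python
from datetime import datetime
from typing import List


def filter_time_range_sketch(traces: List[List[datetime]], dt1: datetime,
                             dt2: datetime, mode: str = "events") -> List[List[datetime]]:
    """Illustrate the 'events', 'traces_contained' and 'traces_intersecting' modes."""
    if mode == "events":
        trimmed = [[ts for ts in trace if dt1 <= ts <= dt2] for trace in traces]
        return [trace for trace in trimmed if trace]  # assumption: drop empty traces
    if mode == "traces_contained":
        return [t for t in traces if t and dt1 <= t[0] and t[-1] <= dt2]
    if mode == "traces_intersecting":
        return [t for t in traces if t and t[0] <= dt2 and t[-1] >= dt1]
    raise ValueError(f"unknown mode: {mode}")


def day(n: int) -> datetime:
    """Shorthand for a timestamp in January 2021 (illustrative data only)."""
    return datetime(2021, 1, n)


traces = [[day(1), day(3)], [day(2), day(6)], [day(7), day(9)]]
events = filter_time_range_sketch(traces, day(2), day(6), "events")
contained = filter_time_range_sketch(traces, day(2), day(6), "traces_contained")
intersecting = filter_time_range_sketch(traces, day(2), day(6), "traces_intersecting")
```

For the interval from 2 to 6 January, the first trace is trimmed in ‘events’ mode, kept whole in ‘traces_intersecting’ mode and dropped in ‘traces_contained’ mode; the third trace is dropped in all three.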

pm4py.filtering.filter_trace_attribute(log, attribute_key, values, retain=True)[source]

Deprecated since version 2.1.4: This will be removed in 2.4.0. Filtering method will be removed due to fuzzy naming. Use: filter_event_attribute_values

pm4py.filtering.filter_trace_attribute_values(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], attribute_key: str, values: Union[Set[str], List[str]], retain: bool = True) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filter a log on the values of a trace attribute

Parameters
  • log – Event log

  • attribute_key – Attribute to filter

  • values – Values to filter (list of)

  • retain – Boolean value (keep/discard matching traces)

Returns

Filtered event log

Return type

filtered_log

pm4py.filtering.filter_variants(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], variants: Union[Set[str], List[str]], retain: bool = True) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filter a log on a specified set of variants

Parameters
  • log – Event log

  • variants – collection of variants to filter; A variant should be specified as a list of activity names, e.g., [‘a’,’b’,’c’]

  • retain – boolean; if True all traces conforming to the specified variants are retained; if False, all those traces are removed

Returns

Filtered log object

Return type

filtered_log

pm4py.filtering.filter_variants_by_coverage_percentage(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], min_coverage_percentage: float) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filters the variants of the log by a coverage percentage (e.g., if min_coverage_percentage=0.4, and we have a log with 1000 cases, of which 500 of the variant 1, 400 of the variant 2, and 100 of the variant 3, the filter keeps only the traces of variant 1 and variant 2).

Parameters
  • log – Event log

  • min_coverage_percentage – Minimum allowed percentage of coverage

Returns

Filtered log

Return type

filtered_log
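
The worked example in the description (500, 400 and 100 cases with a threshold of 0.4) can be checked with a sketch on plain lists. This does not call pm4py; the name `filter_by_coverage` is hypothetical and the inclusive comparison (>=) is an assumption that is consistent with variant 2 being kept at exactly 40%.

```python
from collections import Counter
from typing import List


def filter_by_coverage(traces: List[List[str]],
                       min_coverage_percentage: float) -> List[List[str]]:
    """Keep traces whose variant covers at least the given fraction of cases."""
    variants = Counter(tuple(trace) for trace in traces)
    total = len(traces)
    allowed = {variant for variant, n in variants.items()
               if n / total >= min_coverage_percentage}
    return [trace for trace in traces if tuple(trace) in allowed]


# 1000 cases: 500 of variant 1, 400 of variant 2, 100 of variant 3.
traces = [["a", "b"]] * 500 + [["a", "c"]] * 400 + [["a", "d"]] * 100
filtered = filter_by_coverage(traces, 0.4)
remaining = Counter(tuple(trace) for trace in filtered)
```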

pm4py.filtering.filter_variants_percentage(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], threshold: float = 0.8) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Filter a log on the percentage of variants

Parameters
  • log – Event log

  • threshold – Percentage (on a scale from 0 to 1) of admitted variants

Returns

Filtered log object

Return type

filtered_log

Deprecated since version 2.1.3.1: This will be removed in 2.4.0. Filtering method will be removed due to fuzzy interpretation of the threshold. Will be replaced with two new functions filter_variants_top_k and filter_variants_relative_frequency

pm4py.filtering.filter_variants_top_k(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], k: int) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Keeps the top-k variants of the log

Parameters
  • log – Event log

  • k – Number of variants that should be kept

Returns

Filtered log

Return type

filtered_log

pm4py.hof module

pm4py.hof.filter_log(f: Callable[[Any], bool], log: pm4py.objects.log.obj.EventLog) Union[pm4py.objects.log.obj.EventLog, pm4py.objects.log.obj.EventStream][source]

Filters the log according to a given (lambda) function.

Parameters
  • f – function that specifies the filter criterion, may be a lambda

  • log – event log; either EventLog or EventStream Object

Returns

filtered event log if object provided is correct; original log if not correct

Return type

log

pm4py.hof.filter_trace(f: Callable[[Any], bool], trace: pm4py.objects.log.obj.Trace) pm4py.objects.log.obj.Trace[source]

Filters the trace according to a given (lambda) function.

Parameters
  • f – function that specifies the filter criterion, may be a lambda

  • trace – trace; PM4Py trace object

Returns

filtered trace if object provided is correct; original trace if not correct

Return type

trace

pm4py.hof.sort_log(log: pm4py.objects.log.obj.EventLog, key, reverse: bool = False) Union[pm4py.objects.log.obj.EventLog, pm4py.objects.log.obj.EventStream][source]

Sorts the event log according to a given key.

Parameters
  • log – event log object; either EventLog or EventStream

  • key – sorting key

  • reverse – indicates whether sorting should be reversed or not

Returns

sorted event log if object provided is correct; original log if not correct

pm4py.hof.sort_trace(trace: pm4py.objects.log.obj.Trace, key, reverse: bool = False) pm4py.objects.log.obj.Trace[source]

Sorts the events of a trace according to a given key.

Parameters
  • trace – input trace

  • key – sorting key

  • reverse – indicates whether sorting should be reversed (default False)

Returns

sorted trace if object provided is correct; original trace if not correct

pm4py.meta module

Process Mining for Python (PM4Py)

pm4py.ml module

pm4py.ml.get_prefixes_from_log(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], length: int) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Gets the prefixes of a log of a given length

Parameters
  • log – Event log / Pandas dataframe

  • length – Length

Returns

Log containing the prefixes:
  • if a trace has lower or identical length, it is included as-is

  • if a trace has greater length, it is cut

Return type

prefix_log
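
On plain lists of activity names, the rule above reduces to a slice, as in the following sketch. It does not use pm4py, and the name `prefixes_of_length` is hypothetical.

```python
from typing import List


def prefixes_of_length(traces: List[List[str]], length: int) -> List[List[str]]:
    """Cut every trace to at most `length` events; shorter traces stay as-is."""
    return [trace[:length] for trace in traces]


traces = [["a", "b", "c"], ["a"], ["a", "b"]]
prefix_log = prefixes_of_length(traces, 2)
```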

pm4py.ml.split_train_test(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], train_percentage: float = 0.8) Union[Tuple[pm4py.objects.log.obj.EventLog, pm4py.objects.log.obj.EventLog], Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]][source]

Splits an event log into a training log and a test log (for machine learning purposes)

Parameters
  • log – Event log / Pandas dataframe

  • train_percentage – Fraction of traces to be included in the training log (from 0.0 to 1.0)

Returns

  • training_log – Training event log

  • test_log – Test event log

pm4py.ocel module

pm4py.ocel.ocel_flattening(ocel: pm4py.objects.ocel.obj.OCEL, object_type: str) pandas.core.frame.DataFrame[source]

Flattens the object-centric event log to a traditional event log with the choice of an object type. In the flattened log, the objects of a given object type are the cases, and each case contains the set of events related to the object.

Parameters
  • ocel – Object-centric event log

  • object_type – Object type

Returns

Flattened log in the form of a Pandas dataframe

Return type

dataframe
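
The idea of flattening can be sketched without pm4py: each object of the chosen type becomes a case, and every event related to that object is appended to the case in timestamp order. The event layout (dictionaries with "activity", "timestamp" and a list of (object id, object type) pairs) and the name `flatten` are assumptions; the real function returns a Pandas dataframe rather than a dictionary.

```python
from collections import defaultdict
from typing import Dict, List


def flatten(events: List[dict], object_type: str) -> Dict[str, List[str]]:
    """Map each object of `object_type` to the ordered activities of its events."""
    cases = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        for object_id, obj_type in event["objects"]:
            if obj_type == object_type:
                cases[object_id].append(event["activity"])
    return dict(cases)


events = [
    {"activity": "Create Order", "timestamp": 1,
     "objects": [("o1", "order"), ("i1", "item"), ("i2", "item")]},
    {"activity": "Pick Item", "timestamp": 2, "objects": [("i1", "item")]},
    {"activity": "Pay Order", "timestamp": 3, "objects": [("o1", "order")]},
]
order_cases = flatten(events, "order")
item_cases = flatten(events, "item")
```

Note that the “Create Order” event appears in both item cases: an event related to several objects of the chosen type is replicated once per object.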

pm4py.ocel.ocel_get_attribute_names(ocel: pm4py.objects.ocel.obj.OCEL) List[str][source]

Gets the list of attributes at the event and the object level of an object-centric event log (e.g. [“cost”, “amount”, “name”])

Parameters

ocel – Object-centric event log

Returns

List of attributes at the event and object level (e.g. [“cost”, “amount”, “name”])

Return type

attributes_list

pm4py.ocel.ocel_get_object_types(ocel: pm4py.objects.ocel.obj.OCEL) List[str][source]

Gets the list of object types contained in the object-centric event log (e.g., [“order”, “item”, “delivery”]).

Parameters

ocel – Object-centric event log

Returns

List of object types contained in the event log (e.g., [“order”, “item”, “delivery”])

Return type

object_types_list

pm4py.ocel.ocel_object_type_activities(ocel: pm4py.objects.ocel.obj.OCEL) Dict[str, Collection[str]][source]

Gets the set of activities performed for each object type

Parameters

ocel – Object-centric event log

Returns

A dictionary having as key the object types and as values the activities performed for that object type

Return type

dict

pm4py.ocel.ocel_objects_ot_count(ocel: pm4py.objects.ocel.obj.OCEL) Dict[str, Dict[str, int]][source]

Counts for each event the number of related objects per type

Parameters
  • ocel – Object-centric Event log

  • parameters – Parameters of the algorithm, including:
    • Parameters.EVENT_ID => the event identifier to be used

    • Parameters.OBJECT_ID => the object identifier to be used

    • Parameters.OBJECT_TYPE => the object type to be used

Returns

Dictionary associating to each event identifier a dictionary with the number of related objects

Return type

dict_ot
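
The shape of the returned dictionary can be illustrated with a sketch that counts over a flat list of relations. This does not use pm4py; the tuple layout (event id, object id, object type) and the name `objects_ot_count` are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def objects_ot_count(relations: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, int]]:
    """Count, for each event identifier, the related objects per object type."""
    counts = defaultdict(lambda: defaultdict(int))
    for event_id, _object_id, object_type in relations:
        counts[event_id][object_type] += 1
    return {event_id: dict(per_type) for event_id, per_type in counts.items()}


relations = [
    ("e1", "o1", "order"),
    ("e1", "i1", "element"),
    ("e1", "i2", "element"),
    ("e2", "o1", "order"),
]
dict_ot = objects_ot_count(relations)
```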

pm4py.ocel.ocel_objects_summary(ocel: pm4py.objects.ocel.obj.OCEL) pandas.core.frame.DataFrame[source]

Gets the objects summary of an object-centric event log

Parameters

ocel – object-centric event log

Return type

pd.DataFrame

import pm4py

objects_summary = pm4py.ocel_objects_summary(ocel)

pm4py.ocel.ocel_temporal_summary(ocel: pm4py.objects.ocel.obj.OCEL) pandas.core.frame.DataFrame[source]

Returns the “temporal summary” from an object-centric event log. The temporal summary aggregates all the events performed at the same timestamp, and reports the set of activities and the involved objects.

Parameters

ocel – object-centric event log

Return type

pd.DataFrame

import pm4py

temporal_summary = pm4py.ocel_temporal_summary(ocel)

pm4py.org module

pm4py.org.discover_activity_based_resource_similarity(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame])[source]

Calculates similarity between the resources in the event log, based on their activity profiles.

Parameters

log – Event log or Pandas dataframe

Returns

Values of the metric

Return type

metric_values

pm4py.org.discover_handover_of_work_network(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], beta=0)[source]

Calculates the handover of work network of the event log. The handover of work network is essentially the DFG of the event log, however, using the resource as a node of the graph, instead of the activity. As such, to use this, resource information should be present in the event log.

Parameters
  • log – Event log or Pandas dataframe

  • beta – beta parameter for Handover metric

Returns

Values of the metric

Return type

metric_values
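
The analogy with the DFG can be made concrete by counting directly-follows pairs of resources instead of activities. The sketch below works on plain lists of resource names and does not use pm4py; `handover_counts` is a hypothetical helper, it returns raw counts (the library's metric values may be computed or normalized differently) and it ignores the beta parameter.

```python
from collections import Counter
from typing import List, Tuple


def handover_counts(cases: List[List[str]]) -> Counter:
    """Count how often work passes directly from one resource to the next."""
    counts: Counter = Counter()
    for resources in cases:
        # Pair each event's resource with the resource of the following event.
        for source, target in zip(resources, resources[1:]):
            counts[(source, target)] += 1
    return counts


# Each inner list holds the resources of one case, in event order.
cases = [["Ann", "Bob", "Ann"], ["Ann", "Bob"]]
handovers = handover_counts(cases)
```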

pm4py.org.discover_network_analysis(log: Union[pandas.core.frame.DataFrame, pm4py.objects.log.obj.EventLog, pm4py.objects.log.obj.EventStream], out_column: str, in_column: str, node_column_source: str, node_column_target: str, edge_column: str, edge_reference: str = '_out', performance: bool = False, sorting_column: str = 'time:timestamp', timestamp_column: str = 'time:timestamp') Dict[Tuple[str, str], Dict[str, Any]][source]

Performs a network analysis of the log based on the provided parameters. The output is a multigraph. Two events EV1 and EV2 of the log are merged (independently of the case notion) based on having EV1.OUT_COLUMN = EV2.IN_COLUMN. Then, an aggregation is applied on the couple of events (NODE_COLUMN) to obtain the nodes that are connected. The edges between these nodes are aggregated based on some property of the source event (EDGE_COLUMN).

Parameters
  • log – Event log / Pandas dataframe

  • out_column – The source column of the link (default: the case identifier; events of the same case are linked)

  • in_column – The target column of the link (default: the case identifier; events of the same case are linked)

  • node_column_source – The attribute to be used for the node definition of the source event (default: the resource of the log, org:resource)

  • node_column_target – The attribute to be used for the node definition of the target event (default: the resource of the log, org:resource)

  • edge_column – The attribute to be used for the edge definition (default: the activity of the log, concept:name)

  • edge_reference – Decides whether the edge attribute should be picked from the source or the target event. Values:
    • _out => the source event

    • _in => the target event

  • performance – Boolean value that enables the performance calculation on the edges of the network analysis

  • sorting_column – The column that should be used to sort the log before performing the network analysis (default: time:timestamp)

  • timestamp_column – The column that should be used as timestamp for the performance-related analysis (default: time:timestamp)

Returns

Edges of the network analysis (first key: edge; second key: type; value: number of occurrences)

Return type

network_analysis

pm4py.org.discover_organizational_roles(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame])[source]

Mines the organizational roles

Parameters

log – Event log or Pandas dataframe

Returns

Organizational roles. List where each role is a sublist with two elements:
  • The first element of the sublist is the list of activities belonging to a role. Each activity belongs to a single role

  • The second element of the sublist is a dictionary containing the resources of the role and the number of times they executed activities belonging to the role.

Return type

roles

pm4py.org.discover_subcontracting_network(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], n=2)[source]

Calculates the subcontracting network of the process.

Parameters
  • log – Event log or Pandas dataframe

  • n – n parameter for Subcontracting metric

Returns

Values of the metric

Return type

metric_values

pm4py.org.discover_working_together_network(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame])[source]

Calculates the working together network of the process. Two resource nodes are connected in the graph if the resources collaborate on an instance of the process.

Parameters

log – Event log or Pandas dataframe

Returns

Values of the metric

Return type

metric_values

pm4py.read module

pm4py.read.read_bpmn(file_path: str) pm4py.objects.bpmn.obj.BPMN[source]

Reads a BPMN from a .bpmn file

Parameters

file_path – File path

Returns

BPMN graph

Return type

bpmn_graph

pm4py.read.read_dfg(file_path: str) Tuple[dict, dict, dict][source]

Reads a DFG from a .dfg file

Parameters

file_path – File path

Returns

  • dfg – DFG

  • start_activities – Start activities

  • end_activities – End activities

pm4py.read.read_ocel(file_path: str, objects_path: Optional[str] = None) pm4py.objects.ocel.obj.OCEL[source]

Reads an object-centric event log from a file (to get an explanation of what an object-centric event log is, you can refer to http://www.ocel-standard.org/).

Parameters
  • file_path – Path from which the object-centric event log should be read.

  • objects_path – (Optional, only used in the CSV importer) Path from which the objects dataframe should be read.

Returns

Object-centric event log

Return type

ocel

pm4py.read.read_petri_net(file_path: str) Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking][source]

Deprecated since version 2.2.2: This will be removed in 2.4.0. read_petri_net is deprecated, use read_pnml instead

pm4py.read.read_pnml(file_path: str) Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking][source]

Reads a Petri net from the .PNML format

Parameters

file_path – File path

Returns

  • petri_net – Petri net object

  • initial_marking – Initial marking

  • final_marking – Final marking

pm4py.read.read_process_tree(file_path: str) Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking][source]

Deprecated since version 2.2.2: This will be removed in 2.4.0. read_process_tree is deprecated, use read_ptml instead

pm4py.read.read_ptml(file_path: str) pm4py.objects.process_tree.obj.ProcessTree[source]

Reads a process tree from a .ptml file

Parameters

file_path – File path

Returns

Process tree

Return type

tree

pm4py.read.read_xes(file_path: str) pm4py.objects.log.obj.EventLog[source]

Reads an event log in the XES standard

Parameters

file_path – File path

Returns

Event log

Return type

log

pm4py.sim module

pm4py.sim.generate_process_tree(**kwargs) pm4py.objects.process_tree.obj.ProcessTree[source]

Generates a process tree

Parameters

kwargs – Parameters of the process tree generator algorithm

Returns

process tree

Return type

model

pm4py.sim.play_out(*args: Union[Tuple[pm4py.objects.petri_net.obj.PetriNet, pm4py.objects.petri_net.obj.Marking, pm4py.objects.petri_net.obj.Marking], dict, collections.Counter, pm4py.objects.process_tree.obj.ProcessTree], **kwargs) pm4py.objects.log.obj.EventLog[source]

Performs the playout of the provided model, i.e., gets a set of traces from the model. The function takes as input either a Petri net with its initial and final marking, or a process tree.

Parameters
  • args – Model (Petri net, initial, final marking) or ProcessTree

  • kwargs – Parameters of the playout

Returns

Simulated event log

Return type

log

pm4py.stats module

pm4py.stats.get_activity_position_summary(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], activity: str) Dict[int, int][source]

Given an event log, returns a dictionary that summarizes the positions of the given activity in the different cases of the event log. E.g., if the activity happens 1000 times in position 1 (the second event of a case) and 500 times in position 2 (the third event of a case), then the returned dictionary would be: {1: 1000, 2: 500}

Parameters
  • log – Event log object / Pandas dataframe

  • activity – Activity to consider

Returns

Summary of the positions of the activity in the trace (e.g. {1: 1000, 2: 500})

Return type

pos_dict_summary
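The counting described above can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation: the helper name activity_position_summary and the toy log (lists of activity names) are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List


def activity_position_summary(cases: List[List[str]], activity: str) -> Dict[int, int]:
    """Count, per 0-based position, how often `activity` occurs at that position."""
    positions = Counter()
    for case in cases:
        for index, name in enumerate(case):
            if name == activity:
                positions[index] += 1
    return dict(positions)


# Hypothetical toy log: each inner list is the activity sequence of one case.
toy_cases = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "b"]]
summary = activity_position_summary(toy_cases, "b")
```

Here "b" occurs once at position 0, once at position 1 and twice at position 2.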

pm4py.stats.get_all_case_durations(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], business_hours: bool = False, worktiming: List[int] = [7, 17], weekends: List[int] = [6, 7]) List[float][source]

Gets the durations of the cases in the event log

Parameters
  • log – Event log

  • business_hours – Enables/disables the computation based on the business hours (default: False)

  • worktiming – (If the business hours are enabled) The hour range in which the resources of the log are working (default: 7 to 17)

  • weekends – (If the business hours are enabled) The weekend days (default: Saturday (6), Sunday (7))

Returns

Case durations (as list)

Return type

durations
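As a rough sketch of what a case duration means when business hours are disabled, the following plain-Python example takes the time between the first and the last event of each case. The helper name case_durations, the toy timestamps and the choice of seconds as the unit are assumptions of this sketch; the business-hours variant is not covered.

```python
from datetime import datetime
from typing import Dict, List


def case_durations(cases: Dict[str, List[datetime]]) -> List[float]:
    """Seconds between the earliest and the latest event of each case."""
    return [(max(stamps) - min(stamps)).total_seconds() for stamps in cases.values()]


# Hypothetical toy log: case identifier -> timestamps of its events.
toy_cases = {
    "c1": [datetime(2021, 1, 1, 9, 0), datetime(2021, 1, 1, 11, 30)],
    "c2": [
        datetime(2021, 1, 2, 8, 0),
        datetime(2021, 1, 2, 8, 45),
        datetime(2021, 1, 2, 10, 0),
    ],
}
durations = case_durations(toy_cases)
```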

pm4py.stats.get_attribute_values(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], attribute: str, count_once_per_case=False) Dict[str, int][source]

Deprecated since version 2.2.10: This will be removed in 3.0.0. Please use get_event_attribute_values instead.

pm4py.stats.get_attributes(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) List[str][source]

Deprecated since version 2.2.10: This will be removed in 3.0.0. Please use get_event_attributes instead.

pm4py.stats.get_case_arrival_average(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) float[source]

Gets the average difference between the start times of two consecutive cases

Parameters

log – Log object

Returns

Average difference between the start times of two consecutive cases

Return type

case_arrival_average
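The quantity described above can be illustrated with a short plain-Python sketch. The helper name case_arrival_average, the toy start times and the unit (seconds) are assumptions for the example and are not taken from the library.

```python
from datetime import datetime
from typing import List


def case_arrival_average(case_starts: List[datetime]) -> float:
    """Average gap, in seconds, between the start times of consecutive cases."""
    ordered = sorted(case_starts)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(ordered, ordered[1:])]
    # With fewer than two cases there is no gap to average.
    return sum(gaps) / len(gaps) if gaps else 0.0


# Hypothetical start times of three cases.
starts = [
    datetime(2021, 1, 1, 9, 0),
    datetime(2021, 1, 1, 9, 40),
    datetime(2021, 1, 1, 9, 10),
]
average_gap = case_arrival_average(starts)
```

The gaps are 600 and 1800 seconds, so the average is 1200 seconds.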

pm4py.stats.get_case_duration(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], case_id: str, business_hours: bool = False, worktiming: List[int] = [7, 17], weekends: List[int] = [6, 7]) float[source]

Gets the duration of a specific case

Parameters
  • log – Event log

  • case_id – Case identifier

  • business_hours – Enables/disables the computation based on the business hours (default: False)

  • worktiming – (If the business hours are enabled) The hour range in which the resources of the log are working (default: 7 to 17)

  • weekends – (If the business hours are enabled) The weekend days (default: Saturday (6), Sunday (7))

Returns

Duration of the given case

Return type

duration

pm4py.stats.get_case_overlap(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) List[int][source]

Associates each case in the log with the number of cases that are concurrently open

Parameters

log – Log object

Returns

List that for each case (identified by its index in the log) tells how many other cases are concurrently open.

Return type

overlap_list
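One way to read "concurrently open" is that the time intervals of two cases intersect. The sketch below follows that reading with a simple quadratic loop; the helper name case_overlap, the interval representation and the treatment of touching intervals as overlapping are assumptions of this example, not a description of the library's algorithm.

```python
from typing import List, Tuple


def case_overlap(intervals: List[Tuple[float, float]]) -> List[int]:
    """For each (start, end) interval, count the other intervals that intersect it."""
    result = []
    for i, (start_i, end_i) in enumerate(intervals):
        count = 0
        for j, (start_j, end_j) in enumerate(intervals):
            # Two closed intervals intersect when each starts before the other ends.
            if i != j and start_j <= end_i and start_i <= end_j:
                count += 1
        result.append(count)
    return result


# Hypothetical cases given as (start, end) times; the list index identifies the case.
toy_intervals = [(0, 10), (5, 15), (20, 30), (8, 22)]
overlap_list = case_overlap(toy_intervals)
```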

pm4py.stats.get_cycle_time(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) float[source]

Calculates the cycle time of the event log.

The definition that has been followed is the one proposed in: https://www.presentationeze.com/presentations/lean-manufacturing-just-in-time/lean-manufacturing-just-in-time-full-details/process-cycle-time-analysis/calculate-cycle-time/#:~:text=Cycle%20time%20%3D%20Average%20time%20between,is%2024%20minutes%20on%20average.

So: Cycle time = Average time between completion of units.

Example taken from the website: Consider a manufacturing facility, which is producing 100 units of product per 40 hour week. The average throughput rate is 1 unit per 0.4 hours, which is one unit every 24 minutes. Therefore the cycle time is 24 minutes on average.

Parameters

log – Log object

Returns

Cycle time (calculated with the aforementioned formula).

Return type

cycle_time
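The arithmetic of the website example can be checked directly. The helper below is hypothetical and only reproduces that calculation (available time divided by completed units); it does not show how the library derives the value from the timestamps of an event log.

```python
def cycle_time_minutes(total_hours: float, units: int) -> float:
    """Average time between completions, in minutes, for evenly produced units."""
    return total_hours * 60 / units


# 100 units per 40-hour week, as in the example quoted above.
website_example = cycle_time_minutes(40, 100)
```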

pm4py.stats.get_end_activities(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) Dict[str, int][source]

Returns the end activities of a log

Parameters

log – Log object

Returns

Dictionary of end activities along with their count

Return type

end_activities

pm4py.stats.get_event_attribute_values(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], attribute: str, count_once_per_case=False) Dict[str, int][source]

Returns the values for a specified attribute

Parameters
  • log – Log object

  • attribute – Attribute

  • count_once_per_case – If True, consider only an occurrence of the given attribute value inside a case (if there are multiple events sharing the same attribute value, count only 1 occurrence)

Returns

Dictionary of values along with their count

Return type

attribute_values
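The effect of count_once_per_case can be illustrated with a plain-Python sketch. The helper name event_attribute_values and the toy cases (lists of event dictionaries) are assumptions made for the example; the sketch mirrors the documented behaviour rather than the library's code.

```python
from collections import Counter
from typing import Dict, List


def event_attribute_values(
    cases: List[List[dict]], attribute: str, count_once_per_case: bool = False
) -> Dict[str, int]:
    """Count the values of `attribute`, optionally at most once per case."""
    counts = Counter()
    for case in cases:
        values = [event[attribute] for event in case if attribute in event]
        if count_once_per_case:
            # Collapse repeated values inside the same case to one occurrence.
            values = set(values)
        counts.update(values)
    return dict(counts)


# Hypothetical toy log with two cases.
toy_cases = [
    [{"org:resource": "Ann"}, {"org:resource": "Ann"}, {"org:resource": "Bob"}],
    [{"org:resource": "Bob"}],
]
all_occurrences = event_attribute_values(toy_cases, "org:resource")
once_per_case = event_attribute_values(toy_cases, "org:resource", count_once_per_case=True)
```

With the default setting "Ann" is counted twice; with count_once_per_case=True she is counted once, because both of her events belong to the same case.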

pm4py.stats.get_event_attributes(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) List[str][source]

Returns the attributes at the event level of the log

Parameters

log – Log object

Returns

List of attributes contained in the log

Return type

attributes_list

pm4py.stats.get_minimum_self_distance_witnesses(log: pm4py.objects.log.obj.EventLog) Dict[str, Set[str]][source]

This function derives the minimum self-distance witnesses. The self-distance of a in <a> is infinity, of a in <a,a> is 0, in <a,b,a> is 1, etc. The minimum self-distance is the minimal observed self-distance value in the event log. A ‘witness’ is an activity that witnesses the minimum self-distance. For example, if the minimum self-distance of activity a in some log L is 2 and trace <a,b,c,a> is in log L, then b and c are witnesses of a.

Parameters

log – Event Log to use

Returns

Dictionary mapping each activity to a set of witnesses.

Return type

Dict[str, Set[str]]

pm4py.stats.get_minimum_self_distances(log: pm4py.objects.log.obj.EventLog) Dict[str, int][source]

This algorithm computes the minimum self-distance for each activity observed in an event log. The self distance of a in <a> is infinity, of a in <a,a> is 0, in <a,b,a> is 1, etc. The minimum self distance is the minimal observed self distance value in the event log.

Parameters

log – event log (either pandas.DataFrame, EventLog or EventStream)

Returns

Dictionary mapping each activity to its minimum self-distance; an activity for which no self-distance exists is not part of the dictionary.

Return type

Dict[str, int]
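The definitions of minimum self-distance and witnesses given above can be made concrete with a small plain-Python sketch. The helper names and the toy traces are hypothetical, and the code illustrates the definitions only; it is not the library's implementation.

```python
from typing import Dict, List, Set


def minimum_self_distances(traces: List[List[str]]) -> Dict[str, int]:
    """Smallest number of events between two occurrences of the same activity."""
    distances: Dict[str, int] = {}
    for trace in traces:
        last_seen: Dict[str, int] = {}
        for index, activity in enumerate(trace):
            if activity in last_seen:
                gap = index - last_seen[activity] - 1
                if activity not in distances or gap < distances[activity]:
                    distances[activity] = gap
            last_seen[activity] = index
    return distances


def minimum_self_distance_witnesses(traces: List[List[str]]) -> Dict[str, Set[str]]:
    """Activities observed between two occurrences that realise the minimum distance."""
    distances = minimum_self_distances(traces)
    witnesses: Dict[str, Set[str]] = {activity: set() for activity in distances}
    for trace in traces:
        last_seen: Dict[str, int] = {}
        for index, activity in enumerate(trace):
            if activity in last_seen:
                start = last_seen[activity]
                if index - start - 1 == distances[activity]:
                    witnesses[activity].update(trace[start + 1:index])
            last_seen[activity] = index
    return witnesses


# Hypothetical toy log: "a" repeats with two events in between in both traces.
toy_traces = [["a", "b", "c", "a"], ["a", "d", "e", "a", "f"]]
distances = minimum_self_distances(toy_traces)
witnesses = minimum_self_distance_witnesses(toy_traces)
```

Only "a" is repeated within a trace, so it is the only key in both dictionaries; its witnesses are b, c, d and e.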

pm4py.stats.get_rework_cases_per_activity(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) Dict[str, int][source]

Finds the activities of the log for which rework (more than one occurrence of the activity in the same trace) occurs. The output is a dictionary associating each of these activities with the number of cases in which the rework occurred.

Parameters

log – Log object

Returns

Dictionary associating to each of the aforementioned activities the number of cases for which the rework occurred.

Return type

rework_dictionary
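A minimal plain-Python sketch of this rework count follows. The helper name rework_cases_per_activity and the toy cases are assumptions for illustration; the library's own implementation may differ.

```python
from collections import Counter
from typing import Dict, List


def rework_cases_per_activity(cases: List[List[str]]) -> Dict[str, int]:
    """Number of cases in which each activity occurs more than once."""
    rework = Counter()
    for case in cases:
        for activity, occurrences in Counter(case).items():
            if occurrences > 1:
                rework[activity] += 1
    return dict(rework)


# Hypothetical toy log: "b" is reworked in two cases, "a" in one.
toy_cases = [["a", "b", "b", "c"], ["a", "b", "a", "b"], ["a", "c"]]
rework_dictionary = rework_cases_per_activity(toy_cases)
```

Activity "c" never repeats within a case, so it does not appear in the result.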

pm4py.stats.get_start_activities(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) Dict[str, int][source]

Returns the start activities from a log object

Parameters

log – Log object

Returns

Dictionary of start activities along with their count

Return type

start_activities

pm4py.stats.get_trace_attribute_values(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], attribute: str) Dict[str, int][source]

Returns the values for a specified trace attribute

Parameters
  • log – Log object

  • attribute – Attribute

Returns

Dictionary of values along with their count

Return type

attribute_values

pm4py.stats.get_trace_attributes(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) List[str][source]

Gets the attributes at the trace level of a log object

Parameters

log – Log object

Returns

List of attributes at the trace level

Return type

trace_attributes_list

pm4py.stats.get_variants(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) Dict[str, List[pm4py.objects.log.obj.Trace]][source]

Gets the variants from the log

Parameters

log – Event log

Returns

Dictionary of variants along with their count

Return type

variants

pm4py.stats.get_variants_as_tuples(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame]) Dict[Tuple[str], List[pm4py.objects.log.obj.Trace]][source]

Gets the variants from the log (where the keys are tuples and not strings)

Parameters

log – Event log

Returns

Dictionary of variants along with their count

Return type

variants
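The idea of a variant (the sequence of activities shared by a group of cases) can be sketched in plain Python. The helper name variants_as_tuples is hypothetical, and the sketch stores case identifiers as values for brevity instead of the Trace objects named in the signature above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def variants_as_tuples(cases: Dict[str, List[str]]) -> Dict[Tuple[str, ...], List[str]]:
    """Group case identifiers by their sequence of activities."""
    variants = defaultdict(list)
    for case_id, activities in cases.items():
        variants[tuple(activities)].append(case_id)
    return dict(variants)


# Hypothetical toy log: case identifier -> activity sequence.
toy_cases = {"c1": ["a", "b"], "c2": ["a", "c"], "c3": ["a", "b"]}
variants = variants_as_tuples(toy_cases)
```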

pm4py.utils module

pm4py.utils.deserialize(ser_obj: Tuple[str, bytes]) Any[source]

Deserialize a bytes string to a PM4Py object

Parameters

ser_obj – Serialized object (a tuple consisting of a string denoting the type of the object, and a bytes string representing the serialization)

Returns

A PM4Py object, among:
  • an EventLog object

  • a Pandas dataframe object

  • a (PetriNet, Marking, Marking) tuple

  • a ProcessTree object

  • a BPMN object

  • a DFG, including the dictionary of the directly-follows relations, the start activities and the end activities

Return type

obj

pm4py.utils.format_dataframe(df: pandas.core.frame.DataFrame, case_id: str = 'case:concept:name', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', start_timestamp_key: str = 'start_timestamp', timest_format: Optional[str] = None) pandas.core.frame.DataFrame[source]

Gives the dataframe the format expected for process mining purposes

Parameters
  • df – Dataframe

  • case_id – Case identifier column

  • activity_key – Activity column

  • timestamp_key – Timestamp column

  • start_timestamp_key – Start timestamp column

  • timest_format – Timestamp format that is provided to Pandas

Returns

Dataframe

Return type

df

pm4py.utils.get_properties(log)[source]

Gets the properties from a log object

Parameters

log – Log object

Returns

Dictionary containing the properties of the log object

Return type

prop_dict

pm4py.utils.parse_event_log_string(traces: Collection[str], sep: str = ',', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'concept:name') pm4py.objects.log.obj.EventLog[source]

Parse a collection of traces expressed as strings (e.g., ["A,B,C,D", "A,C,B,D", "A,D"]) to an event log

Parameters
  • traces – Collection of traces expressed as strings

  • sep – Separator used to split the activities of a string trace

  • activity_key – The attribute that should be used as activity

  • timestamp_key – The attribute that should be used as timestamp

  • case_id_key – The attribute that should be used as case identifier

Returns

Event log

Return type

log
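The splitting step can be sketched in plain Python as follows. The helper name parse_traces is hypothetical and the sketch produces lists of event dictionaries only; it does not build an EventLog and does not assign timestamps or case identifiers.

```python
from typing import Collection, Dict, List


def parse_traces(
    traces: Collection[str], sep: str = ",", activity_key: str = "concept:name"
) -> List[List[Dict[str, str]]]:
    """Split each string trace into events that carry only the activity attribute."""
    return [[{activity_key: name} for name in trace.split(sep)] for trace in traces]


parsed = parse_traces(["A,B,C,D", "A,C,B,D", "A,D"])
```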

pm4py.utils.parse_process_tree(tree_string: str) pm4py.objects.process_tree.obj.ProcessTree[source]

Parse a process tree from a string

Parameters

tree_string – String representing a process tree (e.g. "-> ( 'A', O ( 'B', 'C' ), 'D' )"). Operators are '->': sequence, '+': parallel, 'X': xor choice, '*': binary loop, 'O': or choice.

Returns

Process tree

Return type

tree

pm4py.utils.project_on_event_attribute(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], attribute_key='concept:name') List[List[str]][source]

Project the event log on a specified event attribute. The result is a list containing one list per case: each case is transformed into the list of its values for the specified attribute.

Parameters
  • log – Event log / Pandas dataframe

  • attribute_key – The attribute to be used

Returns

Projection on the given attribute (a list containing, for each case, a list of its values for the specified attribute).

Example:

pm4py.project_on_event_attribute(log, "concept:name")

[['register request', 'examine casually', 'check ticket', 'decide', 'reinitiate request', 'examine thoroughly', 'check ticket', 'decide', 'pay compensation'], ['register request', 'check ticket', 'examine casually', 'decide', 'pay compensation'], ['register request', 'examine thoroughly', 'check ticket', 'decide', 'reject request'], ['register request', 'examine casually', 'check ticket', 'decide', 'pay compensation'], ['register request', 'examine casually', 'check ticket', 'decide', 'reinitiate request', 'check ticket', 'examine casually', 'decide', 'reinitiate request', 'examine casually', 'check ticket', 'decide', 'reject request'], ['register request', 'check ticket', 'examine thoroughly', 'decide', 'reject request']]

Return type

projected_cases
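The projection itself amounts to a nested list comprehension. The plain-Python sketch below uses a hypothetical helper name and a toy log of event dictionaries, and assumes every event carries the requested attribute.

```python
from typing import Dict, List


def project_on_attribute(
    cases: List[List[Dict[str, str]]], attribute_key: str = "concept:name"
) -> List[List[str]]:
    """Replace each event by its value for `attribute_key`, case by case."""
    return [[event[attribute_key] for event in case] for case in cases]


# Hypothetical toy log with two cases.
toy_cases = [
    [
        {"concept:name": "register request", "org:resource": "Pete"},
        {"concept:name": "decide", "org:resource": "Sara"},
    ],
    [{"concept:name": "register request", "org:resource": "Mike"}],
]
projected_cases = project_on_attribute(toy_cases, "concept:name")
projected_resources = project_on_attribute(toy_cases, "org:resource")
```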

pm4py.utils.rebase(log_obj: Union[pm4py.objects.log.obj.EventLog, pm4py.objects.log.obj.EventStream, pandas.core.frame.DataFrame], case_id: str = 'case:concept:name', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', start_timestamp_key: str = 'start_timestamp')[source]

Re-base the log object, changing the case ID, activity and timestamp attributes.

Parameters
  • log_obj – Log object

  • case_id – Case identifier

  • activity_key – Activity

  • timestamp_key – Timestamp

  • start_timestamp_key – Start timestamp

Returns

Rebased log object

Return type

rebased_log_obj

pm4py.utils.sample_cases(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], num_cases: int) Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame][source]

Randomly samples a given number of cases from the event log.

Parameters
  • log – Event log / Pandas dataframe

  • num_cases – Number of cases to sample

Returns

Sampled event log (containing the specified number of cases)

Return type

sampled_log

pm4py.utils.sample_events(log: Union[pm4py.objects.log.obj.EventStream, pm4py.objects.ocel.obj.OCEL], num_events: int) Union[pm4py.objects.log.obj.EventStream, pm4py.objects.ocel.obj.OCEL][source]

Randomly samples a given number of events from the event log.

Parameters
  • log – Event stream / OCEL / Pandas dataframes

  • num_events – Number of events to sample

Returns

Sampled event stream / OCEL / Pandas dataframes (containing the specified number of events)

Return type

sampled_log

pm4py.utils.serialize(*args) Tuple[str, bytes][source]

Serialize a PM4Py object into a bytes string

Parameters

args – A PM4Py object, among:
  • an EventLog object

  • a Pandas dataframe object

  • a (PetriNet, Marking, Marking) tuple

  • a ProcessTree object

  • a BPMN object

  • a DFG, including the dictionary of the directly-follows relations, the start activities and the end activities

Returns

Serialized object (a tuple consisting of a string denoting the type of the object, and a bytes string representing the serialization)

Return type

ser

pm4py.utils.set_classifier(log, classifier, classifier_attribute='@@classifier')[source]

Sets the specified classifier on an existing event log

Parameters
  • log – Log object

  • classifier

    Classifier that should be set:
    • A list of event attributes can be provided

    • A single event attribute can be provided

    • A classifier stored among the “classifiers” of the log object can be provided

  • classifier_attribute – The attribute of the event that should store the concatenation of the attribute values for the given classifier

Returns

The same event log (the method acts in place)

Return type

log
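The concatenation stored in classifier_attribute can be pictured with the plain-Python sketch below. The helper name apply_classifier, the toy events and in particular the "+" separator are assumptions of this example; the separator actually used by the library is not stated in this documentation.

```python
from typing import Dict, List


def apply_classifier(
    cases: List[List[Dict[str, str]]],
    attributes: List[str],
    classifier_attribute: str = "@@classifier",
    sep: str = "+",  # assumed separator, chosen for this sketch only
) -> List[List[Dict[str, str]]]:
    """Store the joined attribute values on every event, modifying the events in place."""
    for case in cases:
        for event in case:
            event[classifier_attribute] = sep.join(str(event[name]) for name in attributes)
    return cases


# Hypothetical toy log with a single case and a single event.
toy_cases = [[{"concept:name": "decide", "lifecycle:transition": "complete"}]]
same_cases = apply_classifier(toy_cases, ["concept:name", "lifecycle:transition"])
```

The function returns the object it was given, which matches the in-place behaviour noted under Returns.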

pm4py.vis module

pm4py.vis.save_vis_bpmn(bpmn_graph: pm4py.objects.bpmn.obj.BPMN, file_path: str)[source]

Saves the visualization of a BPMN graph

Parameters
  • bpmn_graph – BPMN graph

  • file_path – Destination path

pm4py.vis.save_vis_case_duration_graph(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], file_path: str)[source]

Saves the case duration graph in the specified path

Parameters
  • log – Log object

  • file_path – Destination path

pm4py.vis.save_vis_dfg(dfg: dict, start_activities: dict, end_activities: dict, file_path: str, log: Optional[pm4py.objects.log.obj.EventLog] = None)[source]

Saves a DFG visualization to a file

Parameters
  • dfg – DFG object

  • start_activities – Start activities

  • end_activities – End activities

  • file_path – Destination path

pm4py.vis.save_vis_dotted_chart(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], file_path: str, attributes=None)[source]

Saves the visualization of the dotted chart

Parameters
  • log – Event log

  • file_path – Destination path

  • attributes – Attributes that should be used to construct the dotted chart (for example, ["concept:name", "org:resource"])

pm4py.vis.save_vis_events_distribution_graph(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], file_path: str, distr_type: str = 'days_week')[source]

Saves the distribution of the events in a picture file

Parameters
  • log – Event log

  • file_path – Destination path (including the extension)

  • distr_type

    Type of distribution (default: days_week):
    • days_month => Gets the distribution of the events among the days of a month (from 1 to 31)

    • months => Gets the distribution of the events among the months (from 1 to 12)

    • years => Gets the distribution of the events among the years of the event log

    • hours => Gets the distribution of the events among the hours of a day (from 0 to 23)

    • days_week => Gets the distribution of the events among the days of a week (from Monday to Sunday)
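What each distr_type value counts can be sketched with the standard library. The helper name events_distribution and the toy timestamps are hypothetical; the sketch only tallies events per bucket and does not draw anything. In this sketch weekdays are numbered from 0 (Monday) to 6 (Sunday), following datetime.weekday().

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List

# One extractor per distribution type listed above.
EXTRACTORS = {
    "days_month": lambda ts: ts.day,
    "months": lambda ts: ts.month,
    "years": lambda ts: ts.year,
    "hours": lambda ts: ts.hour,
    "days_week": lambda ts: ts.weekday(),  # 0 = Monday, 6 = Sunday
}


def events_distribution(timestamps: List[datetime], distr_type: str = "days_week") -> Dict[int, int]:
    """Count the events falling into each bucket of the chosen dimension."""
    return dict(Counter(EXTRACTORS[distr_type](ts) for ts in timestamps))


# Hypothetical event timestamps: two on Monday 2021-01-04, one on Wednesday 2021-01-06.
toy_timestamps = [
    datetime(2021, 1, 4, 9, 0),
    datetime(2021, 1, 4, 15, 0),
    datetime(2021, 1, 6, 9, 0),
]
per_weekday = events_distribution(toy_timestamps)
per_hour = events_distribution(toy_timestamps, "hours")
```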

pm4py.vis.save_vis_events_per_time_graph(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], file_path: str)[source]

Saves the events per time graph in the specified path

Parameters
  • log – Log object

  • file_path – Destination path

pm4py.vis.save_vis_heuristics_net(heu_net: pm4py.objects.heuristics_net.obj.HeuristicsNet, file_path: str)[source]

Saves the visualization of a heuristics net

Parameters
  • heu_net – Heuristics net

  • file_path – Destination path

pm4py.vis.save_vis_network_analysis(network_analysis: Dict[Tuple[str, str], Dict[str, Any]], file_path: str, variant: str = 'frequency', activity_threshold: int = 1, edge_threshold: int = 1)[source]

Saves the visualization of the network analysis

Parameters
  • network_analysis – Network analysis

  • file_path – Target path of the visualization

  • variant

    Variant of the visualization:
    • frequency (if the discovered network analysis contains the frequency of the interactions)

    • performance (if the discovered network analysis contains the performance of the interactions)

  • activity_threshold – The minimum number of occurrences for an activity to be included (default: 1)

  • edge_threshold – The minimum number of occurrences for an edge to be included (default: 1)

pm4py.vis.save_vis_ocdfg(ocdfg: Dict[str, Any], file_path: str, annotation: str = 'frequency', act_metric: str = 'events', edge_metric='event_couples', act_threshold: int = 0, edge_threshold: int = 0, performance_aggregation: str = 'mean')[source]

Saves the visualization of an OC-DFG (object-centric directly-follows graph) with the provided configuration.

Parameters
  • ocdfg – Object-centric directly-follows graph

  • file_path – Destination path (including the extension)

  • annotation

    The annotation to use for the visualization. Values:
    • “frequency”: frequency annotation

    • “performance”: performance annotation

  • act_metric

    The metric to use for the activities. Available values:
    • “events” => number of events (default)

    • “unique_objects” => number of unique objects

    • “total_objects” => number of total objects

  • edge_metric

    The metric to use for the edges. Available values:
    • “event_couples” => number of event couples (default)

    • “unique_objects” => number of unique objects

    • “total_objects” => number of total objects

  • act_threshold – The threshold to apply to the activity frequencies (default: 0). Only activities with a frequency >= this threshold are kept in the graph.

  • edge_threshold – The threshold to apply to the edge frequencies (default: 0). Only edges with a frequency >= this threshold are kept in the graph.

  • performance_aggregation – The aggregation measure to use for the performance: mean, median, min, max, sum

pm4py.vis.save_vis_ocpn(ocpn: Dict[str, Any], file_path: str)[source]

Saves the visualization of the object-centric Petri net into a file

Parameters
  • ocpn – Object-centric Petri net

  • file_path – Target path of the visualization

pm4py.vis.save_vis_performance_dfg(dfg: dict, start_activities: dict, end_activities: dict, file_path: str, aggregation_measure='mean')[source]

Saves the visualization of a performance DFG

Parameters
  • dfg – DFG object

  • start_activities – Start activities

  • end_activities – End activities

  • file_path – Destination path

  • aggregation_measure – Aggregation measure (default: mean): mean, median, min, max, sum, stdev
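The aggregation measures listed above correspond to familiar statistics over the durations observed on an edge. The sketch below maps the measure names to standard-library functions; the helper name aggregate and the toy durations are hypothetical, and the use of the sample standard deviation for "stdev" is an assumption of this example.

```python
import statistics
from typing import List

# Measure name -> function applied to the durations observed on one edge.
AGGREGATIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "sum": sum,
    "stdev": statistics.stdev,  # sample standard deviation (assumed)
}


def aggregate(durations: List[float], aggregation_measure: str = "mean") -> float:
    """Collapse the durations of one edge into a single number."""
    return AGGREGATIONS[aggregation_measure](durations)


# Hypothetical durations (in seconds) observed between two activities.
toy_durations = [10.0, 20.0, 60.0]
mean_duration = aggregate(toy_durations)
median_duration = aggregate(toy_durations, "median")
```

The mean (30.0) and the median (20.0) differ noticeably here, which is why the choice of measure changes how a skewed edge is annotated.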

pm4py.vis.save_vis_performance_spectrum(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], activities: List[str], file_path: str)[source]

Saves the visualization of the performance spectrum to a file

Parameters
  • log – Event log

  • activities – List of activities (in order) that is used to build the performance spectrum

  • file_path – Destination path (including the extension)

pm4py.vis.save_vis_petri_net(petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking, file_path: str)[source]

Saves a Petri net visualization to a file

Parameters
  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

  • file_path – Destination path

pm4py.vis.save_vis_process_tree(tree: pm4py.objects.process_tree.obj.ProcessTree, file_path: str)[source]

Saves the visualization of a process tree

Parameters
  • tree – Process tree

  • file_path – Destination path

pm4py.vis.save_vis_sna(sna_metric, file_path: str)[source]

Saves the visualization of an SNA metric in an .html file

Parameters
  • sna_metric – Values of the metric

  • file_path – Destination path

pm4py.vis.view_bpmn(bpmn_graph: pm4py.objects.bpmn.obj.BPMN, format: str = 'png')[source]

Views a BPMN graph

Parameters
  • bpmn_graph – BPMN graph

  • format – Format of the visualization (default: png)

pm4py.vis.view_case_duration_graph(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], format: str = 'png')[source]

Visualizes the case duration graph

Parameters
  • log – Log object

  • format – Format of the visualization (png, svg, …)

pm4py.vis.view_dfg(dfg: dict, start_activities: dict, end_activities: dict, format: str = 'png', log: Optional[pm4py.objects.log.obj.EventLog] = None)[source]

Views a (composite) DFG

Parameters
  • dfg – DFG object

  • start_activities – Start activities

  • end_activities – End activities

  • format – Format of the output picture (default: png)

pm4py.vis.view_dotted_chart(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], format: str = 'png', attributes=None)[source]

Displays the dotted chart

Parameters
  • log – Event log

  • format – Image format

  • attributes – Attributes that should be used to construct the dotted chart. If None, the default dotted chart will be shown:

    • x-axis: time

    • y-axis: cases (in order of occurrence in the event log)

    • color: activity

    For custom attributes, use a list of attributes of the form [x-axis attribute, y-axis attribute, color attribute], e.g., ["concept:name", "org:resource", "concept:name"]

pm4py.vis.view_events_distribution_graph(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], distr_type: str = 'days_week', format='png')[source]

Shows the distribution of the events in the specified dimension

Parameters
  • log – Event log

  • distr_type

    Type of distribution (default: days_week):
    • days_month => Gets the distribution of the events among the days of a month (from 1 to 31)

    • months => Gets the distribution of the events among the months (from 1 to 12)

    • years => Gets the distribution of the events among the years of the event log

    • hours => Gets the distribution of the events among the hours of a day (from 0 to 23)

    • days_week => Gets the distribution of the events among the days of a week (from Monday to Sunday)

    • weeks => Gets the distribution of the events among the weeks of a year (from 0 to 52)

  • format – Format of the visualization (default: png)

pm4py.vis.view_events_per_time_graph(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], format: str = 'png')[source]

Visualizes the events per time graph

Parameters
  • log – Log object

  • format – Format of the visualization (png, svg, …)

pm4py.vis.view_heuristics_net(heu_net: pm4py.objects.heuristics_net.obj.HeuristicsNet, format: str = 'png')[source]

Views a heuristics net

Parameters
  • heu_net – Heuristics net

  • format – Format of the visualization (default: png)

pm4py.vis.view_network_analysis(network_analysis: Dict[Tuple[str, str], Dict[str, Any]], variant: str = 'frequency', format: str = 'png', activity_threshold: int = 1, edge_threshold: int = 1)[source]

Visualizes the network analysis

Parameters
  • network_analysis – Network analysis

  • variant

    Variant of the visualization:
    • frequency (if the discovered network analysis contains the frequency of the interactions)

    • performance (if the discovered network analysis contains the performance of the interactions)

  • format – Format of the visualization (default: png)

  • activity_threshold – The minimum number of occurrences for an activity to be included (default: 1)

  • edge_threshold – The minimum number of occurrences for an edge to be included (default: 1)

pm4py.vis.view_ocdfg(ocdfg: Dict[str, Any], annotation: str = 'frequency', act_metric: str = 'events', edge_metric='event_couples', act_threshold: int = 0, edge_threshold: int = 0, performance_aggregation: str = 'mean', format: str = 'png')[source]

Views an OC-DFG (object-centric directly-follows graph) with the provided configuration.

Parameters
  • ocdfg – Object-centric directly-follows graph

  • annotation

    The annotation to use for the visualization. Values:
    • “frequency”: frequency annotation

    • “performance”: performance annotation

  • act_metric

    The metric to use for the activities. Available values:
    • “events” => number of events (default)

    • “unique_objects” => number of unique objects

    • “total_objects” => number of total objects

  • edge_metric

    The metric to use for the edges. Available values:
    • “event_couples” => number of event couples (default)

    • “unique_objects” => number of unique objects

    • “total_objects” => number of total objects

  • act_threshold – The threshold to apply to the activity frequencies (default: 0). Only activities with a frequency >= this threshold are kept in the graph.

  • edge_threshold – The threshold to apply to the edge frequencies (default: 0). Only edges with a frequency >= this threshold are kept in the graph.

  • performance_aggregation – The aggregation measure to use for the performance: mean, median, min, max, sum

  • format – The format of the output visualization (default: “png”)

pm4py.vis.view_ocpn(ocpn: Dict[str, Any], format: str = 'png')[source]

Visualizes the object-centric Petri net on the screen

Parameters
  • ocpn – Object-centric Petri net

  • format – Format of the visualization (default: png)

pm4py.vis.view_performance_dfg(dfg: dict, start_activities: dict, end_activities: dict, format: str = 'png', aggregation_measure='mean')[source]

Views a performance DFG

Parameters
  • dfg – DFG object

  • start_activities – Start activities

  • end_activities – End activities

  • format – Format of the output picture (default: png)

  • aggregation_measure – Aggregation measure (default: mean): mean, median, min, max, sum, stdev

pm4py.vis.view_performance_spectrum(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], activities: List[str], format: str = 'png')[source]

Displays the performance spectrum

Parameters
  • log – Event log

  • activities – List of activities (in order) that is used to build the performance spectrum

  • format – Format of the visualization (png, svg …)

pm4py.vis.view_petri_net(petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: Optional[pm4py.objects.petri_net.obj.Marking] = None, final_marking: Optional[pm4py.objects.petri_net.obj.Marking] = None, format: str = 'png')[source]

Views a (composite) Petri net

Parameters
  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

  • format – Format of the output picture (default: png)

pm4py.vis.view_process_tree(tree: pm4py.objects.process_tree.obj.ProcessTree, format: str = 'png')[source]

Views a process tree

Parameters
  • tree – Process tree

  • format – Format of the visualization (default: png)

pm4py.vis.view_sna(sna_metric)[source]

Views an SNA metric (rendered as .html)

Parameters

sna_metric – Values of the metric

pm4py.write module

pm4py.write.write_bpmn(bpmn_graph: pm4py.objects.bpmn.obj.BPMN, file_path: str, enable_layout: bool = True)[source]

Writes a BPMN to a file

Parameters
  • bpmn_graph – BPMN

  • file_path – Destination path

  • enable_layout – Enables the automatic layouting of the BPMN diagram (default: True)

pm4py.write.write_dfg(dfg: dict, start_activities: dict, end_activities: dict, file_path: str)[source]

Exports a DFG

Parameters
  • dfg – DFG

  • start_activities – Start activities

  • end_activities – End activities

  • file_path – Destination path

Return type

None

pm4py.write.write_ocel(ocel: pm4py.objects.ocel.obj.OCEL, file_path: str, objects_path: Optional[str] = None)[source]

Stores the content of the object-centric event log to a file. Different formats are supported, including CSV (flat table), JSON-OCEL and XML-OCEL (described at http://www.ocel-standard.org/).

Parameters
  • ocel – Object-centric event log

  • file_path – Path at which the object-centric event log should be stored

  • objects_path – (Optional, only used in CSV exporter) Path where the objects dataframe should be stored

pm4py.write.write_petri_net(petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking, file_path: str) None[source]

Deprecated since version 2.2.2: This will be removed in 2.4.0. write_petri_net is deprecated, please use write_pnml

pm4py.write.write_pnml(petri_net: pm4py.objects.petri_net.obj.PetriNet, initial_marking: pm4py.objects.petri_net.obj.Marking, final_marking: pm4py.objects.petri_net.obj.Marking, file_path: str) None[source]

Exports a (composite) Petri net object

Parameters
  • petri_net – Petri net

  • initial_marking – Initial marking

  • final_marking – Final marking

  • file_path – Destination path

Return type

None

pm4py.write.write_process_tree(tree: pm4py.objects.process_tree.obj.ProcessTree, file_path: str) None[source]

Deprecated since version 2.2.2: This will be removed in 2.4.0. write_process_tree is deprecated, please use write_ptml

pm4py.write.write_ptml(tree: pm4py.objects.process_tree.obj.ProcessTree, file_path: str) None[source]

Exports a process tree

Parameters
  • tree – Process tree

  • file_path – Destination path

Return type

None

pm4py.write.write_xes(log: Union[pm4py.objects.log.obj.EventLog, pandas.core.frame.DataFrame], file_path: str) None[source]

Exports an XES log

Parameters
  • log – Event log

  • file_path – Destination path

Return type

None

Module contents

Process Mining for Python (PM4Py)