pm4py.algo.discovery.correlation_mining.variants package#

This file is part of PM4Py (More Info: https://pm4py.fit.fraunhofer.de).

PM4Py is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

PM4Py is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with PM4Py. If not, see <https://www.gnu.org/licenses/>.

Submodules#

pm4py.algo.discovery.correlation_mining.variants.classic module#

class pm4py.algo.discovery.correlation_mining.variants.classic.Parameters(value)#

Bases: Enum

An enumeration.

ACTIVITY_KEY = 'pm4py:param:activity_key'#
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'#
EXACT_TIME_MATCHING = 'exact_time_matching'#
INDEX_KEY = 'index_key'#

pm4py.algo.discovery.correlation_mining.variants.classic.apply(log: Union[EventLog, EventStream, DataFrame], parameters: Optional[Dict[Union[str, Parameters], Any]] = None) -> Tuple[Dict[Tuple[str, str], int], Dict[Tuple[str, str], float]]#

Applies the correlation miner to an event stream. Other types of logs are first converted to an event stream.

The approach is described in: Pourmirza, Shaya, Remco Dijkman, and Paul Grefen. “Correlation miner: mining business process models and event correlations without case identifiers.” International Journal of Cooperative Information Systems 26.02 (2017): 1742002.

Parameters#

log

Log object

parameters

Parameters of the algorithm

Returns#

dfg

DFG

performance_dfg

Performance DFG (containing the estimated performance for the arcs)
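
As the signature indicates, both returned dictionaries are keyed by (source activity, target activity) tuples. The following is a minimal sketch of how the two results might be read together. The helper name summarize_arcs and all values are hypothetical, and the unit of the performance values is not stated here, so none is assumed.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str]


def summarize_arcs(
    dfg: Dict[Arc, int], performance_dfg: Dict[Arc, float]
) -> List[Tuple[Arc, int, float]]:
    """Join the two dictionaries on their arc keys, most frequent arc first."""
    rows = [(arc, freq, performance_dfg.get(arc, 0.0)) for arc, freq in dfg.items()]
    # Sort by descending frequency, then by arc name for a stable order
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Made-up values, only to show the shape of the two results
dfg = {("register", "check"): 10, ("check", "pay"): 7}
performance_dfg = {("register", "check"): 120.0, ("check", "pay"): 3600.0}

for (source, target), freq, perf in summarize_arcs(dfg, performance_dfg):
    print(f"{source} -> {target}: frequency={freq}, performance={perf}")
```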

pm4py.algo.discovery.correlation_mining.variants.classic.resolve_lp_get_dfg(PS_matrix, duration_matrix, activities, activities_counter)#

Solves an LP (linear programming) problem to obtain a DFG

Parameters#

PS_matrix

Precede-succeed matrix

duration_matrix

Duration matrix

activities

List of activities of the log

activities_counter

Counter of the activities

Returns#

dfg

DFG

performance_dfg

Performance DFG (containing the estimated performance for the arcs)

pm4py.algo.discovery.correlation_mining.variants.classic.get_PS_dur_matrix(activities_grouped, activities, parameters=None)#

Convenience method that computes both matrices (precede-succeed and duration)

Parameters#

activities_grouped

Grouped activities

activities

List of activities of the log

parameters

Parameters of the algorithm

Returns#

PS_matrix

Precede-succeed matrix

duration_matrix

Duration matrix

pm4py.algo.discovery.correlation_mining.variants.classic.preprocess_log(log, activities=None, parameters=None)#

Preprocess a log to enable correlation mining

Parameters#

log

Log object

activities

(if provided) list of activities of the log

parameters

Parameters of the algorithm

Returns#

transf_stream

Transformed stream

activities_grouped

Grouped activities

activities

List of activities of the log

pm4py.algo.discovery.correlation_mining.variants.classic.get_precede_succeed_matrix(activities, activities_grouped, timestamp_key, start_timestamp_key)#

Calculates the precede-succeed matrix

Parameters#

activities

Ordered list of activities of the log

activities_grouped

Grouped list of activities

timestamp_key

Timestamp key

start_timestamp_key

Start timestamp key (events start)

Returns#

precede_succeed_matrix

Precede succeed matrix
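
To make the idea of a precede-succeed matrix concrete, here is a minimal pure-Python sketch. It assumes that entry (i, j) is the fraction of event pairs in which an event of activity i completes strictly before an event of activity j starts; that definition, the function name and the input layout (plain lists of numeric timestamps per activity) are illustrative assumptions, not the library's implementation.

```python
from typing import Dict, List


def precede_succeed_matrix(
    activities: List[str],
    completions: Dict[str, List[float]],
    starts: Dict[str, List[float]],
) -> List[List[float]]:
    """Entry [i][j]: share of (event of i, event of j) pairs where i ends before j starts."""
    n = len(activities)
    matrix = [[0.0] * n for _ in range(n)]
    for i, act_i in enumerate(activities):
        ai = sorted(completions.get(act_i, []))
        for j, act_j in enumerate(activities):
            if i == j:
                continue  # the diagonal is left at zero in this sketch
            aj = sorted(starts.get(act_j, []))
            if not ai or not aj:
                continue
            count = 0
            z = 0
            # Two-pointer sweep: for each completion, count the later starts
            for t in ai:
                while z < len(aj) and aj[z] <= t:
                    z += 1
                count += len(aj) - z
            matrix[i][j] = count / (len(ai) * len(aj))
    return matrix


# Hypothetical timestamps; start and completion coincide for every event
times = {"A": [1, 4], "B": [2, 5]}
print(precede_succeed_matrix(["A", "B"], times, times))
```

In this toy input, three of the four (A, B) pairs have A before B, and one of the four (B, A) pairs has B before A.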

pm4py.algo.discovery.correlation_mining.variants.classic.get_duration_matrix(activities, activities_grouped, timestamp_key, start_timestamp_key, exact=False)#

Calculates the duration matrix

Parameters#

activities

Ordered list of activities of the log

activities_grouped

Grouped list of activities

timestamp_key

Timestamp key

start_timestamp_key

Start timestamp key (events start)

exact

Whether to perform an exact matching of the times (True/False)

Returns#

duration_matrix

Duration matrix
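
One way to picture a duration matrix entry is as the average time between matched events of two activities. The sketch below uses a simple greedy matching (each completion is paired with the earliest unused start that is not earlier). This rule, the function names and the numeric timestamps are assumptions for illustration; the matching actually used by the library, and the alternative selected by exact=True, may differ.

```python
from typing import List, Tuple


def greedy_match(ai: List[float], aj: List[float]) -> List[Tuple[float, float]]:
    """Pair each completion in ai with the earliest unused start in aj that is not earlier."""
    ai, aj = sorted(ai), sorted(aj)
    pairs = []
    z = 0
    for t in ai:
        # Skip starts that lie before this completion
        while z < len(aj) and aj[z] < t:
            z += 1
        if z == len(aj):
            break  # no later start is left to match
        pairs.append((t, aj[z]))
        z += 1  # each start is used at most once
    return pairs


def average_duration(ai: List[float], aj: List[float]) -> float:
    """Mean time difference over the matched pairs, or 0.0 when nothing matches."""
    pairs = greedy_match(ai, aj)
    if not pairs:
        return 0.0
    return sum(end - begin for begin, end in pairs) / len(pairs)


# Hypothetical timestamps: completions of one activity, starts of another
print(average_duration([1, 4], [2, 5, 9]))
```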

pm4py.algo.discovery.correlation_mining.variants.classic_split module#

class pm4py.algo.discovery.correlation_mining.variants.classic_split.Parameters(value)#

Bases: Enum

An enumeration.

ACTIVITY_KEY = 'pm4py:param:activity_key'#
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'#
SAMPLE_SIZE = 'sample_size'#

pm4py.algo.discovery.correlation_mining.variants.classic_split.apply(log: Union[EventLog, EventStream, DataFrame], parameters: Optional[Dict[Union[str, Parameters], Any]] = None) -> Tuple[Dict[Tuple[str, str], int], Dict[Tuple[str, str], float]]#

Applies the correlation miner after splitting the log into smaller chunks

Parameters#

log

Log object

parameters

Parameters of the algorithm

Returns#

dfg

Frequency DFG

performance_dfg

Performance DFG
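
The chunking idea can be sketched in a few lines of plain Python. The helper names below are hypothetical, and the way per-chunk matrices are combined (an element-wise mean) is only an assumption made for illustration; this page does not state how the library aggregates the chunks.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def split_into_chunks(events: Sequence, sample_size: int) -> List[Sequence]:
    """Cut the event sequence into consecutive chunks of at most sample_size events."""
    return [events[i:i + sample_size] for i in range(0, len(events), sample_size)]


def mean_matrices(matrices: List[Matrix]) -> Matrix:
    """Element-wise mean of equally sized matrices (assumed aggregation rule)."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[r][c] for m in matrices) / n for c in range(cols)]
        for r in range(rows)
    ]


# Seven placeholder events split with a hypothetical sample size of 3
chunks = split_into_chunks(list(range(7)), 3)
print(chunks)

# Two made-up per-chunk matrices combined into one
combined = mean_matrices([[[0.0, 1.0], [0.5, 0.0]], [[0.0, 0.5], [0.0, 0.0]]])
print(combined)
```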

pm4py.algo.discovery.correlation_mining.variants.trace_based module#

class pm4py.algo.discovery.correlation_mining.variants.trace_based.Parameters(value)#

Bases: Enum

An enumeration.

ACTIVITY_KEY = 'pm4py:param:activity_key'#
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'#
CASE_ID_KEY = 'pm4py:param:case_id_key'#
INDEX_KEY = 'index_key'#

pm4py.algo.discovery.correlation_mining.variants.trace_based.apply(log: Union[EventLog, EventStream, DataFrame], parameters: Optional[Dict[Union[str, Parameters], Any]] = None) -> Tuple[Dict[Tuple[str, str], int], Dict[Tuple[str, str], float]]#

A novel correlation mining approach that builds the precede-succeed (PS) matrix and the duration matrix from the ordered list of events of each trace of the log

Parameters#

log

Event log

parameters

Parameters of the algorithm

Returns#

dfg

DFG

performance_dfg

Performance DFG (containing the estimated performance for the arcs)

pm4py.algo.discovery.correlation_mining.variants.trace_based.resolve_lp_get_dfg(PS_matrix, duration_matrix, activities, activities_counter)#

Solves an LP (linear programming) problem to obtain a DFG

Parameters#

PS_matrix

Precede-succeed matrix

duration_matrix

Duration matrix

activities

List of activities of the log

activities_counter

Counter for the activities of the log

Returns#

dfg

Frequency DFG

performance_dfg

Performance DFG

pm4py.algo.discovery.correlation_mining.variants.trace_based.get_PS_duration_matrix(activities, trace_grouped_list, parameters=None)#

Gets the precede-succeed matrix and the duration matrix

Parameters#

activities

Activities

trace_grouped_list

Grouped list of simplified traces (per activity)

parameters

Parameters of the algorithm

Returns#

PS_matrix

Precede-succeed matrix

duration_matrix

Duration matrix

pm4py.algo.discovery.correlation_mining.variants.trace_based.preprocess_log(log, activities=None, activities_counter=None, parameters=None)#

Preprocess the log to get a grouped list of simplified traces (per activity)

Parameters#

log

Log object

activities

(if provided) activities of the log

activities_counter

(if provided) counter of the activities of the log

parameters

Parameters of the algorithm

Returns#

traces_list

List of simplified traces of the log

trace_grouped_list

Grouped list of simplified traces (per activity)

activities

Activities of the log

activities_counter

Activities counter
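
The grouped list of simplified traces can be pictured as a list of lists of lists: one entry per trace, and within it one list of events per activity. The sketch below builds such a structure from plain dictionaries. The event layout, the keys "case", "activity" and "time", and the function name are hypothetical and do not reflect the library's internal representation.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Event = Dict[str, object]


def group_traces_by_activity(
    events: List[Event],
    activities: Optional[List[str]] = None,
    case_key: str = "case",
    activity_key: str = "activity",
) -> Tuple[List[List[List[Event]]], List[str]]:
    """For each trace, collect its events in one list per activity (sorted activity order)."""
    traces = defaultdict(list)
    for event in events:
        traces[event[case_key]].append(event)
    if activities is None:
        activities = sorted({str(event[activity_key]) for event in events})
    grouped = []
    for case_events in traces.values():  # traces in order of first appearance
        per_activity = [
            [event for event in case_events if event[activity_key] == activity]
            for activity in activities
        ]
        grouped.append(per_activity)
    return grouped, activities


# Three made-up events belonging to two cases
events = [
    {"case": "c1", "activity": "A", "time": 1},
    {"case": "c2", "activity": "A", "time": 2},
    {"case": "c1", "activity": "B", "time": 3},
]
grouped, activities = group_traces_by_activity(events)
print(activities)
print([[len(bucket) for bucket in trace] for trace in grouped])
```

An empty inner list simply means that the trace contains no event of that activity.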

pm4py.algo.discovery.correlation_mining.variants.trace_based.get_precede_succeed_matrix(activities, trace_grouped_list, timestamp_key, start_timestamp_key)#

Calculates the precede-succeed matrix

Parameters#

activities

Sorted list of activities of the log

trace_grouped_list

A list of lists of lists, containing for each trace and each activity the events having such activity

timestamp_key

The key to be used as timestamp

start_timestamp_key

The key to be used as start timestamp

Returns#

mat

The precede-succeed matrix

pm4py.algo.discovery.correlation_mining.variants.trace_based.get_duration_matrix(activities, trace_grouped_list, timestamp_key, start_timestamp_key)#

Calculates the duration matrix

Parameters#

activities

Sorted list of activities of the log

trace_grouped_list

A list of lists of lists, containing for each trace and each activity the events having such activity

timestamp_key

The key to be used as timestamp

start_timestamp_key

The key to be used as start timestamp

Returns#

mat

The duration matrix