pm4py.algo.discovery.correlation_mining.variants package

Submodules

pm4py.algo.discovery.correlation_mining.variants.classic module

class pm4py.algo.discovery.correlation_mining.variants.classic.Parameters(value)[source]

Bases: enum.Enum

An enumeration.

ACTIVITY_KEY = 'pm4py:param:activity_key'
EXACT_TIME_MATCHING = 'exact_time_matching'
INDEX_KEY = 'index_key'
START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'

pm4py.algo.discovery.correlation_mining.variants.classic.apply(log, parameters=None)[source]

Applies the correlation miner to an event stream; other types of logs are converted to an event stream first

The approach is described in: Pourmirza, Shaya, Remco Dijkman, and Paul Grefen. “Correlation miner: mining business process models and event correlations without case identifiers.” International Journal of Cooperative Information Systems 26.02 (2017): 1742002.

Parameters
  • log – Log object

  • parameters – Parameters of the algorithm

Returns

  • dfg – DFG (directly-follows graph)

  • performance_dfg – Performance DFG (containing the estimated performance for the arcs)
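
A minimal usage sketch, assuming pm4py is installed and that `event_log` is a log object loaded elsewhere. The helper name `mine_without_case_ids` and the default attribute names are illustrative assumptions; only `classic.apply` and the `Parameters` members are taken from this page.

```python
def mine_without_case_ids(event_log,
                          activity_key="concept:name",
                          timestamp_key="time:timestamp"):
    """Hypothetical helper: run the classic correlation miner on a log.

    The pm4py import is deferred so that this function can be defined
    even where pm4py is not installed.
    """
    from pm4py.algo.discovery.correlation_mining.variants import classic

    parameters = {
        classic.Parameters.ACTIVITY_KEY: activity_key,
        classic.Parameters.TIMESTAMP_KEY: timestamp_key,
        # Assumption: events carry no separate start timestamp, so the
        # completion timestamp is reused as the start timestamp.
        classic.Parameters.START_TIMESTAMP_KEY: timestamp_key,
    }
    dfg, performance_dfg = classic.apply(event_log, parameters=parameters)
    return dfg, performance_dfg
```

Both return values are unpacked in the order listed under Returns above.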

pm4py.algo.discovery.correlation_mining.variants.classic.get_PS_dur_matrix(activities_grouped, activities, parameters=None)[source]

Computes both the precede-succeed matrix and the duration matrix in a single call

Parameters
  • activities_grouped – Grouped activities

  • activities – List of activities of the log

  • parameters – Parameters of the algorithm

Returns

  • PS_matrix – Precede-succeed matrix

  • duration_matrix – Duration matrix

pm4py.algo.discovery.correlation_mining.variants.classic.get_duration_matrix(activities, activities_grouped, timestamp_key, start_timestamp_key, exact=False)[source]

Calculates the duration matrix

Parameters
  • activities – Ordered list of activities of the log

  • activities_grouped – Grouped list of activities

  • timestamp_key – Timestamp key

  • start_timestamp_key – Start timestamp key (events start)

  • exact – If True, performs an exact matching of the times (default: False)

Returns

  • duration_matrix – Duration matrix
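
To make the idea of a duration matrix entry concrete, here is a simplified sketch of one plausible way to estimate the time between two activities when case identifiers are missing: each event of the first activity is greedily paired with the earliest unused later event of the second activity, and the gaps are averaged. The function names and the first-in-first-out pairing are assumptions for illustration, not necessarily what this module does internally.

```python
from statistics import mean


def fifo_match_durations(end_times_a, start_times_b):
    """Pair each a-event with the earliest unused b-event that starts later.

    Both inputs must be sorted in ascending order.
    """
    durations = []
    z = 0
    for ta in end_times_a:
        # Skip b-events that do not start strictly after this a-event.
        while z < len(start_times_b) and start_times_b[z] <= ta:
            z += 1
        if z == len(start_times_b):
            break  # no b-events left to pair with
        durations.append(start_times_b[z] - ta)
        z += 1  # each b-event is used at most once
    return durations


def estimated_duration(end_times_a, start_times_b):
    """Average gap over the greedy pairs, or 0.0 if no pair exists."""
    gaps = fifo_match_durations(sorted(end_times_a), sorted(start_times_b))
    return mean(gaps) if gaps else 0.0
```

For example, `estimated_duration([1, 5], [3, 7])` pairs 1 with 3 and 5 with 7, giving an average gap of 2.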

pm4py.algo.discovery.correlation_mining.variants.classic.get_precede_succeed_matrix(activities, activities_grouped, timestamp_key, start_timestamp_key)[source]

Calculates the precede-succeed matrix

Parameters
  • activities – Ordered list of activities of the log

  • activities_grouped – Grouped list of activities

  • timestamp_key – Timestamp key

  • start_timestamp_key – Start timestamp key (events start)

Returns

  • precede_succeed_matrix – Precede-succeed matrix
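
As an illustration of what a precede-succeed matrix expresses, the sketch below computes, for every ordered pair of activities, the fraction of event pairs in which the first activity's event occurs strictly before the second's. This follows the general idea of the cited paper as a simplified, quadratic-time version; the function names and the plain-list matrix are assumptions, not this module's implementation.

```python
def precede_succeed_ratio(times_a, times_b):
    """Fraction of (a, b) event pairs in which a happens strictly before b."""
    if not times_a or not times_b:
        return 0.0
    preceding = sum(1 for ta in times_a for tb in times_b if ta < tb)
    return preceding / (len(times_a) * len(times_b))


def precede_succeed_matrix(activities, times_by_activity):
    """Square matrix of ratios, indexed by position in `activities`."""
    n = len(activities)
    matrix = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(activities):
        for j, b in enumerate(activities):
            if i != j:  # the diagonal is left at 0.0
                matrix[i][j] = precede_succeed_ratio(
                    times_by_activity[a], times_by_activity[b]
                )
    return matrix


ps = precede_succeed_matrix(["A", "B"], {"A": [1, 5], "B": [3, 7]})
```

Here three of the four (A, B) pairs have A first, so `ps[0][1]` is 0.75, while only one (B, A) pair has B first, so `ps[1][0]` is 0.25.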

pm4py.algo.discovery.correlation_mining.variants.classic.preprocess_log(log, activities=None, parameters=None)[source]

Preprocesses a log to enable correlation mining

Parameters
  • log – Log object

  • activities – (if provided) list of activities of the log

  • parameters – Parameters of the algorithm

Returns

  • transf_stream – Transformed stream

  • activities_grouped – Grouped activities

  • activities – List of activities of the log
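
The three return values can be pictured with a small sketch that works on plain dictionaries. It assumes events are dicts carrying an activity attribute and a timestamp attribute; the helper name and the choice to sort by timestamp are illustrative assumptions, not a description of this function's internals.

```python
from collections import defaultdict


def group_events_by_activity(events,
                             activity_key="concept:name",
                             timestamp_key="time:timestamp"):
    """Return (ordered stream, events grouped per activity, sorted activities)."""
    # Order the whole stream by timestamp so each group is ordered too.
    stream = sorted(events, key=lambda e: e[timestamp_key])
    grouped = defaultdict(list)
    for event in stream:
        grouped[event[activity_key]].append(event)
    activities = sorted(grouped)
    return stream, dict(grouped), activities


events = [
    {"concept:name": "B", "time:timestamp": 3},
    {"concept:name": "A", "time:timestamp": 1},
    {"concept:name": "A", "time:timestamp": 5},
]
stream, activities_grouped, activities = group_events_by_activity(events)
```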

pm4py.algo.discovery.correlation_mining.variants.classic.resolve_lp_get_dfg(PS_matrix, duration_matrix, activities, activities_counter)[source]

Solves a linear programming (LP) problem to obtain a DFG

Parameters
  • PS_matrix – Precede-succeed matrix

  • duration_matrix – Duration matrix

  • activities – List of activities of the log

  • activities_counter – Counter of the activities

Returns

  • dfg – DFG

  • performance_dfg – Performance DFG (containing the estimated performance for the arcs)
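
This page does not state the LP objective. As a hedged sketch of how the four inputs could be combined into per-arc costs, the code below assumes a cost of the form duration / precede-succeed ratio / min(count of i, count of j), loosely following the cited paper, so that arcs with a short estimated duration and a high precede-succeed ratio are cheap. The function name, the formula, and the penalty value are all assumptions.

```python
def cost_matrix(ps_matrix, duration_matrix, activities, activities_counter,
                penalty=1e11):
    """Assumed per-arc cost; arcs without evidence receive a large penalty."""
    n = len(activities)
    costs = [[penalty] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            ps = ps_matrix[i][j]
            duration = duration_matrix[i][j]
            if ps > 0 and duration > 0:
                smaller_count = min(activities_counter[activities[i]],
                                    activities_counter[activities[j]])
                costs[i][j] = duration / ps / smaller_count
    return costs


costs = cost_matrix(
    [[0.0, 0.75], [0.25, 0.0]],   # precede-succeed ratios
    [[0.0, 2.0], [0.0, 0.0]],     # estimated durations
    ["A", "B"],
    {"A": 2, "B": 2},
)
```

An LP solver would then choose arc frequencies that minimise the total cost subject to constraints derived from the activity counts; that step is omitted here.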

pm4py.algo.discovery.correlation_mining.variants.classic_split module

class pm4py.algo.discovery.correlation_mining.variants.classic_split.Parameters(value)[source]

Bases: enum.Enum

An enumeration.

ACTIVITY_KEY = 'pm4py:param:activity_key'
SAMPLE_SIZE = 'sample_size'
START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'

pm4py.algo.discovery.correlation_mining.variants.classic_split.apply(log, parameters=None)[source]

Applies the correlation miner after splitting the log into smaller chunks

Parameters
  • log – Log object

  • parameters – Parameters of the algorithm

Returns

  • dfg – Frequency DFG

  • performance_dfg – Performance DFG

pm4py.algo.discovery.correlation_mining.variants.trace_based module

class pm4py.algo.discovery.correlation_mining.variants.trace_based.Parameters(value)[source]

Bases: enum.Enum

An enumeration.

ACTIVITY_KEY = 'pm4py:param:activity_key'
CASE_ID_KEY = 'case_id_glue'
INDEX_KEY = 'index_key'
START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'

pm4py.algo.discovery.correlation_mining.variants.trace_based.apply(log, parameters=None)[source]

A novel correlation mining approach that builds the PS matrix and the duration matrix from the ordered list of events of each trace of the log

Parameters
  • log – Event log

  • parameters – Parameters

Returns

  • dfg – DFG

  • performance_dfg – Performance DFG (containing the estimated performance for the arcs)

pm4py.algo.discovery.correlation_mining.variants.trace_based.get_PS_duration_matrix(activities, trace_grouped_list, parameters=None)[source]

Gets the precede-succeed matrix and the duration matrix

Parameters
  • activities – Activities

  • trace_grouped_list – Grouped list of simplified traces (per activity)

  • parameters – Parameters of the algorithm

Returns

  • PS_matrix – Precede-succeed matrix

  • duration_matrix – Duration matrix

pm4py.algo.discovery.correlation_mining.variants.trace_based.get_duration_matrix(activities, trace_grouped_list, timestamp_key, start_timestamp_key)[source]

Calculates the duration matrix

Parameters
  • activities – Sorted list of activities of the log

  • trace_grouped_list – A list of lists of lists, containing for each trace and each activity the events having such activity

  • timestamp_key – The key to be used as timestamp

  • start_timestamp_key – The key to be used as start timestamp

Returns

  • mat – The duration matrix

pm4py.algo.discovery.correlation_mining.variants.trace_based.get_precede_succeed_matrix(activities, trace_grouped_list, timestamp_key, start_timestamp_key)[source]

Calculates the precede-succeed matrix

Parameters
  • activities – Sorted list of activities of the log

  • trace_grouped_list – A list of lists of lists, containing for each trace and each activity the events having such activity

  • timestamp_key – The key to be used as timestamp

  • start_timestamp_key – The key to be used as start timestamp

Returns

  • mat – The precede-succeed matrix

pm4py.algo.discovery.correlation_mining.variants.trace_based.preprocess_log(log, activities=None, activities_counter=None, parameters=None)[source]

Preprocess the log to get a grouped list of simplified traces (per activity)

Parameters
  • log – Log object

  • activities – (if provided) activities of the log

  • activities_counter – (if provided) counter of the activities of the log

  • parameters – Parameters of the algorithm

Returns

  • traces_list – List of simplified traces of the log

  • trace_grouped_list – Grouped list of simplified traces (per activity)

  • activities – Activities of the log

  • activities_counter – Activities counter
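
The "list of lists of lists" shape of `trace_grouped_list` can be illustrated with plain Python data. The sketch assumes each trace is a list of event dicts and that `activities` fixes the order of the inner lists; the helper name is hypothetical and the real function may simplify events differently.

```python
def group_traces_by_activity(traces, activities, activity_key="concept:name"):
    """For each trace, build one list of events per activity.

    The inner lists follow the order of `activities`; an activity that
    does not occur in a trace yields an empty list.
    """
    grouped = []
    for trace in traces:
        per_activity = [
            [event for event in trace if event[activity_key] == activity]
            for activity in activities
        ]
        grouped.append(per_activity)
    return grouped


traces = [
    [{"concept:name": "A"}, {"concept:name": "B"}, {"concept:name": "A"}],
    [{"concept:name": "B"}],
]
trace_grouped_list = group_traces_by_activity(traces, ["A", "B"])
```

With this layout, `trace_grouped_list[t][k]` holds the events of the k-th activity inside the t-th trace.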

pm4py.algo.discovery.correlation_mining.variants.trace_based.resolve_lp_get_dfg(PS_matrix, duration_matrix, activities, activities_counter)[source]

Solves a linear programming (LP) problem to obtain a DFG

Parameters
  • PS_matrix – Precede-succeed matrix

  • duration_matrix – Duration matrix

  • activities – List of activities of the log

  • activities_counter – Counter for the activities of the log

Returns

  • dfg – Frequency DFG

  • performance_dfg – Performance DFG

Module contents