pm4py.objects.log.util package#

This file is part of PM4Py (More Info: https://pm4py.fit.fraunhofer.de).

PM4Py is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

PM4Py is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with PM4Py. If not, see <https://www.gnu.org/licenses/>.

Submodules#

pm4py.objects.log.util.activities_to_alphabet module#

class pm4py.objects.log.util.activities_to_alphabet.Parameters(value)[source]#

Bases: Enum

An enumeration.

ACTIVITY_KEY = 'activity_key'#
RETURN_MAPPING = 'return_mapping'#
pm4py.objects.log.util.activities_to_alphabet.apply(dataframe: DataFrame, parameters: Optional[Dict[Any, Any]] = None) Union[DataFrame, Tuple[DataFrame, Dict[str, str]]][source]#

Remap the activities in a dataframe using an augmented alphabet to minimize the size of the encoding

Running example:

    import pm4py
    from pm4py.objects.log.util import activities_to_alphabet
    from pm4py.util import constants

    dataframe = pm4py.read_xes("tests/input_data/running-example.xes")
    renamed_dataframe = activities_to_alphabet.apply(
        dataframe,
        parameters={constants.PARAMETER_CONSTANT_ACTIVITY_KEY: "concept:name"},
    )
    print(renamed_dataframe)

Parameters#

dataframe

Pandas dataframe

parameters

Parameters of the method, including:

  • Parameters.ACTIVITY_KEY => attribute to be used as activity
  • Parameters.RETURN_MAPPING => (boolean) enables returning the mapping dictionary (so the original activities can be reconstructed)

Returns#

ren_dataframe

Pandas dataframe in which the activities have been remapped to the (augmented) alphabet

inv_mapping

(if required) Dictionary associating to every letter of the (augmented) alphabet the original activity
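
The remapping idea can be illustrated without PM4Py. The following is a minimal sketch, not the library's implementation: `remap_activities` is a hypothetical helper, the rows are plain dictionaries rather than a Pandas dataframe, and assigning consecutive characters in order of first appearance is an assumption.

```python
from typing import Dict, List, Tuple


def remap_activities(
    rows: List[dict], activity_key: str = "concept:name"
) -> Tuple[List[dict], Dict[str, str]]:
    """Replace each activity label with a single character (hypothetical helper)."""
    mapping: Dict[str, str] = {}
    renamed = []
    for row in rows:
        label = row[activity_key]
        if label not in mapping:
            # Assumption: symbols are consecutive code points starting at "A".
            mapping[label] = chr(ord("A") + len(mapping))
        renamed.append({**row, activity_key: mapping[label]})
    # The inverse mapping allows the original labels to be reconstructed.
    inv_mapping = {symbol: label for label, symbol in mapping.items()}
    return renamed, inv_mapping


rows = [
    {"case:concept:name": "1", "concept:name": "register request"},
    {"case:concept:name": "1", "concept:name": "check ticket"},
    {"case:concept:name": "2", "concept:name": "register request"},
]
renamed_rows, inv_mapping = remap_activities(rows)
print([r["concept:name"] for r in renamed_rows])
print(inv_mapping)
```

Returning the inverse mapping alongside the renamed rows mirrors the optional `inv_mapping` return value described above.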

pm4py.objects.log.util.artificial module#

class pm4py.objects.log.util.artificial.Parameters(value)[source]#

Bases: Enum

An enumeration.

ACTIVITY_KEY = 'pm4py:param:activity_key'#
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
PARAM_ARTIFICIAL_START_ACTIVITY = 'pm4py:param:art_start_act'#
PARAM_ARTIFICIAL_END_ACTIVITY = 'pm4py:param:art_end_act'#
pm4py.objects.log.util.artificial.insert_artificial_start_end(log: EventLog, parameters: Optional[Dict[Any, Any]] = None) EventLog[source]#

Inserts the artificial start/end activities in an event log

Parameters#

log

Event log

parameters

Parameters of the algorithm, including:

  • Parameters.ACTIVITY_KEY: the activity
  • Parameters.TIMESTAMP_KEY: the timestamp

Returns#

log

Enriched log
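
As an illustration of the enrichment, here is a minimal sketch on plain Python lists of dictionaries rather than an `EventLog`. The helper `add_start_end`, the marker labels, and the choice to copy the boundary timestamps are all assumptions of this sketch, not the library's behaviour.

```python
from typing import List

# Hypothetical marker labels chosen for this sketch only.
ART_START = "ARTIFICIAL_START"
ART_END = "ARTIFICIAL_END"


def add_start_end(
    log: List[List[dict]],
    activity_key: str = "concept:name",
    timestamp_key: str = "time:timestamp",
) -> List[List[dict]]:
    """Return a new log in which every non-empty trace is wrapped by marker events."""
    enriched = []
    for trace in log:
        if not trace:
            enriched.append([])
            continue
        # Assumption: the markers reuse the first and last timestamps of the trace.
        start = {activity_key: ART_START, timestamp_key: trace[0][timestamp_key]}
        end = {activity_key: ART_END, timestamp_key: trace[-1][timestamp_key]}
        enriched.append([start, *trace, end])
    return enriched


log = [[
    {"concept:name": "a", "time:timestamp": 1},
    {"concept:name": "b", "time:timestamp": 5},
]]
enriched_log = add_start_end(log)
print([e["concept:name"] for e in enriched_log[0]])
```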

pm4py.objects.log.util.basic_filter module#

class pm4py.objects.log.util.basic_filter.Parameters(value)[source]#

Bases: Enum

An enumeration.

ATTRIBUTE_KEY = 'pm4py:param:attribute_key'#
POSITIVE = 'positive'#
pm4py.objects.log.util.basic_filter.filter_log_events_attr(log, values, parameters=None)[source]#

Filter log by keeping only events with an attribute value that belongs to the provided values list

Parameters#

log

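Event log

values

Allowed attribute values

parameters

Parameters of the algorithm, including:

  • Parameters.ATTRIBUTE_KEY -> Attribute whose value is checked against the provided values (for example, the attribute identifying the activity in the log)
  • Parameters.POSITIVE -> Indicates whether the matching events should be kept or removed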

Returns#

filtered_log

Filtered log

pm4py.objects.log.util.basic_filter.filter_log_traces_attr(log, values, parameters=None)[source]#

Filter log by keeping only traces that have (or do not have) events with an attribute value that belongs to the provided values list

Parameters#

log

Trace log

values

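Allowed attribute values

parameters

Parameters of the algorithm, including:

  • Parameters.ATTRIBUTE_KEY -> Attribute whose value is checked against the provided values (for example, the attribute identifying the activity in the log)
  • Parameters.POSITIVE -> Indicates whether the matching traces should be kept or removed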

Returns#

filtered_log

Filtered log
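
The difference between the event-level and the trace-level filter can be sketched on plain lists of dictionaries. The helpers `filter_events` and `filter_traces` are hypothetical stand-ins written for this illustration, and dropping traces that end up empty is an assumption.

```python
from typing import Iterable, List


def filter_events(
    log: List[List[dict]],
    values: Iterable[str],
    attribute_key: str = "concept:name",
    positive: bool = True,
) -> List[List[dict]]:
    """Keep (positive=True) or remove (positive=False) the matching events."""
    allowed = set(values)
    filtered = []
    for trace in log:
        kept = [e for e in trace if (e.get(attribute_key) in allowed) == positive]
        # Assumption: traces left without events are dropped.
        if kept:
            filtered.append(kept)
    return filtered


def filter_traces(
    log: List[List[dict]],
    values: Iterable[str],
    attribute_key: str = "concept:name",
    positive: bool = True,
) -> List[List[dict]]:
    """Keep whole traces that contain (or do not contain) a matching event."""
    allowed = set(values)
    return [
        trace
        for trace in log
        if any(e.get(attribute_key) in allowed for e in trace) == positive
    ]


log = [
    [{"concept:name": "a"}, {"concept:name": "b"}],
    [{"concept:name": "c"}],
]
events_kept = filter_events(log, ["a"])
events_removed = filter_events(log, ["a"], positive=False)
traces_kept = filter_traces(log, ["a"])
traces_removed = filter_traces(log, ["a"], positive=False)
print(events_kept, traces_kept)
```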

pm4py.objects.log.util.dataframe_utils module#

class pm4py.objects.log.util.dataframe_utils.Parameters(value)[source]#

Bases: Enum

An enumeration.

PARTITION_COLUMN = 'partition_column'#
CASE_ID_KEY = 'pm4py:param:case_id_key'#
CASE_PREFIX = 'case:'#
CASE_ATTRIBUTES = 'case_attributes'#
MANDATORY_ATTRIBUTES = 'mandatory_attributes'#
MAX_NO_CASES = 'max_no_cases'#
MIN_DIFFERENT_OCC_STR_ATTR = 5#
MAX_DIFFERENT_OCC_STR_ATTR = 50#
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
ACTIVITY_KEY = 'pm4py:param:activity_key'#
PARAM_ARTIFICIAL_START_ACTIVITY = 'pm4py:param:art_start_act'#
PARAM_ARTIFICIAL_END_ACTIVITY = 'pm4py:param:art_end_act'#
INDEX_KEY = 'index_key'#
CASE_INDEX_KEY = 'case_index_key'#
USE_EXTREMES_TIMESTAMP = 'use_extremes_timestamp'#
ADD_CASE_IDENTIFIER_COLUMN = 'add_case_identifier_column'#
pm4py.objects.log.util.dataframe_utils.insert_partitioning(df, num_partitions, parameters=None)[source]#

Insert the partitioning in the specified dataframe

Parameters#

df

Dataframe

num_partitions

Number of partitions

parameters

Parameters of the algorithm

Returns#

df

Partitioned dataframe

pm4py.objects.log.util.dataframe_utils.legacy_parquet_support(df, parameters=None)[source]#

Legacy Parquet support: column names in Parquet files could not contain a “:”, so it was arbitrarily replaced by a replacer string. This function substitutes the replacer string back with “:”

Parameters#

df

Dataframe

parameters

Parameters of the algorithm

pm4py.objects.log.util.dataframe_utils.table_to_stream(table, parameters=None)[source]#

Converts a Pyarrow table to an event stream

Parameters#

table

Pyarrow table

parameters

Possible parameters of the algorithm

pm4py.objects.log.util.dataframe_utils.table_to_log(table, parameters=None)[source]#

Converts a Pyarrow table to an event log

Parameters#

table

Pyarrow table

parameters

Possible parameters of the algorithm

pm4py.objects.log.util.dataframe_utils.convert_timestamp_columns_in_df(df, timest_format=None, timest_columns=None)[source]#

Convert the timestamp columns of a dataframe

Parameters#

df

Dataframe

timest_format

(If provided) Format of the timestamp columns in the CSV file

timest_columns

Columns of the CSV that shall be converted into timestamp

Returns#

df

Dataframe with timestamp columns converted

pm4py.objects.log.util.dataframe_utils.sample_dataframe(df, parameters=None)[source]#

Sample a dataframe on a given number of cases

Parameters#

df

Dataframe

parameters

Parameters of the algorithm, including:

  • Parameters.CASE_ID_KEY
  • Parameters.MAX_NO_CASES

Returns#

sampled_df

Sampled dataframe

pm4py.objects.log.util.dataframe_utils.automatic_feature_selection_df(df, parameters=None)[source]#

Performs an automatic feature selection on dataframes, keeping the features useful for ML purposes

Parameters#

df

Dataframe

parameters

Parameters of the algorithm

Returns#

featured_df

Dataframe with only the features that have been selected

pm4py.objects.log.util.dataframe_utils.select_number_column(df: DataFrame, fea_df: DataFrame, col: str, case_id_key='case:concept:name') DataFrame[source]#

Extract a column for the features dataframe for the given numeric attribute

Parameters#

df

Dataframe

fea_df

Feature dataframe

col

Numeric column

case_id_key

Case ID key

Returns#

fea_df

Feature dataframe (desired output)

pm4py.objects.log.util.dataframe_utils.select_string_column(df: DataFrame, fea_df: DataFrame, col: str, case_id_key='case:concept:name') DataFrame[source]#

Extract N columns (one per distinct attribute value; one-hot encoding) for the features dataframe for the given string attribute

Parameters#

df

Dataframe

fea_df

Feature dataframe

col

String column

case_id_key

Case ID key

Returns#

fea_df

Feature dataframe (desired output)
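
The one-hot encoding of a string attribute at the case level can be sketched as follows. This is an illustration on plain dictionaries, not the library's code: `one_hot_by_case` is a hypothetical helper, and the `<column>_<value>` naming of the feature columns is an assumption.

```python
from typing import Dict, List


def one_hot_by_case(
    rows: List[dict], col: str, case_id_key: str = "case:concept:name"
) -> Dict[str, Dict[str, int]]:
    """For every case, set one 0/1 feature per distinct value of `col`."""
    values = sorted({row[col] for row in rows if row.get(col) is not None})
    features: Dict[str, Dict[str, int]] = {}
    for row in rows:
        # Every case starts with all features set to 0.
        case_features = features.setdefault(
            row[case_id_key], {f"{col}_{value}": 0 for value in values}
        )
        if row.get(col) is not None:
            # The feature is 1 if the value occurs at least once in the case.
            case_features[f"{col}_{row[col]}"] = 1
    return features


rows = [
    {"case:concept:name": "1", "org:resource": "Ann"},
    {"case:concept:name": "1", "org:resource": "Bob"},
    {"case:concept:name": "2", "org:resource": "Ann"},
]
fea = one_hot_by_case(rows, "org:resource")
print(fea)
```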

pm4py.objects.log.util.dataframe_utils.get_features_df(df: DataFrame, list_columns: List[str], parameters: Optional[Dict[Any, Any]] = None) DataFrame[source]#

Given a dataframe and a list of columns, performs an automatic feature extraction

Parameters#

df

Dataframe

list_columns

List of columns to consider in the feature extraction

parameters

Parameters of the algorithm, including:

  • Parameters.CASE_ID_KEY: the case ID

Returns#

fea_df

Feature dataframe (desired output)

pm4py.objects.log.util.dataframe_utils.automatic_feature_extraction_df(df: DataFrame, parameters: Optional[Dict[Any, Any]] = None) DataFrame[source]#

Performs an automatic feature extraction given a dataframe

Parameters#

df

Dataframe

parameters

Parameters of the algorithm, including:

  • Parameters.CASE_ID_KEY: the case ID
  • Parameters.MIN_DIFFERENT_OCC_STR_ATTR
  • Parameters.MAX_DIFFERENT_OCC_STR_ATTR

Returns#

fea_df

Dataframe with the features

pm4py.objects.log.util.dataframe_utils.insert_artificial_start_end(df0: DataFrame, parameters: Optional[Dict[Any, Any]] = None) DataFrame[source]#

Inserts the artificial start/end activities in a Pandas dataframe

Parameters#

df0

Dataframe

parameters

Parameters of the algorithm, including:

  • Parameters.CASE_ID_KEY: the case identifier
  • Parameters.TIMESTAMP_KEY: the timestamp
  • Parameters.ACTIVITY_KEY: the activity

Returns#

enriched_df

Dataframe with artificial start/end activities

pm4py.objects.log.util.dataframe_utils.dataframe_to_activity_case_table(df: DataFrame, parameters: Optional[Dict[Any, Any]] = None)[source]#

Transforms a Pandas dataframe into:

  • an “activity” table, containing the events and their attributes
  • a “case” table, containing the cases and their attributes

Parameters#

df

Dataframe

parameters

Parameters of the algorithm that should be used, including:

  • Parameters.CASE_ID_KEY => the column to be used as case ID (shall be included both in the activity table and the case table)
  • Parameters.CASE_PREFIX => if a list of attributes at the case level is not provided, then all the attributes of the dataframe starting with this prefix are considered
  • Parameters.CASE_ATTRIBUTES => the attributes of the dataframe to be used as case columns

Returns#

activity_table

Activity table

case_table

Case table
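
The split into an activity table and a case table can be sketched on a list of row dictionaries. `split_tables` is a hypothetical helper written for this illustration; it covers only the prefix-based selection of case columns and assumes that all rows share the same columns.

```python
from typing import List, Tuple


def split_tables(
    rows: List[dict],
    case_id_key: str = "case:concept:name",
    case_prefix: str = "case:",
) -> Tuple[List[dict], List[dict]]:
    """Split flat rows into event-level rows and one row per case."""
    columns = list(rows[0]) if rows else []
    case_cols = [c for c in columns if c.startswith(case_prefix) or c == case_id_key]
    # The case ID is kept in both tables so that they can be joined again.
    activity_table = [
        {k: v for k, v in row.items() if k == case_id_key or k not in case_cols}
        for row in rows
    ]
    case_table = {}
    for row in rows:
        # Only the first row of each case contributes its case attributes.
        case_table.setdefault(row[case_id_key], {c: row[c] for c in case_cols})
    return activity_table, list(case_table.values())


rows = [
    {"case:concept:name": "1", "case:amount": 10, "concept:name": "a"},
    {"case:concept:name": "1", "case:amount": 10, "concept:name": "b"},
    {"case:concept:name": "2", "case:amount": 20, "concept:name": "a"},
]
activity_table, case_table = split_tables(rows)
print(activity_table)
print(case_table)
```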

pm4py.objects.log.util.filtering_utils module#

pm4py.objects.log.util.filtering_utils.keep_one_trace_per_variant(log, parameters=None)[source]#

Keeps only one trace per variant (does not matter for basic inductive miner)

Parameters#

log

Log

parameters

Parameters of the algorithm

Returns#

new_log

Log (with one trace per variant)
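
A variant is the sequence of activities of a trace, so keeping one trace per variant amounts to de-duplicating on that sequence. The sketch below uses plain lists of dictionaries; `one_trace_per_variant` is a hypothetical helper, and keeping the first trace encountered for each variant is an assumption.

```python
from typing import List


def one_trace_per_variant(
    log: List[List[dict]], activity_key: str = "concept:name"
) -> List[List[dict]]:
    """Keep the first trace encountered for every distinct activity sequence."""
    seen = set()
    new_log = []
    for trace in log:
        variant = tuple(event[activity_key] for event in trace)
        if variant not in seen:
            seen.add(variant)
            new_log.append(trace)
    return new_log


log = [
    [{"concept:name": "a"}, {"concept:name": "b"}],
    [{"concept:name": "a"}, {"concept:name": "b"}],
    [{"concept:name": "a"}, {"concept:name": "c"}],
]
reduced_log = one_trace_per_variant(log)
print(len(reduced_log))
```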

pm4py.objects.log.util.filtering_utils.keep_only_one_attribute_per_event(log, attribute_key)[source]#

Keeps only one attribute per event

Parameters#

log

Event log

attribute_key

Attribute key

pm4py.objects.log.util.get_class_representation module#

pm4py.objects.log.util.get_class_representation.get_class_representation_by_str_ev_attr_value_presence(log, str_attr_name, str_attr_value)[source]#

Get the representation for the target part of the decision tree learning if the focus is on the presence of a given value of a (string) event attribute

Parameters#

log

Trace log

str_attr_name

Attribute name to consider

str_attr_value

Attribute value to consider

Returns#

target

Target part for decision tree learning

classes

Name of the classes, in order

pm4py.objects.log.util.get_class_representation.get_class_representation_by_str_ev_attr_value_value(log, str_attr_name)[source]#

Get the representation for the target part of the decision tree learning if the focus is on all (string) values of an event attribute

Parameters#

log

Trace log

str_attr_name

Attribute name to consider

Returns#

target

Target part for decision tree learning

classes

Name of the classes, in order

pm4py.objects.log.util.get_class_representation.get_class_representation_by_trace_duration(log, target_trace_duration, timestamp_key='time:timestamp', parameters=None)[source]#

Get class representation by splitting traces according to trace duration

Parameters#

log

Trace log

target_trace_duration

Target trace duration

timestamp_key

Timestamp key

Returns#

target

Target part for decision tree learning

classes

Name of the classes, in order
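
Splitting traces by duration yields a binary target vector. The following is a sketch under stated assumptions: `classes_by_duration` is a hypothetical helper, timestamps are plain numbers (seconds), and the class names and the use of "less than or equal" as the boundary are illustrative choices, not confirmed library behaviour.

```python
from typing import List, Tuple


def classes_by_duration(
    log: List[List[dict]],
    target_trace_duration: float,
    timestamp_key: str = "time:timestamp",
) -> Tuple[List[int], List[str]]:
    """Label each trace 0 or 1 depending on its duration."""
    classes = ["LESSEQUAL", "GREATER"]  # hypothetical class names
    target = []
    for trace in log:
        # Duration is the gap between the first and the last event of the trace.
        duration = trace[-1][timestamp_key] - trace[0][timestamp_key] if trace else 0
        target.append(0 if duration <= target_trace_duration else 1)
    return target, classes


log = [
    [{"time:timestamp": 0}, {"time:timestamp": 30}],
    [{"time:timestamp": 0}, {"time:timestamp": 120}],
]
target, classes = classes_by_duration(log, 60)
print(target, classes)
```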

pm4py.objects.log.util.get_log_encoded module#

pm4py.objects.log.util.get_log_encoded.get_log_encoded(event_log, trace_attributes=[], event_attributes=[], concatenate=False)[source]#

Get event log encoded into matrix.

Parameters#

event_log

Trace log

trace_attributes

Attributes of the trace to be encoded

event_attributes

Attributes of the events to be encoded

concatenate

Boolean indicating if to generate all sub-sequences of events in a trace

Returns#

dataset

A numpy matrix with the event log

columns

The names of the columns in the dataset

pm4py.objects.log.util.get_prefixes module#

pm4py.objects.log.util.get_prefixes.get_prefixes_from_log(log: EventLog, length: int) EventLog[source]#

Gets the prefixes of a log of a given length

Parameters#

log

Event log

length

Length

Returns#

prefix_log

Log containing the prefixes:

  • if a trace has lower or identical length, it is included as-is
  • if a trace has greater length, it is cut
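
The rule above (shorter traces are kept as-is, longer traces are cut) corresponds to a plain slice. A minimal sketch on lists, with `prefixes_of_length` as a hypothetical helper:

```python
from typing import List


def prefixes_of_length(log: List[list], length: int) -> List[list]:
    """Cut every trace to at most `length` events."""
    # Slicing leaves traces that are already short enough unchanged.
    return [trace[:length] for trace in log]


log = [["a", "b", "c"], ["a"]]
prefix_log = prefixes_of_length(log, 2)
print(prefix_log)
```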

pm4py.objects.log.util.get_prefixes.get_log_with_log_prefixes(log, parameters=None)[source]#

Gets an extended log that contains, in order, all the prefixes for a case of the original log

Parameters#

log

Original log

parameters

Possible parameters of the algorithm

Returns#

all_prefixes_log

Log with all the prefixes

change_indexes

Indexes of the extended log where there was a change between cases

pm4py.objects.log.util.get_prefixes.get_log_traces_to_activities(log, activities, parameters=None)[source]#

Gets the sublogs leading to each one of the specified activities

Parameters#

log

Trace log object

activities

List of activities in the log

parameters
Possible parameters of the algorithm, including:

  • PARAMETER_CONSTANT_ACTIVITY_KEY -> activity
  • PARAMETER_CONSTANT_TIMESTAMP_KEY -> timestamp

Returns#

list_logs

List of event logs leading to the first occurrence of each activity

considered_activities

All the activities that have actually been inserted in the list of logs (for some activities, the resulting log may be empty)

pm4py.objects.log.util.get_prefixes.get_log_traces_until_activity(log, activity, parameters=None)[source]#

Gets a reduced version of the log containing, for each trace, only the events before a specified activity

Parameters#

log

Trace log

activity

Activity to reach

parameters
Possible parameters of the algorithm, including:

  • PARAMETER_CONSTANT_ACTIVITY_KEY -> activity
  • PARAMETER_CONSTANT_TIMESTAMP_KEY -> timestamp

Returns#

new_log

New log
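
Reducing each trace to the events before a given activity can be sketched as follows. `events_before` is a hypothetical helper on plain lists of dictionaries; cutting at the first occurrence and skipping traces that never reach the activity are assumptions of this sketch.

```python
from typing import List


def events_before(
    log: List[List[dict]], activity: str, activity_key: str = "concept:name"
) -> List[List[dict]]:
    """For each trace that reaches `activity`, keep only the events before it."""
    new_log = []
    for trace in log:
        labels = [event[activity_key] for event in trace]
        if activity in labels:
            # Assumption: cut at the first occurrence of the activity.
            new_log.append(trace[: labels.index(activity)])
    return new_log


log = [
    [{"concept:name": "a"}, {"concept:name": "b"}, {"concept:name": "c"}],
    [{"concept:name": "a"}, {"concept:name": "d"}],
]
reduced = events_before(log, "c")
print(reduced)
```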

pm4py.objects.log.util.index_attribute module#

pm4py.objects.log.util.index_attribute.insert_event_index_as_event_attribute(stream, event_index_attr_name='@@eventindex')[source]#

Insert the current event index as event attribute

Parameters#

stream

Stream

event_index_attr_name

Attribute name given to the event index

pm4py.objects.log.util.index_attribute.insert_trace_index_as_event_attribute(log, trace_index_attr_name='@@traceindex')[source]#

Inserts the current trace index as event attribute (overrides previous values if needed)

Parameters#

log

Log

trace_index_attr_name

Attribute name given to the trace index

pm4py.objects.log.util.insert_classifier module#

pm4py.objects.log.util.insert_classifier.search_act_class_attr(log, force_activity_transition_insertion=False)[source]#

Search among classifiers expressed in the log one that is good for the process model extraction

Parameters#

log

Trace log

force_activity_transition_insertion

Optionally force the activity+transition classifier insertion

Returns#

log

Trace log (possibly with one additional event attribute as the classifier)

pm4py.objects.log.util.insert_classifier.insert_activity_classifier_attribute(log, classifier, force_activity_transition_insertion=False)[source]#

Insert the specified classifier as additional event attribute in the log

Parameters#

log

Trace log

classifier

Event classifier

force_activity_transition_insertion

Optionally force the activity+transition classifier insertion

Returns#

log

Trace log (possibly with one additional event attribute as the classifier)

classifier_attr_key

Attribute name of the attribute that contains the classifier value

pm4py.objects.log.util.insert_classifier.insert_trace_classifier_attribute(log, classifier)[source]#

Insert the specified classifier as additional trace attribute in the log

Parameters#

log

Trace log

classifier

Event classifier

Returns#

log

Trace log (possibly with one additional trace attribute as the classifier)

classifier_attr_key

Attribute name of the attribute that contains the classifier value

pm4py.objects.log.util.interval_lifecycle module#

class pm4py.objects.log.util.interval_lifecycle.Parameters(value)[source]#

Bases: Enum

An enumeration.

TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'#
TRANSITION_KEY = 'pm4py:param:transition_key'#
ACTIVITY_KEY = 'pm4py:param:activity_key'#
LIFECYCLE_INSTANCE_KEY = 'pm4py:param:lifecycle:instance:key'#
BUSINESS_HOURS = 'business_hours'#
BUSINESS_HOUR_SLOTS = 'business_hour_slots'#
WORKCALENDAR = 'workcalendar'#
pm4py.objects.log.util.interval_lifecycle.to_interval(log, parameters=None)[source]#

Converts a log to interval format (i.e. each event has two timestamps) from lifecycle format (each event has a single timestamp and a lifecycle transition)

Parameters#

log

Log (expressed in the lifecycle format)

parameters

Possible parameters of the method (activity, timestamp key, start timestamp key, transition …)

Returns#

log

Interval event log
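
The conversion from lifecycle to interval format pairs each "complete" event with an earlier "start" event of the same activity. The sketch below works on a single trace of plain dictionaries; `lifecycle_to_interval` is a hypothetical helper, the attribute names are assumed defaults, and first-in-first-out matching of starts is an assumption.

```python
from typing import Dict, List


def lifecycle_to_interval(
    trace: List[dict],
    activity_key: str = "concept:name",
    timestamp_key: str = "time:timestamp",
    transition_key: str = "lifecycle:transition",
    start_timestamp_key: str = "start_timestamp",
) -> List[dict]:
    """Merge start/complete event pairs into events carrying two timestamps."""
    open_starts: Dict[str, list] = {}
    intervals = []
    for event in trace:
        activity = event[activity_key]
        transition = event.get(transition_key, "complete").lower()
        if transition == "start":
            open_starts.setdefault(activity, []).append(event[timestamp_key])
        elif transition == "complete":
            pending = open_starts.get(activity)
            # Without a pending start, the event is treated as instantaneous.
            start = pending.pop(0) if pending else event[timestamp_key]
            intervals.append({
                activity_key: activity,
                start_timestamp_key: start,
                timestamp_key: event[timestamp_key],
            })
    return intervals


trace = [
    {"concept:name": "a", "lifecycle:transition": "start", "time:timestamp": 1},
    {"concept:name": "b", "lifecycle:transition": "start", "time:timestamp": 2},
    {"concept:name": "a", "lifecycle:transition": "complete", "time:timestamp": 4},
    {"concept:name": "b", "lifecycle:transition": "complete", "time:timestamp": 7},
    {"concept:name": "c", "lifecycle:transition": "complete", "time:timestamp": 9},
]
interval_trace = lifecycle_to_interval(trace)
print(interval_trace)
```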

pm4py.objects.log.util.interval_lifecycle.to_lifecycle(log, parameters=None)[source]#

Converts a log from interval format (i.e. each event has two timestamps) to lifecycle format (each event has a single timestamp and a lifecycle transition)

Parameters#

log

Log (expressed in the interval format)

parameters

Possible parameters of the method (activity, timestamp key, start timestamp key, transition …)

Returns#

log

Lifecycle event log

pm4py.objects.log.util.interval_lifecycle.assign_lead_cycle_time(log, parameters=None)[source]#

Assigns the lead and cycle time to an interval log

Parameters#

log

Interval log

parameters

Parameters of the algorithm, including: start_timestamp_key, timestamp_key, business_hour_slots

pm4py.objects.log.util.log module#

pm4py.objects.log.util.log.get_event_labels(event_log, key)[source]#

Fetches the labels present in a log, given a key to use within the events.

Parameters#

event_log

log to use

key

key to use for event identification, for example “concept:name”

Returns#

a list of labels

pm4py.objects.log.util.log.get_event_labels_counted(event_log, key)[source]#

Fetches the labels (and their frequency) present in a log, given a key to use within the events.

Parameters#

event_log

log to use

key

key to use for event identification, for example “concept:name”

Returns#

a dictionary mapping each label to its frequency
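
Counting the labels present in a log can be sketched with `collections.Counter`. `count_labels` is a hypothetical helper on plain lists of dictionaries; skipping events that lack the key is an assumption.

```python
from collections import Counter
from typing import Dict, List


def count_labels(log: List[List[dict]], key: str) -> Dict[str, int]:
    """Count how often each value of `key` occurs across all events."""
    # Events that do not carry the key are ignored.
    return dict(Counter(event[key] for trace in log for event in trace if key in event))


log = [
    [{"concept:name": "a"}, {"concept:name": "b"}],
    [{"concept:name": "a"}, {"other": "x"}],
]
label_counts = count_labels(log, "concept:name")
print(label_counts)
```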

pm4py.objects.log.util.log.get_trace_variants(event_log, key='concept:name')[source]#

Returns a pair (variants, dict[index -> traces]), where the index of a variant maps to all the traces that follow that variant when events are identified by the given key.

Parameters#

event_log

log

key (str)

key to use to identify the label of an event

Returns#

pair (variants, dict[index -> traces])

pm4py.objects.log.util.log.project_traces(event_log, keys='concept:name')[source]#

Projects traces on a (set of) event attribute key(s). If the key provided is of type string, each trace is converted into a list of strings. If the key provided is a collection, each trace is converted into a list of (smaller) dicts of key-value pairs.

Parameters:
  • event_log
  • keys (str)

Returns:

pm4py.objects.log.util.log.derive_and_lift_trace_attributes_from_event_attributes(trlog, ignore=None, retain_on_event_level=False, verbose=False)[source]#
pm4py.objects.log.util.log.add_artficial_start_and_end(event_log, start='[start>', end='[end]', activity_key='concept:name')[source]#

pm4py.objects.log.util.log_regex module#

pm4py.objects.log.util.log_regex.get_encoded_trace(trace, mapping, parameters=None)[source]#

Gets the encoding of the provided trace

Parameters#

trace

Trace of the event log

mapping

Mapping (activity to symbol)

Returns#

trace_str

Trace string

pm4py.objects.log.util.log_regex.get_encoded_log(log, mapping, parameters=None)[source]#

Gets the encoding of the provided log

Parameters#

log

Event log

mapping

Mapping (activity to symbol)

Returns#

list_str

List of encoded strings

pm4py.objects.log.util.log_regex.form_encoding_dictio_from_log(log, parameters=None)[source]#

Forms the encoding dictionary from the current log

Parameters#

log

Event log

parameters

Parameters of the algorithm

Returns#

encoding_dictio

Encoding dictionary
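
Encoding each activity as a single symbol turns a trace into a string that can be matched with a regular expression. The sketch below shows the idea on plain lists of dictionaries; `build_mapping`, `encode_trace`, and `encode_log` are hypothetical helpers, and assigning consecutive characters from "A" is an assumption.

```python
import re
from typing import Dict, List


def build_mapping(log: List[List[dict]], activity_key: str = "concept:name") -> Dict[str, str]:
    """Assign one character to every distinct activity, in order of appearance."""
    mapping: Dict[str, str] = {}
    for trace in log:
        for event in trace:
            label = event[activity_key]
            if label not in mapping:
                mapping[label] = chr(ord("A") + len(mapping))
    return mapping


def encode_trace(trace: List[dict], mapping: Dict[str, str], activity_key: str = "concept:name") -> str:
    """Concatenate the symbols of the activities of a trace."""
    return "".join(mapping[event[activity_key]] for event in trace)


def encode_log(log: List[List[dict]], mapping: Dict[str, str]) -> List[str]:
    return [encode_trace(trace, mapping) for trace in log]


log = [
    [{"concept:name": name} for name in ("a", "b", "b", "c")],
    [{"concept:name": name} for name in ("a", "c")],
]
mapping = build_mapping(log)
encoded = encode_log(log, mapping)
# "a", then any number of "b", then "c", expressed on the encoded symbols.
pattern = re.compile("AB*C")
matches = [bool(pattern.fullmatch(s)) for s in encoded]
print(encoded, matches)
```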

pm4py.objects.log.util.log_regex.form_encoding_dictio_from_two_logs(log1: EventLog, log2: EventLog, parameters=None) Dict[str, str][source]#

Forms the encoding dictionary from a couple of logs

Parameters#

log1

First log

log2

Second log

parameters

Parameters of the algorithm

Returns#

encoding_dictio

Encoding dictionary

pm4py.objects.log.util.move_attrs_to_trace module#

class pm4py.objects.log.util.move_attrs_to_trace.Parameters(value)[source]#

Bases: Enum

An enumeration.

ENABLE_DEEPCOPY = 'enable_deepcopy'#
pm4py.objects.log.util.move_attrs_to_trace.apply(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) EventLog[source]#

Moves to the trace level the attributes that are constant for all the events of the trace and that do not belong to a standard extension

Parameters#

log

Event log

parameters

Parameters of the algorithm, including:

  • Parameters.ENABLE_DEEPCOPY => enables the deepcopy of the event log

Returns#

log

Event log, in which some attributes may have been moved from the event to the trace level
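
The lifting of constant attributes can be sketched for a single trace. `lift_constant_attributes` is a hypothetical helper on plain dictionaries, and the list of prefixes treated as standard extensions is an assumption made for this illustration.

```python
from typing import Dict, List, Tuple

# Assumption: attribute prefixes treated as standard extensions in this sketch.
STANDARD_PREFIXES = ("concept:", "time:", "lifecycle:", "org:", "cost:", "identity:")


def lift_constant_attributes(events: List[dict]) -> Tuple[Dict[str, object], List[dict]]:
    """Return (trace_attributes, events without the lifted attributes)."""
    if not events:
        return {}, []
    # Start from the non-standard attributes of the first event.
    candidates = {
        k: v for k, v in events[0].items() if not k.startswith(STANDARD_PREFIXES)
    }
    # Keep only the attributes with the same value in every other event.
    for event in events[1:]:
        candidates = {
            k: v for k, v in candidates.items() if k in event and event[k] == v
        }
    stripped = [{k: v for k, v in e.items() if k not in candidates} for e in events]
    return candidates, stripped


events = [
    {"concept:name": "a", "customer": "X", "amount": 1},
    {"concept:name": "b", "customer": "X", "amount": 2},
]
trace_attributes, new_events = lift_constant_attributes(events)
print(trace_attributes, new_events)
```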

pm4py.objects.log.util.pandas_log_wrapper module#

class pm4py.objects.log.util.pandas_log_wrapper.Parameters(value)[source]#

Bases: Enum

An enumeration.

CASE_ID_KEY = 'pm4py:param:case_id_key'#
CASE_ATTRIBUTE_PREFIX = 'case:'#
class pm4py.objects.log.util.pandas_log_wrapper.PandasTraceWrapper(dataframe: DataFrame, parameters: Optional[Dict[Any, Any]] = None)[source]#

Bases: Sequence

class pm4py.objects.log.util.pandas_log_wrapper.PandasLogWrapper(dataframe: DataFrame, parameters: Optional[Dict[Any, Any]] = None)[source]#

Bases: Sequence

pm4py.objects.log.util.pandas_numpy_variants module#

class pm4py.objects.log.util.pandas_numpy_variants.Parameters(value)[source]#

Bases: Enum

An enumeration.

CASE_ID_KEY = 'pm4py:param:case_id_key'#
ACTIVITY_KEY = 'pm4py:param:activity_key'#
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
INDEX_KEY = 'index_key'#
pm4py.objects.log.util.pandas_numpy_variants.apply(dataframe: DataFrame, parameters=None) Tuple[Dict[Collection[str], int], Dict[str, Collection[str]]][source]#

Efficient method returning the variants from a Pandas dataframe (through Numpy)

Minimum viable example:

import pandas as pd
import pm4py
from pm4py.objects.log.util import pandas_numpy_variants

dataframe = pd.read_csv('tests/input_data/receipt.csv')
dataframe = pm4py.format_dataframe(dataframe)
variants_dict, case_variant = pandas_numpy_variants.apply(dataframe)

Parameters#

dataframe

Dataframe

parameters

Parameters of the algorithm, including:
- Parameters.CASE_ID_KEY => the case identifier
- Parameters.ACTIVITY_KEY => the activity
- Parameters.TIMESTAMP_KEY => the timestamp
- Parameters.INDEX_KEY => the index

Returns#

variants_dict

Dictionary associating to each variant the number of occurrences in the dataframe

case_variant

Dictionary associating to each case identifier the corresponding variant
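
To make the shape of the two returned dictionaries concrete, here is a pure-Python sketch that computes the same kind of result from a list of row dictionaries. It is not the NumPy-based implementation; the function name `variants_from_rows` is hypothetical, and the column names are assumed to be the usual `case:concept:name`, `concept:name`, and `time:timestamp`.

```python
from collections import Counter, defaultdict


def variants_from_rows(rows,
                       case_key="case:concept:name",
                       activity_key="concept:name",
                       timestamp_key="time:timestamp"):
    """Hypothetical helper returning (variants_dict, case_variant)."""
    by_case = defaultdict(list)
    for index, row in enumerate(rows):
        # The row index breaks ties between events with equal timestamps.
        by_case[row[case_key]].append((row[timestamp_key], index, row[activity_key]))
    # A variant is the tuple of activities of a case, ordered by timestamp.
    case_variant = {
        case: tuple(activity for _, _, activity in sorted(events))
        for case, events in by_case.items()
    }
    variants_dict = dict(Counter(case_variant.values()))
    return variants_dict, case_variant


rows = [
    {"case:concept:name": "c1", "concept:name": "A", "time:timestamp": 1},
    {"case:concept:name": "c2", "concept:name": "A", "time:timestamp": 2},
    {"case:concept:name": "c1", "concept:name": "B", "time:timestamp": 3},
    {"case:concept:name": "c2", "concept:name": "B", "time:timestamp": 4},
    {"case:concept:name": "c3", "concept:name": "A", "time:timestamp": 5},
]
variants_dict, case_variant = variants_from_rows(rows)
```

Here `variants_dict` maps `("A", "B")` to 2 and `("A",)` to 1, while `case_variant` maps each of `c1`, `c2`, and `c3` to its activity tuple.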

pm4py.objects.log.util.sampling module#

pm4py.objects.log.util.sampling.sample_stream(event_log, no_events=100)[source]#

Randomly sample a fixed number of events from the original event log

Parameters#

event_log

Event log

no_events

Number of events that the sample should have

Returns#

newLog

Filtered log

pm4py.objects.log.util.sampling.sample_log(log, no_traces=100)[source]#

Randomly sample a fixed number of traces from the original log

Parameters#

log

Log

no_traces

Number of traces that the sample should have

Returns#

newLog

Filtered log

pm4py.objects.log.util.sampling.sample(log, n=100)[source]#

Randomly sample a fixed number of elements (traces or events) from the original log

Parameters#

log

Trace/event log

n

Number of elements that the sample should have

Returns#

newLog

Filtered log
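
The sampling behaviour described above can be sketched with the standard library alone. This is an illustration on plain lists rather than PM4Py log objects; the helper name `sample_items`, the `seed` argument, and the choice to cap the sample at the size of the input are assumptions.

```python
import random


def sample_items(items, n=100, seed=None):
    """Hypothetical helper: draw up to n distinct elements without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    size = min(n, len(items))  # assumption: never ask for more than is available
    return rng.sample(list(items), size)


traces = ["t1", "t2", "t3", "t4", "t5"]
small = sample_items(traces, n=3, seed=42)
everything = sample_items(traces, n=100, seed=42)
```

`small` holds three distinct traces, and `everything` holds all five because the requested size exceeds the length of the input.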

pm4py.objects.log.util.sorting module#

pm4py.objects.log.util.sorting.sort_timestamp_trace(trace, timestamp_key='time:timestamp', reverse_sort=False)[source]#

Sort a trace based on timestamp key

Parameters#

trace

Trace

timestamp_key

Timestamp key

reverse_sort

If True, sorts in descending order instead of the default ascending order

Returns#

trace

Sorted trace

pm4py.objects.log.util.sorting.sort_timestamp_stream(event_log, timestamp_key='time:timestamp', reverse_sort=False)[source]#

Sort an event log based on timestamp key

Parameters#

event_log

Event log

timestamp_key

Timestamp key

reverse_sort

If True, sorts in descending order instead of the default ascending order

Returns#

event_log

Sorted event log

pm4py.objects.log.util.sorting.sort_timestamp_log(event_log, timestamp_key='time:timestamp', reverse_sort=False)[source]#

Sort a log based on timestamp key

Parameters#

event_log

Log

timestamp_key

Timestamp key

reverse_sort

If True, sorts in descending order instead of the default ascending order

Returns#

log

Sorted log

pm4py.objects.log.util.sorting.sort_timestamp(log, timestamp_key='time:timestamp', reverse_sort=False)[source]#

Sort a log based on timestamp key

Parameters#

log

Trace/Event log

timestamp_key

Timestamp key

reverse_sort

If True, sorts in descending order instead of the default ascending order

Returns#

log

Sorted Trace/Event log
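
The effect of `timestamp_key` and `reverse_sort` can be shown on a plain list of event dictionaries. This is a sketch with Python's built-in `sorted`, not the library's code; the helper name `sort_by_timestamp` is hypothetical.

```python
from datetime import datetime


def sort_by_timestamp(events, timestamp_key="time:timestamp", reverse_sort=False):
    """Hypothetical helper: return a new list of events ordered by timestamp."""
    return sorted(events, key=lambda event: event[timestamp_key], reverse=reverse_sort)


events = [
    {"concept:name": "B", "time:timestamp": datetime(2024, 1, 2)},
    {"concept:name": "A", "time:timestamp": datetime(2024, 1, 1)},
    {"concept:name": "C", "time:timestamp": datetime(2024, 1, 3)},
]
ascending = [e["concept:name"] for e in sort_by_timestamp(events)]
descending = [e["concept:name"] for e in sort_by_timestamp(events, reverse_sort=True)]
```

With the default `reverse_sort=False` the oldest event comes first (`A`, `B`, `C`); with `reverse_sort=True` the order is `C`, `B`, `A`.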

pm4py.objects.log.util.sorting.sort_lambda_log(event_log, sort_function, reverse=False)[source]#

Sort a log based on a lambda expression

Parameters#

event_log

Log

sort_function

Sort function

reverse

Boolean (sort by reverse order)

Returns#

new_log

Sorted log

pm4py.objects.log.util.sorting.sort_lambda_stream(event_log, sort_function, reverse=False)[source]#

Sort a stream based on a lambda expression

Parameters#

event_log

Stream

sort_function

Sort function

reverse

Boolean (sort by reverse order)

Returns#

stream

Sorted stream

pm4py.objects.log.util.sorting.sort_lambda(log, sort_function, reverse=False)[source]#

Sort a log based on a lambda expression

Parameters#

log

Log

sort_function

Sort function

reverse

Boolean (sort by reverse order)

Returns#

log

Sorted log
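
As an illustration of sorting by an arbitrary function, the sketch below orders traces by their length. It assumes that `sort_function` plays the role of a key function, as in Python's built-in `sorted`; the helper name `sort_with_key` and the list-of-lists log are hypothetical.

```python
def sort_with_key(log, sort_function, reverse=False):
    """Hypothetical helper: order the elements of log by sort_function(element)."""
    return sorted(log, key=sort_function, reverse=reverse)


log = [["A", "B", "C"], ["A"], ["A", "B"]]
shortest_first = sort_with_key(log, lambda trace: len(trace))
longest_first = sort_with_key(log, lambda trace: len(trace), reverse=True)
```

`shortest_first` starts with the one-event trace, and `longest_first` starts with the three-event trace.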

pm4py.objects.log.util.split_train_test module#

pm4py.objects.log.util.split_train_test.split(log: EventLog, train_percentage: float = 0.8) Tuple[EventLog, EventLog][source]#

Split an event log into a training log and a test log (for machine learning purposes)

Parameters#

log

Event log

train_percentage

Fraction of traces to be included in the training log (from 0.0 to 1.0)

Returns#

training_log

Training event log

test_log

Test event log
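
A random split by fraction can be sketched as follows on a plain list. This is an assumption-laden illustration, not the library's implementation: the helper name `split_items`, the `seed` argument, and the exact rounding rule (rounding the training size down) are choices made for the example.

```python
import math
import random


def split_items(items, train_percentage=0.8, seed=None):
    """Hypothetical helper: randomly partition items into (train, test)."""
    indexes = list(range(len(items)))
    random.Random(seed).shuffle(indexes)
    # Assumption: the training size is rounded down.
    stop = math.floor(len(indexes) * train_percentage)
    # Sorting the chosen indexes keeps the original relative order in each part.
    train = [items[i] for i in sorted(indexes[:stop])]
    test = [items[i] for i in sorted(indexes[stop:])]
    return train, test


traces = [f"trace-{i}" for i in range(10)]
training_log, test_log = split_items(traces, train_percentage=0.8, seed=0)
```

With ten traces and `train_percentage=0.8`, the sketch yields eight training traces and two test traces, and every trace lands in exactly one of the two parts.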

pm4py.objects.log.util.xes module#
