pm4py.algo.transformation.log_to_features.variants package#

This file is part of PM4Py (More Info: https://pm4py.fit.fraunhofer.de).

PM4Py is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

PM4Py is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with PM4Py. If not, see <https://www.gnu.org/licenses/>.

Submodules#

pm4py.algo.transformation.log_to_features.variants.event_based module#

class pm4py.algo.transformation.log_to_features.variants.event_based.Parameters(value)[source]#

Bases: Enum

An enumeration.

STR_EVENT_ATTRIBUTES = 'str_ev_attr'#
NUM_EVENT_ATTRIBUTES = 'num_ev_attr'#
FEATURE_NAMES = 'feature_names'#
MIN_NUM_DIFF_STR_VALUES = 'min_num_diff_str_values'#
MAX_NUM_DIFF_STR_VALUES = 'max_num_diff_str_values'#
pm4py.algo.transformation.log_to_features.variants.event_based.extract_all_ev_features_names_from_log(log: EventLog, str_ev_attr: List[str], num_ev_attr: List[str], parameters: Optional[Dict[Union[str, Parameters], Any]] = None) List[str][source]#

Extracts the feature names from an event log.

Parameters#

log

Event log

str_ev_attr

(if provided) list of string event attributes to consider in extracting the feature names

num_ev_attr

(if provided) list of numeric event attributes to consider in extracting the feature names

parameters
Parameters, including:
  • MIN_NUM_DIFF_STR_VALUES => minimum number of distinct values to include an attribute as feature(s)

  • MAX_NUM_DIFF_STR_VALUES => maximum number of distinct values to include an attribute as feature(s)

Returns#

feature_names

List of feature names

pm4py.algo.transformation.log_to_features.variants.event_based.extract_features(log: EventLog, feature_names: List[str], parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Extracts the matrix of the features from an event log

Parameters#

log

Event log

feature_names

Features to consider (in the given order)

Returns#

data

Data to provide for decision tree learning

feature_names

Names of the features, in order

pm4py.algo.transformation.log_to_features.variants.event_based.apply(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Extracts all the features for the traces of an event log (each trace becomes a vector of vectors, where each event has its own vector)

Parameters#

log

Event log

parameters
Parameters of the algorithm, including:
  • STR_EVENT_ATTRIBUTES => string event attributes to consider in the features extraction

  • NUM_EVENT_ATTRIBUTES => numeric event attributes to consider in the features extraction

  • FEATURE_NAMES => features to consider (in the given order)

Returns#

data

Data to provide for decision tree learning

feature_names

Names of the features, in order
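
The event-based encoding described above can be illustrated with a minimal sketch that does not use PM4Py itself. It assumes that events are plain dictionaries, that string attributes are one-hot encoded and numeric attributes are copied as-is, and that a missing numeric value becomes 0.0. The helper name encode_events and the feature-name format are hypothetical and chosen only for illustration.

```python
def encode_events(traces, str_attrs, num_attrs):
    """Encode every event of every trace as its own feature vector.

    Hypothetical helper: traces is a list of lists of event dictionaries.
    """
    # One column per (string attribute, value) pair, then one per numeric attribute.
    columns = []  # (feature name, attribute, expected value or None)
    for attr in str_attrs:
        values = sorted({ev[attr] for tr in traces for ev in tr if attr in ev})
        columns.extend((f"event:{attr}@{v}", attr, v) for v in values)
    columns.extend((f"event:{attr}", attr, None) for attr in num_attrs)

    data = []
    for tr in traces:
        data.append([
            [
                (1.0 if ev.get(attr) == val else 0.0) if val is not None
                else float(ev.get(attr, 0.0))
                for _, attr, val in columns
            ]
            for ev in tr
        ])
    return data, [name for name, _, _ in columns]


traces = [
    [{"concept:name": "A", "amount": 10}, {"concept:name": "B", "amount": 20}],
    [{"concept:name": "A", "amount": 5}],
]
data, feature_names = encode_events(traces, ["concept:name"], ["amount"])
```

Each trace ends up as a list of vectors (one per event), which is the "vector of vectors" shape mentioned in the description.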

pm4py.algo.transformation.log_to_features.variants.temporal module#

class pm4py.algo.transformation.log_to_features.variants.temporal.Parameters(value)[source]#

Bases: Enum

An enumeration.

ARRIVAL_RATE = 'arrival_rate'#
FINISH_RATE = 'finish_rate'#
CASE_ID_COLUMN = 'pm4py:param:case_id_key'#
START_TIMESTAMP_COLUMN = 'pm4py:param:start_timestamp_key'#
TIMESTAMP_COLUMN = 'pm4py:param:timestamp_key'#
RESOURCE_COLUMN = 'pm4py:param:resource_key'#
ACTIVITY_COLUMN = 'pm4py:param:activity_key'#
GROUPER_FREQ = 'grouper_freq'#
SERVICE_TIME = 'service_time'#
WAITING_TIME = 'waiting_time'#
SOJOURN_TIME = 'sojourn_time'#
DIFF_START_END = 'diff_start_end'#
pm4py.algo.transformation.log_to_features.variants.temporal.apply(log: Union[EventLog, EventStream, DataFrame], parameters: Optional[Dict[Any, Any]] = None) DataFrame[source]#

Extracts temporal features with the provided granularity from the Pandas dataframe.

Implements the approach described in the paper: Pourbafrani, Mahsa, Sebastiaan J. van Zelst, and Wil MP van der Aalst. “Supporting automatic system dynamics model generation for simulation in the context of process mining.” International Conference on Business Information Systems. Springer, Cham, 2020.

Parameters#

log

Event log / Event stream / Pandas dataframe

parameters

Parameters of the algorithm, including:
  • Parameters.GROUPER_FREQ => the time interval to be used for the grouping

  • Parameters.ARRIVAL_RATE => column of the dataframe which is going to host the arrival rate

  • Parameters.FINISH_RATE => column of the dataframe which is going to host the finishing rate

  • Parameters.SERVICE_TIME => column of the dataframe which is going to host the service time

  • Parameters.WAITING_TIME => column of the dataframe which is going to host the waiting time

  • Parameters.SOJOURN_TIME => column of the dataframe which is going to host the sojourn time

  • Parameters.CASE_ID_COLUMN => case ID column in the dataframe (default: case:concept:name)

  • Parameters.ACTIVITY_COLUMN => activity column in the dataframe (default: concept:name)

  • Parameters.TIMESTAMP_COLUMN => timestamp column in the dataframe (default: time:timestamp)

  • Parameters.RESOURCE_COLUMN => resource column in the dataframe (default: org:resource)

  • Parameters.START_TIMESTAMP_COLUMN => start timestamp column in the dataframe (if not provided, the timestamp column is used)

Returns#

features_df

Dataframe with temporal features
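
To give an intuition for grouping by a time interval, the following sketch counts how many cases start and how many cases finish on each calendar day. It is a simplification under stated assumptions: it uses only the standard library, fixes the granularity to one day, and reports plain counts, which may differ from the exact rate definitions used by PM4Py and the cited paper. The function name arrival_finish_counts is hypothetical.

```python
from collections import Counter
from datetime import datetime


def arrival_finish_counts(events):
    """events: iterable of (case_id, timestamp) pairs.

    Returns {day: (cases started that day, cases finished that day)}.
    """
    first, last = {}, {}
    for case_id, ts in events:
        first[case_id] = min(ts, first.get(case_id, ts))
        last[case_id] = max(ts, last.get(case_id, ts))
    arrivals = Counter(ts.date() for ts in first.values())
    finishes = Counter(ts.date() for ts in last.values())
    days = sorted(set(arrivals) | set(finishes))
    # A Counter returns 0 for days on which nothing was counted.
    return {day: (arrivals[day], finishes[day]) for day in days}


events = [
    ("c1", datetime(2024, 1, 1, 9)), ("c1", datetime(2024, 1, 2, 10)),
    ("c2", datetime(2024, 1, 1, 11)), ("c2", datetime(2024, 1, 1, 15)),
    ("c3", datetime(2024, 1, 2, 8)), ("c3", datetime(2024, 1, 3, 8)),
]
per_day = arrival_finish_counts(events)
```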

pm4py.algo.transformation.log_to_features.variants.trace_based module#

class pm4py.algo.transformation.log_to_features.variants.trace_based.Parameters(value)[source]#

Bases: Enum

An enumeration.

ENABLE_ACTIVITY_DEF_REPRESENTATION = 'enable_activity_def_representation'#
ENABLE_SUCC_DEF_REPRESENTATION = 'enable_succ_def_representation'#
STR_TRACE_ATTRIBUTES = 'str_tr_attr'#
STR_EVENT_ATTRIBUTES = 'str_ev_attr'#
NUM_TRACE_ATTRIBUTES = 'num_tr_attr'#
NUM_EVENT_ATTRIBUTES = 'num_ev_attr'#
STR_EVSUCC_ATTRIBUTES = 'str_evsucc_attr'#
FEATURE_NAMES = 'feature_names'#
ACTIVITY_KEY = 'pm4py:param:activity_key'#
START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'#
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
CASE_ID_KEY = 'pm4py:param:case_id_key'#
RESOURCE_KEY = 'pm4py:param:resource_key'#
EPSILON = 'epsilon'#
DEFAULT_NOT_PRESENT = 'default_not_present'#
ENABLE_ALL_EXTRA_FEATURES = 'enable_all_extra_features'#
ENABLE_CASE_DURATION = 'enable_case_duration'#
ADD_CASE_IDENTIFIER_COLUMN = 'add_case_identifier_column'#
ENABLE_TIMES_FROM_FIRST_OCCURRENCE = 'enable_times_from_first_occurrence'#
ENABLE_TIMES_FROM_LAST_OCCURRENCE = 'enable_times_from_last_occurrence'#
ENABLE_DIRECT_PATHS_TIMES_LAST_OCC = 'enable_direct_paths_times_last_occ'#
ENABLE_INDIRECT_PATHS_TIMES_LAST_OCC = 'enable_indirect_paths_times_last_occ'#
ENABLE_WORK_IN_PROGRESS = 'enable_work_in_progress'#
ENABLE_RESOURCE_WORKLOAD = 'enable_resource_workload'#
ENABLE_FIRST_LAST_ACTIVITY_INDEX = 'enable_first_last_activity_index'#
ENABLE_MAX_CONCURRENT_EVENTS = 'enable_max_concurrent_events'#
ENABLE_MAX_CONCURRENT_EVENTS_PER_ACTIVITY = 'enable_max_concurrent_events_per_activity'#
CASE_ATTRIBUTE_PREFIX = 'case:'#
pm4py.algo.transformation.log_to_features.variants.trace_based.max_concurrent_events(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Counts for every trace the maximum number of events (of any activity) that happen concurrently (i.e., their time intervals [st1, ct1] and [st2, ct2] have a non-empty intersection).

Parameters#

log

Event log

parameters

Parameters of the algorithm

Returns#

data

Numeric value of the features

feature_names

Names of the features
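
The notion of concurrency used here (closed intervals with a non-empty intersection) can be sketched with a sweep over interval endpoints. This is an illustrative stand-alone function, not PM4Py's implementation; the name max_concurrent and the (start, complete) tuple input are assumptions.

```python
def max_concurrent(intervals):
    """Return the maximum number of closed intervals [start, complete] that overlap."""
    points = []
    for start, complete in intervals:
        points.append((start, 0))     # 0 sorts before 1, so at equal times
        points.append((complete, 1))  # an interval opens before another closes
    points.sort()

    current = best = 0
    for _, kind in points:
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best


overlapping = max_concurrent([(0, 5), (3, 8), (5, 10)])
disjoint = max_concurrent([(0, 1), (2, 3)])
```

Because the intervals are closed, two events that merely touch at a single instant (one completes at 5, the other starts at 5) are counted as concurrent in this sketch.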

pm4py.algo.transformation.log_to_features.variants.trace_based.max_concurrent_events_per_activity(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Counts for every trace and every activity the maximum number of events of the given activity that happen concurrently (i.e., their time intervals [st1, ct1] and [st2, ct2] have a non-empty intersection).

Parameters#

log

Event log

parameters

Parameters of the algorithm

Returns#

data

Numeric value of the features

feature_names

Names of the features

pm4py.algo.transformation.log_to_features.variants.trace_based.resource_workload(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Calculates for each case, and for each resource of the log, the workload of the resource during the lead time of the case. A default value is used if a resource is not contained in a case.

Parameters#

log

Event log

parameters

Parameters of the algorithm

Returns#

data

Numeric value of the features

feature_names

Names of the features

pm4py.algo.transformation.log_to_features.variants.trace_based.work_in_progress(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Calculates for each case the number of cases which are open during the lead time of the case.

Parameters#

log

Event log

parameters

Parameters of the algorithm

Returns#

data

Numeric value of the features

feature_names

Names of the features
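
As a rough illustration of the work-in-progress idea, the sketch below counts, for each case, how many cases have a lead-time interval that intersects its own. It is a quadratic stand-alone version written for clarity; the function name, the dictionary input, and the choice to count the case itself are assumptions rather than a description of PM4Py's code.

```python
def work_in_progress(cases):
    """cases: {case_id: (start, end)}.

    Returns {case_id: number of cases open during that case's lead time},
    counting the case itself.
    """
    return {
        cid: sum(1 for s2, e2 in cases.values() if s2 <= e and s <= e2)
        for cid, (s, e) in cases.items()
    }


wip = work_in_progress({"c1": (0, 10), "c2": (5, 15), "c3": (20, 30)})
```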

pm4py.algo.transformation.log_to_features.variants.trace_based.indirect_paths_times_last_occ(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Calculates for each case, and for each indirect path of the case, the difference between the start timestamp of the second event and the completion timestamp of the first event. A default value is used if a path is not present in a case.

Parameters#

log

Event log

parameters

Parameters of the algorithm

Returns#

data

Numeric value of the features

feature_names

Names of the features

pm4py.algo.transformation.log_to_features.variants.trace_based.direct_paths_times_last_occ(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Calculates for each case, and for each direct path of the case, the difference between the start timestamp of the second event and the completion timestamp of the first event. A default value is used if a path is not present in a case.

Parameters#

log

Event log

parameters

Parameters of the algorithm

Returns#

data

Numeric value of the features

feature_names

Names of the features
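
A minimal sketch of the direct-path timing, assuming each event is an (activity, start, complete) tuple with numeric timestamps and that later occurrences of the same path overwrite earlier ones. The helper name direct_path_times and the default of 0 are illustrative choices, not PM4Py's API.

```python
def direct_path_times(trace, all_paths, default=0):
    """Return one value per path in all_paths for a single trace.

    trace: list of (activity, start, complete) tuples, ordered by time.
    """
    times = {}
    for (a1, _, complete1), (a2, start2, _) in zip(trace, trace[1:]):
        # Overwriting keeps the value of the last occurrence of the path.
        times[(a1, a2)] = start2 - complete1
    return [times.get(path, default) for path in all_paths]


trace = [("A", 0, 2), ("B", 5, 6), ("A", 7, 8), ("B", 12, 13)]
row = direct_path_times(trace, [("A", "B"), ("B", "A"), ("B", "C")])
```

The indirect variant would follow the same pattern, but pair each event with every later event (i, j) instead of only its direct successor (i, i+1).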

pm4py.algo.transformation.log_to_features.variants.trace_based.times_from_first_occurrence_activity_case(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Calculates for each case, and for each activity, the time from the start of the case to the first occurrence of the activity, and the time from that occurrence to the end of the case.

Parameters#

log

Event log

parameters

Parameters of the algorithm

Returns#

data

Numeric value of the features

feature_names

Names of the features
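
The two durations per activity can be sketched as follows, assuming a trace is a time-ordered list of (activity, timestamp) pairs with numeric timestamps. The function name and the default value for absent activities are hypothetical.

```python
def times_from_first_occurrence(trace, activities, default=0):
    """Return [from_start, to_end] for each activity, flattened into one row."""
    case_start, case_end = trace[0][1], trace[-1][1]
    first_seen = {}
    for activity, ts in trace:
        first_seen.setdefault(activity, ts)  # keeps only the first timestamp

    row = []
    for activity in activities:
        if activity in first_seen:
            ts = first_seen[activity]
            row += [ts - case_start, case_end - ts]
        else:
            row += [default, default]
    return row


trace = [("A", 0), ("B", 4), ("A", 6), ("C", 10)]
row = times_from_first_occurrence(trace, ["A", "B", "D"])
```

Replacing setdefault with a plain assignment would yield the last-occurrence variant instead.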

pm4py.algo.transformation.log_to_features.variants.trace_based.times_from_last_occurrence_activity_case(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Calculates for each case, and for each activity, the time from the start of the case to the last occurrence of the activity, and the time from that occurrence to the end of the case.

Parameters#

log

Event log

parameters

Parameters of the algorithm

Returns#

data

Numeric value of the features

feature_names

Names of the features

pm4py.algo.transformation.log_to_features.variants.trace_based.first_last_activity_index_trace(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Uses the first and the last index of each activity inside a case as features.

Parameters#

log

Event log

parameters

Parameters, including:
  • Parameters.ACTIVITY_KEY => the attribute to use as activity

  • Parameters.DEFAULT_NOT_PRESENT => the replacement value for activities that are not present for the specific case

Returns#

data

Numeric value of the features

feature_names

Names of the features
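
A small stand-alone sketch of this feature, assuming a trace is reduced to its list of activity labels and that -1 plays the role of the replacement value for absent activities. Both the function name and that default are illustrative assumptions.

```python
def first_last_index(trace, activities, default=-1):
    """Return [first_index, last_index] per activity, flattened into one row."""
    first, last = {}, {}
    for idx, activity in enumerate(trace):
        first.setdefault(activity, idx)
        last[activity] = idx

    row = []
    for activity in activities:
        row += [first.get(activity, default), last.get(activity, default)]
    return row


row = first_last_index(["A", "B", "A", "C"], ["A", "B", "D"])
```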

pm4py.algo.transformation.log_to_features.variants.trace_based.case_duration(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Calculates for each case, the case duration (and adds it as a feature)

Parameters#

log

Event log

parameters

Parameters of the algorithm

Returns#

data

Numeric value of the features

feature_names

Names of the features

pm4py.algo.transformation.log_to_features.variants.trace_based.get_string_trace_attribute_rep(trace: Trace, trace_attribute: str) str[source]#

Get a representation of the feature name associated to a string trace attribute value

Parameters#

trace

Trace of the log

trace_attribute

Attribute of the trace to consider

Returns#

rep

Representation of the feature name associated to a string trace attribute value

pm4py.algo.transformation.log_to_features.variants.trace_based.get_all_string_trace_attribute_values(log: EventLog, trace_attribute: str) List[str][source]#

Get all string trace attribute values representations for a log

Parameters#

log

Trace log

trace_attribute

Attribute of the trace to consider

Returns#

list

List containing for each trace a representation of the feature name associated to the attribute

pm4py.algo.transformation.log_to_features.variants.trace_based.get_string_event_attribute_rep(event: Event, event_attribute: str) str[source]#

Get a representation of the feature name associated to a string event attribute value

Parameters#

event

Single event of a trace

event_attribute

Event attribute to consider

Returns#

rep

Representation of the feature name associated to a string event attribute value

pm4py.algo.transformation.log_to_features.variants.trace_based.get_values_event_attribute_for_trace(trace: Trace, event_attribute: str) Set[str][source]#

Get all the representations for the events of a trace associated to a string event attribute values

Parameters#

trace

Trace of the log

event_attribute

Event attribute to consider

Returns#

values

All feature names present for the given attribute in the given trace

pm4py.algo.transformation.log_to_features.variants.trace_based.get_all_string_event_attribute_values(log: EventLog, event_attribute: str) List[str][source]#

Get all the representations for all the traces of the log associated to a string event attribute values

Parameters#

log

Event log

event_attribute

Event attribute to consider

Returns#

values

All feature names present for the given attribute in the given log

pm4py.algo.transformation.log_to_features.variants.trace_based.get_string_event_attribute_succession_rep(event1: Event, event2: Event, event_attribute: str) str[source]#

Get a representation of the feature name associated to a succession of string event attribute values (the value of event1 followed by the value of event2)

Parameters#

event1

First event of the succession

event2

Second event of the succession

event_attribute

Event attribute to consider

Returns#

rep

Representation of the feature name associated to the succession of string event attribute values

pm4py.algo.transformation.log_to_features.variants.trace_based.get_values_event_attribute_succession_for_trace(trace: Trace, event_attribute: str) Set[str][source]#

Get all the representations for the events of a trace associated to a string event attribute succession values

Parameters#

trace

Trace of the log

event_attribute

Event attribute to consider

Returns#

values

All feature names present for the given attribute succession in the given trace
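
The succession representation can be pictured as one feature name per pair of consecutive attribute values. The sketch below assumes events are dictionaries; the name format built here is a hypothetical one chosen for illustration and is not guaranteed to match the strings PM4Py produces.

```python
def succession_values(trace, attribute):
    """Return the set of succession feature names found in a single trace."""
    return {
        f"succession:{attribute}@{e1[attribute]}#{e2[attribute]}"
        for e1, e2 in zip(trace, trace[1:])
        if attribute in e1 and attribute in e2
    }


trace = [{"concept:name": "A"}, {"concept:name": "B"}, {"concept:name": "A"}]
values = succession_values(trace, "concept:name")
```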

pm4py.algo.transformation.log_to_features.variants.trace_based.get_all_string_event_succession_attribute_values(log: EventLog, event_attribute: str) List[str][source]#

Get all the representations for all the traces of the log associated to a string event attribute succession values

Parameters#

log

Event log

event_attribute

Event attribute to consider

Returns#

values

All feature names present for the given attribute succession in the given log

pm4py.algo.transformation.log_to_features.variants.trace_based.get_numeric_trace_attribute_rep(trace_attribute: str) str[source]#

Get the feature name associated to a numeric trace attribute

Parameters#

trace_attribute

Name of the trace attribute

Returns#

feature_name

Name of the feature

pm4py.algo.transformation.log_to_features.variants.trace_based.get_numeric_trace_attribute_value(trace: Trace, trace_attribute: str) Union[int, float][source]#

Get the value of a numeric trace attribute from a given trace

Parameters#

trace

Trace of the log

trace_attribute

Name of the trace attribute

Returns#

value

Value of the numeric trace attribute for the given trace

pm4py.algo.transformation.log_to_features.variants.trace_based.get_numeric_event_attribute_rep(event_attribute: str) str[source]#

Get the feature name associated to a numeric event attribute

Parameters#

event_attribute

Name of the event attribute

Returns#

feature_name

Name of the feature

pm4py.algo.transformation.log_to_features.variants.trace_based.get_numeric_event_attribute_value(event: Event, event_attribute: str) Union[int, float][source]#

Get the value of a numeric event attribute from a given event

Parameters#

event

Event

event_attribute

Name of the event attribute

Returns#

value

Value of the numeric event attribute for the given event

pm4py.algo.transformation.log_to_features.variants.trace_based.get_numeric_event_attribute_value_trace(trace: Trace, event_attribute: str) Union[int, float][source]#

Get the value of the last occurrence of a numeric event attribute given a trace

Parameters#

trace

Trace of the log

event_attribute

Name of the event attribute

Returns#

value

Value of the last occurrence of a numeric event attribute for the given trace

pm4py.algo.transformation.log_to_features.variants.trace_based.get_default_representation_with_attribute_names(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None, feature_names: Optional[List[str]] = None) Tuple[Any, List[str], List[str], List[str], List[str], List[str]][source]#

Gets the default data representation of an event log (for decision tree building), returning also the attribute names

Parameters#

log

Trace log

parameters

Possible parameters of the algorithm

feature_names

(If provided) Features to use in the representation of the log

Returns#

data

Data to provide for decision tree learning

feature_names

Names of the features, in order

pm4py.algo.transformation.log_to_features.variants.trace_based.get_default_representation(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None, feature_names: Optional[List[str]] = None) Tuple[Any, List[str]][source]#

Gets the default data representation of an event log (for decision tree building)

Parameters#

log

Trace log

parameters

Possible parameters of the algorithm

feature_names

(If provided) Features to use in the representation of the log

Returns#

data

Data to provide for decision tree learning

feature_names

Names of the features, in order

pm4py.algo.transformation.log_to_features.variants.trace_based.get_representation(log: EventLog, str_tr_attr: List[str], str_ev_attr: List[str], num_tr_attr: List[str], num_ev_attr: List[str], str_evsucc_attr: Optional[List[str]] = None, feature_names: Optional[List[str]] = None) Tuple[Any, List[str]][source]#

Get a representation of the event log that is suited for the data part of the decision tree learning

NOTE: this function only encodes the last value seen for each attribute

Parameters#

log

Trace log

str_tr_attr

List of string trace attributes to consider in data vector creation

str_ev_attr

List of string event attributes to consider in data vector creation

num_tr_attr

List of numeric trace attributes to consider in data vector creation

num_ev_attr

List of numeric event attributes to consider in data vector creation

str_evsucc_attr

List of string event attributes whose successions of values are considered in data vector creation

feature_names

(If provided) Features to use in the representation of the log

Returns#

data

Data to provide for decision tree learning

feature_names

Names of the features, in order
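
To make the note about "the last value seen" concrete, here is a minimal trace-level encoding sketch that is independent of PM4Py. It assumes events are dictionaries, marks a string value with 1.0 if it occurs anywhere in the trace, and keeps only the last value of each numeric event attribute (0.0 when absent). The function name and the feature-name format are hypothetical.

```python
def encode_traces(traces, str_ev_attr, num_ev_attr):
    """Encode every trace as a single feature vector."""
    # One column per (string attribute, value) pair observed in the log.
    str_columns = sorted(
        {(a, ev[a]) for tr in traces for ev in tr for a in str_ev_attr if a in ev}
    )
    names = [f"event:{a}@{v}" for a, v in str_columns]
    names += [f"event:{a}" for a in num_ev_attr]

    data = []
    for tr in traces:
        present = {(a, ev[a]) for ev in tr for a in str_ev_attr if a in ev}
        row = [1.0 if col in present else 0.0 for col in str_columns]
        for a in num_ev_attr:
            values = [ev[a] for ev in tr if a in ev]
            # Only the last value seen in the trace is kept.
            row.append(float(values[-1]) if values else 0.0)
        data.append(row)
    return data, names


traces = [
    [{"concept:name": "A", "amount": 10}, {"concept:name": "B", "amount": 20}],
    [{"concept:name": "A", "amount": 5}],
]
data, feature_names = encode_traces(traces, ["concept:name"], ["amount"])
```

In the first trace the amount 10 is discarded in favour of 20, which is the information loss the note warns about.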

pm4py.algo.transformation.log_to_features.variants.trace_based.apply(log: EventLog, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) Tuple[Any, List[str]][source]#

Extract the features from an event log (a vector for each trace)

Parameters#

log

Log

parameters

Parameters of the algorithm, including:
  • STR_TRACE_ATTRIBUTES => string trace attributes to consider in the features extraction

  • STR_EVENT_ATTRIBUTES => string event attributes to consider in the features extraction

  • NUM_TRACE_ATTRIBUTES => numeric trace attributes to consider in the features extraction

  • NUM_EVENT_ATTRIBUTES => numeric event attributes to consider in the features extraction

  • STR_EVSUCC_ATTRIBUTES => succession of event attributes to consider in the features extraction

  • FEATURE_NAMES => features to consider (in the given order)

  • ENABLE_ALL_EXTRA_FEATURES => enables all the extra features

  • ENABLE_CASE_DURATION => enables the case duration as additional feature

  • ENABLE_TIMES_FROM_FIRST_OCCURRENCE => enables the addition of the times from the start of the case to the first occurrence of an activity, and from that occurrence to the end of the case

  • ADD_CASE_IDENTIFIER_COLUMN => adds the case identifier (string) as column of the feature table (default: False)

  • ENABLE_TIMES_FROM_LAST_OCCURRENCE => enables the addition of the times from the start of the case to the last occurrence of an activity, and from that occurrence to the end of the case

  • ENABLE_DIRECT_PATHS_TIMES_LAST_OCC => adds the duration of the last occurrence of a direct (i, i+1) path in the case as feature

  • ENABLE_INDIRECT_PATHS_TIMES_LAST_OCC => adds the duration of the last occurrence of an indirect (i, j) path in the case as feature

  • ENABLE_WORK_IN_PROGRESS => enables the work in progress (number of concurrent cases) as a feature

  • ENABLE_RESOURCE_WORKLOAD => enables the resource workload as a feature

  • ENABLE_FIRST_LAST_ACTIVITY_INDEX => enables the insertion of the indexes of the activities as features

  • ENABLE_MAX_CONCURRENT_EVENTS => enables the count of the maximum number of concurrent events inside a case

  • ENABLE_MAX_CONCURRENT_EVENTS_PER_ACTIVITY => enables the count of the maximum number of concurrent events per activity

Returns#

data

Data to provide for decision tree learning

feature_names

Names of the features, in order