pm4py.statistics.traces.generic.pandas package

Submodules

pm4py.statistics.traces.generic.pandas.case_arrival module

class pm4py.statistics.traces.generic.pandas.case_arrival.Parameters(value)[source]

Bases: enum.Enum

An enumeration.

ACTIVITY_KEY = 'pm4py:param:activity_key'
ATTRIBUTE_KEY = 'pm4py:param:attribute_key'
CASE_ID_KEY = 'case_id_glue'
KEEP_ONCE_PER_CASE = 'keep_once_per_case'
MAX_NO_POINTS_SAMPLE = 'max_no_of_points_to_sample'
START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'

pm4py.statistics.traces.generic.pandas.case_arrival.get_case_arrival_avg(df, parameters=None)[source]

Gets the average time elapsed between consecutive case starts

Parameters
  • df – Pandas dataframe

  • parameters

    Parameters of the algorithm, including:

    Parameters.TIMESTAMP_KEY -> attribute of the log to be used as timestamp

Returns

Average time elapsed between consecutive case starts

Return type

case_arrival_avg
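
As an illustration of the quantity described above, here is a minimal standard-library sketch. It does not call pm4py: the `events` list, the `case_arrival_avg` helper, and the choice of seconds as the unit are all assumptions made for this example, with plain tuples standing in for dataframe rows.

```python
from datetime import datetime

# Hypothetical event records standing in for dataframe rows: (case ID, timestamp).
events = [
    ("c1", datetime(2024, 1, 1, 9, 0)),
    ("c1", datetime(2024, 1, 1, 9, 30)),
    ("c2", datetime(2024, 1, 1, 10, 0)),
    ("c3", datetime(2024, 1, 1, 12, 0)),
    ("c2", datetime(2024, 1, 1, 12, 30)),
]


def case_arrival_avg(records):
    """Average gap, in seconds, between consecutive case start times."""
    # The start of a case is taken to be its earliest event timestamp.
    first_seen = {}
    for case_id, ts in records:
        if case_id not in first_seen or ts < first_seen[case_id]:
            first_seen[case_id] = ts
    starts = sorted(first_seen.values())
    if len(starts) < 2:
        return 0.0  # fewer than two cases: no gap to average
    gaps = [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
    return sum(gaps) / len(gaps)


avg_seconds = case_arrival_avg(events)
```

In this sample the cases start at 09:00, 10:00 and 12:00, so the two gaps are one hour and two hours.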

pm4py.statistics.traces.generic.pandas.case_statistics module

class pm4py.statistics.traces.generic.pandas.case_statistics.Parameters(value)[source]

Bases: enum.Enum

An enumeration.

ACTIVITY_KEY = 'pm4py:param:activity_key'
ATTRIBUTE_KEY = 'pm4py:param:attribute_key'
CASE_ID_KEY = 'case_id_glue'
ENABLE_SORT = 'enable_sort'
MAX_RET_CASES = 'max_ret_cases'
MAX_VARIANTS_TO_RETURN = 'max_variants_to_return'
SORT_ASCENDING = 'sort_ascending'
SORT_BY_COLUMN = 'sort_by_column'
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'
VARIANTS_DF = 'variants_df'

pm4py.statistics.traces.generic.pandas.case_statistics.get_cases_description(df, parameters=None)[source]

Get a description of traces present in the Pandas dataframe

Parameters
  • df – Pandas dataframe

  • parameters

    Parameters of the algorithm, including:

    Parameters.CASE_ID_KEY -> Column that identifies the case ID
    Parameters.TIMESTAMP_KEY -> Column that identifies the timestamp
    Parameters.ENABLE_SORT -> Enable sorting of traces
    Parameters.SORT_BY_COLUMN -> Sort traces inside the dataframe using the specified column. Accepted values: startTime, endTime, caseDuration
    Parameters.SORT_ASCENDING -> Set sort direction (boolean; if true, the sort direction is ascending, otherwise descending)
    Parameters.MAX_RET_CASES -> Set the maximum number of returned traces

Returns

Dictionary mapping each trace to its start timestamp, end timestamp, and duration

Return type

ret
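
The shape of such a description can be sketched with the standard library alone. This is not the pm4py implementation: the `cases_description` helper, its keyword arguments (which mirror the sort and limit parameters listed above), and the use of seconds for the duration are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical event records standing in for dataframe rows: (case ID, timestamp).
events = [
    ("c1", datetime(2024, 1, 1, 9, 0)),
    ("c1", datetime(2024, 1, 1, 9, 30)),
    ("c2", datetime(2024, 1, 1, 10, 0)),
    ("c3", datetime(2024, 1, 1, 12, 0)),
    ("c2", datetime(2024, 1, 1, 12, 30)),
]


def cases_description(records, sort_by="startTime", ascending=True, max_ret_cases=None):
    """Map each case ID to its start time, end time and duration in seconds."""
    summary = {}
    for case_id, ts in records:
        entry = summary.setdefault(case_id, {"startTime": ts, "endTime": ts})
        entry["startTime"] = min(entry["startTime"], ts)
        entry["endTime"] = max(entry["endTime"], ts)
    for entry in summary.values():
        entry["caseDuration"] = (entry["endTime"] - entry["startTime"]).total_seconds()
    # Sort on one of: startTime, endTime, caseDuration.
    ordered = sorted(summary.items(), key=lambda kv: kv[1][sort_by], reverse=not ascending)
    if max_ret_cases is not None:
        ordered = ordered[:max_ret_cases]
    return dict(ordered)


longest_two = cases_description(events, sort_by="caseDuration", ascending=False, max_ret_cases=2)
```

Here `longest_two` keeps only the two longest cases, ordered from longest to shortest.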

pm4py.statistics.traces.generic.pandas.case_statistics.get_events(df, case_id, parameters=None)[source]

Get events belonging to the specified case

Parameters
  • df – Pandas dataframe

  • case_id – Required case ID

  • parameters

    Possible parameters of the algorithm, including:

    Parameters.CASE_ID_KEY -> Column in which the case ID is contained

Returns

List of events belonging to the case

Return type

list_eve
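
Conceptually, this amounts to filtering the rows on the case ID column. The sketch below shows that idea on plain dictionaries rather than a dataframe; the `rows` data, the `events_of_case` helper, and the `case_id` column name are hypothetical.

```python
# Hypothetical rows standing in for dataframe records.
rows = [
    {"case_id": "c1", "activity": "register"},
    {"case_id": "c2", "activity": "register"},
    {"case_id": "c1", "activity": "approve"},
]


def events_of_case(records, case_id, case_id_key="case_id"):
    """Return the events whose case ID column equals the requested case ID."""
    return [row for row in records if row[case_id_key] == case_id]


c1_events = events_of_case(rows, "c1")
```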

pm4py.statistics.traces.generic.pandas.case_statistics.get_kde_caseduration(df, parameters=None)[source]

Gets the kernel density estimate (KDE) of the case durations calculated on the dataframe

Parameters
  • df – Pandas dataframe

  • parameters

    Possible parameters of the algorithm, including:

    Parameters.GRAPH_POINTS -> Number of points to include in the graph
    Parameters.CASE_ID_KEY -> Column hosting the Case ID

Returns

  • x – X-axis values to represent

  • y – Y-axis values to represent
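
To make the x/y output concrete, the following sketch evaluates a hand-rolled Gaussian kernel density estimate on an evenly spaced grid. The estimator and bandwidth rule that pm4py actually uses are not stated above, so the `gaussian_kde_points` helper, the rule-of-thumb bandwidth, and the grid layout are all assumptions. It expects at least two distinct durations and at least two graph points.

```python
import math


def gaussian_kde_points(durations, graph_points=5):
    """Evaluate a Gaussian KDE of the durations on an evenly spaced grid."""
    n = len(durations)
    mean = sum(durations) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in durations) / (n - 1))
    # Rule-of-thumb bandwidth (an assumption, not taken from pm4py).
    bandwidth = 1.06 * std * n ** (-1 / 5)
    lo, hi = min(durations), max(durations)
    step = (hi - lo) / (graph_points - 1)
    xs = [lo + i * step for i in range(graph_points)]
    norm = n * bandwidth * math.sqrt(2 * math.pi)
    ys = [
        sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in durations) / norm
        for x in xs
    ]
    return xs, ys


# Hypothetical case durations in seconds.
xs, ys = gaussian_kde_points([60.0, 120.0, 120.0, 300.0], graph_points=5)
```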

pm4py.statistics.traces.generic.pandas.case_statistics.get_kde_caseduration_json(df, parameters=None)[source]

Gets the kernel density estimate (KDE) of the case durations calculated on the log/dataframe (expressed as JSON)

Parameters
  • df – Pandas dataframe

  • parameters

    Possible parameters of the algorithm, including:

    Parameters.GRAPH_POINTS -> Number of points to include in the graph
    Parameters.CASE_ID_KEY -> Column hosting the Case ID

Returns

JSON representing the graph points

Return type

json

pm4py.statistics.traces.generic.pandas.case_statistics.get_variant_statistics(df, parameters=None)[source]

Get variants from a Pandas dataframe

Parameters
  • df – Dataframe

  • parameters

    Parameters of the algorithm, including:

    Parameters.CASE_ID_KEY -> Column that contains the Case ID
    Parameters.ACTIVITY_KEY -> Column that contains the activity
    Parameters.MAX_VARIANTS_TO_RETURN -> Maximum number of variants to return
    Parameters.VARIANTS_DF -> If provided, avoid recalculation of the variants dataframe

Returns

List of variants inside the Pandas dataframe

Return type

variants_list
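
A variant is the sequence of activities followed by a case, and the statistics count how many cases share each sequence. The sketch below illustrates that idea with the standard library only; the `rows` data, the `variant_statistics` helper, and the shape of the returned dictionaries are assumptions, not the library's actual output format.

```python
from collections import Counter

# Hypothetical (case ID, activity) records, assumed to be in timestamp order.
rows = [
    ("c1", "register"), ("c2", "register"), ("c1", "approve"),
    ("c3", "register"), ("c2", "reject"), ("c3", "approve"),
]


def variant_statistics(records, max_variants_to_return=None):
    """Count how many cases follow each activity sequence, most frequent first."""
    traces = {}
    for case_id, activity in records:
        traces.setdefault(case_id, []).append(activity)
    counts = Counter(tuple(activities) for activities in traces.values())
    # most_common(None) returns every variant; an integer limits the result.
    return [
        {"variant": variant, "count": count}
        for variant, count in counts.most_common(max_variants_to_return)
    ]


stats = variant_statistics(rows)
```

Passing `max_variants_to_return=1` would keep only the most frequent variant, mirroring the role of the limit parameter listed above.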

pm4py.statistics.traces.generic.pandas.case_statistics.get_variants_df(df, parameters=None)[source]

Get variants dataframe from a Pandas dataframe

Parameters
  • df – Dataframe

  • parameters

    Parameters of the algorithm, including:

    Parameters.CASE_ID_KEY -> Column that contains the Case ID
    Parameters.ACTIVITY_KEY -> Column that contains the activity

Returns

Variants dataframe

Return type

variants_df

pm4py.statistics.traces.generic.pandas.case_statistics.get_variants_df_and_list(df, parameters=None)[source]

(Technical method) Provides variants_df and variants_list out of the box

Parameters
  • df – Dataframe

  • parameters

    Parameters of the algorithm, including:

    Parameters.CASE_ID_KEY -> Column that contains the Case ID
    Parameters.ACTIVITY_KEY -> Column that contains the activity

Returns

  • variants_df – Variants dataframe

  • variants_list – List of variants sorted by their count

pm4py.statistics.traces.generic.pandas.case_statistics.get_variants_df_with_case_duration(df, parameters=None)[source]

Get variants dataframe from a Pandas dataframe, with the case duration included

Parameters
  • df – Dataframe

  • parameters

    Parameters of the algorithm, including:

    Parameters.CASE_ID_KEY -> Column that contains the Case ID
    Parameters.ACTIVITY_KEY -> Column that contains the activity
    Parameters.TIMESTAMP_KEY -> Column that contains the timestamp

Returns

Variants dataframe

Return type

variants_df

Module contents