pm4py.statistics.attributes.pandas package

Submodules

pm4py.statistics.attributes.pandas.get module

class pm4py.statistics.attributes.pandas.get.Parameters(value)

Bases: enum.Enum

Enumeration of the parameter keys accepted by the functions of this module.

ACTIVITY_KEY = 'pm4py:param:activity_key'
ATTRIBUTE_KEY = 'pm4py:param:attribute_key'
CASE_ID_KEY = 'case_id_glue'
KEEP_ONCE_PER_CASE = 'keep_once_per_case'
MAX_NO_POINTS_SAMPLE = 'max_no_of_points_to_sample'
START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'
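
The functions in this module read their options from a `parameters` dictionary keyed by members of this enumeration. The sketch below illustrates that convention with a local copy of the enum (so it runs without pm4py installed) and a hypothetical `lookup_param` helper that falls back to a default when a key is absent. The helper is an assumption for illustration, not pm4py's own lookup code.

```python
from enum import Enum


class Parameters(Enum):
    # Local copy of the documented keys, for illustration only
    ACTIVITY_KEY = "pm4py:param:activity_key"
    ATTRIBUTE_KEY = "pm4py:param:attribute_key"
    CASE_ID_KEY = "case_id_glue"
    KEEP_ONCE_PER_CASE = "keep_once_per_case"
    MAX_NO_POINTS_SAMPLE = "max_no_of_points_to_sample"
    START_TIMESTAMP_KEY = "pm4py:param:start_timestamp_key"
    TIMESTAMP_KEY = "pm4py:param:timestamp_key"


def lookup_param(key, parameters, default):
    # Hypothetical helper: value stored under the enum member, else the default
    if parameters is None:
        return default
    return parameters.get(key, default)


# A caller overrides only the options it cares about
parameters = {
    Parameters.CASE_ID_KEY: "case:concept:name",
    Parameters.KEEP_ONCE_PER_CASE: True,
}

keep_once = lookup_param(Parameters.KEEP_ONCE_PER_CASE, parameters, False)  # True
timestamp_key = lookup_param(Parameters.TIMESTAMP_KEY, parameters, "time:timestamp")  # default used
```

Passing `parameters=None` is treated the same as an empty dictionary in this sketch, so every option takes its default.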

pm4py.statistics.attributes.pandas.get.get_attribute_values(df, attribute_key, parameters=None)

Returns the values contained in the specified column of the dataframe, along with their counts.

Parameters
  • df – Pandas dataframe

  • attribute_key – Attribute (column) whose values and counts are requested

  • parameters – Possible parameters of the algorithm

Returns

Values in the specified column, along with their counts

Return type

dict (attributes_values_dict), mapping each value to its count
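
As a rough illustration of what this function computes, the following pandas sketch counts the values of one column. It assumes the common pm4py column names `case:concept:name` and `concept:name`, and it models `Parameters.KEEP_ONCE_PER_CASE` as "count each value at most once per case". `count_attribute_values` is a hypothetical name; this is not the library's implementation.

```python
import pandas as pd


def count_attribute_values(df, attribute_key, case_id_key="case:concept:name",
                           keep_once_per_case=False):
    # Hypothetical sketch: map each value of `attribute_key` to its number of occurrences
    if keep_once_per_case:
        # Keep a single row per (case, value) pair so repeats inside a case count once
        df = df.drop_duplicates(subset=[case_id_key, attribute_key])
    counts = df[attribute_key].value_counts()
    return {value: int(count) for value, count in counts.items()}


log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c2", "c2"],
    "concept:name": ["register", "check", "check", "register", "pay"],
})

print(count_attribute_values(log, "concept:name"))
# counts: register 2, check 2, pay 1
print(count_attribute_values(log, "concept:name", keep_once_per_case=True))
# counts: register 2, check 1, pay 1
```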

pm4py.statistics.attributes.pandas.get.get_events_distribution(df: pandas.core.frame.DataFrame, distr_type: str = 'days_month', parameters: Optional[Dict[str, Any]] = None) → Tuple[List[str], List[int]]

Gets the distribution of the events in the specified dimension

Parameters
  • df – Dataframe

  • distr_type – Type of distribution:

      - days_month => distribution of the events among the days of a month (from 1 to 31)
      - months => distribution of the events among the months (from 1 to 12)
      - years => distribution of the events among the years of the event log
      - hours => distribution of the events among the hours of a day (from 0 to 23)
      - days_week => distribution of the events among the days of a week (from Monday to Sunday)

  • parameters – Parameters of the algorithm, including:

      - Parameters.TIMESTAMP_KEY

Returns

  • x – Points (of the X-axis)

  • y – Points (of the Y-axis)
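
The following pandas sketch shows one way to compute these (x, y) lists from a timestamp column. The function name `events_distribution`, the default column `time:timestamp`, the string labels on the X-axis, and the choice to span every year between the earliest and the latest one are assumptions for illustration, not a description of pm4py's internals.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def events_distribution(df, distr_type="days_month", timestamp_key="time:timestamp"):
    # Hypothetical sketch: count events per bucket of the chosen time dimension
    ts = pd.to_datetime(df[timestamp_key])
    if distr_type == "days_month":
        keys, domain = ts.dt.day, list(range(1, 32))
    elif distr_type == "months":
        keys, domain = ts.dt.month, list(range(1, 13))
    elif distr_type == "years":
        keys = ts.dt.year
        domain = list(range(int(keys.min()), int(keys.max()) + 1))
    elif distr_type == "hours":
        keys, domain = ts.dt.hour, list(range(24))
    elif distr_type == "days_week":
        keys, domain = ts.dt.dayofweek, list(range(7))  # Monday == 0
    else:
        raise ValueError(f"unsupported distr_type: {distr_type}")
    counts = keys.value_counts()
    # Buckets with no events get an explicit zero
    y = [int(counts.get(k, 0)) for k in domain]
    x = list(WEEKDAYS) if distr_type == "days_week" else [str(k) for k in domain]
    return x, y


log = pd.DataFrame({"time:timestamp": ["2023-01-02 08:15", "2023-01-02 17:40",
                                       "2023-03-15 08:05", "2024-07-06 23:59"]})
x, y = events_distribution(log, "days_week")
# Monday: 2, Wednesday: 1, Saturday: 1, other days: 0
```

Filling empty buckets with zeros keeps the X-axis complete (all 31 days, all 24 hours), which is usually what a bar chart needs.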

pm4py.statistics.attributes.pandas.get.get_kde_date_attribute(df, attribute='time:timestamp', parameters=None)

Gets the kernel density estimate (KDE) of the distribution of the values of a date attribute.

Parameters
  • df – Pandas dataframe

  • attribute – Date attribute to analyse

  • parameters

    Possible parameters of the algorithm, including:

    graph_points -> number of points to include in the graph

Returns

  • x – X-axis values to represent

  • y – Y-axis values to represent
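
The estimator used by the library is not documented here. The sketch below assumes a Gaussian kernel with Scott's bandwidth rule, applied to timestamps converted to seconds since the Unix epoch and evaluated on `graph_points` evenly spaced points between the earliest and latest date. `kde_date_attribute` is a hypothetical name, and the timestamps are assumed to be timezone-naive.

```python
import numpy as np
import pandas as pd


def kde_date_attribute(df, attribute="time:timestamp", graph_points=200):
    # Hypothetical sketch: Gaussian KDE over a date column, evaluated on a regular grid
    ts = pd.to_datetime(df[attribute].dropna())
    # Work in seconds since the Unix epoch (timestamps assumed timezone-naive)
    seconds = (ts - pd.Timestamp(0)).dt.total_seconds().to_numpy()
    n = seconds.size
    # Scott's rule bandwidth; needs at least two distinct values
    bandwidth = seconds.std(ddof=1) * n ** (-1 / 5)
    grid = np.linspace(seconds.min(), seconds.max(), graph_points)
    z = (grid[:, None] - seconds[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    # Map the grid back to timestamps for the X-axis
    x = list(pd.to_datetime(grid, unit="s"))
    return x, list(density)


log = pd.DataFrame({"time:timestamp": ["2023-01-01", "2023-01-02", "2023-01-02",
                                       "2023-01-03", None]})
x, y = kde_date_attribute(log, graph_points=50)
```

The Y values are densities per second, because the estimate is computed on epoch seconds; only their relative shape matters for plotting.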

pm4py.statistics.attributes.pandas.get.get_kde_date_attribute_json(df, attribute='time:timestamp', parameters=None)

Gets the kernel density estimate (KDE) of the distribution of the values of a date attribute, expressed as JSON.

Parameters
  • df – Pandas dataframe

  • attribute – Date attribute to analyse

  • parameters

    Possible parameters of the algorithm, including:

    graph_points -> number of points to include in the graph

Returns

JSON representing the graph points

Return type

json
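
The exact JSON layout is not specified in this reference. Assuming the graph points are serialized as a list of `[x, y]` pairs with each date converted to a numeric timestamp, the encoding step could look like the sketch below. `date_kde_to_json` is a hypothetical name, and treating naive datetimes as UTC is a choice made for the example.

```python
import json
from datetime import datetime, timezone


def date_kde_to_json(x, y):
    # Hypothetical layout: a list of [seconds since the epoch, density] pairs;
    # naive datetimes are treated as UTC here
    points = [
        [dt.replace(tzinfo=timezone.utc).timestamp(), float(density)]
        for dt, density in zip(x, y)
    ]
    return json.dumps(points)


payload = date_kde_to_json([datetime(1970, 1, 1, 0, 0, 10), datetime(1970, 1, 2)], [0.5, 0.25])
print(payload)  # [[10.0, 0.5], [86400.0, 0.25]]
```

`pandas.Timestamp` objects also support `replace(tzinfo=...)` and `timestamp()`, so the same function accepts the X values of a pandas-based estimate.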

pm4py.statistics.attributes.pandas.get.get_kde_numeric_attribute(df, attribute, parameters=None)

Gets the kernel density estimate (KDE) of the distribution of the values of a numeric attribute.

Parameters
  • df – Pandas dataframe

  • attribute – Numeric attribute to analyse

  • parameters

    Possible parameters of the algorithm, including:

    graph_points -> number of points to include in the graph

Returns

  • x – X-axis values to represent

  • y – Y-axis values to represent
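
The estimator is not specified in this reference. The sketch below assumes a Gaussian kernel with Scott's bandwidth rule (the default rule of `scipy.stats.gaussian_kde`), evaluated on `graph_points` evenly spaced points between the smallest and largest value. `kde_numeric` is a hypothetical name, not pm4py's implementation.

```python
import numpy as np


def kde_numeric(values, graph_points=200):
    # Hypothetical sketch: Gaussian KDE with Scott's rule bandwidth
    data = np.asarray(values, dtype=float)
    data = data[~np.isnan(data)]  # ignore missing values
    n = data.size
    bandwidth = data.std(ddof=1) * n ** (-1 / 5)  # needs >= 2 distinct values
    x = np.linspace(data.min(), data.max(), graph_points)
    # Average of one Gaussian bump centred on every observation
    z = (x[:, None] - data[None, :]) / bandwidth
    y = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    return x.tolist(), y.tolist()


x, y = kde_numeric([10.0, 20.0, 20.0, 30.0, float("nan")], graph_points=5)
```

With a dataframe, the column can be passed directly, e.g. `kde_numeric(df["amount"])`, where `amount` stands for any numeric column.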

pm4py.statistics.attributes.pandas.get.get_kde_numeric_attribute_json(df, attribute, parameters=None)

Gets the kernel density estimate (KDE) of the distribution of the values of a numeric attribute, expressed as JSON.

Parameters
  • df – Pandas dataframe

  • attribute – Numeric attribute to analyse

  • parameters

    Possible parameters of the algorithm, including:

    graph_points -> number of points to include in the graph

Returns

JSON representing the graph points

Return type

json
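
Assuming the graph points are encoded as a JSON list of `[x, y]` pairs (the layout is not documented here), a consumer such as a web front end only needs to decode the string and split the coordinates. The round trip below is a sketch; both function names are hypothetical.

```python
import json


def numeric_kde_to_json(x, y):
    # Hypothetical layout: a JSON list of [x, y] pairs
    return json.dumps([[float(a), float(b)] for a, b in zip(x, y)])


def numeric_kde_from_json(payload):
    # Split the pairs back into separate X and Y lists, e.g. for plotting
    points = json.loads(payload)
    return [p[0] for p in points], [p[1] for p in points]


payload = numeric_kde_to_json([1.0, 2.0, 3.0], [0.1, 0.4, 0.1])
xs, ys = numeric_kde_from_json(payload)
```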

Module contents