pm4py.statistics.attributes.pandas package#

This file is part of PM4Py (More Info: https://pm4py.fit.fraunhofer.de).

PM4Py is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

PM4Py is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with PM4Py. If not, see <https://www.gnu.org/licenses/>.

Submodules#

pm4py.statistics.attributes.pandas.get module#


class pm4py.statistics.attributes.pandas.get.Parameters(value)[source]#

Bases: Enum

An enumeration.

ATTRIBUTE_KEY = 'pm4py:param:attribute_key'#
ACTIVITY_KEY = 'pm4py:param:activity_key'#
START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'#
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
CASE_ID_KEY = 'pm4py:param:case_id_key'#
MAX_NO_POINTS_SAMPLE = 'max_no_of_points_to_sample'#
KEEP_ONCE_PER_CASE = 'keep_once_per_case'#
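
The `parameters` dictionaries accepted by the functions in this module are typed as `Dict[Union[str, Parameters], Any]`, so a key may be either an enum member or its string value. The sketch below illustrates such a lookup with a local stand-in enum. `Params` and `lookup` are hypothetical names introduced only for illustration; they are not part of PM4Py, and PM4Py's own resolution logic may differ.

```python
from enum import Enum
from typing import Any, Dict, Optional, Union


class Params(Enum):
    # Local stand-in mirroring two members of the enumeration above.
    TIMESTAMP_KEY = "pm4py:param:timestamp_key"
    KEEP_ONCE_PER_CASE = "keep_once_per_case"


def lookup(key: Params,
           parameters: Optional[Dict[Union[str, Params], Any]],
           default: Any) -> Any:
    """Return the value stored under the enum member or under its string value."""
    if parameters is None:
        return default
    if key in parameters:
        return parameters[key]
    if key.value in parameters:
        return parameters[key.value]
    return default


# Both dictionaries describe the same setting.
by_member = {Params.TIMESTAMP_KEY: "time:timestamp"}
by_string = {"pm4py:param:timestamp_key": "time:timestamp"}
```
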
pm4py.statistics.attributes.pandas.get.get_events_distribution(df: DataFrame, distr_type: str = 'days_month', parameters: Optional[Dict[Union[str, Parameters], Any]] = None) -> Tuple[List[str], List[int]][source]#

Gets the distribution of the events in the specified dimension

Parameters#

df

Dataframe

distr_type

Type of distribution:

- days_month => Gets the distribution of the events among the days of a month (from 1 to 31)
- months => Gets the distribution of the events among the months (from 1 to 12)
- years => Gets the distribution of the events among the years of the event log
- hours => Gets the distribution of the events among the hours of a day (from 0 to 23)
- days_week => Gets the distribution of the events among the days of a week (from Monday to Sunday)
- weeks => Gets the distribution of the events among the weeks of a year (from 0 to 52)

parameters

Parameters of the algorithm, including:

- Parameters.TIMESTAMP_KEY

Returns#

x

Points (of the X-axis)

y

Points (of the Y-axis)
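
To make the returned `(x, y)` pair concrete, the following sketch computes a `days_week`-style distribution with the standard library only. It is not PM4Py's implementation: `weekday_distribution` is a hypothetical helper, and the x-axis labels used here (weekday indices "0" to "6", Monday first) are an assumption, since this page only states that `x` is a list of strings and `y` a list of integers.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def weekday_distribution(timestamps: Iterable[datetime]) -> Tuple[List[str], List[int]]:
    """Count events per weekday; weekdays without events are reported as 0."""
    counts = Counter(ts.weekday() for ts in timestamps)  # Monday == 0
    x = [str(day) for day in range(7)]
    y = [counts.get(day, 0) for day in range(7)]
    return x, y


events = [
    datetime(2024, 1, 1, 9, 0),   # Monday
    datetime(2024, 1, 1, 12, 0),  # Monday
    datetime(2024, 1, 3, 8, 30),  # Wednesday
]
x, y = weekday_distribution(events)
```

Filling in zero counts for empty buckets keeps `x` and `y` the same length for every input, which is convenient when the result is plotted.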

pm4py.statistics.attributes.pandas.get.get_attribute_values(df: DataFrame, attribute_key: str, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) -> Dict[Any, int][source]#

Returns the attribute values contained in the specified column of the dataframe, along with their counts

Parameters#

df

Pandas dataframe

attribute_key

Attribute for which we want to know the values and their counts

parameters

Possible parameters of the algorithm

Returns#

attributes_values_dict

Attributes in the specified column, along with their count
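
As an illustration of the returned dictionary, the sketch below counts attribute values over a list of event records using only the standard library. It is a stand-in, not PM4Py's code: `count_attribute_values` is a hypothetical helper, the column names `case:concept:name` and `concept:name` are assumptions about the log layout, and the reading of `KEEP_ONCE_PER_CASE` as "count each value at most once per case" is inferred from the parameter name rather than stated on this page.

```python
from collections import Counter
from typing import Any, Dict, List


def count_attribute_values(events: List[Dict[str, Any]],
                           attribute_key: str,
                           case_id_key: str = "case:concept:name",
                           keep_once_per_case: bool = False) -> Dict[Any, int]:
    """Map each value of attribute_key to the number of times it occurs."""
    rows = [e for e in events if attribute_key in e]
    if keep_once_per_case:
        # Assumed semantics: a value contributes at most once per case.
        pairs = {(e[case_id_key], e[attribute_key]) for e in rows}
        return dict(Counter(value for _, value in pairs))
    return dict(Counter(e[attribute_key] for e in rows))


log = [
    {"case:concept:name": "c1", "concept:name": "A"},
    {"case:concept:name": "c1", "concept:name": "A"},
    {"case:concept:name": "c1", "concept:name": "B"},
    {"case:concept:name": "c2", "concept:name": "A"},
]
per_event = count_attribute_values(log, "concept:name")
per_case = count_attribute_values(log, "concept:name", keep_once_per_case=True)
```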

pm4py.statistics.attributes.pandas.get.get_kde_numeric_attribute(df: DataFrame, attribute: str, parameters: Optional[Dict[Union[str, Parameters], Any]] = None) -> Dict[Any, int][source]#

Gets the kernel density estimate (KDE) of the distribution of a numeric attribute's values

Parameters#

df

Pandas dataframe

attribute

Numeric attribute to analyse

parameters

Possible parameters of the algorithm, including:

- graph_points -> number of points to include in the graph

Returns#

x

X-axis values to represent

y

Y-axis values to represent
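
For readers unfamiliar with kernel density estimation, the sketch below evaluates a Gaussian KDE on `graph_points` evenly spaced x-values between the minimum and maximum of the data. This page does not say which kernel, bandwidth, or grid the function uses, so the Gaussian kernel, Scott's rule bandwidth, the min-to-max grid, and the default of 200 points are all assumptions of this example; `gaussian_kde_points` is a hypothetical helper, not a PM4Py function.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def gaussian_kde_points(values: Sequence[float],
                        graph_points: int = 200) -> Tuple[List[float], List[float]]:
    """Evaluate a 1-D Gaussian KDE on an evenly spaced grid over [min, max]."""
    n = len(values)
    if n < 2 or graph_points < 2:
        raise ValueError("need at least two values and two graph points")
    sigma = statistics.stdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; the density is degenerate")
    h = sigma * n ** (-1 / 5)  # Scott's rule for one dimension (assumed)
    lo, hi = min(values), max(values)
    step = (hi - lo) / (graph_points - 1)
    xs = [lo + i * step for i in range(graph_points)]
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    # Each grid point sums one Gaussian bump per observation.
    ys = [norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
          for x in xs]
    return xs, ys


xs, ys = gaussian_kde_points([1.0, 2.0, 2.5, 3.0, 10.0], graph_points=50)
```

A larger `graph_points` gives a smoother plotted curve at the cost of more kernel evaluations; it does not change the estimate itself.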

pm4py.statistics.attributes.pandas.get.get_kde_numeric_attribute_json(df, attribute, parameters=None)[source]#

Gets the kernel density estimate (KDE) of the distribution of a numeric attribute's values (expressed as JSON)

Parameters#

df

Pandas dataframe

attribute

Numeric attribute to analyse

parameters

Possible parameters of the algorithm, including:

- graph_points -> number of points to include in the graph

Returns#

json

JSON representing the graph points
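
The JSON layout is not specified on this page. One plausible encoding, shown here purely as an assumption, is a list of `[x, y]` pairs; `points_to_json` is a hypothetical helper used only for this illustration.

```python
import json
from typing import Sequence


def points_to_json(xs: Sequence[float], ys: Sequence[float]) -> str:
    """Serialise graph points as a JSON list of [x, y] pairs (assumed layout)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return json.dumps([[x, y] for x, y in zip(xs, ys)])


encoded = points_to_json([0.0, 1.0], [0.1, 0.2])
```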

pm4py.statistics.attributes.pandas.get.get_kde_date_attribute(df, attribute='time:timestamp', parameters=None)[source]#

Gets the kernel density estimate (KDE) of the distribution of a date attribute's values

Parameters#

df

Pandas dataframe

attribute

Date attribute to analyse

parameters

Possible parameters of the algorithm, including:

- graph_points -> number of points to include in the graph

Returns#

x

X-axis values to represent

y

Y-axis values to represent
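
A density over dates has to be computed on a numeric axis. A common approach, assumed here rather than taken from this page, is to convert each timestamp to seconds since the epoch, build the evaluation grid in seconds, and convert the grid back to datetimes for the x-axis. The sketch below shows only that round trip; `date_grid` is a hypothetical helper, and the use of UTC and the default of 200 points are choices made for this example.

```python
from datetime import datetime, timezone
from typing import List, Sequence


def date_grid(timestamps: Sequence[datetime], graph_points: int = 200) -> List[datetime]:
    """Return graph_points evenly spaced datetimes spanning the input range."""
    if graph_points < 2:
        raise ValueError("graph_points must be at least 2")
    seconds = [ts.timestamp() for ts in timestamps]  # numeric axis
    lo, hi = min(seconds), max(seconds)
    step = (hi - lo) / (graph_points - 1)
    return [datetime.fromtimestamp(lo + i * step, tz=timezone.utc)
            for i in range(graph_points)]


stamps = [
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 3, tzinfo=timezone.utc),
]
grid = date_grid(stamps, graph_points=3)
```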

pm4py.statistics.attributes.pandas.get.get_kde_date_attribute_json(df, attribute='time:timestamp', parameters=None)[source]#

Gets the kernel density estimate (KDE) of the distribution of a date attribute's values (expressed as JSON)

Parameters#

df

Pandas dataframe

attribute

Date attribute to analyse

parameters

Possible parameters of the algorithm, including:

- graph_points -> number of points to include in the graph

Returns#

json

JSON representing the graph points
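
Python's `json` module cannot serialise `datetime` objects directly, so date-valued x-coordinates need a numeric or string form before encoding. The sketch below uses seconds since the epoch; that choice, the `[x, y]` pair layout, and the helper name `date_points_to_json` are assumptions made for illustration, not the documented output format.

```python
import json
from datetime import datetime, timezone
from typing import Sequence


def date_points_to_json(xs: Sequence[datetime], ys: Sequence[float]) -> str:
    """Serialise (datetime, density) points as [epoch_seconds, y] pairs."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return json.dumps([[x.timestamp(), y] for x, y in zip(xs, ys)])


payload = date_points_to_json([datetime(2024, 1, 1, tzinfo=timezone.utc)], [0.5])
```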