pm4py.algo.discovery.dfg.adapters.pandas package

Submodules

pm4py.algo.discovery.dfg.adapters.pandas.df_statistics module

pm4py.algo.discovery.dfg.adapters.pandas.df_statistics.get_concurrent_events_dataframe(df, start_timestamp_key=None, timestamp_key='time:timestamp', case_id_glue='case:concept:name', activity_key='concept:name', sort_caseid_required=True, sort_timestamp_along_case_id=True, reduce_dataframe=True, max_start_column='@@max_start_column', min_complete_column='@@min_complete_column', diff_maxs_minc='@@diff_maxs_minc', strict=False)

Gets the pairs of concurrent events (events of the same case whose time intervals overlap) in a pandas dataframe.

Parameters
  • df – Dataframe

  • start_timestamp_key – Start timestamp key (if not provided, defaults to timestamp_key)

  • timestamp_key – Complete timestamp key

  • case_id_glue – Column of the dataframe to use as case ID

  • activity_key – Activity key

  • sort_caseid_required – Whether a sort by case ID is required (default: True)

  • sort_timestamp_along_case_id – Whether a sort by timestamp within each case is required (default: True)

  • reduce_dataframe – To speed up the operation, keep only the essential columns in the dataframe

  • strict – Keep only strictly concurrent pairs (i.e., the intersection of the two time intervals has length > 0)

Returns

Concurrent events dataframe (with @@diff_maxs_minc as the size of the intersection of the intervals)

Return type

pandas.DataFrame
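The overlap test behind the @@diff_maxs_minc column can be sketched in plain Python. This is not pm4py's implementation (which works on a dataframe); it is a minimal illustration assuming each event is a hypothetical tuple (case_id, activity, start, complete) with numeric timestamps, and that a pair is concurrent when min(complete) - max(start) is >= 0, or > 0 when strict is set. The function name concurrent_pairs is hypothetical.

```python
from itertools import combinations


def concurrent_pairs(events, strict=False):
    """Return (case_id, activity_1, activity_2, overlap) for overlapping events.

    Hypothetical helper: each event is (case_id, activity, start, complete).
    The overlap mirrors the idea of @@diff_maxs_minc: min(complete) - max(start).
    """
    by_case = {}
    for ev in events:
        by_case.setdefault(ev[0], []).append(ev)

    result = []
    for case_id, case_events in by_case.items():
        # Sort by (start, complete) so each pair is reported in time order
        case_events.sort(key=lambda e: (e[2], e[3]))
        for e1, e2 in combinations(case_events, 2):
            max_start = max(e1[2], e2[2])
            min_complete = min(e1[3], e2[3])
            diff = min_complete - max_start
            # Non-strict mode also accepts intervals that only touch (diff == 0)
            if diff > 0 or (diff == 0 and not strict):
                result.append((case_id, e1[1], e2[1], diff))
    return result


# Hypothetical example log: (case_id, activity, start, complete)
events = [
    ("c1", "A", 0, 5),
    ("c1", "B", 3, 8),
    ("c1", "C", 8, 10),
    ("c2", "A", 0, 2),
]
print(concurrent_pairs(events))  # [('c1', 'A', 'B', 2), ('c1', 'B', 'C', 0)]
print(concurrent_pairs(events, strict=True))  # [('c1', 'A', 'B', 2)]
```

B and C only touch at time 8, so they count as concurrent only in non-strict mode.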

pm4py.algo.discovery.dfg.adapters.pandas.df_statistics.get_dfg_graph(df, measure='frequency', activity_key='concept:name', case_id_glue='case:concept:name', start_timestamp_key=None, timestamp_key='time:timestamp', perf_aggregation_key='mean', sort_caseid_required=True, sort_timestamp_along_case_id=True, keep_once_per_case=False, window=1)

Gets the directly-follows graph (DFG) from a pandas dataframe.

Parameters
  • df – Dataframe

  • measure – Measure to use ('frequency', 'performance', or 'both')

  • activity_key – Activity key to use in the grouping

  • case_id_glue – Column of the dataframe to use as case ID

  • start_timestamp_key – Start timestamp key

  • timestamp_key – Timestamp key

  • perf_aggregation_key – Performance aggregation key (mean, median, min, max)

  • sort_caseid_required – Whether a sort by case ID is required

  • sort_timestamp_along_case_id – Whether a sort by timestamp within each case is required

  • keep_once_per_case – When counting, keep only one occurrence of each path per case (the first)

  • window – Window of the DFG, i.e., how many positions apart the source and target events of a path are (default: 1, the directly-follows relation)

Returns

DFG in the chosen measure (may be only the frequency, only the performance, or both)

Return type

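dict mapping (source activity, target activity) pairs to the chosen measure (both DFGs together when measure is 'both')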
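The effect of window and keep_once_per_case can be sketched in plain Python. This is a minimal illustration, not pm4py's dataframe code: it assumes hypothetical (case_id, activity, timestamp) tuples with a single numeric timestamp per event, takes the performance of a path as the difference between the target's and the source's timestamps, and aggregates it with the mean (the perf_aggregation_key='mean' case). The function name dfg_from_events is hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def dfg_from_events(events, window=1, keep_once_per_case=False):
    """Return (frequency DFG, mean-performance DFG) from (case_id, activity, ts) tuples.

    Hypothetical helper illustrating the window and keep_once_per_case options.
    """
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))

    frequency = Counter()
    durations = defaultdict(list)
    for trace in by_case.values():
        # Stable sort by timestamp only, so ties keep their original order
        trace.sort(key=lambda x: x[0])
        seen = set()
        # Pair each event with the event `window` positions later in the same case
        for i in range(len(trace) - window):
            (t1, a1), (t2, a2) = trace[i], trace[i + window]
            path = (a1, a2)
            if keep_once_per_case and path in seen:
                continue  # count only the first occurrence of the path in this case
            seen.add(path)
            frequency[path] += 1
            durations[path].append(t2 - t1)

    performance = {path: mean(values) for path, values in durations.items()}
    return dict(frequency), performance


# Hypothetical example log: (case_id, activity, timestamp)
events = [
    ("c1", "A", 0), ("c1", "B", 2), ("c1", "A", 5), ("c1", "B", 6),
    ("c2", "A", 1), ("c2", "C", 4),
]
freq, perf = dfg_from_events(events)
freq_once, _ = dfg_from_events(events, keep_once_per_case=True)
freq_w2, _ = dfg_from_events(events, window=2)
```

With window=2, case c1 yields the paths (A, A) and (B, B), while case c2 (two events) yields none.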

pm4py.algo.discovery.dfg.adapters.pandas.df_statistics.get_partial_order_dataframe(df, start_timestamp_key=None, timestamp_key='time:timestamp', case_id_glue='case:concept:name', activity_key='concept:name', sort_caseid_required=True, sort_timestamp_along_case_id=True, reduce_dataframe=True, keep_first_following=True)

Gets the partial order between events (of the same case) in a pandas dataframe.

Parameters
  • df – Dataframe

  • start_timestamp_key – Start timestamp key (if not provided, defaults to timestamp_key)

  • timestamp_key – Complete timestamp key

  • case_id_glue – Column of the dataframe to use as case ID

  • activity_key – Activity key

  • sort_caseid_required – Whether a sort by case ID is required (default: True)

  • sort_timestamp_along_case_id – Whether a sort by timestamp within each case is required (default: True)

  • reduce_dataframe – To speed up the operation, keep only the essential columns in the dataframe

  • keep_first_following – Keep only the first event following the given event

Returns

Partial order dataframe (with @@flow_time between events)

Return type

pandas.DataFrame
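The ordering relation can be sketched in plain Python. This is a minimal illustration, not pm4py's code: it assumes hypothetical (case_id, activity, start, complete) tuples, treats event e2 as following e1 when e2 starts no earlier than e1 completes, and computes the flow time as start(e2) - complete(e1), in the spirit of the @@flow_time column. With keep_first_following, only the earliest-starting follower of each event is kept. The function name partial_order is hypothetical.

```python
from collections import defaultdict


def partial_order(events, keep_first_following=True):
    """Return (case_id, source, target, flow_time) for ordered pairs of events.

    Hypothetical helper: each event is (case_id, activity, start, complete).
    """
    by_case = defaultdict(list)
    for ev in events:
        by_case[ev[0]].append(ev)

    result = []
    for case_id, case_events in by_case.items():
        # Sort by (start, complete) so followers come out in start order
        case_events.sort(key=lambda e: (e[2], e[3]))
        for i, e1 in enumerate(case_events):
            # e2 follows e1 if it starts no earlier than e1 completes
            followers = [e2 for j, e2 in enumerate(case_events)
                         if j != i and e2[2] >= e1[3]]
            if keep_first_following:
                followers = followers[:1]  # earliest-starting follower only
            for e2 in followers:
                result.append((case_id, e1[1], e2[1], e2[2] - e1[3]))
    return result


# Hypothetical example log: B and C overlap, so neither follows the other
events = [
    ("c1", "A", 0, 2),
    ("c1", "B", 3, 5),
    ("c1", "C", 4, 6),
    ("c1", "D", 7, 9),
]
print(partial_order(events))  # [('c1', 'A', 'B', 1), ('c1', 'B', 'D', 2), ('c1', 'C', 'D', 1)]
```

Because B and C overlap in time, the result is a partial order rather than a sequence: both are ordered after A and before D, but not with respect to each other.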

pm4py.algo.discovery.dfg.adapters.pandas.freq_triples module

pm4py.algo.discovery.dfg.adapters.pandas.freq_triples.get_freq_triples(df, activity_key='concept:name', case_id_glue='case:concept:name', timestamp_key='time:timestamp', sort_caseid_required=True, sort_timestamp_along_case_id=True)

Gets the frequency triples (the number of occurrences of each triple of consecutive activities within a case) from a dataframe.

Parameters
  • df – Dataframe

  • activity_key – Activity key

  • case_id_glue – Column of the dataframe to use as case ID

  • timestamp_key – Timestamp key

  • sort_caseid_required – Whether a sort by case ID is required (default: True)

  • sort_timestamp_along_case_id – Whether a sort by timestamp within each case is required (default: True)

Returns

Frequency triples from the dataframe

Return type

dict
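The counting can be sketched in plain Python, assuming a frequency triple is a triple (a, b, c) of activities occurring consecutively in a case after sorting by timestamp. This is a minimal illustration on hypothetical (case_id, activity, timestamp) tuples, not pm4py's dataframe implementation; the function name count_triples is hypothetical.

```python
from collections import Counter, defaultdict


def count_triples(events):
    """Count triples of consecutive activities per case (hypothetical helper)."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))

    triples = Counter()
    for trace in by_case.values():
        trace.sort(key=lambda x: x[0])  # order events within the case by timestamp
        acts = [a for _, a in trace]
        # Slide a window of three consecutive activities over the trace
        for a, b, c in zip(acts, acts[1:], acts[2:]):
            triples[(a, b, c)] += 1
    return dict(triples)


# Hypothetical example log: (case_id, activity, timestamp)
events = [
    ("c1", "A", 0), ("c1", "B", 1), ("c1", "C", 2), ("c1", "D", 3),
    ("c2", "A", 0), ("c2", "B", 1), ("c2", "C", 2),
    ("c3", "A", 0), ("c3", "B", 1),
]
triples = count_triples(events)
```

Case c3 has only two events, so it contributes no triple.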

Module contents