pm4py.utils.format_dataframe(df: DataFrame, case_id: str = 'case:concept:name', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', start_timestamp_key: str = 'start_timestamp', timest_format: Optional[str] = None) → DataFrame

Formats the dataframe appropriately for process mining purposes.

Parameters:

  • df (DataFrame) – Dataframe to format

  • case_id (str) – Case identifier column

  • activity_key (str) – Activity column

  • timestamp_key (str) – Timestamp column

  • start_timestamp_key (str) – Start timestamp column

  • timest_format (Optional[str]) – Timestamp format string that is passed to Pandas (for example '%Y-%m-%d %H:%M:%S'); defaults to None

Return type:

  DataFrame

import pandas as pd
import pm4py

# Load the raw event log from a CSV file
dataframe = pd.read_csv('event_log.csv')
# Name the case, activity and timestamp columns, and give the timestamp format
dataframe = pm4py.format_dataframe(dataframe, case_id='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp', start_timestamp_key='start_timestamp', timest_format='%Y-%m-%d %H:%M:%S')
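The timest_format value is a strftime-style format string. As a rough illustration of what such a string matches, the sketch below parses a few values with the Python standard library's datetime.strptime, which uses the same directives. It does not call pm4py or Pandas; the sample timestamps and the matches_format helper are hypothetical and exist only for this example.

```python
from datetime import datetime

# The same format string used in the example above.
TIMEST_FORMAT = '%Y-%m-%d %H:%M:%S'

# Hypothetical raw timestamp values, as they might appear in a CSV column.
raw_timestamps = ['2021-03-01 08:15:00', '2021-03-01 09:40:30']

# Parse each string with the format; a mismatch raises ValueError.
parsed = [datetime.strptime(value, TIMEST_FORMAT) for value in raw_timestamps]


def matches_format(value: str, fmt: str) -> bool:
    """Hypothetical helper: True if value can be parsed with fmt."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


for value in raw_timestamps + ['01/03/2021 08:15']:
    print(value, '->', matches_format(value, TIMEST_FORMAT))
```

Checking a handful of raw values this way before formatting the full log can help confirm that the format string matches the data in the timestamp column.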