- pm4py.discovery.discover_batches(log: Union[EventLog, DataFrame], merge_distance: int = 900, min_batch_size: int = 2, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', resource_key: str = 'org:resource') → List[Tuple[Tuple[str, str], int, Dict[str, Any]]]
Discover batches from the provided log object
An activity is executed in batches by a given resource when the resource executes the same activity several times within a short period of time.
Detecting such activities can reveal points in the process that are candidates for automation, since the work performed by the resource may be repetitive.
The following categories of batches are detected:
- Simultaneous (all the events in the batch have identical start and end timestamps)
- Batching at start (all the events in the batch have identical start timestamp)
- Batching at end (all the events in the batch have identical end timestamp)
- Sequential batching (for all the consecutive events, the end of the first is equal to the start of the second)
- Concurrent batching (for all the consecutive events that are not sequentially matched)
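The categories above can be illustrated with a minimal sketch. It is not the pm4py implementation: the function name `classify_batch`, the representation of events as `(start, end)` numeric pairs, the returned labels, and the order in which the checks are applied are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: each event is a (start, end) pair in seconds.
Interval = Tuple[float, float]


def classify_batch(intervals: List[Interval]) -> str:
    """Assign one of the five categories to a batch of event intervals."""
    if len(intervals) < 2:
        raise ValueError("a batch needs at least two events")

    ordered = sorted(intervals)
    starts = {start for start, _ in ordered}
    ends = {end for _, end in ordered}

    # All events share both start and end timestamps.
    if len(starts) == 1 and len(ends) == 1:
        return "Simultaneous"
    # All events share the start timestamp only.
    if len(starts) == 1:
        return "Batching at start"
    # All events share the end timestamp only.
    if len(ends) == 1:
        return "Batching at end"
    # Every event starts exactly when the previous one ends.
    if all(prev[1] == nxt[0] for prev, nxt in zip(ordered, ordered[1:])):
        return "Sequential batching"
    # Anything else is treated as concurrent in this sketch.
    return "Concurrent batching"
```

For instance, under these assumptions `classify_batch([(0, 5), (5, 10)])` is classified as sequential, because the second event starts exactly when the first one ends.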
The approach has been described in the following paper: Martin, N., Swennen, M., Depaire, B., Jans, M., Caris, A., & Vanhoof, K. (2015, December). Batch Processing: Definition and Event Log Identification. In SIMPDA (pp. 137-140).
The output is a (sorted) list of tuples. Each tuple contains:
- Index 0: the activity-resource pair for which at least one batch has been detected
- Index 1: the number of batches for the given activity-resource pair
- Index 2: a list containing all the batches. Each batch is described by:
  - The start timestamp of the batch
  - The complete timestamp of the batch
  - The list of events that are executed in the batch
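As a rough sketch of how the first two tuple positions might be consumed, the helper below ranks activity-resource pairs by their number of batches. The helper name `top_batched` and the sample data are hypothetical. The sketch relies only on indices 0 and 1 as described above and ignores the contents of index 2.

```python
from typing import Any, List, Tuple


def top_batched(result: List[Tuple[Tuple[str, str], int, Any]],
                n: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n activity-resource pairs with the most batches."""
    # Sort explicitly by the batch count (index 1), highest first.
    ranked = sorted(result, key=lambda item: item[1], reverse=True)
    return [(act_res, count) for act_res, count, _ in ranked[:n]]


# Hand-made placeholder data in the documented shape (not real output);
# the batch details at index 2 are left empty on purpose.
sample = [
    (("Check invoice", "Alice"), 1, {}),
    (("Approve order", "Bob"), 3, {}),
]
print(top_batched(sample, n=1))
```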
- Parameters:
log – event log / Pandas dataframe
merge_distance (int) – the maximum time distance between non-overlapping intervals for them to be considered part of the same batch (default: 15*60 seconds, i.e. 15 minutes)
min_batch_size (int) – the minimum number of events for a batch to be considered (default: 2)
activity_key (str) – attribute to be used for the activity
timestamp_key (str) – attribute to be used for the timestamp
case_id_key (str) – attribute to be used as case identifier
resource_key (str) – attribute to be used as resource
- Return type:
List[Tuple[Tuple[str, str], int, Dict[str, Any]]]
import pm4py

batches = pm4py.discover_batches(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp', resource_key='org:resource')
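To make the roles of merge_distance and min_batch_size more concrete, here is a simplified sketch of grouping the intervals of a single activity-resource pair. It is not the library's code: the function name `merge_into_batches`, the `(start, end)` pairs in seconds, and the dictionary layout of each batch are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each event is a (start, end) pair in seconds.
Interval = Tuple[float, float]


def merge_into_batches(intervals: List[Interval],
                       merge_distance: float = 900,
                       min_batch_size: int = 2) -> List[Dict]:
    """Group intervals whose gap to the current batch is small enough."""
    batches: List[Dict] = []
    for start, end in sorted(intervals):
        # Overlapping intervals yield a negative gap, so they always merge.
        if batches and start - batches[-1]["end"] <= merge_distance:
            current = batches[-1]
            current["end"] = max(current["end"], end)
            current["events"].append((start, end))
        else:
            batches.append({"start": start, "end": end,
                            "events": [(start, end)]})
    # Discard groups with fewer events than min_batch_size.
    return [b for b in batches if len(b["events"]) >= min_batch_size]


# The first two events are 240 s apart and merge; the third is too far away
# and is dropped because a single event does not reach min_batch_size.
example = merge_into_batches([(0, 60), (300, 360), (5000, 5060)])
print(example)
```

Lowering merge_distance in this sketch splits events into more, smaller groups, while raising min_batch_size filters out the small ones.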