Set Notations review

Finite Set

$\mathbf{X} = \{a_1,a_2,a_3,...,a_n\}$

Power set of X is set of all possible subset of X

$\mathcal{P}(\mathbf{X}) = \{ \{\}, \{a_1\}, \{a_2\}, \{a_c\}, \{a_1,a_2\} ..., \{a_1,a_2,a_3,..,a_n\} \}$

Cardinality : $|\mathcal{P}(\mathbf{X})| = 2^{|X|}$

Multi set of $\mathbf{X}$ is

$\mathcal{B}(\mathbf{X}) = [a_1^{w1}, a_2^{w2}, ..., a_n^{wn}] \qquad$ where $\qquad w_i \in \mathbb{N}^+$

Process mining Formalism

$\mathbf{\Sigma}$ : Set of all observed activities

Eg. $\mathbf{\Sigma} = \{a,b,c\}$

$\mathbf{\Sigma}^+$ : Set of all non-empty sequences over set $\mathbf{\Sigma}$

Eg. $\mathbf{\Sigma}^+ = \{ <a,b,c> , <a,c,b>, <c,b,a>, <b,a,c>, <a,b> ... ,<a>, <b>, <c> \}$

Each sequence in $\mathbf{\Sigma}^+$ is called a trace

$\sigma \in \mathbf{\Sigma}^+$

Event log (L) is a multiset of traces

Eg. $\mathbf{L} = \{ <a,b,c>^3, <b,a,c>^9 \} \subseteq \mathcal{B}(\mathbf{\Sigma}^+) $

Process discovery

$PD: \mathcal{B}(\mathbf{\Sigma}^+) \to \mathcal{M} \qquad$ where $\mathcal{M}$ is a set of all possible process models

Labelled Petri Nets

Directed Bi-partite graph containing arcs (F) between two disjoint sets Places (P) and Transitions (T) with a labelling function (l), which is a partial map of transitions to activities. ( This is done to allow for multiple activities to have same labels ?)

Labelled Petrinet $\mathbf{N} = <\mathbf{P}, \mathbf{T}, \mathbf{F}, l>$

$ l : \mathbf{T} \to \mathbf{\Sigma} \bigcup \{ \tau \} \qquad$ where $\tau$ is silent transition

$\mathbf{P} \cap \mathbf{T} = \phi$

$\mathbf{F} \subseteq (\mathbf{P} \times \mathbf{T}) \cup (\mathbf{T} \cup \mathbf{P})$

Eg.

$l : \mathbf{T} \to \mathbf{\Sigma} = \{A,B,C,D\}$

$\mathbf{P} = \{S1,S2,S3,S4,S5,S6\} $

State of the Petri Net

State of a petrinet is defined by marking (m), which is a multiset of Places

$m \in \mathcal{B}(\mathbf{P})$

Eg. $m = [S4, S2]$

Notations for Input and Output places of a transition

$\bullet t_A = \{S1\} \qquad$ is the set of input places for transition related to activity A

$t_A \bullet = \{S2, S4\} \qquad$ is the set of output places for transition related to activity A

A transition ($t \in \mathbf{T} $) is enabled for marking m in net N, if all its input places contain atleast one token. This is denoted as,

$(N,m)[t \rangle$

When a transition t fires, one token is removed from it's imput places and one token is added to it's output places. This leads to a new marking (m') :

$m \xrightarrow{\text{t}} m' = m - \bullet t + t \bullet$

Also denoted as : $(N,m)[t \rangle (N, m')$

Firing sequence is a sequence of transitions that lead from one marking to another.

$m \xrightarrow{\gamma} m'\qquad$ where $\gamma \in \mathbf{T^+}$

Also denoted as : $(N,m)[\gamma \rangle (N, m')$

General Petrinet Process model - System Net

Petrinet with an initial marking ($m_0$) and a final marking ($m_f$)

$ M : \quad \mathcal{B}(\mathbf{\Sigma}^+) \supseteq \quad \mathbf{L} \to (N, m_0, m_f ) \quad \in \mathcal{M}\qquad$ Where $N$ is the petrinet and L is the event log

Language of a system net (M) is a set of all visible traces( silent moves are not included) with valid firing sequence from initial to final marking

$\mathcal{L}(M) = \{ \quad \sigma = l(\gamma) \quad | \quad \sigma \in \mathbf{\Sigma}^+ \quad \gamma \in \mathbf{T}^+ \quad \text{s.t} \quad m_0 \xrightarrow{\gamma} m_f \quad \}$

where $l$ is the labelling function defined above and $\gamma$ is a firing sequence

$\mathcal{L}(M) \subseteq \mathbf{\Sigma}^+$

Set of all complete firing sequences of a system net (M) is denoted as

$\mathcal{T}(M) = \{ \quad \gamma \quad | \quad \gamma \in \mathbf{T}^+ \quad \text{s.t} \quad m_0 \xrightarrow{\gamma} m_f \quad \}$

Alignment formalism

Let M be a process model discovered by playing-in the event log L with observed set of activities $\mathbf{\Sigma}$,

$L \in \mathcal{B}(\mathbf{\Sigma}^+)$

$M : \quad L \to (N, m_0, m_f) \in \mathcal{M}$

$N = (P,T,F,l) \qquad$ where $l : T \to \mathbf{\Sigma} \bigcup \{ \tau \}$

Let A be the set of activities observed in new set of event traces from log $\tilde{L}$

$\tilde{L} \in \mathcal{B}(\mathbf{A}^+)$

Traces in $\tilde{L}$ are not always contained in the Language of the system net $\mathcal{L}(M)$.

Therefore, an alignment ($\Gamma$) is required, which maps every trace in $\tilde{L}$ to language of SN $\mathcal{L}(M)$

$\Gamma \quad : \quad \tilde{L} \ni \sigma_{\tilde{L}} \to \sigma_M \in \mathcal{L}(M)$

There can be more than one alignment in the model for a trace in the log. Set of all alignments for a trace is denoted as

$\Gamma_{\sigma_{L},M} = \{ \sigma_M \quad | \quad \forall \quad \sigma_{\tilde{L}} \to \sigma_M \quad \sigma_M \in range(\Gamma) \}$

Optimal Alignment

In order to find the optimal alignment, a cost is assigned to every legal move.

A move is denoted as

$(x,(y,t)) \qquad$ where $x \in \mathbf{A} \quad y \in \mathbf{\Sigma} \quad t \in \mathbf{T}$

A set of legal moves is denoted as

$ A_{LM} = $

$\{ (x,(x,t)) \ | \ x \in \mathbf{A} \ \cap \ t \in \mathbf{T} \ \cap \ l(t)=x\} \quad \bigcup \\ \{ (\gg,(x,t)) \ | \ t \in \mathbf{T} \ \cap \ l(t)=x \} \quad \bigcup \\ \{ (x,\gg) \quad | \quad x \in \mathbf{A} \}$

A cost of a legal move($\delta$) is denoted as

$\delta \quad : \quad A_{LM} \to \mathbb{N} $

An alignment ($\tilde{\sigma_{A}}$) can also be seen as a sequence of legal moves,

$\tilde{\sigma_A} \in A_{LM}^+$

Cost of an alignment ($C_M$) is the sum of cost of all the legal moves in sequence

$C_{M} : \quad A_{LM}^+ \ni \tilde{\sigma_A} \to \sum_{i} \delta_i \in \mathbb{R}$

Optimal alignment (${\sigma_{\tilde{L}}}^{*} \in \Gamma_{\sigma_{L},M}$) for a trace($\sigma_{\tilde{L}}$) in model M

$ \sigma_{\tilde{L}}^* = argmin_{\sigma_A} \{C_M (\tilde{\sigma_A}) \quad | \quad \forall \quad \sigma_A \in \Gamma_{\sigma_L , M} \quad \text{s.t} \quad \tilde{\sigma_A} \to \sigma_A = \tilde{\sigma_A} \upharpoonright_{\tilde{\sigma_A}[y] \ / \ \{\gg, \tau \}}\}$

References

[1] Tax, N., Sidorova, N. & van der Aalst, W.M.P. J Intell Inf Syst (2019) 52: 107. https://doi.org/10.1007/s10844-018-0507-6

[2] Aalst, van der, W. M. P. (2013). Decomposing Petri nets for process mining : a generic approach. Distributed and Parallel Databases, 31(4), 471-507. DOI: 10.1007/s10619-013-7127-5