Finite Set
$\mathbf{X} = \{a_1,a_2,a_3,...,a_n\}$
Power set of X is set of all possible subset of X
$\mathcal{P}(\mathbf{X}) = \{ \{\}, \{a_1\}, \{a_2\}, \{a_c\}, \{a_1,a_2\} ..., \{a_1,a_2,a_3,..,a_n\} \}$
Cardinality : $|\mathcal{P}(\mathbf{X})| = 2^{|X|}$
Multi set of $\mathbf{X}$ is
$\mathcal{B}(\mathbf{X}) = [a_1^{w1}, a_2^{w2}, ..., a_n^{wn}] \qquad$ where $\qquad w_i \in \mathbb{N}^+$
$\mathbf{\Sigma}$ : Set of all observed activities
Eg. $\mathbf{\Sigma} = \{a,b,c\}$
$\mathbf{\Sigma}^+$ : Set of all non-empty sequences over set $\mathbf{\Sigma}$
Eg. $\mathbf{\Sigma}^+ = \{ <a,b,c> , <a,c,b>, <c,b,a>, <b,a,c>, <a,b> ... ,<a>, <b>, <c> \}$
Each sequence in $\mathbf{\Sigma}^+$ is called a trace
$\sigma \in \mathbf{\Sigma}^+$
Event log (L) is a multiset of traces
Eg. $\mathbf{L} = \{ <a,b,c>^3, <b,a,c>^9 \} \subseteq \mathcal{B}(\mathbf{\Sigma}^+) $
Process discovery
$PD: \mathcal{B}(\mathbf{\Sigma}^+) \to \mathcal{M} \qquad$ where $\mathcal{M}$ is a set of all possible process models
Directed Bi-partite graph containing arcs (F) between two disjoint sets Places (P) and Transitions (T) with a labelling function (l), which is a partial map of transitions to activities. ( This is done to allow for multiple activities to have same labels ?)
Labelled Petrinet $\mathbf{N} = <\mathbf{P}, \mathbf{T}, \mathbf{F}, l>$
$ l : \mathbf{T} \to \mathbf{\Sigma} \bigcup \{ \tau \} \qquad$ where $\tau$ is silent transition
$\mathbf{P} \cap \mathbf{T} = \phi$
$\mathbf{F} \subseteq (\mathbf{P} \times \mathbf{T}) \cup (\mathbf{T} \cup \mathbf{P})$
Eg.
$l : \mathbf{T} \to \mathbf{\Sigma} = \{A,B,C,D\}$
$\mathbf{P} = \{S1,S2,S3,S4,S5,S6\} $
State of a petrinet is defined by marking (m), which is a multiset of Places
$m \in \mathcal{B}(\mathbf{P})$
Eg. $m = [S4, S2]$
Notations for Input and Output places of a transition
$\bullet t_A = \{S1\} \qquad$ is the set of input places for transition related to activity A
$t_A \bullet = \{S2, S4\} \qquad$ is the set of output places for transition related to activity A
A transition ($t \in \mathbf{T} $) is enabled for marking m in net N, if all its input places contain atleast one token. This is denoted as,
$(N,m)[t \rangle$
When a transition t fires, one token is removed from it's imput places and one token is added to it's output places. This leads to a new marking (m') :
$m \xrightarrow{\text{t}} m' = m - \bullet t + t \bullet$
Also denoted as : $(N,m)[t \rangle (N, m')$
Firing sequence is a sequence of transitions that lead from one marking to another.
$m \xrightarrow{\gamma} m'\qquad$ where $\gamma \in \mathbf{T^+}$
Also denoted as : $(N,m)[\gamma \rangle (N, m')$
Petrinet with an initial marking ($m_0$) and a final marking ($m_f$)
$ M : \quad \mathcal{B}(\mathbf{\Sigma}^+) \supseteq \quad \mathbf{L} \to (N, m_0, m_f ) \quad \in \mathcal{M}\qquad$ Where $N$ is the petrinet and L is the event log
Language of a system net (M) is a set of all visible traces( silent moves are not included) with valid firing sequence from initial to final marking
$\mathcal{L}(M) = \{ \quad \sigma = l(\gamma) \quad | \quad \sigma \in \mathbf{\Sigma}^+ \quad \gamma \in \mathbf{T}^+ \quad \text{s.t} \quad m_0 \xrightarrow{\gamma} m_f \quad \}$
where $l$ is the labelling function defined above and $\gamma$ is a firing sequence
$\mathcal{L}(M) \subseteq \mathbf{\Sigma}^+$
Set of all complete firing sequences of a system net (M) is denoted as
$\mathcal{T}(M) = \{ \quad \gamma \quad | \quad \gamma \in \mathbf{T}^+ \quad \text{s.t} \quad m_0 \xrightarrow{\gamma} m_f \quad \}$
Let M be a process model discovered by playing-in the event log L with observed set of activities $\mathbf{\Sigma}$,
$L \in \mathcal{B}(\mathbf{\Sigma}^+)$
$M : \quad L \to (N, m_0, m_f) \in \mathcal{M}$
$N = (P,T,F,l) \qquad$ where $l : T \to \mathbf{\Sigma} \bigcup \{ \tau \}$
Let A be the set of activities observed in new set of event traces from log $\tilde{L}$
$\tilde{L} \in \mathcal{B}(\mathbf{A}^+)$
Traces in $\tilde{L}$ are not always contained in the Language of the system net $\mathcal{L}(M)$.
Therefore, an alignment ($\Gamma$) is required, which maps every trace in $\tilde{L}$ to language of SN $\mathcal{L}(M)$
$\Gamma \quad : \quad \tilde{L} \ni \sigma_{\tilde{L}} \to \sigma_M \in \mathcal{L}(M)$
There can be more than one alignment in the model for a trace in the log. Set of all alignments for a trace is denoted as
$\Gamma_{\sigma_{L},M} = \{ \sigma_M \quad | \quad \forall \quad \sigma_{\tilde{L}} \to \sigma_M \quad \sigma_M \in range(\Gamma) \}$
Optimal Alignment¶
In order to find the optimal alignment, a cost is assigned to every legal move.
A move is denoted as
$(x,(y,t)) \qquad$ where $x \in \mathbf{A} \quad y \in \mathbf{\Sigma} \quad t \in \mathbf{T}$
A set of legal moves is denoted as
$ A_{LM} = $
$\{ (x,(x,t)) \ | \ x \in \mathbf{A} \ \cap \ t \in \mathbf{T} \ \cap \ l(t)=x\} \quad \bigcup \\ \{ (\gg,(x,t)) \ | \ t \in \mathbf{T} \ \cap \ l(t)=x \} \quad \bigcup \\ \{ (x,\gg) \quad | \quad x \in \mathbf{A} \}$
A cost of a legal move($\delta$) is denoted as
$\delta \quad : \quad A_{LM} \to \mathbb{N} $
An alignment ($\tilde{\sigma_{A}}$) can also be seen as a sequence of legal moves,
$\tilde{\sigma_A} \in A_{LM}^+$
Cost of an alignment ($C_M$) is the sum of cost of all the legal moves in sequence
$C_{M} : \quad A_{LM}^+ \ni \tilde{\sigma_A} \to \sum_{i} \delta_i \in \mathbb{R}$
Optimal alignment (${\sigma_{\tilde{L}}}^{*} \in \Gamma_{\sigma_{L},M}$) for a trace($\sigma_{\tilde{L}}$) in model M
$ \sigma_{\tilde{L}}^* = argmin_{\sigma_A} \{C_M (\tilde{\sigma_A}) \quad | \quad \forall \quad \sigma_A \in \Gamma_{\sigma_L , M} \quad \text{s.t} \quad \tilde{\sigma_A} \to \sigma_A = \tilde{\sigma_A} \upharpoonright_{\tilde{\sigma_A}[y] \ / \ \{\gg, \tau \}}\}$
[1] Tax, N., Sidorova, N. & van der Aalst, W.M.P. J Intell Inf Syst (2019) 52: 107. https://doi.org/10.1007/s10844-018-0507-6
[2] Aalst, van der, W. M. P. (2013). Decomposing Petri nets for process mining : a generic approach. Distributed and Parallel Databases, 31(4), 471-507. DOI: 10.1007/s10619-013-7127-5