The third and final problem in the Hidden Markov Model is the Decoding Problem. In this article we will implement the Viterbi Algorithm for a Hidden Markov Model using Python and R. The Viterbi Algorithm uses dynamic programming and is computationally very efficient. We will start with the formal definition of the Decoding Problem, then go through the solution and finally implement it. This is the 4th part of the Introduction to Hidden Markov Model tutorial series. This one might be the easier one to follow along.

We have learned about the three problems of HMM. We went through the Evaluation and Learning Problems in detail, including implementation using Python and R, in my earlier articles. In case you need to refresh your memory, please refer to my earlier articles.

Derivation and implementation of Baum Welch Algorithm for Hidden Markov Model

Given a sequence of visible symbols \( V^T \) and the model \( \theta \rightarrow \{ A, B \} \), find the most probable sequence of hidden states \( S^T \).

In general we could try to find all the different scenarios of hidden states for the given sequence of visible symbols and then identify the most probable one. However, just as we have seen earlier, it will be an exponentially complex problem \( O(N^T \cdot T) \) to solve.
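To make the blow-up concrete, here is a small sketch (not part of the original article; the model numbers are made up for illustration) that scores every possible hidden path by brute force. With M states and T observations there are M^T paths to evaluate, which is exactly the exponential cost we want to avoid:

```python
import itertools

import numpy as np

# Hypothetical toy model: 2 hidden states, 3 visible symbols.
a = np.array([[0.6, 0.4],
              [0.3, 0.7]])        # transition probabilities a_ij
b = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.3, 0.1]])   # emission probabilities b_jk
pi = np.array([0.5, 0.5])         # initial distribution
V = [0, 2, 1, 0]                  # observed symbol indices

best_p, best_path = -1.0, None
# Enumerate all M^T hidden paths -- O(M^T) work, hopeless for realistic T.
for path in itertools.product(range(2), repeat=len(V)):
    p = pi[path[0]] * b[path[0], V[0]]
    for t in range(1, len(V)):
        p *= a[path[t - 1], path[t]] * b[path[t], V[t]]
    if p > best_p:
        best_p, best_path = p, path

print(best_path, best_p)  # the most probable hidden path and its probability
```

Even this tiny 2-state, 4-observation example already enumerates 16 paths; at T = 100 it would be 2^100.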

We will be using a much more efficient algorithm named the Viterbi Algorithm to solve the decoding problem. So far in this HMM series we went deep into deriving equations for all the algorithms in order to understand them clearly. However, the Viterbi Algorithm is best understood using an analytical example rather than equations. I will provide the mathematical definition of the algorithm first, then will work on a specific example.

## Probabilistic View:

The decoding problem is similar to the Forward Algorithm. In the Forward Algorithm we compute the likelihood of the observation sequence, given the hidden sequences, by summing over all the probabilities; however, in the decoding problem we need to find the most probable hidden state at every iteration of t.
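The difference between the two recursions can be seen side by side in a short sketch (a toy model with made-up numbers, not from the original article): the Forward pass sums over incoming paths, while the decoding pass takes the max:

```python
import numpy as np

# Hypothetical toy model (made-up numbers) to contrast the two recursions.
a = np.array([[0.6, 0.4], [0.3, 0.7]])            # transitions
b = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])  # emissions
pi = np.array([0.5, 0.5])                         # initial distribution
V = [0, 2, 1, 0]                                  # observed symbol indices

alpha = pi * b[:, V[0]]   # Forward recursion: SUM over incoming paths
omega = pi * b[:, V[0]]   # Decoding recursion: MAX over incoming paths
for t in range(1, len(V)):
    alpha = (alpha @ a) * b[:, V[t]]
    omega = np.max(omega[:, None] * a, axis=0) * b[:, V[t]]

print(alpha.sum())  # likelihood of the observations (all paths together)
print(omega.max())  # probability of the single best hidden path
```

The best single path is necessarily less likely than the sum over all paths, which is why the two problems need different recursions even though their structure is the same.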

The following equation represents the highest probability along a single path for the first t observations, which ends at state i.

\[
\omega_i(t) = \max_{s_1, \dots, s_{t-1}} \; p(s_1, s_2, \dots, s_t = i, \; v_1, v_2, \dots, v_t \mid \theta)
\]

We can use the same approach as the Forward Algorithm to calculate \( \omega_j(t+1) \).

\[
\omega_j(t+1) = \max_i \Big( \omega_i(t) \, a_{ij} \, b_{jk\,v(t+1)} \Big)
\]

Now to find the sequence of hidden states we need to identify the state that maximizes \( \omega_i(t) \) at each time step t.

\[
\arg \max_i \omega_i(t)
\]
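As a quick sanity check of the recursion, the following sketch (again with made-up toy numbers, not from the article) fills the \( \omega \) table iteratively; the largest entry in the last row is the probability of the single best hidden path:

```python
import numpy as np

# Hypothetical toy model (made-up numbers): M = 2 hidden states, 3 symbols.
a = np.array([[0.6, 0.4], [0.3, 0.7]])            # transitions
b = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])  # emissions
pi = np.array([0.5, 0.5])                         # initial distribution
V = [0, 2, 1, 0]                                  # observed symbol indices
T, M = len(V), a.shape[0]

omega = np.zeros((T, M))
omega[0] = pi * b[:, V[0]]          # initialisation: omega_i(1)
for t in range(1, T):
    for j in range(M):
        # omega_j(t+1) = max_i( omega_i(t) * a_ij ) * b_j(v(t+1))
        omega[t, j] = np.max(omega[t - 1] * a[:, j]) * b[j, V[t]]

print(omega[-1])                  # best-path probability ending in each state
print(int(np.argmax(omega[-1])))  # index of the most probable last state
```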

Once we complete the above steps for all the observations, we will first find the last hidden state by maximum probability, then use the backpointers to backtrack the most likely hidden path.

Everything I said above may not make a lot of sense now. Go through the example below and then come back and read this part. I hope it will be much easier to understand once you have the intuition.

## Example:

Our example will be the same one used in the previous articles, where we have two hidden states A, B and three visible symbols 1, 2, 3. Assume we have a sequence of 6 visible symbols and the model \( \theta \). We need to predict the sequence of the hidden states for the visible symbols.

If we draw the trellis diagram, it will look like fig 1. Note, here \( S_1 = A \) and \( S_2 = B \).

As stated earlier, we need to find out, for every time step t and every hidden state, what the most probable next hidden state would be.

Assume when t = 2, the probability of transitioning to \( S_2(2) \) from \( S_1(1) \) is higher than transitioning to \( S_1(2) \), so we keep track of this. This is highlighted by the red arrow from \( S_1(1) \) to \( S_2(2) \) in the diagram below. The other path is shown as a grey dashed line, which is not required now.

Likewise, we repeat the same for each hidden state. In other words, assuming that at t=1 \( S_2(1) \) was the hidden state, at t=2 the probability of transitioning to \( S_1(2) \) from \( S_2(1) \) is higher, hence it is highlighted in red.

We can repeat the same process for all the remaining observations. The trellis diagram will look like the following.

The output of the above process is the sequence of the most probable states (1) [below diagram] and the corresponding probabilities (2). So as we go through finding the most probable state (1) for each time step, we will have a 2×5 matrix ( in general M x (T-1) ) as below:

The first number 2 in the above diagram indicates that the current hidden step 1 (since it is in the 1st row) transitioned from the previous hidden step 2.

Let's take one more example: the 2 in the 2nd row, 2nd column indicates that the current step 2 (since it is in the 2nd row) transitioned from the previous hidden step 2. If you refer to fig 1, you can see this is true, since at time 3 the hidden state \( S_2 \) transitioned from \( S_2 \) [as per the red arrow line].

Similar to the most probable states ( at each time step ), we will have another matrix of size 2 x 6 ( in general M x T ) for the corresponding probabilities (2). Next we find the last step by comparing the probabilities (2) of the T'th step in this matrix.

Assume, in this example, the last step is 1 ( A ); we add that to our empty path array. Then we find the previous most probable hidden state by backtracking in the most probable states (1) matrix. Refer to fig 3 below for the derived most probable path. The path could have been different if the last hidden step was 2 ( B ).

The final most probable path in this case is given in the below diagram, which is the same as defined in fig 1.

Now let's look at the code. We will start with Python first.

## Python:

The code has comments and follows the same intuition as the example. One implementation trick is to use the log scale so that we don't get underflow errors.
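To see why the log trick matters, here is a small illustration (not from the original code): multiplying a few hundred probabilities underflows to 0.0 in double precision, while accumulating their logs stays finite.

```python
import numpy as np

p = 0.1        # a typical per-step probability
steps = 400    # length of a longer observation sequence

# The naive product underflows: 0.1**400 is far below the
# smallest representable double (~5e-324), so it collapses to 0.0.
prod = 1.0
for _ in range(steps):
    prod *= p
print(prod)        # 0.0 -- all information lost

# Accumulating logs instead is perfectly stable.
log_prob = steps * np.log(p)
print(log_prob)    # about -921.03, still a perfectly usable number
```

Since log is monotonic, taking the argmax of summed logs gives the same path as the argmax of multiplied probabilities, so nothing is lost by working in log space.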

```python
def viterbi(V, a, b, initial_distribution):
    T = V.shape[0]
    M = a.shape[0]

    omega = np.zeros((T, M))
    omega[0, :] = np.log(initial_distribution * b[:, V[0]])

    prev = np.zeros((T - 1, M))

    for t in range(1, T):
        for j in range(M):
            # Same as Forward Probability
            probability = omega[t - 1] + np.log(a[:, j]) + np.log(b[j, V[t]])

            # This is our most probable state given previous state at time t (1)
            prev[t - 1, j] = np.argmax(probability)

            # This is the probability of the most probable state (2)
            omega[t, j] = np.max(probability)

    # Path Array
    S = np.zeros(T)

    # Find the most probable last hidden state
    last_state = np.argmax(omega[T - 1, :])

    S[0] = last_state

    backtrack_index = 1
    for i in range(T - 2, -1, -1):
        S[backtrack_index] = prev[i, int(last_state)]
        last_state = prev[i, int(last_state)]
        backtrack_index += 1

    # Flip the path array since we were backtracking
    S = np.flip(S, axis=0)

    # Convert numeric values to actual hidden states
    result = []
    for s in S:
        if s == 0:
            result.append("A")
        else:
            result.append("B")

    return result
```
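Before wiring viterbi() into the Baum-Welch output below, it can be smoke-tested on a tiny hand-specified model. The numbers below are made up for illustration, and the function is restated in a condensed form so the snippet runs on its own:

```python
import numpy as np

# Hypothetical toy model for a quick smoke test; numbers are made up.
a = np.array([[0.6, 0.4], [0.3, 0.7]])            # transitions
b = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])  # emissions
initial_distribution = np.array([0.5, 0.5])
V = np.array([0, 2, 1, 0])                        # observed symbol indices

# Condensed restatement of viterbi() so this snippet is self-contained.
def viterbi(V, a, b, initial_distribution):
    T, M = V.shape[0], a.shape[0]
    omega = np.zeros((T, M))
    omega[0] = np.log(initial_distribution * b[:, V[0]])
    prev = np.zeros((T - 1, M), dtype=int)
    for t in range(1, T):
        for j in range(M):
            probability = omega[t - 1] + np.log(a[:, j]) + np.log(b[j, V[t]])
            prev[t - 1, j] = np.argmax(probability)
            omega[t, j] = np.max(probability)
    # Backtrack from the most probable last state.
    path = [int(np.argmax(omega[-1]))]
    for i in range(T - 2, -1, -1):
        path.append(int(prev[i, path[-1]]))
    path.reverse()
    return ["A" if s == 0 else "B" for s in path]

print(viterbi(V, a, b, initial_distribution))
```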


Here is the complete Python Code:

```python
import pandas as pd
import numpy as np


def forward(V, a, b, initial_distribution):
    alpha = np.zeros((V.shape[0], a.shape[0]))
    alpha[0, :] = initial_distribution * b[:, V[0]]

    for t in range(1, V.shape[0]):
        for j in range(a.shape[0]):
            # Matrix Computation Steps
            #                  ((1x2) . (1x2)) * (1)
            #                          (1)    * (1)
            alpha[t, j] = alpha[t - 1].dot(a[:, j]) * b[j, V[t]]

    return alpha


def backward(V, a, b):
    beta = np.zeros((V.shape[0], a.shape[0]))

    # setting beta(T) = 1
    beta[V.shape[0] - 1] = np.ones((a.shape[0]))

    # Loop backward from T-1 to 1
    # Due to python indexing the actual loop will be T-2 to 0
    for t in range(V.shape[0] - 2, -1, -1):
        for j in range(a.shape[0]):
            beta[t, j] = (beta[t + 1] * b[:, V[t + 1]]).dot(a[j, :])

    return beta


def baum_welch(V, a, b, initial_distribution, n_iter=100):
    M = a.shape[0]
    T = len(V)

    for n in range(n_iter):
        alpha = forward(V, a, b, initial_distribution)
        beta = backward(V, a, b)

        xi = np.zeros((M, M, T - 1))
        for t in range(T - 1):
            denominator = np.dot(np.dot(alpha[t, :].T, a) * b[:, V[t + 1]].T, beta[t + 1, :])
            for i in range(M):
                numerator = alpha[t, i] * a[i, :] * b[:, V[t + 1]].T * beta[t + 1, :].T
                xi[i, :, t] = numerator / denominator

        gamma = np.sum(xi, axis=1)
        a = np.sum(xi, 2) / np.sum(gamma, axis=1).reshape((-1, 1))

        # Add additional T'th element in gamma
        gamma = np.hstack((gamma, np.sum(xi[:, :, T - 2], axis=0).reshape((-1, 1))))

        K = b.shape[1]
        denominator = np.sum(gamma, axis=1)
        for l in range(K):
            b[:, l] = np.sum(gamma[:, V == l], axis=1)

        b = np.divide(b, denominator.reshape((-1, 1)))

    return (a, b)


def viterbi(V, a, b, initial_distribution):
    T = V.shape[0]
    M = a.shape[0]

    omega = np.zeros((T, M))
    omega[0, :] = np.log(initial_distribution * b[:, V[0]])

    prev = np.zeros((T - 1, M))

    for t in range(1, T):
        for j in range(M):
            # Same as Forward Probability
            probability = omega[t - 1] + np.log(a[:, j]) + np.log(b[j, V[t]])

            # This is our most probable state given previous state at time t (1)
            prev[t - 1, j] = np.argmax(probability)

            # This is the probability of the most probable state (2)
            omega[t, j] = np.max(probability)

    # Path Array
    S = np.zeros(T)

    # Find the most probable last hidden state
    last_state = np.argmax(omega[T - 1, :])

    S[0] = last_state

    backtrack_index = 1
    for i in range(T - 2, -1, -1):
        S[backtrack_index] = prev[i, int(last_state)]
        last_state = prev[i, int(last_state)]
        backtrack_index += 1

    # Flip the path array since we were backtracking
    S = np.flip(S, axis=0)

    # Convert numeric values to actual hidden states
    result = []
    for s in S:
        if s == 0:
            result.append("A")
        else:
            result.append("B")

    return result


data = pd.read_csv('data_python.csv')

V = data['Visible'].values

# Transition Probabilities
a = np.ones((2, 2))
a = a / np.sum(a, axis=1)

# Emission Probabilities
b = np.array(((1, 3, 5), (2, 4, 6)))
b = b / np.sum(b, axis=1).reshape((-1, 1))

# Equal Probabilities for the initial distribution
initial_distribution = np.array((0.5, 0.5))

a, b = baum_welch(V, a, b, initial_distribution, n_iter=100)

print(viterbi(V, a, b, initial_distribution))
```


### Output:

I am only showing a partial result here. Later we will compare this with the HMM library.

['B', 'B', 'A', 'A',
...
'A', 'A',
'A', 'A', 'B', 'B', 'B', 'A',
'A', 'A', 'A', 'A', 'A', 'A']


## R Script:

The R code below doesn't have any comments. You can find them in the Python code ( they are structurally the same ).

```r
Viterbi = function(v, a, b, initial_distribution) {

  T = length(v)
  M = nrow(a)
  prev = matrix(0, T-1, M)
  omega = matrix(0, M, T)

  omega[, 1] = log(initial_distribution * b[, v[1]])

  for(t in 2:T){
    for(s in 1:M){
      probs = omega[, t - 1] + log(a[, s]) + log(b[s, v[t]])
      prev[t - 1, s] = which.max(probs)
      omega[s, t] = max(probs)
    }
  }

  S = rep(0, T)
  last_state = which.max(omega[, ncol(omega)])
  S[1] = last_state

  j = 2
  for(i in (T-1):1){
    S[j] = prev[i, last_state]
    last_state = prev[i, last_state]
    j = j + 1
  }

  S[which(S==1)] = 'A'
  S[which(S==2)] = 'B'

  S = rev(S)

  return(S)
}
```


Full R Code:

```r
forward = function(v, a, b, initial_distribution){

  T = length(v)
  M = nrow(a)
  alpha = matrix(0, T, M)

  alpha[1, ] = initial_distribution * b[, v[1]]

  for(t in 2:T){
    tmp = alpha[t-1, ] %*% a
    alpha[t, ] = tmp * b[, v[t]]
  }
  return(alpha)
}

backward = function(v, a, b){

  T = length(v)
  M = nrow(a)
  beta = matrix(1, T, M)

  for(t in (T-1):1){
    tmp = as.matrix(beta[t+1, ] * b[, v[t+1]])
    beta[t, ] = t(a %*% tmp)
  }
  return(beta)
}

BaumWelch = function(v, a, b, initial_distribution, n.iter = 100){

  for(i in 1:n.iter){
    T = length(v)
    M = nrow(a)
    K = ncol(b)

    alpha = forward(v, a, b, initial_distribution)
    beta = backward(v, a, b)
    xi = array(0, dim=c(M, M, T-1))

    for(t in 1:(T-1)){
      denominator = ((alpha[t,] %*% a) * b[, v[t+1]]) %*% matrix(beta[t+1,])
      for(s in 1:M){
        numerator = alpha[t, s] * a[s, ] * b[, v[t+1]] * beta[t+1, ]
        xi[s, , t] = numerator / as.vector(denominator)
      }
    }

    xi.all.t = rowSums(xi, dims = 2)
    a = xi.all.t / rowSums(xi.all.t)

    gamma = apply(xi, c(1, 3), sum)
    gamma = cbind(gamma, colSums(xi[, , T-1]))
    for(l in 1:K){
      b[, l] = rowSums(gamma[, which(v==l)])
    }
    b = b / rowSums(b)
  }
  return(list(a = a, b = b, initial_distribution = initial_distribution))
}

Viterbi = function(v, a, b, initial_distribution) {

  T = length(v)
  M = nrow(a)
  prev = matrix(0, T-1, M)
  omega = matrix(0, M, T)

  omega[, 1] = log(initial_distribution * b[, v[1]])

  for(t in 2:T){
    for(s in 1:M){
      probs = omega[, t - 1] + log(a[, s]) + log(b[s, v[t]])
      prev[t - 1, s] = which.max(probs)
      omega[s, t] = max(probs)
    }
  }

  S = rep(0, T)
  last_state = which.max(omega[, ncol(omega)])
  S[1] = last_state

  j = 2
  for(i in (T-1):1){
    S[j] = prev[i, last_state]
    last_state = prev[i, last_state]
    j = j + 1
  }

  S[which(S==1)] = 'A'
  S[which(S==2)] = 'B'

  S = rev(S)

  return(S)
}

data = read.csv("data_r.csv")

M = 2; K = 3
A = matrix(1, M, M)
A = A / rowSums(A)
B = matrix(1:6, M, K)
B = B / rowSums(B)
initial_distribution = c(1/2, 1/2)

myout = BaumWelch(data$Visible, A, B, initial_distribution, n.iter = 100)
myout.hidden = Viterbi(data$Visible, myout$a, myout$b, initial_distribution)
```


We can compare our output with the HMM library. Here is the result.

```r
library(HMM)
hmm = initHMM(c("A", "B"), c(1, 2, 3),
              startProbs = initial_distribution,
              transProbs = A, emissionProbs = B)

true.out = baumWelch(hmm, data$Visible, maxIterations=100, pseudoCount=0)
true.viterbi = viterbi(true.out$hmm, data$Visible)

sum(true.viterbi != myout.hidden)
```


### Output:

> sum(true.viterbi != myout.hidden)
[1] 0
>


This "Implement Viterbi Algorithm in Hidden Markov Model using Python and R" article was the final part of the Introduction to Hidden Markov Model tutorial series. I believe these articles will help anyone understand HMMs. Here we went through the algorithm for a sequence of discrete visible symbols; the equations are a little bit different for continuous visible symbols. Please post a comment in case you need more clarification on any part.

Do share this article if you find it useful. The full code can be found at:

Also, here is the list of all the articles in this series:

- Introduction to Hidden Markov Model
- Forward and Backward Algorithm in Hidden Markov Model
- Derivation and implementation of Baum Welch Algorithm for Hidden Markov Model
- Implement Viterbi Algorithm in Hidden Markov Model using Python and R