The Dishonest Casino
This example is taken from Durbin et al. (1999): a dishonest casino uses two dice; one is fair, the other is loaded.
- The probabilities of the fair die are (1/6,…,1/6).
- The probabilities of the loaded die are (1/10, …, 1/10, 1/2).
The observer doesn’t know which die is in use (the state is hidden), but the sequence of throws (the observations) can be used to infer which die (state) was used.
library(HMM)
dishonestCasino()
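For reference, the two dice of this example can be written out as HMM parameter matrices. A minimal sketch (the switching probabilities 0.05 and 0.1 below follow the textbook example and are assumptions here; the defaults used by dishonestCasino() may differ):

```r
# Emission probabilities: the fair die is uniform, the loaded die favors a six
fair = rep(1/6, 6)
loaded = c(rep(1/10, 5), 1/2)
emis = rbind(Fair = fair, Loaded = loaded)
colnames(emis) = 1:6

# Transition probabilities: the casino mostly keeps the current die
trans = matrix(c(0.95, 0.05,
                 0.10, 0.90), 2, 2, byrow = TRUE,
               dimnames = list(c("Fair", "Loaded"), c("Fair", "Loaded")))

rowSums(emis)   # each row sums to 1
rowSums(trans)  # each row sums to 1
```

These matrices could then be passed to initHMM(c("Fair", "Loaded"), 1:6, transProbs = trans, emissionProbs = emis) to reproduce the setup by hand.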
A Simulated Example
Specify the following HMM.
set.seed(100)
A = matrix(c(0.95,0.05,0.05,0.95),2,2);
B = matrix(c(1/3, 0.1, 1/3, 0.2, 1/3,0.7),2,3);
hmm = initHMM(c("A", "B"), c(1, 2, 3),
startProbs=c(1/2, 1/2),
transProbs=A, emissionProbs=B)
print(hmm)
$States
[1] "A" "B"
$Symbols
[1] 1 2 3
$startProbs
A B
0.5 0.5
$transProbs
to
from A B
A 0.95 0.05
B 0.05 0.95
$emissionProbs
symbols
states 1 2 3
A 0.3333 0.3333 0.3333
B 0.1000 0.2000 0.7000
Generate n = 500 observations from this HMM.
set.seed(100)
n = 500;
data = simHMM(hmm, n)
names(data)
[1] "states" "observation"
cbind(data$states[1:5], data$observation[1:5])
[,1] [,2]
[1,] "B" "3"
[2,] "B" "2"
[3,] "B" "3"
[4,] "B" "3"
[5,] "B" "3"
Fit the data with two hidden states
Use the Baum-Welch (i.e., EM) algorithm to estimate the model parameters.
Note that the algorithm gets stuck if we initialize the emission distributions to be identical across latent states: the two states are then symmetric, so the E-step assigns identical responsibilities to both, and the M-step preserves that symmetry, leaving the emission rows equal at every iteration.
tmphmm = initHMM(c("A", "B"), c(1, 2, 3))
print(tmphmm)
$States
[1] "A" "B"
$Symbols
[1] 1 2 3
$startProbs
A B
0.5 0.5
$transProbs
to
from A B
A 0.75 0.25
B 0.25 0.75
$emissionProbs
symbols
states 1 2 3
A 0.3333 0.3333 0.3333
B 0.3333 0.3333 0.3333
myfit2 = baumWelch(tmphmm, data$obs)
names(myfit2)
[1] "hmm" "difference"
print(myfit2$hmm)
$States
[1] "A" "B"
$Symbols
[1] 1 2 3
$startProbs
A B
0.5 0.5
$transProbs
to
from A B
A 0.75 0.25
B 0.25 0.75
$emissionProbs
symbols
states 1 2 3
A 0.218 0.28 0.502
B 0.218 0.28 0.502
Next we initialize the emission probability matrix B as follows: generate positive entries for the 2-by-3 matrix from a gamma distribution, then divide each entry by its row sum so that each row is a probability vector summing to 1. (Normalizing independent Gamma(shape = 1) draws this way yields a uniform Dirichlet sample, so each row is a uniformly random point on the probability simplex.)
set.seed(100)
tmpB = matrix(rgamma(n=6, shape=1), 2, 3)
tmpB = tmpB / rowSums(tmpB)
tmphmm = initHMM(c("A", "B"), c(1, 2, 3),
emissionProbs = tmpB)
print(tmphmm)
$States
[1] "A" "B"
$Symbols
[1] 1 2 3
$startProbs
A B
0.5 0.5
$transProbs
to
from A B
A 0.75 0.25
B 0.25 0.75
$emissionProbs
symbols
states 1 2 3
A 0.215 0.6059 0.1791
B 0.360 0.2989 0.3412
myfit2 = baumWelch(tmphmm, data$obs)
names(myfit2)
[1] "hmm" "difference"
print(myfit2$hmm)
$States
[1] "A" "B"
$Symbols
[1] 1 2 3
$startProbs
A B
0.5 0.5
$transProbs
to
from A B
A 0.90261 0.09739
B 0.05345 0.94655
$emissionProbs
symbols
states 1 2 3
A 0.3977 0.3665 0.2357
B 0.1210 0.2333 0.6457
Predict Latent States using Individual Marginal Probabilities
Compute the (marginal) conditional distribution of the hidden state for each observation: P(Z_i = k | X_1, …, X_n). Then classify Z_i to be the most probable state.
mypost = posterior(myfit2$hmm, data$obs)
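posterior returns a K-by-n matrix of the marginal probabilities P(Z_i = k | X_1, …, X_n). The classification step described above can be sketched with which.max applied column-wise; the small posterior matrix below is purely illustrative, not output from the fit:

```r
# Toy 2-by-n posterior matrix; rows = states, columns = positions
states = c("A", "B")
post = rbind(A = c(0.9, 0.4, 0.2, 0.7),
             B = c(0.1, 0.6, 0.8, 0.3))

# Classify each position to its most probable state
pred = states[apply(post, 2, which.max)]
pred  # "A" "B" "B" "A"
```

With the fitted model this becomes states[apply(mypost, 2, which.max)], which for K = 2 is equivalent to thresholding mypost[1, ] at 0.5 as done in the plot below.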
Predict Latent States using the Viterbi algorithm
Use the Viterbi algorithm to compute the most probable hidden sequence. Unlike the marginal approach above, which picks the most probable state at each position separately, Viterbi maximizes the joint probability of the entire path, so the two decodings can disagree at individual positions.
vit.out = viterbi(myfit2$hmm, data$obs)
Display the result
# plot the data
plot(data$obs, ylim = c(-6, 3), pch = 3,
xlab = "", ylab = "", bty = "n", yaxt = "n")
axis(2, at = 1:3)
# display true hidden states
text(0, -1.2, adj = 0, cex = 0.8, col = "black", "True: green = A")
for (i in 1:n) {
if (data$states[i] == "A")
rect(i, -1, i + 1, 0, col = "green", border = NA)
else rect(i, -1, i + 1, 0, col = "red", border = NA)
}
# display the most probable path
text(0, -3.2, adj = 0, cex = 0.8, col = "black", "Most probable path")
for (i in 1:n) {
if (vit.out[i] == "A")
rect(i, -3, i + 1, -2, col = "green", border = NA)
else rect(i, -3, i + 1, -2, col = "red", border = NA)
}
# display the marginal most probable state
text(0, -5.2, adj = 0, cex = 0.8, col = "black", "Marginal most probable state")
for (i in 1:n) {
if (mypost[1,i] > 0.5)
rect(i, -5, i + 1, -4, col = "green", border = NA)
else rect(i, -5, i + 1, -4, col = "red", border = NA)
}
Fit the data with three hidden states
The number of hidden states is a tuning parameter. We can try a range of hidden states and then use AIC/BIC to select K, the number of hidden states.
set.seed(100)
tmpB = matrix(rgamma(n=9, shape=1), 3, 3)
tmpB = tmpB / rowSums(tmpB)
tmphmm = initHMM(c("A", "B", "C"), c(1, 2, 3),
emissionProbs = tmpB)
print(tmphmm)
$States
[1] "A" "B" "C"
$Symbols
[1] 1 2 3
$startProbs
A B C
0.3333 0.3333 0.3333
$transProbs
to
from A B C
A 0.6667 0.1667 0.1667
B 0.1667 0.6667 0.1667
C 0.1667 0.1667 0.6667
$emissionProbs
symbols
states 1 2 3
A 0.1685 0.6084 0.2230
B 0.5503 0.1054 0.3443
C 0.2911 0.4256 0.2833
myfit3 = baumWelch(tmphmm, data$obs)
print(myfit3$hmm)
$States
[1] "A" "B" "C"
$Symbols
[1] 1 2 3
$startProbs
A B C
0.3333 0.3333 0.3333
$transProbs
to
from A B C
A 0.90344 0.05597 0.04058
B 0.04410 0.72539 0.23051
C 0.05941 0.17984 0.76075
$emissionProbs
symbols
states 1 2 3
A 0.39866 0.3676 0.2337
B 0.09609 0.2377 0.6662
C 0.14440 0.2298 0.6258
Note that the likelihood of the data (integrated over all possible hidden sequences) can be computed from the output of the forward command, which returns the forward probabilities on the log scale.
# log-likelihood for K=2
f = forward(myfit2$hmm, data$obs) # f: 2xn
A = f[1,n]; B = f[2,n]
A + log(1 + exp(B-A))
[1] -506.1
# log-likelihood for K=3
f = forward(myfit3$hmm, data$obs) # f: 3xn
A = f[1,n]; B = f[-1,n]
A + log(1 + sum(exp(B-A)))
[1] -505.8
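The expressions above are instances of the log-sum-exp trick: since forward returns log probabilities, log P(x) = log Σ_k exp(f[k, n]) must be computed without underflow by factoring out the largest term. A generic helper (a sketch in base R):

```r
# Numerically stable log(sum(exp(v))): factor out the maximum
logSumExp = function(v) {
  m = max(v)
  m + log(sum(exp(v - m)))
}

# Agrees with the direct formula when no underflow occurs
v = c(-2, -3, -5)
all.equal(logSumExp(v), log(sum(exp(v))))  # TRUE

# The direct formula underflows to -Inf here; the helper stays finite
logSumExp(c(-2000, -2001))
```

logSumExp(f[, n]) then gives the log-likelihood for any number of states K.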
The two log-likelihoods are very close, so once we add the penalty for the number of parameters, AIC or BIC will choose the model with two hidden states, which uses fewer parameters.
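Concretely, the comparison can be sketched with the log-likelihoods computed above, counting K(K-1) free transition, K(M-1) free emission, and K-1 free start parameters (conventions differ on whether the start probabilities are counted, so the parameter counts below are one choice, not the only one):

```r
K = c(2, 3); M = 3; n = 500
logL = c(-506.1, -505.8)                 # values from the forward computations above
p = K * (K - 1) + K * (M - 1) + (K - 1)  # free parameters: 7 and 14
aic = -2 * logL + 2 * p
bic = -2 * logL + log(n) * p
rbind(p = p, AIC = aic, BIC = bic)       # K = 2 has the smaller AIC and BIC
```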