The Dishonest Casino

This example is taken from Durbin et al. (1999): a dishonest casino uses two dice; one is fair and the other is loaded.

* The probabilities of the fair die are (1/6, ..., 1/6).
* The probabilities of the loaded die are (1/10, ..., 1/10, 1/2).

The observer doesn't know which die is in use (the state is hidden), but the sequence of throws (the observations) can be used to infer which die (state) was used.

library(HMM)
dishonestCasino()
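To see why inference is possible at all, it helps to look at the two dice directly. A minimal base-R sketch (independent of the HMM package; the variable names are ours):

```r
fair   = rep(1/6, 6)            # fair die: each face has probability 1/6
loaded = c(rep(1/10, 5), 1/2)   # loaded die: face 6 has probability 1/2

set.seed(1)
throws.fair   = sample(1:6, size = 1000, replace = TRUE, prob = fair)
throws.loaded = sample(1:6, size = 1000, replace = TRUE, prob = loaded)

# The loaded die shows a 6 far more often than the fair one,
# which is the signal the hidden-state inference exploits.
mean(throws.fair == 6)
mean(throws.loaded == 6)
```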

A Simulated Example

Specify the following HMM.

set.seed(100)
A = matrix(c(0.95,0.05,0.05,0.95),2,2);
B = matrix(c(1/3, 0.1, 1/3, 0.2, 1/3,0.7),2,3);
hmm = initHMM(c("A", "B"), c(1, 2, 3),
            startProbs=c(1/2, 1/2),
            transProbs=A, emissionProbs=B)
print(hmm)
$States
[1] "A" "B"

$Symbols
[1] 1 2 3

$startProbs
  A   B 
0.5 0.5 

$transProbs
    to
from    A    B
   A 0.95 0.05
   B 0.05 0.95

$emissionProbs
      symbols
states      1      2      3
     A 0.3333 0.3333 0.3333
     B 0.1000 0.2000 0.7000

Generate n = 500 observations from this HMM.

set.seed(100)
n = 500; 
data = simHMM(hmm, n)
names(data)
[1] "states"      "observation"
cbind(data$states[1:5], data$observation[1:5])
     [,1] [,2]
[1,] "B"  "3" 
[2,] "B"  "2" 
[3,] "B"  "3" 
[4,] "B"  "3" 
[5,] "B"  "3" 

Fit the data with two hidden states

Use the Baum-Welch (i.e., EM) algorithm to estimate the model parameters.

Note that the algorithm gets stuck if we initialize the emission distributions to be identical across latent states: such a symmetric initialization is a fixed point of the EM updates, so the two states remain indistinguishable.

tmphmm = initHMM(c("A", "B"), c(1, 2, 3))
print(tmphmm)
$States
[1] "A" "B"

$Symbols
[1] 1 2 3

$startProbs
  A   B 
0.5 0.5 

$transProbs
    to
from    A    B
   A 0.75 0.25
   B 0.25 0.75

$emissionProbs
      symbols
states      1      2      3
     A 0.3333 0.3333 0.3333
     B 0.3333 0.3333 0.3333
myfit2 = baumWelch(tmphmm, data$obs)
names(myfit2)
[1] "hmm"        "difference"
print(myfit2$hmm)
$States
[1] "A" "B"

$Symbols
[1] 1 2 3

$startProbs
  A   B 
0.5 0.5 

$transProbs
    to
from    A    B
   A 0.75 0.25
   B 0.25 0.75

$emissionProbs
      symbols
states     1    2     3
     A 0.218 0.28 0.502
     B 0.218 0.28 0.502

Next we initialize the emission probability matrix B as follows: generate positive entries for the 2-by-3 matrix using a gamma distribution, then divide each entry by the corresponding row sum so that each row is a probability vector summing to 1.

set.seed(100)
tmpB = matrix(rgamma(n=6, shape=1), 2, 3)
tmpB = tmpB / rowSums(tmpB)
tmphmm = initHMM(c("A", "B"), c(1, 2, 3),
               emissionProbs = tmpB)
print(tmphmm)
$States
[1] "A" "B"

$Symbols
[1] 1 2 3

$startProbs
  A   B 
0.5 0.5 

$transProbs
    to
from    A    B
   A 0.75 0.25
   B 0.25 0.75

$emissionProbs
      symbols
states     1      2      3
     A 0.215 0.6059 0.1791
     B 0.360 0.2989 0.3412
myfit2 = baumWelch(tmphmm, data$obs)
names(myfit2)
[1] "hmm"        "difference"
print(myfit2$hmm)
$States
[1] "A" "B"

$Symbols
[1] 1 2 3

$startProbs
  A   B 
0.5 0.5 

$transProbs
    to
from       A       B
   A 0.90261 0.09739
   B 0.05345 0.94655

$emissionProbs
      symbols
states      1      2      3
     A 0.3977 0.3665 0.2357
     B 0.1210 0.2333 0.6457

Predict Latent States using Individual Marginal Probabilities

Compute the (marginal) conditional distribution of the hidden state for each observation: P(Z_i = k | X_1, …, X_n). Then classify Z_i to be the most probable state.

mypost = posterior(myfit2$hmm, data$obs)
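The classification step itself is a column-wise argmax over the posterior matrix. A sketch using a toy 2-by-5 posterior matrix (`mypost` above has the same shape, 2 x n, with rows named by state):

```r
# Toy posterior matrix: each column holds P(Z_i = A) and P(Z_i = B)
toy.post = matrix(c(0.9, 0.1,
                    0.4, 0.6,
                    0.2, 0.8,
                    0.7, 0.3,
                    0.5, 0.5), nrow = 2)
rownames(toy.post) = c("A", "B")

# Classify each position to the row with the largest posterior probability
marginal.states = rownames(toy.post)[apply(toy.post, 2, which.max)]
marginal.states  # "A" "B" "B" "A" "A"  (ties go to the first row)
```

The same one-liner applied to `mypost` gives the marginal most probable state at every position.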

Predict Latent States using the Viterbi algorithm

Use the Viterbi algorithm to compute the most probable hidden sequence.

vit.out = viterbi(myfit2$hmm, data$obs)
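One caveat when comparing a decoded path with the truth: EM labels the hidden states arbitrarily, so the fitted "A" may correspond to the true "B" (label switching). A hedged sketch that checks agreement under both labelings, shown with toy sequences (in the actual analysis, `truth` would be `data$states` and `decoded` would be `vit.out`):

```r
truth   = c("A", "A", "B", "B", "B")
decoded = c("B", "B", "A", "B", "B")

# Relabeling map: fitted "A" <-> "B"
swap = c(A = "B", B = "A")

# Agreement under the better of the two labelings
agreement = max(mean(decoded == truth),
                mean(swap[decoded] == truth))
agreement  # 0.6
```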

Display the result

# plot the data
plot(data$obs, ylim = c(-6, 3), pch = 3, 
     xlab = "", ylab = "", bty = "n", yaxt = "n")
axis(2, at = 1:3)
# display true hidden states
text(0, -1.2, adj = 0, cex = 0.8, col = "black", "True: green = A")
for (i in 1:n) {
  if (data$states[i] == "A")
    rect(i, -1, i + 1, 0, col = "green", border = NA)
  else rect(i, -1, i + 1, 0, col = "red", border = NA)
}
# display the most probable path (Viterbi)
text(0, -3.2, adj = 0, cex = 0.8, col = "black", "Most probable path")
for (i in 1:n) {
  if (vit.out[i] == "A")
    rect(i, -3, i + 1, -2, col = "green", border = NA)
  else rect(i, -3, i + 1, -2, col = "red", border = NA)
}
# display the marginal most probable state
text(0, -5.2, adj = 0, cex = 0.8, col = "black", "Marginal most probable state")
for (i in 1:n) {
  if (mypost[1,i] > 0.5)
    rect(i, -5, i + 1, -4, col = "green", border = NA)
  else rect(i, -5, i + 1, -4, col = "red", border = NA)
}

Fit the data with three hidden states

The number of hidden states is a tuning parameter. We can fit models with several different numbers of hidden states and then use AIC/BIC to select K, the number of hidden states.

set.seed(100)
tmpB = matrix(rgamma(n=9, shape=1), 3, 3)
tmpB = tmpB / rowSums(tmpB)
tmphmm = initHMM(c("A", "B", "C"), c(1, 2, 3), 
                 emissionProbs = tmpB)
print(tmphmm)
$States
[1] "A" "B" "C"

$Symbols
[1] 1 2 3

$startProbs
     A      B      C 
0.3333 0.3333 0.3333 

$transProbs
    to
from      A      B      C
   A 0.6667 0.1667 0.1667
   B 0.1667 0.6667 0.1667
   C 0.1667 0.1667 0.6667

$emissionProbs
      symbols
states      1      2      3
     A 0.1685 0.6084 0.2230
     B 0.5503 0.1054 0.3443
     C 0.2911 0.4256 0.2833
myfit3 = baumWelch(tmphmm, data$obs)
print(myfit3$hmm)
$States
[1] "A" "B" "C"

$Symbols
[1] 1 2 3

$startProbs
     A      B      C 
0.3333 0.3333 0.3333 

$transProbs
    to
from       A       B       C
   A 0.90344 0.05597 0.04058
   B 0.04410 0.72539 0.23051
   C 0.05941 0.17984 0.76075

$emissionProbs
      symbols
states       1      2      3
     A 0.39866 0.3676 0.2337
     B 0.09609 0.2377 0.6662
     C 0.14440 0.2298 0.6258

Note that the likelihood of the data (integrated over all possible hidden sequences) could be computed using the forward command.

# log-likelihood for K=2
f = forward(myfit2$hmm, data$obs) # f: 2xn
A = f[1,n]; B = f[2,n]
A + log(1 + exp(B-A))
[1] -506.1
# log-likelihood for K=3
f = forward(myfit3$hmm, data$obs) # f: 3xn
A = f[1,n]; B = f[-1,n]
A + log(1 + sum(exp(B-A)))
[1] -505.8
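The two chunks above are instances of the log-sum-exp trick (subtract the maximum before exponentiating, to avoid underflow). A small helper, written by us, that works for any number of states:

```r
logsumexp = function(x) {
  m = max(x)                  # factor out the largest term
  m + log(sum(exp(x - m)))    # log(sum(exp(x))), computed stably
}

# Usage with a fitted model would be, e.g.:
#   f = forward(myfit2$hmm, data$obs)
#   logsumexp(f[, n])
logsumexp(c(log(2), log(3)))  # log(2 + 3) = log(5)
```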

The two log-likelihoods are very close, so the model with two hidden states, which uses fewer parameters, will be chosen by AIC or BIC once we add the penalty for the number of parameters.
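A sketch of that comparison, using the log-likelihoods computed above (ll2 = -506.1, ll3 = -505.8) and n = 500. One common free-parameter count for a K-state HMM over M symbols is (K - 1) start probabilities + K(K - 1) transition probabilities + K(M - 1) emission probabilities (other conventions, e.g. fixing the start distribution, change the count but not the conclusion here):

```r
n.params = function(K, M) (K - 1) + K * (K - 1) + K * (M - 1)

ll2 = -506.1; ll3 = -505.8; n = 500
p2 = n.params(2, 3)  # 7
p3 = n.params(3, 3)  # 14

c(AIC2 = -2 * ll2 + 2 * p2,      AIC3 = -2 * ll3 + 2 * p3)
c(BIC2 = -2 * ll2 + log(n) * p2, BIC3 = -2 * ll3 + log(n) * p3)
# Both criteria favor K = 2: the tiny gain in log-likelihood does not
# offset the 7 extra parameters of the three-state model.
```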
