Markov Chains

Revision as of 23:47, 22 October 2024 by Tsun26 (talk | contribs) (Created page with "==Markov Chains== Loosely put a Markov Chain is a mathematical process involving transitions, governed by certain probabilistic rules, between different states. Markov chai...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Markov Chains

 Loosely put a Markov Chain is a mathematical process involving transitions, governed by certain probabilistic rules, between different states. Markov chains are a special type of stochastic process that satisfy the Markov property, which states that the future state of the system depends only on its present state, and not on its history of past states. These types of systems appear frequently in math competitions, and are even more prevalent in various real-world applications.

Formulation

  When a Markov chain "jumps" from one state to another, we call it a step. The number of steps a chain has undergone is conventionally denoted by a nonnegative index known as the time of the system. We use $S=\{1, 2, \cdots, N\}$ to denote all the possible states of the system, $t$ to denote the time index of the system, and $X_t$ to denote the state of the system at time $t$. Notice that the state of the system at time $t$ is random, so $X_t$ is a random variable. Here, we assume that we are dealing with a discrete-time Markov chain, which is equivalent to assuming $t$ is an integer. To describe the probabilities for our process we need to calculate

\[P\{X_0=i_0, X_1=i_1,\cdots, X_t=i_t\}\]

for every $t$ and sequence of states $(i_0, i_1, \cdots, i_t)$. Conditional probability tells us that the above is equal to

\[P\{X_0=i_0\}P(X_1=i_1|X_0=i_0)P(X_2=i_2|X_0=i_0,X_1=i_1)\cdots P(X_t=i_t|X_0=i_0\cdots X_{t-1}=i_{t-1})\]

Define

\[\phi(i)=P\{X_0=i\}, i=1, \cdots, N\]

We typically use $\phi(i)$ instead of $P\{X_0=i\}$ to simplify our notation. In reality, $\phi(i)$ is the initial probability distribution usually given by the context.

Since our process satisfies the Markov property, we know that \[P(X_t=i_t|X_0=i_0\cdots X_{t-1}=i_{t-1})=P(X_t=i_t|X_{t-1}=i_{t-1})\]

Now we make the additional assumption that $P(X_t=i|X_{t-1}=j)$ depends only on $i, j$, but not $t$. This assumption is called time homogeneity, and it just means that the "transition" probability from one state to another is invariant with respect to time. As a result, we can write $P(X_t=i|X_{t-1}=j)=p(i, j)$ for some function $p$ that from now on we will call the transition probability.

Using our new notation, it's not hard to see:

\[P\{X_0=i_0, X_1=i_1,\cdots, X_t=i_t\}=p(i_t, i_{t-1})p(i_{t-1}, i_{t-2})\cdots p(i_1, i_0)\phi(i_0)\]

Notice that $P\{X_0=i_0\}$ used to be our first term, but now it's our last term. Although we could have preserved the original order by using $p(j, i)$ instead of $p(i, j)$ (as seen in some textbooks), we choose to use $p(i, j)$ over $p(j, i)$ to avoid complications with matrix calculations that may arise later on.

Now consider the problem of calculating the probability distribution of $X_t$ for a given time $t$ and initial probability transition $\phi$. There are several ways to approach this problem, all essentially the same, but different in form.

Approach 1 (Pure Matrix Multiplication)

Our goal is to calculate $P\{X_t=i_t\}$ for any given time $t$ and state $i_t$. By using the law of total probability, or just intuition, it's easy to see:

\[P\{X_t=i_t\}=\sum_{i_{t-1},\cdots,i_1, i_0\in S}^{} P\{X_0=i_0, X_1=i_1,\cdots, X_{t-1}=i_{t-1}, X_t=i_t\}\]

\[=\sum_{i_{t-1},\cdots,i_1, i_0\in S}^{} p(i_t, i_{t-1})p(i_{t-1}, i_{t-2})\cdots p(i_1, i_0)\phi(i_0)\]

People well-versed in matrix multiplication will notice that the above summand is just the $i_t$th term of the matrix product $\textbf{P}^t\phi$, where $\phi$ is the initial probability distribution vector given by


\begin{equation} \phi = \begin{bmatrix} \phi(1) \\ \phi(2) \\ \vdots \\ \phi(N) \end{bmatrix} \end{equation}


and $\textbf{P}$ is the transition matrix given by

\begin{equation} \textbf{P} = \begin{bmatrix} p(1, 1) & p(1, 2) & \cdots & p(1, N) \\ p(2, 1) & p(2, 2) & \cdots & p(2, N) \\ \vdots \\ p(N, 1) & p(N, 2) & \cdots & p(N, N) \end{bmatrix} \end{equation}

$\textbf{P}^t\phi$ is a column vector, and we know from above that $P\{X_t=i_t\}=(\textbf{P}^t\phi)_{i_{t} 1}$ for any value of $i_t$. Define $\phi_t(i)=P\{X_t=i\}$, and

\begin{equation} \phi_t = \begin{bmatrix} \phi_t(1) \\ \phi_t(2) \\ \vdots \\ \phi_t(N) \end{bmatrix} \end{equation}

$\phi_t$ denotes the probability distribution of $X_t$, and our results above can be rewritten as $\phi_t=\textbf{P}^t\phi$