When I first started out to learn CRFs there was a scarcity of material that I could consume. They were either too technical for my ability or were repeating what I already knew. This article hopes to bridge that gap and allow you to read material on CRFs. We assume knowledge of some math related to set theory and probability theory.
- Factorization of functions is when a function is shown to be the product of some other functions.
- Probability
- It is the chance of something happening.
- Graphs
- A Graph is defined as a set of Vertices, Edges. The edges may be directed or undirected.
- A Bipartile graph is a one in which the vertices can be split into two groups where members of a group are note connected to any other member of that group.
- A factor graph is a Bipartile Graph representing the factorization of a function.
- Graphical Model
- Probabilistic Graphical Models are probabilistic models where a graph represents the conditional dependence structure between variables.
- Two branches of graphical representations of distributions are common.
- Bayesian networks
- Hidden Markov Models (HMMs) and Neural networks are special cases of Bayesian networks
- Markov networks/ Markov random fields
- Markov networks may be cyclic and are undirected whereas Bayesian networks are directed and acyclic.
- Markov property
- At it's core the Markov property asserts memory-less-ness.
- It says that each observation must not be influenced by the past ones.
- Just as Capital sigma is used to denote the sum of a series, product of a series is denoted py PI.
Feel free to delve deeper into that body of knowledge before understanding what a CRF is. Understand what they represent before stepping onto this next lot.
A Conditional Random field is an undirected probabilistic graphical model. It is used to predict a set of classification labels for a set of inputs. Instead of considering a single input variable individually it considers the effect of neighbours.
A well worn example is the English sentence. Classifying the words of a sentence as 'verb', 'noun' etc while individually looking at the word is classification. A CRF would look at the neighbouring words too and thus predict the classification for the entire sentence together and not just one word at a time.
A CRF is a graphical model where the set of vertices can be split into two disjoint sets X and Y such that X is the set of inputs and Y is the set of outputs and then the conditional probabilities are modeled according to P(Y|X).
A very good example of CRF with python can be seen in this notebook
(Jupyter Notebook).