Abstract
These notes provide a short, focused introduction to modelling stochastic gene expres-
sion, including a derivation of the master equation, the recovery of deterministic dynamics,
birth-and-death processes, and Langevin theory. The notes were last updated around 2010
and written for lectures given at summer schools held at McGill University’s Centre for
Non-linear Dynamics in 2004, 2006, and 2008.
Introduction
A system evolves stochastically if its dynamics are partly generated by a force of random strength,
by a force acting at random times, or by both. For stochastic systems, it is not possible to exactly
determine the state of the system at later times given its state at the current time. Instead, to
describe a stochastic system, we use the probability that the system is in a certain state and can
predict how this probability changes with time. Calculating this probability is often difficult,
and we usually focus on finding the moments of the probability distribution, such as the mean
and variance, which are commonly measured experimentally.
Any chemical reaction is stochastic. Reactants come together by diffusion, their motion
driven by collisions with other molecules. Once together, these same collisions alter the internal
energies of the reactants, and so their propensity to react. Both effects cause individual reaction
events to occur randomly.
Is stochasticity important in biology? Intuitively, stochasticity is only significant when typical
numbers of molecules are low. Then individual reactions, which at most change the numbers
of molecules by one or two, matter. Low numbers are frequent in vivo: gene copy number is
typically one or two, and transcription factors often number in the tens, at least in bacteria.
There are now many reviews on biochemical stochasticity [1, 2, 3, 4].
Unambiguously measuring stochastic gene expression, however, can be challenging [5]. Naively,
we could place Green Fluorescent Protein (GFP) on a bacterial chromosome downstream of a
promoter that is activated by the system of interest. By measuring the variation in fluorescence
across a population of cells, we could quantify stochasticity. Every biochemical reaction,
however, is potentially stochastic. Fluorescence variation could be because of stochasticity in
the process under study or could result from the general background ‘hum’ of stochasticity:
stochastic effects in ribosome synthesis could lead to different numbers of ribosomes and so to
differences in gene expression in each cell; stochastic effects in the cell cycle machinery may
desynchronize the population; stochastic effects in signaling networks could cause each cell to
respond uniquely, and so on.
Variation then has two classes: intrinsic stochasticity, the stochasticity inherent in the
dynamics of the system and that arises from fluctuations in the timing of individual reactions, and
extrinsic stochasticity, the stochasticity originating from reactions of the system of interest
with other stochastic systems in the cell or its environment [6, 5]. In principle, intrinsic and
extrinsic stochasticity can be measured by creating a copy of the network of interest in the same
cellular environment as the original network [5]. We can define intrinsic and extrinsic variables
for the system of interest, with fluctuations in these variables together generating intrinsic and
extrinsic stochasticity [6]. The intrinsic variables of a system will typically specify the copy
numbers of the molecular components of the system. For gene expression, the level of occupancy
of the promoter by transcription factors, the numbers of mRNA molecules, and the number of
proteins are all intrinsic variables. If we imagine a second copy of the system (an identical gene
and promoter elsewhere in the genome), then the instantaneous values of the intrinsic variables
of this copy of the system will usually differ from those of the original system. At any point
in time, for example, the number of mRNAs transcribed from the first copy of the gene will
usually be different from the number of mRNAs transcribed from the second copy. Extrinsic
variables, however, describe processes that equally affect each copy of the system. Their values
are therefore the same for each copy. For example, the number of cytosolic RNA polymerases
is an extrinsic variable because the rate of gene expression from both copies of the gene will
increase if the number of cytosolic RNA polymerases increases and decrease if the number of
cytosolic RNA polymerases decreases. In contrast, the number of transcribing RNA polymerases
is an intrinsic variable because we expect the number of transcribing RNA polymerases to be
different for each copy of the gene at any point in time.
Stochasticity is quantified by measuring an intrinsic variable for both copies of the system.
For gene expression, the number of proteins is typically measured by using fluorescent proteins
as markers [7, 5, 8, 9]. Imaging a population of cells then allows estimation of the distribution of
protein levels at steady-state. In vivo, fluctuations of the intrinsic variable will have both intrinsic
and extrinsic sources. The number of proteins will fluctuate because of intrinsic stochasticity
generated during gene expression, but also because of stochasticity in, for example, the number
of cytosolic RNA polymerases or ribosomes or proteasomes. We will use the term ‘noise’ to
mean an empirical measure of stochasticity defined by the coefficient of variation (the standard
deviation divided by the mean) of a stochastic process. An estimate of intrinsic stochasticity is
the intrinsic noise, which is defined as a measure of the difference between the value of an intrinsic
variable for one copy of the system and its counterpart in the second copy. For gene expression,
typically the intrinsic noise is the mean absolute difference (suitably normalized) at steady-
state between the number of proteins expressed from one copy of the gene and the number of
proteins expressed from the other copy [5]. Such a definition supports the intuition that intrinsic
fluctuations cause variation in one copy of the system to be uncorrelated with variation in the
other copy. Extrinsic noise is defined as the correlation coefficient between the intrinsic variable
of one copy of the system and its counterpart for the other copy because extrinsic fluctuations
equally affect both copies of the system and consequently cause correlations between variation
in one copy and variation in the other. The intrinsic and extrinsic noise should be related to the
coefficient of variation of the intrinsic variable of the original system of interest. This so-called
total noise is given by the square root of the sum of the squares of the intrinsic and the extrinsic
noise [6].
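As a hedged illustration of this decomposition, the following Python sketch estimates intrinsic, extrinsic, and total noise from paired measurements of two reporters. The estimators are an assumption: a normalized mean squared difference for the intrinsic part and a normalized covariance for the extrinsic part, chosen because their squares add exactly to the squared total noise. They are not necessarily the precise normalization intended above. The function names and the synthetic data are hypothetical.

```python
import math
import random


def noise_decomposition(c1, c2):
    """Return squared (intrinsic, extrinsic, total) noise from paired data.

    c1[i] and c2[i] are the reporter levels of copy 1 and copy 2 in cell i.
    Assumed estimators (an illustrative choice, not taken from the text):
      intrinsic^2 = <(c1 - c2)^2> / (2 <c1><c2>)
      extrinsic^2 = (<c1 c2> - <c1><c2>) / (<c1><c2>)
      total^2     = (<c1^2 + c2^2> - 2 <c1><c2>) / (2 <c1><c2>)
    """
    n = len(c1)
    m1 = sum(c1) / n
    m2 = sum(c2) / n
    mean_sq_diff = sum((a - b) ** 2 for a, b in zip(c1, c2)) / n
    mean_prod = sum(a * b for a, b in zip(c1, c2)) / n
    mean_sq_sum = sum(a * a + b * b for a, b in zip(c1, c2)) / n
    int_sq = mean_sq_diff / (2 * m1 * m2)
    ext_sq = (mean_prod - m1 * m2) / (m1 * m2)
    tot_sq = (mean_sq_sum - 2 * m1 * m2) / (2 * m1 * m2)
    return int_sq, ext_sq, tot_sq


def synthetic_cells(n_cells, rng):
    """Hypothetical data: a shared (extrinsic) factor scales both reporters,
    and each reporter adds its own independent (intrinsic) fluctuation."""
    c1, c2 = [], []
    for _ in range(n_cells):
        shared = rng.gauss(1.0, 0.3)  # same value for both copies in a cell
        c1.append(100.0 * shared + rng.gauss(0.0, 10.0))
        c2.append(100.0 * shared + rng.gauss(0.0, 10.0))
    return c1, c2


rng = random.Random(0)
c1, c2 = synthetic_cells(20000, rng)
int_sq, ext_sq, tot_sq = noise_decomposition(c1, c2)
print("intrinsic noise:", math.sqrt(int_sq))
print("extrinsic noise:", math.sqrt(ext_sq))
print("total noise:", math.sqrt(tot_sq))
```

By construction, the synthetic data have an intrinsic noise of roughly 0.1 and an extrinsic noise of roughly 0.3. With these estimators the squared total noise equals the sum of the two squared components up to rounding error.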
Such two-colour measurements of stochasticity have been applied to bacteria and yeast where
gene expression has been characterized by using two copies of a promoter placed in the genome,
with each copy driving a distinguishable allele of Green Fluorescent Protein [5, 9]. Both intrinsic
and extrinsic noise can be substantial, giving, for example, a total noise of around 0.4, so that
the standard deviation of protein numbers is 40% of the mean. Extrinsic noise is usually higher
than intrinsic noise. There are some experimental caveats: both copies of the system should
be placed ‘equally’ in the genome so that the probabilities of transcription and replication are
equal. This ‘equality’ is perhaps best met by placing the two genes adjacent to each other [5].
Although there are no conceptual difficulties, practical problems arise with feedback. If the
protein synthesized in one system can influence its own expression, the same protein will also
influence expression in a second copy of the system. The two copies of the system have lost
the (conditional) independence they require to be two simultaneous measurements of the same
stochastic process.
A stochastic description of chemical reactions
For any network of chemical reactions, the lowest level of description commonly used in systems
biology is the chemical master equation. This equation assumes that the system is well-stirred
and so ignores spatial effects. It governs how the probability of the system being in any particular
state changes with time. A system state is defined by the number of molecules present for each
chemical species, and it will change every time a reaction occurs. From the master equation
we can derive the deterministic approximation (a set of coupled differential equations) which is
often used to describe system dynamics. The dynamics of the mean of each chemical species
approximately obeys these deterministic equations as the numbers of molecules of all species
increase [10, 11]. The master equation itself is usually only solvable analytically for linear
systems: systems having only first-order chemical reactions.
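To make the relation between the master equation and the deterministic description concrete, here is a minimal sketch for a hypothetical one-species system. Molecules are synthesized at a constant rate k and each is degraded at rate gamma, so the master equation is dP_n/dt = k P_{n-1} + gamma (n+1) P_{n+1} - (k + gamma n) P_n. The example system, the symbols, the truncation of the state space at n_max, and the forward-Euler integrator are all assumptions made for illustration. Because this system is linear, its mean follows the deterministic equation dx/dt = k - gamma x at any copy number.

```python
import math


def master_rhs(p, k, gamma):
    """dP_n/dt for synthesis at rate k and degradation at rate gamma * n.

    p[n] is the probability of having n molecules, n = 0 .. len(p) - 1.
    The state space is truncated: there is no synthesis out of the largest
    state, so total probability is conserved.
    """
    n_max = len(p) - 1
    dp = []
    for n, pn in enumerate(p):
        synthesis_out = k if n < n_max else 0.0
        gain = 0.0
        if n > 0:
            gain += k * p[n - 1]                # n-1 -> n by synthesis
        if n < n_max:
            gain += gamma * (n + 1) * p[n + 1]  # n+1 -> n by degradation
        loss = (synthesis_out + gamma * n) * pn  # probability leaving state n
        dp.append(gain - loss)
    return dp


def solve_master(k, gamma, n_max, t_end, dt=1e-3):
    """Forward-Euler integration starting from exactly zero molecules."""
    p = [1.0] + [0.0] * n_max
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dp = master_rhs(p, k, gamma)
        p = [pn + dt * dpn for pn, dpn in zip(p, dp)]
    return p


def moments(p):
    """Mean and variance of the copy number under the distribution p."""
    mean = sum(n * pn for n, pn in enumerate(p))
    var = sum((n - mean) ** 2 * pn for n, pn in enumerate(p))
    return mean, var


k, gamma, t_end = 10.0, 1.0, 2.0
p = solve_master(k, gamma, n_max=60, t_end=t_end)
mean, var = moments(p)
deterministic = (k / gamma) * (1.0 - math.exp(-gamma * t_end))
print("mean from master equation:", mean)
print("deterministic solution:  ", deterministic)
print("variance:", var)
```

The mean from the master equation should agree with the deterministic solution up to the integration error. The variance should stay close to the mean, as expected for a distribution that remains Poisson when the system starts with zero molecules.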
Nevertheless, several approximations exist, all of which exploit the tendency of fluctuations
to decrease as the numbers of molecules increase. The most systematic is the linear noise
approach of van Kampen [12]. If the concentration of each chemical species is fixed, then
changing the system volume, Ω, alters the number of molecules of every chemical species. The
linear noise approximation is based on a systematic expansion of the master equation in the
inverse of the system volume, Ω−1. It leads to diffusion-like equations that accurately describe
small fluctuations around any stable attractor of the system. For systems that tend to steady-
state, a Langevin approach is also often used [13, 14, 15]. Here additive, white stochastic terms
are included in the deterministic equations, with the magnitude of these terms being determined
by the chemical reactions. At steady-state and for sufficiently high numbers of molecules, the
Langevin and linear noise approaches are equivalent.
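As a worked sketch of the Langevin approach, consider a hypothetical single species synthesized at constant rate $k$ and degraded at rate $\gamma$ per molecule. This example and its symbols are assumptions, not taken from the text. The strength of the white noise term is fixed at its steady-state value, the sum of the two reaction rates at $n_s = k/\gamma$, which is what makes the stochastic term additive.

```latex
\begin{align}
\frac{dn}{dt} &= k - \gamma n + \xi(t), \qquad
\langle \xi(t)\,\xi(t') \rangle = (k + \gamma n_s)\,\delta(t - t') = 2k\,\delta(t - t'),
\qquad n_s = \frac{k}{\gamma}, \\
\frac{d\,\delta n}{dt} &= -\gamma\,\delta n + \xi(t), \qquad \delta n = n - n_s, \\
\frac{d\langle \delta n^2 \rangle}{dt} &= -2\gamma \langle \delta n^2 \rangle + 2k = 0
\;\;\Rightarrow\;\; \langle \delta n^2 \rangle = \frac{k}{\gamma} = n_s, \\
\eta &= \frac{\sqrt{\langle \delta n^2 \rangle}}{n_s} = \frac{1}{\sqrt{n_s}} .
\end{align}
```

The steady-state variance equals the mean, as for a Poisson distribution. The noise $\eta$ falls as the inverse square root of the mean number of molecules, consistent with fluctuations decreasing as the numbers of molecules increase.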
Unfortunately, all these methods become intractable, in general, once the number of chemical
species in the system exceeds three (we then need to analytically calculate the inverse
of at least a 4 × 4 matrix or its eigenvalues). Rather than numerically solve the master equa-
tion, the Gillespie algorithm [16], a Monte Carlo method, is often used to simulate intrinsic
fluctuations by generating one sample time course from the master equation. By doing many
simulations and averaging, the mean and variance for each chemical species can be calculated
as a function of time. Extrinsic fluctuations can be modelled as fluctuations in the parameters
of the system, such as the kinetic rates [17, 18]. They can be included by a minor modification
of the Gillespie algorithm that feeds in a pre-simulated time series of extrinsic fluctuations and
so generates both intrinsic and extrinsic fluctuations [18].
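The simulate-and-average procedure described above can be sketched as follows for a hypothetical one-species system with synthesis at constant rate k and degradation at rate gamma per molecule. This is a minimal version of Gillespie's direct method under those assumptions. The function names, parameter values, and choice of example are illustrative only, and extrinsic fluctuations are not included.

```python
import random


def gillespie_final_count(k, gamma, n0, t_end, rng):
    """Simulate one trajectory with Gillespie's direct method; return n(t_end).

    Reactions (a hypothetical one-species example):
      synthesis:   n -> n + 1 with propensity k
      degradation: n -> n - 1 with propensity gamma * n
    """
    t, n = 0.0, n0
    while True:
        a_syn = k
        a_deg = gamma * n
        a_total = a_syn + a_deg
        # The waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            return n
        # Choose which reaction fires, with probability proportional to propensity.
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1


def ensemble_moments(k, gamma, n0, t_end, n_runs, seed=0):
    """Estimate the mean and variance at t_end from many independent runs."""
    rng = random.Random(seed)
    finals = [gillespie_final_count(k, gamma, n0, t_end, rng) for _ in range(n_runs)]
    mean = sum(finals) / n_runs
    var = sum((x - mean) ** 2 for x in finals) / (n_runs - 1)
    return mean, var


mean, var = ensemble_moments(k=10.0, gamma=1.0, n0=0, t_end=5.0, n_runs=2000)
print("mean:", mean, "variance:", var)
```

For these parameters both the mean and the variance should be close to k/gamma = 10, up to sampling error. Repeating the estimate at several values of t_end gives the moments as a function of time.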