Biology 301

Introduction to Demography

Studying changes in populations with age or stage structure requires demographic theory, which relates the probabilities of survival and reproduction at each age to the growth and composition of a population.

Today we'll introduce some of the concepts and parameters important in studying the age structure and growth of a population.

We'll focus on the demography of Canada. I obtained the data primarily from census data published by Statistics Canada (catalogs 84-210, 84-211, 91-213-XPB) and from the International Database maintained by the US Census Bureau.

Total Population Size:

Consider a population measured at particular points in time, for instance every five years during a census. One can track the population size from census to census to get a historical perspective on population growth:

This shows that Canada has grown in the recent past. For instance, between the 1990 and 1995 censuses, the population grew by 1.3% per year (although about half of this growth was due to migration).

In studying a natural population, there are many complications that we are going to ignore. For instance, we are going to ignore problems associated with emigration (out of the population) and immigration (into the population). Furthermore, males and females generally differ in their demographic parameters but we're going to focus on females.

We'll use five year census points and base our analysis on the 1991 census, so that we may compare our projections to the 1996 census data.

Age Distribution:

A population can be described by how many individuals there currently are in each age class.

These census data group individuals into five-year age classes.

By convention, the number of females at time t in age class x is n_x,t.

Interestingly, this age distribution has changed substantially since 1951:

Mortality:

Another critical piece of data that we can get from the census is how many individuals die per year, by age class:

Generally, the number of deaths in the population is less useful than the proportion of individuals in each age class that die between each census period.

The proportion of individuals in each age class that die between each census period is obtained by multiplying the number that died per year by the length of the age class and dividing by the total number of individuals in each age class:

The probability of dying from age class x to age class x+1 is denoted by d_x.

Another important parameter is the probability of surviving from age class x to age class x+1, p_x, which is simply 1-d_x.

Sometimes, survival data are given from birth to the current age class. The probability of surviving from birth to the beginning of age class x is generally written as l_x.

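Fortunately, one can determine l_x from p_x and vice versa, using the fact that l_x+1 = l_x p_x. That is, l_x is the product of the p values for all age classes younger than x (with l_x = 1 for the first age class), and p_x = l_x+1 / l_x.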
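The conversion from raw counts to d_x, p_x and l_x can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the death counts and class sizes are made-up numbers (not the census data), the function name life_table is hypothetical, and the age classes are assumed to be five years long.

```python
def life_table(deaths_per_year, class_sizes, class_length=5):
    """Return (d, p, l) lists with one entry per age class.

    d[x]: proportion of class x dying before the next census
    p[x]: proportion surviving to the next census, 1 - d[x]
    l[x]: probability of surviving from birth to the start of class x
    """
    d = [class_length * deaths / size
         for deaths, size in zip(deaths_per_year, class_sizes)]
    p = [1.0 - dx for dx in d]
    l = [1.0]                      # everyone is alive at birth
    for px in p[:-1]:
        l.append(l[-1] * px)       # l[x+1] = l[x] * p[x]
    return d, p, l


# Hypothetical counts for three age classes (NOT census data).
deaths_per_year = [100, 50, 400]
class_sizes = [100_000, 50_000, 20_000]

d, p, l = life_table(deaths_per_year, class_sizes)
print("d:", d)
print("p:", p)
print("l:", l)
```

Going the other way, each p_x can be recovered as the ratio of two neighbouring l values.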

Fertility:

Finally, we must determine the number of babies that individuals in each age class have:

These figures are for all births. We are following females, however, and only 195,916 of these births (48.96%) were girls. [A male-biased sex ratio at birth is a general phenomenon among humans.] We'll use this proportion to estimate the number of female offspring born in 1991 to mothers in each age class.

Again, the total number of daughters born is less useful than the number of daughters born per female in each age class during the census period.

This is obtained by multiplying the number of daughters born by the length of the age class and dividing by the total number of females in each age class:

The number of female offspring per female in age class x is denoted by m_x.

Notice that the total period fertility rate (Σ m_x) is only 0.88. That is, even if all females survive from birth until the end of their reproductive period, the number of females born to a cohort of women will not fully replace that cohort.
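As a rough sketch of this calculation, the Python below turns births per year into daughters per female per census period and sums them. The birth counts and class sizes are invented for illustration, and the function name fertility_schedule is hypothetical. Only the proportion of girls (48.96%) comes from the figures above.

```python
def fertility_schedule(births_per_year, females,
                       prop_female=0.4896, class_length=5):
    """Daughters per female per census period (m_x) for each age class."""
    return [class_length * prop_female * births / n
            for births, n in zip(births_per_year, females)]


# Hypothetical births per year and female counts (NOT census data).
births_per_year = [0, 100_000, 60_000, 0]
females = [1_000_000, 1_000_000, 1_000_000, 1_000_000]

m = fertility_schedule(births_per_year, females)
total_fertility = sum(m)   # daughters per woman who survives every class

print("m_x:", m)
print("sum of m_x:", total_fertility)
if total_fertility < 1:
    print("A cohort of women would not fully replace itself.")
```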

With these parameters of a population in hand, we can turn next to analysing how a population will change over time.

In summary, we have a whole bunch of data about the Canadian population, from which we can infer:

n_x,t the number of females at time t in age class x
d_x the probability of dying from age class x to age class x+1
p_x the probability of surviving from age class x to age class x+1 (p_x = 1-d_x)
l_x the probability of surviving from birth to the beginning of age class x
m_x the number of female offspring per female in age class x

Following Hastings, we will assume that individuals censused within an age class are actually at the beginning of that age class.

In particular, the number of individuals within the 0-4 age class will be the total number of individuals born since the previous census. The number of individuals in the 5-9 age class will then be those that were in the 0-4 age class and that survived as 0-, 1-, 2-, 3-, and 4-year olds.

Other texts make different assumptions, e.g. that censused individuals are at the end of their age class.

Step 1: Determine the transition matrix for the population from census point to census point.

First, the females in age class x reproduce (at rate m_x). The females then move up to age class x+1 if they survive to the next census (with probability p_x ).

Therefore, given the population composition at census t, at census t+1

(1) the number of individuals in the first age class is Σ m_x n_x,t (summed over all age classes x)

(2) the number of individuals in other age classes is n_x-1,t p_x-1

(3) the number of individuals in the last age class (ω) is n_ω-1,t p_ω-1 + n_ω,t p_ω, since some individuals live past 90.

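Population changes can more easily be described using matrix notation: n_t+1 = L n_t, where n_t is the vector of the n_x,t and the matrix L has the m_x along its first row, the p_x just below the diagonal, and p_ω in the bottom-right corner.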

L is known as a Leslie Matrix for population projection.
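To make the structure of the matrix concrete, here is a minimal Python sketch that builds a Leslie matrix from lists of m_x and p_x and applies it once. The function names leslie_matrix and project are hypothetical, and the three-class numbers are purely illustrative. The last age class is assumed to be open-ended, so its survivors stay in that class.

```python
def leslie_matrix(m, p):
    """Build a Leslie matrix as a list of rows.

    m[x]: daughters per female in class x per census period
    p[x]: probability of surviving from class x to the next census
    """
    k = len(m)
    L = [[0.0] * k for _ in range(k)]
    L[0] = [float(mx) for mx in m]   # first row: fertilities
    for x in range(k - 1):
        L[x + 1][x] = p[x]           # sub-diagonal: move up one class
    L[k - 1][k - 1] += p[k - 1]      # open-ended last class keeps survivors
    return L


def project(L, n):
    """One census step: multiply the matrix L by the population vector n."""
    return [sum(a * b for a, b in zip(row, n)) for row in L]


# Illustrative values only (NOT the Canadian data).
m = [0.0, 1.0, 0.5]
p = [0.5, 0.8, 0.25]
L = leslie_matrix(m, p)
n_next = project(L, [100.0, 100.0, 100.0])

for row in L:
    print(row)
print("next census:", n_next)
```

Calling project repeatedly on its own output iterates the population forward one census at a time.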

We can then proceed through our steps to find the general solution:

Step 2: Determine the eigenvalues of the Leslie matrix.

Step 3: Make a diagonal matrix, D, with one eigenvalue in each of the diagonal positions.

Step 4: Determine the eigenvectors associated with each eigenvalue.

Step 5: Make a transformation matrix, A, whose columns are the eigenvectors (placed in the same order as the eigenvalues in matrix D).

Step 6: Write the general solution of the linear equations as n_t = A D^t A^-1 n_0, where n_0 is the initial population vector.

For large matrices, however, this can be a real mess! So we will use a simpler approximation.

If we place the leading eigenvalue (i.e. the eigenvalue with largest magnitude) in the first position, D^t will become more and more similar over time to a matrix with λ₁^t in the top-left position and zeros everywhere else.

That is, because λ₁ is larger in magnitude than all the other eigenvalues, λ₁^t will be much larger than λ_i^t (for i other than 1), and the influence of these other eigenvalues will become negligible after enough time has passed.

Note: Whether or not this will give a sufficiently accurate approximation depends on how much time has passed and on whether the other eigenvalues are near the leading eigenvalue in magnitude.

An approximate projection of the population is then n_t ≈ λ₁^t u (v·n_0),

where u is the first column of the A matrix and v is the first row of the A^-1 matrix.

Note: The first column of A is also called the right eigenvector, which must solve L u = λ₁ u. Similarly, the first row of A^-1 is called the left eigenvector, which must solve v L = λ₁ v. To use this approximate solution, the length of the left eigenvector must be adjusted so that v·u = 1.

The left eigenvector contains information about the "reproductive value" of each age class. When multiplied by the initial population vector, it gives the initial population size adjusted for the fact that some age classes (e.g., teenagers) have greater reproductive value than others (e.g., post-menopausal).

Once the general solution is found, the iteration is simple: just multiply by λ₁ every time step (here, every census).

Let's first analyse the Canadian data using a simplified version of the census data.

We'll group ages from 0-29, 30-59, and 60+ to get only three classes. Note that the analysis will be less accurate with fewer age classes.

The general Leslie matrix for this case would be (written row by row): L = {{m_1, m_2, m_3}, {p_1, 0, 0}, {0, p_2, p_3}}

For the Canadian population, I get:

p = {0.9851, 0.9348, 0.1399}

n₁₉₉₁ = {5798100, 5460500, 2430500}

One way of finding the future age distribution is to raise the matrix to a power numerically and calculate L^t n₁₉₉₁ (with t counted in 30-year steps) to get:

n₂₀₂₁

n₂₀₅₁ = {4618060, 4931180, 6101160} [in two censuses]

n₂₀₈₁ = {4159900, 4549370, 5463250} [in three censuses]

This indicates that the proportion of females in the oldest age class (60+) should rise rapidly from 18% in 1991 to 34% in 2021 to 39% in 2051, where it should remain roughly constant.

It also indicates that the female population should grow by 18% between 1991 and 2021, then decrease by 3% between 2021 and 2051, by 9% between 2051 and 2081, and by roughly 9% every 30 years thereafter.
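The iteration behind these projections can be sketched in Python. The survival probabilities and the 1991 counts are the ones quoted above. The fertilities are NOT given in these notes: the values used here are an assumption, back-calculated so that the matrix has the eigenvalues 0.9064 and -0.3231 quoted for this example, with no reproduction in the 60+ class. The output should therefore be read as approximate.

```python
# Quoted in the text: survival probabilities and 1991 female counts.
p = [0.9851, 0.9348, 0.1399]
n_1991 = [5798100, 5460500, 2430500]

# ASSUMPTION: fertilities back-calculated from the quoted eigenvalues
# (m1 = 0.9064 - 0.3231, m2 = 0.9064 * 0.3231 / p1, m3 = 0).
m = [0.5833, 0.2973, 0.0]

L = [m,
     [p[0], 0.0, 0.0],
     [0.0, p[1], p[2]]]


def step(L, n):
    """Advance the population vector n by one 30-year time step."""
    return [sum(a * b for a, b in zip(row, n)) for row in L]


history = {1991: n_1991}
n = n_1991
for year in (2021, 2051, 2081):
    n = step(L, n)
    history[year] = n

for year, vec in history.items():
    total = sum(vec)
    print(year, [round(v) for v in vec],
          f"total = {total:.3e}, share aged 60+ = {vec[2] / total:.1%}")
```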

Currently, the population is very young and is growing more rapidly because of this. As the Canadian population ages, its growth rate will decline.

Let's now use the simpler method described above, where we assume that the leading eigenvalue dominates the system.

For this matrix, the eigenvalues are {0.9064, -0.3231, 0.1399}.

After an initial period of time, a linear system will approach a stable distribution, where the ratio of individuals in each class is constant and given by the right eigenvector associated with the leading eigenvalue.

In this case, the right eigenvector associated with the leading eigenvalue is: u = {0.2931, 0.3185, 0.3884} [This eigenvector has been normalized to sum to one.]

In demography, this distribution is known as the STABLE AGE DISTRIBUTION.

Once at the stable age distribution, every age class and hence the whole population will be multiplied by the leading eigenvalue every census.

In this example, the leading eigenvalue is 0.9064, indicating that the population will shrink by 9.36% every census.
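One simple way to check the leading eigenvalue and the stable age distribution without an eigenvalue routine is power iteration: apply the matrix repeatedly and rescale the vector to sum to one each time. The sketch below is not how the values above were obtained, and it relies on an assumption: the fertilities are back-calculated from the quoted eigenvalues because they are not listed in these notes.

```python
# Survival probabilities quoted in the text.
p = [0.9851, 0.9348, 0.1399]

# ASSUMPTION: fertilities back-calculated from the quoted eigenvalues.
m = [0.5833, 0.2973, 0.0]

L = [m,
     [p[0], 0.0, 0.0],
     [0.0, p[1], p[2]]]


def step(L, n):
    """Multiply the matrix L by the vector n."""
    return [sum(a * b for a, b in zip(row, n)) for row in L]


def power_iteration(L, iterations=200):
    """Estimate the leading eigenvalue and its right eigenvector."""
    n = [1.0 / len(L)] * len(L)        # start from equal proportions
    growth = 1.0
    for _ in range(iterations):
        n = step(L, n)
        growth = sum(n)                # n summed to one before the step
        n = [v / growth for v in n]    # rescale to proportions
    return growth, n


lam, stable = power_iteration(L)
print("leading eigenvalue:", round(lam, 4))
print("stable age distribution:", [round(v, 4) for v in stable])
print("change per time step:", f"{lam - 1:.2%}")
```

The rescaling step is what keeps the vector from shrinking toward zero, and the factor removed at each step converges to the leading eigenvalue.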

We can then use these values to project the population into the future.

To use the approximate projection, we first find the left eigenvector: v = {2.5156, 0.8250, 0} (scaled so that its dot product with the right eigenvector, v·u, equals 1). This indicates that 0-29 year olds have about three times the reproductive value of 30-59 year olds, and that the reproductive value of females over 60 is zero.

Adjusting the initial population vector by the reproductive value gives v·n₁₉₉₁ = 1.91 × 10⁷. This is larger than the female population size in 1991 of 1.37 × 10⁷, because the 1991 population is fairly young relative to the stable age distribution.

Our estimates of the future population are then λ₁^t times the stable age distribution, u = {0.2931, 0.3185, 0.3884}, times the adjusted initial population size of 1.91 × 10⁷.

n₂₀₂₁

n₂₀₅₁ = {4596840, 4995890, 6092580} [in two censuses]

n₂₀₈₁ = {4166750, 4528460, 5522540} [in three censuses]

This correctly predicts that the proportion in the oldest age class (60+) should be 39%.

This approximation suggests that the population size of females should grow by 26% (rather than 18%) between 1991 and 2021, then decrease by 9% (rather than 3%) between 2021 and 2051, and then decrease by 9% every 30 years thereafter.
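The approximate projection is easy to reproduce from the numbers quoted above. The Python sketch below uses only the leading eigenvalue, the right and left eigenvectors, and the 1991 counts given in these notes. The function name approx_projection is hypothetical, and because the inputs are rounded the results will differ slightly from the figures listed.

```python
lam = 0.9064                        # leading eigenvalue
u = [0.2931, 0.3185, 0.3884]        # right eigenvector (stable age distribution)
v = [2.5156, 0.8250, 0.0]           # left eigenvector (reproductive values)
n_1991 = [5798100, 5460500, 2430500]


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Initial population size weighted by reproductive value.
adjusted = dot(v, n_1991)


def approx_projection(t):
    """Approximate age distribution t time steps (30 years each) after 1991."""
    scale = lam ** t * adjusted
    return [scale * share for share in u]


print("v.u =", round(dot(v, u), 4))
print("adjusted initial size =", f"{adjusted:.3e}")
for t, year in enumerate((2021, 2051, 2081), start=1):
    print(year, [round(x) for x in approx_projection(t)])
print("growth 1991 to 2021:",
      f"{sum(approx_projection(1)) / sum(n_1991) - 1:.1%}")
```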

Although this method should technically only work well for longer time periods, it still does very well in this case even over the short time period measured here.

After only two censuses, the approximate method performs nearly as well as the recursions using the full Leslie matrix. There is a short, initial phase, however, in which the non-leading eigenvalues still exert an influence on the population dynamics.

The phase is particularly short in this example, since the non-leading eigenvalues are all small (-0.3231 and 0.1399) and so these terms rapidly become negligible in the general solution.

In summary: the leading eigenvalue does estimate long term growth, its right eigenvector does estimate the long term ratio between classes, but the short term growth can be influenced by the fact that the population isn't yet at a stable age distribution.

The above analysis with only three age classes provides only a crude demographic picture of the Canadian population.

Let's now return to an analysis using the full data matrix.

This is an example where Mathematica comes in really handy!

The leading eigenvalue for this matrix is 0.9772.

Its right eigenvector gives the stable age distribution: u = {0.05059, 0.05140, 0.05256, 0.05374, 0.05489, 0.05607, 0.05724, 0.05841, 0.05953, 0.06052, 0.06130, 0.06171, 0.06146, 0.06025, 0.05747, 0.05260, 0.04422, 0.04603} (normalized to sum to one).

This tells us many things. There will be about 5.1% of the population under age 5 at the stable age distribution and 20.8% under age 20. Similarly, there will be 20.0% of the population over age 70 and 32.2% over age 60.

We can compare these figures to their 1991 values of 9.2% of the population over age 70 and 17.8% over age 60, to infer that older individuals will in the future make up roughly twice the fraction of the population that they did in 1991.

Furthermore, when the population does reach the stable age distribution, it will shrink by 2.28% during every five year census period.

The left eigenvector of reproductive values is v = {3.2481, 3.1968, 3.1265, 3.0554, 2.7764, 2.0597, 0.9944, 0.2724, 0.0339, 0.0014, 0, 0, 0, 0, 0, 0, 0, 0} (normalized so that its dot product with the right eigenvector, v·u, equals 1). When multiplied by the vector of female population size from 1991, the adjusted initial population size becomes v·n₁₉₉₁ = 1.82 × 10⁷ (again larger than the initial female population size of 1.36 × 10⁷).

Taking this adjusted initial population size, 1.82 × 10⁷, and multiplying by the stable age distribution times (0.9772)^t provides an estimate of the female age distribution in future censuses.
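The percentages and the normalization quoted for the full model can be checked directly from the two eigenvectors. The Python sketch below uses only numbers given in these notes. It assumes the 18 entries correspond to five-year classes starting at 0-4, with the last entry an open-ended oldest class. The function name approx_projection is hypothetical.

```python
lam = 0.9772          # leading eigenvalue (per five-year census)
adjusted = 1.82e7     # reproductive-value-weighted 1991 population size

# Right eigenvector: stable age distribution (sums to one).
u = [0.05059, 0.05140, 0.05256, 0.05374, 0.05489, 0.05607,
     0.05724, 0.05841, 0.05953, 0.06052, 0.06130, 0.06171,
     0.06146, 0.06025, 0.05747, 0.05260, 0.04422, 0.04603]

# Left eigenvector: reproductive value of each age class.
v = [3.2481, 3.1968, 3.1265, 3.0554, 2.7764, 2.0597,
     0.9944, 0.2724, 0.0339, 0.0014, 0, 0, 0, 0, 0, 0, 0, 0]

under_20 = sum(u[:4])      # classes 0-4 through 15-19
over_60 = sum(u[12:])      # classes 60-64 and older
over_70 = sum(u[14:])      # classes 70-74 and older
v_dot_u = sum(a * b for a, b in zip(v, u))


def approx_projection(t):
    """Approximate female age distribution t censuses after 1991."""
    scale = lam ** t * adjusted
    return [scale * share for share in u]


print(f"under 20: {under_20:.1%}, over 60: {over_60:.1%}, "
      f"over 70: {over_70:.1%}")
print("v.u =", round(v_dot_u, 4))
print("approximate total after one census:",
      f"{sum(approx_projection(1)):.3e}")
```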

Of course, with Mathematica, it is straightforward to iterate the Leslie matrix to predict future population states.

Finally, we can do a check by comparing the expected age distribution in 1996 to the actual census:

This comparison shows that our projection falls short a bit. Firstly, the actual population growth was 10.4% over this five year period whereas the Leslie matrix projection estimated only 3.9% growth. Secondly, the shapes of the two age distributions are different.

These discrepancies are due to three main factors: (1) immigration and emigration, (2) too coarse an age distribution, and (3) a changing life table. In fact, we know that net immigration over this time period was approximately 500,000 females. Furthermore, the birth rate has continued to drop over this period, especially for women under 30.
