Most of the likelihood methods in diversitree (and other programs) are prone to be mislead by unreplicated associations; for example

ML doesn't count the number of instances of transitions, just strength of evidence for the alternate models. These are the sort of unreplicated comparisons that the comparative method is meant to avoid, and which are used to motivate Felsenstein (1985). This issue was discussed by Read & Nee (1995) but it is not widely appreciated.

Here, I show a couple of toy examples as to how the problem can arise under multistate Markov model and BiSSE, and how phylogenetically independent contrasts is not immune to this problem (but that it may be easily detectable).

Mkn multitrait / Pagel (1994) discrete

First, load diversitree and simulate a pure birth tree (with rate .1).

library(diversitree)
## Loading required package: deSolve
## Loading required package: ape
## Loading required package: subplex
set.seed(1)
n.spp <- 30
phy <- ladderize(tree.yule(.1, max.taxa=n.spp))

This will be tree used for all examples

plot(phy, show.node.label=TRUE, no.margin=TRUE, cex=.8)
Simulated pure birth tree

Simulated pure birth tree

nd14 will be our target node; below this point, the clade will behave "differently" to the rest of the clade.

target <- "nd14"

Convert this name to Ape's node index:

idx <- match(target, phy$node.label) + n.spp

Identify descendant species of this node

desc <- sort(get.descendants(target, phy, tips.only=TRUE))

Here is our focal group, highlighted in red

plot(phy, tip.color=ifelse(1:n.spp %in% desc, "#ef2929", "black"),
     no.margin=TRUE)
nodelabels(node=idx, pch=19, cex=2, col="#ef2929")
Focal group at node 'nd14'

Focal group at node 'nd14'

Trait 1 is shared across this group, and with nobody else.

t1 <- as.integer(1:n.spp %in% desc)
names(t1) <- phy$tip.label
t1
##  sp1  sp2  sp3  sp4  sp5  sp6  sp7  sp8  sp9 sp10 sp11 sp12 sp13 sp14 sp15 
##    0    0    0    1    0    0    0    0    0    0    0    0    0    0    1 
## sp16 sp17 sp18 sp19 sp20 sp21 sp22 sp23 sp24 sp25 sp26 sp27 sp28 sp29 sp30 
##    1    1    1    1    1    0    0    0    0    1    1    0    0    0    0 

Determine rate coefficients for the evolution of this trait, assuming a Mk2 model:

lik1 <- make.mk2(phy, t1)
fit1 <- find.mle(lik1, rep(.1, 2))
zapsmall(coef(fit1))
##      q01      q10 
## 0.004849 0.000003 

Now, suppose we have two perfectly codistribted traits "a" and "b";

traits <- data.frame(a=t1, b=t1)

Make a 2-trait likelihood function, assuming that the two traits are independent:

lik2 <- make.mkn.multitrait(phy, traits, depth=0)
argnames(lik2)
## [1] "qa01.0" "qa10.0" "qb01.0" "qb10.0"

And fit this model with ML:

p2 <- rep(coef(fit1), 2)
names(p2) <- NULL
fit2 <- find.mle(lik2, p2)

We get the same basic coefficients when treating the traits separately:

zapsmall(coef(fit1))
##      q01      q10 
## 0.004849 0.000003 
zapsmall(coef(fit2))
##   qa01.0   qa10.0   qb01.0   qb10.0 
## 0.004849 0.000003 0.004849 0.000003 
all.equal(coef(fit2)[1:2], coef(fit1), check.attr=FALSE)
## [1] TRUE

Now, allow correlated evolution by allowing the q01 and q10 parameters to vary according to the state of the other trait:

lik3 <- make.mkn.multitrait(phy, traits, depth=1)
p3 <- rep(0, 8)
names(p3) <- argnames(lik3)
p3[argnames(lik2)] <- coef(fit2)

Fit the model

fit3 <- find.mle(lik3, rep(.1, 8))

Basically once trait 'a' has shifted from 0 → 1, we instantaneously shift in state 'b'.

zapsmall(coef(fit3))
##  qa01.0  qa01.b  qa10.0  qa10.b  qb01.0  qb01.a  qb10.0  qb10.a 
##       0       0       0       0       0 7358338     481    -481 

This model is a statistical improvement over the uncorrelated model.

anova(fit3, uncorrelated=fit2)
##              Df  lnLik  AIC ChiSq Pr(>|Chi|)
## full          8  -6.46 28.9                 
## uncorrelated  4 -12.93 33.9  12.9      0.012

Likelihood improvement:

dll <- fit3$lnLik - fit2$lnLik
dll
## [1] 6.463

Simulate 100 traits on the tree with an equal forward/backward transition rate of r

r <- .02
t2 <- replicate(100,
                sim.character(phy, c(r, r), x=0, model="mk2"),
                simplify=FALSE)

And fit these the same way:

f <- function(t2) {
  lik0 <- make.mkn.multitrait(phy, data.frame(a=t1, b=t2), depth=0)
  p0 <- rep(r, 4)
  fit0 <- find.mle(lik0, p0)

  lik1 <- make.mkn.multitrait(phy, data.frame(a=t1, b=t2))
  p1 <- rep(0, 8)
  names(p1) <- argnames(lik1)
  p1[argnames(lik0)] <- coef(fit0)
  fit1 <- find.mle(lik1, p1)

  fit1$lnLik - fit0$lnLik
}

This takes a minute or so, so run it in parallel:

library(multicore)
ans <- unlist(mclapply(t2, f))

Histogram of likelihood improvements, with previous case in red, and 5% cut-off in blue:

par(mar=c(4.1, 4.1, .5, .5))
hist(ans, xlim=range(ans, dll), main="", las=1,
     xlab="Likelihood impovement")
abline(v=dll, col="red")
abline(v=qchisq(1/20, 4, lower=FALSE)/2, col="blue")

Most of the time, a random character would not be significantly associated with our special trait. But two perfectly co-occurring traits have extremely high support for a model that treats their evolution as non-independent, even though this is unreplicated.

BiSSE

With the tree above, fit a BiSSE model

lik <- make.bisse(phy, t1, control=list(method="CVODES"))
p <- starting.point.bisse(phy)
fit.ssd <- find.mle(lik, p)

Fit the state independent model (constrain both speciation and extinction rates to not vary with the trait).

lik.sid <- constrain(lik, lambda1 ~ lambda0, mu1 ~ mu0)
fit.sid <- find.mle(lik.sid, p[argnames(lik.sid)])

There is no significant association between the trait and diversification here:

anova(fit.ssd, sid=fit.sid)
##      Df lnLik AIC ChiSq Pr(>|Chi|)
## full  6 -96.2 204                 
## sid   4 -97.4 203   2.5       0.29

Next, tweak the tree so that the focal clade is apparently different. This is a bit of a pain to do. First, identify all edges descended from that node

i <- phy$edge[,2] %in% sort(get.descendants(target, phy))
plot(phy, no.margin=TRUE,
     edge.color=ifelse(i, "#ef2929", "black"),
     tip.color=ifelse(1:n.spp %in% desc, "#ef2929", "black"))     
Edges descended from focal node

Edges descended from focal node

Compress these red edges towards the present, and lengthen the stem lineage accordingly. Scale branches by a factor 'p' (must be less than 1)

p <- .3

j <- match(idx, phy$edge[,2])
phy2 <- phy
phy2$edge.length[i] <- phy2$edge.length[i] * p
phy2$edge.length[j] <-
  phy$edge.length[j] + branching.times(phy)[target] * (1 - p)

The modified tree is still ultrametric

is.ultrametric(phy2)
## [1] TRUE

Here is the compressed tree:

plot(phy2, no.margin=TRUE,
     edge.color=ifelse(i, "#ef2929", "black"),
     tip.color=ifelse(1:n.spp %in% desc, "#ef2929", "black"))     

Fit BiSSE models to the tree, first with state-dependent diversification

lik2 <- make.bisse(phy2, t1, control=list(method="CVODES"))
p <- starting.point.bisse(phy2)
fit2.ssd <- find.mle(lik2, p)

And without

lik2.sid <- constrain(lik2, lambda1 ~ lambda0, mu1 ~ mu0)
fit2.sid <- find.mle(lik2.sid, p[argnames(lik2.sid)])

There is now significant association between the trait and diversification, even though this is not replicated!

anova(fit2.ssd, sid=fit2.sid)
##      Df lnLik AIC ChiSq Pr(>|Chi|)
## full  6 -86.4 185                 
## sid   4 -91.2 190  9.53     0.0085

Generalise this a bit to look how the likelihood improvement varies with the scaling factor.

g <- function(p) {
  phy2 <- phy
  phy2$edge.length[i] <- phy2$edge.length[i] * p
  phy2$edge.length[j] <-
    phy$edge.length[j] + branching.times(phy)[target] * (1 - p)

  lik2 <- make.bisse(phy2, t1, control=list(method="CVODES"))
  p <- starting.point.bisse(phy2)
  fit2.ssd <- find.mle(lik2, p)

  lik2.sid <- constrain(lik2, lambda1 ~ lambda0, mu1 ~ mu0)
  fit2.sid <- find.mle(lik2.sid, p[argnames(lik2.sid)])
  fit2.ssd$lnLik - fit2.sid$lnLik
}

Run this with scaling parameters varying from .1 (extremely compressed towards the future) and 1 (unchanged from the original tree).

p <- seq(1, .1, length=16)
ans.b <- unlist(mclapply(p, g))

The improvement in the likelihood increases rapidly as the tree deviates more and more from the BD model:

par(mar=c(4.1, 4.1, .5, .5))
plot(ans.b ~ p, las=1, xlab="Scaling factor (1 is unchanged)",
     ylab="Likelihood improvement", type="o", pch=19)
abline(h=qchisq(1/20, 2, lower=FALSE)/2, col="blue")
Likelihood improvement vs scaling factor, with critical value at the 5% level indicated with a blue line.

Likelihood improvement vs scaling factor, with critical value at the 5% level indicated with a blue line.

However, we can partition the tree and allow rates to vary here. First check that this model will fit better without accounting for the state-dependent diversification:

lik.bd <- make.bd(phy2)
fit.bd <- find.mle(lik.bd, starting.point.bd(phy2))

High speciation and extinction rates, compared with the values used to simulate the tree (.1):

coef(fit.bd)
## lambda     mu 
## 0.2076 0.1574 

lik.bd2 <- make.bd.split(phy2, target)
fit.bd2 <- find.mle(lik.bd2, c(.1, 0, .1, 0), method="subplex")

The base partition (black branches in the figures) have rates similar to the simulated values. The focal clade (red in the figures) has strongly elevated rates.

coef(fit.bd2)
## lambda.1     mu.1 lambda.2     mu.2 
##   0.1189   0.0561   0.4650   0.1446 

This improvement is significant;

anova(fit.bd2, unsplit=fit.bd)
##         Df lnLik  AIC ChiSq Pr(>|Chi|)
## full     4 -12.1 32.2                 
## unsplit  2 -15.5 35.0  6.81      0.033

Split BiSSE likelihood function

lik.bs <- make.bisse.split(phy2, t1, target)

Constrain trait evolution over the whole tree:

lik.bs.sdd <- constrain(lik.bs, q01.2 ~ q01.1, q10.2 ~ q10.1,
                        mu1.1 ~ mu0.1, mu1.2 ~ mu0.2)

(I also disallowed state-dependent extinction as that causes problems trying to explain the pattern of trait evolution)

And make a version with no state-dependent diversification:

lik.bs.sid <- constrain(lik.bs.sdd,
                        lambda1.1 ~ lambda0.1,
                        lambda1.2 ~ lambda0.2)

The state-independent version has 6 parameters; two speciation rates (one per partition), two extinction rates, and two character transition rates.

argnames(lik.bs.sid)
## [1] "lambda0.1" "mu0.1"     "q01.1"     "q10.1"     "lambda0.2" "mu0.2"    

Starting point, based on the birth-death fit:

p.sid <- rep(0, 6)
names(p.sid) <- argnames(lik.bs.sid)
p.sid[sub("\\.", "0.", names(coef(fit.bd2)))] <- coef(fit.bd2)
p.sid[c("q01.1", "q10.1")] <- .1
fit.bs.sid <- find.mle(lik.bs.sid, p.sid)

The coefficients here look like the birth-death case, with low rates of 0 → 1 transition, and no reverse (1 → 0) transitions.

zapsmall(coef(fit.bs.sid))
## lambda0.1     mu0.1     q01.1     q10.1 lambda0.2     mu0.2 
##    0.1192    0.0566    0.0044    0.0000    0.4647    0.1441 

Expand this point to allow for state-dependent diversification:

p.sdd <- coef(fit.bs.sid, TRUE)[argnames(lik.bs.sdd)]
fit.bs.sdd <- find.mle(lik.bs.sdd, p.sdd)

This is now not significant (though the improvement in likelihood is greater than I would have thought).

anova(fit.bs.sdd, sid=fit.bs.sid)
##      Df lnLik AIC ChiSq Pr(>|Chi|)
## full  8 -85.4 187                 
## sid   6 -87.8 188  4.62      0.099

I think what is going on here is that currently the stem leading to the focal group is included in that group. However, the (fake) diversification shift happens rather closer to the node. So that aspect of the tree still needs explaining.

Continuous traits and phylogenetically independent contrasts

This is the sort of thing that Felsenstein (1985) warns against, but still causes a problem with continous traits, if Brownian motion is not a suitable model.

set.seed(1)
x <- sim.character(phy, .1)
y <- sim.character(phy, .1)
plot(x, y, col=ifelse(1:n.spp %in% desc, "#ef2929", "black"),
     pch=19, las=1)
Random (Brownian) traits, with focal clade in red

Random (Brownian) traits, with focal clade in red

Compute phylogenetically independent contrast with ape's pic function.:

x.pic <- pic(x, phy)
y.pic <- pic(y, phy)

and run a correlation test (regression through the origin gives essentially the same result).

cor.test(x.pic, y.pic)
## 
##  Pearson's product-moment correlation
## 
## data:  x.pic and y.pic 
## t = -0.2981, df = 27, p-value = 0.7679
## alternative hypothesis: true correlation is not equal to 0 
## 95 percent confidence interval:
##  -0.4151  0.3159 
## sample estimates:
##      cor 
## -0.05727 
## 

Now, replace the values of the focal clade with new trait values, offset by some amount, but evolved under BM with the same rate. First, clip out the subtree corresponding to the focal clade:

desc <- sort(get.descendants(target, phy, tips.only=TRUE))
phy.sub <- drop.tip(phy, phy$tip.label[-desc])

These were the state values at the target node from the original simulation:

x0 <- attr(x, "node.state")[target]
y0 <- attr(y, "node.state")[target]

Offset these values by 3 and continue the simulation, replacing the values in the x and y vectors

x[desc] <- sim.character(phy.sub, .1, x0=x0+3)
y[desc] <- sim.character(phy.sub, .1, x0=y0+3)
plot(x, y, col=ifelse(1:n.spp %in% desc, "#ef2929", "black"),
     pch=19, las=1)
Random (Brownian) traits, after perturbing focal clade

Random (Brownian) traits, after perturbing focal clade

Recompute the independent contrasts:

x.pic <- pic(x, phy)
y.pic <- pic(y, phy)

Do a correlation test (again, regression through the origin gives essentially the same answer).

cor.test(x.pic, y.pic)
## 
##  Pearson's product-moment correlation
## 
## data:  x.pic and y.pic 
## t = 2.328, df = 27, p-value = 0.02766
## alternative hypothesis: true correlation is not equal to 0 
## 95 percent confidence interval:
##  0.04979 0.67430 
## sample estimates:
##    cor 
## 0.4088 
## 

This is actually quite significant; not much less so than the correlation of the raw data.

cor.test(x, y)
## 
##  Pearson's product-moment correlation
## 
## data:  x and y 
## t = 2.601, df = 28, p-value = 0.01467
## alternative hypothesis: true correlation is not equal to 0 
## 95 percent confidence interval:
##  0.09621 0.69154 
## sample estimates:
##    cor 
## 0.4412 
## 

Plotting the contrasts, there is a fairly clear outlier.

par(mar=c(4.1, 4.1, .5, .5))
plot(x.pic, y.pic, las=1, pch=19)
Independent contrasts in the distorted tree

Independent contrasts in the distorted tree

This is the node that is the parent of the focal clade and another smaller group.

While similarly mislead, at least independent contrasts provide a simple way of identifying the misleading nodes.

Bibliography

Felsenstein J (1985). "Phylogenies and the comparative method." The American Naturalist, 125, pp. 1-15.

Pagel M (1994). "Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters." Proceedings of the Royal Society of London Series B, 255, pp. 37-45.

Read AF and Nee S (1995). "Inference from binary comparative data." Journal of Theoretical Biology, 173, pp. 99-108. .

Document details

Document compiled on 2012-05-10 21:41:33 PDT With R version and diversitree version 0.9.4. Original source: unreplicated.R

examples, diversitree home