Q: I have really large clades (say 1000+ species); is there any way that I can use the "unresolved clade BiSSE" with my tree?
I don't think that the approach that I use for the unresolved clades will scale to any more than about 200 species (even at that limit, it can be pretty poorly behaved). There are a couple of reasons for this:
First, the time requirements grow exponentially with clade size (and, from memory, the space requirements grow as something like n^4, which is pretty nasty, too). Basically, this means that even if you could find the computational power to do a clade of size n, a clade of size n+1 could take a thousand years, perhaps. This is the bane of a huge number of computational problems.
Second, and more subtly, there is a machine precision issue. Floating point arithmetic can be unreliable for numbers smaller than about 1e-8, and totally useless for numbers smaller than 1e-15 (try (1 + 1e-16) - 1 in R on most platforms to see what I mean). Because you are spreading a single unit of probability over an increasingly large space, a huge number of the cells (almost certainly including the one that you end up caring about) will be these really unreliable numbers. I've seen this creep in for clades that are smaller than 200 species where there are moderate extinction rates.
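Both points can be checked with a few lines of base R. This is only a rough sketch: it assumes IEEE 754 double precision, takes the n^4 figure literally even though it is quoted from memory above, and assumes one 8-byte double per cell. The helper mem_gb is a hypothetical name used for illustration, not part of diversitree.

```r
# Spacing of double-precision numbers around 1 (about 2.2e-16 on IEEE 754 platforms).
eps <- .Machine$double.eps

# The example from the text: 1e-16 is below half of eps, so it is lost entirely.
lost <- (1 + 1e-16) - 1

# 1e-15 survives the addition, but only approximately.
kept <- (1 + 1e-15) - 1
rel_error <- abs(kept - 1e-15) / 1e-15

print(c(eps = eps, lost = lost, kept = kept, rel_error = rel_error))

# Hypothetical helper: gigabytes needed for n^4 cells of 8 bytes each.
# The n^4 growth is only the rough recollection quoted in the text.
mem_gb <- function(n) 8 * n^4 / 1e9
print(data.frame(n = c(50, 200, 1000), gigabytes = mem_gb(c(50, 200, 1000))))
```

Under these assumptions, going from 200 to 1000 species multiplies the storage by 5^4 = 625, before the time cost is even considered.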
Q: I am getting positive log likelihood values! I thought that log likelihoods had to be negative - is this a problem with my tree or with diversitree?
This is not a problem. The likelihood is only proportional to the probability of observing the data, up to some unknown normalising constant, so it can be greater than one and the log likelihood can be positive.
For BiSSE-style models, positive values generally arise in trees that have a very short root-tip distance. For a tree with N tips and root-tip distance t, the speciation rates per unit of tree time must then be very large (on the order of log(N)/t). At each node, the conditional likelihoods are multiplied by the speciation rate, so over the N-1 internal nodes there is a multiplication by λ^(N-1).
If these bother you, just multiply the branch lengths of your tree:
phy$edge.length <- phy$edge.length * 100
The estimated rates will now be a factor of 100 smaller, and the log likelihoods will probably be negative.
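The effect of rescaling can be seen without diversitree at all. The sketch below uses a simplified pure-birth log likelihood, (N-1) log(λ) - λ × (sum of branch lengths), as a stand-in. This is an assumption made for illustration, not what diversitree computes, and the branch lengths and the helper name yule_fit are made up.

```r
# Simplified pure-birth log likelihood, up to an additive constant:
#   (N - 1) * log(lambda) - lambda * (sum of branch lengths)
# This is an illustrative stand-in, not the BiSSE calculation.
yule_fit <- function(edge.length, n.tip) {
  total <- sum(edge.length)
  lambda <- (n.tip - 1) / total              # maximises the expression above
  list(lambda = lambda,
       lnL = (n.tip - 1) * log(lambda) - lambda * total)
}

n.tip <- 10
edge.length <- rep(0.01, 2 * n.tip - 2)      # made-up, very short branches

fit_short <- yule_fit(edge.length, n.tip)
fit_long  <- yule_fit(edge.length * 100, n.tip)  # same rescaling as in the text

print(unlist(fit_short))  # large rate, positive log likelihood
print(unlist(fit_long))   # rate 100 times smaller, negative log likelihood
```

In this toy model the two log likelihoods differ by exactly (N-1) log(100), so the rescaling shifts the whole likelihood surface without changing which parameters are preferred.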
Q: I get this warning message when running diversitree:
Warning message: In make.ode(derivs, dll, initfunc, neq, safe) : diversitree is not known to work with deSolve > X.XX.X falling back on slow version
It does seem to run slow. Is there a solution to this, and is this why it is running slow?
This warning appears whenever the installed deSolve version is more recent than the newest one that the installed diversitree is known to work with.
deSolve's ODE solvers look up the memory address of the derivative functions every time they are evaluated. This is a nontrivial operation, and happens for every branch on a tree -- thousands of times during an ML search or MCMC chain. To get around this, I use non-exported Fortran functions in deSolve directly, and remember the address of the derivative functions after the first lookup. However, if the definitions of these functions change, R will crash (not an error, but a complete crash). When the deSolve version is not known to work, this caching is skipped, and the calculations slow down.
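To see which versions are involved, the installed versions of both packages can be printed from R. This is a minimal sketch using only base R; installed_version is a hypothetical helper written for this example, and it returns NA when a package is not installed.

```r
# Hypothetical helper: installed version of a package, or NA if it is absent.
installed_version <- function(pkg) {
  tryCatch(as.character(utils::packageVersion(pkg)),
           error = function(e) NA_character_)
}

versions <- vapply(c("deSolve", "diversitree"), installed_version, character(1))
print(versions)
```

Comparing the deSolve version printed here with the one named in the warning shows how far ahead the installed deSolve is.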
To work around this problem, there are two options:
Pass method="CVODES" to the make.xxx function to use a different backend. This does not work on Windows.