12 Publishing Mistakes to Avoid

Graduate students probably feel they are given too much advice on their career goals, but it might be useful to list a few of the mistakes I see often while reviewing papers submitted for publication. Think of it as a cheat sheet to go over before final submission of a paper.

  1. Abstract. Write this first, in the knowledge that 95% of readers will read only this part of your paper. They need the whole story in concise form: for any data paper in particular, WHAT, WHERE, WHEN, HOW and WHY.
  2. Graphics. Choose your graphics carefully. Show them to others to see if they get the point immediately. Label the axes carefully. ‘Population’ could mean population size, population density, population index, or something else. ‘Species diversity’ could mean anything from the vast array of species diversity measures.
  3. Precision. If you are plotting data, a single point on a graph is not very informative without some measure of statistical precision. Dot plots without a measure of precision are fraudulent. Indicate at least in the figure legend what exact measure of precision you have used.
  4. Colour and Symbol Shape. If you have 2 or more sets of data, use colour and different symbol shapes to distinguish them. Check that the symbols are large enough to survive the reduction the journal will apply in printing. Journals that charge for colour will often print in black and white for free but use the colour in the PDF version.
  5. Histograms. Use histograms freely in your papers but only after reading Cleveland (1994) who recommends never using histograms. More comments are given in my blog post “On Graphics in Ecological Presentations”.
  6. Scale of Graph. If you wish to cheat there are some simple ways of making your data look better. See Cleveland et al. (1982) for a scatter-plot example.
  7. Tables. Tables should be simple if possible. Columns of meaningless numbers do not help the reader understand your conclusions. Most people understand graphs very quickly but tables very slowly.
  8. Discussion. Be your own critic lest your reviewers do this job for you. If some published papers reach conclusions different from yours, discuss why this might be the case. Recognize that no one study is perfect. Indicate where future research might go.
  9. Literature Cited. Check that every work cited in the paper appears in the bibliography and that none is missing. Check the required format of the references since many editors go into orbit if you use the wrong format or fail to include the DOI.
  10. Supplementary Material. Consider carefully what you put in supplementary material. Standards are changing and simple Excel tables of mean values are often not enough to be useful for additional analysis.
  11. Covering Letter. This is a last-minute but critical piece of the puzzle, because you need to capture in a few sentences why the editor should send your paper out for review rather than send it right back to you as not of interest. Remember that editors are swamped with papers and rejection rates are often 60-90% at the first cut.
  12. Select the Right Journal. This is perhaps the hardest part. Not everything in ecology can be published in Science or Nature, and given the electronic world of the Web of Science, good work will be picked up in other journals. If you have millions, you can use the journals that you must pay to publish in, but I personally think this is capitalism gone amok. Romesburg (2016, 2017) presents critical data on the issue of commercial journals in science. Read these papers and put them on your Facebook site.
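Point 3 above asks authors to say exactly which measure of precision they have plotted. As a minimal sketch of why that matters, the Python below computes the standard deviation, the standard error, and 95% confidence limits for one plotted point. The density values, the function name `precision_summary`, and the choice to pass in a t value from a printed table are all illustrative assumptions, not data or methods from any paper.

```python
import math
import statistics


def precision_summary(values, t_crit):
    """Mean, S.D., S.E. and 95% confidence limits for one plotted point.

    t_crit is the two-sided 95% Student's t value for len(values) - 1
    degrees of freedom, looked up in a t table. Assumption: the
    replicates are independent and roughly normally distributed.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)          # standard error of the mean
    half_width = t_crit * se        # half-width of the 95% interval
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "se": se,
        "ci95": (mean - half_width, mean + half_width),
    }


# Hypothetical densities (animals per hectare) from five replicate plots.
densities = [12.0, 15.0, 9.0, 14.0, 10.0]
summary = precision_summary(densities, t_crit=2.776)  # t for 4 d.f.

low, high = summary["ci95"]
print(f"Mean {summary['mean']:.1f}, S.D. {summary['sd']:.2f}, "
      f"S.E. {summary['se']:.2f}, 95% CL {low:.1f} to {high:.1f} "
      f"(n = {summary['n']})")
```

For these made-up numbers the S.D. bar is more than twice as long as the S.E. bar, and the 95% confidence limits are wider again, so a legend that only says "error bars" leaves the reader guessing.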

 

Cleveland, W.S., Diaconis, P. & McGill, R. (1982) Variables on scatterplots look more highly correlated when the scales are increased. Science, 216, 1138-1141. http://www.jstor.org/stable/1689316

Cleveland, W.S. (1994) The Elements of Graphing Data. AT&T Bell Laboratories, Murray Hill, New Jersey. ISBN: 9780963488411

Romesburg, H.C. (2016) How publishing in open access journals threatens science and what we can do about it. Journal of Wildlife Management, 80, 1145-1151. doi: 10.1002/jwmg.21111

Romesburg, H.C. (2017) How open access is crucial to the future of science: A reply. Journal of Wildlife Management, 81, 567-571. doi: 10.1002/jwmg.21244

 

What do the Data Points Mean?

In Statistics 101 we were told that each data point in a scatter plot should have a precise meaning. I hope all ecologists agree with this, and if so I would like to ask two questions about the ecology literature:

  1. What fraction of scatter plots in ecology papers define what the dots on the plot mean? Are they individual measurements, means of several measurements, or predictions from a mathematical model?
  2. Given that we know what the dots are, are we shown confidence limits for the points, or do we assume they are absolutely precise with no possible error?

With these two simple questions in mind I did a short, non-random search of recent ecology journals. Perhaps if a graduate ecology class is reading this blog, they could do a much wider search so that we might even be able to tell some of the editors of our journals how they score on Statistics 101 Quiz #1. I went through 3 issues of Ecology (2015, issues 4, 5, and 6), 3 issues of the Journal of Animal Ecology (2015, issues 4 to 6), and 3 issues of Ecology Letters (2016, issues 1, 2, and 3).

I scored each figure in each paper. The first question above is harder to score, so I divided the answer into three groups: clearly defined in the figure legend, not defined in the figure legend but clear in the paper itself, and not clearly defined anywhere. I kept the second question on a simpler scale by asking whether or not there were confidence limits or S.E. on the dots in the scatter diagram. I treated histogram bars as ‘data points’ equivalent to those in scatter plots and scored them with the same 2 questions. I scored a figure with multiple plots as just one data source for my survey. I ignored maps, simulation data, and papers with only models. I got these results:

                                              Data points          Confidence limits or S.E.
Journal                    Number of papers   clearly defined      Yes          No
                                              in figure legend
Ecology                    80                 179 (95%)            98 (50%)     96 (50%)
Journal of Animal Ecology  84                 195 (98%)            119 (60%)    81 (40%)
Ecology Letters            33                 64 (94%)             29 (43%)     39 (57%)

The good news is that virtually all the data points in figures that contained empirical data were clearly defined, so the first question was not problematic. The potentially bad news is that around half of the data figures did not contain any measure of statistical precision for the data points.
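The Yes and No counts in the results above can be turned back into percentages as a check on the "around half" statement. This is only a sketch: it assumes that each percentage was calculated from the Yes and No counts for the same journal, which the table does not state, and the function name `share_with_precision` is hypothetical.

```python
# Figures with (yes) and without (no) confidence limits or S.E.,
# copied from the survey results above.
counts = {
    "Ecology": (98, 96),
    "Journal of Animal Ecology": (119, 81),
    "Ecology Letters": (29, 39),
}


def share_with_precision(yes, no):
    """Percentage of scored figures that showed a measure of precision.

    Assumption: the denominator is simply yes + no for that journal.
    """
    return 100.0 * yes / (yes + no)


shares = {journal: share_with_precision(yes, no)
          for journal, (yes, no) in counts.items()}

# Pool the three journals to get one overall figure.
total_yes = sum(yes for yes, _ in counts.values())
total_no = sum(no for _, no in counts.values())
overall = share_with_precision(total_yes, total_no)

for journal, pct in shares.items():
    print(f"{journal}: {pct:.1f}% of figures show a measure of precision")
print(f"All three journals pooled: {overall:.1f}%")
```

Under that assumption the recomputed Yes shares fall within about one percentage point of the values reported for each journal, and the pooled share is close to one half.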

There could be many reasons why confidence limits could not be applied to data points on graphs in papers. In some cases it would clutter the plot too much. In other cases the data points are completely accurate and have no error although this might be unusual in ecological data. Whatever the reason, some mention of the reason should be given in the text or the figure legend.

There were many limitations to this brief survey. It is clear that some subdisciplines of ecology adhere to Statistics 101 recommendations more carefully than others, but I did not tally these subdisciplines. One could make a thesis out of this sort of tally. Often I could not decipher whether the data point was for an experimental unit or for a sampling unit, but I have not analyzed this error here.

So what do we conclude from this non-random survey? The take-home message for authors is to make sure that the data points or histograms in their published figures are clearly defined in the figure legend and include, if possible, some measure of probable error. The message for reviewers and journal editors is to check that data points presented in submitted papers are properly identified and labeled with some measure of precision.

On Graphics in Ecological Presentations

In the greater scheme of things, how you plot your data in a paper or in a PowerPoint presentation may not be the most important thing to worry about. But if you believe that small things matter, perhaps you should read on. The standard of graphs in ecological presentations is often lower than is desirable. Many authors have tried to help, and for more instructions please read Cleveland (1993, 1994).

Begin with a few elementary rules that I should not have to state but are often ignored:

  1. Label the axes and give the units of measure.
  2. Do not use a font size that requires a microscope to read.
  3. Do not present point data without some measure of possible error.

Beyond these general rules there are many that become more specific. I want to call attention here to two rules that are often violated even in our best ecological journals. The first and simplest is never to plot in logs. It is bad enough to plot an axis in log-10 units (most people can work out that 2 in log-10 means 100 in real units), but I have never met anyone who can decipher log-e units (what does 4.38 in log-e units mean in real units?). The solution is simple. Label the scales in real units so that for example the scale may read 1-10-100-1000 with equal spacing so the axis is scaled in logs but the units are given in real measurements. In this way the reader has some idea of the scale of changes shown on the graph.
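The relabelling described above can be sketched in a few lines of Python. The helper names `decade_ticks` and `from_natural_log` are hypothetical, and the code only works out where the ticks go and what they should say. It assumes your graphing package lets you set tick positions and tick labels yourself.

```python
import math


def decade_ticks(lowest, highest):
    """Tick positions in log-10 units paired with labels in real units.

    lowest and highest are positive data values in real units. One tick
    is placed at every power of ten needed to span them, so the ticks
    are equally spaced on the log-scaled axis.
    """
    first = math.floor(math.log10(lowest))
    last = math.ceil(math.log10(highest))
    return [(exponent, f"{10 ** exponent:g}")
            for exponent in range(first, last + 1)]


def from_natural_log(value):
    """Convert a value in log-e units back to real units."""
    return math.exp(value)


# Axis covering 1 to 1000 in real units: positions 0, 1, 2, 3 in log-10
# units carry the labels 1, 10, 100, 1000.
ticks = decade_ticks(1, 1000)
for position, label in ticks:
    print(f"tick at log-10 position {position} is labelled {label}")

# The log-e example from the text, back in real units.
print(f"4.38 in log-e units is {from_natural_log(4.38):.1f} in real units")
```

Back-transforming 4.38 in log-e units gives roughly 80 in real units, which is not a conversion most readers will manage in their heads while looking at a slide.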

The second and perhaps more controversial problem I find with ecological graphics is the use of histograms for data that should be illustrated as point estimates (with confidence limits). If we took the advice of Cleveland (1993, page 8), histograms would be rare in scientific publications:

“The histogram is a widely used graphical method that is at least a century old. But maturity and ubiquity do not guarantee the efficacy of a tool … The venerable histogram, an old favourite, but a weak competitor, will not be encountered again [in this book].” (Cleveland 1993, p. 8)

He goes on to evaluate a whole array of graphical methods, most of which are rarely seen in ecological papers. The box plot is perhaps the most common example he recommends and is available in many graphing packages. But note that Excel is not a very good standard for graphics, and while some of its graphics might be useful, caution is recommended. Many graphics options are available in R (http://www.r-project.org/) and some in SigmaPlot. Discussions about graphics packages on the web are extensive and everyone has their favourite package along with complaints about other packages. The general point is to think carefully about the graphics you use to convey your message to make it as clear as possible.

What exactly is wrong with histograms? They are misleading if the scale of the axis does not start at zero. The width of the bars is misleading if the scales are categories or precise values. The information in each histogram bar is entirely concentrated in the top of the bar and the included error bars. The amount of replication is difficult to evaluate, and distributions of data that are skewed are not presented. Finally, outliers are not identified. Perhaps the message is that if you have data that you think should be presented as a histogram, check Cleveland (1994) to see if there is not a better way to present it to your audience.
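As a rough illustration of what a single bar hides, the sketch below computes the numbers a box plot displays (quartiles, whiskers, and outliers by the common 1.5 × interquartile range rule) alongside the mean that a bar would show. The quadrat counts are invented and the function name `box_plot_summary` is hypothetical. Other quartile definitions exist and would give slightly different values.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, whisker ends and outliers as drawn in a box plot.

    Outliers are values more than 1.5 interquartile ranges beyond the
    quartiles; whiskers run to the most extreme values inside that range.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "mean": statistics.mean(values),  # what a bar would show
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Hypothetical counts per quadrat, skewed by one extreme value.
quadrat_counts = [2, 3, 3, 4, 4, 5, 5, 6, 7, 30]
box = box_plot_summary(quadrat_counts)

print(f"Bar height (mean): {box['mean']}")
print(f"Median: {box['median']}, quartiles: {box['q1']} to {box['q3']}")
print(f"Whiskers: {box['whiskers']}, outliers: {box['outliers']}")
```

With these invented counts the mean sits well above the median because of one extreme quadrat. A bar drawn to the mean shows neither the skew nor the outlier, while the box plot numbers show both.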

A final observation on graphics. I realize that at the present time in movies and games 3-D images and animations are quite incredible. But remember these are for entertainment not for communication. If you think your PowerPoint requires 3-D graphs with animations, be sure to check whether you are aiming more for entertainment than clear communication.

Cleveland, W.S. (1993) Visualizing Data. Hobart Press, Summit, New Jersey.

Cleveland, W.S. (1994) The Elements of Graphing Data. AT&T Bell Laboratories, Murray Hill, New Jersey.