FIGURE 2. The 25 species of ground beetles (Family: Carabidae) used to train and test the classification models. Each specimen shown was cropped from its original photograph, and the background was removed. This figure was originally published in Blair et al. (2020) (CC BY 3.0).
Jarrett D. Blair, Kaitlyn M. Gaynor, Meredith S. Palmer, Katie E. Marshall. 2024. A gentle introduction to computer vision-based specimen classification in ecological datasets. Journal of Animal Ecology.
Classifying specimens is a critical component of ecological research, biodiversity monitoring and conservation. However, manual classification can be prohibitively time-consuming and expensive, limiting how much data a project can afford to process.
Computer vision, a form of machine learning, can help overcome these problems by rapidly, automatically and accurately classifying images of specimens. Given the diversity of animal species and contexts in which images are captured, there is no universal classifier for all species and use cases. As such, ecologists often need to train their own models. While numerous software programs exist to support this process, ecologists need a fundamental understanding of how computer vision works to select appropriate model workflows based on their specific use case, data types, computing resources and desired performance capabilities. Ecologists may also face characteristic quirks of ecological datasets, such as long-tail distributions, ‘unknown’ species, similarity between species and polymorphism within species, which impact the efficacy of computer vision.
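To make the long-tail problem concrete, one common mitigation (not necessarily the one used in the accompanying repository) is to weight each class inversely to its frequency during training, so that errors on rare species cost more. The minimal Python sketch below assumes a plain list of species labels. The function name `inverse_frequency_weights` and the species counts are hypothetical. The formula mirrors the "balanced" heuristic in scikit-learn's `compute_class_weight`.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Hypothetical helper: weight each class by
    n_samples / (n_classes * n_samples_in_class).

    Rare classes receive weights > 1 and common classes weights < 1,
    so each class contributes equally in total to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

# Hypothetical long-tailed label set: one common species and two rare ones.
labels = ["species_a"] * 80 + ["species_b"] * 15 + ["species_c"] * 5
weights = inverse_frequency_weights(labels)
for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.2f}")
```

With these counts, the rarest class (`species_c`) receives 16 times the weight of the most common one (`species_a`), which is simply the ratio of their sample counts (80 / 5).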
Despite growing interest in computer vision for ecology, few resources are available to help ecologists face the challenges they are likely to encounter. Here, we present a gentle introduction to species classification using computer vision. In this manuscript and the associated GitHub repository, we demonstrate how to prepare training data, walk through basic model training procedures, and present methods for model evaluation and selection. Throughout, we explore specific considerations ecologists should make when training classification models, such as data domains, feature extractors and class imbalances.
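Class imbalance also affects model evaluation: overall accuracy can look strong even when rare species are never identified. The minimal Python sketch below contrasts overall accuracy with macro-averaged recall (the mean of per-class recall, so every species counts equally). It is an illustration under stated assumptions, not the evaluation code from the repository. The helper names and the two-class test set are hypothetical.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Mean of per-class recall; returns (macro_value, per_class_dict)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    per_class = {c: hits[c] / totals[c] for c in totals}
    return sum(per_class.values()) / len(per_class), per_class

# Hypothetical test set: 90 images of a common species, 10 of a rare one.
y_true = ["common"] * 90 + ["rare"] * 10
# A degenerate classifier that always predicts the common species.
y_pred = ["common"] * 100

acc = accuracy(y_true, y_pred)
macro, per_class = macro_recall(y_true, y_pred)
print(f"accuracy: {acc:.2f}")
print(f"macro recall: {macro:.2f}")
print(per_class)
```

Here the classifier reaches 90% accuracy while missing every rare-species image, and macro-averaged recall drops to 50%, which is why per-class metrics are worth checking when selecting among models trained on long-tailed data.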
With these basics, ecologists can adjust their workflows to achieve their research goals and/or account for classification uncertainty in downstream analyses. Our goal is to help ecologists get started with, or improve, their use of machine learning for visual classification tasks.