Three C·I·B researchers, Mark Robertson, Cang Hui and Vernon Visser, have developed a new R package for
assessing and improving the quality of datasets consisting of occurrence records.
Museum and herbarium collections provide records of where species have occurred, and these records are often used for mapping
biodiversity patterns. Collections datasets are freely available and are becoming easily accessible through portals such as the
Global Biodiversity Information Facility (http://gbif.org/). Unfortunately, these datasets
contain many errors and suffer from several data quality issues. Despite the large number of users of these datasets, there are only a
few software tools dedicated to detecting and correcting such errors.
The package, called biogeo, includes functions for error detection, such as identifying mismatches between the recorded country and
the country where the record is plotted, finding records of terrestrial species that fall into the sea, and detecting outliers.
A key feature of the package is the ability to identify likely alternative positions for points that represent obvious errors in the
dataset. Other functions let users explore records in geographical and environmental space in order to identify possible errors.
Functions are also available for converting coordinates that are in various text formats into
degrees, minutes and seconds and then into decimal degrees.
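The coordinate conversion described above can be illustrated with a minimal sketch in base R. It does not use biogeo and does not reproduce its functions: the names dms_to_decimal and parse_dms are hypothetical helpers written for this example, and the parser assumes a simple text format of three numbers followed by a hemisphere letter.

```r
# Hypothetical helper (not part of biogeo): convert degrees, minutes and
# seconds plus a hemisphere letter into signed decimal degrees.
dms_to_decimal <- function(degrees, minutes, seconds, hemisphere) {
  hemisphere <- toupper(hemisphere)
  stopifnot(all(hemisphere %in% c("N", "S", "E", "W")),
            all(minutes >= 0 & minutes < 60),
            all(seconds >= 0 & seconds < 60))
  decimal <- abs(degrees) + minutes / 60 + seconds / 3600
  # South and west are negative by convention
  ifelse(hemisphere %in% c("S", "W"), -decimal, decimal)
}

# Hypothetical helper: parse a single text coordinate such as "33 55 30 S".
# Assumes exactly three numbers and a trailing hemisphere letter.
parse_dms <- function(text) {
  numbers <- as.numeric(regmatches(text, gregexpr("[0-9]+\\.?[0-9]*", text))[[1]])
  hemisphere <- sub(".*([NSEWnsew])\\s*$", "\\1", text)
  stopifnot(length(numbers) == 3)
  dms_to_decimal(numbers[1], numbers[2], numbers[3], hemisphere)
}

lat <- parse_dms("33 55 30 S")
lon <- parse_dms("18 51 36 E")
```

Real collections data come in many more text formats than this sketch handles, which is why a dedicated set of conversion functions is useful.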
The package was developed for the R environment, so some experience with R is useful but not essential. The
package comes with a tutorial aimed at the first-time user, which provides examples of how to use the various functions in the
package to detect and correct errors in collections datasets.
The package is available from the Comprehensive R Archive Network (https://cran.r-project.org/).
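Installing a package from CRAN usually takes two standard R commands. The sketch below assumes the package is still listed on CRAN under the name biogeo at the time of reading.

```r
# Download and install the package from CRAN (assumes it is still listed there)
install.packages("biogeo")

# Load the package into the current R session
library(biogeo)
```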
A paper describing common data quality issues and highlighting the features of the package was published in the journal Ecography.
Read the paper:
Robertson, M. P., Visser, V. and
Hui, C. 2016. Biogeo: an R package for assessing and improving data quality of occurrence record datasets. – Ecography
39: DOI: 10.1111/ecog.02118.
For more information, contact Mark Robertson at email@example.com