SALVIAS TaxonScrubber

TaxonScrubber Updates - Sept. 21, 2004

What is TaxonScrubber?

SALVIAS TaxonScrubber is a stand-alone application for automated standardization of taxonomic names. In addition to correcting spelling errors in species names, TaxonScrubber splits concatenated information into separate fields, and can be used to restructure flat-file specimen data before import into a relational database. Although designed primarily for standardizing inventory data for the SALVIAS plots database, TaxonScrubber can be used whenever large numbers of taxonomic records need to be error-checked and reformatted.

How TaxonScrubber works

TaxonScrubber performs four basic actions:

  1. Splitting of concatenated fields.  Epithets and authorities contained in single fields are split into separate fields. For example, the input string "Quercus alba L." is split into three fields: Genus = "Quercus", Species_epithet = "alba", Sp_auth = "L.". TaxonScrubber can split up to two subspecific levels from a single name (e.g., Quercus alba var. gunnisonii Torr. fo. Rugosa).
  2. Recognition and removal of standard annotations.  TaxonScrubber contains an extensive library of Latin and English botanical annotations, their spelling variants, and abbreviations. Annotations such as "cf.", "aff.", "vel. sp. aff.", etc., are removed and stored in a separate field. Informal annotations of uncertainty, such as question marks, are treated as "cf." Any text not recognized as a standard annotation is stored in an additional annotation field, and flagged for inspection by the user.
  3. Standardization of spelling. Once fields have been split and extraneous text removed, TaxonScrubber matches names to a standard list of validly published names (currently, TaxonScrubber uses a world list of plant names; however, later releases of TaxonScrubber will have the option of loading name lists for other taxa). After all names that match the standard list have been flagged, TaxonScrubber's "Hand scrub" utility provides pull-down menus for correcting the remaining names to the standard world list. Names still unmatched at the end of the process can then be flagged as morphospecies names (e.g., Miconia sp.3), or as indets (e.g., Miconia sp.).
  4. Standardization of higher taxonomy. TaxonScrubber standardizes all family names to match taxonomic concepts and spellings of the Missouri Botanical Garden's TROPICOS database. Future versions will allow the user to update higher taxonomy according to alternative taxonomic concepts (for example, APG familial concepts; see The Angiosperm Phylogeny Website).
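
The first two actions above (splitting and annotation removal) can be illustrated with a minimal Python sketch. It is not TaxonScrubber's own code: the function name split_name, the short ANNOTATIONS and RANKS lists, and the "Annotation" field name are assumptions made for illustration, and subspecific levels are ignored for brevity.

```python
import re

# Hypothetical annotation list; TaxonScrubber's real library is far larger.
# Longest entries come first so "vel. sp. aff." is tried before "aff.".
ANNOTATIONS = ["vel. sp. aff.", "cf.", "aff."]
# Assumed rank indicators; this sketch stops parsing at the first one.
RANKS = {"var.", "subsp.", "fo."}


def split_name(raw):
    """Split a raw name into genus, epithet, authority and annotation."""
    text = raw.strip()
    annotation = ""
    # A question mark is treated as an informal "cf.".
    if "?" in text:
        annotation = "cf."
        text = text.replace("?", " ")
    for ann in ANNOTATIONS:
        # Match the annotation only as a whole whitespace-delimited unit.
        pattern = r"(?<!\S)" + re.escape(ann) + r"(?!\S)"
        if re.search(pattern, text):
            annotation = ann
            text = re.sub(pattern, " ", text)
            break
    tokens = text.split()
    genus = tokens[0] if tokens else ""
    rest = tokens[1:]
    epithet = ""
    # A lowercase second token is taken to be the specific epithet.
    if rest and rest[0][0].islower() and rest[0] not in RANKS:
        epithet = rest[0]
        rest = rest[1:]
    # Tokens up to the first rank indicator are treated as the authority.
    authority = []
    for tok in rest:
        if tok in RANKS:
            break
        authority.append(tok)
    return {
        "Genus": genus,
        "Species_epithet": epithet,
        "Sp_auth": " ".join(authority),
        "Annotation": annotation,
    }


print(split_name("Quercus alba L."))
print(split_name("Miconia cf. calvescens DC."))
```

A real parser would also need to store unrecognized text in a separate field for inspection, as described in item 2.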

During the scrubbing process, TaxonScrubber generates new fields containing the results of the splitting and cleaning process, and various "flag fields" indicating the status of each name component (family, genus, specific epithet, etc.). These fields may be retained or deleted as needed upon export of the cleaned, formatted file.
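
As a rough illustration of matching with flag fields, the Python sketch below checks a parsed name against a reference list and records its status. Everything here is assumed for the example: the tiny REFERENCE set, the flag values, and the field names. The use of difflib to suggest candidates is a stand-in for the "Hand scrub" pull-down menus, not a description of how TaxonScrubber itself finds candidates, and the automatic flagging of morphospecies and indets is a simplification of a step the user performs at the end of the process.

```python
import difflib
import re

# Hypothetical reference list standing in for the world list of plant names.
REFERENCE = {"Quercus alba", "Quercus rubra", "Miconia calvescens"}


def flag_name(genus, epithet):
    """Return the name with a status flag and any suggested corrections."""
    name = f"{genus} {epithet}".strip()
    if name in REFERENCE:
        return {"Name": name, "Flag": "matched", "Candidates": []}
    if epithet == "sp.":
        # Determined to genus only, e.g. "Miconia sp."
        return {"Name": name, "Flag": "indet", "Candidates": []}
    if re.fullmatch(r"sp\.?\s*\d+", epithet):
        # Numbered morphospecies, e.g. "Miconia sp.3"
        return {"Name": name, "Flag": "morphospecies", "Candidates": []}
    # Offer close spellings for manual correction, best match first.
    candidates = difflib.get_close_matches(
        name, sorted(REFERENCE), n=3, cutoff=0.8
    )
    return {"Name": name, "Flag": "unmatched", "Candidates": candidates}


print(flag_name("Quercus", "alba"))
print(flag_name("Quercus", "albba"))
print(flag_name("Miconia", "sp.3"))
```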

Other TaxonScrubber features

  1. File management. TaxonScrubber imports, names, backs up, and manages source files within the database environment. Original files are left untouched until the user has completed the scrubbing process, and chooses to export the scrubbed file and replace the original.
  2. Archiving of source names. Prior to scrubbing, TaxonScrubber archives the original names, unchanged, for comparison with the "scrubbed" versions. After scrubbing, these fields can be kept or deleted at the user's discretion.
  3. Hand-scrubbing. TaxonScrubber features tools for manual inspection of taxonomic fields, including filters which display only records containing selected standard annotations, and matching to pull-down menus of standard names or names within the original file.

New with TaxonScrubber 1.1

  1. Matching of names to alternative taxonomies and synonymized lists. Version 1.1 of TaxonScrubber permits loading of alternative reference taxonomies, and provides information on name status and synonymy when the source taxonomy is also synonymized. Currently available lists include a provisional list of vascular plant species of the world (invalid names flagged, but no synonymy) and a synonymized checklist of the gymnosperms and flowering plants of Peru (see below for more details). Other regional and monographic reference taxonomies in preparation will be released over the coming months.
  2. Matching to standard names includes authorities. The first version of TaxonScrubber matched names only down to the specific epithet. Version 1.1 will optionally match authorities as well, when these are present in the source data and reference taxonomies.
  3. Improved parsing of name components. Several parsing problems with the first release of TaxonScrubber have been corrected. For example, TaxonScrubber now correctly distinguishes by context various identical abbreviations of "filius" ("son of", a component of authority citations such as L.f.), and "forma" (a rank indicator).
  4. More compact taxonomic reference tables. Reorganization of the taxonomic reference tables and queries has reduced the size of the taxonomy database by half.
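
To make the "filius" versus "forma" distinction in item 3 concrete, here is one possible context rule sketched in Python. This heuristic is an assumption made for illustration, not the rule TaxonScrubber actually applies: an "f." followed by a plain lowercase word is read as the rank indicator "forma", and anything else is read as "filius". The CONNECTORS set and the function name classify_f are hypothetical.

```python
# Lowercase words that can follow an author abbreviation inside an
# authority citation, so they must not be mistaken for an epithet.
CONNECTORS = {"ex", "in", "et"}


def classify_f(tokens, i):
    """Classify tokens[i] (an "f." token) as 'forma' or 'filius'."""
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    # A following all-lowercase alphabetic word looks like an epithet,
    # which suggests that "f." is acting as a rank indicator.
    if nxt.isalpha() and nxt.islower() and nxt not in CONNECTORS:
        return "forma"
    # Otherwise "f." is taken to belong to the preceding author name.
    return "filius"


print(classify_f("Quercus alba L. f. rugosa".split(), 3))
print(classify_f("Genus species Hook. f. ex Benth.".split(), 3))
```

A rule this simple will misclassify some names, which is one reason flagging uncertain parses for manual inspection remains useful.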


Download the latest version of TaxonScrubber: Version 1.2 (updated September 2004)

TaxonScrubber 1.2 features faster name parsing and matching, and corrects an error in the application of standardized family names.

Installation Notes:

Download TaxonScrubber version 1.2:

Download taxonomic databases (updated for TaxonScrubber ver. 1.2):

Currently we offer two taxonomic databases: a world list of vascular plant names from the Missouri Botanical Garden's TROPICOS database, and a synonymized checklist of the vascular plants of Peru (based on Brako and Zarucchi 1993). In the future we may offer additional taxonomic reference lists, including taxa other than plants.

Citing TaxonScrubber 1.2:

Boyle, B.L. 2004 Sep 21. TaxonScrubber Version 1.2. The SALVIAS Project, http://www.salvias.net. (Accessed [date_downloaded]).

Comments or questions concerning TaxonScrubber? Please contact Brad Boyle, bboyle@email.arizona.edu




© The SALVIAS Project 2003