TaxonScrubber Updates - June 21, 2006

What is TaxonScrubber?

SALVIAS TaxonScrubber is a stand-alone application for automated standardization of taxonomic names. In addition to correcting spelling errors in species names, TaxonScrubber splits concatenated information into separate fields, and can be used to restructure flat-file specimen data prior to import into a relational database. Although designed primarily for standardizing inventory data for the SALVIAS plots database, TaxonScrubber can be used whenever large numbers of taxonomic records need to be error-checked and reformatted.

How TaxonScrubber works

TaxonScrubber performs four basic actions:

  1. Splitting of concatenated fields.  Epithets and authorities contained in single fields are split into separate fields. For example, the input string "Quercus alba L." is split into three fields, Genus = "Quercus", Species_epithet = "alba", Sp_auth = "L.". TaxonScrubber can split up to two subspecific levels from a single name (e.g., Quercus alba var. gunnisonii Torr. fo. Rugosa).
  2. Recognition and removal of standard annotations.  TaxonScrubber contains an extensive library of Latin and English botanical annotations, their spelling variants, and abbreviations. Annotations such as "cf.", "aff.", "vel. sp. aff.", etc., are removed and stored in a separate field. Informal annotations of uncertainty, such as question marks, are treated as "cf." Any text not recognized as a standard annotation is stored in an additional annotation field, and flagged for inspection by the user.
  3. Standardization of spelling. Once fields have been split and extraneous text removed, TaxonScrubber matches names to a standard list of validly published names (currently, TaxonScrubber uses a world list of plant names; however, later releases of TaxonScrubber will have the option of loading name lists for other taxa). After all names that match the standard list have been flagged, TaxonScrubber's "Hand scrub" utility provides pull-down menus for correcting the remaining names to the standard world list. Names still unmatched at the end of the process can then be flagged as morphospecies names (e.g., Miconia sp.3), or as indets (e.g., Miconia sp.).
  4. Standardization of higher taxonomy. TaxonScrubber standardizes all family names to match taxonomic concepts and spellings of the Missouri Botanical Garden's TROPICOS database. Future versions will allow the user to update higher taxonomy according to alternative taxonomic concepts (for example, APG familial concepts; see The Angiosperm Phylogeny Website).
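
The splitting and annotation-removal steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TaxonScrubber's actual parsing code: the function names, the short ANNOTATIONS and RANKS lists, and the "Annotation" and "Infraspecific" keys are hypothetical, and real names need far more rules than this.

```python
import re

# Hypothetical, heavily abbreviated annotation list (assumption for this sketch).
ANNOTATIONS = ("vel. sp. aff.", "cf.", "aff.")
# Hypothetical infraspecific rank markers (assumption for this sketch).
RANKS = ("subsp.", "var.", "fo.")


def take_authority(tokens, start):
    """Join tokens from `start` up to the next rank marker; return (text, next index)."""
    end = start
    while end < len(tokens) and tokens[end] not in RANKS:
        end += 1
    return " ".join(tokens[start:end]), end


def split_name(raw):
    """Split one raw name string into separate fields (simplified)."""
    text = raw
    annotation = ""
    # A question mark is treated as an informal "cf.".
    if "?" in text:
        text = text.replace("?", " ")
        annotation = "cf."
    # Remove the first recognized annotation, trying longer ones first.
    for ann in sorted(ANNOTATIONS, key=len, reverse=True):
        pattern = r"(?<!\S)" + re.escape(ann) + r"(?!\S)"
        if re.search(pattern, text):
            text = re.sub(pattern, " ", text, count=1)
            annotation = ann
            break
    tokens = text.split()
    fields = {
        "Genus": tokens[0] if tokens else "",
        "Species_epithet": tokens[1] if len(tokens) > 1 else "",
        "Annotation": annotation,
        "Infraspecific": [],
    }
    # Everything after the epithet, up to a rank marker, is the species authority.
    fields["Sp_auth"], i = take_authority(tokens, 2)
    # Split off at most two infraspecific levels: (rank, epithet, authority).
    while i < len(tokens) and len(fields["Infraspecific"]) < 2:
        rank = tokens[i]
        epithet = tokens[i + 1] if i + 1 < len(tokens) else ""
        authority, i = take_authority(tokens, i + 2)
        fields["Infraspecific"].append((rank, epithet, authority))
    return fields
```

With these assumptions, split_name("Quercus alba L.") yields Genus = "Quercus", Species_epithet = "alba", Sp_auth = "L.", matching the example in step 1.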

During the scrubbing process, TaxonScrubber generates new fields containing the results of the splitting and cleaning process, and various "flag fields" indicating the status of each name component (family, genus, specific epithet, etc.). These fields may be retained or deleted as needed upon export of the cleaned file.
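
As a rough illustration of how a flag field might be derived for a cleaned name, the sketch below checks a name against a standard list and suggests close candidates for the ones that fail. It is an assumption-laden example, not TaxonScrubber's matching method: the tiny STANDARD_NAMES list, the flag values, and the use of Python's difflib for fuzzy candidates are all stand-ins, and in TaxonScrubber the morphospecies and indet flags are applied after hand-scrubbing rather than automatically.

```python
import difflib

# Hypothetical stand-in for the standard list of published names.
STANDARD_NAMES = ["Quercus alba", "Quercus rubra", "Miconia affinis"]


def flag_name(name, standard=STANDARD_NAMES):
    """Return (flag, candidates) for one cleaned 'Genus epithet' string."""
    if name in standard:
        return "matched", [name]
    genus, _, epithet = name.partition(" ")
    # "Miconia sp." style names are indets.
    if epithet == "sp.":
        return "indet", []
    # "Miconia sp.3" style names are morphospecies.
    if epithet.startswith("sp.") and epithet[3:].strip().isdigit():
        return "morphospecies", []
    # Otherwise offer near matches, much like a pull-down menu of suggestions.
    candidates = difflib.get_close_matches(name, standard, n=3, cutoff=0.8)
    return "unmatched", candidates
```

A misspelling such as "Quercus albba" comes back flagged "unmatched" with "Quercus alba" as its first candidate, which a user could then accept or reject by hand.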

Other TaxonScrubber features

  1. File management. TaxonScrubber imports, names, backs up, and manages source files within the database environment. Original files are left untouched until the user has completed the scrubbing process, and chooses to export the scrubbed file and replace the original.
  2. Archiving of source names. Prior to scrubbing, TaxonScrubber archives the original names, unchanged, for comparison with the "scrubbed" versions. After scrubbing, these fields can be deleted--or not--at the user's discretion.
  3. Hand-scrubbing. TaxonScrubber features tools for manual inspection of taxonomic fields, including filters which display only records containing selected standard annotations, and matching to pull-down menus of standard names or names within the original file.
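
The archiving and filtering ideas above can be pictured with a short Python sketch. It is not how TaxonScrubber stores its data: the record layout (a list of dictionaries), the "_orig" suffix, and the "Annotation" field name are assumptions made for the example.

```python
def archive_originals(records, fields=("Genus", "Species_epithet")):
    """Return copies of the records with each listed field duplicated as '<field>_orig'."""
    archived = []
    for rec in records:
        new_rec = dict(rec)  # copy, so the source records stay untouched
        for field in fields:
            new_rec[field + "_orig"] = rec.get(field, "")
        archived.append(new_rec)
    return archived


def filter_by_annotation(records, annotation):
    """Keep only the records that carry the selected standard annotation."""
    return [rec for rec in records if rec.get("Annotation") == annotation]
```

Keeping the originals in parallel fields means a scrubbed value can always be compared with, or reverted to, what was first imported.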

New with TaxonScrubber 2.0

  1. Table names read directly from database window. The intermediate 'Choose source table' form has been eliminated. Actions are now performed directly on the source tables as selected from the list in the home screen of TaxonScrubber.
  2. Clickable View/Edit. View/Edit is the default option upon double-clicking any table in the home screen of TaxonScrubber.
  3. Flagging accepted names. TaxonScrubber can now label names as 'accepted/not accepted' if this information is provided in the original taxonomic source.

Download the latest version of TaxonScrubber: Version 2.0 (updated June 2006)

TaxonScrubber 2.0 features faster name parsing and matching, and corrects an error in the application of standardized family names.

Installation Notes:

Download TaxonScrubber version 2.0:

Download taxonomic reference databases (TaxonScrubber ver. 2.0 format):

Citing TaxonScrubber 2.0:

Boyle, B.L. 2006. TaxonScrubber, Version 2.0. The SALVIAS Project, (Accessed [date_downloaded]).

Comments or questions concerning TaxonScrubber? Please contact Brad Boyle,

