
SALVIAS TaxonScrubber
TaxonScrubber Updates - June 21, 2006
- TaxonScrubber Version 2.0 now available for download
- New names databases now available for download:
  - IPNI World Plant Checklist
  - Plants of Peru
  - Plants of Costa Rica
  - Birds of North and Middle America
- See below for more information
What is TaxonScrubber?
SALVIAS TaxonScrubber is a stand-alone application for automated standardization of taxonomic
names. In addition to removing spelling errors in species names, TaxonScrubber splits concatenated
information into separate fields, and can be used to restructure flat-file specimen data prior to importing
to a relational database. Although designed primarily for standardizing inventory
data for the SALVIAS plots database, TaxonScrubber can be used whenever large numbers of taxonomic
records need to be error-checked and reformatted.
How TaxonScrubber works
TaxonScrubber performs four basic actions:
- Splitting of concatenated fields. Epithets and authorities contained in single fields are
split into separate fields. For example, the input string "Quercus alba L." is split into three fields,
Genus = "Quercus", Species_epithet = "alba", Sp_auth = "L.". TaxonScrubber can split up to two
subspecific levels off a single name (e.g., Quercus alba var. gunnisonii Torr. fo. rugosa).
- Recognition and removal of standard annotations. TaxonScrubber contains an extensive library of Latin
and English botanical annotations, their spelling variants, and abbreviations. Annotations such as "cf.", "aff.",
"vel. sp. aff.", etc., are removed and stored in a separate field. Informal annotations of uncertainty, such as
question marks, are treated as "cf." Any text not recognized as a standard annotation
is stored in an additional annotation field, and flagged for inspection by the user.
- Standardization of spelling. Once fields have been split, and extraneous text removed, TaxonScrubber matches
names to a standard list of validly published names (currently, TaxonScrubber uses a world list of
plant names; however, later releases of TaxonScrubber will have the option of loading name lists for other taxa).
After flagging all names which match to the standard list, TaxonScrubber's "Hand scrub" utility provides pull-down
menus for correcting remaining names to the standard world list. Names still unmatched at the end of the process can then be
flagged as morphospecies names (e.g., Miconia sp.3), or as indets (e.g., Miconia sp.).
- Standardization of higher taxonomy. TaxonScrubber standardizes all family names to match taxonomic concepts
and spellings of the Missouri Botanical Garden's TROPICOS database. Future versions will allow the user to update
higher taxonomy according to alternative taxonomic concepts (for example, APG familial concepts; see The Angiosperm Phylogeny Website).
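The first two actions (annotation removal and field splitting) can be illustrated with a minimal Python sketch. This is not TaxonScrubber's own code, which runs inside MS Access; the function name parse_name, the short ANNOTATIONS and RANKS lists, and the "Infraspecific" field are assumptions made for illustration. Only the field names Genus, Species_epithet and Sp_auth come from the example above.

```python
import re

# Hypothetical, heavily abbreviated annotation library (longest entry first).
ANNOTATIONS = ("vel. sp. aff.", "cf.", "aff.")
# Hypothetical list of infraspecific rank markers.
RANKS = ("subsp.", "ssp.", "var.", "fo.")


def parse_name(raw):
    """Split a raw name string into separate fields (illustrative only)."""
    fields = {"Genus": "", "Species_epithet": "", "Sp_auth": "",
              "Infraspecific": [], "Annotation": ""}

    # Question marks are informal annotations of uncertainty: treat as "cf."
    text = raw.replace("?", " ")
    if text != raw:
        fields["Annotation"] = "cf."

    # Remove the first standard annotation found and store it separately.
    for ann in ANNOTATIONS:
        pattern = r"(?<!\S)" + re.escape(ann) + r"(?!\S)"
        if re.search(pattern, text):
            text = re.sub(pattern, " ", text)
            fields["Annotation"] = ann
            break

    tokens = text.split()
    if tokens:
        fields["Genus"] = tokens.pop(0)
    if tokens and tokens[0].islower():
        fields["Species_epithet"] = tokens.pop(0)

    # Remaining tokens are authorities or up to two infraspecific levels.
    infra = fields["Infraspecific"]
    for tok in tokens:
        if tok in RANKS and len(infra) < 2:
            infra.append({"rank": tok, "epithet": "", "auth": ""})
        elif infra and not infra[-1]["epithet"]:
            infra[-1]["epithet"] = tok
        elif infra:
            infra[-1]["auth"] = (infra[-1]["auth"] + " " + tok).strip()
        else:
            fields["Sp_auth"] = (fields["Sp_auth"] + " " + tok).strip()
    return fields


print(parse_name("Quercus alba L."))
print(parse_name("Quercus alba var. gunnisonii Torr. fo. rugosa"))
```

A real annotation library would be far larger, and text not recognized as a standard annotation would need to go to a separate field flagged for inspection, which this sketch omits.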
During the scrubbing process, TaxonScrubber generates new fields containing the results of the splitting
and cleaning process, and various "flag fields" indicating the status of each name component (family, genus, specific epithet, etc.). These fields may be retained or deleted as needed upon export of the cleaned file.
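The matching and flagging steps described above might look roughly like the following Python sketch. The matching algorithm TaxonScrubber actually uses is not documented here, so the use of difflib for suggesting candidates, the function name flag_name, the "Status" and "Candidates" fields, and the three-name reference list are all assumptions. In TaxonScrubber itself, morphospecies and indets are flagged by the user during hand scrubbing; the automatic pattern check below is only a stand-in for that step.

```python
import difflib
import re

# Hypothetical reference list; a real list may hold more than a million names.
REFERENCE_NAMES = ["Quercus alba", "Quercus rubra", "Miconia calvescens"]


def flag_name(genus, epithet, reference=REFERENCE_NAMES):
    """Return a status flag and candidate corrections for one name."""
    name = f"{genus} {epithet}".strip()
    result = {"Name": name, "Status": "unmatched", "Candidates": []}

    if name in reference:
        # Exact match to the standard list.
        result["Status"] = "matched"
    elif re.fullmatch(r"sp\.?\s*\d+", epithet):
        # Numbered placeholder, e.g. "Miconia sp.3".
        result["Status"] = "morphospecies"
    elif re.fullmatch(r"sp\.?", epithet):
        # Undetermined species, e.g. "Miconia sp.".
        result["Status"] = "indet"
    else:
        # Offer close spellings, playing the role of a pull-down menu.
        result["Candidates"] = difflib.get_close_matches(
            name, reference, n=3, cutoff=0.8)
    return result


for genus, epithet in [("Quercus", "alba"), ("Quercus", "abla"),
                       ("Miconia", "sp.3"), ("Miconia", "sp.")]:
    print(flag_name(genus, epithet))
```

Keeping the status in its own field, rather than overwriting the name, mirrors the "flag fields" idea: the user can filter on the flag and decide later which fields to keep on export.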
Other TaxonScrubber features
- File management. TaxonScrubber imports, names, backs up, and manages source files within the database
environment. Original files are left untouched until the user has completed the scrubbing process, and chooses
to export the scrubbed file and replace the original.
- Archiving of source names. Prior to scrubbing, TaxonScrubber archives the original names, unchanged, for comparison with the "scrubbed" versions. After scrubbing, these fields can be deleted or retained at the user's discretion.
- Hand-scrubbing. TaxonScrubber features tools for manual inspection of taxonomic
fields, including filters which display only records containing selected standard annotations, and matching to
pull-down menus of standard names or names within the original file.
New with TaxonScrubber 2.0
- Table names read directly from database window. The intermediate 'Choose source table' form has been eliminated. Actions are now performed directly on the source tables as selected from the list in the home screen of TaxonScrubber.
- Clickable View/Edit. View/Edit is the default option upon double-clicking any table in the home screen of TaxonScrubber.
- Flagging accepted names. TaxonScrubber can now label names as 'accepted/not accepted' if this information is provided in the original taxonomic source.

Download the latest version of TaxonScrubber: Version 2.0 (updated June 2006)
TaxonScrubber 2.0 features faster name parsing and matching, and corrects an error in the application of standardized family names.
Installation Notes:
- TaxonScrubber is a PC application for MS Access 2000, 2002 or 2003 only. Access must be installed on your machine to run TaxonScrubber. Note: TaxonScrubber will not run on the 2007 version of Access. I am no longer actively developing TaxonScrubber and am unlikely to find the time to resolve the numerous incompatibilities caused by Microsoft's "upgrade" of their product. For now, the only way to use TaxonScrubber is with one of the earlier versions of Access.
- To run TaxonScrubber, you will need to download two files: the main application (filename "TaxonScrubber_v20") and a taxonomic database file (filename beginning with "TS_Taxon_Tables_").
- Taxonomic databases have been reformatted extensively and will not work with the earlier versions of TaxonScrubber (1.0, 1.1, 1.1b and 1.2); nor will the old reference tables work with TaxonScrubber version 2.0. You will need to download the updated versions of the taxonomic databases as well as the main application.
- Once you have both files on your computer, click on TaxonScrubber. You will be provided with instructions for loading the reference lists into your database.
Download TaxonScrubber version 2.0:
TaxonScrubber (2.6 MB; 11.7 MB uncompressed)
Download taxonomic reference databases (TaxonScrubber ver. 2.0 format):
World plant list. >1.2 million plant names. All unique species, varieties, subspecies, formas, subvarieties and subformas from the IPNI database as of September 1, 2005. Includes Kew, Gray Card Index, and Australian Plant Names Index. Reformatted for TaxonScrubber ver. 2.0 in June 2006. (Warning! This is a very large file: 58 MB; 197 MB uncompressed)
Peru plant names. Based on the synonymized checklist of the vascular plants of Peru (Brako and Zarucchi 1993). Re-formatted for TaxonScrubber ver. 2.0 in June 2006. (2.7 MB; 9.9 MB uncompressed)
Costa Rican plant names. Checklist of the vascular plants of Costa Rica, as provided by the National Biodiversity Institute of Costa Rica (INBio). Accessed via the (Atta) database, 1 June 2006. Re-formatted for TaxonScrubber ver. 2.0 in June 2006. (2.4 MB; 9.2 MB uncompressed)
Birds of North America, Mexico, and Central America. American Ornithologists' Union (AOU) checklist, version 46. Re-formatted for TaxonScrubber ver. 2.0 in June 2006. (115 KB; 788 KB uncompressed)
Citing TaxonScrubber 2.0:
Boyle, B.L. 2006. TaxonScrubber, Version 2.0. The SALVIAS Project, http://www.salvias.net/pages/taxonscrubber.html. (Accessed [date_downloaded]).
Comments or questions concerning TaxonScrubber? Please contact Brad Boyle, bboyle@email.arizona.edu