Where can I find out more about TaxonScrubber?

TaxonScrubber is a standalone application, developed by SALVIAS, for the automated standardization of taxonomic names. It was created primarily to standardize inventory data for the SALVIAS plots database, but it can be used to error-check and reformat large numbers of taxonomic records of any kind. For example, it can clean up flat-file specimen data, including correcting misspelled species names, before the data are imported into a relational database.

How TaxonScrubber works

It has four primary functions:

Splits concatenated fields. Epithets and authorities are separated into their own fields. For example, the string “Quercus alba L.” is split into three fields: Genus, Species epithet, and Author. TaxonScrubber can split a single name into up to two subspecific levels (e.g., Quercus alba var. gunnisonii Torr. fo. rugosa).
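To make the splitting step concrete, here is a minimal Python sketch of a whitespace-based name parser. It is not TaxonScrubber’s own code: the function name `split_name`, the output layout, and the rank list `RANK_INDICATORS` are hypothetical, and real names (hybrids, parenthetical authors, “f.” used for both forma and filius) need far more careful handling.

```python
# Hypothetical set of rank indicators; "f." is deliberately left out because
# it is ambiguous (forma vs. filius) and needs context to interpret.
RANK_INDICATORS = {"subsp.", "ssp.", "var.", "fo."}


def split_name(name: str) -> dict:
    """Split a concatenated taxon name into genus, epithet, infraspecific levels, author."""
    tokens = name.split()
    fields = {"genus": None, "species": None, "infra": [], "author": None}
    if not tokens:
        return fields
    fields["genus"] = tokens[0]
    i = 1
    # Assume a lowercase second token that is not a rank indicator is the species epithet.
    if i < len(tokens) and tokens[i].islower() and tokens[i] not in RANK_INDICATORS:
        fields["species"] = tokens[i]
        i += 1
    author_tokens = []
    while i < len(tokens):
        tok = tokens[i]
        if tok in RANK_INDICATORS and i + 1 < len(tokens):
            # At most two subspecific levels, mirroring the limit described above.
            if len(fields["infra"]) >= 2:
                raise ValueError("more than two subspecific levels")
            fields["infra"].append((tok, tokens[i + 1]))
            i += 2
        else:
            # Anything else is treated as part of the authority.
            author_tokens.append(tok)
            i += 1
    fields["author"] = " ".join(author_tokens) or None
    return fields


print(split_name("Quercus alba L."))
print(split_name("Quercus alba var. gunnisonii Torr. fo. rugosa"))
```

This simplification collects every non-rank token into a single Author field, so it loses which rank an authority belongs to (here, “Torr.” belongs to the variety).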

Detects and removes standard annotations. TaxonScrubber recognizes all standard botanical annotations, in both Latin and English. Annotations such as “cf.”, “aff.”, and “vel. sp.” are removed from the name and stored in a separate field. Informal annotations of uncertainty, such as question marks, are converted to “cf.”. Any text that is not recognized as a standard annotation is moved to an additional annotation field and flagged for the user to inspect.
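The annotation step can be sketched in Python as follows. This is an illustration under stated assumptions, not TaxonScrubber’s implementation: the short list `STANDARD_ANNOTATIONS` stands in for the full Latin and English set, and the rule that bracketed text counts as “unrecognized” is hypothetical.

```python
import re

# Hypothetical, abbreviated list of standard annotations.
STANDARD_ANNOTATIONS = ["cf.", "aff.", "vel. sp."]
# Hypothetical rule: text in square brackets or parentheses is non-standard.
BRACKETED = r"\[[^\]]*\]|\([^)]*\)"


def strip_annotations(name: str) -> dict:
    """Move annotations out of a name string into separate fields."""
    annotations = []
    text = name
    # Informal uncertainty ("?") is recorded as "cf.".
    if "?" in text:
        text = text.replace("?", " ")
        annotations.append("cf.")
    for ann in STANDARD_ANNOTATIONS:
        # Match the annotation only as whole whitespace-delimited text.
        pattern = r"(?<!\S)" + re.escape(ann) + r"(?!\S)"
        if re.search(pattern, text):
            annotations.append(ann)
            text = re.sub(pattern, " ", text)
    unrecognized = re.findall(BRACKETED, text)
    text = re.sub(BRACKETED, " ", text)
    return {
        "name": " ".join(text.split()),
        # dict.fromkeys removes duplicates while keeping order.
        "annotation": " ".join(dict.fromkeys(annotations)) or None,
        "other_annotation": " ".join(unrecognized) or None,
        "flag_for_review": bool(unrecognized),
    }


print(strip_annotations("Miconia cf. affinis"))
print(strip_annotations("Inga ? edulis [det. uncertain]"))
```

Matching on whole tokens keeps “aff.” from being stripped out of an epithet such as “affinis”.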

Standardizes spelling. After fields have been split and extraneous text removed, TaxonScrubber matches names against a list of validly published names. (It currently uses a world list of plant names; later releases will allow name lists for other taxa to be loaded.) Once the names that match the standard list have been flagged, you can use TaxonScrubber’s “Hand scrub” utility to bring the remaining names in line with the standard world list. At this point, unmatched names can also be classified as morphospecies names (e.g., Miconia sp.3) or indets (e.g., Miconia sp.).
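A rough Python sketch of this matching and classification step is shown below, assuming a plain set of reference names. The function `classify_name` is hypothetical, and the fuzzy suggestions from the standard-library `difflib.get_close_matches` are a swapped-in technique for proposing corrections, not necessarily how TaxonScrubber or its Hand scrub utility chooses candidates.

```python
import difflib
import re


def classify_name(name, reference_names):
    """Return (status, suggestions) for one cleaned name string."""
    if name in reference_names:
        return "matched", []
    _genus, _, rest = name.partition(" ")
    # "Genus sp." with nothing else: an indet.
    if re.fullmatch(r"sp\.", rest):
        return "indet", []
    # "Genus sp.3" or "Genus sp. 3": a numbered morphospecies.
    if re.fullmatch(r"sp\.\s*\d+", rest):
        return "morphospecies", []
    # Otherwise offer the closest reference names for a person to review.
    suggestions = difflib.get_close_matches(name, sorted(reference_names), n=3, cutoff=0.8)
    return "unmatched", suggestions


reference = {"Miconia affinis", "Miconia albicans", "Quercus alba"}
for n in ["Quercus alba", "Miconia sp.", "Miconia sp.3", "Miconia albicns"]:
    print(n, classify_name(n, reference))
```

Checking the morphospecies and indet patterns before fuzzy matching keeps placeholder names from being “corrected” into a real species.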

Standardizes higher taxonomy. TaxonScrubber standardizes all family names to the Missouri Botanical Garden’s TROPICOS database. Future versions will let users modify the higher taxonomy to accommodate other taxonomic concepts (for example, APG familial concepts; see The Angiosperm Phylogeny Website).
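Family standardization is essentially a lookup from genus to an accepted family. The Python sketch below assumes a small hypothetical mapping `GENUS_TO_FAMILY` standing in for TROPICOS data; the function name and the status strings it returns are also hypothetical.

```python
# Tiny stand-in for a genus-to-family reference table.
GENUS_TO_FAMILY = {"Quercus": "Fagaceae", "Miconia": "Melastomataceae"}


def standardize_family(genus, recorded_family, genus_to_family=GENUS_TO_FAMILY):
    """Return (family, status) after checking the recorded family against the reference."""
    standard = genus_to_family.get(genus)
    if standard is None:
        # Genus unknown to the reference: keep what the user had and flag it.
        return recorded_family, "genus_not_in_reference"
    if recorded_family is None or recorded_family.strip().lower() != standard.lower():
        return standard, "family_changed"
    return standard, "family_ok"


print(standardize_family("Quercus", "fagaceae"))
print(standardize_family("Miconia", "Myrtaceae"))
```

Swapping in a different mapping (for example, one built on APG family concepts) would change the standard without touching the lookup logic.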

While scrubbing a taxon name, TaxonScrubber generates several “flag fields” that indicate the status of each of the name’s components (family, genus, specific epithet, etc.). These fields can be included in or excluded from the export of the cleaned, formatted file as needed.
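Including or excluding flag fields at export time can be sketched with Python’s `csv.DictWriter`. The `flag_` column prefix and the function `export_records` are hypothetical conventions chosen for this example, not TaxonScrubber’s actual field names.

```python
import csv
import io


def export_records(records, include_flags=True):
    """Write scrubbed records to CSV text, optionally dropping flag fields."""
    if not records:
        return ""
    # Hypothetical convention: flag columns are named "flag_<component>".
    columns = [c for c in records[0] if include_flags or not c.startswith("flag_")]
    buffer = io.StringIO()
    # extrasaction="ignore" silently skips keys (the flags) not in `columns`.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = [{"genus": "Quercus", "species": "alba", "flag_genus": "ok", "flag_species": "ok"}]
print(export_records(rows, include_flags=False))
```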

Other features of TaxonScrubber

File management. TaxonScrubber renames and backs up imported files and maintains them in its database. The original files are not touched during scrubbing; they change only if the user exports the scrubbed file and uses it to replace the original.
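The “work on a copy, overwrite only on export” pattern can be sketched in Python with `shutil` and `pathlib`. This is an assumption-laden illustration: the function names and the `_import` renaming scheme are hypothetical, and TaxonScrubber keeps its copies inside its own database rather than in a folder.

```python
import shutil
from pathlib import Path


def import_file(original: Path, workspace: Path) -> Path:
    """Copy an imported file into a workspace under a new name; the original is not modified."""
    workspace.mkdir(parents=True, exist_ok=True)
    # Hypothetical renaming scheme: <stem>_import<suffix>.
    working_copy = workspace / f"{original.stem}_import{original.suffix}"
    shutil.copy2(original, working_copy)  # copies contents and metadata
    return working_copy


def export_and_replace(scrubbed_text: str, original: Path) -> None:
    """The only step that overwrites the user's original file."""
    original.write_text(scrubbed_text, encoding="utf-8")
```

All scrubbing happens on the working copy, so the original stays intact until the user explicitly calls the export step.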

Preservation of original names. TaxonScrubber keeps the original names after scrubbing so that they can be compared with the “scrubbed” versions. The user can choose whether or not to delete these fields once scrubbing is complete.

Hand-scrubbing. TaxonScrubber provides tools for manually inspecting taxonomic fields, such as filters that display only records containing selected standard annotations, and pull-down menus for matching names to standard names or to names found in the original file.

New features in TaxonScrubber 1.1

Matching names to synonyms and alternative taxonomies. TaxonScrubber version 1.1 allows alternative reference taxonomies to be loaded; if the loaded taxonomy is synonymized, names are also matched to synonyms. The lists currently available are a provisional list of the vascular plant species of the world (invalid names are flagged, but there is no synonymy) and a synonymized checklist of the gymnosperms and flowering plants of Peru. See below for additional information. Additional regional and monographic reference taxonomies will be released over the next few months.
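Synonym resolution against a loaded reference taxonomy can be sketched in Python as following “synonym of” links until an accepted name is reached. The data structure (name mapped to a status and an accepted name) is an assumption, and the species names in the usage example are invented placeholders, not real taxa.

```python
def resolve_name(name, taxonomy):
    """Follow synonym links in a reference taxonomy.

    `taxonomy` maps each name to (status, accepted_name_or_None), where status
    is "accepted", "synonym", or "invalid" (a hypothetical layout).
    """
    if name not in taxonomy:
        return {"matched": False, "status": None, "accepted": None}
    status, _ = taxonomy[name]
    current, seen = name, set()
    # Walk the chain of synonyms; stop at a name that is not itself a synonym.
    while taxonomy.get(current, ("accepted", None))[0] == "synonym":
        if current in seen:
            raise ValueError(f"synonym cycle involving {current!r}")
        seen.add(current)
        current = taxonomy[current][1]
    # Only report an accepted name if the chain really ends at one.
    accepted = current if taxonomy.get(current, (None,))[0] == "accepted" else None
    return {"matched": True, "status": status, "accepted": accepted}


# Invented placeholder names, for illustration only.
taxonomy = {
    "Examplea vetus": ("synonym", "Examplea nova"),
    "Examplea nova": ("accepted", None),
    "Examplea dubia": ("invalid", None),
}
print(resolve_name("Examplea vetus", taxonomy))
```

With an unsynonymized list, such as one that only flags invalid names, the same function still reports the flag but returns no accepted name.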

Authorities are included in standard name matching. When it was first released, TaxonScrubber matched epithets only. In version 1.1.1, authorities are matched as well, provided they are present in both the source data and the reference taxonomy.
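Matching on the authority as well as the name can be sketched in Python as a second check that runs only when both sides supply an author. The whitespace-and-case normalization, the reference layout (name mapped to a set of author strings), and the status labels are all hypothetical.

```python
def normalize_author(author):
    """Hypothetical normalization: ignore spacing and case differences."""
    return "".join(author.split()).lower() if author else None


def match_with_authority(name, author, reference):
    """`reference` maps a name to a set of known authority strings."""
    if name not in reference:
        return "no_match"
    known = {normalize_author(a) for a in reference[name] if a}
    # Fall back to an epithet-only match when either side lacks an authority.
    if author is None or not known:
        return "name_match_only"
    return "name_and_author_match" if normalize_author(author) in known else "author_mismatch"


reference = {"Quercus alba": {"L."}}
print(match_with_authority("Quercus alba", "L.", reference))
print(match_with_authority("Quercus alba", None, reference))
```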

Improved parsing of name components. Several parsing problems in TaxonScrubber’s initial release have been fixed. For example, TaxonScrubber now uses context to distinguish “f.” as an abbreviation of “filius” (part of an author name) from “f.” as an abbreviation of “forma” (a rank indicator).
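One plausible context rule for the ambiguous “f.” is sketched below in Python: if the next token looks like a lowercase epithet, treat “f.” as forma; if it follows a capitalized (author) token, treat it as filius. This heuristic, the connector list, and the function `interpret_f` are assumptions for illustration, not TaxonScrubber’s documented rule.

```python
import re

# Words that join author names; a lowercase "ex" after "f." is not an epithet.
AUTHOR_CONNECTORS = {"ex", "in", "et", "&"}


def interpret_f(tokens, i):
    """Decide whether tokens[i] == "f." means forma (rank) or filius (author)."""
    if tokens[i] != "f.":
        raise ValueError("tokens[i] must be 'f.'")
    prev = tokens[i - 1] if i > 0 else None
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    # A following lowercase word (hyphens allowed) looks like an infraspecific epithet.
    if nxt is not None and nxt not in AUTHOR_CONNECTORS and re.fullmatch(r"[a-z][a-z-]*", nxt):
        return "forma"
    # Otherwise, "f." right after a capitalized token is read as part of an author name.
    if prev is not None and prev[0].isupper():
        return "filius"
    return "ambiguous"


print(interpret_f("Quercus alba f. rugosa".split(), 2))
```

Because the epithet check runs first, a species author followed by a forma (author, then “f.”, then a lowercase epithet) is still read as a rank indicator.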

More compact taxonomic reference tables. Reorganizing the reference tables and queries has halved the size of the taxonomy database.