Biological name normalization
Taxonomy Normalizer Agent
agents.validation.taxonomy_normalizer
Normalizes messy organism, taxon, and functional-group names into reviewable workflow data. It preserves original submitted names, explains merge and keep decisions, flags ambiguous cases, and returns compact hierarchy context for downstream use.
When to use it
Good fit
- Cleaning user-submitted, imported, or extracted organism lists before creating records.
- Preparing organism names for downstream search, image, review, or knowledge-base workflows.
- Detecting when a broad group appears beside a more specific member.
- Preserving original names while giving reviewers a normalized label and an explanation.
- Routing ambiguous, overbroad, or unresolved biological names into a human review queue.
Use something else when
- You need a final taxonomic authority for regulated, clinical, legal, conservation, or publication-grade classification decisions.
- You expect every functional or ecological group to resolve to one strict taxon.
- You plan to treat warnings as failed jobs. Warnings usually mean the job succeeded and found names that need judgment.
- You plan to use match_confidence as a statistical probability. It is a qualitative hint for workflow routing.
- You plan to discard original_items_merged or original_name_map even though auditability matters.
Use it before downstream record creation, search, image selection, education, research curation, data cleanup, or knowledge-base workflows that need normalized biological names with original-name provenance.
Required inputs
- inputs.items
- Array of names to normalize. Each item must include name and may include context and source.
- inputs.items[].name
- Organism, taxon, or functional-group name.
Optional inputs
- inputs.items[].context
- Why the name was selected. Context helps distinguish an organism name from a behavior, structure, or broad biological idea.
- inputs.items[].source
- Caller-supplied origin label, such as user input, extraction, search, import, or review.
- inputs.abstraction_preference
- One of specific, general, or mixed. Defaults to specific.
- inputs.dedup_strategy
- One of keep_specific, keep_general, keep_both_with_link, or merge_content. Defaults to keep_specific.
- inputs.functions_context
- Design, research, or workflow functions that help decide whether a broad group or a more specific organism is more useful.
- configuration.metadata
- Optional caller-owned metadata used to correlate the job with your own workflow state.
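The inputs above can be assembled and checked client-side before invoking the agent. The following is a minimal sketch, not part of the API: build_payload is a hypothetical helper, and rejecting an empty items list or a blank name is an assumption about what the service would refuse. The allowed values and defaults are the ones listed above.

```python
from typing import Optional

# Allowed values as listed in the optional inputs above.
ABSTRACTION_PREFERENCES = {"specific", "general", "mixed"}
DEDUP_STRATEGIES = {"keep_specific", "keep_general", "keep_both_with_link", "merge_content"}


def build_payload(
    items: list,
    abstraction_preference: str = "specific",
    dedup_strategy: str = "keep_specific",
    functions_context: Optional[list] = None,
    metadata: Optional[dict] = None,
) -> dict:
    """Assemble a request body (hypothetical helper, not an SDK function)."""
    # Assumption: an empty list or a blank name would be rejected by the service.
    if not items:
        raise ValueError("inputs.items needs at least one item")
    for item in items:
        if not str(item.get("name", "")).strip():
            raise ValueError("each item needs a non-empty name")
    if abstraction_preference not in ABSTRACTION_PREFERENCES:
        raise ValueError(f"unknown abstraction_preference: {abstraction_preference}")
    if dedup_strategy not in DEDUP_STRATEGIES:
        raise ValueError(f"unknown dedup_strategy: {dedup_strategy}")

    inputs = {
        "items": items,
        "abstraction_preference": abstraction_preference,
        "dedup_strategy": dedup_strategy,
    }
    # Optional sections are only included when the caller supplies them.
    if functions_context:
        inputs["functions_context"] = functions_context
    payload = {"inputs": inputs}
    if metadata:
        payload["configuration"] = {"metadata": metadata}
    return payload
```

Validating locally keeps obviously malformed requests from becoming accepted jobs that later end in a failed status.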
What comes back
The completed job result is structured for integration. Preserve citations, warnings, confidence fields, and source context when the agent returns them.
Representative output fields
- job_id
- Identifier used to poll GET /v1/jobs/{job_id}.
- status
- Job state, such as queued, running, completed, failed, or cancelled.
- agent_id
- The invoked agent, here agents.validation.taxonomy_normalizer.
- poll_url
- Relative polling URL for the job result.
- estimated_cost_usd
- Pre-execution estimate. The final charge can differ.
- created_at / completed_at
- Job lifecycle timestamps when available.
- result.normalized_items
- Normalized organism, taxon, or functional-group records.
- result.original_name_map
- Map from original submitted names to normalized output names.
- result.dedup_actions
- Audit trail of merge, keep, remove, or link decisions.
- result.hierarchy_graph
- Compact node and edge view for explaining parent-child relationships.
- result.warnings
- Ambiguity, lookup, or functional-group cautions. These are part of successful results.
- result.recommendations
- Suggested next steps for cleanup, review, or downstream use.
- result.charge_usd
- Customer-visible charge for the completed billable job when a charge applies.
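As an illustration of how result.hierarchy_graph might be consumed, the sketch below walks edges upward from one node. It assumes the node and edge shape shown in this page's example result (edges with from, to, and type keys, with IS_A pointing from child to parent) and assumes each node has at most one parent; ancestors is a hypothetical helper name.

```python
def ancestors(graph: dict, node_id: str) -> list:
    """Return ancestor ids of node_id, nearest first, by following IS_A edges."""
    # Assumption: one parent per node; if several exist, the last edge wins.
    parent_of = {
        edge["from"]: edge["to"]
        for edge in graph.get("edges", [])
        if edge.get("type") == "IS_A"
    }
    chain = []
    seen = {node_id}
    current = node_id
    while current in parent_of:
        current = parent_of[current]
        if current in seen:  # defensive guard against a cyclic graph
            break
        seen.add(current)
        chain.append(current)
    return chain
```

Because the graph is compact explanatory context, an empty ancestor list means only that no parent was returned, not that the taxon has none.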
Normalized item fields
- normalized_items[].name
- Display-ready normalized name.
- normalized_items[].taxonomic_level
- Rank or grouping label, such as species, family, or functional_group.
- normalized_items[].scientific_name
- Scientific name when resolved, otherwise null.
- normalized_items[].taxon_id
- Reference identifier when available, otherwise null.
- normalized_items[].parent
- Parent taxon or broad container when available.
- normalized_items[].children
- Child labels included when the result needs to explain hierarchy handling.
- normalized_items[].original_items_merged
- Original submitted names merged into this normalized item.
- normalized_items[].hierarchy_note
- Short explanation of hierarchy, abstraction, or normalization reasoning.
- normalized_items[].evidence_source
- Calibration label for the kind of supporting match or rule used.
- normalized_items[].evidence_url
- Optional reference URL for the match.
- normalized_items[].authority
- Optional label for the reference or rule set used to support the match.
- normalized_items[].evidence_notes
- Notes that help builders decide whether to store automatically or review first.
- normalized_items[].match_confidence
- Qualitative confidence label such as high, medium, or low. Treat it as routing guidance, not a measured probability.
- warnings[].severity
- Machine-readable routing hint such as info, review, or warning.
- warnings[].review_required
- Boolean hint for whether a reviewer should inspect the item before authoritative use.
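One way to turn these fields into a routing decision is sketched below. It is an illustrative policy, not the service's own logic: sending low match_confidence to review is an assumption, and because this page does not state whether warnings[].affected_items holds original or normalized names, the hypothetical items_needing_review helper checks both.

```python
def items_needing_review(result: dict) -> list:
    """Return names of normalized items a reviewer should inspect before storage."""
    # Collect every name mentioned by a warning that asks for review.
    flagged = set()
    for warning in result.get("warnings", []):
        if warning.get("review_required"):
            flagged.update(warning.get("affected_items", []))

    to_review = []
    for item in result.get("normalized_items", []):
        merged = set(item.get("original_items_merged", []))
        low_confidence = item.get("match_confidence") == "low"  # assumed policy
        mentioned = item["name"] in flagged or bool(merged & flagged)
        if low_confidence or mentioned:
            to_review.append(item["name"])
    return to_review
```

Items not returned by this helper could be stored automatically, with evidence_notes kept alongside them for later audit.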
How output is produced
- Authenticates the API key and confirms agents:invoke permission.
- Checks whether the account can start a billable normalization job before work begins.
- Accepts the request as an asynchronous job and returns job_id, status, agent_id, poll_url, estimated_cost_usd, and created_at.
- Reads a list of names, each optionally paired with context and source information.
- Cleans obvious naming noise and groups repeated variants before deeper review.
- Compares candidate names with a reference taxonomy when a strict taxonomic match is possible.
- Recognizes useful functional or ecological group labels that are not strict taxa.
- Detects parent-child and broad-group-member overlaps across the submitted list.
- Applies the requested abstraction preference and deduplication strategy.
- Returns normalized items, original-name mapping, decision records, hierarchy context, warnings, recommendations, and charge_usd when the completed job is billable.
Trust and caution
- This endpoint prepares names for workflows; it is not a final naming or classification authority.
- The endpoint preserves original names rather than replacing them silently.
- Decision records explain what was merged, kept, removed, or linked.
- Functional and ecological groups can be useful workflow groupings even when strict taxonomy is not appropriate.
- Common names can be ambiguous. Thin context increases the chance that a result needs review.
- A completed job with warnings can still be a good result.
- hierarchy_graph is compact explanatory context, not a full classification tree.
- Generated notes and recommendations should be reviewed before high-stakes use.
Runtime
Runtime usually scales with the number of unique names, hierarchy comparisons, and context length. Short lists often complete quickly, but callers should always build around async polling.
Cost behavior
The queued response includes estimated_cost_usd for display or logging before completion. Use charge_usd from the completed result for final customer-visible accounting.
Billing notes
- The queued response includes estimated_cost_usd as a preflight estimate.
- Completed billable jobs include charge_usd in result. Treat charge_usd as the customer-visible final charge.
- The final charge can be lower or higher than the preflight estimate depending on the work performed.
- Organization credits are deducted for billable completed work according to the account's public API pricing.
- Customer-visible billing fields describe what the caller pays; detailed execution accounting is not part of the public response.
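The relationship between the estimate and the final charge can be sketched as a small reconciliation step. The helper names final_charge and estimate_delta are hypothetical, and using Decimal to avoid float rounding in accounting is a design choice of this sketch rather than a requirement of the API.

```python
from decimal import Decimal
from typing import Optional


def final_charge(job: dict) -> Optional[Decimal]:
    """Return charge_usd from a completed job, or None when no charge is present."""
    if job.get("status") != "completed":
        return None
    charge = (job.get("result") or {}).get("charge_usd")
    # str() first so the Decimal reflects the printed JSON number, not float noise.
    return None if charge is None else Decimal(str(charge))


def estimate_delta(queued: dict, completed: dict) -> Optional[Decimal]:
    """Final charge minus preflight estimate; negative means cheaper than estimated."""
    charge = final_charge(completed)
    estimate = queued.get("estimated_cost_usd")
    if charge is None or estimate is None:
        return None
    return charge - Decimal(str(estimate))
```

With the example values on this page (an estimate of 0.12 and a charge of 0.0523152), the delta is -0.0676848, meaning the completed job cost less than the preflight estimate.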
Example request
{
  "inputs": {
    "items": [
      {
        "name": "social insects",
        "context": "collective decision-making",
        "source": "research extraction"
      },
      {
        "name": "ants",
        "context": "trail following and foraging",
        "source": "search"
      },
      {
        "name": "termites",
        "context": "mound construction",
        "source": "search"
      }
    ],
    "abstraction_preference": "specific",
    "dedup_strategy": "keep_specific",
    "functions_context": [
      "coordinate foraging",
      "build structures"
    ]
  },
  "configuration": {
    "metadata": {
      "client_run_id": "demo-taxonomy-001"
    }
  }
}
Example queued response
{
  "job_id": "abba745476294f51883ff964560b335f",
  "status": "queued",
  "agent_id": "agents.validation.taxonomy_normalizer",
  "poll_url": "/v1/jobs/abba745476294f51883ff964560b335f",
  "estimated_cost_usd": 0.12,
  "created_at": "2026-05-10T19:05:51Z"
}
Example completed job result (illustrative; the item names differ from the example request above)
{
  "job_id": "abba745476294f51883ff964560b335f",
  "agent_id": "agents.validation.taxonomy_normalizer",
  "status": "completed",
  "created_at": "2026-05-10T19:05:51Z",
  "completed_at": "2026-05-10T19:06:25Z",
  "result": {
    "normalized_items": [
      {
        "name": "ants (Formicidae)",
        "taxonomic_level": "family",
        "scientific_name": "Formicidae",
        "taxon_id": null,
        "parent": "Hymenoptera",
        "children": [],
        "original_items_merged": [
          "ant colony"
        ],
        "hierarchy_note": "Original phrase described colony-level behavior; normalized to an organism group while preserving the original wording.",
        "evidence_source": "reference_taxonomy",
        "evidence_url": null,
        "authority": "reference taxonomy",
        "evidence_notes": "Family-level match is useful for image selection and organism browsing, but keep the original phrase as context.",
        "match_confidence": "medium"
      },
      {
        "name": "mycorrhizal fungi",
        "taxonomic_level": "functional_group",
        "scientific_name": null,
        "taxon_id": null,
        "parent": "Fungi",
        "children": [],
        "original_items_merged": [
          "mycorrhizal fungi"
        ],
        "hierarchy_note": "Functional group spanning multiple lineages.",
        "evidence_source": "functional_group_rules",
        "evidence_url": null,
        "authority": "functional group handling",
        "evidence_notes": "Useful as a workflow grouping, but too broad for claims that require one taxon.",
        "match_confidence": "medium"
      }
    ],
    "original_name_map": {
      "ant colony": "ants (Formicidae)",
      "mycorrhizal fungi": "mycorrhizal fungi"
    },
    "dedup_actions": [
      {
        "action": "merged",
        "original": [
          "ant colony"
        ],
        "result": "ants (Formicidae)",
        "reason": "Input was a colony or behavior phrase rather than a clean organism label."
      }
    ],
    "hierarchy_graph": {
      "nodes": [
        {
          "id": "Formicidae",
          "label": "ants",
          "level": "family"
        },
        {
          "id": "Hymenoptera",
          "label": "Hymenoptera",
          "level": "order"
        }
      ],
      "edges": [
        {
          "from": "Formicidae",
          "to": "Hymenoptera",
          "type": "IS_A"
        }
      ]
    },
    "warnings": [
      {
        "warning_type": "functional_group_ambiguous",
        "description": "Mycorrhizal fungi is not a single taxon; it spans multiple fungal lineages and relationship types.",
        "affected_items": [
          "mycorrhizal fungi"
        ],
        "severity": "review",
        "review_required": true
      }
    ],
    "recommendations": [
      "Use the normalized organism-group label for image selection, and keep the original phrase as context.",
      "Refine broad functional groups before making claims that require taxonomic precision."
    ],
    "charge_usd": 0.0523152
  }
}
Errors and retries
Common failures
- 401: the API key is missing or invalid.
- 403: the key does not have agents:invoke scope or this agent is not available to the caller.
- 402: the account does not have enough credits to start the job.
- 429: the client exceeded the allowed request rate.
- 503: the platform could not complete preflight checks for the job.
- failed (a job status, not an HTTP code): the accepted job could not complete, often because required input was missing, input shape was invalid, normalization failed, or billing could not be completed.
- 404: the job ID is wrong, expired, or not owned by the caller.
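The HTTP codes above can be mapped to a caller-side next step. This is a sketch of one possible client policy: the action labels and the next_action helper are hypothetical, and treating 503 as safe to retry later is an assumption, since this page only says that preflight checks could not complete.

```python
# Hypothetical action labels keyed by the HTTP status codes listed above.
HTTP_ACTIONS = {
    401: "fix_credentials",        # missing or invalid API key
    402: "add_credits",            # not enough credits to start the job
    403: "check_scope_or_access",  # no agents:invoke scope, or agent unavailable
    404: "check_job_id",           # wrong, expired, or foreign job ID
    429: "wait_then_retry",        # rate limited; honor Retry-After
    503: "retry_later",            # assumption: the job was not accepted
}


def next_action(status_code: int) -> str:
    """Return a caller-side action label for an HTTP status code."""
    if 200 <= status_code < 300:
        return "continue"
    # Anything not documented above goes to a human or to logging.
    return HTTP_ACTIONS.get(status_code, "escalate")
```

A failed job status arrives in the body of a successful poll response, so it needs separate handling from these HTTP codes.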
Retry guidance
- After a job_id is returned, poll GET /v1/jobs/{job_id} with bounded exponential backoff.
- Honor Retry-After on 429 responses.
- For transient failed jobs, poll the original job_id first so you do not duplicate work that already completed.
- For ambiguity warnings, improve name, context, or functions_context instead of blind retry.
- For broad functional-group warnings, ask the user or reviewer whether a strict taxon is required.
- Do not blindly retry POST /invoke after a network timeout because the public contract does not currently include an idempotency key.
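The polling guidance above can be sketched as follows. The transport is deliberately left out: fetch_job is a caller-supplied function (hypothetical here) that performs GET /v1/jobs/{job_id} and returns the parsed JSON body. The base delay, growth factor, cap, and attempt limit are illustrative values, not documented limits, and retry_after_seconds handles only the integer-seconds form of Retry-After.

```python
import time
from typing import Callable

# Job states after which polling can stop, taken from the status list on this page.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    sleep: Callable[[float], None] = time.sleep,
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 30.0,
    max_attempts: int = 8,
) -> dict:
    """Poll until the job reaches a terminal status, with bounded exponential backoff."""
    delay = base
    for attempt in range(max_attempts):
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_attempts - 1:
            sleep(min(delay, cap))  # the cap keeps the wait bounded
            delay *= factor
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} polls")


def retry_after_seconds(headers: dict, default: float) -> float:
    """Read an integer-seconds Retry-After header, falling back to a default."""
    # Assumption: headers is a plain dict keyed with this exact capitalization.
    try:
        return float(max(0, int(headers.get("Retry-After"))))
    except (TypeError, ValueError):
        return default  # header missing, or given as an HTTP date
```

Injecting fetch_job and sleep keeps the loop independent of any HTTP client and makes it straightforward to exercise without waiting.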
Composition ideas
Normalize before saving records
Collect names from users, documents, or extraction workflows; invoke with context and source; store normalized_items, original_name_map, and dedup_actions; route review_required warnings before creating or merging records.
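Before storing results, a provenance check along these lines may help. It assumes that every submitted name is expected to appear as a key in original_name_map, which this page implies but does not guarantee; missing_provenance is a hypothetical helper.

```python
def missing_provenance(submitted_names: list, result: dict) -> list:
    """Return submitted names that have no entry in result.original_name_map."""
    name_map = result.get("original_name_map") or {}
    # Order is preserved so the output lines up with the caller's own list.
    return [name for name in submitted_names if name not in name_map]
```

A non-empty return value is a reasonable signal to hold the batch for review rather than create or merge records.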
Normalize before image or search workflows
Send behavior phrases and organism names together, then use normalized_items[].scientific_name or normalized_items[].name for downstream lookup while keeping original_items_merged for captions and audit trails.
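A sketch of that lookup choice: prefer scientific_name when it was resolved, fall back to name, and keep the original wording for captions. The helper names and the caption format are hypothetical.

```python
def lookup_label(item: dict) -> str:
    """Prefer the resolved scientific name; fall back to the display name."""
    return item.get("scientific_name") or item["name"]


def caption(item: dict) -> str:
    """Build a caption that keeps the originally submitted wording visible."""
    label = lookup_label(item)
    originals = item.get("original_items_merged") or []
    # Skip originals identical to the normalized name; they add no provenance.
    others = [name for name in originals if name != item["name"]]
    if others:
        return f"{label} (submitted as: {', '.join(others)})"
    return label
```

For the example result on this page, the ants item yields the lookup label Formicidae, while the mycorrhizal fungi item falls back to its display name because scientific_name is null.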
Human-in-the-loop taxonomy review
Use warnings[].severity and warnings[].review_required for queue priority, show dedup_actions as reviewer-readable decision history, and persist final reviewer decisions beside the API result.
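Queue priority could be derived as in the sketch below. The relative ordering of the severity values (warning before review before info) is an assumption, since this page lists the values without ranking them; review_queue is a hypothetical helper.

```python
# Assumed ranking: lower number means higher priority in the review queue.
SEVERITY_RANK = {"warning": 0, "review": 1, "info": 2}


def review_queue(warnings: list) -> list:
    """Sort warnings so review_required items come first, then by severity."""
    def priority(warning: dict) -> tuple:
        needs_review = 0 if warning.get("review_required") else 1
        # Unknown severities sort after all known ones.
        rank = SEVERITY_RANK.get(warning.get("severity"), len(SEVERITY_RANK))
        return (needs_review, rank)

    # sorted() is stable, so ties keep the order the API returned.
    return sorted(warnings, key=priority)
```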
Classroom or training activity
Ask learners to submit common names, strict taxa, and functional groups, then compare hierarchy_graph, dedup_actions, warnings, and different abstraction settings.
Application fit
- Organism browser cleanup before record creation.
- Research curation queues for literature-extracted organism lists.
- Education tools that show why broad groups and specific organisms should be linked, separated, or reviewed.
- Biomimicry and design exploration workspaces that need clean biological examples before search or media steps.
- Media and image workflows that need imageable organism labels instead of behavior phrases.
- Knowledge-base ingestion with original-name provenance.
- Data-quality dashboards for imported organism lists.
- AI assistants that need a structured normalization step before saving biological examples.
- Evaluation workflows that check generated organism lists for broad categories, duplicates, and missing provenance.