A standardized visual protocol for freshwater skin disease in bottlenose dolphins: Introducing the Dolphin FRESH assessment

Table of Contents

  1. Key Highlights
  2. Introduction
  3. Why a photographic protocol was necessary
  4. How FRESH was developed: multidisciplinary design and evidence base
  5. The core of FRESH: screening plus a scoring rubric
  6. Primary indicators: what to look for and why they matter
  7. How the Scoring Rubric converts observations into metrics
  8. Practical guidance for photo selection and pre-scoring
  9. Testing and validation: performance of the rubric
  10. Challenges identified and stakeholder priorities
  11. Conservation and management applications
  12. Training, implementation, and technological integration
  13. Limitations and appropriate interpretation
  14. Next steps for research and application
  15. Case studies and real-world examples illustrating FRESH utility
  16. Recommendations for researchers and managers adopting FRESH
  17. Governance and data sharing considerations
  18. FRESH as a platform for collaborative research
  19. FAQ

Key Highlights

  • A multidisciplinary team developed the Dolphin FRESH (Freshwater-Related Evaluation of Skin Health) Protocol to standardize photographic screening and semi-quantitative scoring of freshwater-associated skin lesions in free-ranging bottlenose dolphins.
  • FRESH focuses on three primary, visually identifiable indicators—Overgrowth, Target-like Lesions, and Light Discoloration—paired with a scoring rubric that produces an FSD Severity Score for comparative analysis and monitoring.
  • Systematic testing showed high agreement between field biologists and medical experts, enabling application across photo-ID datasets to monitor lesion progression, investigate exposure histories, and support management decisions.

Introduction

Freshwater exposure is increasingly affecting coastal bottlenose dolphins as extreme precipitation events and coastal engineering alter estuarine salinity regimes. When dolphins encounter prolonged or sudden reductions in salinity, physiological stress manifests in a distinctive suite of clinical features, including electrolyte disturbances, organ edema, and characteristic skin lesions sometimes culminating in secondary infection or death. Field researchers have documented such freshwater-related health effects across multiple regions—ranging from the northern Gulf of Mexico to Australia—and managers need reliable, consistent methods to detect, categorize, and track those effects in living populations without invasive sampling.

The Dolphin FRESH Protocol responds to that need. Designed by veterinarians, pathologists, epidemiologists, resource managers, and experienced field biologists, FRESH translates clinical understanding of freshwater skin disease (FSD) into a practical photographic scoring system. The protocol enables non-medical observers with photo-ID experience to screen images for primary freshwater indicators, apply a structured scoring rubric, and generate standardized metrics to compare cases across time, space, and study programs. This article explains how FRESH was built, what it measures, how it performed in testing, and why the tool matters for conservation and management as coastal environments change.

Why a photographic protocol was necessary

Research on freshwater impacts to dolphins has historically relied on a mix of strandings, capture-release health assessments, and opportunistic out-of-habitat (OOH) cases. These sources provide critical clinical detail but are inherently limited: few individuals undergo comprehensive pre-exposure health monitoring, and many free-swimming animals are only partially visible at the surface. Studies that attempt visual-only assessments of skin lesions have used varied methods and inconsistent terminology, making cross-study comparison difficult.

Field biologists and clinicians often speak different diagnostic languages. Clinicians use terminology tied to pathophysiology and prognostic significance; field researchers use practical visual descriptors. The result has been inconsistent use of terms such as “severity” and ambiguity when comparing photos from different programs. A standardized photographic protocol bridges that gap. It provides a shared vocabulary, prioritizes clinically relevant visual indicators, and offers a semi-quantitative approach that both groups can apply to large historical and contemporary photo-ID archives.

How FRESH was developed: multidisciplinary design and evidence base

Design principles

  • Build directly on clinical descriptions of freshwater effects in dolphins and on efforts to standardize epidermal lesion terminology.
  • Limit the protocol to features reliably visible in photographs and informative about freshwater exposure.
  • Make it accessible to non-clinicians with sufficient photographic identification experience.
  • Ensure the tool can be applied to single time points and repeated across sightings to track progression or recovery.

Development process

A team of 13 marine mammal biologists, six medical experts (including veterinarians and pathologists), and five natural resource managers collaborated through a three-stage process: pre-workshop pilot assessments, a three-day hybrid workshop, and post-workshop testing and refinement. Reference images used for the protocol included cases accompanied by clinical data (necropsy, in-hand assessments) and field observations from documented freshwater exposure events. Those reference cases covered multiple geographies and prior analyses, ensuring the rubric reflected real-world variability.

Critical validation during the pilot stage showed medical experts could identify FSD from photos with substantial agreement: an 89% concordance after excluding low-confidence cases and a Watson’s kappa of 0.78 (95% CI 0.58–0.91). That result justified expanding the approach into a formal protocol.
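The two agreement figures quoted above, percent concordance and a kappa statistic, can be illustrated with a short sketch. It assumes the simplest two-rater (Cohen-style) form of kappa, which corrects observed agreement for agreement expected by chance. The statistic reported for the pilot may have been computed differently, for example across more than two raters, and the labels below are invented for illustration rather than taken from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_concordance(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of cases on which two raters gave the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("raters must label the same non-empty set of cases")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Two-rater kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_concordance(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label independently.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two raters on eight photo cases (not study data).
rater_a = ["FSD", "FSD", "FSD", "not", "not", "not", "FSD", "not"]
rater_b = ["FSD", "FSD", "not", "not", "not", "not", "FSD", "FSD"]

print(percent_concordance(rater_a, rater_b))  # 0.75
print(cohens_kappa(rater_a, rater_b))         # 0.5
```

In this toy example the raters agree on 75% of cases, but half of that agreement would be expected by chance, so kappa is 0.5.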

The core of FRESH: screening plus a scoring rubric

Two-part structure

FRESH consists of:

  1. A Screening Decision Tree that guides users through photo-quality checks and a preliminary determination of whether any observed anomalies match primary FSD indicators.
  2. A Scoring Rubric that captures presence, counts, and coverage of defined indicators and produces a numeric FSD Severity Score.

Screening decision rules

Photo-series selection is essential. Each “case” represents one individual at a single time point and should include multiple images from different angles when available. Images must be of sufficient quality for confident evaluation: focused, well-lit, and showing at least 10% of the dolphin’s body above water. The screening tree directs users to classify cases as:

  • Negative (no skin anomalies),
  • Negative for FSD but potentially another lesion type (refer to an alternate lesion matrix),
  • Suspect (low confidence on a potential primary indicator),
  • Complete Scoring Rubric (medium-to-high confidence in at least one primary indicator).

Users are instructed to err on the side of conservatism: mark a case “suspect” rather than over-commit if confidence is low.
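The screening logic described above can be sketched as a small decision function. This is an illustrative reading of the four outcomes, not the protocol's official decision tree. The class and field names are hypothetical, and the extra "insufficient quality" outcome is an assumption added to represent cases that fail the photo-quality check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class ScreeningOutcome(Enum):
    INSUFFICIENT_QUALITY = "photo series fails quality check (assumed outcome)"
    NEGATIVE = "negative (no skin anomalies)"
    NEGATIVE_OTHER_LESION = "negative for FSD; refer to alternate lesion matrix"
    SUSPECT = "suspect"
    COMPLETE_RUBRIC = "complete Scoring Rubric"


@dataclass
class CaseScreening:
    """Hypothetical summary of one case (one individual, one time point)."""
    meets_photo_quality: bool
    any_skin_anomaly: bool
    # Confidence that an anomaly matches a primary indicator;
    # None means no anomaly resembles a primary indicator.
    primary_indicator_confidence: Optional[Confidence] = None


def screen_case(case: CaseScreening) -> ScreeningOutcome:
    if not case.meets_photo_quality:
        return ScreeningOutcome.INSUFFICIENT_QUALITY
    if not case.any_skin_anomaly:
        return ScreeningOutcome.NEGATIVE
    if case.primary_indicator_confidence is None:
        return ScreeningOutcome.NEGATIVE_OTHER_LESION
    if case.primary_indicator_confidence is Confidence.LOW:
        # Conservative rule: low confidence is "suspect", never scored.
        return ScreeningOutcome.SUSPECT
    return ScreeningOutcome.COMPLETE_RUBRIC


example = CaseScreening(True, True, Confidence.LOW)
print(screen_case(example).value)  # suspect
```

Keeping low confidence as its own branch mirrors the instruction to mark a case "suspect" rather than over-commit.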

Primary indicators: what to look for and why they matter

FRESH focuses on three primary visual indicators. Each indicator is tied to observed clinical changes associated with freshwater exposure and chosen for consistent recognition in photographs.

  1. Overgrowth (films and mats)
  • Appearance: Thin, flat films or thick, raised mats of coloration—brown, tan, green, yellow, orange, red—or similar hues consistent with algal, fungal, or other epibiotic growth.
  • Clinical relevance: Overgrowth signals altered skin surface condition. In freshwater exposure, impaired epidermal integrity and shifts in microbial or algal colonization can produce conspicuous mats or films. Overgrowth may hide other lesion types and contributes to overall dermal stress.
  2. Target-like Lesions
  • Appearance: A central area of discoloration surrounded by concentric rings or scalloped borders; may be smooth, rough, eroded, or ulcerated. Targets may coalesce into larger patterns.
  • Clinical relevance: Ringed or target-shaped lesions reflect localized epidermal degeneration that progresses outward. Ulceration suggests advanced tissue loss and risks secondary infection. Targets overlap visually with some poxvirus presentations but differ in border pigmentation and other features; the rubric includes flags for potential differentials.
  3. Light Discoloration (pallor)
  • Appearance: Skin lighter than the typical shade of gray for the population—ranging from light gray to white—either as uniform pallor with indistinct margins or as distinct multifocal patches that may coalesce.
  • Clinical relevance: Pallor reflects loss of pigment or thinning/degeneration of epidermis and is commonly reported in freshwater-exposed dolphins. Uniform pallor can signal widespread osmotic stress affecting broad skin areas.

Supplementary observations

  • Rough, irregular texture or dark discoloration not clearly part of the three primary indicators.
  • Eye condition (cloudiness or squinting) to screen for corneal edema or ocular discomfort associated with freshwater effects.
  • Nodules, tattoo-like lesions, and non-target ulcerations: these trigger flags for further review because they may indicate alternative diagnoses (poxvirus, bacterial or fungal infections, trauma).

How the Scoring Rubric converts observations into metrics

Question flow and scoring

The rubric follows a specific order to prioritize features that can obscure others (for example, mats layering over discoloration). Users answer descriptive questions that capture:

  • Presence/absence of each primary indicator.
  • Counts of lesions in defined size classes (<2 cm, 2–10 cm, >10 cm).
  • Percentage coverage estimates within broad bands (1–10%, 11–49%, 50–89%, 90–100% for mats; slightly different bands for films and other indicators).
  • Texture or severity descriptors (smooth versus rough/eroded/ulcerated) and whether lesions are localized to an injury.

Weighted scores are assigned to specific metrics (for instance, higher points for larger coverage bands) and summed into:

  • Sub-scores for each primary indicator.
  • A total FSD Severity Score for the case.

Severity categories

Total scores are binned into three severity categories:

  • Low: 1–5
  • Medium: 6–12
  • High: >12
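The path from sub-scores to a severity category can be sketched as follows. The bin boundaries (Low 1–5, Medium 6–12, High >12) come from the rubric as described here. The per-band point values and the example sub-scores are hypothetical placeholders, because the actual weights are not listed in this article. Returning no category for a total below 1 is also an assumption.

```python
from typing import Dict, Optional

# Severity bins as stated in the rubric: Low 1-5, Medium 6-12, High >12.
LOW_MAX = 5
MEDIUM_MAX = 12

# HYPOTHETICAL weights: points per coverage band for mats.
# The real rubric weights are not reproduced here.
HYPOTHETICAL_MAT_COVERAGE_POINTS: Dict[str, int] = {
    "1-10%": 1,
    "11-49%": 2,
    "50-89%": 3,
    "90-100%": 4,
}


def total_severity_score(sub_scores: Dict[str, int]) -> int:
    """Sum per-indicator sub-scores into the total FSD Severity Score."""
    return sum(sub_scores.values())


def severity_category(total: int) -> Optional[str]:
    """Bin a total score; totals below 1 fall outside the stated bins."""
    if total < 1:
        return None  # assumption: no category when nothing was scored
    if total <= LOW_MAX:
        return "Low"
    if total <= MEDIUM_MAX:
        return "Medium"
    return "High"


sub_scores = {
    "Overgrowth": HYPOTHETICAL_MAT_COVERAGE_POINTS["11-49%"],  # 2 points
    "Target-like Lesions": 3,   # hypothetical sub-score
    "Light Discoloration": 2,   # hypothetical sub-score
}
total = total_severity_score(sub_scores)
print(total, severity_category(total))  # 7 Medium
```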

“Cannot be Determined” option

For many metrics, users can select “Cannot be Determined” (CBD). Choosing CBD preserves the evaluator’s uncertainty without artificially altering the numeric score, and flags that particular metric as indeterminate.
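One way to represent CBD in a scoring form is to store it as a missing value that adds nothing to the sum but is reported alongside the score. The sketch below assumes that interpretation. The metric names and point values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# None stands in for "Cannot be Determined" (CBD) in this sketch.
CBD = None


def sum_with_cbd(metric_points: Dict[str, Optional[int]]) -> Tuple[int, List[str]]:
    """Return (score, indeterminate_metrics).

    CBD metrics contribute no points and are listed separately so the
    evaluator's uncertainty stays visible next to the numeric score.
    """
    score = 0
    indeterminate: List[str] = []
    for name, points in metric_points.items():
        if points is None:
            indeterminate.append(name)
        else:
            score += points
    return score, indeterminate


# Hypothetical metrics for one case.
answers = {
    "mat_coverage": 2,
    "film_coverage": CBD,   # glare prevented a coverage estimate
    "target_count_small": 1,
}
print(sum_with_cbd(answers))  # (3, ['film_coverage'])
```

Reporting the list of indeterminate metrics with the score lets later analyses decide whether to include or exclude partially scored cases.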

Flags for differential diagnoses

Several questions do not score numerically but trigger flags for additional review. For example, confident identification of nodules or non-target ulceration prompts clinical review because these features suggest bacterial/fungal infection or trauma rather than primary FSD.

Practical guidance for photo selection and pre-scoring

Who should use FRESH

Target users are researchers with moderate-to-extensive photo-ID experience for their study population. They must be able to recognize typical skin coloration, identify photo artifacts (glare, water droplets, shadows), and discern lesions from image quality issues. Medical training is not required.

Photo selection

  • Gather all photos of the individual at the same sighting/time point.
  • Favor continuous high-speed shooting modes and full-body coverage rather than only dorsal-fin-focused shots typical of many photo-ID surveys.
  • Ensure photos permit evaluation of fine skin details and that the series collectively shows at least 10% of the dolphin’s body above water.

Establish local quality criteria

Each program should define and consistently apply photo-quality thresholds, based on confidence in skin detail visibility and the number of acceptable images per case, to avoid bias across time and between evaluators.
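Writing local thresholds down as an explicit, fixed configuration is one way to help apply them consistently. The sketch below is an assumption about how a program might encode its criteria. Only the 10% body-visibility minimum comes from the protocol as described here. The minimum image count and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QualityCriteria:
    # 10% minimum body visibility is stated in the protocol description.
    min_body_fraction_visible: float = 0.10
    # HYPOTHETICAL program-specific choice.
    min_acceptable_images: int = 2


@dataclass
class Photo:
    in_focus: bool
    well_lit: bool


def case_is_scorable(
    photos: List[Photo],
    body_fraction_visible: float,
    criteria: QualityCriteria = QualityCriteria(),
) -> bool:
    """True if the photo series for one case meets the program's criteria."""
    acceptable = [p for p in photos if p.in_focus and p.well_lit]
    return (
        len(acceptable) >= criteria.min_acceptable_images
        and body_fraction_visible >= criteria.min_body_fraction_visible
    )


series = [Photo(True, True), Photo(True, False), Photo(True, True)]
print(case_is_scorable(series, body_fraction_visible=0.25))  # True
print(case_is_scorable(series, body_fraction_visible=0.05))  # False
```

Freezing the criteria object discourages ad hoc changes between scoring sessions, which is the bias the guidance warns against.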

Testing and validation: performance of the rubric

Initial testing design

A preliminary rubric underwent systematic testing on 16 cases. Evaluators included 16 field biologists with photo-ID experience and three medical experts (one pathologist and two veterinarians). All test cases had independent supporting evidence for the presence or absence of FSD; evaluators worked only from photographs, with supporting records withheld.

Key performance results

  • Primary indicator recognition: Field biologists and medical experts identified primary indicators and applied the rubric in all FSD cases—no freshwater cases were missed.
  • Agreement metrics: After excluding low-confidence cases, medical expert photo-only identification achieved substantial agreement in the pilot (Watson’s kappa = 0.78).
  • Score comparisons: No significant difference emerged between field biologists and medical experts in resultant FSD Severity Scores (Wilcoxon Signed-Rank Test: n = 16, V = 40, p = 0.159).
  • Experience effect: Level of evaluator experience (photo-ID, stranding response, health assessment) did not predict correct identification of FSD indicators in logistic regressions (p-values between 0.239 and 1).
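For readers unfamiliar with the score comparison above, the V statistic of a Wilcoxon signed-rank test is the sum of the ranks of the positive paired differences, with zero differences dropped and tied absolute differences given average ranks. The sketch below computes V only and does not compute a p-value. The paired scores are invented for illustration and are not the study data.

```python
from typing import List, Sequence


def wilcoxon_v(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of ranks of positive differences (x - y) for paired samples."""
    if len(x) != len(y):
        raise ValueError("samples must be paired")
    # Drop pairs with no difference, as in the standard procedure.
    diffs: List[float] = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


# Hypothetical paired severity scores for four cases (not study data).
biologist_scores = [5, 7, 3, 9]
expert_scores = [4, 9, 3, 6]
print(wilcoxon_v(biologist_scores, expert_scores))  # 4.0
```

Here the differences are 1, -2, 0, and 3. The zero is dropped, the remaining absolute differences rank 1, 2, and 3, and the positive differences hold ranks 1 and 3, giving V = 4.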

Iterative improvements after testing

Testing revealed common sources of inconsistency:

  • Differentiating “eroded” versus “ulcerated” in target-like lesions. The rubric now treats non-smooth targets the same for scoring and flags confident ulceration for specialist review.
  • Estimating percent coverage when indicators overlap. The question order and presentation of coverage metrics were revised to reduce ambiguity.
  • Indicator classification ambiguity (some evaluators scored a lesion as Light Discoloration while others scored it as Target-like). Weighting adjustments ensure total scores are comparable regardless of which indicator is chosen when ambiguous.

A second-round test on the updated rubric with five biologists produced consistent results and exposed no major remaining issues. The developers recommend applying multiple evaluators to the same challenging case when practical to reconcile divergent scores.

Challenges identified and stakeholder priorities

Stakeholder engagement

A stakeholder session involved 14 biologists and managers from 10 organizations across the U.S. Gulf and Atlantic coasts. Participants prioritized implementation challenges and suggested actionable solutions.

Top challenges and response directions

  1. When to evaluate a lesion using the FSD assessment protocol
  • Stakeholders asked for clear thresholds and examples that delineate when an observed anomaly warrants FRESH scoring versus referral to other lesion-matrix tools.
  2. Training people to reliably use the protocol
  • Participants emphasized the need for accessible, interactive training, including practice datasets and mechanisms to test evaluator reliability.
  3. Photo quality, tagging, sorting, and management criteria
  • Standardized best-practice guidance for photographic acquisition and digital asset management is critical for consistent application.
  4. Maintaining a database linking scores, sightings, and environmental data
  • Centralized databases or integration with existing photo-ID systems would allow longitudinal tracking of individuals and environmental correlates.

Stakeholder wishlist

Recommendations from this session formed a “wishlist” including: a written manual with example images, standardized electronic forms with integrated instructions, training videos and datasets, “train the trainer” workshops, FinBase integration for linking lesion scores to sighting data, a centralized curated repository for lesion scoring, and pursuit of AI tools to aid photo-quality assessment, lesion detection, and multi-image 3D reconstructions.

Conservation and management applications

Population monitoring and baseline establishment

FRESH enables consistent cross-time comparisons to establish baseline prevalence of FSD indicators within populations. Baselines are necessary to detect unusual mortality events (UMEs) or spikes in lesion prevalence that may indicate environmental perturbation.
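A baseline of this kind could be computed as the fraction of screened individuals per period that show at least one primary indicator. The sketch below is one simple way to do that. The record layout, identifiers, and values are hypothetical, and a real analysis would also need to account for survey effort and photo quality.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Each record: (period, individual_id, has_primary_indicator)
CaseRecord = Tuple[str, str, bool]


def prevalence_by_period(cases: Iterable[CaseRecord]) -> Dict[str, float]:
    """Share of screened individuals per period with any primary indicator."""
    screened: Dict[str, Set[str]] = defaultdict(set)
    positive: Dict[str, Set[str]] = defaultdict(set)
    for period, individual_id, has_indicator in cases:
        screened[period].add(individual_id)
        if has_indicator:
            positive[period].add(individual_id)
    # Count each individual once per period, even if sighted repeatedly.
    return {p: len(positive[p]) / len(screened[p]) for p in screened}


# Hypothetical screening results (not real data).
records = [
    ("2016", "A", False),
    ("2016", "B", False),
    ("2017", "A", True),
    ("2017", "B", False),
    ("2017", "A", True),   # repeat sighting of the same individual
    ("2017", "C", True),
]
result = prevalence_by_period(records)
print(result["2016"], round(result["2017"], 3))  # 0.0 0.667
```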

Example: Galveston Bay application

The Galveston Bay Dolphin Research Program is already applying FRESH to photos collected between 2013 and 2020 to examine relationships between lesion scores and salinity changes within dolphins’ estimated core ranges. These analyses will help define exposure–response patterns and identify individuals that either develop or resist FSD indicators after freshwater events.

Triage, response, and stranding interpretation

Although FRESH does not provide a prognosis for individual dolphins, severity scores can inform triage decisions in live-stranding scenarios or response planning. For dead or OOH dolphins, FRESH scores may supplement necropsy findings to help determine whether freshwater exposure contributed to mortality.

Infrastructure and management decisions

FRESH-derived insights can inform environmental impact assessments for coastal engineering projects (e.g., diversion channels, coastal barriers) that change estuarine salinity regimes. Managers can use FRESH data to project potential health impacts on dolphin populations and incorporate mitigation measures into planning.

Preparing for UMEs

The 2019 Northern Gulf of Mexico freshwater UME underscored the need for operational tools to detect and monitor freshwater-related health effects. Routine application of FRESH across programs will improve readiness, allow early detection of concerning trends, and support coordinated responses.

Training, implementation, and technological integration

Training needs

Current online training is limited to a written guide with example images. Stakeholders identified the need for an interactive training platform that allows potential users to test their skills against validated cases. Recommended training components:

  • Written manual with clear definitions and annotated photo examples.
  • A standardized form or electronic survey (e.g., ArcGIS Survey123 implementation) with integrated instructions and tooltips.
  • A curated practice dataset with known outcomes for reliability testing.
  • Webinars and regional “train the trainer” sessions to build local capacity.

Integration with photo-ID systems

FRESH would be most effective if integrated into popular photo-ID databases such as FinBase. Integration would permit automated linking of lesion scores to sighting metadata, environmental measurements, and individual life histories.

Artificial intelligence prospects and caveats

Machine learning offers significant potential to accelerate lesion detection and preliminary scoring, but algorithm development faces well-known pitfalls:

  • Model performance strongly depends on image quality, sample size, and labeling consistency.
  • Clinical AI suffers when training datasets contain non-standard or unverified labels, creating bias and reducing generalizability.

FRESH provides standardized definitions and an expert-labeled dataset foundation that could serve as a trustworthy training corpus for automated detection tools. Planned AI objectives include automated photo-quality assessment, lesion coverage estimation, lesion detection across image series, and combining multiple images into a more complete representation of an individual. Any AI deployment should be accompanied by human oversight, transparency in training labels, and validation across diverse populations.

Limitations and appropriate interpretation

FRESH measures visual indicators associated with freshwater exposure; it does not diagnose systemic health outcomes or predict survival. The protocol intentionally avoids assigning prognostic labels; the “severity” categories refer strictly to the visual extent and characteristics of skin indicators captured by the scoring rubric.

Differential diagnoses remain a real-world challenge. Some lesion types caused by poxvirus, bacterial or fungal infections, or trauma can mimic FSD. The rubric includes flags to identify suspicious features that warrant clinical review or alternate lesion-matrix assessment. Over time, as FRESH is applied alongside clinical and environmental data, correlations between specific scores and physiological outcomes will permit refinement of weightings and predictive value.

Population- and site-specific context matters. Baseline skin coloration and lesion prevalence vary among populations. Users must interpret FRESH scores in light of local norms, life-history data, and known exposure histories when available. Repeated application to the same individual across time points offers the most informative insights into progression and recovery.
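Repeated scoring of the same individual lends itself to a simple progression summary: the change in total FSD Severity Score between consecutive sightings. The sketch below assumes scores have already been assigned and linked to an individual identifier and a date. The identifiers, dates, and scores are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

# Each sighting: (individual_id, sighting_date, total_fsd_severity_score)
Sighting = Tuple[str, date, int]


def score_changes(sightings: List[Sighting]) -> Dict[str, List[int]]:
    """Change in total score between consecutive sightings, per individual.

    Positive values suggest progression and negative values suggest recovery
    of the visible indicators; neither is a clinical prognosis.
    """
    by_individual: Dict[str, List[Tuple[date, int]]] = {}
    for individual_id, when, score in sightings:
        by_individual.setdefault(individual_id, []).append((when, score))
    changes: Dict[str, List[int]] = {}
    for individual_id, records in by_individual.items():
        records.sort()  # chronological order
        changes[individual_id] = [
            later[1] - earlier[1] for earlier, later in zip(records, records[1:])
        ]
    return changes


# Hypothetical scored sightings (not real data).
history = [
    ("D1", date(2017, 9, 10), 9),
    ("D1", date(2017, 8, 1), 2),
    ("D1", date(2018, 1, 5), 4),
    ("D2", date(2017, 9, 12), 0),
]
print(score_changes(history))  # {'D1': [7, -5], 'D2': []}
```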

Next steps for research and application

  • Expand training resources: create interactive online modules, annotated practice datasets, and regional training workshops to build consistent scoring capacity.
  • Integrate FRESH into photo-ID databases (FinBase or equivalents) to link lesion scores with sighting, demographic, and environmental data.
  • Curate a centralized, expert-validated repository of FRESH-scored cases to support algorithm development and comparative research.
  • Apply FRESH to long-term photo archives in regions with known freshwater events (e.g., Galveston Bay, Pensacola Bay, Lake Pontchartrain) to quantify prevalence trends and potential links to salinity fluctuations.
  • Pair FRESH scoring with synchronous clinical sampling and necropsy data where available to calibrate indicator weightings and evaluate links between visual severity and physiological outcomes.
  • Explore AI-assisted workflows for image triage and lesion detection while ensuring transparency and validation of training labels.

Case studies and real-world examples illustrating FRESH utility

Galveston Bay, Texas

Upper Galveston Bay dolphins experienced notable skin changes following Hurricane Harvey. Applying FRESH to an archive of 2013–2020 photographs allows researchers to relate lesion presence and progression to measured salinity changes across core ranges. The study population’s high site fidelity offers an opportunity to track individuals across repeated exposures and quantify recovery dynamics.

Pensacola Bay flood events

After a record-breaking flood in Pensacola Bay, researchers documented increases in skin lesions and elevated mortality rates. Retrospective scoring of pre- and post-flood photographs with FRESH can help quantify changes in lesion prevalence and severity and support causal inference when paired with environmental time series.

Northern Gulf of Mexico UME (2019)

The 2019 freshwater unusual mortality event highlighted the vulnerability of coastal dolphins to large-scale salinity disturbances. FRESH provides a framework to screen archival images across the northern Gulf to identify areas and cohorts at increased risk during similar future events, improving surveillance and response planning.

Out-of-habitat and relocation cases

OOH dolphins found in rivers or brackish systems present clinical challenges. FRESH applied to photographic documentation of these cases can supplement in-hand findings and guide rehabilitation or relocation decisions when clinical resources are limited.

Australian case comparisons

Freshwater skin disease has been described in both hemispheres. Australian case definitions and pathology-based criteria informed FRESH’s clinical grounding. Applying the protocol across international datasets supports comparative pathology, helps identify population-specific manifestations, and refines the global understanding of freshwater exposure impacts.

Recommendations for researchers and managers adopting FRESH

  • Ensure photo quality: adopt boat-based photo protocols emphasizing full-body coverage and continuous shooting; train photographers to prioritize skin detail capture when feasible.
  • Standardize case selection: define program-specific thresholds for photo inclusion (number of images, percent body visible, minimum image clarity) and apply those consistently across time.
  • Use multiple evaluators for ambiguous or high-stakes cases: consensus scoring improves reliability.
  • Retain raw images and metadata: maintain a link between lesion scores and sighting metadata (time, location, environmental measures) to enable rigorous analyses.
  • Flag and refer atypical presentations: when lesions trigger differential flags (e.g., nodules, ulcerations absent other FSD indicators), seek pathology consultation.
  • Treat FRESH scores as part of a broader evidence set: combine photographic scoring with environmental monitoring, movement data, and clinical information when available.

Governance and data sharing considerations

A coordinated approach to data curation amplifies FRESH’s value. Programs should consider contributing scored datasets to a curated repository managed by a central curator or consortium. Such a repository would standardize forms, provide vetted training sets, and underpin AI development while preserving necessary data-use agreements and permitting institutional ownership of contributed images.

Privacy and permitting

Many photographic datasets are generated under Marine Mammal Protection Act permits or institutional agreements. Data sharing must respect permit terms and contributor conditions. Programs should document permissions and create clear data-request procedures.

Quality assurance

A central curator can perform quality checks, facilitate cross-program scoring comparisons, and maintain a validated set of training cases. Regular inter-laboratory calibration exercises and blind scoring trials will maintain scoring consistency as the user base grows.

FRESH as a platform for collaborative research

FRESH was built collaboratively and should continue to evolve through multi-institutional use. Priorities include:

  • Expanding the evidence base linking visual indicators to clinical outcomes.
  • Refining scoring weightings based on larger datasets with known health endpoints.
  • Developing interoperable data standards to link lesion data with oceanographic and meteorological datasets for exposure-response modeling.

Pooling scored photo datasets across regions will permit meta-analyses that can establish thresholds for concern, identify vulnerable subpopulations (by age, sex, or site fidelity), and inform regional management actions tied to projected salinity changes.

FAQ

Q: What exactly does FRESH measure? A: FRESH standardizes the visual detection and semi-quantitative scoring of three primary skin indicators associated with freshwater exposure—Overgrowth, Target-like Lesions, and Light Discoloration—using a structured rubric that produces sub-scores and a total FSD Severity Score for an individual at a single time point.

Q: Can FRESH diagnose freshwater skin disease or predict survival? A: No. FRESH provides standardized visual metrics of skin indicators. The severity categories reflect the extent of visible indicators, not a clinical prognosis. As FRESH is applied to datasets paired with clinical outcomes, links between scores and health will be evaluated and the rubric refined accordingly.

Q: Who can apply the protocol? A: Researchers with moderate-to-extensive photo-ID experience for the population under study are the primary users. Medical training is not required, but familiarity with photographic artifacts and local skin-color baselines is important. Training resources and practice datasets are recommended before independent application.

Q: What photo quality is required? A: Each case should include multiple images of an individual at a single time point, with the series collectively showing at least 10% of the body above water. Images need to be in-focus, well-lit, and permit discernment of fine skin details. Programs should establish and consistently apply photo-quality criteria.

Q: How does FRESH handle ambiguous lesions that resemble other diseases? A: The rubric includes flags for features suggestive of alternative etiologies (e.g., nodules, tattoo-like lesions, ulcerations without other FSD indicators). When flagged, cases should be referred for specialist review or evaluated using standardized lesion matrices that target other conditions.

Q: Can FRESH be automated with AI? A: FRESH provides labeled definitions and a structured dataset ideal for training algorithms, but AI models require careful curation and large, consistent training sets. Automated tools can support photo triage and lesion detection, but human oversight and transparency in training labels are essential to avoid bias.

Q: How should FRESH scores be stored and linked to other data? A: Scores should be retained with the original images and associated sighting metadata (time, GPS location, environmental data, and individual identity). Integration into photo-ID databases like FinBase is recommended to facilitate longitudinal analyses and linkage to movement and exposure data.

Q: How can programs ensure scoring consistency across evaluators and sites? A: Develop local training exercises using a shared practice dataset; perform periodic inter-evaluator comparisons; use multiple independent scorers for ambiguous cases and discuss discrepancies; and participate in centralized calibration exercises if a curated repository is established.

Q: What are the next steps for improving FRESH? A: Expand interactive training materials and practice datasets, integrate the rubric into photo-ID systems, develop a curated repository of scored images, pursue validated AI assistance tools, and pair FRESH scoring with clinical and environmental datasets to calibrate severity weightings against health outcomes.

Q: Where can I access the FRESH protocol and training materials? A: The FRESH protocol, scoring rubric, and supporting materials including an interactive ArcGIS Survey123 form and templates are available through the project’s public resources—programs should consult the FRESH online portal for the latest documents and training datasets.

Q: Who should I contact for permission to use reference images or to contribute data? A: Data and photograph contributors are listed under the protocol’s acknowledgments and data availability statements. For access to restricted materials used in development, contact the corresponding author or the program lead responsible for curation as specified in the protocol documentation.


FRESH offers a standardized, practical approach to visually assessing freshwater-related skin lesions in bottlenose dolphins. When applied across photo-ID archives and in tandem with environmental and clinical data, the protocol will allow researchers and managers to quantify lesion prevalence, monitor trends, evaluate exposure–response relationships, and support evidence-based conservation decisions in coastal systems experiencing changing salinity regimes.