Navigating the Data Deluge: Practical AI Applications in Pathology You Can Use Now!
Pathologists and laboratorians are increasingly overwhelmed by the sheer volume, complexity, and rapid evolution of diagnostic data. From lengthy pathology reports to intricate genomic profiles, extracting actionable insights is a significant and growing challenge. This report provides a practical solution accessible today: harnessing the power of large language models (LLMs) through effective prompt engineering. We'll focus on concrete steps you can take now to improve your workflow and diagnostic accuracy, rather than theoretical future advancements. While AI-powered laboratories and retrieval-augmented generation (RAG) systems hold long-term promise, the key to unlocking the immediate benefits of current LLMs lies in crafting precise, context-rich prompts. The following examples will showcase direct interactions with LLMs, demonstrating how you can provide the necessary context to achieve these gains.
Specifically, we'll cover:
- Prompt engineering fundamentals, tailored for pathologic diagnoses.
- Key prompt patterns for molecular testing, immunohistochemistry results interpretation, and other common tasks.
- Techniques to extract, summarize, and cross-reference complex pathology reports and genomic data using prompts.
- Crucial considerations for data security and patient privacy in the age of AI-assisted pathology.
By mastering prompt engineering, pathologists can go from being data hubs overwhelmed by information to efficient diagnosticians who use AI to deliver better patient care.
The Pathologist as Data Hub: Streamlining Second Opinions with Prompt Engineering
As the "data hub" for diagnostic information, laboratorians synthesize vast amounts of data. This challenge is particularly acute in second opinion cases, where information from multiple sources can be overwhelming. Consider the breadth of data typically involved:
- Anatomic Pathology Reports: These reports include gross and microscopic descriptions, tumor size, grade, margins, lymphovascular invasion, mitotic count/rate, and features like tumor budding.
- Immunohistochemistry (IHC) Results: IHC provides information on protein expression, including markers such as ER, PR, HER2, Ki-67, and others relevant to the specific tumor type.
- Clinical Data: Includes laboratory results such as the complete blood count (CBC), comprehensive metabolic panel (CMP), and tumor markers.
- Molecular Testing: This category encompasses a range of tests, from focused assays like MammaPrint and Oncotarget500 to broader analyses like targeted gene panels, whole exome sequencing, and copy number variation studies.
- "Omics" Data: Increasingly, pathologists encounter data from high-throughput "omics" technologies. This includes spatial transcriptomics, proteomics (including spatial proteomics), metabolomics, circulating tumor cell analysis, tumor microenvironment assessment (beyond basic morphology), minimal residual disease (MRD) monitoring, pharmacogenomics, and studies of drug resistance mechanisms.
The key to navigating this data deluge—and avoiding the "information overload and decision paralysis" described by Abels et al. (2019)—lies in leveraging AI tools, specifically through prompt engineering. Rather than manually sifting through extensive reports, pathologists can use carefully crafted prompts to direct LLMs to:
- Focus on the most clinically relevant information.
- Quickly compare and contrast findings from different sources.
- Highlight any conflicting data requiring further investigation.
Leveraging Current AI Tools: Prompt Engineering is Key
While fully integrated AI systems are on the horizon, current AI tools, powered by LLMs and RAG, already offer significant benefits in diagnostic support, report generation, and literature summarization. Their effectiveness, however, hinges on prompt engineering: the art of crafting precise, context-rich inputs.
With well-constructed prompts, pathologists can manage the data deluge using tools available today. Let's explore some specific examples:
1. Structured Data Extraction and Summarization:
- Prompt: "Summarize the key findings from the provided pathology report, immunohistochemistry results, and molecular testing report for a breast cancer case. Focus on tumor size, grade, ER/PR/HER2 status, Ki-67 index, and any identified genetic mutations. Present the information in a concise, bullet-point format."
- Benefit: Saves time by extracting and organizing specific data points.
2. Identifying Clinically Relevant Patterns and Discrepancies:
- Prompt: "Compare the original pathology report's assessment of tumor grade with the molecular profiling results (e.g., Oncotarget500). Highlight any discrepancies or inconsistencies between the two reports. Also, find any conflicting information between the initial biopsy and the final surgical resection specimen."
- Benefit: Identifies potential errors or areas for further investigation.
3. Prioritizing Information for Treatment Decisions:
- Prompt: "Based on the provided genomic data (NGS), identify any actionable mutations or potential therapeutic targets for this breast cancer patient. Prioritize targets with established clinical evidence and available therapies. Also, find any drug resistance mutations."
- Benefit: Focuses attention on clinically significant findings.
4. Integrating Multi-Omics Data:
- Prompt: "Analyze the spatial transcriptomics data in conjunction with the immunohistochemistry results. Identify regions within the tumor microenvironment with high expression of immune checkpoint markers and correlate these with the presence of tumor-infiltrating lymphocytes. Provide a summary of how these findings may affect immunotherapy response."
- Benefit: Integrates complex "omics" data for a holistic view.
5. Generating Differential Diagnoses and Prognostic Assessments:
- Prompt: "Based on the provided clinical, pathological, and molecular data, generate a differential diagnosis and assess the patient's risk of recurrence. Provide a rationale for your assessment, including relevant prognostic factors."
- Benefit: Assists in formulating a comprehensive assessment.
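As a rough illustration of the first example, a prompt like this can be assembled programmatically so that every case is presented to the model in the same layout. The sketch below is only one way to do it: the function name `build_summary_prompt`, the `###` section markers, and the "not reported" instruction are assumptions added for illustration, and the input texts are assumed to be de-identified already.

```python
from typing import Mapping, Sequence

# Data points named in the summarization prompt above.
DEFAULT_FIELDS = (
    "tumor size",
    "grade",
    "ER/PR/HER2 status",
    "Ki-67 index",
    "identified genetic mutations",
)


def build_summary_prompt(
    documents: Mapping[str, str],
    fields: Sequence[str] = DEFAULT_FIELDS,
) -> str:
    """Assemble one summarization prompt from several de-identified reports.

    `documents` maps a human-readable title (e.g. "Pathology report") to the
    report text. The section layout is an illustrative choice, not a standard.
    """
    field_list = "\n".join(f"- {name}" for name in fields)
    sections = "\n\n".join(
        f"### {title}\n{text.strip()}" for title, text in documents.items()
    )
    return (
        "Summarize the key findings from the documents below "
        "for a breast cancer case.\n"
        "Report only these items, one bullet each:\n"
        f"{field_list}\n"
        # Assumption: an explicit fallback discourages the model from guessing.
        'If an item is not stated in the documents, write "not reported".\n\n'
        f"{sections}"
    )
```

The returned string can be pasted into a chat interface or passed to whichever LLM client your institution has approved. Keeping the field list as a parameter makes it straightforward to adapt the same template to other tumor types.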
Key Prompt Engineering Patterns for Pathology
Beyond these specific use cases, several general prompt engineering patterns are particularly valuable in pathology:
1. Retrieval Pattern (Diagnostic Certainty):
- Purpose: Ensure the AI relies solely on validated medical knowledge and provided documents.
- Example: "Using the provided pathology report excerpts (including IHC scores and FISH ratios) and the ASCO/CAP guideline on HER2 testing, determine the HER2 status of this breast carcinoma. Do not include any information outside of these sources."
2. Mitigating "Hallucinations":
- Purpose: Structure prompts to constrain responses to factual information, preventing the AI from generating incorrect or fabricated information.
- Example: When requesting specific guideline information, explicitly instruct the AI to only provide information from that guideline, and to avoid including any external data that might be inaccurate.
3. Persona-Based Prompting (Education/Communication):
- Purpose: Adapt the AI's response style based on the intended audience (e.g., patient vs. colleague).
- Example: "Explain 'high-grade dysplasia' to a patient in simple, non-technical terms."
4. Error Handling Pattern (Confidence Levels):
- Purpose: Instruct the AI to state its confidence level and identify any uncertainties in its response.
- Example: The AI should explicitly state its confidence level in a diagnosis (e.g., "High confidence," "Moderate confidence," "Low confidence") and suggest additional studies if necessary.
5. Multi-Pass Query Refinement (Iterative Diagnosis):
- Purpose: Refine diagnoses and analyses through a series of increasingly specific queries.
- Example: Start with a broad query for a differential diagnosis, then follow up with prompts requesting specific stain recommendations, and finally, ask for a likelihood assessment of each potential diagnosis.
6. Hybrid Prompting with Few-Shot Examples (Pattern Recognition):
- Purpose: Provide the AI with examples of well-written descriptions or reports to guide its output.
- Example: Provide several examples of well-written descriptions of spindle cell lesions, and then ask the AI to describe a new, unseen lesion based on the provided examples.
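Several of these patterns can be combined in a few small helpers. The sketch below shows one possible arrangement of the retrieval, confidence-level, and multi-pass patterns. All function names are hypothetical, and `ask` stands in for whatever function sends a prompt to your chosen LLM and returns its reply; no particular vendor API is assumed.

```python
import re
from typing import Callable, List, Optional, Sequence


def build_grounded_prompt(question: str, sources: Sequence[str]) -> str:
    """Retrieval pattern: put the sources first and restrict the answer to them."""
    numbered = "\n\n".join(
        f"[Source {i}]\n{text.strip()}" for i, text in enumerate(sources, start=1)
    )
    return (
        f"{numbered}\n\n"
        f"Question: {question}\n"
        "Use only the sources above. "
        "Do not include any information outside of these sources.\n"
        "If the sources are insufficient, say so and suggest additional studies.\n"
        # Confidence pattern: request a fixed, machine-readable closing line.
        "End with a line of the form 'Confidence: High', "
        "'Confidence: Moderate', or 'Confidence: Low'."
    )


def parse_confidence(reply: str) -> Optional[str]:
    """Return 'High', 'Moderate', or 'Low', or None if the reply states none."""
    match = re.search(
        r"confidence:\s*(high|moderate|low)\b", reply, flags=re.IGNORECASE
    )
    return match.group(1).capitalize() if match else None


def multi_pass(
    ask: Callable[[str], str],
    first_prompt: str,
    follow_ups: Sequence[str],
) -> List[str]:
    """Multi-pass pattern: a broad query, then increasingly specific follow-ups.

    Each follow-up is sent together with the running transcript so the model
    sees its earlier answers. `ask` is a caller-supplied placeholder.
    """
    transcript = first_prompt
    reply = ask(transcript)
    replies = [reply]
    for follow_up in follow_ups:
        transcript = f"{transcript}\n\nAssistant: {reply}\n\nUser: {follow_up}"
        reply = ask(transcript)
        replies.append(reply)
    return replies
```

A reply for which `parse_confidence` returns `None` or `"Low"` is a natural candidate for closer manual review, although the stated confidence of an LLM is not a calibrated probability and should be treated only as a prompt for scrutiny.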
Moving Forward: Multimodality, Curation, and Continuous Evaluation
For complex cases, combining text and image data through multimodal prompts will be crucial. To maximize the effectiveness of these prompt engineering patterns, pathologists should prioritize:
- Building and utilizing curated, up-to-date databases: This includes resources like CAP guidelines and WHO classifications.
- Fine-tuning retriever models: This will enhance the precision of information retrieval.
- Continuous evaluation of AI performance: Rigorous testing and systematic feedback collection are essential.
This iterative process ensures that AI tools consistently deliver accurate and reliable results, empowering pathologists to confidently navigate the data deluge, improve diagnostic accuracy, and ultimately enhance patient care.
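One simple way to begin continuous evaluation is to compare the fields an LLM extracts against values a pathologist has verified, and to track the agreement rate for each field over time. The sketch below assumes a hypothetical data layout (one dictionary of field names to values per case) and uses exact matching after trimming and lowercasing, which is a deliberately crude choice.

```python
from collections import defaultdict
from typing import Dict, List


def field_accuracy(
    predictions: List[Dict[str, str]],
    references: List[Dict[str, str]],
) -> Dict[str, float]:
    """Per-field agreement between LLM output and pathologist-verified values.

    Both lists hold one dict per case, in the same order. A field missing
    from a prediction counts as incorrect.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for predicted, reference in zip(predictions, references):
        for field, expected in reference.items():
            total[field] += 1
            got = predicted.get(field, "")
            # Assumption: case-insensitive exact match is good enough here.
            if got.strip().lower() == expected.strip().lower():
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}
```

Re-running a check like this whenever a prompt, model version, or reference guideline changes gives an early signal of regressions before they reach routine use.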
A Call to Action: Secure and Responsible AI Implementation
Before embarking on this AI-powered journey, prioritize security and vigilance. Safeguarding patient data is paramount. Before sharing any data with third-party AI tools, implement robust de-identification procedures to remove all Protected Health Information (PHI). Start with a security-first mindset, thoroughly evaluating the privacy practices of any AI tools you consider. Be aware that many AI companies collect user data for model training or human review (often labeled as "telemetry"). Diligently opt out of such data collection settings whenever possible.
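To make the idea of de-identification concrete, the sketch below masks a few obviously structured identifiers in report text before it leaves your environment. It is a starting point under stated assumptions, not a compliant de-identification procedure: the three patterns (an "MRN"-prefixed number, slash-separated dates, and US-style phone numbers) are hypothetical formats chosen for illustration, and names, addresses, accession numbers, and free-text identifiers are not handled at all.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only; adapt them to your own report formats.
PATTERNS = [
    ("[MRN]", re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE)),
    ("[DATE]", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("[PHONE]", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
]


def scrub(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace matches of each pattern with a placeholder.

    Returns the scrubbed text and a count of replacements per placeholder,
    so a human reviewer can see what was (and was not) caught.
    """
    counts: Dict[str, int] = {}
    for placeholder, pattern in PATTERNS:
        text, replaced = pattern.subn(placeholder, text)
        counts[placeholder] = replaced
    return text, counts
```

Because pattern matching misses anything it was not written for, the scrubbed output still needs human review before it is shared with a third-party tool.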
For the highest level of control over patient data, transition to local AI processing within your secure lab network. This significantly reduces external data exposure, but it's essential to acknowledge the substantial technical expertise and infrastructure required. One potential avenue to explore is carefully experimenting with locally deployed AI tools. For example, the open-source Nous Hermes 2 Mistral DPO LLM could be evaluated for pre-processing tasks like PHI removal from text-based reports or extracted metadata. However, be aware that local deployment is not a guarantee of perfect security. Rigorous testing, validation, and ongoing monitoring are absolutely essential to ensure the model's effectiveness, prevent unintended data leakage, and confirm that it meets all applicable regulatory requirements (e.g., HIPAA) before deploying any local LLM for handling PHI. Consider consulting with AI/ML security experts such as Rekonn.ai.
For image de-identification, particularly of Whole Slide Images (WSIs), a more established approach is to integrate dedicated tools such as Microsoft Presidio into your workflow. A proposed workflow for WSIs is as follows:
- WSI Reading: Use libraries like OpenSlide to access WSI files (SVS or TIFF).
- Label Detection: Employ computer vision techniques (edge detection, template matching, or trained machine learning models) to isolate the slide label.
- Label Extraction: Extract the label region as a separate image.
- PII Redaction (Presidio): Apply Presidio's OCR and PII detection capabilities to redact PHI from the extracted label image.
- WSI Modification: Replace the original label region in the WSI with the redacted version, ensuring the WSI's integrity. Note that OpenSlide only reads slides, so writing the modified file requires a separate tool that supports the slide's file format.
- Validation: Visually inspect the modified WSI to ensure successful redaction and that no other image artifacts were introduced. Retain a copy of the original WSI as a backup.
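The reading, extraction, and redaction steps above might be sketched as follows. This is a minimal outline under several assumptions. It assumes the slide stores its label as an associated image named "label", as Aperio SVS files commonly do, so no computer-vision label detection is needed. It also assumes the `openslide-python` and `presidio-image-redactor` packages and their OCR dependencies are installed. Because OpenSlide is a read-only library, the sketch saves the redacted label as a separate image for review and does not write it back into the WSI. The function names are hypothetical.

```python
from typing import Any, Callable, Optional


def read_label_image(slide_path: str) -> Optional[Any]:
    """Return the slide's label as an RGB PIL image, or None if it has none."""
    import openslide  # imported lazily so the module loads without it

    slide = openslide.OpenSlide(slide_path)
    try:
        label = slide.associated_images.get("label")
        return label.convert("RGB") if label is not None else None
    finally:
        slide.close()


def redact_label_image(label: Any) -> Any:
    """Run Presidio's OCR-based redaction, filling detected text with black."""
    from presidio_image_redactor import ImageRedactorEngine

    return ImageRedactorEngine().redact(label, (0, 0, 0))


def deidentify_label(
    slide_path: str,
    out_path: str,
    read_label: Callable[[str], Optional[Any]] = read_label_image,
    redact: Callable[[Any], Any] = redact_label_image,
) -> bool:
    """Extract, redact, and save the label image; return False if none exists.

    The reader and redactor are parameters so either step can be swapped
    out or replaced with a stub when testing.
    """
    label = read_label(slide_path)
    if label is None:
        return False
    redact(label).save(out_path)
    return True
```

Default PII recognizers may not catch laboratory-specific identifiers such as accession numbers, so the saved image should still be inspected by a person, as the validation step describes.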
If integrating new software presents immediate challenges, remember that readily available tools like Microsoft Word and Adobe Acrobat Pro can be used for manual de-identification and redaction. While this approach is more tedious, it provides a practical way to begin experimenting with AI while protecting patient privacy. This allows for immediate exploration of AI's potential, informing decisions for future investments or providing a stopgap until more integrated software solutions become available. These advancements are closer than you might think.
Conclusion: Embracing the Future of Pathology
The time to act is now. By embracing prompt engineering, prioritizing data security, and fostering a culture of continuous learning, pathologists can confidently leverage AI to transform diagnostic practice and improve patient lives. Let us move forward with vigilance and purpose, turning the data deluge into a catalyst for positive change.
Author
Scott Kilcoyne
DigitCells Cofounder & COO