The optimized Gleason grading annotation of the TCGA PRAD dataset is the result of a collaboration between Codatta and DPath.ai that sets a new standard for AI-ready pathology data. By bringing together top pathology experts through the Codatta platform, the dataset goes beyond traditional slide-level labels and adds ROI-level (region-of-interest) spatial annotations, improving diagnostic detail, accuracy, and transparency. With refined Gleason grades, detailed annotation rationales, and ROI-based Gleason pattern maps, it is a key resource for AI model development and pathology research, and it addresses one of the field's central bottlenecks: producing high-quality annotated data. Through Codatta's Royalty Model, contributors retain ownership of their work and continue to receive recognition and value from it, while DPath.ai shows how collaborative approaches can drive pathology AI forward.
Figure 1: Optimized Gleason grading annotation of the TCGA PRAD dataset. Image source: https://huggingface.co/datasets/Codatta/Refined-TCGA-PRAD-Prostate-Cancer-Pathology-Dataset
What is the TCGA PRAD dataset?
The optimized Gleason grading annotation of the TCGA PRAD (The Cancer Genome Atlas Prostate Adenocarcinoma) dataset upgrades the original slide-level annotations with ROI-level spatial annotations. Jointly developed by Codatta and DPath.ai, it was built collaboratively by the pathology community, with participation open worldwide and contributors retaining ownership of their annotations. The result is greater diagnostic accuracy, detail, and reliability, all of which are essential for AI model training and pathology research.
After reviewing 435 TCGA whole-slide images (WSIs), pathologists identified 245 cases whose annotations needed improvement and confirmed that the remaining 190 were accurately annotated. The dataset contains slide-level metadata and ROI-level spatial annotations, giving researchers a foundation for building AI pipelines, interactively exploring tumor regions, and conducting advanced pathology research.
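ROI-level annotations make a slide's overall grade traceable to the regions that produced it. As a rough illustration, the sketch below derives a slide-level Gleason score and ISUP Grade Group from per-ROI pattern labels. It is a simplified sketch, not the dataset's documented procedure: the `RoiAnnotation` record and its `area_mm2` field are hypothetical (the dataset card on Hugging Face describes the real schema), and the rule used here (primary = pattern with the largest total area, secondary = the next largest) ignores ISUP refinements such as tertiary patterns and minimum-percentage thresholds. The Grade Group mapping follows the ISUP 2014 / WHO 2016 system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RoiAnnotation:
    """Hypothetical ROI record: one Gleason pattern (3, 4 or 5) and the ROI's area."""
    pattern: int
    area_mm2: float


def slide_gleason(rois):
    """Return (primary, secondary, score) for a slide from its ROI annotations.

    Simplified rule (assumption): primary = pattern with the largest total area,
    secondary = the next largest; a single-pattern slide counts it twice (e.g. 3+3).
    ISUP tertiary-pattern and minimum-percentage rules are deliberately ignored.
    """
    area_by_pattern = defaultdict(float)
    for roi in rois:
        if roi.pattern not in (3, 4, 5):
            raise ValueError(f"unexpected Gleason pattern: {roi.pattern}")
        area_by_pattern[roi.pattern] += roi.area_mm2
    if not area_by_pattern:
        raise ValueError("slide has no tumor ROIs")
    # Rank patterns by total area; on an exact tie the higher pattern ranks first.
    ranked = sorted(area_by_pattern, key=lambda p: (area_by_pattern[p], p), reverse=True)
    primary = ranked[0]
    secondary = ranked[1] if len(ranked) > 1 else primary
    return primary, secondary, primary + secondary


def grade_group(primary, secondary):
    """Map primary + secondary Gleason patterns to an ISUP Grade Group (1-5)."""
    score = primary + secondary
    if score <= 6:
        return 1
    if score == 7:
        return 2 if primary == 3 else 3  # 3+4 -> GG2, 4+3 -> GG3
    if score == 8:
        return 4
    return 5  # scores 9-10


rois = [RoiAnnotation(3, 12.0), RoiAnnotation(4, 5.5), RoiAnnotation(3, 2.0)]
primary, secondary, score = slide_gleason(rois)
print(f"Gleason {primary}+{secondary}={score}, Grade Group {grade_group(primary, secondary)}")
# prints "Gleason 3+4=7, Grade Group 2"
```

Swapping the labels in this example (pattern 4 covering the larger area) would give 4+3=7 and Grade Group 3, which is why keeping the spatial extent of each pattern, not just the final score, matters for auditing a grade.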
Empowering Pathology AI: Codatta partners with DPath.ai
The optimized Gleason grading annotation of the TCGA PRAD dataset shows what collaborative, community-driven data creation can achieve: more accurate and detailed annotations, which in turn make AI model training more reliable and advance medical research. Such contributions, however, demand domain expertise, time, and effort, so a sustainable incentive structure is needed to recognize and reward the skilled professionals who provide them.
Royalty Model
Codatta's Royalty Model addresses this need. Compared with traditional Web2 models such as Scale AI, it makes contributing and acquiring specialized data more efficient. Scale AI pays contributors up front, which suits general annotators who prefer immediate liquidity and allows large volumes of data to be collected quickly. For specialized tasks that require domain experts, however, that upfront cost becomes high enough to shut out smaller participants. Codatta instead aligns with skilled practitioners and experts by offering condition-based and asset-based rewards. As Figure 2 illustrates, these incentives attract contributors willing to supply high-quality professional data in exchange for returns that may arrive later but carry greater potential upside. This makes Codatta well suited to vertical AI and other advanced applications that demand precision and expertise.
Figure 2: Mapping skill proficiency and liquidity preferences in data contribution
Where Scale AI requires high upfront payments, Codatta's Royalty Model removes that financial barrier for small AI startups through a pay-per-use system. Startups can access critical, cutting-edge data without a large initial investment, prove product-market fit, and then scale. In addition, by turning data into liquid assets that can be traded in decentralized financial markets, Codatta lets contributors balance short-term liquidity needs against long-term asset ownership. Features such as agreed-upon trades and fractional ownership further improve liquidity, making asset-based rewards attractive to a wider range of contributors. This alignment of incentives encourages collaboration, drives innovation in niche AI applications, and builds a diverse investment ecosystem for data creators and startups.
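The article does not state how Codatta computes royalties. Purely as a hypothetical illustration of pay-per-use combined with fractional ownership, the sketch below splits a single usage fee among owners in proportion to their shares. The function name `split_usage_fee`, the integer fee units, and the share model are all assumptions; the largest-remainder method is used so that the integer payouts always add up to the full fee.

```python
def split_usage_fee(fee_units: int, shares: dict[str, int]) -> dict[str, int]:
    """Split one pay-per-use fee, in the smallest currency unit, pro rata by ownership shares.

    Hypothetical model: each owner holds an integer number of shares in a dataset.
    The largest-remainder method guarantees the integer payouts sum to fee_units.
    """
    total = sum(shares.values())
    if fee_units < 0 or total <= 0 or any(s < 0 for s in shares.values()):
        raise ValueError("fee must be >= 0; shares must be >= 0 with a positive total")
    # Floor of each owner's exact pro-rata amount.
    payouts = {owner: fee_units * s // total for owner, s in shares.items()}
    leftover = fee_units - sum(payouts.values())
    # Give the leftover units, one each, to owners with the largest dropped fractions
    # (ties broken by owner name so the result is deterministic).
    ranked = sorted(shares, key=lambda o: (-(fee_units * shares[o] % total), o))
    for owner in ranked[:leftover]:
        payouts[owner] += 1
    return payouts


print(split_usage_fee(100, {"alice": 1, "bob": 1, "carol": 1}))
# prints {'alice': 34, 'bob': 33, 'carol': 33}
```

Working in the smallest currency unit avoids floating-point rounding drift when the same split is repeated across many uses.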
DPath.ai: Collaborative solutions to tackle pathology AI data challenges
DPath.ai is pioneering a decentralized platform that connects pathologists, researchers, and AI model developers worldwide. It handles the acquisition, curation, and exchange of high-quality pathology data, and anyone interested in training AI models can participate. The platform uses blockchain technology to make data exchange transparent, fair, and secure.
Platforms like DPath.ai can leverage Codatta’s decentralized data protocol to collaboratively and transparently obtain annotations:
Task Definition: Clear annotation standards (such as Gleason grading for prostate cancer) ensure consistency and reliability of the resulting data.
Community Participation: Skilled pathologists worldwide take part through the Codatta platform and are incentivized by its Royalty Model, earning ongoing rewards tied to the dataset's future value.
Quality and Integrity: Blockchain-based verification and multi-party cross-referencing keep annotations traceable and high quality while making annotators more accountable.
Security and Accessibility: Decentralized data storage keeps data ownership secure while keeping the data accessible to the people who need it.
Figure 3: Collaboration between Codatta and DPath.ai. Image source: https://huggingface.co/datasets/Codatta/Refined-TCGA-PRAD-Prostate-Cancer-Pathology-Dataset
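The article does not detail how verification or cross-referencing is implemented. As a hypothetical sketch of the two ideas in the Quality and Integrity step, the code below (1) accepts an ROI's Gleason pattern only when enough independent annotators agree, and (2) appends each annotation to a SHA-256 hash chain so that any later edit is detectable. The 2/3 agreement threshold, the record fields, and the function names are assumptions, and a local hash chain only stands in for a blockchain: it provides tamper evidence, not decentralized consensus.

```python
import hashlib
import json
from collections import Counter

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def consensus_pattern(votes, min_agreement=2 / 3):
    """Return the majority Gleason pattern for one ROI, or None if agreement is too low."""
    if not votes:
        return None
    pattern, count = Counter(votes).most_common(1)[0]
    return pattern if count / len(votes) >= min_agreement else None


def _digest(prev_hash, record):
    # Canonical JSON (sorted keys) so the same record always hashes identically.
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(chain, record):
    """Append a record whose hash covers both its content and the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
    chain.append(entry)
    return entry["hash"]


def verify_chain(chain):
    """Recompute every hash; any edited or reordered record makes verification fail."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


votes = [4, 4, 3]  # three independent annotators label the same ROI
chain = []
for annotator, pattern in zip(["p1", "p2", "p3"], votes):
    append_record(chain, {"roi": "roi-001", "annotator": annotator, "pattern": pattern})
print(consensus_pattern(votes), verify_chain(chain))  # prints "4 True"

chain[0]["record"]["pattern"] = 3  # simulate an after-the-fact edit
print(verify_chain(chain))  # prints "False"
```

In this design a low-agreement ROI returns `None`, flagging it for expert adjudication rather than silently picking a label.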
By sourcing domain-specific data collaboratively, DPath.ai not only enriched the TCGA PRAD dataset with precise Gleason grading but also demonstrated how the Codatta platform can produce cutting-edge data for specialized AI fields. This approach sustains participation, broadens access to data acquisition, and accelerates the development of equitable, efficient healthcare AI systems.
Conclusion
The optimized Gleason grading annotation of the TCGA PRAD dataset, produced through the collaboration between Codatta and DPath.ai, improves the diagnostic accuracy and detail of pathology AI data with ROI-level annotations and accompanying annotation rationales. By involving pathology experts worldwide, the project maintains high data quality while rewarding contributors through Codatta's Royalty Model, which gives them ongoing value and ownership. The approach also fosters collaboration, improves data liquidity, and accelerates healthcare AI development, showing the power of decentralized, community-driven solutions.