The optimized Gleason grading annotations of the TCGA PRAD dataset are the result of a collaboration between Codatta and DPath.ai and aim to set a new standard for AI-ready pathology data. Built by a community of pathology experts gathered through the Codatta platform, the dataset goes beyond traditional slide-level annotations by adding ROI-level spatial annotations, which improve diagnostic detail, accuracy, and transparency. With optimized Gleason grading, a detailed rationale for each annotation, and ROI-based Gleason pattern mapping, it is a key resource for AI model development and pathology research, and it addresses the central challenge of creating high-quality annotated data. Through Codatta's Royalty Model, contributors retain ownership of their work and receive recognition and ongoing value, while DPath.ai demonstrates how collaborative solutions can advance pathology AI.
Figure 1: Optimized Gleason grading annotation of the TCGA PRAD dataset. Image source: https://huggingface.co/datasets/Codatta/Refined-TCGA-PRAD-Prostate-Cancer-Pathology-Dataset
What is the TCGA PRAD dataset?
The optimized Gleason grading annotations extend the original slide-level annotations of the TCGA PRAD (The Cancer Genome Atlas Prostate Adenocarcinoma) dataset with ROI-level spatial annotations. Co-developed by Codatta and DPath.ai, the dataset was created collaboratively by the pathology community, with participation open to contributors worldwide and ownership of annotations remaining with the people who made them. This approach improves diagnostic accuracy, detail, and reliability, all of which are critical for AI model training and pathology research.
Pathologists reviewed 435 TCGA whole slide images, identifying 245 cases whose annotations needed improvement and confirming 190 cases whose annotations were already accurate. The dataset includes slide-level metadata and ROI-level spatial annotations, giving researchers material for AI pipeline development, interactive tumor region exploration, and advanced pathology research.
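To make the slide-level and ROI-level split concrete, here is a minimal sketch of how one reviewed slide could be represented and how the review counts add up. The record layout is an illustrative assumption, not the dataset's published schema: `RoiAnnotation`, `SlideRecord`, `review_summary`, and every field name are hypothetical. Only the three counts (435, 245, 190) come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class RoiAnnotation:
    """One region of interest on a slide (hypothetical layout)."""
    polygon: list[tuple[float, float]]  # vertex coordinates of the region outline
    gleason_pattern: int                # Gleason pattern assigned to this region
    rationale: str = ""                 # free-text reason for the annotation


@dataclass
class SlideRecord:
    """Slide-level metadata plus its ROI-level annotations (hypothetical layout)."""
    slide_id: str
    review_status: str                  # "improved" or "confirmed" (assumed labels)
    rois: list[RoiAnnotation] = field(default_factory=list)


def review_summary(records):
    """Count how many slides were improved versus confirmed during review."""
    improved = sum(r.review_status == "improved" for r in records)
    confirmed = sum(r.review_status == "confirmed" for r in records)
    return {"total": len(records), "improved": improved, "confirmed": confirmed}


# Counts reported in the text; the two groups should cover every reviewed slide.
REPORTED = {"total": 435, "improved": 245, "confirmed": 190}
counts_are_consistent = (
    REPORTED["improved"] + REPORTED["confirmed"] == REPORTED["total"]
)
improved_share_pct = round(100 * REPORTED["improved"] / REPORTED["total"], 1)
```

Running `review_summary` over records loaded from the real files should reproduce the reported counts, provided the status labels are mapped from whatever fields the dataset actually uses.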
Empowering Pathology AI: Codatta and DPath.ai Join Forces
The optimized Gleason grading annotations of the TCGA PRAD dataset demonstrate the potential of collaborative, community-driven data creation. More accurate and detailed annotations make AI model training more reliable and help advance medical research. These contributions, however, require domain expertise, time, and effort, so a sustainable incentive structure is needed to recognize and reward the work of skilled professionals.
Royalty Model
Codatta's Royalty Model addresses this need. Compared with the traditional Web2 model (for example, Scale AI), it makes both contributing and acquiring data more efficient. Scale AI serves ordinary users who prefer immediate liquidity and can collect large-scale data quickly and efficiently, but when tasks require domain experts, its high costs exclude smaller participants. Codatta instead aligns with skilled practitioners and experts by offering conditional and asset-based rewards. As shown in Figure 2 below, these incentives attract contributors willing to invest high-quality professional data in exchange for returns that may be delayed but are potentially higher, which makes Codatta well suited to vertical AI and advanced applications that demand precision and domain expertise.
Figure 2: Mapping skill proficiency in data contribution against liquidity preferences
In contrast to the high upfront costs associated with Scale AI, Codatta's Royalty Model lowers the financial barrier for small AI startups by introducing a pay-as-you-go system. Startups gain access to critical frontier data without an expensive upfront investment, which lets them demonstrate product-market fit before they scale. In addition, by converting data into liquid assets in decentralized finance markets, Codatta lets contributors balance short-term liquidity needs against long-term asset ownership. Features such as agreed transactions and fractional ownership further improve liquidity, making asset-based rewards appealing to a broader range of contributors. This alignment of incentives promotes collaboration, drives innovation in niche AI applications, and creates a diverse investment ecosystem for data creators and startups.
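The text does not specify how royalties are calculated, so the following is only a sketch of one way a pay-as-you-go payment could be divided among fractional owners: a pro rata split by ownership share. The function name `split_royalty`, the use of whole cents, and the largest-remainder rounding rule are all assumptions made for illustration, not Codatta's actual mechanism.

```python
def split_royalty(payment_cents: int, shares: dict[str, int]) -> dict[str, int]:
    """Split one payment pro rata by ownership share, in whole cents.

    Hypothetical helper: rounding uses the largest-remainder method so the
    payouts always add up exactly to the payment.
    """
    total_shares = sum(shares.values())
    if total_shares <= 0:
        raise ValueError("total shares must be positive")

    payouts = {}
    remainders = []
    for owner, share in shares.items():
        # Integer division keeps the arithmetic exact; rem tracks the lost fraction.
        whole, rem = divmod(payment_cents * share, total_shares)
        payouts[owner] = whole
        remainders.append((rem, owner))

    # Hand out the cents lost to rounding, largest remainder first.
    # Ties are broken by owner name so the result is deterministic.
    leftover = payment_cents - sum(payouts.values())
    for _, owner in sorted(remainders, key=lambda item: (-item[0], item[1]))[:leftover]:
        payouts[owner] += 1
    return payouts
```

For example, `split_royalty(1000, {"p1": 50, "p2": 30, "p3": 20})` divides a 10.00 payment into 500, 300, and 200 cents. Working in integer cents avoids floating-point drift when many small payments accumulate over time.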
DPath.ai: Collaborative Solutions to Pathology AI Data Challenges
DPath.ai is building a decentralized platform that connects pathologists, researchers, and AI model developers worldwide. It is responsible for acquiring, curating, and exchanging high-quality pathology data, so that anyone interested in training AI models can participate. The DPath platform uses blockchain technology to ensure transparency, fairness, and security in data exchanges.
Platforms like DPath.ai can leverage Codatta's decentralized data protocol to collaboratively and transparently obtain annotations:
Task Definition: Clear annotation standards (such as Gleason grading for prostate cancer) ensure that the resulting data is consistent and reliable.
Community Engagement: Skilled pathologists worldwide participate through the Codatta platform and are incentivized by its Royalty Model, earning ongoing rewards linked to the future value of the dataset.
Quality and Integrity: Blockchain-based verification and multi-party cross-referencing keep annotations traceable and high in quality while strengthening annotator accountability.
Security and Accessibility: Decentralized storage keeps data ownership secure and keeps the data accessible to the relevant parties.
Figure 3: Collaboration between Codatta and DPath.ai. Image source: https://huggingface.co/datasets/Codatta/Refined-TCGA-PRAD-Prostate-Cancer-Pathology-Dataset
By acquiring domain-specific data collaboratively, DPath.ai not only enriched the TCGA PRAD dataset with precise Gleason grading but also demonstrated how the Codatta platform can produce frontier data for specialized AI fields. This approach fosters sustainable engagement, democratizes data acquisition, and accelerates the development of equitable and efficient healthcare AI systems.
Conclusion
The optimized Gleason grading annotations of the TCGA PRAD dataset are the result of a collaboration between Codatta and DPath.ai. ROI-level annotations, each accompanied by its rationale, improve the diagnostic accuracy and detail of pathology AI data. With participation from pathology experts around the world, the project delivers high-quality data while rewarding contributors through Codatta's Royalty Model, which provides ongoing value and ownership. The approach also fosters collaboration, improves data liquidity, and accelerates the development of healthcare AI, showing what decentralized, community-driven solutions can achieve.