Knowledge Engine for Scientific Discovery
A collaborative research platform that transforms cutting-edge scientific research into accessible, multi-format tools for collective knowledge exploration. These are research instruments—like microscopes for observing the collective knowledge of humanity—enabling hypothesis formation, testing, and discovery across scientific disciplines.
CopernicusAI is an operational research platform that synthesizes scientific literature from academic databases covering 250+ million papers into AI-generated podcasts, integrates with a knowledge graph of 23,246 indexed papers, and provides collaborative tools for research discovery. The system demonstrates production-ready multi-source research synthesis with full citation tracking and evidence-based content generation that requires a minimum of 3 research sources per episode.
The platform includes a fully operational Research Tools Dashboard (deployed December 2025) with interactive knowledge graph visualization, vector search, and RAG capabilities, enabling researchers to explore, query, and synthesize scientific knowledge across disciplines.
In general terms, a knowledge engine is any system, biological or artificial, that systematically transforms information into knowledge: it performs work by converting raw material (information) into useful outputs (knowledge, understanding, insights). The CopernicusAI Knowledge Engine applies this idea through a set of integrated capabilities.
The system architecture integrates data ingestion, processing, storage, and query capabilities across multiple modalities (research papers, process descriptions, and media content) to support knowledge discovery and synthesis.
Figure: Knowledge Engine Architecture - Data flow from ingestion through processing and storage to query interfaces
Ingestion: multi-source acquisition from academic databases (PubMed, arXiv, NASA ADS), literature sources (textbooks, reviews), and educational content (videos, transcripts), with quality assessment and type classification.
Processing and storage: LLM-powered entity extraction and process-logic extraction; structured data storage (JSON metadata, Mermaid flowcharts, transcripts); and specialized databases for papers, processes, and media.
Query: multiple access interfaces, including RAG queries, vector search, knowledge graph visualization, API endpoints, and web interfaces, converging on a unified knowledge output.
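The retrieval step behind vector search and RAG queries can be illustrated with a brute-force cosine-similarity search. This is a minimal sketch, not the platform's implementation: it assumes documents have already been embedded as fixed-length float vectors, and the names `top_k` and `build_rag_context` are hypothetical. A production system would typically use an approximate nearest-neighbour index rather than scoring every stored vector.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Hypothetical brute-force search: score every stored embedding and return the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_rag_context(hits: List[Tuple[str, float]], passages: Dict[str, str]) -> str:
    """Hypothetical helper: join retrieved passages, tagged by id, as grounding text for an LLM prompt."""
    return "\n".join(f"[{doc_id}] {passages[doc_id]}" for doc_id, _ in hits)
```

Tagging each passage with its identifier keeps the retrieved evidence traceable once it is handed to a language model, which matters for citation tracking.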
CopernicusAI is an active research prototype exploring AI-generated audio briefings as an interface for assisted scientific research.
The system allows any user to generate, refine, and share AI-generated science podcasts based on structured prompts, enabling rapid orientation to a topic, iterative deepening, and personalized research briefings.
Rather than functioning as a static content platform, CopernicusAI supports collectively generated and shared research artifacts, analogous to community-driven knowledge platforms (e.g., discussion forums), but grounded in scientific sources and metadata-aware workflows.
The Research Tools Dashboard is fully operational and deployed to Google Cloud Run, providing unified access to all components with interactive knowledge graph visualization, vector search, RAG queries, and content browsing.
See the "Knowledge Engine Ecosystem" section below for details.
Inspired by Nicolaus Copernicus, who challenged accepted knowledge with evidence and rigorous analysis, CopernicusAI creates collaborative research tools that enable collective participation in scientific discovery. These platforms are instruments for exploring humanity's collective knowledge: tools for hypothesis formation, testing, and collaborative research, not just educational content.
Just as a microscope enables observation of the microscopic world, CopernicusAI tools enable observation and exploration of humanity's collective knowledge. Subscribers collaborate to prompt, generate, and refine research content—sharing discoveries publicly or keeping them private. As large language models (LLMs) and AI systems gain unprecedented knowledge, CopernicusAI provides the infrastructure for human-AI collaborative knowledge exploration, with evidence-based truth-seeking as our guiding principle.
An integrated ecosystem of research and collaboration tools designed to assist scientists in their workflow, from research discovery through knowledge synthesis to multi-format content generation.
Synthesis & distribution platform for AI-powered research briefing podcast generation
Foundational meta-tool for universal process analysis across disciplines
Flowcharts in Mermaid markdown format modeling 100+ biochemical processes in yeast and E. coli
Core data infrastructure for research paper metadata and citation networks
Multi-modal content with transcript-based search for scientific videos
Prototype web interface for testing knowledge graph, vector search, RAG queries, and content browsing
Collaborative research platform where subscribers prompt and generate multi-voice AI podcasts (5-10 minutes) synthesizing research from multiple academic sources. Subscribers can share their podcasts publicly or keep them private. Content generation is evidence-based and requires a minimum of 3 research sources per episode.
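The minimum-source rule can be enforced as a check before generation starts. This is a minimal sketch, assuming each source carries a DOI or URL that identifies it; the `Source` and `EpisodeDraft` types and the policy of deduplicating by identifier are hypothetical, not taken from the platform.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MIN_SOURCES = 3  # minimum stated in the platform description


@dataclass
class Source:
    title: str
    doi: Optional[str] = None  # hypothetical field; the source schema is not specified
    url: Optional[str] = None  # hypothetical field


@dataclass
class EpisodeDraft:
    topic: str
    sources: List[Source] = field(default_factory=list)


def validate_sources(draft: EpisodeDraft, minimum: int = MIN_SOURCES) -> List[str]:
    """Return a list of problems; an empty list means the draft may proceed to generation."""
    problems: List[str] = []
    # A source counts only if it is traceable, i.e. it has a DOI or a URL.
    traceable = [s for s in draft.sources if s.doi or s.url]
    # Deduplicate case-insensitively so one paper cited twice is not counted twice.
    unique_keys = {(s.doi or s.url).lower() for s in traceable}
    if len(unique_keys) < minimum:
        problems.append(
            f"need at least {minimum} distinct traceable sources, found {len(unique_keys)}"
        )
    for s in draft.sources:
        if not (s.doi or s.url):
            problems.append(f"source without DOI or URL: {s.title!r}")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets an interface show the subscriber everything that must be fixed before an episode can be generated.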
Multi-model architecture with intelligent model selection:
Comprehensive academic database coverage with 250+ million research papers accessible through integrated APIs.
Operational audio podcast system: a full production and distribution platform for subscriber-generated podcasts. Users can prompt, generate, publish, and distribute audio podcasts, with RSS feed support for Spotify, Apple Podcasts, and Google Podcasts.
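Podcast distribution through directories rests on an RSS 2.0 feed in which each episode's audio file is an `<enclosure>`. The sketch below builds a minimal feed with the standard library; the episode dictionary keys are hypothetical, and directories such as Apple Podcasts expect additional `itunes:` namespace tags (artwork, category, and so on) that this sketch omits.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(channel_title: str, channel_link: str, channel_description: str, episodes: list) -> str:
    """Build a minimal RSS 2.0 document.

    Each episode is a dict with hypothetical keys: title, audio_url, length_bytes,
    and optionally published (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        # The enclosure is what podcast apps download: URL, size in bytes, MIME type.
        ET.SubElement(
            item, "enclosure",
            url=ep["audio_url"], length=str(ep["length_bytes"]), type="audio/mpeg",
        )
        ET.SubElement(item, "guid").text = ep["audio_url"]
        # RSS 2.0 dates use the RFC 822 style produced by format_datetime.
        published = ep.get("published") or datetime.now(timezone.utc)
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")
```

Using the audio URL as the `guid` is a simplification; a stable episode identifier is safer if audio files are ever re-hosted.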
Advanced video features planned for future development:
See: Science Video Database - Companion project for research video content management.
A centralized metadata repository (not a file archive) providing structured JSON objects with AI-powered preprocessing.
The system requires a minimum of 3 research sources per podcast episode. Each source is:
The system automatically extracts and formats citations from research papers:
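The formatting step can be illustrated with an APA-like formatter over the metadata fields that appear in the paper query API's example response (`authors`, `year`, `title`, `journal`, `doi`). This is a minimal sketch: the citation style and the function names are hypothetical, and extracting these fields from full text is not shown.

```python
from typing import Dict, List


def format_authors(authors: List[str]) -> str:
    """Join 'Surname, I.' strings APA-style, with '&' before the last author."""
    if not authors:
        return "Anonymous"
    if len(authors) == 1:
        return authors[0]
    return ", ".join(authors[:-1]) + ", & " + authors[-1]


def format_citation(paper: Dict) -> str:
    """Hypothetical formatter: Authors (Year). Title. Journal. https://doi.org/DOI"""
    parts = [
        f"{format_authors(paper.get('authors', []))} ({paper.get('year', 'n.d.')}).",
        f"{paper['title'].rstrip('.')}.",
    ]
    if paper.get("journal"):
        parts.append(f"{paper['journal']}.")
    if paper.get("doi"):
        # Resolving through doi.org gives a stable link regardless of publisher URL changes.
        parts.append(f"https://doi.org/{paper['doi']}")
    return " ".join(parts)
```

Missing fields degrade gracefully ("n.d." for an unknown year, no link without a DOI), so partially indexed papers can still be cited.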
The system uses LLM analysis to identify paradigm-shifting research by:
These platforms enable collective participation and collaboration across diverse user communities:
This platform represents prior work that demonstrates foundational research and development achievements in AI-powered scientific knowledge synthesis, collaborative research tools, and multi-modal content generation. These contributions establish the technical foundation and proof-of-concept for the broader CopernicusAI Knowledge Engine initiative.
This platform serves as the core synthesis and distribution component of the CopernicusAI Knowledge Engine. The Knowledge Engine is an integrated ecosystem of research and collaboration tools that work together to assist scientists in their workflow, from research discovery through knowledge synthesis to multi-format content generation.
The Knowledge Engine is designed to grow and evolve. Additional tools, databases, and collaboration components will be added as the project develops, expanding capabilities for AI-assisted scientific research and knowledge discovery.
For Grant Proposals (NSF/DOE):
Welz, G. (2025). CopernicusAI: Knowledge Engine for Scientific Discovery.
Hugging Face Space. https://huggingface.co/spaces/garywelz/copernicusai
Live Platform: https://www.copernicusai.fyi
BibTeX Format:
@misc{welz2025copernicusai,
title={CopernicusAI: Knowledge Engine for Scientific Discovery},
author={Welz, Gary},
year={2025},
url={https://huggingface.co/spaces/garywelz/copernicusai},
note={Hugging Face Space, Live Platform: https://www.copernicusai.fyi}
}
Welz, G. (2024–2025). CopernicusAI: AI-Generated Audio Briefings as a Research Interface.
Hugging Face Space. https://huggingface.co/spaces/garywelz/copernicusai
BibTeX Format:
@misc{welz2024audiobriefings,
title={CopernicusAI: AI-Generated Audio Briefings as a Research Interface},
author={Welz, Gary},
year={2024--2025},
url={https://huggingface.co/spaces/garywelz/copernicusai},
note={Hugging Face Space}
}
This platform is designed to support grant applications to:
National Science Foundation - Science education and research infrastructure
Department of Energy - Scientific computing and data science
AI research and development initiatives
The CopernicusAI Knowledge Engine is an integrated ecosystem of research and collaboration tools. The Research Tools Dashboard is now fully operational (December 2025) with a working web interface providing unified access to all components.
Fully operational web interface with knowledge graph visualization (23,246 papers), vector search, RAG queries, and content browsing.
Foundational meta-tool for universal process analysis across any discipline
First application of the Programming Framework to biology: 50+ biological processes visualized
Core data infrastructure for structured research paper metadata and citation networks
Multi-modal content component with transcript-based search for scientific videos
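The process flowcharts in this ecosystem are stored in Mermaid markdown format. As a minimal sketch, assuming a process is reduced to an ordered list of step labels, the hypothetical function below emits Mermaid `flowchart` syntax for a linear pathway; real biochemical processes with branches, regulation, and feedback would need a richer graph model.

```python
from typing import List


def node_id(index: int) -> str:
    """Short, stable node identifiers (S0, S1, ...) keep labels free to contain spaces."""
    return f"S{index}"


def to_mermaid(steps: List[str], direction: str = "TD") -> str:
    """Render an ordered list of process steps as a linear Mermaid flowchart."""
    lines = [f"flowchart {direction}"]
    for i, label in enumerate(steps):
        # A double quote inside a quoted Mermaid label would end it early, so swap it out.
        safe = label.replace('"', "'")
        lines.append(f'    {node_id(i)}["{safe}"]')
    for i in range(len(steps) - 1):
        lines.append(f"    {node_id(i)} --> {node_id(i + 1)}")
    return "\n".join(lines)
```

For example, `to_mermaid(["Origin recognition", "Helicase loading", "DNA synthesis"])` yields a three-node top-down chart with two arrows.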
Base URL: https://copernicus-podcast-api-phzp4ie2sq-uc.a.run.app
POST /api/papers/query
Example request body:
{
"discipline": "biology",
"keywords": ["DNA replication", "cell cycle"],
"date_range": {
"start": "2020-01-01",
"end": "2025-01-01"
},
"limit": 10
}
Example response:
{
"status": "success",
"count": 10,
"papers": [
{
"id": "pmid_12345678",
"title": "Mechanisms of DNA Replication...",
"authors": ["Smith, J.", "Doe, A."],
"journal": "Nature",
"year": 2023,
"doi": "10.1038/s41586-023-01234",
"abstract": "..."
}
]
}
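Client code can map the example response onto typed records. This is a minimal sketch that assumes the field names shown in the example response; which fields are optional, and how errors are reported other than through `status`, are assumptions, since no formal schema is given.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Paper:
    id: str
    title: str
    authors: List[str] = field(default_factory=list)
    journal: Optional[str] = None  # assumed optional
    year: Optional[int] = None     # assumed optional
    doi: Optional[str] = None      # assumed optional
    abstract: Optional[str] = None # assumed optional


def parse_query_response(raw: str) -> List[Paper]:
    """Turn a /api/papers/query response body into Paper records."""
    body = json.loads(raw)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('status')!r}")
    return [
        Paper(
            id=p["id"],
            title=p["title"],
            authors=p.get("authors", []),
            journal=p.get("journal"),
            year=p.get("year"),
            doi=p.get("doi"),
            abstract=p.get("abstract"),
        )
        for p in body.get("papers", [])
    ]
```

Required fields (`id`, `title`) raise `KeyError` if absent, which surfaces malformed records early instead of passing them downstream.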
API uses Bearer token authentication. Include in request headers:
Authorization: Bearer YOUR_API_TOKEN
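Combining the base URL, endpoint path, request body, and Bearer header, a standard-library Python client might look like the sketch below. It assumes the endpoint path is appended directly to the base URL and that the API accepts a JSON body with `Content-Type: application/json`; `build_query_request` and `send` are hypothetical helpers, and nothing is sent over the network until `send` is called.

```python
import json
import urllib.request
from typing import Iterable

BASE_URL = "https://copernicus-podcast-api-phzp4ie2sq-uc.a.run.app"


def build_query_request(token: str, discipline: str, keywords: Iterable[str],
                        start: str, end: str, limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to /api/papers/query."""
    payload = {
        "discipline": discipline,
        "keywords": list(keywords),
        "date_range": {"start": start, "end": end},
        "limit": limit,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/papers/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed; not stated in the API notes
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the token handling testable without network access; in practice the token should come from an environment variable or secret store rather than source code.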
Standard rate limits apply: 100 requests/minute per API key. Contact the project maintainer for higher limits.
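Because the limit applies per API key, a client issuing many queries can throttle itself instead of relying on server rejections. This is a minimal sliding-window sketch, assuming the stated 100 requests/minute; how the server actually counts requests (fixed or sliding window) and how it responds to excess requests are not documented here, so treat this as a conservative client-side guard. `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side guard that keeps calls under max_calls per window seconds."""

    def __init__(self, max_calls: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested with a fake clock
        self.calls: Deque[float] = deque()

    def _evict(self, now: float) -> None:
        # Drop timestamps that have slid out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else return False."""
        now = self.clock()
        self._evict(now)
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next call would be allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])
```

A caller can loop on `try_acquire()` and `time.sleep(limiter.wait_time())` before each API request.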
Current version: v1.0. API is stable and backward-compatible.