Integrated materials discovery pipeline combining GNN prediction, ML structural validation (CHGNet), economic assessment, and multi-domain prioritization. 77,296 materials, 29 campaign profiles across 16 industrial domains. Prototype screening filters 92% of structurally implausible candidates before they waste DFT or lab resources — all at zero cloud cost.
An autonomous computational platform that predicts properties of crystalline materials from their structure — without requiring physical synthesis or experimental measurement. It accelerates materials discovery by screening thousands of candidates computationally before any lab work begins.
| Materials corpus | 76,193 validated crystalline materials (JARVIS DFT + AFLOW) |
| Property prediction | Formation energy and electronic band gap from crystal structure |
| Candidate generation | Autonomous discovery via element substitution, doping, and mixed-parent strategies |
| Campaign profiles | 29 profiles across catalysis, PV, emissions, proton energy, hydrogen storage, water, corrosive environments |
| Application domains | 16 industrial domains: exhaust catalysis, green chemistry, proton batteries, H₂ storage, PV absorbers, water treatment, and more |
| Validation bridge | Readiness gate (R0-R5), DFT input generation (real VASP files), reconciliation and learning loop |
| Research tracks | Exhaust catalysis (15 candidates, maturity 7.8/10) + PV absorbers (19 candidates, maturity 6.2/10) |
| Chemistry awareness | Risk labeling, hazardous family tagging, honest novelty ladder (6 levels) |
| Cost per month | $0 — runs entirely on CPU |
The Materials Engine demonstrates that the SOST ecosystem extends beyond cryptocurrency into real scientific utility. Materials discoveries can be registered on-chain as proof-of-discovery, with access controlled via SOST token payments. This creates a knowledge marketplace where scientific computation has real economic value backed by gold reserves.
Model architecture, neural network details, training methodology, hyperparameters, and prediction accuracy metrics are available only in the restricted technical documentation. Public API endpoints provide predictions without exposing the underlying models.
| Target | Model | MAE | R² | Dataset |
|---|---|---|---|---|
| Formation Energy | GNN (restricted) | validated | restricted | 76,193 |
| Band Gap | GNN (restricted) | validated | restricted | 76,193 |
    # Health check
    curl https://sostcore.com/api/materials/health
    # Search by formula
    curl "https://sostcore.com/api/materials/search?formula=GaAs"
    # Corpus stats
    curl https://sostcore.com/api/materials/stats
    # Top exotic materials
    curl https://sostcore.com/api/materials/exotic/ranking/5
    # Campaign presets
    curl https://sostcore.com/api/materials/campaigns/presets
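The same public endpoints can be called from Python with only the standard library. This is a minimal sketch: the base URL and paths come from the curl examples above, while the helper names `build_url` and `get_json` are hypothetical, and the assumption that every endpoint returns JSON is not confirmed by this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the curl examples on this page.
BASE_URL = "https://sostcore.com/api/materials"


def build_url(path: str, **params: str) -> str:
    """Join the base URL, an endpoint path, and optional query parameters."""
    url = f"{BASE_URL}/{path.strip('/')}"
    if params:
        url += "?" + urlencode(params)
    return url


def get_json(path: str, timeout: float = 10.0, **params: str):
    """GET an endpoint and decode the body (assumes a JSON response)."""
    with urlopen(build_url(path, **params), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json("search", formula="GaAs")` would request the same URL as the second curl command, and `get_json("exotic/ranking/5")` the fourth.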
The SOST Computational Materials Discovery Platform is an operational research system within the SOST ecosystem. It ingests data from public materials databases (JARVIS, Materials Project, AFLOW, COD), trains graph neural networks for property prediction, generates and evaluates novel material candidates, and produces comprehensive intelligence dossiers — all at near-zero compute cost on CPU.
| Corpus | 76,193 materials — JARVIS DFT 3D (75,993) + AFLOW (200), 100% with validated CIF structures |
| ML Models | Dual-target GNN prediction (formation energy + band gap) — accuracy metrics restricted |
| Discovery | Autonomous engine — 29 campaign profiles across 16 industrial domains, direct GNN on lifted structures, readiness gate R0-R5, validation bridge with DFT input generation |
| Research Tracks | 2 active — Exhaust catalysis (15 candidates, 7.8/10 maturity) + Photovoltaic absorbers (19 candidates, 6.2/10 maturity) |
| API | 70+ endpoints — FastAPI with predict, frontier, triage, campaigns, autonomous discovery |
| Tests | 155 tests passing across 4 test suites, zero regressions |
| Compute Cost | $0/month — runs on CPU, open data, no cloud required |
| Property Prediction | GNN models on real crystal structures // formation_energy, band_gap |
| Novelty Detection | 104-dim fingerprint + cosine similarity scoring // known / near_known / novel_candidate |
| Exotic Ranking | Weighted rarity: element IDF + spacegroup rarity + neighbor sparsity |
| Candidate Generation | 3 strategies: element substitution, stoichiometry perturbation, prototype remix |
| Structure Lift | Approximate crystal structures from parent prototypes for GNN evaluation |
| Intelligence Dossiers | Full reports with evidence tagging (known/predicted/proxy/unavailable) |
| Validation Queue | Cheap-first 6-stage ladder: dedup → novelty → proxy → DFT → external → learning |
| Benchmark + Calibration | Empirical error bands per bucket (element count, value range) |
| Structure Analytics | 28 real descriptors: density, volume, lattice, bonds, symmetry, composition stats |
| Campaign Mode | 8 presets: III-V, oxide, group IV, stable novel, valuable, strategic, battery, oxide-exploratory |
| Frontier Engine | Dual-target multi-objective ranking: stability + BG fit + novelty + exotic + structure + priority |
| Validation Packs | Exportable packages with evidence, risk flags, next-step recommendations (JSON/MD/CSV) |
| Pre-DFT Triage | Hard gates + scoring: approved / manual_review / watchlist / reject // 4 profiles |
| Niche Campaigns | Themed discovery: stable_semiconductor, wide_gap_exotic, high_novelty, batch + compare |
| Active Learning | Error hotspot detection, coverage analysis, retraining proposals, corpus expansion planner |
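The pre-DFT triage row in the table above combines hard gates with a score that maps to four decisions. The sketch below illustrates that pattern only. The `Candidate` fields, the gate on formation energy, the score weights, and every threshold are assumptions chosen for illustration, not the engine's actual values.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    formula: str
    formation_energy: float  # eV/atom, GNN-predicted
    novelty: float           # 0..1, higher means further from the corpus


def triage(c: Candidate, max_fe: float = 0.5) -> str:
    """Return approved / manual_review / watchlist / reject (hypothetical rules)."""
    # Hard gate: a clearly unstable prediction is rejected before any scoring.
    if c.formation_energy > max_fe:
        return "reject"
    # Soft score: map formation energy to 0..1 and blend it with novelty.
    stability = min(1.0, max(0.0, (max_fe - c.formation_energy) / 2.0))
    score = 0.6 * stability + 0.4 * c.novelty
    if score >= 0.70:
        return "approved"
    if score >= 0.55:
        return "manual_review"
    if score >= 0.40:
        return "watchlist"
    return "reject"
```

Different profiles could be expressed as different values for `max_fe`, the weights, and the cut-offs, which keeps the decision logic itself unchanged.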
| DFT Validation | Ab-initio computation for candidate confirmation // Phase IV+ |
| Phonon Stability | Vibrational analysis for thermal stability // Phase IV+ |
| Structure Relaxation | M3GNet / CHGNet geometry optimization // Phase IV+ |
| Blockchain PoD | On-chain proof-of-discovery with SOST integration // Phase V+ |
| Remote Sensing | Satellite-based mineral detection + GNN spectral prediction // Vision: viability study complete |
| Community Compute | Marketplace for distributed materials simulation // Future |
Server-side authentication required.
Research data and autonomous discovery console are restricted.
75,993 materials from JARVIS DFT 3D, all with validated crystal structures (CIF), band gap, formation energy, and spacegroup. 100% ML-ready. Ingested via bulk pipeline with structure backfill from jarvis-tools atoms.
| JARVIS DFT 3D | 75,993 materials — band gap, formation energy, spacegroup, full CIF structures |
| Materials Project | Normalizer ready, API key required // multi-source expansion planned |
| AFLOW + COD | Normalizers implemented // additional coverage for future phases |
| Fingerprints | 75,993 precomputed 104-dim vectors — 94 compositional + 10 structural |
Graph neural networks trained on real crystal structures predict formation energy and band gap from CIF input. Training ladder validated at 5K, 10K, 20K, 40K, and 76K samples — 20K CGCNN promoted as production model.
| CGCNN | Formation energy: MAE=0.1528 eV/atom, R²=0.9499 (20K training) // production |
| ALIGNN-Lite | Band gap: MAE=0.3422 eV, R²=0.707 (20K training) // production |
| Inference | Single-sample real-time via /predict endpoint |
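For readers unfamiliar with the two metrics quoted above, the following sketch shows how MAE and R² are computed from paired true and predicted values. The sample numbers are toy data made up for the example, not values from the platform's benchmark.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual variance / total variance."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data (illustrative only).
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.1, 0.9, 2.2, 2.8]
print(f"MAE = {mae(y_true, y_pred):.3f}, R2 = {r2(y_true, y_pred):.3f}")
```

MAE is reported in the units of the target (eV/atom or eV here), while R² is unitless, which is why the two models' MAE values are not directly comparable with each other.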
Generates plausible candidates from corpus parents via element substitution, stoichiometry perturbation, and prototype remix. Filters novelty-first, lifts approximate structures from parent prototypes, and evaluates with real GNN prediction.
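Element substitution, the first of the strategies named above, can be sketched on plain composition dictionaries. The substitution table below is a hypothetical example using same-group swaps (group 13 for Ga, group 15 for As); the engine's real substitution rules are not published on this page.

```python
# Hypothetical same-group swap table, for illustration only.
SUBSTITUTIONS = {"Ga": ["Al", "In"], "As": ["P", "Sb"]}


def substitute(parent: dict[str, int]) -> list[dict[str, int]]:
    """Generate children by replacing one element at a time in the parent."""
    children = []
    for element in parent:
        for replacement in SUBSTITUTIONS.get(element, []):
            if replacement in parent:
                continue  # skip swaps that would merge two sites
            child = {replacement if el == element else el: n
                     for el, n in parent.items()}
            children.append(child)
    return children


def formula(comp: dict[str, int]) -> str:
    """Render a composition such as {'Ga': 1, 'As': 1} as 'GaAs'."""
    return "".join(f"{el}{n if n != 1 else ''}" for el, n in comp.items())


print([formula(c) for c in substitute({"Ga": 1, "As": 1})])
# prints ['AlAs', 'InAs', 'GaP', 'GaSb']
```

Each child keeps the parent's stoichiometry, which is what lets a later step reuse the parent's crystal structure as a starting prototype.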
| Novelty Filter | Cosine similarity on 104-dim fingerprints: known / near_known / novel_candidate |
| Exotic Ranking | Weighted: 40% novelty + 20% element rarity + 15% structure rarity + 25% sparsity |
| Structure Lift | 95% lift rate — element swap on parent structures, pymatgen validated |
| Campaigns | 5 presets: exotic hunt, stability first, band gap target, T/P watchlist, novelty hunt |
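The novelty filter and exotic ranking in the table above can be sketched as follows. The weights (40/20/15/25) are the ones stated in the table. The similarity cut-offs of 0.98 and 0.90 are assumptions for illustration, and the short vectors stand in for the 104-dimensional fingerprints.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two fingerprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def novelty_band(candidate, corpus, known=0.98, near=0.90):
    """Label a candidate by its closest corpus neighbor (thresholds assumed)."""
    best = max((cosine_similarity(candidate, fp) for fp in corpus), default=0.0)
    if best >= known:
        return "known"
    if best >= near:
        return "near_known"
    return "novel_candidate"


def exotic_score(novelty, element_rarity, structure_rarity, sparsity):
    """Weighted sum using the weights listed in the table; inputs in 0..1."""
    return (0.40 * novelty + 0.20 * element_rarity
            + 0.15 * structure_rarity + 0.25 * sparsity)
```

Because the label depends only on the nearest neighbor, a candidate is "novel" only relative to the fingerprints that were actually ingested.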
Every material or candidate gets a comprehensive dossier with evidence-tagged properties, application hypotheses, comparison tables, calibrated confidence, and validation priority.
| Evidence | Every property tagged: known | predicted | proxy | unavailable |
| Applications | Rule-based: semiconductor, PV, thermoelectric, catalytic, magnetic, structural, high-pressure |
| Calibration | Empirical confidence bands from benchmark MAE per element-count and value-range bucket |
| Validation Queue | 6-stage cheap-first ladder with ROI scoring and dedup |
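The calibration row above describes empirical confidence bands derived from benchmark MAE per bucket. One plausible reading, sketched here under the assumption that the band is simply the prediction plus or minus the bucket's MAE, groups benchmark errors by element count. The function names and the fallback value are hypothetical.

```python
from collections import defaultdict


def bucket_mae(records):
    """records: iterable of (n_elements, true_value, predicted_value)."""
    errors = defaultdict(list)
    for n_elements, true_value, predicted in records:
        errors[n_elements].append(abs(true_value - predicted))
    return {n: sum(errs) / len(errs) for n, errs in errors.items()}


def confidence_band(prediction, n_elements, maes, fallback=0.5):
    """Return (low, high) using the bucket MAE, or a fallback half-width."""
    half_width = maes.get(n_elements, fallback)
    return prediction - half_width, prediction + half_width


# Toy benchmark rows (illustrative only): binaries and one ternary.
maes = bucket_mae([(2, -1.0, -0.9), (2, -2.0, -2.3), (3, 0.5, 0.0)])
print(confidence_band(-1.5, 2, maes))
```

A value-range bucket would work the same way, with the bucket key computed from the predicted value instead of the element count.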
Records predictions vs observations, identifies model failures by property and element family, tracks promising chemical regions, and builds retraining queues. Evidence bridge imports external data (JSON/CSV). Auto-links observations to predictions for feedback. Currently a scaffold — active retraining loop not yet triggered.
| Phase | Scope | Status |
|---|---|---|
| Phase I | Data foundation: 4-source ingestion, canonical schema, SQLite storage, audit/export | Complete |
| Phase II | Baseline ML: CGCNN + ALIGNN-Lite training, /predict, /similar, model registry | Complete |
| Phase III.A | Novelty filter: 104-dim fingerprints, exotic ranking, cosine similarity bands | Complete |
| Phase III.B | Shortlist engine: configurable criteria, T/P proxy screening, decision bands | Complete |
| Phase III.C | Corpus scale: 75,993 materials, persistent fingerprints, fast retrieval, campaigns | Complete |
| Phase III.D | Candidate generation: element substitution, stoichiometry perturbation, prototype remix | Complete |
| Phase III.E | Structure lift + evaluation: prototype structures, real GNN prediction on candidates | Complete |
| Phase III.F | Material Intelligence: dossiers, evidence tagging, application classification, validation priority | Complete |
| Phase III.G | Validation queue + learning scaffold: cheap-first ladder, feedback memory, ROI scoring | Complete |
| Phase III.H | Evidence bridge + benchmark + calibration: empirical confidence bands, evidence-feedback linking | Complete |
| Phase III.I-J | Structure analytics (28 descriptors) + corpus backfill (100% CIF coverage) | Complete |
| Phase IV.A | Scaled retraining: formation energy ladder 5K→76K, CGCNN 20K promoted (MAE=0.1528) | Complete |
| Phase IV.B | Scaled retraining: band gap ladder, ALIGNN-Lite 20K promoted (MAE=0.3422) | Complete |
| Phase IV.C | Dual-target frontier engine: multi-objective ranking with 4 profiles | Complete |
| Phase IV.D | Frontier-to-validation bridge: exportable packs with risk flags + next steps | Complete |
| Phase IV.E | Pre-DFT triage gate: hard gates, cheap-first 4-profile decision engine | Complete |
| Phase IV.F | Niche discovery campaigns: themed batch searches with cross-campaign comparison | Complete |
| Phase IV.G–U | Active learning, engine stabilization, public demo, operational acceptance (v3.2.0-RC1) | Complete |
| Phase V | Direct GNN inference: CGCNN/ALIGNN-Lite forward pass on lifted candidate structures | Complete |
| Phase V.B | GNN integration into autonomous pipeline, known-material penalty, validation queue hardening | Complete |
| Phase V.C | Lift expansion (doping/mixed), proxy suppression, novel direct GNN path, quality uplift | Complete |
| Phase VI | Public autonomous discovery dashboard, retro CRT console, evidence-first UX | Complete |
| Phase VII | Uncertainty-aware discovery: heuristic uncertainty, validation readiness, DFT handoff packs | Complete |
| Phase VIII | Validation bridge: lifecycle tracking, result ingestion, reconciliation, closed learning loop | Complete |
| Phase IX | Scientific operations: batch validation, evidence accumulation, longitudinal reporting | Complete |
| Phase X | Live data wiring, single source of truth, interactive campaign/candidate inspectors | Complete |
| Phase XI | Calibration intelligence, autonomy governance, chemistry-aware scoring, campaign intelligence, validation economics | Complete |
| Phase XII.B | Ion-separation discovery: functional scoring (7 signals), membrane/lithium/desalination awareness, 5 functional campaign profiles | Complete |
| Phase XIII | Relaxation bridge: structure repair heuristics, relaxation readiness triage, compute backend placeholders (M3GNet/CHGNet/DFT) | Complete |
| Phase XIV | Functional intersection discovery: cross-screening water/lithium/membrane, multi-function candidate detection, function-first prioritization | Complete |
| Phase XV | Full-corpus intersection scan (35,589 formulas): 11,339 multi-function candidates, real shortlists for lithium/water/membrane sweet spot | Current |
| Phase 30 | Consensus multi-track ranking, PV risk flags (6 physics-informed), DFT triage queues (exploit/explore/cross-track). 21 candidates ranked → 11 DFT-queued. | Complete |
| Phase 31 | DFT queue validation: 64% rediscovery rate detected → novelty filter needed. Expanded to 7 tracks. | Complete |
| Phase 32 | Novelty-aware scoring: known-material penalty (42 entries). Rediscovery rate 64% → 0%. Novel candidates promoted. | Complete |
| Phase 34 | ML surrogate prescreen (CHGNet): 17 candidates in 40 seconds. 64 DFT-hours saved. 4 unstable candidates eliminated before expensive computation. | Complete |
| Phase 35 | First DFT validation completed. Top candidate converged: ferrimagnetic oxide, stable phase confirmed. Reclassified from photovoltaic to functional catalyst/electrode. Full pipeline validated end-to-end. | Complete |
| Phase 36+ | DOS/band structure analysis, FM vs AFM comparison, convex hull verification, next candidate DFT queue | In progress |
| Phase XIX.B–C | Economic-functional expansion: abundance/cost scoring (USGS crustal ppm + $/kg), toxicity penalty, PGM detector, 6 mission profiles (catalysis, PV, ion separation, CO₂), forbidden/preferred element filters, economic composite formula | In Development |
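The PGM detector and forbidden/preferred element filters mentioned in the last row can be illustrated with a small formula scanner. This is a sketch, not the platform's implementation: the function names are hypothetical, and the PGM set adds Os to the five metals this page names elsewhere.

```python
import re

# Platinum-group metals; Os is included here as the sixth PGM.
PGM = frozenset({"Pt", "Pd", "Rh", "Ir", "Ru", "Os"})


def elements(formula: str) -> set[str]:
    """Extract element symbols from a simple formula such as 'LaCoO3'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def contains_pgm(formula: str) -> bool:
    """True if the formula contains any platinum-group metal."""
    return bool(elements(formula) & PGM)


def passes_filters(formula: str, forbidden=frozenset(), preferred=frozenset()) -> bool:
    """Reject any forbidden element; if preferred is set, require at least one."""
    found = elements(formula)
    if found & set(forbidden):
        return False
    return not preferred or bool(found & set(preferred))
```

A PGM-free catalyst campaign could then call `passes_filters(formula, forbidden=PGM)` before any scoring takes place.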
| Initiative | Description | Status |
|---|---|---|
| Database expansion | 76K → 500K+ materials. Add Materials Project, OQMD, NOMAD, Materials Cloud, Open Catalyst Project APIs | Planned |
| PGM Replacement Engine | Find cheap alternatives to Pt/Pd/Rh/Ir/Ru. Target families: perovskites, spinels, Fe–N–C, sulfides, nitrides, carbides, phosphides. Forbidden element filters + PGM content detector operational | In Development |
| Multi-property GNN | Expand from 2 to 12+ predicted properties: stability, conductivity, magnetism, toxicity, cost, abundance, catalytic activity, corrosion resistance | Planned |
| MLIP validation layer | CHGNet/M3GNet/MACE relaxation before DFT. Pyramid: chemical filter → GNN → MLIP → DFT (only top 0.1%) | Partially implemented (CHGNet) |
| Structure generator v2 | Perovskite templates, 2D slicing, high-entropy alloys, single-atom catalysts, vacancy defects, controlled doping | Planned |
| Mission profiles | 6 targeted campaigns implemented: PGM-free catalyst, water splitting, low-cost PV, Li⁺ selective brine, desalination membrane, CO₂ capture. MissionProfile dataclass with target properties, element filters, weight tuning | In Development |
| Abundance & cost scoring | Crustal abundance (USGS/CRC ppm), cost proxy ($/kg), toxicity penalty, PGM detection, abundant replacement ratio. Log-scale scoring with economic composite formula | In Development |
| Uncertainty quantification | Ensemble + MC dropout. Uncertain → validation queue, not claimed discovery | Planned |
Goal: evolve from a materials generator into an economic-functional discovery system that finds materials that are cheap, abundant, stable, and useful. All computation is to run at $0 or near-zero cost using free-tier cloud, local CPU, and a future SOST Proof-of-Useful-Computation network.
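The log-scale abundance scoring listed in the initiatives table can be sketched as an atom-fraction-weighted mean of log10 crustal abundance, rescaled to 0..1. The scaling bounds and the function names are assumptions, the abundance figures are approximate reference values included only for illustration, and the platform's economic composite formula is not reproduced here.

```python
import math
import re

# Approximate crustal abundances in ppm (illustrative; verify against USGS/CRC tables).
CRUSTAL_PPM = {"O": 461000.0, "Si": 282000.0, "Fe": 56300.0, "Ti": 5650.0, "Pt": 0.005}


def parse_formula(formula: str) -> dict[str, float]:
    """Parse a simple formula such as 'Fe2O3' into {'Fe': 2.0, 'O': 3.0}."""
    comp: dict[str, float] = {}
    for el, n in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        comp[el] = comp.get(el, 0.0) + (float(n) if n else 1.0)
    return comp


def abundance_score(formula: str, lo_ppm: float = 1e-3, hi_ppm: float = 1e6) -> float:
    """Map the weighted mean log10(ppm) onto 0..1 (bounds are assumptions)."""
    comp = parse_formula(formula)
    total = sum(comp.values())
    # Raises KeyError for elements missing from the illustrative table.
    mean_log = sum(n / total * math.log10(CRUSTAL_PPM[el]) for el, n in comp.items())
    span = math.log10(hi_ppm) - math.log10(lo_ppm)
    return min(1.0, max(0.0, (mean_log - math.log10(lo_ppm)) / span))


print(round(abundance_score("Fe2O3"), 3), round(abundance_score("PtO2"), 3))
```

Working in log space keeps a scarce element such as Pt from being hidden by abundant oxygen, since one rare element pulls the mean down by several orders of magnitude.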
The working layer that turns free-form scientific text into a structured, audited research pipeline: claims → evidence → multi-agent reasoning → falsification-first experiment plan → feedback → campaigns → knowledge graph → dossiers → review queue → autonomous orchestration. Local-first, deterministic, append-only, offline by default. Zero paid API dependencies. Zero mandatory LLM calls. See full breakdown on the SOST AI Engine page, section 08.
GeaSpirit is the unified platform combining computational materials science with remote geospatial detection. The Materials Engine predicts properties and generates novel candidates. A future Remote Sensing module will detect mineral signatures from satellite imagery. Together they enable a closed discovery cycle:
PREDICT → PRIORITIZE → SEARCH → VALIDATE → REGISTER
The Materials Engine has identified non-toxic, earth-abundant semiconductor families with direct band gaps and ultralight carrier effective masses that outperform those of established photovoltaic materials. These are potentially competitive thin-film solar cell candidates at raw material costs comparable to silicon. Multiple compositions within the same crystal family independently exhibit direct gaps, suggesting a structurally robust electronic property rather than a statistical fluke.
In parallel, the platform is screening earth-abundant oxide catalysts (perovskites, spinels, mixed-metal oxides) as replacements for platinum-group metals (Pt, Pd, Rh, at $30K–140K/kg) in exhaust catalysis, green chemistry, and electrochemical applications. Candidates are scored across thermal stability, redox activity, cost, and supply chain risk.
Both research tracks use the full autonomous discovery pipeline: corpus screening of 77,000+ materials, GNN-based property prediction, CHGNet structural validation, economic intelligence, and DFT verification for top candidates.
Specific compositions, DFT results, and candidate rankings are maintained in restricted research documentation accessible with authentication.
The Materials Engine has been used to screen and rank structural material systems for large-scale energy infrastructure applications, including environments with extreme hydrostatic pressure, permanent seawater immersion, and multi-decade service life requirements.
Using the multi-criteria scoring engine (11 dimensions, 7 penalty functions) and the manual registry of 20+ engineering material families, the platform identified hybrid material systems that outperform conventional Portland cement concrete in marine durability by eliminating known chloride-attack and steel-corrosion failure modes.
The screening also produced technoeconomic models, LCOS calculations, and site selection frameworks, demonstrating that the Materials Engine extends beyond crystalline materials discovery into full system-level engineering assessment.
Details are maintained in internal research documentation. This line of work demonstrates the platform’s capability for applied industrial problems beyond its original photovoltaic and catalysis verticals.
No existing system combines computational materials prediction with satellite-based mineral detection. KoBold Metals ($3B) uses AI + geophysics but has no GNN materials engine. SOST’s unique angle: predict what materials should exist computationally, then search for them in real satellite imagery. The Materials Engine already discovers exotic candidates — a future remote sensing module would tell you where to look for them on Earth.
| Stage | Capability | Status |
|---|---|---|
| Stage 1 | Materials Engine: corpus, GNN prediction, novelty, generation, frontier, triage, campaigns | Operational |
| Stage 2 | Remote sensing: Sentinel-2 mineral alteration maps for arid regions ($0) | Viability OK |
| Stage 3 | Weak integration: Materials Engine cross-references detected mineral classes | Planned |
| Stage 4 | GNN spectral prediction: predict signatures → search in satellite imagery | Research |
| Stage 5 | Proof of Discovery on SOST blockchain + geological data marketplace | Future |
Every phase uses free data and open-source tools first. Materials from JARVIS/MP/AFLOW/COD ($0). Satellite data from Sentinel-2/EMIT/Sentinel-1 ($0). Training on Google Colab ($0). Each phase must produce something useful and sellable before the next begins. No cloud spend until revenue justifies it. Current Materials Engine runs on a single VPS at $0/month compute cost.
| Focus | Novel, unknown, exotic, and under-explored materials — not a generic prediction service |
| Method | Novelty-first discovery: generate → filter by novelty → frontier rank → triage → validate |
| Learning | Active learning: detect where models fail, expand corpus in sparse regions, retrain selectively |
| Cost | Near-zero until revenue: open data, open models, CPU-first, cheap-first validation ladder |
| Materials Engine | Predictions are GNN baseline estimates (MAE ~0.15–0.34 eV). NOT DFT-validated. NOT experimentally confirmed. |
| Remote Sensing | Cannot see directly underground. Surface mineral mapping only. Subsurface inference requires alteration halo proxies. |
| Integration | GNN spectral prediction from crystal structure is a research challenge. Training data limited (~300–400 minerals with both a crystal structure and a measured spectrum). |
| Blockchain | Proof of Discovery is a concept. Does not replace physical validation or peer review. |
| Novelty | Assessed relative to ingested corpus only (76,193 materials), not all published science. A “novel” candidate may exist in databases we have not ingested. |
| Proxy | 34.5% of autonomous candidates rely on neighbor-proxy estimates rather than direct GNN. Proxy carries higher uncertainty. |
| Milestone | Detail |
|---|---|
| v2.1.0 | Multi-source corpus expansion + dedup foundation: 6 sources registered, staging engine, MP simulation (22% unique), expansion recommendation |
| v2.0.0 | Active learning orchestrator: error hotspots (3 found), coverage analysis (89 elements, 213 SGs), retraining proposals, corpus expansion planner |
| Niche Campaigns | 5 themed discovery presets with cross-campaign comparison: stable_semiconductor, wide_gap_exotic, high_novelty, balanced, generated_review |
| Pre-DFT Triage | 4-profile decision gate: strict/balanced/exotic/semiconductor — hard gates + reason codes + next-action recommendations |
| Frontier Engine | Dual-target multi-objective ranking: FE stability + BG fit + novelty + exotic + structure quality + validation priority |
| Band Gap Model | ALIGNN-Lite 20K promoted: MAE=0.3422 eV, R²=0.707 (14% improvement over 2K baseline) |
| Training Ladder | Both targets: 5 rungs (5K→76K), 20K optimal. CGCNN wins FE, ALIGNN-Lite wins BG |
| Platform Strategy | GeaSpirit unified platform strategy + remote sensing viability report completed (docs/) |
| Phase V | Direct GNN inference: CGCNN forward pass on crystal structures. Real property prediction for autonomous candidates. |
| Autonomous Discovery | Iterative campaign engine with 8 profiles, error learning, validation queue (5 tiers), structure lift pipeline. 28/28 tests passing. |
| Material Mixer | 4 generation strategies: element substitution, single-site doping, mixed-parent, cross-substitution. Dual-output reports (technical + plain language). |
| Phase II Hardening | Chemical plausibility filters, charge balance heuristic, acceptance rate 61% → 27%. Known material sanity: 11/11 pass. |
| Phase V.C | Lift expansion (doping/mixed-parent), proxy suppression (60%→35%), novel direct GNN path (3%→22%), proxy cap at 0.55. 8 campaigns. |
| 47 Test Files | 1,073 test functions across 47 files covering full pipeline: schema through autonomous discovery (Phase V.C) |
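The charge balance heuristic mentioned in the Phase II Hardening row can be illustrated with a brute-force check over common oxidation states. This sketch assumes one oxidation state per element and a small hand-written state table, both simplifications that are not taken from the engine itself.

```python
from itertools import product

# Common oxidation states for a few elements (illustrative subset).
OXIDATION_STATES = {
    "Na": [1], "Ba": [2], "Cl": [-1], "O": [-2],
    "Fe": [2, 3], "Ti": [2, 3, 4],
}


def is_charge_balanced(composition: dict[str, int]) -> bool:
    """True if some assignment of one state per element sums to zero charge."""
    els = list(composition)
    # Raises KeyError for elements missing from the illustrative table.
    options = [OXIDATION_STATES[el] for el in els]
    return any(
        sum(state * composition[el] for state, el in zip(states, els)) == 0
        for states in product(*options)
    )


print(is_charge_balanced({"Ba": 1, "Ti": 1, "O": 3}))  # prints True
print(is_charge_balanced({"Na": 1, "Cl": 2}))          # prints False
```

Because each element gets a single state, mixed-valence compounds such as Fe3O4 fail this check, which is one reason a rule like this is a heuristic filter rather than a hard proof of implausibility.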
Access to the Materials Discovery Engine is under study. Any future access model, if implemented, would be denominated in USD-equivalent terms to preserve stability and usability regardless of SOST market price fluctuations.
The best access method is still pending study and validation. The final access structure, technical route, and user model have not been fixed yet. Details will be published when the model has been validated and approved.
| Denomination | USD-equivalent (converted to SOST at market rate) |
| Model status | Under review — not yet finalized |
| Core principle | The algorithm is free. Access mechanism for spam prevention only. |
The Computational Materials Discovery Platform is an operational research system within the SOST ecosystem. It uses dedicated ML models (CGCNN, ALIGNN-Lite) and heuristic algorithms, not the ConvergenceX mining engine. Current predictions are baseline GNN estimates (MAE ~0.15 eV/atom for formation energy) and should not be treated as experimental measurements. Novelty assessment is relative to the ingested corpus only, not all published scientific literature. The platform does not yet perform DFT validation, phonon stability calculations, or experimental verification. Generated candidates are heuristic hypotheses requiring further computational or experimental validation. The code is open source (MIT License, fully open source