Operationally accepted research platform for computational materials science: 76,193 materials, dual-target GNN property prediction, an autonomous discovery engine with direct inference on novel candidates, 19 campaign profiles (including water/lithium/membrane functional verticals), functional intersection discovery, and a validation queue — all at zero cost on CPU.
An autonomous computational platform that predicts properties of crystalline materials from their structure — without requiring physical synthesis or experimental measurement. It accelerates materials discovery by screening thousands of candidates computationally before any lab work begins.
| Materials corpus | 76,193 validated crystalline materials (JARVIS DFT + AFLOW) |
| Property prediction | Formation energy and electronic band gap from crystal structure |
| Candidate generation | Autonomous discovery via element substitution, doping, and mixed-parent strategies |
| Campaign profiles | 19 profiles including battery, semiconductor, strategic, water/lithium/membrane |
| Functional discovery | Ion separation, desalination, lithium recovery, membrane candidate detection |
| Validation bridge | Prediction → observation lifecycle with reconciliation and learning |
| Chemistry awareness | Risk labeling (familiar/plausible/unusual/risky) with family trust calibration |
| Cost per month | $0 — runs entirely on CPU |
The Materials Engine demonstrates that the SOST ecosystem extends beyond cryptocurrency into real scientific utility. Materials discoveries can be registered on-chain as proof-of-discovery, with access controlled via SOST token payments. This creates a knowledge marketplace where scientific computation has real economic value backed by gold reserves.
Model architecture, neural network details, training methodology, hyperparameters, and prediction accuracy metrics are available only in the restricted technical documentation. Public API endpoints provide predictions without exposing the underlying models.
| Target | Model | MAE | R² | Dataset |
|---|---|---|---|---|
| Formation Energy | GNN (restricted) | validated | restricted | 76,193 |
| Band Gap | GNN (restricted) | validated | restricted | 76,193 |
```bash
# Health check
curl https://sostcore.com/api/materials/health

# Search by formula
curl "https://sostcore.com/api/materials/search?formula=GaAs"

# Corpus stats
curl https://sostcore.com/api/materials/stats

# Top exotic materials
curl https://sostcore.com/api/materials/exotic/ranking/5

# Campaign presets
curl https://sostcore.com/api/materials/campaigns/presets
```
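The same endpoints can be wrapped in a minimal Python client. This is a sketch only: it assumes each endpoint returns JSON, and the helper names (`build_url`, `get_json`) are illustrative, not part of the platform's published SDK.

```python
# Minimal client sketch for the public endpoints shown above.
# Assumption: each endpoint returns a JSON body; response fields are not documented here.
import json
import urllib.parse
import urllib.request

BASE = "https://sostcore.com/api/materials"

def build_url(path: str, **params: str) -> str:
    """Compose an endpoint URL, e.g. build_url('search', formula='GaAs')."""
    url = f"{BASE}/{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url

def get_json(path: str, **params: str) -> dict:
    """Fetch an endpoint and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(build_url(path, **params), timeout=10) as resp:
        return json.loads(resp.read().decode())

if __name__ == "__main__":
    print(get_json("health"))
    print(get_json("search", formula="GaAs"))
```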
The SOST Computational Materials Discovery Platform is an operational research system within the SOST ecosystem. It ingests data from public materials databases (JARVIS, Materials Project, AFLOW, COD), trains graph neural networks for property prediction, generates and evaluates novel material candidates, and produces comprehensive intelligence dossiers — all at near-zero compute cost on CPU.
| Corpus | 76,193 materials — JARVIS DFT 3D (75,993) + AFLOW (200), 100% with validated CIF structures |
| ML Models | Dual-target GNN prediction (formation energy + band gap) — accuracy metrics restricted |
| Discovery | Autonomous engine — 8 campaign profiles, direct GNN on lifted structures, validation queue |
| API | 145 endpoints — FastAPI with predict, frontier, triage, campaigns, autonomous discovery |
| Tests | 1,073 tests passing across 47 test files |
| Compute Cost | $0/month — CPU-only, open data, no cloud required |
| Property Prediction | GNN models on real crystal structures // formation_energy, band_gap |
| Novelty Detection | 104-dim fingerprint + cosine similarity scoring // known / near_known / novel_candidate |
| Exotic Ranking | Weighted rarity: element IDF + spacegroup rarity + neighbor sparsity |
| Candidate Generation | 3 strategies: element substitution, stoichiometry perturbation, prototype remix |
| Structure Lift | Approximate crystal structures from parent prototypes for GNN evaluation |
| Intelligence Dossiers | Full reports with evidence tagging (known/predicted/proxy/unavailable) |
| Validation Queue | Cheap-first 6-stage ladder: dedup → novelty → proxy → DFT → external → learning |
| Benchmark + Calibration | Empirical error bands per bucket (element count, value range) |
| Structure Analytics | 28 real descriptors: density, volume, lattice, bonds, symmetry, composition stats |
| Campaign Mode | 8 presets: III-V, oxide, group IV, stable novel, valuable, strategic, battery, oxide-exploratory |
| Frontier Engine | Dual-target multiobjective ranking: stability + BG fit + novelty + exotic + structure + priority |
| Validation Packs | Exportable packages with evidence, risk flags, next-step recommendations (JSON/MD/CSV) |
| Pre-DFT Triage | Hard gates + scoring: approved / manual_review / watchlist / reject // 4 profiles |
| Niche Campaigns | Themed discovery: stable_semiconductor, wide_gap_exotic, high_novelty, batch + compare |
| Active Learning | Error hotspot detection, coverage analysis, retraining proposals, corpus expansion planner |
| DFT Validation | Ab-initio computation for candidate confirmation // Phase IV+ |
| Phonon Stability | Vibrational analysis for thermal stability // Phase IV+ |
| Structure Relaxation | M3GNet / CHGNet geometry optimization // Phase IV+ |
| Blockchain PoD | On-chain proof-of-discovery with SOST integration // Phase V+ |
| Remote Sensing | Satellite-based mineral detection + GNN spectral prediction // Vision (viability study complete) |
| Community Compute | Marketplace for distributed materials simulation // Future |
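The frontier-ranking and triage rows above can be sketched as a small scoring pipeline: a weighted sum of normalized signals, then a hard-gate-first decision. The weights and thresholds below are illustrative assumptions, not the platform's actual profile values.

```python
# Sketch of a dual-target frontier score feeding a triage decision.
# Signal names mirror the Frontier Engine row; WEIGHTS and the band
# thresholds are illustrative assumptions for a hypothetical profile.

WEIGHTS = {
    "stability": 0.25, "bg_fit": 0.20, "novelty": 0.20,
    "exotic": 0.15, "structure": 0.10, "priority": 0.10,
}

def frontier_score(signals: dict) -> float:
    """Weighted sum of signals, each assumed normalized to [0, 1]."""
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

def triage(signals: dict, charge_balanced: bool = True) -> str:
    """Hard gate first, then score bands (thresholds are assumptions)."""
    if not charge_balanced:          # hard gate: failure rejects outright
        return "reject"
    s = frontier_score(signals)
    if s >= 0.70:
        return "approved"
    if s >= 0.50:
        return "manual_review"
    if s >= 0.30:
        return "watchlist"
    return "reject"
```

A candidate scoring 0.9 on stability and 0.8 on band-gap fit but middling elsewhere lands just above the approval band under these assumed weights; dropping the charge-balance gate rejects it regardless of score.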
Server-side authentication required.
Research data and autonomous discovery console are restricted.
75,993 materials from JARVIS DFT 3D, all with validated crystal structures (CIF), band gap, formation energy, and spacegroup. 100% ML-ready. Ingested via bulk pipeline with structure backfill from jarvis-tools atoms.
| JARVIS DFT 3D | 75,993 materials — band gap, formation energy, spacegroup, full CIF structures |
| Materials Project | Normalizer ready, API key required // multi-source expansion planned |
| AFLOW + COD | Normalizers implemented // additional coverage for future phases |
| Fingerprints | 75,993 precomputed 104-dim vectors — 94 compositional + 10 structural |
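A fingerprint in the spirit of the 94-compositional + 10-structural layout above can be sketched as a fractional-composition vector over an element vocabulary, concatenated with structural descriptors. The truncated vocabulary, the tiny formula parser, and the descriptor slots here are assumptions for demonstration only.

```python
# Illustrative composition-based fingerprint sketch.
# Assumptions: the element vocabulary is truncated (the real one spans 94
# compositional dimensions), and structural descriptors are passed in as-is.
import re

ELEMENTS = ["H", "Li", "C", "N", "O", "Na", "Al", "Si", "P", "S",
            "Ga", "Ge", "As", "In", "Sb"]  # truncated vocabulary

def parse_formula(formula: str) -> dict:
    """Tiny parser: element symbols followed by optional integer counts."""
    counts = {}
    for sym, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[sym] = counts.get(sym, 0) + (int(n) if n else 1)
    return counts

def fingerprint(formula: str, structural: list = None) -> list:
    """Fractional composition over the vocabulary + structural descriptors."""
    counts = parse_formula(formula)
    total = sum(counts.values()) or 1
    comp = [counts.get(el, 0) / total for el in ELEMENTS]
    return comp + (structural or [0.0] * 10)
```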
Graph neural networks trained on real crystal structures predict formation energy and band gap from CIF input. Training ladder validated at 5K, 10K, 20K, 40K, and 76K samples — 20K CGCNN promoted as production model.
| CGCNN | Formation energy: MAE=0.1528 eV/atom, R²=0.9499 (20K training) // production |
| ALIGNN-Lite | Band gap: MAE=0.3422 eV, R²=0.707 (20K training) // production |
| Inference | Single-sample real-time via /predict endpoint |
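The two metrics reported throughout the training ladder (MAE and R²) follow the standard definitions; the sketch below shows exactly what those numbers mean when comparing predicted against reference values.

```python
# Standard definitions of the two ladder metrics:
# mean absolute error and the coefficient of determination R².

def mae(y_true: list, y_pred: list) -> float:
    """Mean absolute error: average |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true: list, y_pred: list) -> float:
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```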
Generates plausible candidates from corpus parents via element substitution, stoichiometry perturbation, and prototype remix. Filters novelty-first, lifts approximate structures from parent prototypes, and evaluates with real GNN prediction.
| Novelty Filter | Cosine similarity on 104-dim fingerprints: known / near_known / novel_candidate |
| Exotic Ranking | Weighted: 40% novelty + 20% element rarity + 15% structure rarity + 25% sparsity |
| Structure Lift | 95% lift rate — element swap on parent structures, pymatgen validated |
| Campaigns | 5 presets: exotic hunt, stability first, band gap target, T/P watchlist, novelty hunt |
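The generate → novelty-filter → exotic-rank path above can be sketched end to end: substitute an element in a parent composition, band the candidate by its maximum cosine similarity to corpus fingerprints, and combine rarity signals with the 40/20/15/25 weights from the table. The band thresholds are illustrative assumptions.

```python
# Sketch of element substitution, cosine-similarity novelty banding, and
# weighted exotic scoring. Thresholds (0.98 / 0.90) are assumptions; the
# 40/20/15/25 weights come from the Exotic Ranking row above.
import math

def substitute(parent: dict, old: str, new: str) -> dict:
    """Element-substitution strategy: swap one element in a parent composition."""
    return {new if el == old else el: n for el, n in parent.items()}

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def novelty_band(candidate_fp: list, corpus_fps: list) -> str:
    """Classify by max cosine similarity to the corpus (thresholds assumed)."""
    best = max((cosine(candidate_fp, fp) for fp in corpus_fps), default=0.0)
    if best >= 0.98:
        return "known"
    if best >= 0.90:
        return "near_known"
    return "novel_candidate"

def exotic_score(novelty: float, elem_rarity: float,
                 struct_rarity: float, sparsity: float) -> float:
    """Weighted rarity: 40% novelty + 20% element + 15% structure + 25% sparsity."""
    return (0.40 * novelty + 0.20 * elem_rarity
            + 0.15 * struct_rarity + 0.25 * sparsity)
```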
Every material or candidate gets a comprehensive dossier with evidence-tagged properties, application hypotheses, comparison tables, calibrated confidence, and validation priority.
| Evidence | Every property tagged: known / predicted / proxy / unavailable |
| Applications | Rule-based: semiconductor, PV, thermoelectric, catalytic, magnetic, structural, high-pressure |
| Calibration | Empirical confidence bands from benchmark MAE per element-count and value-range bucket |
| Validation Queue | 6-stage cheap-first ladder with ROI scoring and dedup |
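The calibration row above can be sketched as follows: group benchmark residuals into buckets and use each bucket's empirical MAE as a confidence band around new predictions. The bucketing here (by element count only, with an assumed binary/ternary+ split) is a simplification; the platform also buckets by value range.

```python
# Sketch of empirical calibration bands: per-bucket MAE from benchmark
# residuals, reported as prediction ± band. Bucket edges are assumptions.
from collections import defaultdict

def calibration_bands(records: list) -> dict:
    """records: (n_elements, y_true, y_pred) triples → {bucket: MAE band}."""
    buckets = defaultdict(list)
    for n_el, y_true, y_pred in records:
        key = "binary" if n_el <= 2 else "ternary+"   # assumed bucketing
        buckets[key].append(abs(y_true - y_pred))
    return {k: sum(v) / len(v) for k, v in buckets.items()}

def confidence_interval(pred: float, band: float) -> tuple:
    """Report a prediction ± the empirical band for its bucket."""
    return (pred - band, pred + band)
```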
Records predictions vs observations, identifies model failures by property and element family, tracks promising chemical regions, and builds retraining queues. Evidence bridge imports external data (JSON/CSV). Auto-links observations to predictions for feedback. Currently a scaffold — active retraining loop not yet triggered.
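The prediction → observation reconciliation described above can be sketched as a join on formula and property followed by a per-family error summary. The record shapes, join key, and family mapping here are assumptions for illustration, not the platform's actual schema.

```python
# Sketch of reconciliation: auto-link observations to predictions by
# (formula, property), then summarize absolute error per chemical family.
# Record fields and the family map are illustrative assumptions.

def reconcile(predictions: list, observations: list) -> list:
    """Join on (formula, prop); each item: {'formula', 'prop', 'value'}."""
    index = {(p["formula"], p["prop"]): p["value"] for p in predictions}
    linked = []
    for obs in observations:
        key = (obs["formula"], obs["prop"])
        if key in index:
            linked.append({"formula": obs["formula"], "prop": obs["prop"],
                           "error": abs(index[key] - obs["value"])})
    return linked

def error_by_family(linked: list, family_of: dict) -> dict:
    """Mean absolute error per chemical family (family map supplied by caller)."""
    sums, counts = {}, {}
    for row in linked:
        fam = family_of.get(row["formula"], "other")
        sums[fam] = sums.get(fam, 0.0) + row["error"]
        counts[fam] = counts.get(fam, 0) + 1
    return {f: sums[f] / counts[f] for f in sums}
```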
| Phase | Scope | Status |
|---|---|---|
| Phase I | Data foundation: 4-source ingestion, canonical schema, SQLite storage, audit/export | Complete |
| Phase II | Baseline ML: CGCNN + ALIGNN-Lite training, /predict, /similar, model registry | Complete |
| Phase III.A | Novelty filter: 104-dim fingerprints, exotic ranking, cosine similarity bands | Complete |
| Phase III.B | Shortlist engine: configurable criteria, T/P proxy screening, decision bands | Complete |
| Phase III.C | Corpus scale: 75,993 materials, persistent fingerprints, fast retrieval, campaigns | Complete |
| Phase III.D | Candidate generation: element substitution, stoichiometry perturbation, prototype remix | Complete |
| Phase III.E | Structure lift + evaluation: prototype structures, real GNN prediction on candidates | Complete |
| Phase III.F | Material Intelligence: dossiers, evidence tagging, application classification, validation priority | Complete |
| Phase III.G | Validation queue + learning scaffold: cheap-first ladder, feedback memory, ROI scoring | Complete |
| Phase III.H | Evidence bridge + benchmark + calibration: empirical confidence bands, evidence-feedback linking | Complete |
| Phase III.I-J | Structure analytics (28 descriptors) + corpus backfill (100% CIF coverage) | Complete |
| Phase IV.A | Scaled retraining: formation energy ladder 5K→76K, CGCNN 20K promoted (MAE=0.1528) | Complete |
| Phase IV.B | Scaled retraining: band gap ladder, ALIGNN-Lite 20K promoted (MAE=0.3422) | Complete |
| Phase IV.C | Dual-target frontier engine: multiobjective ranking with 4 profiles | Complete |
| Phase IV.D | Frontier-to-validation bridge: exportable packs with risk flags + next steps | Complete |
| Phase IV.E | Pre-DFT triage gate: hard gates, cheap-first 4-profile decision engine | Complete |
| Phase IV.F | Niche discovery campaigns: themed batch searches with cross-campaign comparison | Complete |
| Phase IV.G–U | Active learning, engine stabilization, public demo, operational acceptance (v3.2.0-RC1) | Complete |
| Phase V | Direct GNN inference: CGCNN/ALIGNN-Lite forward pass on lifted candidate structures | Complete |
| Phase V.B | GNN integration into autonomous pipeline, known-material penalty, validation queue hardening | Complete |
| Phase V.C | Lift expansion (doping/mixed), proxy suppression, novel direct GNN path, quality uplift | Complete |
| Phase VI | Public autonomous discovery dashboard, retro CRT console, evidence-first UX | Complete |
| Phase VII | Uncertainty-aware discovery: heuristic uncertainty, validation readiness, DFT handoff packs | Complete |
| Phase VIII | Validation bridge: lifecycle tracking, result ingestion, reconciliation, closed learning loop | Complete |
| Phase IX | Scientific operations: batch validation, evidence accumulation, longitudinal reporting | Complete |
| Phase X | Live data wiring, single source of truth, interactive campaign/candidate inspectors | Complete |
| Phase XI | Calibration intelligence, autonomy governance, chemistry-aware scoring, campaign intelligence, validation economics | Complete |
| Phase XII.B | Ion-separation discovery: functional scoring (7 signals), membrane/lithium/desalination awareness, 5 functional campaign profiles | Complete |
| Phase XIII | Relaxation bridge: structure repair heuristics, relaxation readiness triage, compute backend placeholders (M3GNet/CHGNet/DFT) | Complete |
| Phase XIV | Functional intersection discovery: cross-screening water/lithium/membrane, multi-function candidate detection, function-first prioritization | Complete |
| Phase XV | Full-corpus intersection scan (35,589 formulas): 11,339 multi-function candidates, real sweet-spot shortlists for lithium/water/membrane | Current |
| Phase XVI+ | Structure lift on top candidates, DFT validation pipeline, blockchain proof-of-discovery | Planned |
GeaSpirit is the unified platform combining computational materials science with remote geospatial detection. The Materials Engine predicts properties and generates novel candidates. A future Remote Sensing module will detect mineral signatures from satellite imagery. Together they enable a closed discovery cycle:
PREDICT → PRIORITIZE → SEARCH → VALIDATE → REGISTER
No existing system combines computational materials prediction with satellite-based mineral detection. KoBold Metals ($3B) uses AI + geophysics but has no GNN materials engine. SOST’s unique angle: predict what materials should exist computationally, then search for them in real satellite imagery. The Materials Engine already discovers exotic candidates — a future remote sensing module would tell you where to look for them on Earth.
| Stage | Capability | Status |
|---|---|---|
| Stage 1 | Materials Engine: corpus, GNN prediction, novelty, generation, frontier, triage, campaigns | Operational |
| Stage 2 | Remote sensing: Sentinel-2 mineral alteration maps for arid regions ($0) | Viability OK |
| Stage 3 | Weak integration: Materials Engine cross-references detected mineral classes | Planned |
| Stage 4 | GNN spectral prediction: predict signatures → search in satellite imagery | Research |
| Stage 5 | Proof of Discovery on SOST blockchain + geological data marketplace | Future |
Every phase uses free data and open-source tools first. Materials from JARVIS/MP/AFLOW/COD ($0). Satellite data from Sentinel-2/EMIT/Sentinel-1 ($0). Training on Google Colab ($0). Each phase must produce something useful and sellable before the next begins. No cloud spend until revenue justifies it. Current Materials Engine runs on a single VPS at $0/month compute cost.
| Focus | Novel, unknown, exotic, and under-explored materials — not a generic prediction service |
| Method | Novelty-first discovery: generate → filter by novelty → frontier rank → triage → validate |
| Learning | Active learning: detect where models fail, expand corpus in sparse regions, retrain selectively |
| Cost | Near-zero until revenue: open data, open models, CPU-first, cheap-first validation ladder |
| Materials Engine | Predictions are GNN baseline estimates (MAE ~0.15–0.34 eV). NOT DFT-validated. NOT experimentally confirmed. |
| Remote Sensing | Cannot see directly underground. Surface mineral mapping only. Subsurface inference requires alteration halo proxies. |
| Integration | GNN spectral prediction from crystal structure is a research challenge. Training data limited (~300–400 minerals with both). |
| Blockchain | Proof of Discovery is a concept. Does not replace physical validation or peer review. |
| Novelty | Assessed relative to ingested corpus only (76,193 materials), not all published science. A “novel” candidate may exist in databases we have not ingested. |
| Proxy | 34.5% of autonomous candidates rely on neighbor-proxy estimates rather than direct GNN. Proxy carries higher uncertainty. |
| Milestone | Detail |
|---|---|
| v2.1.0 | Multi-source corpus expansion + dedup foundation: 6 sources registered, staging engine, MP simulation (22% unique), expansion recommendation |
| v2.0.0 | Active learning orchestrator: error hotspots (3 found), coverage analysis (89 elements, 213 SGs), retraining proposals, corpus expansion planner |
| Niche Campaigns | 5 themed discovery presets with cross-campaign comparison: stable_semiconductor, wide_gap_exotic, high_novelty, balanced, generated_review |
| Pre-DFT Triage | 4-profile decision gate: strict/balanced/exotic/semiconductor — hard gates + reason codes + next-action recommendations |
| Frontier Engine | Dual-target multiobjective ranking: FE stability + BG fit + novelty + exotic + structure quality + validation priority |
| Band Gap Model | ALIGNN-Lite 20K promoted: MAE=0.3422 eV, R²=0.707 (14% improvement over 2K baseline) |
| Training Ladder | Both targets: 5 rungs (5K→76K), 20K optimal. CGCNN wins FE, ALIGNN-Lite wins BG |
| Platform Strategy | GeaSpirit unified platform strategy + remote sensing viability report completed (docs/) |
| Phase V | Direct GNN inference: CGCNN forward pass on crystal structures. Real property prediction for autonomous candidates. |
| Autonomous Discovery | Iterative campaign engine with 8 profiles, error learning, validation queue (5 tiers), structure lift pipeline. 28/28 tests passing. |
| Material Mixer | 4 generation strategies: element substitution, single-site doping, mixed-parent, cross-substitution. Dual-output reports (technical + plain language). |
| Phase II Hardening | Chemical plausibility filters, charge balance heuristic, acceptance rate 61% → 27%. Known material sanity: 11/11 pass. |
| Phase V.C | Lift expansion (doping/mixed-parent), proxy suppression (60%→35%), novel direct GNN path (3%→22%), proxy cap at 0.55. 8 campaigns. |
| 47 Test Files | 1,073 test functions across 47 files covering full pipeline: schema through autonomous discovery (Phase V.C) |
Access to the Materials Discovery Engine is under study. Any future access model, if implemented, would be denominated in USD-equivalent terms to preserve stability and usability regardless of SOST market price fluctuations.
Pending study and validation of the best access method. The final access structure, technical route, and user model have not been fixed yet. Details will be published when the model has been validated and approved.
| Denomination | USD-equivalent (converted to SOST at market rate) |
| Model status | Under review — not yet finalized |
| Core principle | The algorithm is free. Access mechanism for spam prevention only. |
The Computational Materials Discovery Platform is an operational research system within the SOST ecosystem. It uses dedicated ML models (CGCNN, ALIGNN-Lite) and heuristic algorithms, not the ConvergenceX mining engine. Current predictions are baseline GNN estimates (MAE ~0.15 eV/atom for formation energy) and should not be treated as experimental measurements. Novelty assessment is relative to the ingested corpus only, not all published scientific literature. The platform does not yet perform DFT validation, phonon stability calculations, or experimental verification. Generated candidates are heuristic hypotheses requiring further computational or experimental validation. The code is open-source (MIT License) and all limitations are documented honestly in every response.