MATERIALS ENGINE · v3.2.0-RC1

Computational Materials
Discovery Platform

Operationally accepted research platform for computational materials science. 76,193 materials, dual-target GNN property prediction, autonomous discovery engine with direct inference on novel candidates, 19 campaign profiles (including water/lithium/membrane functional verticals), functional intersection discovery, validation queue — all at zero cost on CPU.

ACCEPTED | 76,193 materials · 145 API endpoints · 1,073 tests · Autonomous Discovery · $0/month

// WHAT IS THE MATERIALS DISCOVERY ENGINE

An autonomous computational platform that predicts properties of crystalline materials from their structure — without requiring physical synthesis or experimental measurement. It accelerates materials discovery by screening thousands of candidates computationally before any lab work begins.

// CAPABILITIES

Materials corpus: 76,193 validated crystalline materials (JARVIS DFT + AFLOW)
Property prediction: Formation energy and electronic band gap from crystal structure
Candidate generation: Autonomous discovery via element substitution, doping, and mixed-parent strategies
Campaign profiles: 19 profiles including battery, semiconductor, strategic, water/lithium/membrane
Functional discovery: Ion separation, desalination, lithium recovery, membrane candidate detection
Validation bridge: Prediction → observation lifecycle with reconciliation and learning
Chemistry awareness: Risk labeling (familiar/plausible/unusual/risky) with family trust calibration
Cost per month: $0, runs entirely on CPU

// RELATION TO SOST PROTOCOL

The Materials Engine demonstrates that the SOST ecosystem extends beyond cryptocurrency into real scientific utility. Materials discoveries can be registered on-chain as proof-of-discovery, with access controlled via SOST token payments. This creates a knowledge marketplace where scientific computation has real economic value backed by gold reserves.

// RESTRICTED INFORMATION

Model architecture, neural network details, training methodology, hyperparameters, and prediction accuracy metrics are available only in the restricted technical documentation. Public API endpoints provide predictions without exposing the underlying models.

00 — LIVE API

Interactive Explorer

// SEARCH MATERIAL
🌐 Accepts: English · Español · Français · Deutsch · Italiano · Русский · 中文 · 日本語 · العربية + chemical formulas
// PRODUCTION MODELS
Target | Model | MAE | Dataset
Formation Energy | GNN (restricted), validated | restricted | 76,193
Band Gap | GNN (restricted), validated | restricted | 76,193
// HOW TO USE (API)
# Health check
curl https://sostcore.com/api/materials/health

# Search by formula
curl "https://sostcore.com/api/materials/search?formula=GaAs"

# Corpus stats
curl https://sostcore.com/api/materials/stats

# Top exotic materials
curl https://sostcore.com/api/materials/exotic/ranking/5

# Campaign presets
curl https://sostcore.com/api/materials/campaigns/presets
// GOLDEN WORKFLOWS
1. Search a Known Material
GET /search?formula=GaAs → properties, spacegroup, source
2. Predict Band Gap
POST /predict {cif, target} → predicted value + confidence
3. Discover Exotic Materials
GET /exotic/ranking/10 → rarest materials by novelty + rarity
4. Run Discovery Campaign
POST /campaigns/run {type, top_k} → ranked candidates
5. Build Frontier Shortlist
POST /frontier/run {profile, top_k} → multi-objective selection
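The first two workflows can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the base URL is taken from the curl examples above, the `cif` and `target` field names come from workflow 2, and everything else (a JSON request body, the helper names `build_search_request` and `build_predict_request`, a target string such as `band_gap`) is hypothetical. The response schema is not documented here, so the sketch only builds the requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Base URL as used in the curl examples on this page.
BASE_URL = "https://sostcore.com/api/materials"


def build_search_request(formula: str) -> Request:
    """Build GET /search?formula=... with the formula URL-encoded."""
    query = urlencode({"formula": formula})
    return Request(f"{BASE_URL}/search?{query}", method="GET")


def build_predict_request(cif_text: str, target: str) -> Request:
    """Build POST /predict with `cif` and `target` fields.

    Assumption: the endpoint accepts a JSON body; the page only lists
    the field names, not the encoding.
    """
    body = json.dumps({"cif": cif_text, "target": target}).encode("utf-8")
    return Request(
        f"{BASE_URL}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send a request, pass the returned object to `urllib.request.urlopen` with a timeout and decode the response with `json.load`.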
01 — OVERVIEW

Computational Materials Platform

Operational Platform, v3.2.0-RC1
LIVE

The SOST Computational Materials Discovery Platform is an operational research system within the SOST ecosystem. It ingests data from public materials databases (the current corpus comes from JARVIS and AFLOW; normalizers for Materials Project and COD are implemented for future expansion), trains graph neural networks for property prediction, generates and evaluates novel material candidates, and produces comprehensive intelligence dossiers, all at near-zero compute cost on CPU.

Corpus: 76,193 materials, JARVIS DFT 3D (75,993) + AFLOW (200), 100% with validated CIF structures
ML Models: Dual-target GNN prediction (formation energy + band gap); accuracy metrics restricted
Discovery: Autonomous engine with 8 campaign profiles, direct GNN on lifted structures, validation queue
API: 145 endpoints; FastAPI with predict, frontier, triage, campaigns, autonomous discovery
Tests: 1,073 tests passing across 47 test files
Compute Cost: $0/month (CPU-only, open data, no cloud required)
02 — COMPUTATION

What Works Today

Implemented & Tested Capabilities
OPERATIONAL
Property Prediction: GNN models on real crystal structures // formation_energy, band_gap
Novelty Detection: 104-dim fingerprint + cosine similarity scoring // known / near_known / novel_candidate
Exotic Ranking: Weighted rarity from element IDF + spacegroup rarity + neighbor sparsity
Candidate Generation: 3 strategies (element substitution, stoichiometry perturbation, prototype remix)
Structure Lift: Approximate crystal structures from parent prototypes for GNN evaluation
Intelligence Dossiers: Full reports with evidence tagging (known/predicted/proxy/unavailable)
Validation Queue: Cheap-first 6-stage ladder (dedup → novelty → proxy → DFT → external → learning)
Benchmark + Calibration: Empirical error bands per bucket (element count, value range)
Structure Analytics: 28 real descriptors (density, volume, lattice, bonds, symmetry, composition stats)
Campaign Mode: 8 presets (III-V, oxide, group IV, stable novel, valuable, strategic, battery, oxide-exploratory)
Frontier Engine: Dual-target multi-objective ranking (stability + BG fit + novelty + exotic + structure + priority)
Validation Packs: Exportable packages with evidence, risk flags, next-step recommendations (JSON/MD/CSV)
Pre-DFT Triage: Hard gates + scoring with outcomes approved / manual_review / watchlist / reject, 4 profiles
Niche Campaigns: Themed discovery (stable_semiconductor, wide_gap_exotic, high_novelty, batch + compare)
Active Learning: Error hotspot detection, coverage analysis, retraining proposals, corpus expansion planner
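The "hard gates + scoring" pattern behind the Pre-DFT Triage entry can be illustrated with a short sketch. The four outcome labels come from the list above; the gate conditions, score weights, thresholds, and the `Candidate` fields are all hypothetical, since the platform's real triage rules are not published here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    formula: str
    formation_energy: float  # predicted, eV/atom
    band_gap: float          # predicted, eV
    novelty: float           # 0..1, higher means further from the corpus
    structure_ok: bool       # whether a usable structure is available


def triage(c: Candidate, max_fe: float = 0.5,
           bg_window: tuple[float, float] = (0.5, 3.0)) -> tuple[str, list[str]]:
    """Return (decision, reason_codes). All thresholds are illustrative."""
    reasons: list[str] = []

    # Hard gates: any failure rejects the candidate outright.
    if not c.structure_ok:
        reasons.append("no_valid_structure")
    if c.formation_energy > max_fe:
        reasons.append("formation_energy_too_high")
    if reasons:
        return "reject", reasons

    # Soft score: only computed for candidates that pass every gate.
    score = 0.0
    if c.formation_energy < 0:
        score += 0.4
        reasons.append("negative_formation_energy")
    low, high = bg_window
    if low <= c.band_gap <= high:
        score += 0.3
        reasons.append("band_gap_in_window")
    score += 0.3 * c.novelty

    if score >= 0.8:
        return "approved", reasons
    if score >= 0.5:
        return "manual_review", reasons
    return "watchlist", reasons
```

Separating gates from scoring keeps rejections explainable: a rejected candidate always carries the gate that failed, while borderline candidates fall into `manual_review` or `watchlist` instead of being dropped.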
What Does NOT Exist Yet
PLANNED
DFT Validation: Ab-initio computation for candidate confirmation // Phase IV+
Phonon Stability: Vibrational analysis for thermal stability // Phase IV+
Structure Relaxation: M3GNet / CHGNet geometry optimization // Phase IV+
Blockchain PoD: On-chain proof-of-discovery with SOST integration // Phase V+
Remote Sensing: Satellite-based mineral detection + GNN spectral prediction // Vision, viability study complete
Community Compute: Marketplace for distributed materials simulation // Future
02.5 — AUTONOMOUS DISCOVERY

Server-side authentication required.
Research data and autonomous discovery console are restricted.

03 — ARCHITECTURE

Platform Architecture

Layer 1 (Data): 75,993 materials, 100% with structures
Layer 2 (Prediction): CGCNN / ALIGNN-Lite, MAE=0.15 eV/atom
Layer 3 (Discovery): Candidate generation, novelty-first filtering
Layer 4 (Intelligence): Dossiers, calibration, validation queue
Layer 5 (Learning): Feedback memory, benchmark + retrain
Layer 1 — Data Foundation
COMPLETE

75,993 materials from JARVIS DFT 3D, all with validated crystal structures (CIF), band gap, formation energy, and spacegroup. 100% ML-ready. Ingested via bulk pipeline with structure backfill from jarvis-tools atoms.

JARVIS DFT 3D: 75,993 materials with band gap, formation energy, spacegroup, full CIF structures
Materials Project: Normalizer ready, API key required // multi-source expansion planned
AFLOW + COD: Normalizers implemented // additional coverage for future phases
Fingerprints: 75,993 precomputed 104-dim vectors (94 compositional + 10 structural)
Layer 2 — ML Property Prediction
OPERATIONAL

Graph neural networks trained on real crystal structures predict formation energy and band gap from CIF input. Training ladder validated at 5K, 10K, 20K, 40K, and 76K samples — 20K CGCNN promoted as production model.

CGCNN: Formation energy, MAE=0.1528 eV/atom, R²=0.9499 (20K training) // production
ALIGNN-Lite: Band gap, MAE=0.3422 eV, R²=0.707 (20K training) // production
Inference: Single-sample real-time via /predict endpoint
Layer 3 — Candidate Generation + Evaluation
OPERATIONAL

Generates plausible candidates from corpus parents via element substitution, stoichiometry perturbation, and prototype remix. Filters novelty-first, lifts approximate structures from parent prototypes, and evaluates with real GNN prediction.
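Element substitution, the first of the three strategies, can be sketched at the formula level as follows. This is an illustration only: the `SUBSTITUTIONS` table is a hypothetical same-group mapping, the parser handles only simple formulas without brackets, and the platform's real generator works on parent structures rather than formula strings.

```python
import re

# Hypothetical same-group substitution table (illustrative, not the platform's rules).
SUBSTITUTIONS = {
    "Ga": ["Al", "In"],
    "As": ["P", "Sb"],
    "Si": ["Ge", "Sn"],
}


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'Ga2O3' (no brackets) into element counts."""
    counts: dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def format_formula(counts: dict[str, int]) -> str:
    """Render element counts back to a formula, omitting counts of 1."""
    return "".join(f"{el}{n if n != 1 else ''}" for el, n in counts.items())


def substitute(formula: str) -> list[str]:
    """Generate child formulas by swapping one element at a time."""
    parent = parse_formula(formula)
    children = []
    for element in parent:
        for replacement in SUBSTITUTIONS.get(element, []):
            if replacement in parent:
                continue  # skip swaps that would merge two sites into one element
            child = {replacement if el == element else el: n
                     for el, n in parent.items()}
            children.append(format_formula(child))
    return children


print(substitute("GaAs"))  # ['AlAs', 'InAs', 'GaP', 'GaSb']
```

Each child keeps the parent's stoichiometry, which is what allows a parent prototype structure to be reused when lifting an approximate structure for the candidate.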

Novelty Filter: Cosine similarity on 104-dim fingerprints, banded as known / near_known / novel_candidate
Exotic Ranking: Weighted score of 40% novelty + 20% element rarity + 15% structure rarity + 25% sparsity
Structure Lift: 95% lift rate via element swap on parent structures, pymatgen validated
Campaigns: 5 presets (exotic hunt, stability first, band gap target, T/P watchlist, novelty hunt)
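The novelty filter and the exotic ranking weights listed above can be sketched together. The band names and the 40/20/15/25 weights come from this section; the similarity thresholds (`known_at`, `near_at`), the function names, and the assumption that each component is already normalized to the 0..1 range are hypothetical. The toy vectors in the usage note are 3-dimensional, whereas the platform's fingerprints are 104-dimensional.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty_band(candidate: list[float], corpus: list[list[float]],
                 known_at: float = 0.98, near_at: float = 0.90) -> tuple[str, float]:
    """Band a candidate by its highest similarity to any corpus fingerprint.

    Thresholds are illustrative assumptions. Returns (band, novelty) where
    novelty = 1 - best similarity, clamped at 0.
    """
    best = max((cosine_similarity(candidate, fp) for fp in corpus), default=0.0)
    novelty = max(0.0, 1.0 - best)
    if best >= known_at:
        return "known", novelty
    if best >= near_at:
        return "near_known", novelty
    return "novel_candidate", novelty


# Weights as stated in the Exotic Ranking row; they sum to 1.
EXOTIC_WEIGHTS = {
    "novelty": 0.40,
    "element_rarity": 0.20,
    "structure_rarity": 0.15,
    "sparsity": 0.25,
}


def exotic_score(components: dict[str, float]) -> float:
    """Weighted sum of components, each assumed to lie in [0, 1]."""
    return sum(weight * components[name] for name, weight in EXOTIC_WEIGHTS.items())
```

For example, with `corpus = [[1, 0, 0], [0, 1, 0]]`, the vector `[1, 0, 0]` is banded `known`, while `[1, 1, 0]` has a best similarity of about 0.71 and is banded `novel_candidate`.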
Layer 4 — Material Intelligence + Validation
OPERATIONAL

Every material or candidate gets a comprehensive dossier with evidence-tagged properties, application hypotheses, comparison tables, calibrated confidence, and validation priority.

Evidence: Every property tagged as known | predicted | proxy | unavailable
Applications: Rule-based (semiconductor, PV, thermoelectric, catalytic, magnetic, structural, high-pressure)
Calibration: Empirical confidence bands from benchmark MAE per element-count and value-range bucket
Validation Queue: 6-stage cheap-first ladder with ROI scoring and dedup
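The calibration row describes confidence bands derived from benchmark MAE per bucket. A minimal sketch of that idea, bucketing by element count only, might look like this. The bucket boundaries (1, 2, 3, 4+ elements), the symmetric plus/minus MAE band, and the function names are assumptions; the platform also buckets by value range, which is omitted here for brevity.

```python
from collections import defaultdict


def bucket_of(n_elements: int) -> int:
    """Hypothetical bucketing: 1, 2, 3, and 4-or-more elements."""
    return min(n_elements, 4)


def mae_by_bucket(records: list[tuple[int, float, float]]) -> dict[int, float]:
    """Mean absolute error per bucket.

    records: (n_elements, predicted, reference) triples from a benchmark set.
    """
    errors: dict[int, list[float]] = defaultdict(list)
    for n_elements, predicted, reference in records:
        errors[bucket_of(n_elements)].append(abs(predicted - reference))
    return {bucket: sum(errs) / len(errs) for bucket, errs in errors.items()}


def confidence_band(prediction: float, n_elements: int,
                    bucket_mae: dict[int, float],
                    fallback_mae: float) -> tuple[float, float]:
    """Return (low, high) as prediction +/- the bucket's MAE.

    Falls back to a global MAE when the bucket has no benchmark data.
    """
    mae = bucket_mae.get(bucket_of(n_elements), fallback_mae)
    return prediction - mae, prediction + mae
```

A bucket with few benchmark samples yields a noisy MAE, so a real implementation would likely also track the sample count per bucket before trusting its band.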
Layer 5 — Feedback + Learning Loop
SCAFFOLD

Records predictions vs observations, identifies model failures by property and element family, tracks promising chemical regions, and builds retraining queues. Evidence bridge imports external data (JSON/CSV). Auto-links observations to predictions for feedback. Currently a scaffold — active retraining loop not yet triggered.
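The linking step described above (matching imported observations to stored predictions, then flagging where the model fails) can be sketched as a simple join. The record layout (`formula`, `property`, `value` keys), the function names, and the error threshold are hypothetical. The sketch groups errors by property only; grouping by element family, which the platform also does, would follow the same pattern with a different key.

```python
from collections import defaultdict


def link_observations(predictions: list[dict], observations: list[dict]) -> list[dict]:
    """Join predictions and observations on (formula, property).

    Returns one row per matched prediction with the observed value and
    the absolute error attached. Unmatched predictions are skipped.
    """
    observed = {(o["formula"], o["property"]): o["value"] for o in observations}
    linked = []
    for p in predictions:
        key = (p["formula"], p["property"])
        if key in observed:
            linked.append({
                **p,
                "observed": observed[key],
                "abs_error": abs(p["value"] - observed[key]),
            })
    return linked


def retraining_queue(linked: list[dict], threshold: float) -> list[str]:
    """Properties whose mean absolute error exceeds the threshold, sorted by name."""
    by_property: dict[str, list[float]] = defaultdict(list)
    for row in linked:
        by_property[row["property"]].append(row["abs_error"])
    return sorted(
        prop for prop, errs in by_property.items()
        if sum(errs) / len(errs) > threshold
    )
```

In this sketch the queue is only a proposal list. That matches the scaffold status described above, where retraining is recorded as a recommendation rather than triggered automatically.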

04 — ROADMAP

Development Roadmap

Phase | Scope | Status
Phase I | Data foundation: 4-source ingestion, canonical schema, SQLite storage, audit/export | Complete
Phase II | Baseline ML: CGCNN + ALIGNN-Lite training, /predict, /similar, model registry | Complete
Phase III.A | Novelty filter: 104-dim fingerprints, exotic ranking, cosine similarity bands | Complete
Phase III.B | Shortlist engine: configurable criteria, T/P proxy screening, decision bands | Complete
Phase III.C | Corpus scale: 75,993 materials, persistent fingerprints, fast retrieval, campaigns | Complete
Phase III.D | Candidate generation: element substitution, stoichiometry perturbation, prototype remix | Complete
Phase III.E | Structure lift + evaluation: prototype structures, real GNN prediction on candidates | Complete
Phase III.F | Material Intelligence: dossiers, evidence tagging, application classification, validation priority | Complete
Phase III.G | Validation queue + learning scaffold: cheap-first ladder, feedback memory, ROI scoring | Complete
Phase III.H | Evidence bridge + benchmark + calibration: empirical confidence bands, evidence-feedback linking | Complete
Phase III.I-J | Structure analytics (28 descriptors) + corpus backfill (100% CIF coverage) | Complete
Phase IV.A | Scaled retraining: formation energy ladder 5K→76K, CGCNN 20K promoted (MAE=0.1528) | Complete
Phase IV.B | Scaled retraining: band gap ladder, ALIGNN-Lite 20K promoted (MAE=0.3422) | Complete
Phase IV.C | Dual-target frontier engine: multi-objective ranking with 4 profiles | Complete
Phase IV.D | Frontier-to-validation bridge: exportable packs with risk flags + next steps | Complete
Phase IV.E | Pre-DFT triage gate: hard gates, cheap-first 4-profile decision engine | Complete
Phase IV.F | Niche discovery campaigns: themed batch searches with cross-campaign comparison | Complete
Phase IV.G–U | Active learning, engine stabilization, public demo, operational acceptance (v3.2.0-RC1) | Complete
Phase V | Direct GNN inference: CGCNN/ALIGNN-Lite forward pass on lifted candidate structures | Complete
Phase V.B | GNN integration into autonomous pipeline, known-material penalty, validation queue hardening | Complete
Phase V.C | Lift expansion (doping/mixed), proxy suppression, novel direct GNN path, quality uplift | Complete
Phase VI | Public autonomous discovery dashboard, retro CRT console, evidence-first UX | Complete
Phase VII | Uncertainty-aware discovery: heuristic uncertainty, validation readiness, DFT handoff packs | Complete
Phase VIII | Validation bridge: lifecycle tracking, result ingestion, reconciliation, closed learning loop | Complete
Phase IX | Scientific operations: batch validation, evidence accumulation, longitudinal reporting | Complete
Phase X | Live data wiring, single source of truth, interactive campaign/candidate inspectors | Complete
Phase XI | Calibration intelligence, autonomy governance, chemistry-aware scoring, campaign intelligence, validation economics | Complete
Phase XII.B | Ion-separation discovery: functional scoring (7 signals), membrane/lithium/desalination awareness, 5 functional campaign profiles | Complete
Phase XIII | Relaxation bridge: structure repair heuristics, relaxation readiness triage, compute backend placeholders (M3GNet/CHGNet/DFT) | Complete
Phase XIV | Functional intersection discovery: cross-screening water/lithium/membrane, multi-function candidate detection, function-first prioritization | Complete
Phase XV | Full-corpus intersection scan (35,589 formulas): 11,339 multi-function candidates, real shortlists for lithium/water/membrane sweet spot | Current
Phase XVI+ | Structure lift on top candidates, DFT validation pipeline, blockchain proof-of-discovery | Planned
05 — STATUS

What Exists Now

Available Now
76,193 materials with 100% crystal structures
Dual-target GNN prediction (CGCNN + ALIGNN-Lite)
Autonomous discovery engine with direct GNN inference
8 campaign profiles · structure lift · validation queue
Material intelligence dossiers with evidence tagging
Frontier ranking + triage + niche campaigns
Benchmark calibration with empirical confidence bands
28 real structure descriptors (density, bonds, symmetry)
1,073 tests passing · 145 API endpoints · 47 test files
In Progress
Phase XI: calibration intelligence + autonomy governance
Chemistry-aware scoring and campaign selection
Validation economics: evidence ROI optimization
15 campaign profiles · 5 autonomy levels
Planned / Future
DFT validation for top candidates
Phonon stability + equation of state
Structural relaxation (M3GNet/CHGNet)
Remote sensing: satellite mineral detection
GNN spectral signature prediction
Blockchain proof-of-discovery
Compute marketplace
06 — VISION

Unified Platform Vision

GeaSpirit — Predict · Detect · Discover
VISION

GeaSpirit is the unified platform combining computational materials science with remote geospatial detection. The Materials Engine predicts properties and generates novel candidates. A future Remote Sensing module will detect mineral signatures from satellite imagery. Together they enable a closed discovery cycle:

PREDICT → PRIORITIZE → SEARCH → VALIDATE → REGISTER

Materials Engine
OPERATIONAL, v3.2.0-RC1
76,193 materials · 100% structures
CGCNN FE (MAE=0.15) + ALIGNN-Lite BG (MAE=0.34)
Frontier ranking + triage + niche campaigns
Active learning orchestrator + corpus expansion
1,073 tests · 145 API endpoints
Remote Sensing Initiative
VIABILITY STUDY COMPLETE
Sentinel-2 (13 bands, free) for mineral alteration mapping
EMIT hyperspectral (285 bands, free) for mineral ID
Iron oxide mapping: 70–85% accuracy in arid regions
Zero-cost route designed — not yet built
Strategic initiative, not operational module
Integration Layer
FUTURE RESEARCH
GNN predicts spectral signatures from crystal structure
Remote sensing searches for those signatures in imagery
Requires DFT-to-spectrum training pipeline
RRUFF database: ~4,000 minerals with structure + spectra
Research challenge — not yet implemented
Why the Unified Platform Matters

No existing system combines computational materials prediction with satellite-based mineral detection. KoBold Metals ($3B) uses AI + geophysics but has no GNN materials engine. SOST’s unique angle: predict what materials should exist computationally, then search for them in real satellite imagery. The Materials Engine already discovers exotic candidates — a future remote sensing module would tell you where to look for them on Earth.

How the Platform Evolves
Stage | Capability | Status
Stage 1 | Materials Engine: corpus, GNN prediction, novelty, generation, frontier, triage, campaigns | Operational
Stage 2 | Remote sensing: Sentinel-2 mineral alteration maps for arid regions ($0) | Viability OK
Stage 3 | Weak integration: Materials Engine cross-references detected mineral classes | Planned
Stage 4 | GNN spectral prediction: predict signatures → search in satellite imagery | Research
Stage 5 | Proof of Discovery on SOST blockchain + geological data marketplace | Future
Zero-Cost Strategy

Every phase uses free data and open-source tools first. Materials from JARVIS/MP/AFLOW/COD ($0). Satellite data from Sentinel-2/EMIT/Sentinel-1 ($0). Training on Google Colab ($0). Each phase must produce something useful and sellable before the next begins. No cloud spend until revenue justifies it. Current Materials Engine runs on a single VPS at $0/month compute cost.

Strategic Specialization
Focus: Novel, unknown, exotic, and under-explored materials, not a generic prediction service
Method: Novelty-first discovery (generate → filter by novelty → frontier rank → triage → validate)
Learning: Active learning that detects where models fail, expands the corpus in sparse regions, and retrains selectively
Cost: Near-zero until revenue, using open data, open models, CPU-first compute, and a cheap-first validation ladder
Honest Limitations
TRANSPARENCY
Materials Engine: Predictions are GNN baseline estimates (MAE ~0.15–0.34 eV). NOT DFT-validated. NOT experimentally confirmed.
Remote Sensing: Cannot see directly underground. Surface mineral mapping only. Subsurface inference requires alteration halo proxies.
Integration: GNN spectral prediction from crystal structure is a research challenge. Training data is limited (~300–400 minerals with both structure and spectra).
Blockchain: Proof of Discovery is a concept. Does not replace physical validation or peer review.
Novelty: Assessed relative to ingested corpus only (76,193 materials), not all published science. A “novel” candidate may exist in databases we have not ingested.
Proxy: 34.5% of autonomous candidates rely on neighbor-proxy estimates rather than direct GNN. Proxy carries higher uncertainty.
07 — RECENT

Recent Milestones

Milestone | Detail
v2.1.0 | Multi-source corpus expansion + dedup foundation: 6 sources registered, staging engine, MP simulation (22% unique), expansion recommendation
v2.0.0 | Active learning orchestrator: error hotspots (3 found), coverage analysis (89 elements, 213 SGs), retraining proposals, corpus expansion planner
Niche Campaigns | 5 themed discovery presets with cross-campaign comparison: stable_semiconductor, wide_gap_exotic, high_novelty, balanced, generated_review
Pre-DFT Triage | 4-profile decision gate (strict/balanced/exotic/semiconductor): hard gates + reason codes + next-action recommendations
Frontier Engine | Dual-target multi-objective ranking: FE stability + BG fit + novelty + exotic + structure quality + validation priority
Band Gap Model | ALIGNN-Lite 20K promoted: MAE=0.3422 eV, R²=0.707 (14% improvement over 2K baseline)
Training Ladder | Both targets: 5 rungs (5K→76K), 20K optimal. CGCNN wins FE, ALIGNN-Lite wins BG
Platform Strategy | GeaSpirit unified platform strategy + remote sensing viability report completed (docs/)
Phase V | Direct GNN inference: CGCNN forward pass on crystal structures. Real property prediction for autonomous candidates.
Autonomous Discovery | Iterative campaign engine with 8 profiles, error learning, validation queue (5 tiers), structure lift pipeline. 28/28 tests passing.
Material Mixer | 4 generation strategies: element substitution, single-site doping, mixed-parent, cross-substitution. Dual-output reports (technical + plain language).
Phase II Hardening | Chemical plausibility filters, charge balance heuristic, acceptance rate 61% → 27%. Known material sanity: 11/11 pass.
Phase V.C | Lift expansion (doping/mixed-parent), proxy suppression (60%→35%), novel direct GNN path (3%→22%), proxy cap at 0.55. 8 campaigns.
47 Test Files | 1,073 test functions across 47 files covering full pipeline: schema through autonomous discovery (Phase V.C)
08 — ACCESS

Access Model

Access Structure
UNDER REVIEW

Access to the Materials Discovery Engine is under study. Any future access model, if implemented, would be denominated in USD-equivalent terms to preserve stability and usability regardless of SOST market price fluctuations.

The best access method is still being studied and validated. The final access structure, technical route, and user model have not been fixed yet. Details will be published once the model has been validated and approved.

Denomination: USD-equivalent (converted to SOST at market rate)
Model status: Under review, not yet finalized
Core principle: The algorithm is free. Access mechanism for spam prevention only.
09 — DISCLAIMER

Research Disclaimer

The Computational Materials Discovery Platform is an operational research system within the SOST ecosystem. It uses dedicated ML models (CGCNN, ALIGNN-Lite) and heuristic algorithms, not the ConvergenceX mining engine. Current predictions are baseline GNN estimates (MAE ~0.15 eV/atom for formation energy) and should not be treated as experimental measurements. Novelty assessment is relative to the ingested corpus only, not all published scientific literature. The platform does not yet perform DFT validation, phonon stability calculations, or experimental verification. Generated candidates are heuristic hypotheses requiring further computational or experimental validation. The code is open-source (MIT License) and all limitations are documented honestly in every response.