Materials Discovery Engine — 76,193 Materials

Target	Model	MAE	R²	Dataset
Formation Energy	GNN (restricted)	validated	restricted	76,193
Band Gap	GNN (restricted)	validated	restricted	76,193

02 — COMPUTATION

What Works Today

Implemented & Tested Capabilities

OPERATIONAL

Property Prediction	GNN models on real crystal structures // formation_energy, band_gap
Novelty Detection	104-dim fingerprint + cosine similarity scoring // known / near_known / novel_candidate
Exotic Ranking	Weighted rarity: element IDF + spacegroup rarity + neighbor sparsity
Candidate Generation	3 strategies: element substitution, stoichiometry perturbation, prototype remix
Structure Lift	Approximate crystal structures from parent prototypes for GNN evaluation
Intelligence Dossiers	Full reports with evidence tagging (known/predicted/proxy/unavailable)
Validation Queue	Cheap-first 6-stage ladder: dedup → novelty → proxy → DFT → external → learning
Benchmark + Calibration	Empirical error bands per bucket (element count, value range)
Structure Analytics	28 real descriptors: density, volume, lattice, bonds, symmetry, composition stats
Campaign Mode	8 presets: III-V, oxide, group IV, stable novel, valuable, strategic, battery, oxide-exploratory
Frontier Engine	Dual-target multiobjective ranking: stability + BG fit + novelty + exotic + structure + priority
Validation Packs	Exportable packages with evidence, risk flags, next-step recommendations (JSON/MD/CSV)
Pre-DFT Triage	Hard gates + scoring: approved / manual_review / watchlist / reject — 4 profiles
Niche Campaigns	Themed discovery: stable_semiconductor, wide_gap_exotic, high_novelty, batch + compare
Active Learning	Error hotspot detection, coverage analysis, retraining proposals, corpus expansion planner
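The "Benchmark + Calibration" entry above describes empirical error bands per bucket. A minimal sketch of that idea follows, assuming benchmark samples arrive as (element count, true value, predicted value) tuples. The bucket edges, the "4 or more elements" cap, and all function names are illustrative assumptions, not the engine's actual code.

```python
from collections import defaultdict
from statistics import mean

def bucket_key(n_elements, value, value_edges=(-2.0, -1.0, 0.0)):
    """Map a sample to an (element-count, value-range) bucket.
    The edges are hypothetical; the source does not state them."""
    n_bucket = min(n_elements, 4)  # 4 stands for "4 or more elements"
    v_bucket = sum(value >= edge for edge in value_edges)
    return (n_bucket, v_bucket)

def fit_error_bands(samples):
    """samples: iterable of (n_elements, true_value, predicted_value).
    Returns {bucket: mean absolute error} measured on benchmark data."""
    errors = defaultdict(list)
    for n_elements, truth, pred in samples:
        errors[bucket_key(n_elements, truth)].append(abs(pred - truth))
    return {bucket: mean(errs) for bucket, errs in errors.items()}

def confidence_band(pred, n_elements, bands, default_mae):
    """Return (low, high) around a prediction using its bucket's MAE.
    Falls back to default_mae when the bucket was never benchmarked."""
    mae = bands.get(bucket_key(n_elements, pred), default_mae)
    return (pred - mae, pred + mae)
```

A prediction that falls in a sparsely benchmarked bucket gets the fallback band, which is one reason such candidates deserve extra caution.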

What Does NOT Exist Yet

PLANNED

DFT Validation	Ab-initio computation for candidate confirmation // Phase IV+
Phonon Stability	Vibrational analysis for thermal stability // Phase IV+
Structure Relaxation	M3GNet / CHGNet geometry optimization // Phase IV+
Blockchain PoD	On-chain proof-of-discovery with SOST integration // Phase V+
Remote Sensing	Satellite-based mineral detection + GNN spectral prediction // Vision — viability study complete
Community Compute	Marketplace for distributed materials simulation // Future

02.5 — AUTONOMOUS DISCOVERY

╔═══════════════════════════════════════╗
║ RESTRICTED ACCESS ║
║ Discovery console requires clearance ║
╚═══════════════════════════════════════╝

Server-side authentication required.
Research data and autonomous discovery console are restricted.

DISCOVERY CONTROL CONSOLE

ENGINE ONLINE

DIRECT GNN RATE	54.1%
PROXY DEPENDENCY	45.9%
MEAN TOP-10 SCORE	0.653
CAMPAIGNS	14 profiles available
CORPUS	76,193 materials indexed
PLAUSIBILITY	0.767

// DISCOVERY PIPELINE

SEED → GENERATE → FILTER → LIFT → GNN → SCORE → VALIDATE → HANDOFF

// VALIDATION QUEUE

TIER	NEXT ACTION
PRIORITY	DFT handoff ready
CANDIDATE	Stronger ML pass
REVIEW	Human inspection
WATCHLIST	Keep monitoring
REFERENCE	Known baseline
REJECTED	No further action

// TOP NOVEL CANDIDATES (THEORETICAL, NOT VALIDATED)

Evidence shows where the candidate comes from. Discovery Class shows how novel and risky it is.

FORMULA	CAMPAIGN	SCORE	FE (eV)	EVIDENCE	DISCOVERY CLASS
CoNaO₂	Battery	0.925	-1.419	KNOWN	REDISCOVERED
Zn₂O	Oxide	0.902	-0.294	DIRECT GNN	EXOTIC / HIGH-RISK
CdInTe	Strategic	0.892	-0.430	DIRECT GNN	EXOTIC / HIGH-RISK
CdSeTe	Strategic	0.892	-0.614	DIRECT GNN	THEORETICAL NOVEL
CoLiS₂	Battery	0.890	-0.810	KNOWN	REDISCOVERED

Legend: DIRECT GNN · LIFTED STRUCTURE · PROXY ESTIMATE · KNOWN MATERIAL · NOT VALIDATED

■ GUIDE

What does this engine do?

Generates novel material candidates using element substitution strategies, evaluates them with GNN models (CGCNN for formation energy, ALIGNN-Lite for band gap), and routes them through a multi-tier validation queue.

What is Direct GNN?

A real CGCNN or ALIGNN-Lite forward pass on a lifted crystal structure. The strongest computational evidence this engine can produce — but still a model prediction, not experimental confirmation.

What is Proxy?

A neighbor-based estimate from similar materials in the corpus. Weaker than direct GNN — carries higher uncertainty because no actual model inference was performed on this specific candidate.
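A neighbor-based estimate of this kind can be sketched as a similarity-weighted average over the k most similar corpus entries. This is an illustration only: the choice of k, the weighting scheme, and the function names are assumptions, and the engine's actual proxy logic is not documented here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def proxy_estimate(query_fp, corpus, k=3):
    """corpus: list of (fingerprint, property_value) pairs.
    Returns (estimate, mean_similarity). No model inference is run:
    the value is borrowed from the k nearest neighbors."""
    scored = sorted(
        ((cosine(query_fp, fp), value) for fp, value in corpus),
        reverse=True,
    )[:k]
    weights = [max(sim, 0.0) for sim, _ in scored]
    total = sum(weights)
    if total == 0:
        return None, 0.0  # nothing similar enough to borrow from
    estimate = sum(w * v for w, (_, v) in zip(weights, scored)) / total
    return estimate, total / len(scored)
```

The mean similarity returned alongside the estimate gives a rough handle on how far the candidate sits from its neighbors, which is where the higher uncertainty comes from.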

What does "novel" mean here?

Not found in our 76,193-material corpus (JARVIS + AFLOW). The candidate may exist in other databases we haven't ingested. Novelty is relative, not absolute.
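As a rough sketch of how a corpus-relative novelty label could be assigned, the snippet below compares a candidate fingerprint against every corpus fingerprint and bands the best cosine similarity into the three labels used on this page. The two thresholds are hypothetical placeholders; the source does not state the cut-offs.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def novelty_band(candidate_fp, corpus_fps,
                 known_threshold=0.99, near_threshold=0.90):
    """Label a candidate relative to the ingested corpus only.
    Thresholds are illustrative assumptions."""
    best = max(
        (cosine_similarity(candidate_fp, fp) for fp in corpus_fps),
        default=0.0,
    )
    if best >= known_threshold:
        return "known", best
    if best >= near_threshold:
        return "near_known", best
    return "novel_candidate", best
```

Because the comparison set is the corpus alone, a "novel_candidate" label says nothing about databases that were never ingested.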

What is structure lift?

Finding a similar material in the corpus and substituting elements to create an approximate crystal structure for the candidate. Required before direct GNN inference. Positions are NOT relaxed.
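The element-swap step can be illustrated with plain Python data structures. The `Site` class and `lift_structure` function below are hypothetical names for this sketch; the sketch deliberately avoids any crystallography library and does not reproduce the engine's own implementation. Note that fractional coordinates are copied unchanged, which mirrors the "positions are NOT relaxed" caveat.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    element: str
    frac_coords: tuple  # fractional coordinates inherited from the parent

def lift_structure(parent_sites, substitutions):
    """Return a new site list with elements swapped per `substitutions`
    (e.g. {"Na": "K"}). Coordinates are copied as-is, so the result is
    only an approximate structure for the candidate."""
    present = {site.element for site in parent_sites}
    missing = set(substitutions) - present
    if missing:
        raise ValueError(f"parent has no sites for: {sorted(missing)}")
    return [
        Site(substitutions.get(site.element, site.element), site.frac_coords)
        for site in parent_sites
    ]
```

For example, lifting a two-site parent with `{"Na": "K"}` yields the same two positions with potassium on the former sodium site.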

What happens after handoff?

A validation pack is generated with all evidence. The candidate enters the validation bridge for DFT or external computational validation. Results are fed back to recalibrate the engine.

Why are these not confirmed materials?

No DFT, no phonon stability analysis, no experimental synthesis or characterization has been performed. All candidates are theoretical hypotheses prioritized by computational heuristics. Confirmation requires independent validation.

Honest limitations: All candidates are theoretical. No DFT or experimental validation performed. Structure lift ≠ correct structure. GNN prediction ≠ confirmation. Novelty relative to 76,193-material corpus only. Proxy estimates (~46%) carry higher uncertainty.

ACTIVE RESEARCH LINE

EXHAUST CATALYST DISCOVERY

Computational search for cheap metal oxide catalysts to replace platinum group metals (Pt $30K/kg, Pd $40K/kg, Rh $140K/kg) in hydrocarbon exhaust cleaning. Using perovskites, spinels and mixed oxides with Fe, Mn, Co, Ni, Cu, Ce, La, Ti, Sr, Zr — metals costing $0.50-30/kg.

76,193

MATERIALS SCREENED

2,711

CANDIDATES PASSED

REGISTERED (SOST-CAT)

PRIORITY (DFT READY)

TIER A — PRIORITY SYNTHESIS

ID	FORMULA	NAME	STRUCTURE	BG (eV)	FE (eV/at)	COST	ORIGIN
SOST-CAT-01	Ba₄CeMn₃O₁₂	TripleSync	perovskite	1.52	-2.77	~$8/kg	DFT-PREDICTED (JARVIS)
SOST-CAT-02	MnTiO₃	RedoxTitan	ilmenite	0.28*	-2.42	~$6/kg	KNOWN MINERAL (pyrophanite)
SOST-CAT-03	Sr₂ZrMnO₆	DoublePerov	double perovskite	1.29	-3.07	~$15/kg	DFT-PREDICTED (JARVIS)
BENCH-01	CeO₂	Ceria	fluorite	2.21	-3.69	~$5/kg	COMMERCIAL (benchmark)

TIER B — PROMISING

ID	FORMULA	NAME	BG (eV)	FE	COST	NOTE
SOST-CAT-04	Ca₃ScCoO₆	ScandCobalt	1.53	-2.97	~$35/kg
SOST-CAT-05	Sr₄Ti₃Nb₂O₁₅	LayeredNiobate	1.65	-3.24	~$20/kg
SOST-CAT-06	Ti₄MnP₆O₂₄	NASICONcat	1.60	-2.59	~$8/kg
SOST-CAT-07	AlBaO₃	AluBar	0.0*	-2.80	~$2/kg	NOVEL (Phase 2)

TIER C — EXPLORATION (8 candidates)

SOST-CAT-08 FeMnO₃ "FerroMangan" | SOST-CAT-09 Fe₂MnO₃ "IronMangan" | SOST-CAT-10 AlMnO₃ "AluMangan"
SOST-CAT-11 CeNiO₂ "CeriaNickel" | SOST-CAT-12 CoTiO₃ "CobaltTitan" | SOST-CAT-13 Sr₂TiMnO₆ "TiMnDouble"
SOST-CAT-14 La₂MgNiO₆ "LaNiDouble" | SOST-CAT-15 ZrMnO₃ "ZircoMangan"

RESEARCH FAMILIES

Family 1: Ce-Mn-Ba

Lead: Ba₄CeMn₃O₁₂ "TripleSync"
Ce oxygen storage + Mn redox + Ba thermal stability
Strong literature support

Family 2: Mn-Ti

Lead: MnTiO₃ "RedoxTitan"
Ti structural stability + Mn redox activity
Known mineral, catalysis unexplored

Family 3: Sr-Mn

Lead: Sr₂ZrMnO₆ "DoublePerov"
Zr oxygen mobility + Mn redox in double perovskite
Moderate literature support

Family 4: Fe-Mn

Lead: FeMnO₃ "FerroMangan"
Cheapest redox pair: Fe $0.50 + Mn $2/kg
Novel — Phase 2 generated

COST vs PLATINUM GROUP METALS

Pt $30,000/kg
Pd $40,000/kg
Rh $140,000/kg
Fe $0.50/kg
Mn $2/kg
Ce $5/kg
Ti $10/kg
Sr $6/kg

Data provenance: Phase 1 screening from JARVIS-DFT (NIST, 76,193 materials). Phase 2 candidates generated by Materials Engine Autonomous Discovery (CGCNN + ALIGNN-Lite). Band gaps from DFT are underestimated ~30-50% (PBE limitation). Formation energies reliable ±0.1 eV/atom. Novel compositions (SOST-CAT-07 through SOST-CAT-11) have NO experimental validation — they are computational proposals. Full registry: docs/CATALYST_REGISTRY.md (private repo).

ACTIVE RESEARCH LINE

PHOTOVOLTAIC ABSORBER DISCOVERY

Non-toxic, earth-abundant solar cell materials to replace silicon (indirect band gap), lead perovskites (toxic Pb), CdTe (toxic Cd), and GaAs (too expensive). Target: direct band gap ~1.34 eV (Shockley-Queisser maximum 33.7% efficiency).

8,054

CANDIDATES (non-toxic)

1,566

TOXIC REJECTED (Pb,Cd)

REGISTERED (SOST-PV)

1.340

BEST BG (eV) = SQ PEAK

TOP CANDIDATES — SINGLE JUNCTION (1.1-1.5 eV)

ID	FORMULA	NAME	BG (eV)	FE	ORIGIN
SOST-PV-01	La₃S₃N	LanthanumNitrideSulfide	1.206 DFT	-2.27	DFT-VERIFIED SEMICONDUCTOR
SOST-PV-04	SrZrS₃	StrontiumZirconSulfide	1.243	-1.99	DFT (chalcogenide perovskite)
SOST-PV-06	Ca₂SnS₄	CalciumTinSulfide	1.384	-1.45	DFT (JARVIS)
SOST-PV-12	MnNa₂O₃	SodiumManganate-PV	1.310*	-1.90	NOVEL (Phase 2, ~$2/kg)
SOST-PV-13	CuFeO₂	Delafossite	~1.5	-1.13	KNOWN mineral (~$4/kg)
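The screening logic implied above (reject Pb and Cd, keep band gaps in the 1.1 to 1.5 eV single-junction window, rank by distance to the 1.34 eV target) can be sketched as follows. The function names and the return shape are assumptions made for illustration; the element parser is a simple regular expression that only handles plain formulas.

```python
import re

TOXIC_ELEMENTS = {"Pb", "Cd"}  # the two elements this page rejects

def elements_in(formula):
    """Extract element symbols from a plain formula such as 'SrZrS3'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))

def pv_screen(formula, band_gap_ev, window=(1.1, 1.5), target=1.34):
    """Return (status, distance_to_target). Lower distance ranks higher."""
    if elements_in(formula) & TOXIC_ELEMENTS:
        return ("rejected_toxic", None)
    if not window[0] <= band_gap_ev <= window[1]:
        return ("outside_window", None)
    return ("candidate", abs(band_gap_ev - target))
```

This sketch ranks on the raw band gap value and ignores whether the gap is direct or indirect, which the page treats as a separate criterion.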

vs CURRENT TECHNOLOGY

MATERIAL	BG (eV)	EFFICIENCY / STATUS	NOTES
Si	1.12 (indirect)	26.8%	$15/kg
Pb perovskite	1.55	26.1%	TOXIC Pb
CdTe	1.45	22.1%	TOXIC Cd
La₃S₃N	1.340	SQ peak	~$5/kg, non-toxic
MnNa₂O₃	1.31	NOVEL	~$2/kg, non-toxic

RE₃S₃N FAMILY — TWO DIRECT-GAP MEMBERS VERIFIED:
• Sm₃S₃N: 1.24 eV DIRECT, m_e*=0.022 m₀ — INDUSTRIAL CANDIDATE ($13/kg). Ultralight carriers. Non-toxic.
• Dy₃S₃N: 1.39 eV DIRECT, m_e*=0.020 m₀ — premium candidate ($300/kg). Near SQ optimum.
• La₃S₃N: 1.16 eV indirect (baseline) | Y₄S₃N₂: 1.63 eV (tandem candidate)
Carrier effective masses 12x lighter than Si, 3x lighter than GaAs. All real materials (experimentally synthesized, Pnma structure). DOS, PDOS, band structure complete for Sm and Dy. 24 novel compositions tested — 0 survived (family is structurally special). Application as PV absorbers not yet experimentally demonstrated.

03 — ARCHITECTURE

Platform Architecture

Layer 1

Data

75,993 materials
100% with structures

Layer 2

Prediction

CGCNN / ALIGNN-Lite
MAE=0.15 eV/atom

Layer 3

Discovery

Candidate generation
novelty-first filtering

Layer 4

Intelligence

Dossiers, calibration
validation queue

Layer 5

Learning

Feedback memory
benchmark + retrain

Layer 1 — Data Foundation

COMPLETE

75,993 materials from JARVIS DFT 3D, all with validated crystal structures (CIF), band gap, formation energy, and spacegroup. 100% ML-ready. Ingested via bulk pipeline with structure backfill from jarvis-tools atoms.

JARVIS DFT 3D	75,993 materials — band gap, formation energy, spacegroup, full CIF structures
Materials Project	Normalizer ready, API key required // multi-source expansion planned
AFLOW + COD	Normalizers implemented // additional coverage for future phases
Fingerprints	75,993 precomputed 104-dim vectors — 94 compositional + 10 structural

Layer 2 — ML Property Prediction

OPERATIONAL

Graph neural networks trained on real crystal structures predict formation energy and band gap from CIF input. Training ladder validated at 5K, 10K, 20K, 40K, and 76K samples — 20K CGCNN promoted as production model.

CGCNN	Formation energy: MAE=0.1528 eV/atom, R²=0.9499 (20K training) // production
ALIGNN-Lite	Band gap: MAE=0.3422 eV, R²=0.707 (20K training) // production
Inference	Single-sample real-time via /predict endpoint
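For readers unfamiliar with the two metrics quoted above, MAE and R² can be computed from paired true and predicted values as in this short sketch. These are the standard definitions; the function names are chosen here for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between prediction and reference."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

MAE is in the units of the target (eV/atom for formation energy, eV for band gap), while R² is unitless, so the two models' MAE values are not directly comparable with each other.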

Layer 3 — Candidate Generation + Evaluation

OPERATIONAL

Generates plausible candidates from corpus parents via element substitution, stoichiometry perturbation, and prototype remix. Filters novelty-first, lifts approximate structures from parent prototypes, and evaluates with real GNN prediction.

Novelty Filter	Cosine similarity on 104-dim fingerprints: known / near_known / novel_candidate
Exotic Ranking	Weighted: 40% novelty + 20% element rarity + 15% structure rarity + 25% sparsity
Structure Lift	95% lift rate — element swap on parent structures, pymatgen validated
Campaigns	5 presets: exotic hunt, stability first, band gap target, T/P watchlist, novelty hunt
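The exotic ranking weights listed above combine into a single score as a weighted sum. The sketch below assumes each component has already been normalized to the range 0 to 1; how the engine normalizes them is not stated here, and the key names are illustrative.

```python
# Weights as listed in the Exotic Ranking row; they sum to 1.0.
EXOTIC_WEIGHTS = {
    "novelty": 0.40,
    "element_rarity": 0.20,
    "structure_rarity": 0.15,
    "sparsity": 0.25,
}

def exotic_score(components):
    """components: dict with one value in [0, 1] per weight key."""
    for name in EXOTIC_WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(EXOTIC_WEIGHTS[name] * components[name]
               for name in EXOTIC_WEIGHTS)
```

With these weights, a candidate that is only half as novel as the maximum but sits in a completely empty neighborhood scores 0.40 × 0.5 + 0.25 × 1.0 = 0.45.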

Layer 4 — Material Intelligence + Validation

OPERATIONAL

Every material or candidate gets a comprehensive dossier with evidence-tagged properties, application hypotheses, comparison tables, calibrated confidence, and validation priority.

Evidence	Every property tagged: `known` \| `predicted` \| `proxy` \| `unavailable`
Applications	Rule-based: semiconductor, PV, thermoelectric, catalytic, magnetic, structural, high-pressure
Calibration	Empirical confidence bands from benchmark MAE per element-count and value-range bucket
Validation Queue	6-stage cheap-first ladder with ROI scoring and dedup
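A cheap-first ladder can be sketched as an ordered list of stages where a candidate stops at the first stage it fails, so the expensive stages only ever see survivors. The stage names come from this page; the check callables, the status strings, and the treatment of unimplemented stages as "pending" are assumptions for illustration.

```python
# Stage order as given on this page, cheapest first.
STAGES = ["dedup", "novelty", "proxy", "dft", "external", "learning"]

def run_ladder(candidate, checks):
    """checks: dict mapping stage name -> callable(candidate) -> bool.
    Returns (stage, status) where status is one of
    'failed', 'pending' (no check available yet), or 'completed'."""
    for stage in STAGES:
        check = checks.get(stage)
        if check is None:
            return stage, "pending"   # candidate waits at this stage
        if not check(candidate):
            return stage, "failed"    # later, costlier stages are skipped
    return STAGES[-1], "completed"
```

With only the first three checks wired up, a surviving candidate comes back as `("dft", "pending")`, which matches the state of a pipeline whose DFT stage is not yet automated.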

Layer 5 — Feedback + Learning Loop

SCAFFOLD

Records predictions vs observations, identifies model failures by property and element family, tracks promising chemical regions, and builds retraining queues. Evidence bridge imports external data (JSON/CSV). Auto-links observations to predictions for feedback. Currently a scaffold — active retraining loop not yet triggered.
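One way to picture "identifies model failures by property and element family" is to group absolute prediction errors by (property, family) and flag groups whose mean error exceeds a threshold. The record layout, the `family_of` mapping, and the threshold are all hypothetical in this sketch.

```python
from collections import defaultdict
from statistics import mean

def failure_hotspots(records, family_of, threshold):
    """records: iterable of (elements, property_name, predicted, observed).
    family_of: dict mapping element symbol -> family label (assumed input).
    Returns [((property, family), mean_abs_error), ...] above threshold,
    worst first."""
    errors = defaultdict(list)
    for elements, prop, predicted, observed in records:
        families = {family_of[el] for el in elements if el in family_of}
        for family in families:  # each family counted once per record
            errors[(prop, family)].append(abs(predicted - observed))
    hotspots = [(key, mean(errs)) for key, errs in errors.items()
                if mean(errs) > threshold]
    return sorted(hotspots, key=lambda item: (-item[1], item[0]))
```

The flagged groups are natural inputs for a retraining queue, although in the scaffold described above that retraining step is not yet triggered.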

04 — ROADMAP

Development Roadmap

Phase	Scope	Status
Phase I	Data foundation: 4-source ingestion, canonical schema, SQLite storage, audit/export	Complete
Phase II	Baseline ML: CGCNN + ALIGNN-Lite training, /predict, /similar, model registry	Complete
Phase III.A	Novelty filter: 104-dim fingerprints, exotic ranking, cosine similarity bands	Complete
Phase III.B	Shortlist engine: configurable criteria, T/P proxy screening, decision bands	Complete
Phase III.C	Corpus scale: 75,993 materials, persistent fingerprints, fast retrieval, campaigns	Complete
Phase III.D	Candidate generation: element substitution, stoichiometry perturbation, prototype remix	Complete
Phase III.E	Structure lift + evaluation: prototype structures, real GNN prediction on candidates	Complete
Phase III.F	Material Intelligence: dossiers, evidence tagging, application classification, validation priority	Complete
Phase III.G	Validation queue + learning scaffold: cheap-first ladder, feedback memory, ROI scoring	Complete
Phase III.H	Evidence bridge + benchmark + calibration: empirical confidence bands, evidence-feedback linking	Complete
Phase III.I-J	Structure analytics (28 descriptors) + corpus backfill (100% CIF coverage)	Complete
Phase IV.A	Scaled retraining: formation energy ladder 5K→76K, CGCNN 20K promoted (MAE=0.1528)	Complete
Phase IV.B	Scaled retraining: band gap ladder, ALIGNN-Lite 20K promoted (MAE=0.3422)	Complete
Phase IV.C	Dual-target frontier engine: multiobjective ranking with 4 profiles	Complete
Phase IV.D	Frontier-to-validation bridge: exportable packs with risk flags + next steps	Complete
Phase IV.E	Pre-DFT triage gate: hard gates, cheap-first 4-profile decision engine	Complete
Phase IV.F	Niche discovery campaigns: themed batch searches with cross-campaign comparison	Complete
Phase IV.G–U	Active learning, engine stabilization, public demo, operational acceptance (v3.2.0-RC1)	Complete
Phase V	Direct GNN inference: CGCNN/ALIGNN-Lite forward pass on lifted candidate structures	Complete
Phase V.B	GNN integration into autonomous pipeline, known-material penalty, validation queue hardening	Complete
Phase V.C	Lift expansion (doping/mixed), proxy suppression, novel direct GNN path, quality uplift	Complete
Phase VI	Public autonomous discovery dashboard, retro CRT console, evidence-first UX	Complete
Phase VII	Uncertainty-aware discovery: heuristic uncertainty, validation readiness, DFT handoff packs	Complete
Phase VIII	Validation bridge: lifecycle tracking, result ingestion, reconciliation, closed learning loop	Complete
Phase IX	Scientific operations: batch validation, evidence accumulation, longitudinal reporting	Complete
Phase X	Live data wiring, single source of truth, interactive campaign/candidate inspectors	Complete
Phase XI	Calibration intelligence, autonomy governance, chemistry-aware scoring, campaign intelligence, validation economics	Complete
Phase XII.B	Ion-separation discovery: functional scoring (7 signals), membrane/lithium/desalination awareness, 5 functional campaign profiles	Complete
Phase XIII	Relaxation bridge: structure repair heuristics, relaxation readiness triage, compute backend placeholders (M3GNet/CHGNet/DFT)	Complete
Phase XIV	Functional intersection discovery: cross-screening water/lithium/membrane, multi-function candidate detection, function-first prioritization	Complete
Phase XV	Full-corpus intersection scan (35,589 formulas): 11,339 multi-function candidates, real shortlists for lithium/water/membrane sweet spot	Current
Phase 30	Consensus multi-track ranking, PV risk flags (6 physics-informed), DFT triage queues (exploit/explore/cross-track). 21 candidates ranked → 11 DFT-queued.	Complete
Phase 31	DFT queue validation: 64% rediscovery rate detected → novelty filter needed. Expanded to 7 tracks.	Complete
Phase 32	Novelty-aware scoring: known-material penalty (42 entries). Rediscovery rate 64% → 0%. Novel candidates promoted.	Complete
Phase 34	ML surrogate prescreen (CHGNet): 17 candidates in 40 seconds. 64 DFT-hours saved. 4 unstable candidates eliminated before expensive computation.	Complete
Phase 35	First DFT validation completed. Top candidate converged: ferrimagnetic oxide, stable phase confirmed. Reclassified from photovoltaic to functional catalyst/electrode. Full pipeline validated end-to-end.	Complete
Phase 36+	DOS/band structure analysis, FM vs AFM comparison, convex hull verification, next candidate DFT queue	In progress
Phase XIX.B–C	Economic-functional expansion: abundance/cost scoring (USGS crustal ppm + $/kg), toxicity penalty, PGM detector, 6 mission profiles (catalysis, PV, ion separation, CO₂), forbidden/preferred element filters, economic composite formula	In Development

EXPANSION ROADMAP

Initiative	Description	Status
Database expansion	76K → 500K+ materials. Add Materials Project, OQMD, NOMAD, Materials Cloud, Open Catalyst Project APIs	Planned
PGM Replacement Engine	Find cheap alternatives to Pt/Pd/Rh/Ir/Ru. Target families: perovskites, spinels, Fe–N–C, sulfides, nitrides, carbides, phosphides. Forbidden element filters + PGM content detector operational	In Development
Multi-property GNN	Expand from 2 to 12+ predicted properties: stability, conductivity, magnetism, toxicity, cost, abundance, catalytic activity, corrosion resistance	Planned
MLIP validation layer	CHGNet/M3GNet/MACE relaxation before DFT. Pyramid: chemical filter → GNN → MLIP → DFT (only top 0.1%)	Partially implemented (CHGNet)
Structure generator v2	Perovskite templates, 2D slicing, high-entropy alloys, single-atom catalysts, vacancy defects, controlled doping	Planned
Mission profiles	6 targeted campaigns implemented: PGM-free catalyst, water splitting, low-cost PV, Li⁺ selective brine, desalination membrane, CO₂ capture. MissionProfile dataclass with target properties, element filters, weight tuning	In Development
Abundance & cost scoring	Crustal abundance (USGS/CRC ppm), cost proxy ($/kg), toxicity penalty, PGM detection, abundant replacement ratio. Log-scale scoring with economic composite formula	In Development
Uncertainty quantification	Ensemble + MC dropout. Uncertain → validation queue, not claimed discovery	Planned
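The planned "uncertain → validation queue" routing can be sketched with a simple ensemble: several models predict the same property, and the spread of their outputs decides the route. The spread cut-off and the route labels are assumptions; MC dropout would feed the same function with repeated stochastic forward passes instead of separate models.

```python
from statistics import mean, stdev

def route_by_uncertainty(predictions, max_std=0.1):
    """predictions: property values from an ensemble (at least two).
    Returns (mean, std, route). max_std is a hypothetical cut-off in
    the units of the predicted property."""
    if len(predictions) < 2:
        raise ValueError("need at least two ensemble members")
    mu = mean(predictions)
    sigma = stdev(predictions)
    route = "validation_queue" if sigma > max_std else "ranked_candidate"
    return mu, sigma, route
```

A candidate on which the ensemble disagrees is sent for validation rather than being reported as a discovery, which is the behavior the roadmap entry describes.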

Goal: evolve from materials generator to economic-functional discovery system — finding materials that are cheap, abundant, stable, and useful. All computation at $0 or near-zero using free-tier cloud, CPU local, and future SOST Proof-of-Useful-Computation network.

06 — VISION

Unified Platform Vision

GeaSpirit — Predict · Detect · Discover

VISION

GeaSpirit is the unified platform combining computational materials science with remote geospatial detection. The Materials Engine predicts properties and generates novel candidates. A future Remote Sensing module will detect mineral signatures from satellite imagery. Together they enable a closed discovery cycle:

PREDICT → PRIORITIZE → SEARCH → VALIDATE → REGISTER

Materials Engine

OPERATIONAL — v2.1.0
75,993 materials · 100% structures
CGCNN FE (MAE=0.15) + ALIGNN-Lite BG (MAE=0.34)
Frontier ranking + triage + niche campaigns
Active learning orchestrator + corpus expansion
655 tests · 70+ API endpoints

Remote Sensing Initiative

VIABILITY STUDY COMPLETE
Sentinel-2 (13 bands, free) for mineral alteration mapping
EMIT hyperspectral (285 bands, free) for mineral ID
Iron oxide mapping: 70–85% accuracy in arid regions
Zero-cost route designed — not yet built
Strategic initiative, not operational module

Integration Layer

FUTURE RESEARCH
GNN predicts spectral signatures from crystal structure
Remote sensing searches for those signatures in imagery
Requires DFT-to-spectrum training pipeline
RRUFF database: ~4,000 minerals with structure + spectra
Research challenge — not yet implemented

Applied Research — Next-Generation Photovoltaic & Catalyst Materials

ACTIVE R&D

The Materials Engine has identified non-toxic, earth-abundant semiconductor families with direct band gaps and ultralight carrier effective masses exceeding established photovoltaic materials — potentially competitive thin-film solar cell candidates at raw material costs comparable to silicon. Multiple compositions within the same crystal family independently exhibit direct gaps, suggesting a structurally robust electronic property rather than a statistical fluke.

In parallel, the platform is screening earth-abundant oxide catalysts (perovskites, spinels, mixed-metal oxides) as replacements for platinum-group metals (Pt, Pd, Rh — $30K–140K/kg) in exhaust catalysis, green chemistry, and electrochemical applications. Candidates are scored across thermal stability, redox activity, cost, and supply chain risk.

Both research tracks use the full autonomous discovery pipeline: corpus screening of 76,193 materials, GNN-based property prediction, CHGNet structural validation, economic intelligence, and DFT verification for top candidates.

Specific compositions, DFT results, and candidate rankings are maintained in restricted research documentation accessible with authentication.

Applied Research — Structural Materials for Energy Infrastructure

INTERNAL R&D

The Materials Engine has been used to screen and rank structural material systems for large-scale energy infrastructure applications, including environments with extreme hydrostatic pressure, permanent seawater immersion, and multi-decade service life requirements.

Using the multi-criteria scoring engine (11 dimensions, 7 penalty functions) and the manual registry of 20+ engineering material families, the platform identified hybrid material systems that outperform conventional Portland cement concrete in marine durability by eliminating known chloride-attack and steel-corrosion failure modes.

The screening also produced technoeconomic models, LCOS calculations, and site selection frameworks — demonstrating that Materials Engine extends beyond crystalline materials discovery into full system-level engineering assessment.

Details are maintained in internal research documentation. This line demonstrates the platform’s capability for applied industrial problems beyond its original photovoltaic and catalysis verticals.

Why the Unified Platform Matters

No existing system combines computational materials prediction with satellite-based mineral detection. KoBold Metals ($3B) uses AI + geophysics but has no GNN materials engine. SOST’s unique angle: predict what materials should exist computationally, then search for them in real satellite imagery. The Materials Engine already discovers exotic candidates — a future remote sensing module would tell you where to look for them on Earth.

How the Platform Evolves

Stage	Capability	Status
Stage 1	Materials Engine: corpus, GNN prediction, novelty, generation, frontier, triage, campaigns	Operational
Stage 2	Remote sensing: Sentinel-2 mineral alteration maps for arid regions ($0)	Viability OK
Stage 3	Weak integration: Materials Engine cross-references detected mineral classes	Planned
Stage 4	GNN spectral prediction: predict signatures → search in satellite imagery	Research
Stage 5	Proof of Discovery on SOST blockchain + geological data marketplace	Future

Zero-Cost Strategy

Every phase uses free data and open-source tools first. Materials from JARVIS/MP/AFLOW/COD ($0). Satellite data from Sentinel-2/EMIT/Sentinel-1 ($0). Training on Google Colab ($0). Each phase must produce something useful and sellable before the next begins. No cloud spend until revenue justifies it. Current Materials Engine runs on a single VPS at $0/month compute cost.

Strategic Specialization

Focus	Novel, unknown, exotic, and under-explored materials — not a generic prediction service
Method	Novelty-first discovery: generate → filter by novelty → frontier rank → triage → validate
Learning	Active learning: detect where models fail, expand corpus in sparse regions, retrain selectively
Cost	Near-zero until revenue: open data, open models, CPU-first, cheap-first validation ladder

Honest Limitations

TRANSPARENCY

Materials Engine	Predictions are GNN baseline estimates (MAE ~0.15–0.34 eV). NOT DFT-validated. NOT experimentally confirmed.
Remote Sensing	Cannot see directly underground. Surface mineral mapping only. Subsurface inference requires alteration halo proxies.
Integration	GNN spectral prediction from crystal structure is a research challenge. Training data limited (~300–400 minerals with both).
Blockchain	Proof of Discovery is a concept. Does not replace physical validation or peer review.
Novelty	Assessed relative to ingested corpus only (76,193 materials), not all published science. A “novel” candidate may exist in databases we have not ingested.
Proxy	34.5% of autonomous candidates rely on neighbor-proxy estimates rather than direct GNN. Proxy carries higher uncertainty.

07 — RECENT

Recent Milestones

Milestone	Detail
v2.1.0	Multi-source corpus expansion + dedup foundation: 6 sources registered, staging engine, MP simulation (22% unique), expansion recommendation
v2.0.0	Active learning orchestrator: error hotspots (3 found), coverage analysis (89 elements, 213 SGs), retraining proposals, corpus expansion planner
Niche Campaigns	5 themed discovery presets with cross-campaign comparison: stable_semiconductor, wide_gap_exotic, high_novelty, balanced, generated_review
Pre-DFT Triage	4-profile decision gate: strict/balanced/exotic/semiconductor — hard gates + reason codes + next-action recommendations
Frontier Engine	Dual-target multiobjective ranking: FE stability + BG fit + novelty + exotic + structure quality + validation priority
Band Gap Model	ALIGNN-Lite 20K promoted: MAE=0.3422 eV, R²=0.707 (14% improvement over 2K baseline)
Training Ladder	Both targets: 5 rungs (5K→76K), 20K optimal. CGCNN wins FE, ALIGNN-Lite wins BG
Platform Strategy	GeaSpirit unified platform strategy + remote sensing viability report completed (docs/)
Phase V	Direct GNN inference: CGCNN forward pass on crystal structures. Real property prediction for autonomous candidates.
Autonomous Discovery	Iterative campaign engine with 8 profiles, error learning, validation queue (5 tiers), structure lift pipeline. 28/28 tests passing.
Material Mixer	4 generation strategies: element substitution, single-site doping, mixed-parent, cross-substitution. Dual-output reports (technical + plain language).
Phase II Hardening	Chemical plausibility filters, charge balance heuristic, acceptance rate 61% → 27%. Known material sanity: 11/11 pass.
Phase V.C	Lift expansion (doping/mixed-parent), proxy suppression (60%→35%), novel direct GNN path (3%→22%), proxy cap at 0.55. 8 campaigns.
47 Test Files	1,073 test functions across 47 files covering full pipeline: schema through autonomous discovery (Phase V.C)
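The Pre-DFT Triage milestone above (hard gates, reason codes, four profiles) can be pictured as a two-step decision: hard gates reject outright, then a score is banded into the remaining outcomes. Every threshold, gate, field name, and reason code below is a hypothetical placeholder; only the profile names and the four outcomes come from this page.

```python
# Hypothetical per-profile settings; the real values are not published.
PROFILES = {
    "strict":        {"max_fe": -0.5, "approve": 0.85, "review": 0.70, "watch": 0.50},
    "balanced":      {"max_fe": 0.0,  "approve": 0.75, "review": 0.60, "watch": 0.40},
    "exotic":        {"max_fe": 0.2,  "approve": 0.70, "review": 0.50, "watch": 0.30},
    "semiconductor": {"max_fe": 0.0,  "approve": 0.75, "review": 0.60, "watch": 0.40,
                      "min_bg": 0.1},
}

def triage(candidate, profile="balanced"):
    """candidate: dict with 'formation_energy', 'score', 'has_structure'
    and optionally 'band_gap'. Returns (decision, reason_codes)."""
    p = PROFILES[profile]
    reasons = []
    # Hard gates: any failure rejects regardless of score.
    if not candidate.get("has_structure", False):
        reasons.append("no_lifted_structure")
    if candidate["formation_energy"] > p["max_fe"]:
        reasons.append("formation_energy_above_gate")
    if "min_bg" in p and candidate.get("band_gap", 0.0) < p["min_bg"]:
        reasons.append("band_gap_below_gate")
    if reasons:
        return "reject", reasons
    # Soft scoring: band the composite score into the remaining outcomes.
    score = candidate["score"]
    if score >= p["approve"]:
        return "approved", ["score_meets_approve_threshold"]
    if score >= p["review"]:
        return "manual_review", ["score_meets_review_threshold"]
    if score >= p["watch"]:
        return "watchlist", ["score_meets_watch_threshold"]
    return "reject", ["score_below_watch_threshold"]
```

Returning reason codes with every decision keeps the gate auditable, which matters when the next step is an expensive DFT run.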

Open Materials Reference

The Materials Engine is the materials-knowledge layer of the SOST ecosystem — Predict · Detect · Discover. A graph-neural-network platform predicts material properties and generates novel candidates over a corpus of 76,193 materials (100% with crystal structures); this open, multilingual reference catalogues the materials themselves — known and experimental alloys, advanced materials, minerals and chemical elements — with composition, atomic- and structural-level data, properties and citable sources in 16 languages. Its purpose is materials discovery: to explore the composition and crystal-structure space at the atomic level and surface non-theorized alloy combinations whose predicted properties surpass today's materials — greater strength, hardness and availability at lower density and cost.

Computational platform

GNN property prediction (CGCNN / ALIGNN-Lite) for formation energy, band gap and more — MAE ≈ 0.15 eV/atom — over real crystal structures. Operational research system (v2.x).

Discovery & novelty

Autonomous discovery over the composition–structure space: a 104-dimension fingerprint with novelty-first filtering proposes non-theorized alloys and compounds and ranks them by predicted figures of merit — specific strength, hardness, cost and element availability — flagging those forecast to outperform incumbent materials, as candidates for synthesis and validation.

Open data foundation

Built on public materials databases (JARVIS, Materials Project, AFLOW, COD) with a canonical schema — plus this open reference of known alloys, minerals and elements.

How it works · DFT & quantum foundation

Density Functional Theory (DFT) is the quantum-mechanical workhorse of modern materials science. Instead of solving the many-electron Schrödinger equation directly, it reformulates the problem in terms of the electron density and solves the Kohn–Sham equations self-consistently — yielding a material's ground-state energy, electronic band structure, elastic moduli and thermodynamic stability from first principles, with no empirical fitting. The open databases this engine builds on (Materials Project, JARVIS, AFLOW, OQMD, COD) are largely DFT-computed. DFT is accurate but expensive — hours to days of compute per crystal — so the engine trains graph-neural-network surrogates on DFT data to predict properties in milliseconds, then reserves full DFT to verify the most promising non-theorized candidates: a predict-then-verify loop.
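For reference, the Kohn–Sham equations mentioned above take the following standard form in Hartree atomic units. This is textbook notation, not something specific to this engine: each orbital obeys a single-particle equation in an effective potential that itself depends on the density built from those orbitals.

```latex
\left[-\tfrac{1}{2}\nabla^{2} + v_{\mathrm{eff}}(\mathbf{r})\right]\varphi_i(\mathbf{r})
  = \varepsilon_i\,\varphi_i(\mathbf{r}),
\qquad
n(\mathbf{r}) = \sum_{i}^{\mathrm{occ}} \left|\varphi_i(\mathbf{r})\right|^{2}

v_{\mathrm{eff}}(\mathbf{r}) = v_{\mathrm{ext}}(\mathbf{r})
  + \int \frac{n(\mathbf{r}')}{\left|\mathbf{r}-\mathbf{r}'\right|}\,d\mathbf{r}'
  + v_{\mathrm{xc}}[n](\mathbf{r})
```

Because the effective potential depends on the density and the density depends on the orbitals, the equations are iterated until the input and output densities agree. That self-consistency loop is the main reason a single DFT calculation is costly.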

Where we are

Today the engine spans a 76,193-material corpus (100% with crystal structures), GNN property prediction at MAE ≈ 0.15 eV/atom for formation energy, a 104-dimension structural fingerprint with novelty-first filtering, and this open, multilingual reference of known and experimental alloys, minerals and elements with atomic- and structural-level data, photos and citable academic sources.

Where we are going

Next: close the predict → DFT-verify → synthesize loop; extend coverage from oxides and intermetallics into multi-principal-element (high-entropy) alloy space; add cost and raw-material-availability objectives to the candidate ranking; and publish reproducible candidate dossiers with content hashes for independent verification.

Relation to SOST Useful Compute

DFT and GNN materials discovery are deterministic, decomposable, replay-verifiable scientific workloads — precisely the class of computation SOST's Useful Compute architecture (Proof-of-Useful-Compute, via the Trinity research harness) is designed to host: tasks whose results can be re-executed and hash-matched across independent workers. The long-term vision is that materials-discovery jobs run as useful-compute tasks anchored on SOST. Note: Useful Compute rewards are NOT active — the infrastructure is in research / dry-run only, and any rewarded phase would require a future protocol redesign; nothing here implies active rewards.
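The "re-executed and hash-matched" idea can be illustrated with a canonical serialization followed by a SHA-256 digest: two workers that produce the same result dictionary get the same hash regardless of key order. The function names and the two-worker minimum are assumptions for this sketch; it says nothing about how the Trinity harness actually encodes results.

```python
import hashlib
import json

def result_hash(result):
    """Hash a JSON-serializable result in a key-order-independent way."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def replay_verified(results):
    """True when at least two independent results hash identically."""
    hashes = {result_hash(r) for r in results}
    return len(results) >= 2 and len(hashes) == 1
```

Hash matching only works if the workload is bit-for-bit reproducible, so in practice floating-point outputs would need fixed rounding or deterministic kernels before they are hashed.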

Known Alloys

An open, growing reference of known engineered alloys — each with its composition, a description and links to Wikipedia, Wikidata or other public sources, in 16 languages. Curated and expanded incrementally.

Open data from Wikidata (CC0); images from Wikimedia Commons, each under its own licence — open an entry and follow the image link for author and licence.
This is an open-data reference for education and engineering triage. It is not a materials specification, certification or warranty.

Experimental Alloys & Advanced Materials

An open, research-sourced reference of experimental and advanced materials — carbon allotropes, aerogels, high-entropy alloys, metallic glasses, advanced steels, superalloys and 2D materials. Each fiche gives the composition, standout property, the improvement over conventional materials, applications, and whether it is commercial, lab-demonstrated or still theoretical. Fiche text is in English (scientific reference).

Materials corpus	76,193 validated crystalline materials (JARVIS DFT + AFLOW)
Property prediction	Formation energy and electronic band gap from crystal structure
Candidate generation	Autonomous discovery via element substitution, doping, and mixed-parent strategies
Campaign profiles	29 profiles across catalysis, PV, emissions, proton energy, hydrogen storage, water, corrosive environments
Application domains	16 industrial domains: exhaust catalysis, green chemistry, proton batteries, H₂ storage, PV absorbers, water treatment, and more
Validation bridge	Readiness gate (R0-R5), DFT input generation (real VASP files), reconciliation and learning loop
Research tracks	Exhaust catalysis (15 candidates, maturity 7.8/10) + PV absorbers (19 candidates, maturity 6.2/10)
Chemistry awareness	Risk labeling, hazardous family tagging, honest novelty ladder (6 levels)
Cost per month	$0 — runs entirely on CPU

Corpus	76,193 materials — JARVIS DFT 3D (75,993) + AFLOW (200), 100% with validated CIF structures
ML Models	Dual-target GNN prediction (formation energy + band gap) — accuracy metrics restricted
Discovery	Autonomous engine — 29 campaign profiles across 16 industrial domains, direct GNN on lifted structures, readiness gate R0-R5, validation bridge with DFT input generation
Research Tracks	2 active — Exhaust catalysis (15 candidates, 7.8/10 maturity) + Photovoltaic absorbers (19 candidates, 6.2/10 maturity)
API	70+ endpoints — FastAPI with predict, frontier, triage, campaigns, autonomous discovery
Tests	155 tests passing across 4 test suites, zero regressions
Compute Cost	$0/month — runs on CPU, open data, no cloud required

Denomination	USD-equivalent (converted to SOST at market rate)
Model status	Under review — not yet finalized
Core principle	The algorithm is free. Access mechanism for spam prevention only.
