Internal autonomous research and validation system that generates hypotheses, validates them against local evidence, detects contradictions, scores utility and reduces overclaiming across Materials Engine, GeaSpirit, SOST protocol decisions and Useful Compute task design. Local-first. Free-first. Paid-last. No autonomous public claims.
The SOST AI Engine is an internal autonomous research system that generates hypotheses, compares evidence, detects contradictions and helps prioritize scientific and protocol decisions across SOST projects. It does not publish autonomous conclusions; all public outputs require human review.
No conclusion is accepted just because a model said it. Every important claim is classified by evidence level — local code, local data, local doc, multi-source, model-only, speculative, contradicted or insufficient — before it can be considered for any internal promotion, and never published without an explicit human review pass.
100,000 hypotheses are generated locally and cheaply. Only the best, most uncertain or most contradictory ones reach a multi-AI council. Network access is OFF by default; paid models are OFF by default and require an explicit operator flag.
Ten-level evidence classification (local_code_verified → do_not_publish), claim extractor, public-claim guard with safer-rewrite suggestions, eight seeded eval cases derived from real past mistakes (Useful Compute rewards postponed, avg288 vs avg1000, gold-redemption wording, GeaSpirit mineral guarantees, DFT-validated overclaim, etc.).
Read-only validators that cross-check claims against the actual local corpus. The unified
ValidatorResult carries verdict, evidence_level, confidence, publishability,
evidence_items, missing_evidence, risks and next_steps. The orchestrator merges multiple
validators with the strictest verdict winning.
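A minimal sketch of that merge, with field names taken from the unified ValidatorResult above (the verdict ladder and merge policy shown here are illustrative, not the engine's actual enums):

```python
from dataclasses import dataclass, field

# Illustrative ladder, mildest to strictest: the merged result keeps the worst verdict.
VERDICT_ORDER = ["supported", "uncertain", "contradicted", "do_not_publish"]

@dataclass
class ValidatorResult:
    verdict: str
    evidence_level: str
    confidence: float
    publishability: str
    evidence_items: list = field(default_factory=list)
    missing_evidence: list = field(default_factory=list)
    risks: list = field(default_factory=list)
    next_steps: list = field(default_factory=list)

def merge(results: list) -> ValidatorResult:
    """Strictest verdict wins; evidence, gaps and risks are pooled."""
    worst = max(results, key=lambda r: VERDICT_ORDER.index(r.verdict))
    merged = ValidatorResult(worst.verdict, worst.evidence_level,
                             min(r.confidence for r in results), worst.publishability)
    for r in results:
        merged.evidence_items += r.evidence_items
        merged.missing_evidence += r.missing_evidence
        merged.risks += r.risks
        merged.next_steps += r.next_steps
    return merged
```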
- validate_material_claim, validate_dft_status, low-cost / catalyst / photovoltaic / false-positive checks
- Mass local hypothesis generation (binary, ternary, quaternary and doped compositions for materials; AOI×commodity for GeaSpirit; risky-wording and Heavy-task-design ideas for SOST/UC)
- Deterministic ranking with configurable weights
- AI Council with validator-veto (not majority vote)
- Outcome-driven rule-based learning loop with append-only persistence
Each hypothesis is enriched with a structured ApplicabilityProfile answering:
what it could be useful for, why theoretically, what evidence is missing, what false-positive
risks exist, what the next validation step is, and whether it's publishable.
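As a sketch, the profile could be a plain dataclass mirroring those six questions (field names assumed for illustration; only the questions themselves are documented):

```python
from dataclasses import dataclass, field

@dataclass
class ApplicabilityProfile:
    useful_for: list = field(default_factory=list)        # what it could be useful for
    theoretical_rationale: str = ""                       # why it might work, in theory
    missing_evidence: list = field(default_factory=list)  # what has not been checked yet
    false_positive_risks: list = field(default_factory=list)
    next_validation_step: str = ""                        # cheapest test that could kill it
    publishable: bool = False                             # internal_only until proven
```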
Connects the engine to official / free public APIs (arXiv, OpenAlex, Crossref, PubChem, JARVIS, Materials Project, USGS) and to optional local / free AI providers (Ollama, OpenRouter, HuggingFace) — with cache, rate-limit, domain allowlist, citation tracking and source reliability scoring. Truth hierarchy: local validators > local data > official DB > peer-reviewed metadata > preprints > local LLM > free hosted > paid judge (last and opt-in).
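The truth hierarchy is just an ordered list; a sketch of conflict resolution under that ordering (tier names paraphrased from the hierarchy above):

```python
# Most trusted first; the paid judge is both last and opt-in.
TRUTH_HIERARCHY = [
    "local_validator", "local_data", "official_db",
    "peer_reviewed_metadata", "preprint",
    "local_llm", "free_hosted", "paid_judge",
]

def more_trusted(source_a: str, source_b: str) -> str:
    """On conflicting answers, keep the one from the higher-ranked source tier."""
    ranked = sorted([source_a, source_b], key=TRUTH_HIERARCHY.index)
    return ranked[0]
```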
The engine now runs as an internal autonomous research daemon: it observes
local Materials Engine, GeaSpirit, SOST and Useful Compute artefacts in read-only mode,
plans its own bounded tasks, executes safe local work, learns from outcomes via
rule-based memory, and produces reviewable archives for human approval.
Critically, it never publishes: every public claim must pass an explicit
human review and approval step before being exported — and the exporter writes only
to reports/ai_engine/approved_exports/, never to the public website.
Each reviewable archive contains summary.md, publication_candidates.md, do_not_publish.md, manifest.json and checksums.sha256, plus a .tar.gz archive.
M7 wires the policy gate and judge plumbing for local + free AI providers (Ollama
local, OpenRouter / HuggingFace free models). Paid AI is hard-disabled
in M7 — even when a caller passes --allow-paid, the policy coerces
max_paid_calls to 0 and reports paid_judge as disabled. The
provider answer judge runs every reply against the canonical contradictions and the
public-claim guard, scores overclaim and hallucination, and embeds the canonical
correction in the corrected_answer.
- provider_policy: explicit defaults all-OFF; allow_paid=True coerced to False
- free_ai_model_registry: Ollama prefix allowlist (qwen/llama/mistral/phi/gemma/deepseek/codellama), OpenRouter only :free suffix, HuggingFace small free-inference list
- provider_answer_judge: deterministic JudgeReport scoring overclaim/hallucination, with corrected_answer that appends the canonical truth
- live_research_session: validators-only by default; every provider recorded as used=False with a skipped_reason
- compare_exchange-style check; is_available() never raises
- No password field anywhere
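A sketch of that coercion (the policy object's shape is assumed; the behaviour, allow_paid always forced off in M7, is as documented above):

```python
from dataclasses import dataclass

@dataclass
class ProviderPolicy:
    allow_local_model: bool = False   # every provider axis defaults OFF
    allow_free_ai: bool = False
    allow_paid: bool = False
    max_paid_calls: int = 0

def effective_policy(requested: ProviderPolicy) -> ProviderPolicy:
    """M7 hard-disables paid AI: even an explicit allow_paid=True is coerced."""
    if requested.allow_paid:
        requested.allow_paid = False   # coerced back to False
        requested.max_paid_calls = 0   # and paid_judge reported as disabled
    return requested
```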
A small CLI-driven interface that lets the operator type a free-form prompt and have the AI engine search/reason over local knowledge of Materials Engine, GeaSpirit, SOST, Useful Compute and DEX, then return a cautious internal-only answer. Never publishes — even when the prompt explicitly asks to publish, the answer composer routes to needs_human_review.

- prompt_router: 12 intent buckets (explain / search / compare / validate / generate_hypotheses / public_wording_review / useful_compute_task_design / dft_priority / geaspirit_public_safety / dex_safety / mining_help / create_report)
- project_selector: keyword-based routing across materials / geaspirit / sost / useful_compute / dex / mining
- answer_composer: always runs contradiction_resolver + public_claim_guard; embeds canonical corrections; downgrades publishability on blocking findings
- internal_citation: lightweight registry for local-file references with inline [n] markers and a markdown bibliography
- reports/ai_engine/ask/<ts>_<slug>/ with answer.md, evidence.json, files_consulted.txt, risk_report.md, manifest.json
- internal_only, network off, paid false, no automatic publication, no website write
The public-safe layer. Private AI thinks. Human reviews. Public site explains only approved safe knowledge. The public website (sost-help.html and sost-miner-troubleshooter.html) consumes only static JSON exported from a human-reviewed pipeline. The public site never calls the private engine, never queries Ollama / OpenRouter / HuggingFace / paid AI, and never uploads a log.

- approved_knowledge_exporter: combines approved publication-queue items with 12 default safe FAQ templates; writes reports/ai_engine/approved_public_help/<ts>/ with index, markdown, troubleshooter rules, faq, safety + source manifests, README and sha256 checksums
- public_help_guard: exit-gate guard hard-blocking guaranteed profit, passive income, Useful Compute rewards are active, avg1000 consensus, confirmed/guaranteed mineral, DFT-validated, fully trustless DEX, any send/paste/share private key or seed phrase, the personal-email leak token, and the AI attribution leak token
- miner_troubleshooting_knowledge: 11 deterministic log-pattern rules (rejected-block, profile-mismatch, no-peers, connection-refused, bootstrap-chain, http-zero, too-many-threads, cmake/libsecp/libssl-missing, etc.) consumed by the local-only browser troubleshooter
- scripts/import_public_help_pack.py: validates the pack, refuses on missing safety_manifest / checksums / banned phrases, copies the JSON into website/data/; never auto-runs git-add / commit / push
The closed feedback loop. Deterministic, rule-based, internal-only — no neural model, no network, no autopublish. The engine now records what actually happened to a candidate after it left review (DFT result, GeaSpirit verdict, public-wording correction, miner outcome, provider contradiction), turns each event into a clamped boost/penalty adjustment, and remembers the pattern for future ranking. The operator command center reads the same signal and recommends P0/P1/P2/P3 actions; the miner support triage classifies free-form miner text into a structured case and drafts a conservative reply that always passes through the public claim guard before being marked as low-risk. The public claim guard + contradiction resolver remain the only paths to public output.

- OutcomeEvent log (28 outcome types across materials / GeaSpirit / useful compute / SOST / DEX / provider), deterministic derive(event) mapping to LearningAdjustment + PatternLesson rows, hard caps MAX_BOOST=0.30 / MAX_PENALTY=0.40 / MAX_NET_DELTA=0.50, pattern-memory upsert with +0.02 confidence bump per repeat (capped at 1.0), and a Markdown report under reports/ai_engine/learning/
- reports/ai_engine/operator/<ts>/ (index.md, actions.md, state.md, dashboard.txt)
- reports/ai_engine/support_cases/<ts>/, conservative community-reply drafter (wallet_safety replies are never auto-safe — they always require human ack), release-notice drafter, and help-refresh suggester that proposes Q/A items the operator can review (the suggester itself never publishes)
- scripts/sost_ai_ops.py with subcommands status, next-actions, risks, review-packs, learning, providers, miner-support, public-help-suggestions, full-report
- is_safe_for_export() blocks targets {false_positive_risk, guard, task_design, provider_reliability} from leaving the engine as such; every adjustment passes through clamp_adjustment() before insertion; the loop is replayable, auditable and deterministic by construction
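A sketch of the clamp named in the last item (constants from this section; the function shape is assumed):

```python
MAX_BOOST = 0.30
MAX_PENALTY = 0.40
MAX_NET_DELTA = 0.50

def clamp_adjustment(delta: float, running_net: float = 0.0) -> float:
    """Cap one boost/penalty, then keep the cumulative drift inside the net cap."""
    delta = min(delta, MAX_BOOST) if delta >= 0 else max(delta, -MAX_PENALTY)
    net = running_net + delta
    if abs(net) > MAX_NET_DELTA:
        delta = (MAX_NET_DELTA if net > 0 else -MAX_NET_DELTA) - running_net
    return delta
```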
From "this candidate is interesting" to "this is the concrete next step, this is the experiment, this is what would confirm or kill it". Internal-only, deterministic, human-approved. A hypothesis becomes a structured dossier with current evidence, missing evidence, recommended path (literature / CHGNet / DFT input / GeaSpirit layer review / Useful Compute task design), draft experiments, deterministic pass/fail criteria, coarse compute-cost class, P0–P4 priority and publishability tag (internal_only by default). The AI may design validation work; it never executes heavy jobs and never publishes scientific claims.

- ValidationDossier dataclass + 3 SQLite tables (validation_dossiers, validation_experiments, dossier_index), validation-plan router, ExperimentSpec dataclass with execution_allowed=False by default, deterministic pass/fail criteria, coarse cost-class estimator, priority + publishability policy, and the dossier renderer that writes dossier.md, dossier.json, experiment_plan.json, go_no_go.md, commands_draft.sh (mode 0644, every line #-commented), README.md, checksums.sha256
- HeavyTaskSpec with input/output schema, deterministic requirements, verification method, runtime target, dependency requirements, hardware requirements, and risk lists; conservative go/no-go (fake CPU burn / busy-wait / loop-forever → REJECT; too-light tasks → REJECT; paid / proprietary deps → REJECT; DFT-class without pinned version + pseudopotentials + tolerances + container → HOLD needs_reproducibility_solution); benchmark + verification plans as DRAFT-ONLY; strongest verdict the AI can issue is ready_for_benchmark — the rewarded phase remains gated by a separate human decision
- dossier_index for fast filtered queries, substring search by subject + project + type, Markdown operator summary grouped by project / priority / publishability, integration into sost_ai_ops.py validation-dossiers and into the full-report bundle
- internal_only; commands_draft.sh is non-executable by construction; Materials candidates and GeaSpirit AOIs always default to internal_only; Useful Compute campaigns always default to human_review_required; public exposure routes through the existing M9 export pipeline + human approval
From individual dossiers to organised campaigns. Internal-only, deterministic, human-approved. M11 decides what might be valid; M12 decides what to validate first. A typed batch of dossiers is selected by a per-type policy, ranked, and packaged into a draft-only execution pack (campaign.md, manifest.json, selected_dossiers.jsonl, budget.md, risk_report.md, manual_execution_plan.md, do_not_run_automatically.md, README, checksums) that the operator reviews before any compute or publication.

- ValidationCampaign dataclass + 4 SQLite tables (validation_campaigns, campaign_dossiers, campaign_approvals, campaign_index), 12 campaign types (literature / CHGNet / DFT input / DFT relaxation draft / GeaSpirit layer review / GeaSpirit public safety / Useful Compute heavy benchmark design / Useful Compute reproducibility / Useful Compute fake-heavy rejection / Useful Compute schema design / public-wording safety / miner support improvement), status transitions ONLY through record_approval(...) (draft → ready_for_human_review → approved_for_manual_execution → completed), execution_allowed=False default that the system never flips, deterministic per-type selector + family/application diversification + coarse budget estimator
- Materials packs: ranking consults learning_adjustments, CHGNet excludes red-flag candidates, DFT-class campaigns require literature + CHGNet support OR an explicit P0/P1 promotion, family-aware diversification (at most two same-family candidates before falling back), per-pack extras: selected_materials.csv, selected_materials.jsonl, go_no_go.md
- GeaSpirit packs: go / hold / block_public, per-pack extras: selected_aoi_claims.jsonl, layer_gap_matrix.csv, public_safety_matrix.md, go_no_go.md
- Useful Compute packs: selected_heavy_tasks.jsonl, benchmark_matrix.csv, reproducibility_matrix.md, input_output_schema_needs.md, go_no_go.md
- campaign_index rebuild + filtered queries, substring search by title + type + project, Markdown operator summary, sost_ai_ops.py validation-campaigns and campaign-next-actions subcommands, automatic inclusion in the full-report bundle
- manual_execution_plan.md is mode 0644 with every line shell-commented (sh manual_execution_plan.md is a no-op); do_not_run_automatically.md ships in every pack as an explicit reminder
Planned Phase 3 milestone. The discovery theorist will turn the M11 dossier graph + M12
campaign outcomes into typed scientific theories, with explicit pre-conditions, observable
predictions, refutation paths, and a typed theory-graph that links theories to dossiers and
to one another. Not yet shipped. The M14 console already exposes the discover action, which currently returns a safe unavailable envelope so callers can wire the route end-to-end while the engine is being designed.
- Theory publishability: internal_only by default
- Planned SQLite tables (theories, theory_links, theory_evidence) with deterministic IDs and stable hashes for reproducible snapshots
- reports/ai_engine/discovery_dossiers/<id>/ and a Markdown summary for the operator command center
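Until M13 ships, the console's discover action answers with something like the following envelope (the exact keys are assumed; only the safe-refusal behaviour is documented above):

```python
M13_UNAVAILABLE_ENVELOPE = {
    "ok": False,
    "action": "discover",
    "status": "unavailable",
    "reason": "M13 discovery theorist is designed but not yet shipped",
    "publishability": "internal_only",  # nothing from this route is ever publishable
}
```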
Localhost-only web console for operating the M1–M12 capabilities of the engine
through prompts, quick actions, evidence panels and risk flags. Not exposed on
sostcore.com. Never publishes anything. Never runs heavy compute. No paid AI.
Started with python3 scripts/sost_ai_console.py, prints a one-shot URL with
an ephemeral token, scrubs the token from the address bar after first paint, and keeps
it in a JS closure for the lifetime of the page.
- ThreadingHTTPServer bound to 127.0.0.1 by default; 0.0.0.0 rejected unless both --unsafe-bind-all and --i-understand-this-exposes-the-private-console are passed; secrets.token_urlsafe(32) session token, never persisted; constant-time hmac.compare_digest bearer match; positive read/write allowlists; no shell helper exists in the codebase; 50 tests
- No localStorage, no eval(), no Function(); CSP default-src 'self'; CSS dark command-center aesthetic with cyan/green/gold accents; sidebar with 14 sections; chat panel with project + mode selectors, safety badges (NETWORK OFF, PAID LOCKED, PUBLICATION LOCKED, LOCALHOST ONLY), 24+ canned quick prompts, evidence drawer with colour-coded publishability, miner-log triage screen, reports browser, settings panel; token never written to any DOM node; 22 tests
- ask/ideate via ask_engine.ask; validate/public_wording_review via public_claim_guard.scan_text; create_dossier via M11 materials/geaspirit/useful-compute planners + insert_dossier + dossier_renderer.render; create_campaign via campaign_renderer.build_and_render (execution_allowed=False unconditionally); triage_miner_log via miner_support_triage; draft_reply via community_reply_drafter; next_actions via operator_command_center; discover returns a safe-unavailable envelope until M13 lands; legacy do_not_publish publishability mapped onto canonical blocked; 48 tests
- console_sessions + console_messages + console_actions tables (idempotent migrations on top of the engine's existing persistence layer); console_conversation high-level API (new_chat / rename_chat / append_user / append_assistant / list_chats / load_chat); console_export writes reports/ai_engine/console_exports/<UTC>_<sid8>/ with conversation.md, manifest.json (schema sost_ai_console_export@v1), and checksums.sha256; clearing local history requires the literal confirmation "YES_DELETE_LOCAL_HISTORY"; 16 tests
- Smoke checks: "Useful Compute rewards active" never returns public_safe, a miner-rejected-block log triages correctly, dossier creation writes a folder under reports/ai_engine/validation_dossiers/..., no website/ writes occur during a typical action flow, and discover returns the M13 unavailable envelope; comprehensive operator guide in docs/multi_ai_console_operator_guide.md; 11 tests
- No eval() / Function() / inline scripts / inline event attributes, paid + publication + heavy execution all locked, only allowlisted directories are readable / writable, OPTIONS preflights rejected (no CORS surface), and SSH tunnel is the documented remote-access pattern (no public port)
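A sketch of the bearer check from the first item in the list above, built on the same stdlib primitives it names (secrets.token_urlsafe, hmac.compare_digest); request plumbing is simplified:

```python
import hmac
import secrets

SESSION_TOKEN = secrets.token_urlsafe(32)  # issued once, shown in the one-shot URL

def is_authorized(authorization_header: str) -> bool:
    """Bearer comparison that never short-circuits on the first mismatching byte."""
    expected = f"Bearer {SESSION_TOKEN}"
    return hmac.compare_digest(authorization_header.encode(), expected.encode())
```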
Turns the M14 console into a daily cockpit: every action is now persisted in SQLite, conversations can be reloaded and exported, the action surface covers review-pack / outcome / daemon / operator-status / learning-report / dossier-list / campaign-list, and a strict report browser lets the operator read internal artifacts without ever leaving the safe allowlist. All security locks remain in place.

- Writes console_messages + console_actions on every persistable action; the in-memory ring stays as a lightweight audit cache. ConsoleState.current_session_id is filled lazily on the first action. New endpoints GET /api/history, GET /api/history/load, POST /api/history/{new,rename,clear} with explicit confirm="CLEAR_CONSOLE_HISTORY" guard. The session token NEVER appears in any persisted row. (14 tests)
- project_observer.snapshot_all + review_pack.build_review_pack; outcome_record → OutcomeEvent + outcome_ingestor.ingest; daemon_once → autonomous_daemon.run_once with dry_run=True always, allow_paid=False, console-side hard caps max_tasks ≤ 25 / max_runtime ≤ 300 s; operator_status → full operator_command_center.collect; operator_risks → P0/P1 filter; learning_report → learning_report.generate; validation_dossiers / validation_campaigns → their respective M11/M12 indexes. (14 tests)
parent-traversal, ".." inside any segment, and symlinks that escape
reports/ai_engine/. New routes GET /api/reports/{tree,view,
search} — HTML files are tagged html_escaped, never
rendered as live HTML. Search is case-insensitive, capped at 200 hits and 64 KB
per file. (15 tests)/api/history; + NEW button; EXPORT button (copies the exact CLI
- History UI backed by /api/history; + NEW button; EXPORT button (copies the exact CLI command to the clipboard — the export script writes to disk, not the JS); CLEAR button with a confirmation modal; creativity selector (conservative / speculative / wild / fantastic / red_team) sent on every prompt; per-mode placeholder hints; run-status spinner on the Send button. No new external dependencies; eval() / Function() / external scripts still absent. (17 tests)
- GET /api/providers/status returns booleans only: ollama_available (via shutil.which, never executed), paid_locked: true, publication_locked: true, network_enabled: false by default, plus six *_token_present booleans (OPENROUTER / HUGGINGFACE / ANTHROPIC / OPENAI / GROQ / TOGETHER) — only bool(env_var), never the value itself. The status reader makes no network call. (7 tests)
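A sketch of the resolve_safe guard from the report browser above (stdlib pathlib; the real implementation may differ):

```python
from pathlib import Path

REPORTS_ROOT = Path("reports/ai_engine").resolve()

def resolve_safe(rel: str) -> Path:
    """Reject absolute paths, '..' segments and symlinks that escape the root."""
    p = Path(rel)
    if p.is_absolute() or ".." in p.parts:
        raise PermissionError(f"unsafe path: {rel}")
    resolved = (REPORTS_ROOT / p).resolve()        # collapses any symlinks
    if not resolved.is_relative_to(REPORTS_ROOT):  # Python 3.9+
        raise PermissionError(f"path escapes reports root: {rel}")
    return resolved
```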
The console no longer answers "Generate 20 membrane ideas" with generic policy text. M16 turns the SOST AI Engine into an internal speculative scientist: it invents structured hypotheses with mechanism, falsifier and validation path; it separates plausible from speculative-but-testable from wild from fantastic; and it self-criticizes every idea. Everything stays internal_only. Nothing publishes, nothing executes DFT/CHGNet/GIS, nothing uses paid AI.

- Idea dataclass with stable 16-char id (sha256 of project|domain|title), five score fields clamped to [0,1] (novelty / plausibility / utility / falsifiability / absurdity), seven evidence levels (locally_supported / plausible / speculative_but_testable / wild_but_testable / fantastic_unvalidated / non_testable_now / rejected), publishability defaulting to internal_only, and a priority() rule that drops to P3 for any non-testable idea regardless of score.
- Critique with strongest_for / strongest_against / easiest_falsifier / most_likely_failure / promotes_if / kills_if. Every idea has a falsifier; ideas with falsifiability_score < 0.3 cannot rank above low priority.
- Ranking consults learning_adjustments and lowers patterns previously rejected, flagging them known_weak_pattern. Wild / fantastic creativity may still revisit weak patterns, but they are clearly labelled.
- reports/ai_engine/ideas/<UTC>_<slug>/ with ideas.md, ideas.jsonl, ranking.csv, falsifiers.md, validation_paths.md, risk_report.md, manifest.json (schema sost_ai_ideas@v1), and checksums.sha256 covering the other files. idea_index SQLite table for fast filtered queries.
- The ideate action returns the numbered idea list (mechanism / why-might-work / why-might-fail / first-test / falsifier / next step per idea). The discover action uses the same engine with creativity="fantastic" by default; the M13 typed-theory-graph deferral is surfaced as a warning, not a refusal. (38 tests)
- scripts/multi_ai_generate_ideas.py mirrors the engine for offline use: --project, --domain, --count, --creativity, --prompt, --json, --no-render, --no-persist.
- internal_only by default; fantastic / non_testable_now ideas keep that publishability; the renderer honours the M14 write allowlist; the engine never calls the network and never spawns a subprocess; all randomness is deterministic from the prompt + count + creativity tuple.
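A sketch of the two ranking rules above: non-testable ideas are P3 at best, and low falsifiability caps priority (mapping "low priority" to P3 here is an assumption, and the other numeric thresholds are illustrative):

```python
def priority(evidence_level: str, score: float, falsifiability: float) -> str:
    """P0 is best; any non-testable or barely falsifiable idea is capped at P3."""
    if evidence_level == "non_testable_now":
        return "P3"                  # regardless of how well it scores
    if falsifiability < 0.3:
        return "P3"                  # cannot rank above low priority
    if score >= 0.8:
        return "P0"
    return "P1" if score >= 0.6 else "P2"
```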
Closes the answer-quality bug reported on M16: the console used to fall through to "General internal answer — see project context above." for practical materials questions like "que material es el mejor para hacer nanopartículas fácilmente?" ("which material is best for easily making nanoparticles?"). M17 routes those prompts through a structured answerer that returns a real ranked recommendation with mechanism, advantages, risks and first validation step per option. Pure stdlib, no network, no paid AI, no DFT claim.

- Every structured answer is tagged evidence_source: general_scientific_reasoning, confidence: medium, publishability: internal_only, and states that no local DFT / CHGNet / lab artifact was consulted — this is a practical recommendation, not a validation. The engine never claims DFT validation without an actual artifact.
- answer_composer.compose() consults the answerer before falling back to the legacy "see project context above" line. If the answerer matches, its body replaces the placeholder and its next_actions are merged into the result envelope.
Closes the multi-repo gap reported on M17 / unified-lab launch: the readonly adapters
used to hard-code the project root to materials-engine-private/, so the AI
engine never actually saw the sibling GeaSpirit / SOST core / GeaDeep / Materials
Discovery repos. M18 ships a stdlib JSON registry that declares every repo the engine
may read, plus five canonical capability gates with strict deny-by-default semantics.
The engine still cannot publish, run DFT, touch consensus or call the network
unless the registry explicitly allows it — and the shipped registry never does.
- src/multi_ai_review/project_registry.json declares six projects: materials (~/SOST/materials-engine-private, the only project with can_write_repo: true), geaspirit (~/SOST/geaspirit, read-only sibling), geadeep (~/SOST/geadeep-energy-private, read-only sibling), sost (~/SOST/sostcore/sost-core, read-only; consensus surface), materials_discovery (~/SOST/materials-engine-discovery, read-only archive), useful_compute (lives inside materials-engine-private; read-only). Schema tag sost_ai_project_registry@v1.
- project_registry.py with a lazy, thread-safe cache. Resolution order: explicit path argument > SOST_AI_PROJECT_REGISTRY env var > the JSON next to the loader. A missing file, invalid JSON, wrong schema or unknown project all yield an empty registry — deny-by-default still applies.
- Five canonical gates: can_run_dft(project), can_publish(project), can_touch_consensus(project), can_use_network(project), can_write_repo(project). Plus gate_summary(project) for status responses. Every action that touches one of these axes consults the gate before running.
- Unknown project → False. Project present but flag absent → False (the registry-wide defaults are themselves all False). Only an explicit "can_X": true in a project block flips a gate. Belt+braces in the test suite asserts that the shipped JSON never grants can_touch_consensus, can_publish or can_run_dft for any project.
- adapters/*_readonly.py keeps the historical parents[3] as an explicit _FALLBACK_ROOT and adds a _repo_root() helper that consults the registry first. The module-level _REPO_ROOT alias is preserved so existing callers keep working. The live unified-lab daemon picks up the new resolution on next process restart without disruption mid-loop.
- The GeaSpirit adapter now reads from ~/SOST/geaspirit (the real sibling repo, not the materials repo). The SOST adapter now reads from ~/SOST/sostcore/sost-core. Useful Compute remains read-only inside materials-engine-private. Materials is the only project that can be written to from the AI engine.
- The console_security.ALLOWED_WRITE_SUBDIRS allowlists still apply on top of the gates; the daemon still runs with --network off.
- 22 new tests; 1,188 / 1,188 total green.
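Deny-by-default reduces to a strict membership test; a sketch against the registry JSON described above (loader shape assumed):

```python
import json
from pathlib import Path

def can(gate: str, project: str, registry_path: str = "project_registry.json") -> bool:
    """Only an explicit `true` in the project block grants a gate; all else denies."""
    try:
        registry = json.loads(Path(registry_path).read_text())
    except (OSError, ValueError):
        return False  # missing file or invalid JSON: the empty registry denies all
    return registry.get("projects", {}).get(project, {}).get(gate) is True
```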
Closes the observability gap reported on the unified-lab launch: the daemon was running but its trail was not measurable. M19 adds three small layers that make autonomy verifiable — without granting it any new permissions. Every task the engine plans, dispatches or refuses leaves a row; every cycle leaves a heartbeat; the scoreboard rolls both up into a Markdown + JSON dashboard.

- outcome_ledger.py with the seven canonical statuses (planned / executed / skipped / failed / useful / wrong / repeated). Stores project, task_type, subject, summary, gate_blocked flag + reason, input/output hashes, saved_path. Helpers: record, list_outcomes, counts_by_status, positive_negative_ratio, gate_block_rate, already_seen(input_hash) for repeat detection. Distinct from M10's subject-level outcome events — both layers coexist.
- project_heartbeat.py records one row per project per daemon cycle: tasks_generated, tasks_useful, tasks_failed, gate_blocks, routes_read, memory_updates, free-form notes. Operators can answer "is materials still learning, or just looping?" at a glance.
- autonomous_scoreboard.py rolls the ledger + last heartbeat + the eight registry gates per project up into a payload tagged sost_ai_autonomous_scoreboard@v1. Renders both autonomous_scoreboard.md and autonomous_scoreboard.json under reports/ai_engine/operator/ (already on the M14 write allowlist — no allowlist change needed).
- Three new gates: can_execute_heavy_task (DFT/CHGNet/GIS), can_create_public_draft (public-facing artifact prep), and can_update_memory (write into the AI engine's learning memory). Every project denies all three by default; only materials grants can_update_memory (its own memory). The full canonical list is now eight gates wide.
- Belt+braces tests assert: no project grants can_execute_heavy_task; no project grants can_create_public_draft; only materials grants can_update_memory; unknown project denies all eight gates; the M18 invariants on the original five gates remain unchanged.
- scripts/multi_ai_scoreboard.py renders the dashboard (--db / --out-dir / --since-hours / --json). Live smoke against an empty ledger returns the canonical schema tag, six projects, eight gates per project, and zeroed totals across the seven statuses.
- The only new grant is can_update_memory=true for materials. Still no auto-DFT, no auto-publication, no auto-commits, no consensus surface. The unified-lab daemon picks up the new tables on next process restart; no mid-loop disruption.
- 26 new tests; 1,214 / 1,214 total green.

Closes the "method" gap: the engine could ideate (M16), recommend (M17), and run autonomously (M19), but it could not ask itself whether its proposed rules would have actually worked on past data. M20 ships the scientific-method layer: hypothesis → plan → counterfactual replay against M10 history → GO / MAYBE / NO-GO → memory. No new execution permissions — the engine still cannot run anything, only plan and simulate.
- baseline_registry.py reads M10's outcome_events table and classifies each row as positive (e.g. dft_success, chgnet_stable, literature_supported, human_promoted, provider_useful_answer...) or negative (dft_failure, chgnet_red_flag, human_rejected, provider_overclaim_confirmed...). Optional baseline_overrides.json next to the engine wins on conflict. Baselines dedupe per (project, subject) — twelve "FeS2 promoted" events count as one positive signal, not twelve.
- experiment_planner.py + experiment_plans SQLite table. plan(hypothesis, project) picks project-aware step lists: photovoltaic-flagged materials hypotheses get a band_gap_estimate (DFT input draft only — never run) step; generic materials get the membrane/catalyst step list; geaspirit gets layer-gap + false-positive + publication-safety; useful_compute gets schema design + fake-heavy + reproducibility. Stable plan_id (sha256-derived) so the same hypothesis at the same time produces the same id.
- replay_sandbox.py. A Rule is a callable (subject, context) -> bool. replay() applies the rule to the historical positives and negatives and returns precision, recall, accuracy, F1 plus the TP/FP/TN/FN counters and a verdict. Decision matrix: precision ≤ 0.30 OR recall ≤ 0.20 → NO-GO; precision ≥ 0.65 AND recall ≥ 0.50 → GO; otherwise MAYBE.
- historical_count < min_historical (default 20) means the strongest possible verdict is MAYBE — never GO, regardless of how clean precision and recall look. The system stays honest about its own confidence.
- reports/ai_engine/experiments/<UTC>_<slug>/ with plan.md, plan.json, replay_result.json, decision.md (GO/MAYBE/NO-GO + reason) and checksums.sha256. Honours the M14 write allowlist; new entry reports/ai_engine/experiments added.
- can_plan_experiment and can_replay_experiment are true for materials / geaspirit / useful_compute (the projects that produce hypotheses); false for sost / geadeep / materials_discovery. can_execute_experiment is false for every project and asserted by the test suite. The canonical gate set is now eleven wide.
- scripts/multi_ai_plan_experiment.py turns a hypothesis into a plan + on-disk folder; scripts/multi_ai_replay_experiment.py counterfactually evaluates a keyword-based rule against M10 history and prints the verdict. Both refuse to run on projects that lack the gate.
- can_execute_experiment is denied for every registered project.
- 25 new tests; 1,239 / 1,239 total green.
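The M20 decision matrix above translates almost directly into code (thresholds from this section; function shape assumed):

```python
def replay_verdict(precision: float, recall: float,
                   historical_count: int, min_historical: int = 20) -> str:
    """GO / MAYBE / NO-GO from a counterfactual replay of one rule."""
    if precision <= 0.30 or recall <= 0.20:
        return "NO-GO"
    if historical_count < min_historical:
        return "MAYBE"  # too little history: never GO, however clean the metrics
    if precision >= 0.65 and recall >= 0.50:
        return "GO"
    return "MAYBE"
```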
Closes the "criterion" gap: the engine could ideate, plan, replay and score, but it had no explicit compass. M21 codifies the project objectives, runs every hypothesis through a six-role scientist swarm, mines contradictions for refinement opportunities, and persists the canonical memory of the cycle. The system stops being "the AI that proposes things" and starts being "the AI that asks itself if its proposals would have worked" — guided by an operator-defined mission. No new execution permissions.

- canonical_objectives.py codifies eight materials objectives (defensible_discovery, cost_abundance_stability, reduce_false_positives_pre_dft, catalysts_no_pgm, non_toxic_pv, hydrogen_proton_membranes, water_desalination, industrial_robust), six geaspirit objectives (real_world_systems, deep_sea_environment, cost_near_concrete, proxy_vs_proof, multi_layer_evidence, public_safety_review), two geadeep objectives (deep_sea_energy, manufacturability_at_scale) and two useful_compute objectives (determinism, reproducibility). Each objective carries align_keywords + kill_criteria + proxies as data.
- mission_alignment.py scores a hypothesis against the canonical objectives for its project. Per-objective score in [0, 1], aggregate weighted score, fired-kill-criteria flag, matched-objective count. Refuses to run when can_rank_mission_alignment is denied.
- ai_scientist_swarm.py runs six pure-function roles on every hypothesis — Discoverer (bold expansion), Skeptic (kill-criteria + weak proxies), Engineer (fabricable / lab_only / unknown), Economist (cheap / moderate / expensive via Pt/Pd/Rh/In/Ga/Cd/Te keywords), Validator (cheapest test menu, project-aware), Historian (consults outcome_ledger for revisit_failed / consistent_with_history / contested_history). Mean of the six role scores becomes the swarm score.
opportunities: "is there a doping/sibling that resolves the contradiction?
does the candidate serve another application better? is the contradiction a
false positive in the predictor? does the candidate deserve a plan or an
archive?". Archive policy: low swarm_score (< 0.25), or revisit_failed +
weak discoverer, or kill-criterion-fired + score < 0.4 forces archive.canonical_memory.py persists per-cycle summaries
- canonical_memory.py persists per-cycle summaries (mission_cycles table) and per-hypothesis records (mission_hypotheses table). Gated by can_update_memory from M19 — only materials writes here.
- Three new gates: can_generate_hypothesis, can_critique_hypothesis, can_rank_mission_alignment — true for materials, geaspirit, useful_compute; false for sost, geadeep, materials_discovery. Canonical gate set is now fourteen wide (5 from M18 + 3 from M19 + 3 from M20 + 3 from M21).
- scripts/multi_ai_mission_cycle.py runs the full pipeline end to end: read objectives -> M16 generate N hypotheses -> swarm + critique loop -> mission_alignment -> top-K -> M20 plan + replay -> render reports/ai_engine/mission_cycle/<UTC>_<slug>/ with cycle_manifest.json, hypotheses.jsonl, top_picks.md, contradictions.md, next_best_actions.md; per-hypothesis row in the M19 outcome ledger + one heartbeat row.
- can_update_memory remains true only for materials. The live unified-lab daemon keeps running unchanged — M21 modules are ready but not yet wired into its hot path.
- 33 new tests; 1,272 / 1,272 total green.

Closes the "agenda" gap. M21 gave the engine voices and a compass; M22 gives it a map and a calendar. Every recent hypothesis is grouped into a research frontier, gaps in coverage are surfaced, the next cycle's attention is split across explore / exploit / falsify / review, and a weekly roadmap is rendered alongside an anti-obsession watchdog. No new execution permissions; the system still cannot run anything autonomously.
- frontier_map.py reads M21 mission_hypotheses + M19 outcome_ledger + canonical_objectives. Coarse family detection via ordered substring rules (LDH / kesterite / antimony chalcogenide / oxysulfide / single-atom / PGM / phosphide / nitride / sulfide / layered oxide / ceramic). Each Frontier carries hypothesis count, average swarm + alignment scores, three representative subjects, positive/negative outcome counts and a status label (active / thin / stagnant / contested).
- research_gap_detector.py surfaces five canonical gap kinds: many_ideas_no_evidence (many hypotheses, zero positives), evidence_no_exploration (positives but no recent extension), recurring_contradictions (same swarm contradiction string seen ≥ 3 cycles), promising_no_validation (high alignment + swarm but no experiment plan attached), objective_uncovered (canonical objective with zero matched hypotheses).
- strategic_allocator.py proposes the next cycle's attention split — 40 % exploit (active frontiers with the highest combined score), 30 % explore (uncovered objectives + thin frontiers), 20 % falsify (high-promise hypotheses without a plan), 10 % review (stagnant / contested frontiers + recurring contradictions). Each item carries weight, target, detail and rationale.
- anti_obsession_guard.py watches the family distribution of the last 30 mission_hypotheses. If any one family covers more than 40 % of the window (and the sample is at least 10), it raises an ObsessionFlag with an explicit "diversify with creativity=wild or seed prompts from a different family" suggestion. Saves the engine from getting locked onto a pretty chemistry just because keywords keep matching.
- canonical_roadmap.py combines all four pieces into one operator-facing plan: try this week (exploit + falsify), explore this week, archive / discard (contested + thin frontiers with low scores), wait — needs more data (review items), plus an anti-obsession notes block. Internal-only; the roadmap is advisory and does not authorise execution.
- Three new gates: can_build_frontier_map, can_allocate_research_attention, can_write_roadmap — true for materials, geaspirit, useful_compute; false for sost, geadeep, materials_discovery. The canonical gate set is now seventeen wide (5 + 3 + 3 + 3 + 3). The hard locks (run_dft / publish / touch_consensus / execute_*) remain false everywhere.
- scripts/multi_ai_frontier_cycle.py runs the pipeline end to end and renders reports/ai_engine/frontier/<UTC>_<slug>/ with the six canonical files: frontier_map.json, frontier_map.md, research_gaps.md, strategic_allocation.json, weekly_roadmap.md, anti_obsession_report.md.
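A sketch of the anti_obsession_guard check above (window, sample floor and 40 % share from this section; tallying via collections.Counter is an assumption):

```python
from collections import Counter
from typing import Optional

def obsession_flag(families: list, window: int = 30,
                   min_sample: int = 10, max_share: float = 0.40) -> Optional[str]:
    """Flag any one family that dominates the recent hypothesis window."""
    recent = families[-window:]
    if len(recent) < min_sample:
        return None
    family, count = Counter(recent).most_common(1)[0]
    if count / len(recent) > max_share:
        return f"{family}: diversify with creativity=wild or seed a different family"
    return None
```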
Closes the "cabin" gap. M14 is a chat-style console; M23 gives operators a
read-only operations dashboard at http://127.0.0.1:8766 plus a
background autopilot that runs M21 mission cycles + M22 frontier cycles on a
configurable interval. Localhost only. Token-gated. No public surface.
No new execution permissions.
- ai_ops_token.py. issue(ttl_seconds=4h) returns the cleartext exactly once; on disk only the SHA-256 over a per-issue salt is persisted. verify() uses constant-time hmac.compare_digest plus an expiry check. revoke() unlinks the record. status() returns metadata without the cleartext.
- ai_ops_state.py pulls from M19 (outcome ledger + heartbeats), M20 (experiment plans), M21 (canonical memory), M22 (frontier map) and the registry gate set. full_state() is the one-shot snapshot the dashboard's /api/ops/state endpoint serves.
- ai_ops_dashboard_server.py — stdlib ThreadingHTTPServer, refuses non-local bind with PermissionError (remote access via SSH tunnel only). Single HTML page (CSP default-src 'self', no CDN, no cookies, no localStorage), one JS file at /static/ops.js, four JSON endpoints under /api/ops/* — all /api/* require Bearer token.
- ai_ops_autopilot.py. tick() runs one M21 mission cycle plus one M22 frontier cycle for every project that holds the full pipeline gate set; disallowed projects produce a TickResult with ok=false and a clear "missing required gates" error. loop() refuses interval_seconds < 30. Best-effort: never raises mid-loop. Persists outcome ledger + heartbeats; writes canonical memory only when can_update_memory permits (materials only).
- scripts/sost_ai_generate_token.py issues a fresh token and prints the dashboard URL once on stdout; scripts/sost_ai_ops_dashboard.py starts the HTTP server (refuses non-local hosts); scripts/sost_ai_autopilot.py ticks once or loops indefinitely with operator-tunable creativity / count / interval.
- /api/* endpoints are read-only; no DFT, no publish, no consensus, no commit.
- 23 new tests; 1,325 / 1,325 total green.

Closes the "what counts as a heavy task?" gap reported on the Useful Compute live trial. The public API and worker stay as a dry-run infrastructure; M24 ships a private lab where the AI engine classifies, spec-generates and stages heavy-task candidates. Public publishing and reward activation remain hard-locked behind gates that no project grants in the shipped registry — the operator's CTO verdict.
- heavy_task_classifier.py with the five canonical accept axes the operator promised: is_useful (no busy-wait / fake-heavy), is_deterministic (declared + no race / unseeded random hint), is_auditable (declared + no not-auditable hint), is_heavy_enough (runtime ≥ 60 s AND memory ≥ 256 MB), and is_safe_to_verify (no "verifier rerun required" / "no replay possible" hint). Eight curated keyword vocabularies. A task that fails ANY axis is rejected with the offending axis listed.
- heavy_task_spec_generator.py. TaskSpec + per-project schema templates (materials / geaspirit / useful_compute), pinned-deps declaration, fixed-seed policy, replay = ~10 % of original runtime, explicit fake-heavy baseline ("busy-wait must NOT match the output within tolerances"). Reward class always starts at no_reward; visibility is internal_only when DFT / raw-geospatial / wallet / consensus keywords match, otherwise human_review_required.
- useful_compute_private_queue.py + SQLite useful_compute_private_queue table. stage() refuses when can_stage_private_useful_compute_task is denied — only the useful_compute project holds that gate. attempt_publish() and attempt_enable_rewards() are the documented entry points for future operator-only workflows but always return {ok: false, reason: "denied"} because every project denies the corresponding gate.
- useful_compute_task_intelligence.py orchestrates classify -> spec -> (optional stage). seed_candidates(project) exposes operator-blessed seed examples per project (5 materials, 5 geaspirit, 3 useful_compute) — every seed is engineered to pass the classifier so operators have a known-good baseline.
- Four new gates: can_design_useful_compute_task (true: materials, geaspirit, useful_compute), can_stage_private_useful_compute_task (true: useful_compute ONLY), can_publish_useful_compute_task (false EVERYWHERE), can_enable_useful_compute_rewards (false EVERYWHERE). Canonical gate set is now twenty-one wide. Belt+braces tests assert publish + reward gates remain false for every project.
- scripts/multi_ai_useful_compute_task_lab.py with --seeds (operator-blessed templates), --title + --description (ad-hoc candidate), --stage (private staging when the gate permits) and --json for the full pipeline payload.
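The classifier's accept rule is all-axes-must-pass; a sketch (axis names from this section, candidate fields assumed):

```python
ACCEPT_AXES = ("is_useful", "is_deterministic", "is_auditable",
               "is_heavy_enough", "is_safe_to_verify")

def classify(candidate: dict) -> tuple:
    """Accept only when every axis passes; report every offending axis."""
    failing = [axis for axis in ACCEPT_AXES if not candidate.get(axis, False)]
    return (not failing, failing)
```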
Closes the loop between the autonomous engine and the human operator. The AI may *propose* concrete decisions (campaigns, dossiers, false-positive archives, frontier promotions, useful-compute staging requests, DFT input prep); the operator approves or rejects them. Golden rule: the AI may ask for permission. The AI cannot grant permission to itself.

- approval_request.py. ApprovalRequest dataclass with deterministic request_id (SHA-256 over project / kind / subject / timestamp). Eight canonical kinds: approve_campaign, reject_hypothesis, convert_to_dossier, stage_useful_compute, prepare_dft_input, archive_false_positive, promote_frontier_family, demote_frontier_family. Lifecycle pending → approved | rejected | withdrawn. create() gated by can_create_approval_request; approve() and reject() require a non-empty operator argument — the engine cannot generate one from inside an automated tick.
- operator_inbox.py. Aggregates pending approvals across projects, sorts by kind priority (false-positive archive ≫ approve campaign), and renders a markdown table for the CLI plus sost_ai_operator_inbox@v1 JSON for the dashboard.
- daily_brief.py. Five-section UTC report — what the engine did (recent ledger rows), learned (recent canonical mission cycles), found (top frontier families + research gaps), blocked (capability-gate denials in window), and recommends (open approval requests). Saved under reports/ai_engine/daily_brief/<UTC>/ as brief.json + brief.md. Default 24h lookback.
- operator_feedback.py. On approve → records a useful task outcome in M19's ledger plus a positive hypothesis_learning_event. On reject → records a wrong outcome plus a negative learning event so the M21 swarm down-weights the pattern. Idempotent per request_id. Gated by can_apply_operator_feedback — only materials holds that gate.
- executive_summary.py. Top-3-per-section operator view per project: opportunities (frontier families ranked by swarm × count + outcome bias), risks (research gaps + gate-block density + negative-signal pressure), next actions (oldest pending approvals).
- New dashboard endpoints: /api/ops/approvals, /api/ops/daily-brief, /api/ops/executive-summary, /api/ops/decision-history. All localhost-only and token-gated; payloads are read-only.
- scripts/sost_ai_daily_brief.py, scripts/sost_ai_operator_inbox.py, scripts/sost_ai_approve_request.py, scripts/sost_ai_reject_request.py. Approve/reject scripts require --operator; both refuse to run with an empty operator name. Optional --apply-feedback flag pipes the decision into the M19 ledger.
- Three new gates: can_create_approval_request (true for materials, geaspirit, useful_compute), can_apply_operator_feedback (true only for materials), can_execute_approval (false everywhere — even a human approval does not unlock automated execution; the engine still refuses to run side-effects from an approval row by itself).
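The golden rule is enforceable in a few lines; a sketch of the approve path (method shape assumed, lifecycle states from this section):

```python
def approve(request: dict, operator: str) -> dict:
    """Only a named human operator can move a request out of 'pending'."""
    if not operator or not operator.strip():
        raise ValueError("operator name required: the engine cannot approve itself")
    if request["status"] != "pending":
        raise ValueError(f"cannot approve a request in state {request['status']}")
    request["status"] = "approved"
    request["approved_by"] = operator
    return request
```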
| level | meaning | publishable? |
|---|---|---|
| local_code_verified | verified by reading repo source | yes |
| local_data_verified | verified by reading a local DB / data file | yes |
| local_doc_supported / local_report_supported | supported by a local doc or internal report | yes |
| external_official_supported | supported by an official free public source (arXiv, OpenAlex, ...) | with caveat |
| multi_source_supported | supported by ≥ 2 independent source types | yes |
| model_consensus_only | only models agree; no data or doc backs it | no |
| speculative | weak signals, not reproducible | no |
| contradicted | local evidence contradicts the claim | no |
| insufficient_evidence | nothing collected to back the claim | no |
| do_not_publish | public-facing claim with insufficient backing | block |
Generates and ranks material candidates by family, application and element-cost proxy.
Only candidates with explicit DFT or CHGNet evidence in the local corpus may be
recommended for promotion to a real DFT queue. Predicted-only candidates are clearly
marked predicted_only — never validated.
Evaluates mineral / depth / coordinate / certainty claims with built-in conservatism. Any
wording implying guaranteed mineral presence, certain depth or 100% certainty is blocked
at do_not_publish. Depth claims require depth-aware geophysics (gravity /
magnetics / AEM); satellite-only signals are labelled as surface proxy, never as
subsurface evidence.
Audits public wording, consensus and explorer claims. Locks: avg288 is consensus (avg600 / avg1000 are informational); cASERT 6210 fork was cancelled; "mandatory update" requires both fork wording and a documented activation height. Layered atop the Phase 1 public-claim guard.
Designs candidate Heavy tasks (DFT relax, CHGNet pre-screen, GeaSpirit feature extraction, ...) and classifies each into ready_for_design, needs_benchmark, needs_reproducibility_solution, too_light, deferred or rejected. Every profile carries the explicit "rewards postponed" warning. The engine never activates rewards and never writes to a queue.
| Tests passing | 1,402 / 1,402 (M1 + M2 + M3 + M4 + M5 + M6 + M7 + M8 + M9 + M10 + M11 + M12 + M14 + M15 + M16 + M17 + M18 + M19 + M20 + M21 + M22 + M23 + M24 + M25; M13 deferred) |
| Free public source connectors | 8 (arXiv, OpenAlex, Crossref, PubChem, JARVIS, Materials Project, USGS, generic-official) |
| Hypothesis generation capacity | ≥ 100,000 candidates offline, in seconds |
| Network calls per run (default) | 0 — network is OFF by default |
| Paid model calls per run (default) | 0 — paid is OFF by default |
| Persistence | SQLite, idempotent migrations, 39 internal tables (9 added in M10 for the outcome learning loop, 3 added in M11 for the dossier factory: validation_dossiers, validation_experiments, dossier_index, 4 added in M12 for the campaign orchestrator: validation_campaigns, campaign_dossiers, campaign_approvals, campaign_index, 3 added in M14 for the private console: console_sessions, console_messages, console_actions, 1 added in M16 for the speculative discovery lab: idea_index) |
| Public outputs | none autonomous — human review required |
| Source license | private repo — outputs only released after audit |
This is an experimental internal research system. It is not a product, not a financial recommendation engine, and not a guaranteed scientific oracle. Its outputs are internal-only by default, and any candidate that is later promoted to a public claim must pass an explicit human review and an explicit M2 validator verdict.
The Scientific Intake Engine is the working layer inside the Materials Engine that turns free-form scientific text (papers, news, proposals, speculative ideas) into a structured, audited research pipeline: claims → evidence → multi-agent reasoning → falsification-first experiment plan → feedback loop → portfolio campaigns → knowledge graph → dossiers → review queue → autonomous orchestration. Ten phases shipped end-to-end. Every phase is local-first, deterministic, append-only, and offline by default. Zero paid API dependencies.
Impossible-physics never escapes the kitchen.
Sources flagged with closed-timelike-curve / perpetual-motion /
over-unity / exceeds-Carnot language are capped at REJECT,
cannot be promoted to high-priority, cannot enter a publication
package, and cannot be transitioned to published —
even with explicit operator override.
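A sketch of that hard cap (flag vocabulary from this section; function shape assumed):

```python
IMPOSSIBLE_PHYSICS_FLAGS = {
    "closed_timelike_curve", "perpetual_motion", "over_unity", "exceeds_carnot",
}

def cap_recommendation(source_flags: set, recommendation: str) -> str:
    """Impossible-physics sources stay at REJECT, operator override or not."""
    if source_flags & IMPOSSIBLE_PHYSICS_FLAGS:
        return "REJECT"
    return recommendation
```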
No auto-publish. The autonomous orchestrator
only ever enqueues review items; the operator must walk them
through inbox → reviewing → approved →
published manually.
No procedural detail, ever. A shared sanitiser
scrubs grams / temperatures / pressures / ignition / detonation
cues from every report and plan field before persistence.
Auth on writes, public on reads (with the
single exception of the operational review surface, which is
auth on both sides).
Scientific text
→ Phase I claims / entities / 6 scores / hypotheses
→ Phase II evidence acquisition (Crossref / arXiv / Semantic Scholar)
→ Phase III multi-agent reasoning + decision memo
→ Phase IV falsification-first experiment plan + safety gates
→ Phase V feedback loop + active learning + calibration
→ Phase VI autonomous campaigns + portfolio ranking
→ Phase VII knowledge graph + clusters + opportunities
→ Phase VIII dossiers + reports (Markdown / JSON / HTML)
→ Phase IX review queue + state machine + publication packages
→ Phase X autonomous orchestration (5 modes, one endpoint)
Free-form text intake with claim extraction, named-entity
recognition (materials, formulas, technologies, institutions,
people, physical concepts), six-score evaluation
(credibility / novelty / feasibility / evidence / commercial /
research priority) and seed hypothesis generation. SHA-256
dedup. Auth via X-Intake-Key, sliding-window
per-IP rate limit. Impossible-physics gate forces REJECT
regardless of score.
Free-tier Crossref / arXiv / Semantic Scholar adapters with
per-source dedup keyed on doi → arxiv_id →
lower-title. Public-source-only. Cache + retry built in.
Re-scores the source's evaluation against acquired evidence.
No paid API dependency anywhere.
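The dedup key falls back through identifiers in the documented order; a sketch (record field names assumed):

```python
def dedup_key(record: dict) -> str:
    """Prefer the DOI, then the arXiv id, then the lowercased title."""
    if record.get("doi"):
        return f"doi:{record['doi'].lower()}"
    if record.get("arxiv_id"):
        return f"arxiv:{record['arxiv_id']}"
    return f"title:{record.get('title', '').strip().lower()}"
```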
Five local deterministic agents
(Skeptic / Evidence / Feasibility / Commercial / DecisionMemo)
produce per-claim stance assessments
(supports / contradicts / mixed / unrelated /
insufficient) and a decision memo with six component
scores. Impossible-physics survives Phase 3 too — final
score capped at 15 in that case.
Five local agents (Safety / Simulation / Falsification /
Designer / Milestone) produce a falsification-first plan:
literature_review → falsification → calculation
→ simulation → safety_review → benchmark →
external_lab → bench_test → prototype. Hydrogen,
combustion, high-pressure, cryogenic, radiation, toxic
domains all surface as safety_review with mandatory
external-lab routing. Refuses procedural detail entirely.
Append-only feedback ingestion (literature_review / simulation / benchmark / lab_result / expert_review / correction) with six outcomes (supports / contradicts / inconclusive / unsafe / failed_replication / successful_replication). Per-outcome delta tables, confidence scaling, recommendation rank ladder. Hard-coded floor: expert review can never lift an impossible-physics source above REJECT, even with confidence_score=100.
Group sources into named campaigns. Nine-signal portfolio ranker (final_project_score / evidence_alignment / contradiction_inverse / feasibility / novelty / commercial / safety_inverse / feedback_confidence / evidence_coverage) with configurable weights and a small domain-alignment bonus. Safe local auto-execute only — never auto-promote, never invoke external paid APIs, never auto-run dangerous experiments.
Typed knowledge graph (13 node types × 13 edge types) per source / campaign / global. Lexicon-based theme clustering (hydrogen, combustion, battery, quantum, ctc-impossible, manufacturing, safety, …). Six opportunity types: missing_evidence, contradiction, underexplored_material, cross_domain_transfer, experiment_gap, commercialization_gap. Impossible-physics caps every opportunity priority and suppresses cross-domain transfers entirely.
Five report types (source_dossier / executive_summary /
technical_review / opportunity_brief / campaign_dossier),
two visibilities (private = full detail,
public = redacted: bands instead of raw scores,
no operational thresholds, no procedural detail), three
export formats (Markdown / JSON / HTML). Every report carries
a Reproducibility & audit section with
source_id / evidence_ids / memo_id / plan_id / graph_build_id.
Operational state machine
inbox → reviewing → approved → published
(plus rejected as a terminal failure path).
Append-only notes + decision audit trail.
Four publication package types (bct,
paper, lab, investor).
Internal review surface is auth-only on both reads
and writes — only the curated publication packages are
public. Impossible-physics items can be reviewed and rejected
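A sketch of the state machine (states from this section; the impossible-physics publication block mirrors the Phase I cap):

```python
TRANSITIONS = {
    "inbox": {"reviewing"},
    "reviewing": {"approved", "rejected"},  # rejected is terminal
    "approved": {"published"},
}

def transition(item: dict, target: str) -> dict:
    """Operator-driven moves only; impossible-physics items are never published."""
    if target not in TRANSITIONS.get(item["state"], set()):
        raise ValueError(f"illegal transition {item['state']} -> {target}")
    if target == "published" and item.get("impossible_physics"):
        raise PermissionError("impossible-physics items are never published")
    item["state"] = target
    return item
```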
but never published nor packaged.
One endpoint, five modes:
intake_only, full_private,
full_review, campaign_cycle,
report_only. Composes Phases 1–9 into a
single controlled pipeline with per-step audit trail and
per-job artifact log. Never auto-publishes
— full_review only enqueues a review item;
the operator transitions it manually. Publication packages
are opt-in (allow_publication_package=true) and
even then enforce the impossible-physics + unsafe-feedback
gates from Phase IX. Failures are captured per-step and never
crash the API.
/intake/*
- GET /intake/sources/…
- GET /intake/campaigns/…
- GET /intake/reports/…
- GET /intake/publications
- GET /intake/autonomous/jobs
- GET /intake/graph/… (?include_*=true aggregations)
Phase I — scientific-intake: phase 1 — text → claims/entities/scoring/hypotheses + auth
Phase II — scientific-intake: phase 2 evidence acquisition adapters
Phase III — scientific-intake: phase 3 multi-agent evidence reasoning
Phase IV — scientific-intake: phase 4 experiment planning and validation gates
Phase V — scientific-intake: phase 5 active learning feedback loop
Phase VI — scientific-intake: phase 6 autonomous research campaigns
Phase VII — scientific-intake: phase 7 knowledge graph and concept discovery
Phase VIII — scientific-intake: phase 8 research dossiers and reports
Phase IX — scientific-intake: phase 9 review workspace and publication packages
Phase X — scientific-intake: phase 10 autonomous research orchestration (commit b43d0d2)
All ten commits authored by NeoB <noreply@sostprotocol.org>.
Repository: materials-engine-private · module:
src/scientific_intake/.
M5 already ships the provider interfaces. M7 will wire real HTTP calls under explicit
flags (--allow-local-model, --allow-free-ai,
--allow-paid) with cache, rate limits and budget logging. No automatic
paid AI usage. No passwords — only API keys via env vars.
A small Python layer mirroring the on-page Gold-DEX position-pricing formula, plus a public-wording guard for the DEX HTML pages and a deal-audit persistence table. It sits beside the existing browser AI Copilot — it is not a replacement for it.