NMR Interpretation Guide
Use this guide when reviewing NMR evidence and deciding whether a SpectraCheck assignment is ready for report export.
Outline
- Upload the raw FID archive and confirm instrument metadata.
- Review ¹H regions, integration, multiplicity, and solvent references.
- Review ¹³C assignments, solvent peaks, and carbonyl regions.
- Use COSY, HSQC, and HMBC evidence to validate connectivity.
- Resolve contradiction flags before approving an interpretation.
Every step in this outline needs a product screenshot before publication.
Worked example requirement
Use one accessible example, such as caffeine or ibuprofen, and run it through the current production workflow. Capture every state a reviewer sees:
- Upload accepted with raw source file and parameter file visible.
- Processed spectrum with regions labeled.
- Peak table with shift, multiplicity, integration, assignment, and confidence.
- Evidence card for at least one assigned peak.
- Contradiction or warning state, even if the example uses a seeded issue.
- Final accepted interpretation ready for export.
2D evidence in plain language
| Experiment | What it helps validate |
|---|---|
| COSY | Which protons are coupled to nearby protons. |
| HSQC | Which proton is attached to which carbon. |
| HMBC | Longer-range proton-carbon connections used to support structure fragments. |
Keep the explanation short enough for an analytical chemist who understands NMR but has not used MolTrace before.
Contradiction review
A contradiction is not a failure; it is an item in the review queue. Show what evidence disagrees, which assignment is affected, and what action the scientist can take:
- Accept with rationale.
- Reassign the peak.
- Mark as impurity, solvent, reference, or unknown.
- Request additional evidence before export.
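One way to make the review queue concrete is a contradiction record that carries the four actions above, plus an export gate that stays closed while any flag is unresolved or waiting on evidence. This is a minimal sketch: the class names, fields, and the rules that an acceptance needs a written rationale and a mark needs one of the four labels are assumptions for illustration, not the product's data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Resolution(Enum):
    """The four reviewer actions (hypothetical encoding)."""
    ACCEPT_WITH_RATIONALE = "accept_with_rationale"
    REASSIGN = "reassign"
    MARK_AS = "mark_as"                    # impurity, solvent, reference, or unknown
    REQUEST_EVIDENCE = "request_evidence"


MARK_LABELS = {"impurity", "solvent", "reference", "unknown"}


@dataclass
class Contradiction:
    assignment: str                        # the assignment the disagreement affects
    disagreeing_evidence: List[str]        # which evidence sources disagree
    resolution: Optional[Resolution] = None
    rationale: str = ""
    mark_label: str = ""                   # required when resolution is MARK_AS


def blocks_export(c: Contradiction) -> bool:
    """True while a flag is unresolved, waiting on evidence, or missing required detail."""
    if c.resolution is None or c.resolution is Resolution.REQUEST_EVIDENCE:
        return True
    if c.resolution is Resolution.ACCEPT_WITH_RATIONALE:
        return not c.rationale.strip()     # assumed: acceptance must be justified
    if c.resolution is Resolution.MARK_AS:
        return c.mark_label not in MARK_LABELS
    return False                           # REASSIGN: assumed to clear the flag


def can_export(flags: List[Contradiction]) -> bool:
    return not any(blocks_export(c) for c in flags)
```

Keeping "request evidence" as a blocking state mirrors the list above: it is the one action that explicitly defers export.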
The NMR scientist owns this page and must verify the example, screenshots, and language before release.
Backend capabilities
The NMR interpretation backend ships the analysis capabilities listed below. This section catalogs what is in production today; the release timeline at the end gives chronological context.
Global Spectral Deconvolution (GSD) — opt-in analysis backend
The opt-in `POST /spectrum/analyze/gsd` endpoint runs industry-standard Global Spectral Deconvolution on a processed spectrum and returns peaks auto-classified as `compound | solvent | impurity | artifact | 13C_satellite`. It ships behind a per-request `experimental: true` flag while the soak loop runs, and graduates per tenant or platform-wide based on a measured verdict; see the GSD experimental rollout section in the deployment guide.
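As a rough illustration of the `solvent` branch of that classification, the sketch below labels a ¹H peak as solvent when it lies within a tolerance of a residual-solvent or water line. The table holds the commonly cited Fulmer / Gottlieb ¹H shifts for four solvents; the 0.03 ppm tolerance, the `classify_1h` name, and the fall-through to `compound` are assumptions, and the impurity, artifact, and ¹³C-satellite cases are omitted.

```python
from typing import Optional, Tuple

# Commonly cited residual-solvent and water 1H shifts (ppm), after Fulmer et al.
RESIDUAL_1H = {
    "CDCl3": {"CHCl3": 7.26, "H2O": 1.56},
    "DMSO-d6": {"DMSO-d5": 2.50, "H2O": 3.33},
    "CD3OD": {"CD2HOD": 3.31, "HOD": 4.87},
    "D2O": {"HDO": 4.79},
}


def classify_1h(shift_ppm: float, solvent: str, tol_ppm: float = 0.03) -> Tuple[str, Optional[str]]:
    """Label a 1H peak 'solvent' if it matches a known residual line, else 'compound'."""
    table = RESIDUAL_1H.get(solvent, {})
    # Use the closest reference line so overlapping tolerances resolve deterministically.
    best = min(table.items(), key=lambda kv: abs(kv[1] - shift_ppm), default=None)
    if best is not None and abs(best[1] - shift_ppm) <= tol_ppm:
        return "solvent", best[0]
    return "compound", None  # hypothetical fall-through; other categories not modelled
```

For example, a peak at 7.27 ppm in CDCl₃ is labelled as residual CHCl₃, while one at 7.40 ppm stays a compound peak.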
- Detection algorithm: single-pass detection via `scipy.signal.find_peaks`; per-peak fitting via `lmfit` (Lorentzian / pseudo-Voigt); level-aware overlap resolution at levels 4–5; classification using the Fulmer / Gottlieb residual-solvent tables. (v0.4.0, 2026-05-27)
- Algorithm semantics + envelope unification: `cluster_into_environments` groups adjacent same-category peaks within a nucleus-aware J-coupling window into one chemical-environment entry. The legacy raw-FID surfaces (`/nmr/raw-fid/preview` and `/nmr/raw-fid/process`) gain `environments` / `environment_count` / `environment_counts`, so the FE renders both detectors with one component. A vectorized `_pseudo_voigt_sum` plus an analytical Jacobian gives an 8.5× speedup on dense ¹³C spectra (60000006_13c fixture: 5.5 min → 39 s) with bit-exact-equivalent output. (v0.5.0, 2026-05-27)
- Strict promotion gate cleared: on the NMRShiftDB2 corpus, the sidecar cleared its strict production promotion gate (95 % solvent auto-detect plus median compound-environment-count delta ≤ 2). The HMDB-style validation framework forward-models a noisy Lorentzian spectrum from a published peak list and gates on environment-count and multiplet-line deltas across a 20-fixture mini-corpus. The default ¹H clustering window widened from 20 Hz to 30 Hz to accommodate strong-coupling AB systems and constrained-ring geminal H-H couplings up to 25–30 Hz. (v0.6.0, 2026-05-28)
- Per-peak QC metrics for legacy raw-FID: `LegacyEnrichedPeak.fit_redchi` / `fit_rmse` / `fwhm_ppm` / `signal_to_noise` / `baseline_noise_sigma`, the same regulatory-tier QC quintuple that the GSD endpoint already publishes via `Peak.metadata`. Both `/nmr/raw-fid/preview` and `/nmr/raw-fid/process` populate the quintuple before returning. (v0.6.1, 2026-05-28)
- Real HMDB validation corpus: a 100-fixture real-instrument HMDB corpus (60 × ¹H + 40 × ¹³C; Bruker 59 / Varian 41; solvent mix Water/D₂O 85, CD₃OD 6, CDCl₃ 5, DMSO-d₆ 4). Result: 95/100 parse cleanly; 53/57 = 93 % solvent auto-detect on the subset with a known solvent reference. The literal Prompt 3 spec is satisfied across NMRShiftDB2 (19 fixtures; 100 % solvent), HMDB synthetic (20), and HMDB real-instrument (100; 95 % parseable, 93 % solvent). (v0.6.2, 2026-05-28)
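The v0.5.0 ideas can be illustrated in plain Python: a height-normalised pseudo-Voigt lineshape (a Lorentzian and a Gaussian sharing one FWHM, mixed by a fraction η) and the grouping of adjacent same-category peaks whose spacing fits within a J window given in Hz. Peak separations convert from ppm to Hz by multiplying by the spectrometer frequency in MHz, so a 30 Hz window spans 0.075 ppm at 400 MHz. The `Peak` type, the gap-chaining rule, and the function names are assumptions; this is not the backend's `cluster_into_environments` or its vectorized `_pseudo_voigt_sum`.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Peak:
    shift_ppm: float
    height: float
    fwhm_ppm: float
    category: str = "compound"   # e.g. compound / solvent / impurity
    eta: float = 0.5             # Lorentzian fraction of the pseudo-Voigt mix


def pseudo_voigt(x: float, center: float, fwhm: float, eta: float) -> float:
    """Height-normalised pseudo-Voigt: 1 at the centre, 0.5 at +/- fwhm/2."""
    d = x - center
    lorentz = 1.0 / (1.0 + (2.0 * d / fwhm) ** 2)
    gauss = math.exp(-4.0 * math.log(2.0) * d * d / (fwhm * fwhm))
    return eta * lorentz + (1.0 - eta) * gauss


def spectrum_at(x: float, peaks: List[Peak]) -> float:
    """Sum of all peak lineshapes at one ppm value (scalar, not vectorized)."""
    return sum(p.height * pseudo_voigt(x, p.shift_ppm, p.fwhm_ppm, p.eta) for p in peaks)


def cluster_environments(peaks: List[Peak], window_hz: float, spectrometer_mhz: float) -> List[List[Peak]]:
    """Chain adjacent same-category peaks whose gap is <= window_hz into one environment."""
    groups: List[List[Peak]] = []
    for p in sorted(peaks, key=lambda q: q.shift_ppm):
        if groups:
            prev = groups[-1][-1]
            gap_hz = (p.shift_ppm - prev.shift_ppm) * spectrometer_mhz  # ppm * MHz = Hz
            if p.category == prev.category and gap_hz <= window_hz:
                groups[-1].append(p)
                continue
        groups.append([p])
    return groups
```

With a 30 Hz window at 400 MHz, the three lines of a 7 Hz triplet collapse into one environment, while a solvent line sitting next to a compound line stays separate because the categories differ.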
Multiplet analysis and J-coupling refinement
The multiplet capability groups GSD-resolved peaks into multiplets, recognises multiplicity (s / d / t / q / p / sext / sept / dd / dt / td / ddd / m), and recovers the underlying J couplings.
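A first-order multiplet can be forward-modelled by splitting one line successively by each coupling: every J halves each line's intensity and places the halves at ±J/2. Repeated equal couplings reproduce Pascal-triangle intensities (a triplet is 1:2:1, a quartet 1:3:3:1), which is the pattern a first-order multiplicity match looks for. This is a minimal sketch in the spirit of `generate_synthetic_multiplet`, not its actual signature; `first_order_lines` and `pascal_ratios` are hypothetical names, and strong-coupling (roofing) effects are ignored.

```python
from collections import defaultdict
from typing import List, Tuple


def first_order_lines(center_hz: float, couplings_hz: List[float], ndigits: int = 6) -> List[Tuple[float, float]]:
    """Return sorted (position_hz, relative_intensity) lines for a first-order multiplet.

    Each coupling splits every existing line into two lines at +/- J/2 with half
    the intensity; coincident lines are merged (positions rounded to ndigits).
    """
    lines = {center_hz: 1.0}
    for j in couplings_hz:
        split = defaultdict(float)
        for pos, inten in lines.items():
            split[round(pos - j / 2, ndigits)] += inten / 2
            split[round(pos + j / 2, ndigits)] += inten / 2
        lines = dict(split)
    return sorted(lines.items())


def pascal_ratios(lines: List[Tuple[float, float]]) -> List[float]:
    """Normalise intensities to the weakest line (e.g. 1:2:1 for a triplet)."""
    weakest = min(i for _, i in lines)
    return [round(i / weakest, 3) for _, i in lines]
```

A doublet of doublets with distinct couplings (say 10 Hz and 4 Hz) instead gives four equal lines, which is why dd needs its own inversion step rather than a Pascal match.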
- Multiplet detection plus synthetic overlay: `POST /spectrum/analyze/multiplets` takes a GSD peak list and returns recognised multiplets with recovered J couplings. The forward modeller `generate_synthetic_multiplet` is publicly exposed so the FE can overlay predicted-vs-observed peaks (light red) on the spectrum view. Algorithm: spatial clustering at 30 Hz → first-order Pascal-triangle match → analytical inversion for dd and J-set enumeration for dt / td / ddd, refined with `scipy.optimize.least_squares` → “m” fallback for unstructured clusters. Validation: 8 quinine multiplets resolved with J within 0.3 Hz of literature values, and a hidden 11.4 Hz coupling in a known benchmark is recovered where standard peak picking misses it. (v0.7.0, 2026-05-28)
- Multiplet J-coupling → unified confidence layer: the recovered J couplings feed the unified candidate-confidence engine as the 40th evidence layer (`multiplet_jcoupling`). `POST /candidates/compare/jcoupling` returns per-candidate labels (`strong | partial | weak | poor_j_agreement`, plus `j_coupling_contradiction`) so the FE can render a J-agreement badge per candidate. A contradiction (an observed J above a threshold the candidate topology cannot produce) caps the score at 0.25. Purely additive: existing callers are unchanged when no multiplet input is supplied. (v0.7.1, 2026-05-28)
- Opt-in Karplus ³J refinement: Layer 40’s topological J predictor gains an opt-in, conformer-averaged Karplus refinement for sp³ vicinal (³J) couplings (RDKit ETKDGv3 plus MMFF). When enabled (`use_karplus=True`), the flat 7.0 Hz `aliphatic_vicinal` placeholder is replaced by a geometry-aware estimate. Default-off, and byte-for-byte identical output when the flag is omitted. (v0.7.2, 2026-05-28)
- Karplus validation corpus: an 8-molecule curated literature validation corpus (`karplus_jcoupling_corpus_v1.json`) and a pytest accuracy gate. Mean absolute error is 0.44 Hz (median 0.26, max 1.41), with clean separation and no overlap between conformationally locked diaxial systems (mean 9.5 Hz, all ≥ 8.49 Hz) and mobile/averaged systems (mean 6.9 Hz, all ≤ 7.14 Hz). (v0.7.3, 2026-05-28)
- Opt-in Haasnoot–Altona generalized Karplus plus honest negative result: a second selectable relation (`karplus_method=haasnoot_altona`). Per individual conformer it is more literature-faithful (it recovers trans-decalin diaxial at 11.64 Hz, above the generic 10.26 Hz ceiling), but the corpus study, shipped as a regression gate, shows that HLA (Haasnoot–de Leeuw–Altona) does not improve averaged discrimination under the unweighted conformer model. This negative result is documented openly. (v0.7.4, 2026-05-30)
- Boltzmann conformer-population weighting (sugar blind-spot fix): the opt-in `karplus_conformer_weighting` field (`uniform | boltzmann`, default `uniform`) weights each conformer by its MMFF-energy Boltzmann population at 298.15 K. Measured corpus effect: β-D-galactose recovers from 8.49 to ~10.1 Hz, onto its literature value, and locked-vs-mobile separation widens (generic: +1.35 → +2.28 Hz). Once conformers are population-weighted, the generic relation discriminates better than HLA: the sugar gap was a weighting problem, not an equation problem. (v0.7.5, 2026-05-30)
- Karplus corpus scaled to 18 molecules: a new 18-molecule v2 corpus (9 locked diaxial plus 9 mobile/averaged, including five new pyranosides), graded across the {generic, haasnoot_altona} × {uniform, boltzmann} grid, shows that generic/boltzmann is the only one of the four combinations that cleanly separates locked from mobile at scale: within-tolerance rate 1.00, mean absolute error 0.57 Hz, locked-vs-mobile separation +1.84 Hz. (v0.7.6, 2026-05-31)
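The averaging described for v0.7.2 and v0.7.5 can be sketched with the generic Karplus relation ³J(φ) = A cos²φ + B cosφ + C. We assume the widely quoted coefficients A = 7.76, B = −1.10, C = 1.40 Hz; their value at φ = 180° is 10.26 Hz, matching the generic ceiling quoted above, but whether the backend uses exactly these is an assumption. Dihedrals and MMFF energies, which the backend derives from RDKit ETKDGv3 conformers, are passed in directly here, and the function names are hypothetical. The Haasnoot–Altona relation, which adds substituent-electronegativity terms, is not shown.

```python
import math
from typing import List, Optional

R_KCAL = 1.987204259e-3  # gas constant, kcal/(mol*K)


def karplus_3j(phi_deg: float, a: float = 7.76, b: float = -1.10, c: float = 1.40) -> float:
    """Generic Karplus relation 3J(phi) = A cos^2(phi) + B cos(phi) + C, in Hz (assumed coefficients)."""
    x = math.cos(math.radians(phi_deg))
    return a * x * x + b * x + c


def boltzmann_weights(energies_kcal: List[float], temperature_k: float = 298.15) -> List[float]:
    """Population of each conformer from its relative energy (kcal/mol)."""
    e_min = min(energies_kcal)
    rt = R_KCAL * temperature_k
    # Subtracting the minimum keeps exp() well-behaved for large absolute energies.
    raw = [math.exp(-(e - e_min) / rt) for e in energies_kcal]
    total = sum(raw)
    return [w / total for w in raw]


def averaged_3j(dihedrals_deg: List[float], energies_kcal: Optional[List[float]] = None,
                weighting: str = "uniform") -> float:
    """Conformer-averaged vicinal 3J with uniform or Boltzmann populations."""
    n = len(dihedrals_deg)
    if weighting == "boltzmann":
        if energies_kcal is None:
            raise ValueError("boltzmann weighting needs one energy per conformer")
        weights = boltzmann_weights(energies_kcal)
    elif weighting == "uniform":
        weights = [1.0 / n] * n
    else:
        raise ValueError(f"unknown weighting: {weighting}")
    return sum(w * karplus_3j(phi) for w, phi in zip(weights, dihedrals_deg))
```

For example, an anti-periplanar conformer (180°) and a gauche conformer (60°) lying 1 kcal/mol higher average to about 6.5 Hz with uniform weights, while Boltzmann weighting at 298.15 K pulls the estimate toward the anti value, to about 9.1 Hz. This is the same mechanism by which population weighting can lift a sugar's averaged ³J back toward its literature value.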
Chemical-shift prediction
- NMRNet wrapper plus HOSE-code fallback: `predict_shifts(smiles, nuclei)` returns predicted ¹H / ¹³C shifts (ppm) with per-atom uncertainty. Two backends are available: the NMRNet SE(3)-equivariant model (Xu et al., Nat. Comput. Sci. 5, 292, 2025) as an optional, lazily loaded backend (in-process or remote GPU microservice), and a HOSE-code / NMRShiftDB2 topological fallback (spheres 6 → 1) as the default. NMRNet never fabricates a prediction; it activates only when configured. Exposed via `POST /spectrum/predict/shifts`. (v0.7.8, 2026-06-01)
- NMRNet wrapper rework (local-first device strategy): reworked from microservice-first to local-first (Apple Silicon dev), with device resolution CUDA → MPS → CPU (CPU is the baseline; MPS is best-effort with a clean CPU fallback), lazy torch import, per-atom uncertainty from the conformer ensemble (std across `n_conformers`; null at n=1), and Zenodo/HF-mirror weights acquisition (cached, SHA-256). The HOSE fallback now requires ≥ 3 references per matched sphere. The QM9-NMR gate targets the paper’s QM9-NMR MAE (0.020 / 0.262 ppm). NMRNet is never vendored. (v0.7.9, 2026-06-01)
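The two fallback and uncertainty rules above can be sketched without RDKit by assuming an atom's HOSE codes have already been computed for spheres 6 down to 1. The sketch backs off from the most specific sphere to the least, accepts the first sphere with at least three reference shifts, and separately reports ensemble uncertainty as the standard deviation across per-conformer predictions, returning `None` for a single conformer. The function names, the in-memory reference table, averaging the references with a mean, and using the sample standard deviation are all assumptions.

```python
import statistics
from typing import Dict, List, Optional, Tuple

MIN_REFERENCES = 3  # the ">= 3 references per matched sphere" rule


def hose_fallback_shift(hose_by_sphere: Dict[int, str],
                        references: Dict[str, List[float]]) -> Optional[Tuple[float, int]]:
    """Back off from sphere 6 to sphere 1; return (mean reference shift, sphere) or None."""
    for sphere in range(6, 0, -1):
        code = hose_by_sphere.get(sphere)
        shifts = references.get(code, []) if code is not None else []
        if len(shifts) >= MIN_REFERENCES:
            return statistics.mean(shifts), sphere
    return None  # no sphere has enough support: report no prediction rather than guess


def ensemble_uncertainty(per_conformer_ppm: List[float]) -> Optional[float]:
    """Sample std of one atom's predicted shift across conformers; None when n == 1."""
    if len(per_conformer_ppm) < 2:
        return None
    return statistics.stdev(per_conformer_ppm)
```

Requiring several references before accepting a sphere trades specificity for robustness: a single database hit at sphere 6 is skipped in favour of a better-supported, less specific sphere.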
Automated structure verification (ASV)
- Multi-test ASV scorer: `verify_structure(spectrum, proposed_smiles, prior_confidence=0.5, tests=None, options=None)` scores how well a proposed structure explains an experimental 1-D NMR spectrum and combines several independent tests into one auditable posterior confidence. Four tests ship: `PredictionBoundsTest`, `AssignmentsTest`, `HSQC2DRangesTest`, and `MSMoleculeMatchTest`, each returning a `TestResult(score, significance, quality = score · tanh(significance/3), diagnostic)`. Test qualities are combined in Bayesian log-odds form, `logit(p_post) = logit(prior) + Σ quality_i · ln10`, with verdict thresholds 0.80 (consistent) / 0.20 (inconsistent). Tests with no data abstain rather than fabricate evidence, and a per-test error degrades to an abstain. The scorer is grounded in published ASV / CASE literature (Golotvin & Williams; Elyashberg et al.); no vendor scoring scheme is reproduced. (v0.8.0, 2026-06-03)
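The combination rule can be written out directly. The sketch below uses a stand-in `TestResult` that mirrors the fields named above, computing `quality` as a property; `combine`, `verdict`, the use of `None` for an abstaining test, and the "inconclusive" label for the band between 0.20 and 0.80 are assumptions. The prior must lie strictly between 0 and 1 for its logit to exist.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TestResult:
    score: float
    significance: float
    diagnostic: str = ""

    @property
    def quality(self) -> float:
        # quality = score * tanh(significance / 3)
        return self.score * math.tanh(self.significance / 3.0)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def combine(prior: float, results: Iterable[Optional[TestResult]]) -> float:
    """Posterior from logit(p_post) = logit(prior) + sum(quality_i * ln 10)."""
    log_odds = logit(prior)
    for r in results:
        if r is None:  # abstaining test (no data or per-test error): adds no evidence
            continue
        log_odds += r.quality * math.log(10.0)
    return 1.0 / (1.0 + math.exp(-log_odds))


def verdict(p_post: float) -> str:
    if p_post >= 0.80:
        return "consistent"
    if p_post <= 0.20:
        return "inconsistent"
    return "inconclusive"  # middle-band label is an assumption
```

Because each quality is multiplied by ln 10, a test with quality 1 multiplies the posterior odds by ten, and an abstaining test leaves them unchanged. From a 0.5 prior, one test with score 1 and significance 3 yields a posterior of about 0.85, which clears the 0.80 threshold.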
Spectrum retrieval — vector plus set similarity
- FAISS HNSW similarity layer: `moltrace.spectroscopy.similarity` provides a Gaussian-smoothed 256-D spectral encoding `[v_1H(128); v_13C(128)]` with FAISS HNSW L2 retrieval, plus a Kuhn-Munkres set-similarity score (`scipy.optimize.linear_sum_assignment`; unmatched peaks are allowed, which makes the score robust to peak insertion and deletion). Performance: top-100 retrieval from 45 k entries in ≈ 2 ms (the target was < 1 s). Implements the NMR-Solver methodology (Jin et al., arXiv:2509.00640, 2025) from the published equations. (v0.8.1, 2026-06-03)
- `POST /spectrum/retrieve` endpoint: the similarity layer becomes a typed API. The endpoint matches a query spectrum (¹H/¹³C shift lists or a SMILES) against the server-configured FAISS index (`MOLTRACE_SIMILARITY_INDEX`) and returns the top-k nearest reference spectra by L2 distance. It returns `index_available=false` gracefully when the index is unset and emits one `spectrum.retrieve` audit event per call. (v0.8.2, 2026-06-03)
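To show what the 256-D encoding and the set-similarity score measure, here is a dependency-free sketch. The ppm windows, Gaussian widths, per-nucleus L2 normalisation, the exact linear scan used in place of FAISS HNSW, and the tolerance-based pair score are all assumptions; the NMR-Solver paper and the backend define their own parameters. For the set score, exhaustive search over one-to-one matchings returns the same optimum that Kuhn-Munkres (`scipy.optimize.linear_sum_assignment`) finds, but is only practical for a handful of peaks.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

# Assumed ppm windows and smoothing widths (not the backend's documented values).
H_RANGE, C_RANGE = (0.0, 12.0), (0.0, 220.0)
H_SIGMA, C_SIGMA = 0.1, 2.0
BINS = 128


def smooth(shifts: Sequence[float], lo: float, hi: float, sigma: float) -> List[float]:
    """Place a Gaussian at each shift, sample at BINS bin centres, and L2-normalise."""
    step = (hi - lo) / BINS
    centres = [lo + (i + 0.5) * step for i in range(BINS)]
    v = [sum(math.exp(-((c - s) ** 2) / (2 * sigma ** 2)) for s in shifts) for c in centres]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # an empty shift list stays all-zero
    return [x / norm for x in v]


def encode(h_shifts: Sequence[float], c_shifts: Sequence[float]) -> List[float]:
    """256-D vector [v_1H(128); v_13C(128)]."""
    return smooth(h_shifts, *H_RANGE, H_SIGMA) + smooth(c_shifts, *C_RANGE, C_SIGMA)


def nearest(query: List[float], index: List[Tuple[str, List[float]]], k: int = 1) -> List[str]:
    """Exact L2 linear scan; a stand-in for the FAISS HNSW index on small data."""
    scored = [(math.dist(query, vec), name) for name, vec in index]
    return [name for _, name in sorted(scored)[:k]]


def set_similarity(a: Sequence[float], b: Sequence[float], tol: float) -> float:
    """Best one-to-one peak matching score in [0, 1]; unmatched peaks contribute 0."""
    if len(a) > len(b):
        a, b = b, a
    if not a:
        return 0.0
    best = 0.0
    for chosen in permutations(b, len(a)):
        # A pair further apart than tol scores 0, which is the same as leaving it unmatched.
        gain = sum(max(0.0, 1.0 - abs(x - y) / tol) for x, y in zip(a, chosen))
        best = max(best, gain)
    return best / len(b)  # normalise by the larger peak count
```

Normalising by the larger peak count means an extra or missing peak lowers the score proportionally instead of breaking the match, which is the insertion/deletion robustness the bullet above describes.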
Release timeline
A chronological summary; see each subsection above for substantive detail.
| Version | Date | Headline |
|---|---|---|
| v0.8.2 | 2026-06-03 | POST /spectrum/retrieve endpoint (similarity retrieval contract) |
| v0.8.1 | 2026-06-03 | FAISS HNSW spectrum retrieval (vector + set similarity) |
| v0.8.0 | 2026-06-03 | Multi-test ASV verification scorer |
| v0.7.9 | 2026-06-01 | NMRNet wrapper reworked (local-first, conformer-ensemble uncertainty) |
| v0.7.8 | 2026-06-01 | NMRNet chemical-shift prediction wrapper + HOSE-code fallback |
| v0.7.6 | 2026-05-31 | Karplus validation corpus scaled to 18 molecules |
| v0.7.5 | 2026-05-30 | Boltzmann conformer-population weighting (sugar blind-spot fix) |
| v0.7.4 | 2026-05-30 | Opt-in Haasnoot–Altona Karplus + honest negative result |
| v0.7.3 | 2026-05-28 | Karplus vicinal-³J validation corpus + accuracy gate |
| v0.7.2 | 2026-05-28 | Opt-in Karplus ³J refinement for Layer 40 vicinal couplings |
| v0.7.1 | 2026-05-28 | Multiplet J-coupling → unified-confidence evidence layer |
| v0.7.0 | 2026-05-28 | Multiplet analysis with GSD-enhanced J-coupling |
| v0.6.2 | 2026-05-28 | 100-fixture real-instrument HMDB corpus |
| v0.6.1 | 2026-05-28 | Per-peak QC metrics + legacy parity |
| v0.6.0 | 2026-05-28 | Validation framework + strict promotion gate cleared |
| v0.5.0 | 2026-05-27 | Algorithm semantics + envelope unification |
| v0.4.0 | 2026-05-27 | Prompt 3 GSD backend launch |