Two-layer database
Normalized domain
Isolates, taxonomy FKs, and assays; this layer is the system of record
Legacy staging
Read-only snapshots, kept until each record is fully migrated
The import and the later passes fill the normalized fields; snapshots remain for audit
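A minimal sketch of the two layers, using plain Python dataclasses rather than the project's actual models. The class and field names (`LegacySnapshot`, `Isolate`, `raw`, `genus_id`, `species_id`) are hypothetical and only illustrate the split between a read-only snapshot and an editable normalized record.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class LegacySnapshot:
    """Staging layer: an immutable copy of one legacy row, kept for audit."""
    legacy_id: int
    raw: Mapping[str, str]  # legacy column name -> original string value


@dataclass
class Isolate:
    """Normalized layer: the system of record, filled in by later passes."""
    codename: Optional[str] = None
    genus_id: Optional[int] = None    # hypothetical taxonomy FK
    species_id: Optional[int] = None  # hypothetical taxonomy FK
    snapshot: Optional[LegacySnapshot] = None


def make_shell(legacy_id: int, row: dict) -> Isolate:
    """Create an empty isolate shell linked to a read-only snapshot."""
    frozen = MappingProxyType(dict(row))  # read-only view over a private copy
    return Isolate(snapshot=LegacySnapshot(legacy_id, frozen))


shell = make_shell(7, {"name": "XY-12", "genus": "pseudomonas "})
```

The shell starts with every normalized field empty, while the snapshot keeps the legacy strings exactly as imported, including stray whitespace.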
Strategy goals
- New normalized schema is the system of record
- Every legacy isolate keeps a read-only snapshot until complete
- Automated passes map obvious fields first
- Dictionaries clean messy legacy strings over time
- Per-field flags track auto, manual, and pending state
- Dual-pane UI compares new values with legacy
- Staging tables can be retired once migration is 100% complete
Automation passes
Pass 0 — Import
Load legacy SQLite into staging + isolate shells
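Pass 0 could look roughly like the sketch below, which uses only the standard `sqlite3` and `json` modules. The table names (`isolates`, `staging_snapshot`, `isolate`) and the `id` column are assumptions, not the project's real schema, and read-only enforcement on the staging table is left out.

```python
import json
import sqlite3


def import_legacy(legacy: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Copy every legacy row into staging as JSON and create an isolate shell."""
    legacy.row_factory = sqlite3.Row
    target.executescript(
        """
        CREATE TABLE IF NOT EXISTS staging_snapshot (
            legacy_id INTEGER PRIMARY KEY,
            raw_json  TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS isolate (
            id        INTEGER PRIMARY KEY,
            legacy_id INTEGER UNIQUE REFERENCES staging_snapshot(legacy_id),
            status    TEXT NOT NULL DEFAULT 'imported'
        );
        """
    )
    count = 0
    for row in legacy.execute("SELECT * FROM isolates"):  # hypothetical table
        data = dict(row)
        # INSERT OR IGNORE keeps the pass idempotent when it is re-run.
        target.execute(
            "INSERT OR IGNORE INTO staging_snapshot (legacy_id, raw_json) "
            "VALUES (?, ?)",
            (data["id"], json.dumps(data)),
        )
        target.execute(
            "INSERT OR IGNORE INTO isolate (legacy_id) VALUES (?)", (data["id"],)
        )
        count += 1
    target.commit()
    return count


# Small in-memory demonstration with made-up data.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE isolates (id INTEGER PRIMARY KEY, name TEXT, genus TEXT)")
legacy.execute("INSERT INTO isolates VALUES (1, 'XY-12', 'pseudomonas ')")
target = sqlite3.connect(":memory:")
rows_read = import_legacy(legacy, target)
```

Storing the whole legacy row as JSON means the snapshot survives even if a legacy column has no target field yet.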
Pass 1 — Direct map
Codenames, dates, flags, comments
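As an illustration of a direct map, the sketch below converts a few legacy columns and records a per-field status. The legacy column names, target field names, accepted date formats, and truthy flag spellings are all assumptions made for the example.

```python
from datetime import date, datetime
from typing import Callable, Optional


def parse_date(value: str) -> Optional[date]:
    """Try a few assumed legacy date formats; return None if none match."""
    for fmt in ("%Y-%m-%d", "%d.%m.%Y"):
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_flag(value: str) -> bool:
    """Interpret an assumed set of truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "y", "yes", "true"}


# legacy column -> (normalized field, converter); all names are hypothetical
DIRECT_MAP: dict[str, tuple[str, Callable[[str], object]]] = {
    "name": ("codename", str.strip),
    "isolated_on": ("isolation_date", parse_date),
    "is_type_strain": ("type_strain", parse_flag),
    "comment": ("comments", str.strip),
}


def direct_map(raw: dict[str, str]) -> tuple[dict[str, object], dict[str, str]]:
    """Return (normalized values, field_status) for one legacy row."""
    values: dict[str, object] = {}
    field_status: dict[str, str] = {}
    for legacy_col, (target, convert) in DIRECT_MAP.items():
        text = raw.get(legacy_col) or ""
        converted = convert(text) if text.strip() else None
        if converted is None:
            field_status[target] = "pending"  # empty or unparseable legacy value
        else:
            values[target] = converted
            field_status[target] = "auto"
    return values, field_status


values, status = direct_map(
    {"name": " XY-12 ", "isolated_on": "03.05.1998", "is_type_strain": "n", "comment": ""}
)
```

Fields that cannot be converted stay `pending`, so the legacy value remains visible for a later pass or for a curator.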
Pass 2 — Taxonomy
Genus/species dictionaries → taxonomy FKs
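A dictionary lookup for taxonomy might be sketched as follows. The dictionary contents, the integer keys standing in for taxonomy FKs, and the `clean` normalization rule are hypothetical.

```python
from typing import Optional

# Hypothetical dictionaries: cleaned legacy spelling -> taxonomy primary key.
GENUS_DICT = {"pseudomonas": 1, "pseudomnas": 1, "bacillus": 2}
# Species are keyed by (genus key, cleaned species name).
SPECIES_DICT = {(1, "fluorescens"): 10, (2, "subtilis"): 20}


def clean(text: str) -> str:
    """Lower-case and collapse whitespace so spelling variants compare equal."""
    return " ".join(text.lower().split())


def map_taxonomy(genus: str, species: str) -> tuple[Optional[int], Optional[int]]:
    """Resolve legacy genus/species strings to FK values, or None if unknown."""
    genus_id = GENUS_DICT.get(clean(genus))
    species_id = None
    if genus_id is not None:
        species_id = SPECIES_DICT.get((genus_id, clean(species)))
    return genus_id, species_id
```

Keeping misspellings such as `pseudomnas` as dictionary keys lets the mapper be re-run after each dictionary update without touching the legacy snapshot.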
Pass 3 — Locations
Country/region dictionaries → canonical isolation text
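For locations, a similar lookup can produce one canonical text value. The dictionary entries and the "Region, Country" output format below are assumptions; the source only says that country and region dictionaries feed a canonical isolation text.

```python
from typing import Optional

# Hypothetical dictionaries: lower-cased legacy spelling -> canonical name.
COUNTRY_DICT = {"brd": "Germany", "deutschland": "Germany", "germany": "Germany"}
REGION_DICT = {"bayern": "Bavaria", "bavaria": "Bavaria"}


def canonical_location(country: str, region: str = "") -> Optional[str]:
    """Build canonical isolation text, or return None if the country is unknown."""
    country_name = COUNTRY_DICT.get(country.strip().lower())
    if country_name is None:
        return None
    region_name = REGION_DICT.get(region.strip().lower())
    # An unknown or empty region falls back to the country alone.
    return f"{region_name}, {country_name}" if region_name else country_name
```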
Pass 4 — Parse
Storage strings (gl/bd/NG), biolog links
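The source names only the prefixes gl, bd, and NG; their meaning and the surrounding syntax are not specified. The sketch below therefore assumes a layout of its own (a prefix optionally followed by a numeric locator, e.g. `gl12` or `bd 3`) purely to show how a tolerant parser could keep unparsed text for review.

```python
import re
from typing import NamedTuple


class StorageToken(NamedTuple):
    kind: str     # one of the prefixes named in the source: gl, bd, NG
    locator: str  # assumed numeric locator after the prefix; may be empty


# Assumed token layout: prefix, not followed by a letter, then optional digits.
TOKEN_RE = re.compile(r"\b(gl|bd|NG)(?![A-Za-z])\s*(\d*)")


def parse_storage(text: str) -> tuple[list[StorageToken], str]:
    """Return recognized tokens plus whatever text could not be parsed."""
    tokens = [StorageToken(m.group(1), m.group(2)) for m in TOKEN_RE.finditer(text)]
    leftover = TOKEN_RE.sub("", text).strip(" ,;/")
    return tokens, leftover
```

A non-empty `leftover` is a signal that the string should stay `pending` rather than be marked `auto`.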
Pass 5 — Dictionary
Classify unique legacy values, re-run mappers
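One way to drive the dictionary pass is to list the unique legacy values that no dictionary entry covers yet, most frequent first. The function name, the row shape (a dict per legacy record), and the cleaning rule are assumptions for this sketch.

```python
from collections import Counter
from typing import Iterable, Mapping


def unclassified_values(
    rows: Iterable[Mapping[str, str]], column: str, dictionary: Mapping[str, object]
) -> list[tuple[str, int]]:
    """Count cleaned legacy values in `column` that the dictionary lacks."""
    counts = Counter(
        " ".join((row.get(column) or "").lower().split()) for row in rows
    )
    counts.pop("", None)  # empty values need no classification
    return [(value, n) for value, n in counts.most_common() if value not in dictionary]


rows = [
    {"genus": "Pseudomnas"},
    {"genus": "pseudomnas "},
    {"genus": "Bacillus"},
    {"genus": ""},
]
genus_dict = {"bacillus": 2}
todo = unclassified_values(rows, "genus", genus_dict)
```

Once a curator classifies a value by adding it to the dictionary, re-running the mapper picks up every record that shares that spelling.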
Pass 6 — Review
Dual-pane curation; mark fields mapped; retire staging when complete
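The review pass can be sketched as three small helpers: one builds the rows a dual-pane view would show, one accepts a field, and one checks whether staging may be retired. All function names and the row layout are hypothetical.

```python
from typing import Iterable, Mapping


def dual_pane_rows(
    raw: Mapping[str, str],
    values: Mapping[str, object],
    field_status: Mapping[str, str],
    field_to_legacy: Mapping[str, str],
) -> list[dict]:
    """Pair each normalized field with its legacy value for side-by-side review."""
    return [
        {
            "field": field,
            "legacy": raw.get(legacy_col),
            "new": values.get(field),
            "status": field_status.get(field, "pending"),
        }
        for field, legacy_col in field_to_legacy.items()
    ]


def mark_mapped(field_status: Mapping[str, str], field: str) -> dict[str, str]:
    """Return a copy of field_status with one field accepted as mapped."""
    updated = dict(field_status)
    updated[field] = "mapped"
    return updated


def staging_can_be_retired(record_statuses: Iterable[str]) -> bool:
    """Staging is retirable only when every record is complete."""
    statuses = list(record_statuses)
    return bool(statuses) and all(s == "complete" for s in statuses)
```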
Migration flags
Record status
- imported: Snapshot loaded; normalization not yet run
- partial: Some fields mapped; review needed
- reviewed: Record confirmed by a human
- complete: All tracked fields done; snapshot can be archived
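The record status could be derived from the per-field statuses, as in the sketch below. The rules are assumptions: `mapped`, `manual`, and `skipped` count as done, `unmappable` does not, and `reviewed` is taken from a separate human-confirmation flag.

```python
from typing import Mapping

# Assumed set of field statuses that count as "done" for a record.
DONE = {"mapped", "manual", "skipped"}


def derive_record_status(
    field_status: Mapping[str, str], human_confirmed: bool = False
) -> str:
    """Derive a record status from field statuses under the assumed rules."""
    statuses = list(field_status.values())
    if not statuses or all(s == "pending" for s in statuses):
        return "imported"   # nothing has been normalized yet
    if all(s in DONE for s in statuses):
        return "complete"   # every tracked field is done
    if human_confirmed:
        return "reviewed"   # a curator confirmed it, but some fields remain open
    return "partial"
```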
Field status (field_status JSON)
- pending: Legacy value shown; new field empty
- auto: Auto-mapped; may need review
- mapped: Accepted in new structure
- manual: Set by curator
- skipped: Intentionally not migrated
- unmappable: No target field yet
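A `field_status` value might be stored as a JSON object keyed by field name. The sketch below shows one possible shape and a validated update helper; the field names and the helper itself are hypothetical, and only the six status names come from the list above.

```python
import json

FIELD_STATUSES = {"pending", "auto", "mapped", "manual", "skipped", "unmappable"}


def set_field_status(field_status_json: str, field: str, status: str) -> str:
    """Return updated field_status JSON, rejecting unknown status names."""
    if status not in FIELD_STATUSES:
        raise ValueError(f"unknown field status: {status!r}")
    data = json.loads(field_status_json or "{}")
    data[field] = status
    return json.dumps(data, sort_keys=True)


# Example document with hypothetical field names.
stored = '{"codename": "auto", "genus": "pending"}'
stored = set_field_status(stored, "codename", "mapped")
stored = set_field_status(stored, "genus", "manual")
```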
Feature roadmap
Suggested next phases
P2 — SequenceRecord + 16S import + strain panel
P3 — Dated BiologAssay model
P4 — StorageSlot + pass 4 storage parser
P5 — Generic assays (manual CRUD)
P6 — AST/PCR when Excel fields confirmed
Product v0.1.21 · Web v0.1.21