# Audit & Compliance
Every `bdp pull` automatically appends to a tamper-evident log recording what was downloaded, which checksums were verified, and which hooks ran. When a journal asks for a Data Availability Statement or a funder requires an NIH DMS report, you generate it from that log — no manual tracking required.
## What Gets Logged
BDP records 12 event types automatically:
| Event | When it fires |
|---|---|
| `init_start` / `init_success` / `init_failure` | `bdp init` |
| `source_add` / `source_remove` | `bdp source add` / `bdp source remove` |
| `download_start` / `download_success` / `download_failure` | Each file download |
| `verify_checksum` | SHA-256 verification after each download |
| `post_pull_hook` | Each hook script execution |
| `config_change` | Configuration modifications |
| `cache_operation` | Cache clean/reset |
Each event records: a UTC timestamp, the source spec, a JSON payload specific to the event type, and the machine ID of the host that ran it.
## Where It's Stored
The audit log lives at `.bdp/bdp.db` — a SQLite database inside your project directory. It is created by `bdp init` alongside `.bdp/data/` and covered by the same `.gitignore` entry (`# BDP cache and runtime files` / `.bdp/`).
The database has four tables:
- `audit_events` — the main event log with the hash chain
- `files` — tracks each downloaded source file (path, SHA-256, size, timestamp)
- `generated_files` — tracks outputs produced by post-pull hooks
- `audit_snapshots` — records of previous export operations
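The CLI is the supported interface, but because the log is a plain SQLite file you can also query it directly from a script. A minimal sketch, assuming the `audit_events` column names from the schema documented later on this page:

```python
import sqlite3

def recent_events(db_path=".bdp/bdp.db", limit=5):
    """Return the most recent audit events, newest first.

    Read-only convenience sketch; column names are taken from the
    documented audit_events schema.
    """
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT id, timestamp, event_type, source_spec "
        "FROM audit_events ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    con.close()
    return rows
```

This is handy for ad-hoc questions (e.g., "which sources were downloaded this week?") that the built-in filters don't cover.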
## Viewing the Audit Log
```shell
# Show 20 most recent events
bdp audit list

# Show more
bdp audit list --limit 50

# Filter to a specific source
bdp audit list --source uniprot:P01308-fasta@1.0
```

Example output:
```
→ Showing 3 most recent events:

#12  download_success   2026-03-29 14:32:01 UTC
     Source: uniprot:P01308-fasta@1.0
     size_bytes: 4096   checksum: sha256:a3f1...

#11  verify_checksum    2026-03-29 14:32:01 UTC
     Source: uniprot:P01308-fasta@1.0
     verified: true

#10  download_start     2026-03-29 14:31:58 UTC
     Source: uniprot:P01308-fasta@1.0
     url: https://example.com/data.fasta
```

## Verifying Integrity
Each event is linked to the previous one by a SHA-256 hash. The hash is computed from `id|timestamp|event_type|source_spec|previous_hash` — modifying any event breaks every subsequent link in the chain.
```shell
bdp audit verify
```

```
→ Verifying audit trail integrity...
✓ Audit trail verified successfully
→ Hash chain is intact
→ No tampering detected
```

Run this before generating any compliance export to confirm the trail is intact.
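To make the chain rule concrete, here is an illustrative sketch of the verification walk (not BDP's actual code — the exact serialization of null fields and separators is an implementation detail):

```python
import hashlib

def event_hash(event, previous_hash):
    """Hash one event as id|timestamp|event_type|source_spec|previous_hash.

    Illustrative sketch of the chain rule described above; empty string
    stands in for missing fields (an assumption, not BDP's spec).
    """
    material = "|".join([
        str(event["id"]),
        event["timestamp"],
        event["event_type"],
        event["source_spec"] or "",
        previous_hash or "",
    ])
    return hashlib.sha256(material.encode()).hexdigest()

def chain_intact(events):
    """Walk events in id order; any edited event breaks every later link."""
    prev = None
    for ev in events:
        if ev["event_hash"] != event_hash(ev, prev):
            return False
        prev = ev["event_hash"]
    return True
```

Because each hash folds in the previous one, an attacker who edits event #5 would have to recompute the hashes of every event after it to hide the change.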
## Machine ID
Every event is stamped with a machine ID: a stable, privacy-first identifier in the format `{hostname}-{random-suffix}` (e.g., `mycomputer-a1b2c3d4`). It is generated on first use and stored in `.bdp/machine-id`. No MAC address, no username, no personal information.
## Exporting Compliance Reports
```shell
bdp audit export --format <FORMAT> [OPTIONS]
```

| Format | Output | Use case |
|---|---|---|
| `fda` | JSON | FDA 21 CFR Part 11 electronic records |
| `nih` | Markdown | NIH Data Management & Sharing policy |
| `ema` | YAML | EMA ALCOA++ data integrity |
| `das` | Markdown | Data Availability Statement for publications |
| `json` | JSON | Raw event array, for custom tooling |
All options:
```shell
bdp audit export \
  --format fda \
  --output audit-fda.json \
  --from 2026-01-01T00:00:00Z \
  --to 2026-03-29T23:59:59Z \
  --project-name "insulin-variant-analysis" \
  --project-version "1.0.0"
```

`--output` defaults to a timestamped filename (e.g., `audit-fda-20260329.json`) if omitted.
### FDA 21 CFR Part 11
JSON report documenting electronic records with chain-of-custody verification:
```json
{
  "audit_report": {
    "standard": "FDA 21 CFR Part 11",
    "generated_at": "2026-03-29T14:00:00Z",
    "project": { "name": "insulin-variant-analysis", "version": "1.0.0" },
    "machine": { "machine_id": "mycomputer-a1b2c3d4" },
    "period": { "start": "2026-01-01T00:00:00Z", "end": "2026-03-29T23:59:59Z" },
    "event_count": 12,
    "events": [ ... ],
    "verification": {
      "chain_verified": true,
      "no_gaps_in_sequence": true,
      "all_timestamps_valid": true
    },
    "disclaimer": "Local records, editable, for documentation purposes"
  }
}
```

### NIH Data Management & Sharing
Markdown report covering NIH DMS policy requirements: data sources with checksums, reproducibility statement, and provenance.
```shell
bdp audit export --format nih \
  --project-name "insulin-variant-analysis" \
  --output nih-dms-report.md
```

### EMA ALCOA++
YAML report mapping the audit trail against the 10 ALCOA++ data integrity principles: Attributable, Legible, Contemporaneous, Original, Accurate, Complete, Consistent, Enduring, Available, Traceable.
```shell
bdp audit export --format ema --output ema-alcoa.yaml
```

### Data Availability Statement
Publication-ready text for the data availability section of a paper:
```shell
bdp audit export --format das --output data-availability.md
```

Output is a Markdown document you can paste directly into a manuscript. It lists every source used, the version pinned in `bdp.lock`, the SHA-256 checksum, and the download date — everything reviewers and editors expect.
## Typical Publication Workflow
```shell
# 1. Pull data (audit events logged automatically)
bdp pull

# 2. Run your analysis
python scripts/analysis.py

# 3. Verify the audit trail before submission
bdp audit verify

# 4. Generate reports
bdp audit export --format das --output data-availability.md
bdp audit export --format fda --output supplementary-audit.json

# 5. Include in submission
#    - data-availability.md → paste into manuscript
#    - supplementary-audit.json → supplementary material
```

## Database Schema
```sql
-- Main event log with tamper-evident hash chain
CREATE TABLE audit_events (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp     DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    event_type    TEXT NOT NULL,
    source_spec   TEXT,            -- e.g. "uniprot:P01308-fasta@1.0"
    details       TEXT NOT NULL,   -- JSON payload (varies by event type)
    machine_id    TEXT NOT NULL,
    event_hash    TEXT,            -- SHA-256 of this event
    previous_hash TEXT,            -- SHA-256 of previous event (chain)
    notes         TEXT,
    archived      BOOLEAN DEFAULT 0
);

-- Downloaded source file registry
CREATE TABLE files (
    id                  INTEGER PRIMARY KEY AUTOINCREMENT,
    source_spec         TEXT NOT NULL UNIQUE,
    file_path           TEXT NOT NULL,
    sha256              TEXT NOT NULL,
    size_bytes          INTEGER NOT NULL,
    downloaded_at       DATETIME,
    download_event_id   INTEGER,   -- FK → audit_events.id
    last_verified_at    DATETIME,
    verification_status TEXT
);

-- Hook-generated artifacts
CREATE TABLE generated_files (
    id                  INTEGER PRIMARY KEY AUTOINCREMENT,
    source_file_id      INTEGER NOT NULL,  -- FK → files.id
    file_path           TEXT NOT NULL,
    tool                TEXT NOT NULL,
    sha256              TEXT,
    size_bytes          INTEGER,
    generated_at        DATETIME,
    generation_event_id INTEGER            -- FK → audit_events.id
);

-- Export history
CREATE TABLE audit_snapshots (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id    TEXT NOT NULL UNIQUE,
    created_at     DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    export_format  TEXT NOT NULL,
    event_id_start INTEGER,
    event_id_end   INTEGER,
    event_count    INTEGER NOT NULL,
    chain_verified BOOLEAN,
    output_path    TEXT
);
```
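To see how the schema and the hash chain fit together, here is a hypothetical append sketch against `audit_events` (not BDP's internals — the serialization and null handling are assumptions): look up the newest `event_hash`, store it as the new row's `previous_hash`, then hash the new row.

```python
import hashlib
import sqlite3

def append_event(con, timestamp, event_type, source_spec, details, machine_id):
    """Append one chained event; returns its event_hash.

    Illustrative only: fetches the latest event_hash as previous_hash,
    inserts the row, then backfills event_hash from
    id|timestamp|event_type|source_spec|previous_hash.
    """
    row = con.execute(
        "SELECT event_hash FROM audit_events ORDER BY id DESC LIMIT 1"
    ).fetchone()
    previous_hash = row[0] if row else ""
    cur = con.execute(
        "INSERT INTO audit_events "
        "(timestamp, event_type, source_spec, details, machine_id, previous_hash) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (timestamp, event_type, source_spec, details, machine_id, previous_hash),
    )
    material = f"{cur.lastrowid}|{timestamp}|{event_type}|{source_spec}|{previous_hash}"
    event_hash = hashlib.sha256(material.encode()).hexdigest()
    con.execute(
        "UPDATE audit_events SET event_hash = ? WHERE id = ?",
        (event_hash, cur.lastrowid),
    )
    con.commit()
    return event_hash
```

The key design point is that `previous_hash` is read from the table itself, so each new row commits to the entire history before it.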
## Limitations
The audit trail is editable. `.bdp/bdp.db` is a local SQLite file, and anyone with filesystem access can modify it directly. The hash chain detects modifications — `bdp audit verify` will flag a broken chain — but it is not a substitute for a write-once, server-controlled record.
This is intentional: the audit trail is designed for research documentation and compliance reporting, not legal evidence or forensic investigation. It answers "what data did this analysis use, when, and was it intact?" — which we believe covers the most common needs, but compliance requirements vary enormously across institutions, funders, and regulatory contexts.
For regulated environments requiring immutable records, contact your institution's data governance team about supplementing the BDP audit trail with server-side controls.
## Help Us Get This Right
We don't have full visibility into how audit trails are actually used day-to-day in wet labs, core facilities, or clinical research environments. If the current tooling doesn't match your workflow — wrong export format, missing fields for your ethics board, report structure that doesn't fit your journal's requirements — we want to know.
Describe your workflow, what the report needs to contain, and which body you're reporting to — we'd rather build something that actually works for your lab than guess.
- Open an issue on Codeberg — public, searchable, good for feature requests
- Chat on Matrix — faster back-and-forth if you want to talk through requirements
- Email directly — if you'd prefer a private conversation