# Audit & Compliance

Every `bdp pull` automatically appends to a tamper-evident log recording what was downloaded, which checksums were verified, and which hooks ran. When a journal asks for a Data Availability Statement or a funder requires an NIH DMS report, you generate it from that log — no manual tracking required.

## What Gets Logged

BDP records 12 event types automatically:

| Event | When it fires |
| --- | --- |
| `init_start` / `init_success` / `init_failure` | `bdp init` |
| `source_add` / `source_remove` | `bdp source add` / `bdp source remove` |
| `download_start` / `download_success` / `download_failure` | Each file download |
| `verify_checksum` | SHA-256 verification after each download |
| `post_pull_hook` | Each hook script execution |
| `config_change` | Configuration modifications |
| `cache_operation` | Cache clean/reset |

Each event records a UTC timestamp, the source spec, a JSON payload specific to the event type, and the ID of the machine that ran it.
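Put together, a single event might look like this (values are illustrative, drawn from the examples on this page; the field layout mirrors the `audit_events` schema shown later):

```json
{
  "id": 12,
  "timestamp": "2026-03-29 14:32:01 UTC",
  "event_type": "download_success",
  "source_spec": "uniprot:P01308-fasta@1.0",
  "details": { "size_bytes": 4096, "checksum": "sha256:a3f1..." },
  "machine_id": "mycomputer-a1b2c3d4"
}
```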

## Where It's Stored

The audit log lives at `.bdp/bdp.db` — a SQLite database inside your project directory. It is created by `bdp init` alongside `.bdp/data/` and covered by the same `.gitignore` entry (`.bdp/`, under the `# BDP cache and runtime files` comment).

The database has four tables:

- `audit_events` — the main event log with the hash chain
- `files` — tracks each downloaded source file (path, SHA-256, size, timestamp)
- `generated_files` — tracks outputs produced by post-pull hooks
- `audit_snapshots` — records of previous export operations

## Viewing the Audit Log

```bash
# Show the 20 most recent events
bdp audit list

# Show more
bdp audit list --limit 50

# Filter to a specific source
bdp audit list --source uniprot:P01308-fasta@1.0
```

Example output:

```text
→ Showing 3 most recent events:

#12  download_success   2026-03-29 14:32:01 UTC
     Source: uniprot:P01308-fasta@1.0
     size_bytes: 4096   checksum: sha256:a3f1...

#11  verify_checksum    2026-03-29 14:32:01 UTC
     Source: uniprot:P01308-fasta@1.0
     verified: true

#10  download_start     2026-03-29 14:31:58 UTC
     Source: uniprot:P01308-fasta@1.0
     url: https://example.com/data.fasta
```

## Verifying Integrity

Each event is linked to the previous one by a SHA-256 hash, computed from `id|timestamp|event_type|source_spec|previous_hash` — modifying any event breaks every subsequent link in the chain.
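As a sketch of the scheme (not BDP's actual implementation; the empty-string handling for NULL fields is an assumption), the chain can be computed and walked like this:

```python
import hashlib

def event_hash(event_id, timestamp, event_type, source_spec, previous_hash):
    # SHA-256 over the pipe-joined fields:
    # id|timestamp|event_type|source_spec|previous_hash
    # (substituting "" for NULL fields is an assumption)
    fields = [str(event_id), timestamp, event_type,
              source_spec or "", previous_hash or ""]
    return hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()

def chain_intact(events):
    # Each event must store the hash of the event before it;
    # recompute as we walk and compare against the stored value.
    expected_prev = None
    for ev in events:
        if ev["previous_hash"] != expected_prev:
            return False
        expected_prev = event_hash(ev["id"], ev["timestamp"],
                                   ev["event_type"], ev["source_spec"],
                                   ev["previous_hash"])
    return True
```

Because each hash folds in the previous one, editing any stored event changes the value every later event was chained against, which is exactly what the walk detects.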

```bash
bdp audit verify
```

```text
→ Verifying audit trail integrity...
✓ Audit trail verified successfully
→ Hash chain is intact
→ No tampering detected
```

Run this before generating any compliance export to confirm the trail is intact.

## Machine ID

Every event is stamped with a machine ID: a stable, privacy-preserving identifier in the format `{hostname}-{random-suffix}` (e.g., `mycomputer-a1b2c3d4`). It is generated on first use and stored in `.bdp/machine-id`. No MAC address, no username, no personal information.
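A minimal sketch of how such an identifier can be generated and persisted (the general pattern, not BDP's exact code; the suffix length is an assumption):

```python
import pathlib
import secrets
import socket

def get_machine_id(path=pathlib.Path(".bdp/machine-id")):
    # Reuse the stored ID so it stays stable across runs
    if path.exists():
        return path.read_text().strip()
    # {hostname}-{random-suffix}: no MAC address, no username
    mid = f"{socket.gethostname()}-{secrets.token_hex(4)}"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(mid)
    return mid
```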

## Exporting Compliance Reports

```bash
bdp audit export --format <FORMAT> [OPTIONS]
```

| Format | Output | Use case |
| --- | --- | --- |
| `fda` | JSON | FDA 21 CFR Part 11 electronic records |
| `nih` | Markdown | NIH Data Management & Sharing policy |
| `ema` | YAML | EMA ALCOA++ data integrity |
| `das` | Markdown | Data Availability Statement for publications |
| `json` | JSON | Raw event array, for custom tooling |

All options:

```bash
bdp audit export \
  --format fda \
  --output audit-fda.json \
  --from 2026-01-01T00:00:00Z \
  --to 2026-03-29T23:59:59Z \
  --project-name "insulin-variant-analysis" \
  --project-version "1.0.0"
```

`--output` defaults to a timestamped filename (e.g., `audit-fda-20260329.json`) if omitted.
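The `json` format is the one intended for custom tooling. A sketch of consuming it (this assumes the export is a flat array of event objects with an `event_type` field; check the shape of your actual export first):

```python
import json
from collections import Counter

def event_type_counts(path):
    # Tally events by type from a raw JSON export, e.g. one produced by:
    #   bdp audit export --format json --output events.json
    with open(path) as f:
        events = json.load(f)
    return Counter(e["event_type"] for e in events)
```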

## FDA 21 CFR Part 11

JSON report documenting electronic records with chain-of-custody verification:

```json
{
  "audit_report": {
    "standard": "FDA 21 CFR Part 11",
    "generated_at": "2026-03-29T14:00:00Z",
    "project": { "name": "insulin-variant-analysis", "version": "1.0.0" },
    "machine": { "machine_id": "mycomputer-a1b2c3d4" },
    "period": { "start": "2026-01-01T00:00:00Z", "end": "2026-03-29T23:59:59Z" },
    "event_count": 12,
    "events": [ ... ],
    "verification": {
      "chain_verified": true,
      "no_gaps_in_sequence": true,
      "all_timestamps_valid": true
    },
    "disclaimer": "Local records, editable, for documentation purposes"
  }
}
```

## NIH Data Management & Sharing

Markdown report covering NIH DMS policy requirements: data sources with checksums, reproducibility statement, and provenance.

```bash
bdp audit export --format nih \
  --project-name "insulin-variant-analysis" \
  --output nih-dms-report.md
```

## EMA ALCOA++

YAML report mapping the audit trail against the 10 ALCOA++ data integrity principles: Attributable, Legible, Contemporaneous, Original, Accurate, Complete, Consistent, Enduring, Available, Traceable.

```bash
bdp audit export --format ema --output ema-alcoa.yaml
```

## Data Availability Statement

Publication-ready text for the data availability section of a paper:

```bash
bdp audit export --format das --output data-availability.md
```

Output is a Markdown document you can paste directly into a manuscript. It lists every source used, the version pinned in `bdp.lock`, the SHA-256 checksum, and the download date — everything reviewers and editors expect.

## Typical Publication Workflow

```bash
# 1. Pull data (audit events logged automatically)
bdp pull

# 2. Run your analysis
python scripts/analysis.py

# 3. Verify the audit trail before submission
bdp audit verify

# 4. Generate reports
bdp audit export --format das --output data-availability.md
bdp audit export --format fda --output supplementary-audit.json

# 5. Include in submission
#    - data-availability.md → paste into manuscript
#    - supplementary-audit.json → supplementary material
```

## Database Schema

`.bdp/bdp.db` (SQLite):

```sql
-- Main event log with tamper-evident hash chain
CREATE TABLE audit_events (
  id             INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp      DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  event_type     TEXT NOT NULL,
  source_spec    TEXT,           -- e.g. "uniprot:P01308-fasta@1.0"
  details        TEXT NOT NULL,  -- JSON payload (varies by event type)
  machine_id     TEXT NOT NULL,
  event_hash     TEXT,           -- SHA-256 of this event
  previous_hash  TEXT,           -- SHA-256 of previous event (chain)
  notes          TEXT,
  archived       BOOLEAN DEFAULT 0
);

-- Downloaded source file registry
CREATE TABLE files (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  source_spec       TEXT NOT NULL UNIQUE,
  file_path         TEXT NOT NULL,
  sha256            TEXT NOT NULL,
  size_bytes        INTEGER NOT NULL,
  downloaded_at     DATETIME,
  download_event_id INTEGER,     -- FK → audit_events.id
  last_verified_at  DATETIME,
  verification_status TEXT
);

-- Hook-generated artifacts
CREATE TABLE generated_files (
  id                  INTEGER PRIMARY KEY AUTOINCREMENT,
  source_file_id      INTEGER NOT NULL,  -- FK → files.id
  file_path           TEXT NOT NULL,
  tool                TEXT NOT NULL,
  sha256              TEXT,
  size_bytes          INTEGER,
  generated_at        DATETIME,
  generation_event_id INTEGER            -- FK → audit_events.id
);

-- Export history
CREATE TABLE audit_snapshots (
  id              INTEGER PRIMARY KEY AUTOINCREMENT,
  snapshot_id     TEXT NOT NULL UNIQUE,
  created_at      DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  export_format   TEXT NOT NULL,
  event_id_start  INTEGER,
  event_id_end    INTEGER,
  event_count     INTEGER NOT NULL,
  chain_verified  BOOLEAN,
  output_path     TEXT
);
```
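If none of the built-in exports fit, you can query the database directly with ordinary SQLite tooling. A sketch (table and column names from the schema above; the default database path is the one described earlier on this page):

```python
import sqlite3

def recent_events(db_path=".bdp/bdp.db", limit=20):
    # Newest-first listing of audit events, similar to `bdp audit list`
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT id, timestamp, event_type, source_spec "
            "FROM audit_events ORDER BY id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
```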

## Limitations

The audit trail is editable. `.bdp/bdp.db` is a local SQLite file, and anyone with filesystem access can modify it directly. The hash chain detects modifications — `bdp audit verify` will flag a broken chain — but it is not a substitute for a write-once, server-controlled record.

This is intentional: the audit trail is designed for research documentation and compliance reporting, not legal evidence or forensic investigation. It answers "what data did this analysis use, when, and was it intact?" — which we believe covers the most common needs, but compliance requirements vary enormously across institutions, funders, and regulatory contexts.

For regulated environments requiring immutable records, contact your institution's data governance team about supplementing the BDP audit trail with server-side controls.

## Help Us Get This Right

We don't have full visibility into how audit trails are actually used day-to-day in wet labs, core facilities, or clinical research environments. If the current tooling doesn't match your workflow — wrong export format, missing fields for your ethics board, report structure that doesn't fit your journal's requirements — we want to know.

Describe your workflow, what the report needs to contain, and which body you're reporting to — we'd rather build something that actually works for your lab than guess.