Ready to start adding hashes

2026-02-17 15:53:42 +00:00
parent b158bfc4a0
commit b361201cf2
2 changed files with 155 additions and 36 deletions


@@ -1,36 +0,0 @@
Improve ZXDB downloads with local mirroring and inline preview
This commit implements a comprehensive local file mirror system for
ZXDB and WoS downloads, allowing users to access local archives
directly through the explorer UI.
Key Changes:
Local File Mirroring & Proxy:
- Added `ZXDB_LOCAL_FILEPATH` and `WOS_LOCAL_FILEPATH` to `src/env.ts`
and `example.env` for opt-in local mirroring.
- Implemented `resolveLocalLink` in `src/server/repo/zxdb.ts` to map
database `file_link` paths to local filesystem paths based on
configurable prefixes.
- Created `src/app/api/zxdb/download/route.ts` to safely proxy local
files, preventing path traversal and serving with appropriate
`Content-Type` and `Content-Disposition`.
- Updated `docs/ZXDB.md` with setup instructions and resolution logic.
UI Enhancements & Grouping:
- Grouped downloads and scraps by type (e.g., Inlay, Game manual, Tape
image) in `EntryDetail.tsx` and `ReleaseDetail.tsx` for better
organization.
- Introduced `FileViewer.tsx` component to provide inline previews
for supported formats (.txt, .nfo, .png, .jpg, .gif, .pdf).
- Added a "Preview" button for local mirrors of supported file types.
- Optimized download tables with badge-style links for local/remote
sources.
Guideline Updates:
- Updated `AGENTS.md` to clarify commit message handling: edit or
append to `COMMIT_EDITMSG` instead of overwriting.
- Re-emphasized testing rules: use `tsc --noEmit`, do not restart
dev-server, and avoid `pnpm build` during development.
Signed-off-by: junie@lucy.xalior.com


@@ -0,0 +1,155 @@
# Software Hashes Plan
Plan for adding a derived `software_hashes` table, its update pipeline, and a JSON snapshot lifecycle so the data survives DB wipes.

---
## 1) Goals and Scope (Plan Step 1)
- Create and maintain `software_hashes`, at this stage for tape-image downloads only.
- Preserve existing `_CONTENTS` folders; only create missing ones.
- Export `software_hashes` to JSON after each bulk update.
- Reimport `software_hashes` JSON during DB wipe in `bin/import_mysql.sh` (or a helper script it invokes).
- Ensure all scripts are idempotent and resume-safe.
---
## 2) Confirm Pipeline Touchpoints (Plan Step 2)
- Verify `bin/import_mysql.sh` is the authoritative DB wipe/import entry point.
- Confirm `bin/sync-downloads.mjs` remains responsible only for CDN cache sync.
- Confirm `src/server/schema/zxdb.ts` uses `downloads.id` as the natural FK target.
---
## 3) Define Data Model: `software_hashes` (Plan Step 3)
### Table naming and FK alignment
- Table: `software_hashes`.
- FK: `download_id` → `downloads.id`.
- Column names follow existing DB `snake_case` conventions.
### Planned columns
- `download_id` (PK or unique index; FK to `downloads.id`)
- `md5`
- `crc32`
- `size_bytes`
- `updated_at`
### Planned indexes / constraints
- Unique index on `download_id`.
- Index on `md5` for reverse lookup.
- Index on `crc32` for reverse lookup.
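The columns and indexes above could be sketched as DDL. Exact types are still open (see Plan Step 10), so everything below — integer widths, hash storage as fixed-length hex strings, the timestamp default — is an assumption, not a decided schema:

```sql
-- Hypothetical DDL sketch; column types to be confirmed in Plan Step 10.
CREATE TABLE software_hashes (
  download_id INT UNSIGNED    NOT NULL,
  md5         CHAR(32)        NOT NULL,  -- lowercase hex digest
  crc32       CHAR(8)         NOT NULL,  -- zero-padded lowercase hex
  size_bytes  BIGINT UNSIGNED NOT NULL,
  updated_at  TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP
                              ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (download_id),
  KEY idx_software_hashes_md5 (md5),
  KEY idx_software_hashes_crc32 (crc32),
  CONSTRAINT fk_software_hashes_download
    FOREIGN KEY (download_id) REFERENCES downloads (id)
) ENGINE=InnoDB;
```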
---
## 4) Define JSON Snapshot Format (Plan Step 4)
### Location
- Default: `data/zxdb/software_hashes.json` (or another agreed path).
### Structure
```json
{
  "exportedAt": "2026-02-17T15:18:00.000Z",
  "rows": [
    {
      "download_id": 123,
      "md5": "...",
      "crc32": "...",
      "size_bytes": 12345,
      "updated_at": "2026-02-17T15:18:00.000Z"
    }
  ]
}
```
### Planned import policy
- If snapshot exists: truncate `software_hashes` and bulk insert.
- If snapshot missing: log and continue without error.
---
## 5) Implement Tape Image Update Workflow (Plan Step 5)
### Planned script
- `bin/update-software-hashes.mjs` (name can be adjusted).
### Planned input dataset
- Query `downloads` for tape-image rows (filter by `filetype_id` or joined `filetypes` table).
### Planned per-item process
1. Resolve local zip path using the same CDN mapping used by `sync-downloads`.
2. Compute `_CONTENTS` folder name: `<zip filename>_CONTENTS` (exact match).
3. If `_CONTENTS` exists, keep it untouched.
4. If missing, extract the zip into `_CONTENTS` using an extraction library rather than shelling out to `unzip`, avoiding shell-expansion issues with brackets in filenames.
5. Locate tape file inside (`.tap`, `.tzx`, `.pzx`, `.csw`):
- Apply a deterministic priority order.
- If multiple candidates remain, log and skip (or record ambiguity).
6. Compute `md5`, `crc32`, and `size_bytes` for the selected file.
7. Upsert into `software_hashes` keyed by `download_id`.
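The tape-file choice in step 5 can be made fully deterministic with a fixed extension priority. A minimal sketch — the priority order shown is an assumption to be agreed, and the function name is hypothetical:

```javascript
// Hypothetical priority order for step 5; to be confirmed.
const TAPE_PRIORITY = ['.tzx', '.tap', '.pzx', '.csw'];

// Returns the single chosen entry name, or null when no tape file
// exists or the choice is ambiguous (several files share the best
// extension), matching the "log and skip" policy above.
function pickTapeFile(entryNames) {
  for (const ext of TAPE_PRIORITY) {
    const matches = entryNames.filter((name) =>
      name.toLowerCase().endsWith(ext),
    );
    if (matches.length === 1) return matches[0];
    if (matches.length > 1) return null; // ambiguous: caller logs and skips
  }
  return null; // no tape file found
}
```

Skipping on ambiguity at the highest matching priority (rather than falling through to a lower-priority extension) keeps the result reproducible across runs.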
### Planned error handling
- Log missing zips or missing tape files.
- Continue after recoverable errors; fail only on critical DB errors.
---
## 6) Implement JSON Export Lifecycle (Plan Step 6)
- After each bulk update, export `software_hashes` to JSON.
- Write atomically (temp file + rename).
- Include `exportedAt` timestamp in snapshot.
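The temp-file-plus-rename pattern works because `rename()` is atomic within a filesystem, so readers never observe a half-written snapshot. A sketch of the export step (function name and default path are assumptions):

```javascript
import { writeFile, rename } from 'node:fs/promises';

// Sketch of the atomic snapshot export: write to a temp file next to
// the target, then rename over the final path in one atomic step.
async function exportSnapshot(rows, outPath = 'data/zxdb/software_hashes.json') {
  const snapshot = { exportedAt: new Date().toISOString(), rows };
  const tmpPath = `${outPath}.tmp`;
  await writeFile(tmpPath, JSON.stringify(snapshot, null, 2));
  await rename(tmpPath, outPath);
}
```

Writing the temp file into the same directory (not `/tmp`) matters: rename is only atomic when source and destination are on the same filesystem.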
---
## 7) Reimport During Wipe (`bin/import_mysql.sh`) (Plan Step 7)
### Planned placement
- Immediately after database creation and ZXDB SQL import completes.
### Planned behavior
- Attempt to read JSON snapshot.
- If present, truncate and reinsert `software_hashes`.
- Log imported row count.
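A helper invoked from `bin/import_mysql.sh` could translate the snapshot back into SQL. As a testable sketch, the statement generation might look like the following — the function name, batch size, and naive quoting are all assumptions (a real implementation should use the driver's placeholders):

```javascript
// Hypothetical sketch: turn snapshot rows into the SQL statements the
// reimport step would execute (TRUNCATE, then batched INSERTs).
// Values are escaped naively here for illustration only.
function snapshotToSql(rows, batchSize = 500) {
  const esc = (v) =>
    typeof v === 'number' ? String(v) : `'${String(v).replace(/'/g, "''")}'`;
  const statements = ['TRUNCATE TABLE software_hashes;'];
  for (let i = 0; i < rows.length; i += batchSize) {
    const values = rows
      .slice(i, i + batchSize)
      .map((r) =>
        `(${esc(r.download_id)}, ${esc(r.md5)}, ${esc(r.crc32)}, ` +
        `${esc(r.size_bytes)}, ${esc(r.updated_at)})`,
      )
      .join(',\n');
    statements.push(
      `INSERT INTO software_hashes (download_id, md5, crc32, size_bytes, updated_at) VALUES\n${values};`,
    );
  }
  return statements;
}
```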
---
## 8) Add Idempotency and Resume Support (Plan Step 8)
- State file similar to `.sync-downloads.state.json` to track last `download_id` processed.
- CLI flags:
- `--resume` (default)
- `--start-from-id`
- `--rebuild-all`
- Reprocess when zip file size or mtime changes.
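The reprocess decision above reduces to a small pure function. A sketch, assuming the state file maps `download_id` to the size and mtime recorded at last processing (that shape is an assumption):

```javascript
// Sketch of the step 8 reprocess decision. stateEntry is the record
// for one download_id from the state file, or undefined if never seen;
// zipStat is the current fs.Stats of the zip.
function shouldReprocess(stateEntry, zipStat, rebuildAll = false) {
  if (rebuildAll) return true;          // --rebuild-all forces everything
  if (!stateEntry) return true;         // never processed before
  return (
    stateEntry.size !== zipStat.size || // zip replaced or truncated
    stateEntry.mtimeMs !== zipStat.mtimeMs
  );
}
```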
---
## 9) Validation Checklist (Plan Step 9)
- `_CONTENTS` folders are never deleted.
- Hashes match expected MD5/CRC32 for known samples.
- JSON snapshot is created and reimported correctly.
- Reverse lookup by `md5`/`crc32`/`size_bytes` identifies misnamed files.
- Script can resume safely after interruption.
---
## 10) Open Questions / Confirmations (Plan Step 10)
- Final `software_hashes` column list and types.
- Exact JSON snapshot path.
- Filetype IDs that map to “Tape Image” in `downloads`.