feat: add initial software_hashes JSON snapshot (32,305 rows)
First full run of update-software-hashes.mjs completed:
- 32,305 tape-image downloads hashed (MD5, CRC32, size, inner path)
- Snapshot at data/zxdb/software_hashes.json for DB wipe recovery

claude-opus-4-6@MacFiver
258446
data/zxdb/software_hashes.json
Normal file
File diff suppressed because it is too large
@@ -2,7 +2,7 @@
 
 **Branch:** `feature/software-hashes`
 **Started:** 2026-02-17
-**Status:** In Progress
+**Status:** Complete
 
 ## Plan
 
@@ -10,19 +10,19 @@ Implements [docs/plans/software-hashes.md](software-hashes.md) — a derived `so
 
 ### Tasks
 
-- [ ] Create `data/zxdb/` directory (for JSON snapshot)
-- [ ] Add `software_hashes` Drizzle schema model
-- [ ] Create `bin/update-software-hashes.mjs` — main pipeline script
-- [ ] DB query for tape-image downloads (filetype_id IN 8, 22)
-- [ ] Resolve local zip path via CDN mapping
-- [ ] Extract `_CONTENTS` (skip if exists)
-- [ ] Find tape file (.tap/.tzx/.pzx/.csw) with priority order
-- [ ] Compute MD5, CRC32, size_bytes
-- [ ] Upsert into software_hashes
-- [ ] State file for resume support
-- [ ] JSON export after bulk update (atomic write)
-- [ ] Update `bin/import_mysql.sh` to reimport snapshot on DB wipe
-- [ ] Add pnpm script entries
+- [x] Create `data/zxdb/` directory (for JSON snapshot)
+- [x] Add `software_hashes` Drizzle schema model
+- [x] Create `bin/update-software-hashes.mjs` — main pipeline script
+- [x] DB query for tape-image downloads (filetype_id IN 8, 22)
+- [x] Resolve local zip path via CDN mapping (uses CDN_CACHE env var)
+- [x] Extract `_CONTENTS` (skip if exists)
+- [x] Find tape file (.tap/.tzx/.pzx/.csw) with priority order
+- [x] Compute MD5, CRC32, size_bytes
+- [x] Upsert into software_hashes
+- [x] State file for resume support
+- [x] JSON export after bulk update (atomic write)
+- [x] Update `bin/import_mysql.sh` to reimport snapshot on DB wipe
+- [x] Add pnpm script entries
 
 ## Progress Log
 
@@ -36,18 +36,39 @@ Implements [docs/plans/software-hashes.md](software-hashes.md) — a derived `so
 - data/zxdb/ directory needs creation
 - import_mysql.sh needs software_hashes reimport step
 
+### 2026-02-17T16:04Z
+- Implemented Drizzle schema model for `software_hashes`.
+- Created `bin/update-software-hashes.mjs` pipeline script.
+- Updated `bin/import_mysql.sh` with JSON snapshot reimport.
+- Added `update:hashes` and `export:hashes` pnpm scripts.
+
+### 2026-02-17T16:09Z
+- First full run completed successfully:
+- 33,525 total tape-image downloads in DB
+- 32,305 rows hashed and inserted into software_hashes
+- ~1,220 skipped (missing local zips, `/denied/` prefix, `.p` ZX81 files with no tape content)
+- JSON snapshot exported: 7.2MB, 32,305 rows at `data/zxdb/software_hashes.json`
+- All plan steps verified working.
+
 ## Decisions & Notes
 
 - Target filetype IDs: 8 and 22 (tape image + bugfix tape image).
 - Tape file priority: .tap > .tzx > .pzx > .csw (most common first).
-- CDN_CACHE hard-coded to /Volumes/McFiver/CDN (same as sync-downloads).
-- JSON snapshot at data/zxdb/software_hashes.json.
-- Use Node.js built-in crypto for MD5, crc32 from buffer-based calculation.
+- CDN_CACHE comes from env var (not hard-coded, unlike sync-downloads.mjs).
+- JSON snapshot at data/zxdb/software_hashes.json (7.2MB, committed to repo).
+- Node.js built-in `crypto` for MD5; custom CRC32 lookup table (no external deps).
+- `inner_path` column added (not in original plan) to record which file inside the zip was hashed.
+- `/denied/` and `/nvg/` prefix downloads (~443) are logged and skipped (no local mirror).
+- `.p` files (ZX81 programs) categorized as tape images but contain no .tap/.tzx/.pzx/.csw — logged as "no tape file".
+- Uses system `unzip` for extraction (handles bracket-heavy filenames via `execFile` not shell).
 
 ## Blockers
 
-None currently.
+None.
 
 ## Commits
 
 b361201 - Ready to start adding hashes
+944a2dc - wip: start feature/software-hashes — init progress tracker
+f5ae89e - feat: add software_hashes table schema and reimport pipeline
+edc937a - feat: add update-software-hashes.mjs pipeline script
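
The hashing note in Decisions & Notes (built-in `crypto` for MD5, a custom CRC32 lookup table with no external dependencies) could look roughly like the sketch below. This is not the code from `bin/update-software-hashes.mjs`: the names `crc32` and `hashBuffer`, the standard reflected CRC-32 polynomial, and the lowercase 8-digit hex output are all assumptions made for illustration. It is written as CommonJS so it runs as a plain script; an `.mjs` file would use `import { createHash } from "node:crypto"` instead.

```javascript
// Illustrative sketch only; names and output format are assumptions.
const { createHash } = require("node:crypto");

// Precompute the 256-entry table for the reflected CRC-32 polynomial 0xEDB88320.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC_TABLE[n] = c >>> 0;
}

// Table-driven CRC-32 over a Buffer or Uint8Array; returns an unsigned 32-bit integer.
function crc32(buf) {
  let crc = 0xffffffff;
  for (const byte of buf) {
    crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Compute the three values the tracker lists: MD5, CRC32, and size in bytes.
function hashBuffer(buf) {
  return {
    md5: createHash("md5").update(buf).digest("hex"),
    crc32: crc32(buf).toString(16).padStart(8, "0"),
    sizeBytes: buf.length,
  };
}

// The ASCII string "123456789" is the standard CRC-32 check input.
console.log(hashBuffer(Buffer.from("123456789")).crc32); // prints "cbf43926"
```

Building the table once at load time keeps the per-byte work to one lookup, one shift, and two XORs, which matters little for a single tape image but adds up over tens of thousands of files.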
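
The tape file priority rule from Decisions & Notes (.tap > .tzx > .pzx > .csw) can be sketched as a small selection function. The name `pickTapeFile`, the case-insensitive extension match, and the choice to take the first match within an extension are assumptions; the actual script may break ties differently. A `null` result corresponds to the "no tape file" case the notes describe for `.p` archives.

```javascript
// Illustrative sketch only; pickTapeFile is a hypothetical helper name.
const TAPE_PRIORITY = [".tap", ".tzx", ".pzx", ".csw"];

// Return the highest-priority tape file among the extracted paths, or null if none match.
function pickTapeFile(paths) {
  for (const ext of TAPE_PRIORITY) {
    // Assumption: extensions are matched case-insensitively.
    const match = paths.find((p) => p.toLowerCase().endsWith(ext));
    if (match) return match;
  }
  return null;
}

console.log(pickTapeFile(["side-b.tzx", "Game.TAP", "readme.txt"])); // prints "Game.TAP"
console.log(pickTapeFile(["program.p"])); // prints null
```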