# XJobs Phase 1 Audit — Working Document

**Started:** Saturday May 2, 2026 (Day 13 afternoon)
**Status:** IN PROGRESS — Layer 2 (Frontend) UI containers complete
**Pilot target:** May 26, 2026 — 24 days remaining
**Repo HEAD:** f92df72 (DOC parser fix, deployed Day 13 morning)

---

## How to use this file

This is a live working document for the Phase 1 audit. It will grow as we inventory more layers. Each session appends, never deletes (except to mark items resolved).

Reading order for context:
1. XJobs-Local-Baseline-Reset.docx (the strategic plan that birthed this audit)
2. XJobs-Anatomy-and-Lean-Down-Plan.docx (the destination map — 8 frontend modules)
3. XJobs-Refactoring-Plan-v2.docx (the audit + execution methodology)
4. This file (the actual audit output)

---

## Conceptual map — the 5 layers

This audit organizes findings by architectural layer, mapped against the system architecture diagram (Day 12 snapshot).

| Layer | What | Files | Status |
|---|---|---|---|
| 1 | User (Browser/iOS/Android) | n/a | Not audited (entry point only) |
| 2 | Frontend (Presentation) | app.html, reset-password.html, admin.html | IN PROGRESS |
| 3 | Server (Business Logic) | server/auth.js, api.js, email.js, index.js | TODO |
| 4 | Data Service (Postgres) | tables, schema, queries | TODO |
| 5 | External Services (Vendors) | Anthropic, ElevenLabs, Google, Stripe, Resend, Cloudflare, Railway | TODO |

---

## Classification framework

Within each layer, every "thing" gets exactly one tag:

| Tag | Meaning |
|---|---|
| ALIVE | Currently used, currently working, canonical |
| ZOMBIE | Old version of an alive thing — duplicate that should be deleted |
| DEAD | Referenced by nothing, fires for nothing |
| UNCLEAR | Cannot tell yet — needs investigation |

Every ALIVE thing also gets tagged with its destination Anatomy module.

---

## LAYER 2 — Frontend Inventory

### Section groups (23 logical groups, ~240 UI containers)

#### Group 1 — Header / Navigation (lines 690-711)
Top of every page — user identity + quotas display.

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 690 | navGuest | Header version when logged out | auth-flow.js | UNCLEAR |
| 695 | navUser | Header version when logged in | user-profile.js | UNCLEAR |
| 697 | navUserAvatar + profileDropdown | Avatar + dropdown | user-profile.js | UNCLEAR |
| 699-700 | profileName, profileEmail | User info | user-profile.js | UNCLEAR |
| 704-705 | profileSub, profileSubExpiry | Subscription tier display | user-profile.js | UNCLEAR |
| 706 | profileQuotas | Quota panel | user-profile.js | UNCLEAR |
| 708-711 | quotaResumes/barResumes, quotaCustomize/barCustomize, quotaInterviews/barInterviews, quotaAts/barAts | 4 quota meters with progress bars | user-profile.js | UNCLEAR (see F1) |

#### Group 2 — Onboarding Path 1 (lines 723-879)
Splash, upload zone, AVA chat, multi-step create form.

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 723 | onboarding | Container for first-time visitor flow | app-bootstrap.js | UNCLEAR |
| 725 | audioUnlockOverlay | "Click anywhere to enter" splash | app-bootstrap.js | UNCLEAR |
| 760 | pathUpload | "I have a resume" choice card | resume-upload.js | UNCLEAR |
| 766 | pathCreate | "I want to build a resume" choice card | NEW MODULE? | UNCLEAR |
| 775-777 | uploadZone, uploadDropzone | Drag-drop zone | resume-upload.js | UNCLEAR |
| 788-820 | createFormWrap, createForm, step1-step4, etc. | Multi-step build form | NEW MODULE? | UNCLEAR (see F2, F7) |
| 814-815 | cSkillInput, skillTags, aiSkillsBlock, aiSkillButtons | Skills entry | NEW MODULE? | UNCLEAR |
| 820-826 | aiSummaryBox, aiSummaryText | AI-generated summary | NEW MODULE? | UNCLEAR |
| 829-841 | avaChatWrap, avaChat, avaChatMessages, avaInputArea, avaResumePanel | AVA voice/chat builder | voice.js | UNCLEAR (see F3) |
| 879 | sessionBanner | Session message banner | app-bootstrap.js | UNCLEAR |

#### Group 3 — Auth Modal (lines 891-917)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 891 | authModal | Login/register modal | auth-flow.js | UNCLEAR |
| 904 | registerFields | Sign-up form | auth-flow.js | UNCLEAR |
| 917 | loginFields | Sign-in form | auth-flow.js | UNCLEAR |

#### Group 4 — Modals: Processing/Payment/Builder (lines 931-1056)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 931 | processing | Generic loading screen | app-bootstrap.js | UNCLEAR |
| 936 | paymentModal | Payment / paywall modal | app-bootstrap.js (cross-cutting) | UNCLEAR |
| 979 | paymentAmount | Amount display | app-bootstrap.js | UNCLEAR |
| 990 | aiResumeBuilder | AI Resume Builder modal | NEW MODULE? | UNCLEAR (see F2, F7) |
| 1025-1027 | completenessProgress, completenessPercent | Builder progress | NEW MODULE? | UNCLEAR |
| 1039 | aiAssistantMessage | Builder assistant message | NEW MODULE? | UNCLEAR |
| 1048 | builderQuestionArea | Question display | NEW MODULE? | UNCLEAR |
| 1055 | resumePreviewContent | Live preview | NEW MODULE? | UNCLEAR |

#### Group 5 — Dashboard (lines 1064-1431)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 1064 | dashboard | Whole dashboard container | dashboard.js | UNCLEAR |
| 1073 | dashUsageBar | Top usage bar (DUPLICATE of header quotas?) | dashboard.js | ZOMBIE? (see F1) |
| 1076 | dashAvatar | Dashboard avatar | dashboard.js | ZOMBIE? (see F1) |
| 1078-1079 | dashUserName, dashUserEmail | Dashboard user info | dashboard.js | ZOMBIE? (see F1) |
| 1084-1087 | dashQResumes, dashQCustomize, dashQInterviews, dashQAts | Quota counters (duplicate of header) | dashboard.js | ZOMBIE? (see F1) |
| 1102-1117 | atsScore, atsStatusLabel, atsStatusText, atsChecks, atsRecommendations, atsOptimizeButton | ATS score widget | dashboard.js | UNCLEAR |
| 1136 | rerunNotice | Rerun analysis notice | dashboard.js | UNCLEAR |
| 1205-1207 | skillsPanel, profileSkills, skillMatchSummary | Skills panel | dashboard.js | UNCLEAR |
| 1218-1219 | profileToggle, profileExpanded, profileSummary, profileWorkHistory, workHistoryList | Profile expand/collapse | user-profile.js | UNCLEAR |
| 1229 | searchCriteriaBar | Search filter chips | matching-engine.js | UNCLEAR |
| 1247-1254 | btnEditSearchProfile, searchProfilePanel | Search profile editor | matching-engine.js | UNCLEAR |
| 1359-1364 | btnToggleWeightSliders, inlineWeightPanel | Matching weight sliders | matching-engine.js | UNCLEAR |
| 1408-1422 | above80Header, above80Jobs, below80Header, below80Jobs | Job match groupings | dashboard.js | UNCLEAR |
| 1431 | applicationTracker | Resume tracker | dashboard.js | UNCLEAR |

#### Group 6 — Customization Modal (lines 1438-1451)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 1438 | customizationModal | Resume customization | dashboard.js | UNCLEAR |
| 1450-1451 | customizationGaps, customizationActions | Gap analysis + actions | dashboard.js | UNCLEAR |

#### Group 7 — Interview Modal (lines 1458-1586)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 1458 | interviewModal | Container | NEW MODULE (interview.js) | UNCLEAR |
| 1470-1493 | icPayment, icBundleCards, icMain | Payment + bundle picker + main UI | NEW MODULE | UNCLEAR |
| 1530-1540 | icTopic, icQuestions | Topic-based questions | NEW MODULE | UNCLEAR |
| 1545-1573 | icScoreScreen, icScoreCircle, icScoreNumber, icScoreVerdict, icScoreFeedback, icGapScores, icQARecap, icQARecapContent, icPracticeAgain, icUnlockBlock, icUnlockMsg | Post-interview score | NEW MODULE | UNCLEAR |
| 1583-1586 | icCheatsheet, icSheet | Cheatsheet output | NEW MODULE | UNCLEAR |

#### Group 8 — Voice Indicator (line 1595)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 1595 | voiceIndicator | "AI Agent Speaking..." floating | voice.js | UNCLEAR |

#### Group 9 — Email Viewer Modal (lines 1597-1612)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 1597 | emailViewerModal | Email content display | gmail-flow.js | UNCLEAR |
| 1604-1612 | emailViewerFrom, emailViewerDate, emailViewerBody | Email metadata + body | gmail-flow.js | UNCLEAR |

#### Group 10 — Gmail Settings Modal (lines 1621-1749)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 1621 | gmailSettingsModal | Container | gmail-flow.js | UNCLEAR |
| 1626 | gmailStep1 | Step 1 | gmail-flow.js | UNCLEAR |
| 1644 | gmailStep2 | Step 2 | gmail-flow.js | UNCLEAR |
| 1674-1678 | gmailStep3, boardGrid | Step 3 with board grid | gmail-flow.js | UNCLEAR |
| 1724 | gmailStep4 | Step 4 | gmail-flow.js | UNCLEAR |
| 1736 | gmailStep6 | Step 6 (out of order!) | gmail-flow.js | UNCLEAR (see F4) |
| 1747-1749 | gmailStep5, scanResults | Step 5 + scan results | gmail-flow.js | UNCLEAR (see F4) |

#### Group 11 — Template-based Resume Builder Path 2 (lines 3439-4805)
LIKELY ZOMBIE OF Group 2/Group 4 — competing resume-builder paradigm.

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 3439 | resumeOnboarding | Resume onboarding (separate from line 723) | NEW MODULE? | ZOMBIE? (see F6) |
| 3526 | templateModal | Template picker | NEW MODULE? | ZOMBIE? (see F7) |
| 3679-3695 | contentTemplateScreen, roleAutocomplete, rolesList | Role-based templates | NEW MODULE? | ZOMBIE? (see F7) |
| 3799 | subcategoriesGrid | Subcategory grid | NEW MODULE? | ZOMBIE? (see F7) |
| 3880 | formatTemplateScreen | Format/style picker | NEW MODULE? | ZOMBIE? (see F7) |
| 4286 | templatePreview | Template preview | NEW MODULE? | ZOMBIE? (see F7) |
| 4334 | jobEditModal | Job entry editing | NEW MODULE? | ZOMBIE? (see F7) |
| 4593 | skillsModal | Skills picker (different from cSkillInput) | NEW MODULE? | ZOMBIE? (see F7) |
| 4667 | templatesModal | Templates modal | NEW MODULE? | ZOMBIE? (see F5) |
| 4735 | contactModal | Contact info modal | NEW MODULE? | ZOMBIE? (see F7) |
| 4805 | templatesModal | DUPLICATE ID — invalid HTML | (none) | ZOMBIE CONFIRMED (see F5) |

#### Group 12 — Building Progress (lines 5097-5916)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 5097 | experience20YearsQuestion | "20 years experience" specific question | NEW MODULE? | UNCLEAR |
| 5895 | buildingProgress | Progress container | NEW MODULE? | UNCLEAR |
| 5896-5916 | section1, section2, section3, section4, section5 | Generic section names | NEW MODULE? | UNCLEAR (see F8) |

#### Group 13 — AVA Live Resume + Dynamic UI (lines 6000-8491)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 6000 | atsWarningPanel | ATS warning UI (in JS string) | dashboard.js | UNCLEAR |
| 6650 | avaRoadmapSteps | AVA roadmap visualization | voice.js | UNCLEAR |
| 6720-6730 | avaSections, avaSectionCount | AVA section toggles | voice.js | UNCLEAR |
| 6815-6862 | avaLR_name, avaLR_title, avaLR_contact, avaLR_summarySection, avaLR_summaryHead, avaLR_summary, avaLR_compSection, avaLR_compHead, avaLR_competencies, avaLR_skillsSection, avaLR_skillsHead, avaLR_skills, avaLR_expSection, avaLR_expHead, avaLR_jobs, avaLR_eduSection, avaLR_eduHead, avaLR_education, avaLR_certSection, avaLR_certHead, avaLR_certifications | AVA Live Resume preview (dynamic) | voice.js | UNCLEAR |
| 7143 | avaInputError | AVA chat error | voice.js | UNCLEAR |
| 7264 | (template card data-id) | AVA template selection (dynamic) | voice.js | UNCLEAR |
| 8043 | avaCompanyGrid | Company picker | voice.js | UNCLEAR |
| 8205 | avaDateError | Date validation | voice.js | UNCLEAR |
| 8408-8415 | avaCompGrid, avaCompCount | Competency picker | voice.js | UNCLEAR |
| 8484-8491 | avaSkillGrid, avaSkillCount | Skills picker | voice.js | UNCLEAR |

#### Group 14 — Third Quota Display (line 9078)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 9078 | userBarUsage | "Runs left" usage bar — THIRD quota display | ??? | ZOMBIE? (see F1) |

#### Group 15 — Profile Details (line 9340)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 9340 | profileDetails | Profile expansion area | user-profile.js | UNCLEAR (possible duplicate of profileExpanded) |

#### Group 16 — Job Source Selector + ATS Panels (lines 9443-9510)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 9443 | jobSourceSelector | Job source selector (FIRST) | matching-engine.js | UNCLEAR (see F9) |
| 9461 | atsDetailsPanelHigh | ATS details for high-match | dashboard.js | UNCLEAR (see F10) |
| 9476 | jobSourceSelector | DUPLICATE ID | (none) | ZOMBIE CONFIRMED (see F9) |
| 9501 | atsDetailsPanel | ATS details (different) | dashboard.js | UNCLEAR (see F10) |
| 9510 | jssAtsChecks, jssAtsRecommendations | ATS checks (duplicate of lines 1115-1116) | dashboard.js | ZOMBIE? (see F10) |

#### Group 17 — Quick Match / JD Paste (lines 9592-9911)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 9592 | jdPasteArea | JD paste area | matching-engine.js | UNCLEAR |
| 9597-9604 | jdUploadArea, jdDropzone, jdFileName | JD upload alternative | matching-engine.js | UNCLEAR |
| 9860-9869 | moreGapsWrap, hiddenGaps, quickMatchResult | Quick Match results | matching-engine.js | UNCLEAR |
| 9911 | qmMap | Quick Match map (Leaflet?) | matching-engine.js | UNCLEAR |

#### Group 18 — Password Recovery Modal (lines 10145-10155)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 10145-10155 | pwRecoveryModal, pwRecoveryForm, pwRecoveryError, pwRecoverySuccess | Password recovery (Day 13 commit 97bac2f) | auth-flow.js | ALIVE (just shipped, verified) |

#### Group 19 — Gmail Scan Wait (line 10838)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 10838 | gmailScanWait | "Scanning Gmail for jobs..." | gmail-flow.js | ALIVE (this is the modal stuck during walkthrough) |

#### Group 20 — Tracker / Pricing / Chat / Paywall (lines 11755-13417)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 11755 | qmTracker | Quick Match tracker | matching-engine.js | UNCLEAR |
| 11905 | interviewPriceTag | Interview price | NEW MODULE (interview.js) | UNCLEAR |
| 12296-12298 | chatMessages, chatRecordingStatus | Interview chat UI | NEW MODULE (interview.js) | UNCLEAR |
| 12892-12909 | paywallChoice, paywallPayOnce, paywallUnlimited, stripeCheckoutMount | Paywall UI | app-bootstrap.js | UNCLEAR |
| 12969 | regWallError | Registration wall error (FIRST) | auth-flow.js | UNCLEAR (see F11) |
| 13114 | regWallError | DUPLICATE ID | (none) | ZOMBIE CONFIRMED (see F11) |
| 13408-13417 | paywallBackWrap, paywallBack, paywallBack2 | Paywall back buttons | app-bootstrap.js | UNCLEAR (see F12) |

#### Group 21 — Pricing Section (lines 13759-13790)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 13759 | pricingSection | Pricing UI | app-bootstrap.js or own page | UNCLEAR |
| 13790 | proPriceDisplay | Pro price ($19.99/month) | app-bootstrap.js | UNCLEAR |

#### Group 22 — Splash Progress (lines 14914-14959)
EXACT DUPLICATE BLOCK (7 lines copy-pasted)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 14914-14920 | splashProgress, splashSteps, splashS1-splashS5 | Splash progress (FIRST copy) | gmail-flow.js | UNCLEAR (see F13) |
| 14953-14959 | splashProgress, splashSteps, splashS1-splashS5 | EXACT DUPLICATE | (none) | ZOMBIE CONFIRMED (see F13) |

#### Group 23 — Profile Wizard (lines 15009-15039)

| Line | ID | What it is | Provisional target | Tag |
|---|---|---|---|---|
| 15009 | profileWizard | Wizard overlay | user-profile.js | UNCLEAR |
| 15015-15017 | wDot0, wDot1, wDot2 | Wizard step dots | user-profile.js | UNCLEAR |
| 15021-15039 | wStep0, wStep1, wStep2, wizLocBox, wizLocText, wizLocSub | Wizard steps + location detection | user-profile.js | UNCLEAR |

---

## FINDINGS — 13 confirmed/suspected issues

### F1 — Quota duplication (THREE instances)
- Severity: CRITICAL
- Confidence: HIGH (3 separate code blocks render same data)
- Locations:
  - profileQuotas / quotaResumes / barResumes (lines 706-711) — header
  - dashUsageBar / dashQResumes / dashQCustomize / dashQInterviews / dashQAts (lines 1073-1087) — dashboard
  - userBarUsage (line 9078) — third instance
- Action: Identify canonical, mark others as zombies. ~30 min after we see which one actually fires.

### F2 — Two resume-builder UIs
- Severity: MEDIUM
- Confidence: MEDIUM
- Locations:
  - createForm with step1-step4 (lines 791-820)
  - aiResumeBuilder (line 990) with completenessProgress, builderQuestionArea, etc.
- Action: Determine which fires when user clicks "I want to build a resume."

### F3 — AVA chat path placement
- Severity: UNCLEAR
- Confidence: UNCLEAR
- Locations: avaChatWrap (line 829) + extensive AVA UI throughout file
- Action: Determine if AVA is the canonical resume-builder, replacing F2's competitors.

### F4 — gmailStep5/6 out of order
- Severity: MEDIUM
- Confidence: MEDIUM
- Locations: gmailStep6 at line 1736 appears BEFORE gmailStep5 at line 1747
- Action: Investigate flow order. May be intentional (display:none triggered programmatically) or a sloppy edit.

### F5 — Duplicate ID `templatesModal` (CONFIRMED ZOMBIE)
- Severity: HIGH
- Confidence: CONFIRMED (invalid HTML, JS getElementById only finds first)
- Locations:
  - Line 4667 (first instance)
  - Line 4805 (DUPLICATE — guaranteed zombie)
- Action: Verify which is canonical, delete the other.
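
The duplicate-ID findings (F5, F9, F11, F13) can be re-checked mechanically rather than by eye. A minimal sketch, assuming app.html has been read into a string; `findDuplicateIds` is a hypothetical helper, not something in the repo. It is regex-based, so it only sees literal `id="..."` attributes and will also pick up markup embedded in JS strings.

```javascript
// Hypothetical helper: list every id value that appears more than once,
// with the 1-based line numbers where it occurs.
function findDuplicateIds(html) {
  const seen = new Map(); // id -> array of line numbers
  html.split("\n").forEach((line, index) => {
    // Require whitespace (or line start) before "id" so data-id="..." is skipped.
    const pattern = /(?:^|\s)id\s*=\s*["']([^"']+)["']/g;
    let match;
    while ((match = pattern.exec(line)) !== null) {
      const lines = seen.get(match[1]) || [];
      lines.push(index + 1);
      seen.set(match[1], lines);
    }
  });
  return [...seen.entries()]
    .filter(([, lines]) => lines.length > 1)
    .map(([id, lines]) => ({ id, lines }));
}
```

Run against app.html, the output should at least reproduce the `templatesModal`, `jobSourceSelector`, `regWallError`, and splash-progress pairs listed in this document. Anything beyond those is a new finding.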

### F6 — Two onboarding flows
- Severity: HIGH
- Confidence: HIGH
- Locations:
  - onboarding (line 723) — main app onboarding
  - resumeOnboarding (line 3439) — separate, different layout
- Action: Determine which is canonical for current app. Other is zombie.

### F7 — Two complete resume-builder paradigms coexisting
- Severity: HIGH (BIGGEST FINDING)
- Confidence: HIGH
- Pattern:
  - Path 1: onboarding -> pathCreate -> createForm (step1-step4) -> aiResumeBuilder -> avaChat
  - Path 2: resumeOnboarding -> templateModal -> contentTemplateScreen -> formatTemplateScreen -> templatePreview
- Action: Major investigation in next session. Identify canonical path, classify entire other path as zombie. Likely the largest single bloat source in app.html.

### F8 — Generic section IDs
- Severity: LOW (smell, not bug)
- Confidence: LOW
- Locations: section1, section2, section3, section4, section5 at lines 5896-5916
- Action: Rename for clarity OR delete if zombie. Low priority.

### F9 — Duplicate ID `jobSourceSelector` (CONFIRMED ZOMBIE)
- Severity: HIGH
- Confidence: CONFIRMED
- Locations:
  - Line 9443 (first instance)
  - Line 9476 (DUPLICATE — guaranteed zombie)
- Action: Verify which is canonical, delete the other.

### F10 — Multiple ATS result panels
- Severity: HIGH
- Confidence: HIGH
- Locations:
  - atsScore + atsChecks + atsRecommendations (lines 1102-1116) — main widget
  - atsDetailsPanelHigh (line 9461) — separate panel
  - atsDetailsPanel (line 9501) — yet another
  - jssAtsChecks + jssAtsRecommendations (line 9510) — duplicates
- Action: Identify canonical. Likely 2-3 are zombies.

### F11 — Duplicate ID `regWallError` (CONFIRMED ZOMBIE)
- Severity: HIGH
- Confidence: CONFIRMED
- Locations:
  - Line 12969 (first instance)
  - Line 13114 (DUPLICATE — guaranteed zombie)
- Action: Verify which is canonical, delete the other.

### F12 — Versioned IDs (paywallBack + paywallBack2)
- Severity: MEDIUM
- Confidence: MEDIUM
- Locations: Lines 13408-13417
- Action: Investigate why "2" exists. Either delete original (if 2 is the new canonical) or delete 2 (if it's leftover).

### F13 — EXACT 7-line code duplicate (splash progress)
- Severity: CRITICAL
- Confidence: CONFIRMED (exact byte-for-byte copy)
- Locations:
  - Lines 14914-14920 (first copy)
  - Lines 14953-14959 (EXACT DUPLICATE)
- Action: Delete one of them. JS only addresses the first by ID — second is unreachable.

---

## Summary statistics

| Metric | Value |
|---|---|
| Total UI containers grepped | ~240 |
| Logical section groups identified | 23 |
| Confirmed zombies (duplicate IDs) | 4 (F5, F9, F11, F13) |
| Suspected major zombies (competing implementations) | 3 patterns (F1, F7, F10) |
| Items still UNCLEAR (need investigation) | ~230 |
| Items confirmed ALIVE | 2 (Password Recovery, Gmail Scan Wait) |

Net read: Founder hypothesis confirmed. Significant deprecated code mixed in. At minimum 4 confirmed zombies + 3 suspected major zombie patterns. Full classification pending.

---

## TODO — what's still left in the audit

### Layer 2 (Frontend) — partial
- [ ] Modals by class (`class="modal"` and `class="modal-overlay"`) — may catch ones without IDs
- [ ] Forms / inputs — `<input>`, `<textarea>`, `<select>` elements
- [ ] JavaScript function definitions — `function name(){...}` declarations (the PROCEDURE DIVISION)
- [ ] Event handlers — `onclick=`, `onchange=`, `onsubmit=`, `addEventListener(`
- [ ] Global state variables — `var/let/const` at top of `<script>` blocks
- [ ] fetch() calls — the bridge from frontend to Layer 3 (server)
- [ ] localStorage usage — identity-related reads/writes (per Phase 2 plan to migrate to cookies)
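
Before itemizing the remaining Layer 2 categories, a rough count gives a sense of scale. A sketch only: `inventoryFrontend` is a hypothetical helper, the patterns mirror the TODO items above, and regex counts will include matches inside comments and strings.

```javascript
// Hypothetical inventory pass: count occurrences of the patterns in the TODO list.
function inventoryFrontend(source) {
  const patterns = {
    inlineHandlers: /\bon(?:click|change|submit)\s*=/g,
    addEventListener: /\baddEventListener\s*\(/g,
    fetchCalls: /\bfetch\s*\(/g,
    localStorage: /\blocalStorage\b/g,
    functionDeclarations: /\bfunction\s+[A-Za-z_$][\w$]*\s*\(/g,
  };
  const counts = {};
  for (const [name, pattern] of Object.entries(patterns)) {
    counts[name] = (source.match(pattern) || []).length;
  }
  return counts;
}
```

The counts are for sizing the work, not for classification. Each hit still needs an ALIVE / ZOMBIE / DEAD / UNCLEAR tag.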

### Layer 3 (Server) — not started
- [ ] API routes — every `app.get/post/put/delete(...)` in server/auth.js, api.js, email.js
- [ ] Route handlers — function bodies for each route
- [ ] Business logic functions — scoreJob, parseDocFile, runMatching, etc.
- [ ] Middleware — JWT, rate limiting, auth gates
- [ ] External service wrappers — sendEmail, scoreWithClaude, etc.

### Layer 4 (Data) — not started
- [ ] Tables — list every table in Postgres
- [ ] Schema state — columns per table, NOT NULL constraints, FKs
- [ ] Queries used — every `pool.query(...)` across server code
- [ ] Read patterns — which endpoints read which tables
- [ ] Write patterns — which endpoints write which tables

### Layer 5 (Vendors) — not started
- [ ] Integration points — where each vendor is called
- [ ] API keys — which env vars hold credentials
- [ ] Failure modes — what happens when vendor is down
- [ ] Cost-risk tier — per-use vendors flagged

---

## Walkthrough symptoms (Day 13 morning) — for cross-reference

For full context see XJobs-Local-Baseline-Reset.docx. Summary:

| # | Symptom | Likely root cause |
|---|---|---|
| 1 | Splash overlay leaky | F1+F7 — eager rendering |
| 2 | 48 form-field a11y issues | Single sweep fix — hygiene |
| 3 | "Test User" auto-login in incognito | Identity model — Phase 2 |
| 4 | ElevenLabs TTS fails | Vendor/config issue |
| 5 | ATS=95 but Skills=0 | Display/data mismatch |
| 6 | "11L OFF" header badge unclear | Possible dev-leak |
| 7 | AI Agent floating button | Same as #4 |
| 8 | 8 console errors after upload | Need to capture |
| 9 | Quota meter wrong | F1 + Phase 2 identity |
| 10 | Google sign-in form vs picker | OAuth URL missing prompt=select_account |
| 11 | "I Have Gmail" UX expectation | Same as #10 |
| 12 | scoreJob() null crash | REGRESSION from April 27 sprint |
| 13 | "Bezerk page" reinterpreted | Was Issue #12 manifesting |
| 14 | No timeout on stuck scan | Missing error handling |

---

## Next session opener

Paste this to next agent:

> "Continuing the Phase 1 audit. Read in order: XJobs-Local-Baseline-Reset (the strategic plan), XJobs-Anatomy-and-Lean-Down-Plan (the destination), XJobs-Refactoring-Plan-v2 (the methodology), this file (PHASE-1-AUDIT-WORKING.md). State check: git status, git log --oneline main -3 -> expect HEAD at f92df72. Then continue from the TODO list in this file. We've finished Layer 2 UI containers. Next: function definitions and event handlers, then Layer 3 (server), then Layers 4-5."

---

End of working audit document. Append, don't overwrite, on subsequent sessions.

---

## Day 15 PM — Layer 3 server audit (May 4, 2026)

**Layer 2 closed at 100% (morning session). Layer 3 advanced from 0% → 70%.**

Files audited: api.js (endpoints + helpers + 29 DB queries), auth.js, auth-middleware.js, admin.js, index.js, stripe.js.

**44 new findings (F73-F116). The audit now totals 26 critical security blockers. Phase 2 fix scope is locked at ~7 hours of disciplined wiring on Days 17-18.**

Headline findings:
- F73: /tts POST handler defined twice in api.js (silent override pattern)
- F74: 14+ api.js routes have NO auth check (admin/subscription/AVA/TTS endpoints unprotected)
- F82: Two JWT_SECRET declarations with different fallback values across auth.js and auth-middleware.js
- F83: Two JWT signing functions with different payloads — login JWTs lack role
- F85: All 5 admin.js routes expose user PII + financial data with NO auth (GDPR/CCPA exposure)
- F91: Dual route mounting (/auth + /api/auth, /admin + /api/admin) — every endpoint has TWO URLs
- F92: Rate limiter bypassable via /api/auth/* path
- F93: NO global auth middleware exists — confirms F74/F85 architecturally
- F98: /stripe/activate-subscription has NO auth + accepts ANY email = free Pro for anyone
- F99: paidSessions Set in-memory only — lost on every restart
- F100: Webhook DB updates loosely coupled (email lookup, no audit trail, silent errors)
- F102: Server trusts client-supplied amount (pay $0.50 for $19.99 Pro)
- F111, F112: Multi-step DB writes not transactional (subscription reset, activate-subscription)
- F94: XSS in /payment-success (query params injected into HTML/JS)
- F96: CORS allows credentials from ANY origin (CSRF wide open after Phase 2 cookie migration)
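
For F94, the usual remedy is to escape untrusted values before they reach the page. A minimal sketch, assuming the page is built with template strings; `renderPaymentSuccess` and its `plan` parameter are hypothetical stand-ins for the real handler. HTML escaping covers HTML text and attribute contexts only. Values injected into inline `<script>` need separate handling.

```javascript
// Escape the five HTML-significant characters before interpolating untrusted text.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical page builder; `plan` stands in for any query parameter.
function renderPaymentSuccess(plan) {
  return `<p>Thanks! Your plan: ${escapeHtml(plan)}</p>`;
}
```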

**Strategic insight:** auth-middleware.js (Mar 29, 63 lines) has all 5 needed middleware exports (authenticateToken, requireAdmin, optionalAuth, generateToken, checkFeatureAccess). They just aren't wired in. **Phase 2 is a wiring exercise, not a rewrite.**
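
What "wiring" means in practice, as a sketch. The two guards below are hypothetical stand-ins with the Express `(req, res, next)` shape. The real `authenticateToken` and `requireAdmin` in auth-middleware.js may differ, and `verify` here replaces real JWT verification.

```javascript
// Hypothetical stand-in: reject requests without a valid bearer token.
function authenticateToken(verify) {
  return (req, res, next) => {
    const header = req.headers.authorization || "";
    const token = header.startsWith("Bearer ") ? header.slice(7) : null;
    const user = token ? verify(token) : null; // verify returns a payload or null
    if (!user) return res.status(401).json({ error: "Unauthorized" });
    req.user = user;
    next();
  };
}

// Hypothetical stand-in: allow only payloads that carry role "admin".
function requireAdmin(req, res, next) {
  if (!req.user || req.user.role !== "admin") {
    return res.status(403).json({ error: "Forbidden" });
  }
  next();
}

// Run guards in order before the handler, the way a router-level chain would.
function runChain(guards, req, res, handler) {
  let i = 0;
  const next = () => (i < guards.length ? guards[i++](req, res, next) : handler(req, res));
  next();
}
```

In Express the equivalent is a single `router.use(...)` line at the top of admin.js. Note the dependency on F83: a role check only works once login JWTs actually carry `role`.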

Phase 2 mandatory work estimate: ~7 hours. Day 17-18 timeline holds.

**Maria's first refactor — completed (May 4 PM).** Cache subsystem extracted from api.js into server/cache.js. 6 functions migrated, state moved, cleanup timer relocated, server restarts clean with [DiskCache] Loaded successfully on boot. Pattern validated: extract + import + delete + verify. Phase 4 modularization template established.


---

## Day 16 AM — Layer 3 server audit, subscription.js (May 5, 2026)

**File:** server/subscription.js (171 lines)
**Findings:** 13 (F117-F129). 2 CRITICAL, 5 HIGH, 5 MEDIUM, 1 LOW.

### F117 — Two competing subscription activation paths (F73 disease, cross-handler)
- Severity: CRITICAL
- Confidence: CONFIRMED
- Locations:
  - Webhook handler `checkout.session.completed` subscription branch (subscription.js ~line 110)
  - /verify endpoint (subscription.js ~line 155, minified block)
- Pattern: Both paths fire on a successful subscription checkout. Both UPDATE users + INSERT payments. They write DIFFERENT values for `subscription_plan`, `feature`, `amount_cents`, and `cycles_remaining`. Race-condition outcome: whichever query commits last wins.
- Action: Pick canonical path (webhook is authoritative — Stripe-driven, idempotent design). Gut the DB writes from /verify; keep /verify as a read-only "did the webhook process yet?" check that returns subscription state from the DB. Phase 2 fix.

### F118 — subscription_plan column is non-deterministic ('monthly_unlimited' vs 'pro_monthly')
- Severity: CRITICAL
- Confidence: CONFIRMED
- Locations:
  - Webhook writes `subscription_plan = 'monthly_unlimited'` and `feature = 'monthly_unlimited'`
  - /verify writes `subscription_plan = 'pro_monthly'` and `feature = 'pro_monthly'`
- Action: Audit downstream queries that filter on `subscription_plan` — any `WHERE subscription_plan = 'X'` will silently miss half the user base. Pick one canonical name post-F117, run a one-time migration UPDATE to normalize existing rows. Block any Phase 2 work that depends on plan name until normalized.
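
A sketch of the normalization, for sizing only. The canonical name has NOT been decided: `'pro_monthly'` is picked below purely for illustration, and `normalizePlan` / `buildPlanMigration` are hypothetical helpers. The statement uses node-postgres positional parameters.

```javascript
// Assumption for illustration only: 'pro_monthly' is treated as canonical.
const CANONICAL_PLAN = "pro_monthly";
const PLAN_ALIASES = new Set(["monthly_unlimited", "pro_monthly"]);

// Map either legacy value to the canonical one; leave other values untouched.
function normalizePlan(plan) {
  return PLAN_ALIASES.has(plan) ? CANONICAL_PLAN : plan;
}

// One-time migration statement for the users table.
function buildPlanMigration() {
  return {
    text: "UPDATE users SET subscription_plan = $1 WHERE subscription_plan = $2",
    values: [CANONICAL_PLAN, "monthly_unlimited"],
  };
}
```

The `feature` value written alongside `subscription_plan` has the same split and would need the same pass. Until the migration runs, any read path can call `normalizePlan` as a stopgap.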

### F119 — Hardcoded feature pricing in /feature-checkout (F102 cousin)
- Severity: HIGH
- Confidence: CONFIRMED
- Locations: subscription.js ~lines 50-56 (`pricing` object inside handler)
- Pattern: Subscription pricing pulled from `./pricing.getSubscriptionConfig()` in /verify. Feature pricing hardcoded inline. Two sources of truth, one of them wrong. Admin UI's price-adjustment flow won't reach the inline values.
- Action: Move feature pricing into `./pricing` module alongside subscription config. Phase 2 wiring task.
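
The shape of the fix for both F119 and F102 is the same: the server looks the price up and never reads an amount from the request. A sketch with placeholder data. The feature names and cent values below are invented for illustration and are not XJobs prices, and `resolveFeaturePriceCents` is a hypothetical function for the `./pricing` module.

```javascript
// Hypothetical price table; names and amounts are placeholders.
const FEATURE_PRICES_CENTS = Object.freeze({
  interview_single: 499,
  resume_customize: 299,
});

// Resolve the charge from the server-side table only.
function resolveFeaturePriceCents(feature) {
  if (!Object.prototype.hasOwnProperty.call(FEATURE_PRICES_CENTS, feature)) {
    throw new Error(`Unknown feature: ${feature}`);
  }
  return FEATURE_PRICES_CENTS[feature];
}

// Build the checkout parameters; body.amount is deliberately never read.
function buildCheckoutRequest(body) {
  return { feature: body.feature, amountCents: resolveFeaturePriceCents(body.feature) };
}
```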

### F120 — Webhook subscription activation not transactional (F111/F112 family)
- Severity: HIGH
- Confidence: CONFIRMED
- Location: Webhook `checkout.session.completed` subscription branch — UPDATE users then INSERT payments as separate queries
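- Action: Run both statements in one transaction: acquire a single client with `pool.connect()`, then BEGIN, both writes, COMMIT (ROLLBACK on error). `pool.query()` alone cannot do this because each call may run on a different connection. Without the transaction, a failed INSERT leaves the user "active" with no payment audit row. Same disease as F111/F112.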
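
A sketch of the transactional version, assuming node-postgres. `pool` is the pg.Pool from db.js (or anything with the same `connect` / `query` / `release` shape). The column names `subscription_status`, `user_id`, and `id` are assumptions, so check them against the real schema in Layer 4.

```javascript
// Sketch: both writes run on ONE client so BEGIN/COMMIT cover them.
async function activateSubscription(pool, { userId, plan, sessionId, amountCents }) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    await client.query(
      "UPDATE users SET subscription_plan = $1, subscription_status = 'active' WHERE id = $2",
      [plan, userId]
    );
    await client.query(
      "INSERT INTO payments (user_id, stripe_session_id, amount_cents) VALUES ($1, $2, $3)",
      [userId, sessionId, amountCents]
    );
    await client.query("COMMIT");
  } catch (err) {
    // Undo the UPDATE if the INSERT (or anything else) failed.
    await client.query("ROLLBACK");
    throw err;
  } finally {
    // Always hand the connection back to the pool.
    client.release();
  }
}
```

The same wrapper shape applies to the F111/F112 call sites.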

### F121 — Webhook errors silently swallowed (F100 family)
- Severity: HIGH
- Confidence: CONFIRMED
- Location: Webhook handler outer try/catch — `console.error` then `res.json({received: true})`
- Pattern: Stripe gets 200 OK regardless of DB success. No retry. No audit trail. Payment processed in Stripe, no record locally.
- Action: On DB error, return 500 so Stripe retries. Add structured logging to a `webhook_events` table for forensics. Phase 2.
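
A sketch of the outer handler with the acknowledgement moved behind the DB work. `processEvent` is a hypothetical function that performs the writes. Stripe signature verification and the `webhook_events` logging are omitted here and still required.

```javascript
// Acknowledge only after the DB work succeeds.
function makeWebhookHandler(processEvent) {
  return async (req, res) => {
    try {
      await processEvent(req.body);
      res.status(200).json({ received: true });
    } catch (err) {
      console.error("[webhook] processing failed:", err.message);
      // A non-2xx response marks the delivery as failed, so Stripe retries it.
      res.status(500).json({ received: false });
    }
  };
}
```

Returning 500 makes redelivery of the same event more likely, which raises the priority of the idempotency gap in F124.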

### F122 — Open redirect via req.headers.origin in success_url / cancel_url
- Severity: HIGH
- Confidence: HIGH
- Locations: subscription.js /checkout (line ~36) and /feature-checkout (line ~75) — both use `req.headers.origin || 'http://localhost:4000'`
- Pattern: Origin header is client-controlled. An attacker setting `Origin: https://evil.com` on a checkout request gets Stripe to redirect their victim to evil.com after successful payment.
- Action: Replace with server-side allowlist (`['https://xjobsfinder.com', 'http://localhost:4000']`) and reject mismatches. Phase 2 security pass.
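
A sketch of the allowlist check. The two origins come from the action item above. The `/payment-cancelled` path is a placeholder, and `buildRedirectUrls` is a hypothetical helper.

```javascript
// Allowlist from the action item; everything else is rejected.
const ALLOWED_ORIGINS = new Set(["https://xjobsfinder.com", "http://localhost:4000"]);

// Build Stripe redirect URLs from a trusted origin only.
function buildRedirectUrls(originHeader) {
  if (!ALLOWED_ORIGINS.has(originHeader)) {
    throw new Error("Origin not allowed");
  }
  return {
    success_url: `${originHeader}/payment-success`,
    cancel_url: `${originHeader}/payment-cancelled`, // placeholder path
  };
}
```

A missing Origin header is rejected here rather than silently falling back to localhost. Falling back to the production origin instead is a reasonable alternative if some legitimate clients omit the header.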

### F123 — customer.subscription.updated marks status='canceled' on cancel_at_period_end
- Severity: HIGH
- Confidence: HIGH
- Location: Webhook `customer.subscription.updated` handler (line ~125, nested ternary)
- Pattern: User cancels mid-cycle. Stripe sets `cancel_at_period_end=true` but `status='active'` until period ends. Our code immediately writes 'canceled'. User loses Pro access prematurely while still paid through end of period.
- Action: Map `cancel_at_period_end=true` → keep status 'active', set a separate flag (e.g. `subscription_cancel_pending`). Honor `subscription_ends_at` for actual access cutoff. Phase 2.
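
The mapping, as a sketch. `status` and `cancel_at_period_end` are fields on the Stripe subscription object. `subscription_status` is an assumed local column name, and `subscription_cancel_pending` is the flag proposed above, which does not exist yet.

```javascript
// Map a Stripe subscription object to local columns without cutting access early.
function mapSubscriptionUpdate(sub) {
  return {
    // Mirror Stripe's status; it stays 'active' until the paid period actually ends.
    subscription_status: sub.status,
    // Record the pending cancellation separately instead of overwriting the status.
    subscription_cancel_pending: sub.status === "active" && sub.cancel_at_period_end === true,
  };
}
```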

### F124 — No idempotency on webhook (duplicate payments rows on Stripe retry)
- Severity: MEDIUM
- Confidence: HIGH
- Location: Webhook subscription branch INSERT into payments
- Pattern: Stripe delivers events at-least-once. UPDATE on users is naturally idempotent; INSERT on payments is not. Retry creates duplicate payment rows.
- Action: Add `ON CONFLICT (stripe_session_id) DO NOTHING` (requires unique constraint on stripe_session_id) or check-then-insert. Phase 2.
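
A sketch of the idempotent insert. The SQL string shows the Postgres shape; the column list is an assumption. The in-memory store is a hypothetical stand-in that mimics the unique constraint, just to show what a Stripe retry does.

```javascript
// Requires a UNIQUE constraint on payments.stripe_session_id.
const INSERT_PAYMENT_SQL =
  "INSERT INTO payments (user_id, stripe_session_id, amount_cents) " +
  "VALUES ($1, $2, $3) ON CONFLICT (stripe_session_id) DO NOTHING";

// In-memory stand-in for the table: a second insert with the same session is a no-op.
function makePaymentStore() {
  const bySession = new Map();
  return {
    insert(row) {
      if (bySession.has(row.stripeSessionId)) return false; // conflict: do nothing
      bySession.set(row.stripeSessionId, row);
      return true;
    },
    count: () => bySession.size,
  };
}
```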

### F125 — /cancel returns success without updating local DB
- Severity: MEDIUM
- Confidence: CONFIRMED
- Location: /cancel handler (line ~95)
- Pattern: Calls Stripe's cancel-at-period-end, returns success. Local DB not touched — relies entirely on the webhook to update status. Webhook delay or failure leaves user UI showing wrong state.
- Action: Set local `subscription_cancel_pending = true` in same handler before returning. Webhook still authoritative for final state. Phase 2.

### F126 — /verify endpoint code style anomaly (collapsed/minified)
- Severity: MEDIUM (smell)
- Confidence: CONFIRMED
- Location: subscription.js lines ~155-170
- Pattern: Entire endpoint written as collapsed one-liner with no whitespace. Different style from rest of file. Hallmark of AI-generated bolt-on or hot-fix that bypassed code review. Combined with F117, this is the "duplicate path that got bolted on after the original webhook stopped feeling reliable" smell.
- Action: After F117 resolution either delete or reformat.

### F127 — payments table accumulates orphan 'pending' rows
- Severity: MEDIUM
- Confidence: CONFIRMED
- Location: /feature-checkout (line ~80) — INSERT payment with status='pending' before redirecting to Stripe
- Pattern: User abandons checkout → row stays 'pending' forever. Stripe sessions expire after 24h but DB row persists. Table grows.
- Action: Add daily cleanup job (delete pending older than 24h) or rely on webhook `checkout.session.expired` to mark abandoned. Phase 2 or later — hygiene, not blocker.

### F128 — stripe_payment_intent stored as null for subscription mode
- Severity: MEDIUM
- Confidence: HIGH
- Location: Webhook subscription branch — `session.payment_intent` for subscription mode is typically null (subscriptions flow through invoices)
- Pattern: payments row gets null payment_intent. Reconciliation queries that join on payment_intent fail silently for subscription rows.
- Action: For subscription mode, store `session.invoice` or pull `subscription.latest_invoice.payment_intent`. Phase 2.
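
One possible shape for the first option in the Action above, assuming the Checkout Session object exposes `mode`, `invoice`, and `payment_intent`. The helper name `paymentReference` and the `{ kind, id }` return shape are hypothetical.

```javascript
// Sketch only: picks the identifier that is actually populated for the
// session's mode, so reconciliation never joins on a null payment_intent.
function paymentReference(session) {
  if (session.mode === 'subscription') {
    // Subscriptions bill through invoices; payment_intent is typically null.
    return { kind: 'invoice', id: session.invoice ?? null };
  }
  return { kind: 'payment_intent', id: session.payment_intent ?? null };
}
```

Storing the `kind` alongside the id would let reconciliation queries choose the right join per row.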

### F129 — UNLIMITED_PRICE_ID env var not validated at boot
- Severity: LOW
- Confidence: CONFIRMED
- Location: subscription.js line 6
- Pattern: If env var missing, /checkout fails with cryptic Stripe error at first user attempt instead of failing loud at server boot.
- Action: Add boot-time validation in index.js (alongside other env checks). Trivial.

---

**Layer 3 progress:** 70% → ~78%. Files remaining: db.js, db-init.js, email.js, pricing.js, cost-tracker.js (all small per Jorge's plan).

**Phase 2 scope update:** F117 + F118 add ~1.5 hours (subscription path normalization + plan-name migration). Revised Phase 2 estimate: ~8.5 hours. Day 17-18 timeline still holds.

**Phase 4 modularization note:** subscription.js is already a clean extraction candidate (single Express router, well-bounded). After F117 fix lands, this file becomes Maria's second refactor target — same pattern as cache.js but smaller surface (no shared state to hoist).

---

## Day 16 AM (cont.) — Layer 3 server audit, db.js (May 5, 2026)

**File:** server/db.js (23 lines)
**Findings:** 5 (F130-F134). 0 CRITICAL, 0 HIGH, 3 MEDIUM, 2 LOW.
**Net read:** File is fundamentally sound — standard pg.Pool setup. All findings are production-hardening / silent-failure items. No bugs.

### F130 — DATABASE_URL not validated at module load (F129 family)
- Severity: LOW
- Confidence: CONFIRMED
- Location: db.js line 8
- Pattern: If env var missing, Pool constructed with `connectionString: undefined`. First query fails with cryptic pg error instead of server failing loud at boot.
- Action: Add to boot-time env validation block in index.js alongside JWT_SECRET, STRIPE_SECRET_KEY, UNLIMITED_PRICE_ID checks. Trivial.

### F131 — SSL certificate verification disabled in production
- Severity: MEDIUM
- Confidence: CONFIRMED
- Location: db.js line 14 — `ssl: { rejectUnauthorized: false }`
- Pattern: Connection IS encrypted but cert chain not validated. Theoretically vulnerable to MITM on the path between Node process and Postgres. Common pattern for Railway/Heroku/Supabase managed DBs that use self-signed or rotating certs — acceptable risk if documented.
- Action: Either (a) pull Railway's CA cert and use `ssl: { ca: fs.readFileSync('./railway-ca.crt') }`, or (b) document as accepted risk in Phase 2 security register. Phase 2 decision, not blocker.

### F132 — No graceful shutdown / pool.end() on SIGTERM
- Severity: MEDIUM
- Confidence: CONFIRMED
- Location: Origin in db.js (no shutdown handler attached); responsibility likely belongs in index.js
- Pattern: When Railway redeploys or process receives SIGTERM/SIGINT, pool is not drained. In-flight queries killed mid-transaction. Idle connections may leak briefly until DB-side timeout reaps them. Combined with F120 (non-transactional webhook writes), a redeploy mid-checkout could leave users in inconsistent subscription state.
- Action: Add `process.on('SIGTERM', async () => { await pool.end(); process.exit(0); })` and matching SIGINT in index.js. Phase 2.
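
A slightly fuller sketch of the handler in the Action above, with a guard against repeated signals and a non-zero exit if draining fails. `installShutdownHandlers` is a hypothetical helper; `pool` is anything with an async `end()` (pg's Pool has one), and `proc` is injectable only so the sketch can be exercised without killing the process.

```javascript
// Sketch only: installShutdownHandlers is a hypothetical helper name.
function installShutdownHandlers(pool, proc = process) {
  let shuttingDown = false;

  const shutdown = async (signal) => {
    if (shuttingDown) return; // ignore a second SIGTERM/SIGINT
    shuttingDown = true;
    try {
      await pool.end(); // drain the pool before exiting
      proc.exit(0);
    } catch (err) {
      console.error(`pool.end() failed on ${signal}:`, err);
      proc.exit(1);
    }
  };

  proc.on('SIGTERM', () => shutdown('SIGTERM'));
  proc.on('SIGINT', () => shutdown('SIGINT'));
}
```

In index.js this would likely be called once after the pool is imported; closing the HTTP server before draining the pool is a further step not shown here.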

### F133 — No statement_timeout / query_timeout configured (compounds F121)
- Severity: MEDIUM (trending HIGH at pilot scale)
- Confidence: CONFIRMED
- Location: db.js Pool config (lines 7-16) — neither `statement_timeout` nor `query_timeout` set
- Pattern: Runaway query holds a connection indefinitely. Pool max=20. Twenty stuck queries = total DB lockout for all users until process restart. Compounds with F121 (webhook errors silently swallowed) — a webhook handler hung on a slow query never times out, never returns 500 to Stripe, never gets retried, just sits there consuming a connection slot.
- Action: Add `statement_timeout: 30000` (30s) to pool options. Critical pilot-week hardening, low risk to add. Phase 2.

### F134 — connectionTimeoutMillis 2000ms aggressive for cloud DB
- Severity: LOW
- Confidence: HIGH
- Location: db.js line 11
- Pattern: 2-second connect timeout reasonable for local dev; aggressive for Railway free tier (cold-start latency) or peak-traffic scenarios. Cascading connection failures during traffic spikes.
- Action: Bump to 5000-10000ms. Trivial.

---

**Layer 3 progress:** 78% → ~82%. Files remaining: db-init.js, email.js, pricing.js, cost-tracker.js.

**Cross-file pattern emerging:** F121 (silent webhook errors) + F132 (no graceful shutdown) + F133 (no statement timeout) + F120 (non-transactional writes) form a connected silent-failure cluster. Phase 2 should bundle these as one "operational integrity" workstream rather than treating them as isolated fixes. Estimated combined effort: ~2 hours.

---

## Day 16 AM (cont.) — Layer 3 server audit, db-init.js (May 5, 2026)

**File:** server/db-init.js (~180 lines, schema-init script)
**Findings:** 8 (F135-F142). 2 CRITICAL, 1 HIGH, 4 MEDIUM, 1 LOW.
**Net read:** Worst-condition file in the codebase so far. Two ship-blockers requiring pre-pilot rewrite. No canonical schema source exists in code; production schema was built incrementally by hand and has drifted from this file. Pre-pilot action required Day 17 AM (1-2 hours).

### F135 — db-init.js has invalid JavaScript syntax (file cannot execute)
- Severity: CRITICAL
- Confidence: CONFIRMED (verified via `node --check`)
- Location: db-init.js lines ~158+ (after `init();` call)
- Pattern: Naked SQL (`CREATE TABLE IF NOT EXISTS feature_usage ...` and gmail_tokens block) sits outside any JavaScript string, after the `init()` call. JavaScript parser rejects the file. `npm run db:init` and `npm run db:reset` cannot execute.
- Implication: Schema initialization has been broken for an unknown amount of time. Production tables created out-of-band (likely manual psql or earlier working version of this file). No reproducible deploy path.
- Action: PRE-PILOT REWRITE. Day 17 AM. Dump current production schema via `pg_dump --schema-only $DATABASE_URL`, paste into the `schema` template literal, restructure file as canonical source. Verify with `node --check` before commit. Test in a scratch DB.

### F136 — Schema in db-init.js diverges massively from production
- Severity: CRITICAL
- Confidence: CONFIRMED
- Location: Entire `schema` template literal vs actual production tables
- Pattern: Production has tables and columns the init script knows nothing about:
  - Missing tables: `payments`, `feature_usage` (only present as orphan SQL after init()), `gmail_tokens` (same), `subscription_cycles`, `subscription_usage`
  - Missing columns on `users`: `subscription_status`, `subscription_id`, `subscription_plan`, `subscription_started_at`, `subscription_ends_at`, `stripe_customer_id`, `cycles_remaining`
- Implication: Cannot stand up new environment (staging, recovery, alternate cloud) from this code. Schema is undocumented. Code review for schema changes is impossible because there's no source of truth to diff against.
- Action: Bundled with F135 rewrite. Output of `pg_dump --schema-only` becomes the new canonical schema. After rewrite, add CI check that compares schema dump against db-init.js to detect future drift.
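
A rough sketch of what the CI drift check could compare, assuming both inputs are available as SQL text (the `pg_dump --schema-only` output and the schema string from db-init.js). It only diffs table names; columns, indexes, and constraints would need a real parser or a migration tool. `tableNames` and `schemaDrift` are hypothetical helpers.

```javascript
// Sketch only: table-name diff between two SQL texts.
function tableNames(sql) {
  // Matches CREATE TABLE [IF NOT EXISTS] [schema.]name, quoted or not.
  const re = /CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(?:"?\w+"?\.)?"?(\w+)"?/gi;
  return new Set([...sql.matchAll(re)].map((m) => m[1].toLowerCase()));
}

function schemaDrift(liveDumpSql, initSql) {
  const live = tableNames(liveDumpSql);
  const init = tableNames(initSql);
  return {
    // Tables production has that the init script does not create.
    missingFromInit: [...live].filter((t) => !init.has(t)).sort(),
    // Tables the init script creates that production lacks.
    missingFromLive: [...init].filter((t) => !live.has(t)).sort(),
  };
}
```

A CI step could fail the build whenever either list is non-empty.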

### F137 — `dropAll` incomplete: --reset leaves orphan tables
- Severity: HIGH
- Confidence: CONFIRMED
- Location: db-init.js `dropAll` constant (lines ~135-141)
- Pattern: Drops only 5 tables (cheatsheets, interview_sessions, campaigns, resumes, users). Leaves `feature_usage`, `gmail_tokens`, `payments`, `subscription_cycles`, `subscription_usage` standing. Post-reset state has orphan tables with FK references to nonexistent users (CASCADE drops the FKs but data persists with dead user_ids).
- Action: Expand dropAll to include all production tables. Bundled with F135/F136 rewrite.

### F138 — feature_usage missing query-path index (performance time-bomb)
- Severity: MEDIUM
- Confidence: HIGH
- Location: feature_usage table definition (in floating SQL after init())
- Pattern: subscription.js usage endpoint queries `WHERE user_id=$1 AND created_at >= $2 GROUP BY feature`. No index on (user_id, created_at). Full table scan on every quota check. With pilot at 100+ users × multiple quota fetches per session, table grows fast and scans get slow.
- Action: `CREATE INDEX idx_feature_usage_user_created ON feature_usage(user_id, created_at);`. Trivial. Add to rewritten schema.

### F139 — Inconsistent ON DELETE CASCADE on user-FK columns
- Severity: MEDIUM
- Confidence: CONFIRMED
- Locations: `feature_usage.user_id REFERENCES users(id)` and `gmail_tokens.user_id REFERENCES users(id) UNIQUE` — no ON DELETE clause
- Pattern: Other tables (resumes, campaigns, interview_sessions, cheatsheets) use ON DELETE CASCADE. These two omit it. With no ON DELETE clause Postgres defaults to NO ACTION, so deleting a user who has rows in either table fails with an FK constraint violation rather than leaving orphan rows.
- Action: Add `ON DELETE CASCADE` to both during F135/F136 rewrite. Verify production has this too (drift check).

### F140 — db-init.js Pool config has no SSL setting (inconsistent with db.js)
- Severity: MEDIUM
- Confidence: CONFIRMED
- Location: db-init.js line 10 — `new Pool({ connectionString: process.env.DATABASE_URL })`
- Pattern: db.js conditionally sets SSL for production (`rejectUnauthorized: false`). db-init.js sets nothing. Running `npm run db:init` against production Railway DB may fail on connect unless DATABASE_URL includes `?sslmode=require`. Different posture across two files using the same connection string.
- Action: Match db.js posture. Bundled with rewrite. Long-term, factor pool config into a shared module both files import.
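
A sketch of the shared pool-config module suggested above, folding in the values proposed in F133 and F134. Assumptions: production is detected via `NODE_ENV === 'production'` (db.js's actual condition is not quoted in this audit), and `buildPoolConfig` is a hypothetical name. It returns a plain options object, so it does not require `pg` itself.

```javascript
// Sketch only: one place for pool options, imported by db.js and db-init.js.
function buildPoolConfig(env = process.env) {
  if (!env.DATABASE_URL) {
    throw new Error('DATABASE_URL is not set'); // fail loud (F130)
  }
  return {
    connectionString: env.DATABASE_URL,
    max: 20,                       // current pool size per F133
    connectionTimeoutMillis: 5000, // low end of the F134 range
    statement_timeout: 30000,      // 30s cap per F133
    // Same posture as db.js today; see F131 for the CA-cert alternative.
    ssl: env.NODE_ENV === 'production' ? { rejectUnauthorized: false } : false,
  };
}
```

Both files would then construct their pool as `new Pool(buildPoolConfig())`, which removes the SSL mismatch by construction.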

### F141 — Possible zombie columns on users table
- Severity: MEDIUM
- Confidence: UNCLEAR (need code grep)
- Location: db-init.js users table — `free_campaign_used BOOLEAN DEFAULT FALSE`, `paid_unlocked BOOLEAN DEFAULT FALSE`, `paid_campaigns_used INTEGER DEFAULT 0`
- Pattern: These look like a legacy per-campaign pricing model superseded by the current Stripe subscription + feature_usage model (free trial campaign, paid unlock, campaign count). May be unused in current code.
- Action: Grep server/ for reads/writes of these column names. If unreferenced, mark ZOMBIE and drop in F135 rewrite. If still referenced, document why.

### F142 — Inconsistent dotenv loading discipline
- Severity: LOW
- Confidence: CONFIRMED
- Location: db-init.js line 7 — `require('dotenv').config()`. db.js does not call dotenv.
- Pattern: db-init.js runs as standalone script via npm scripts → loads .env directly. db.js runs as part of server process → relies on index.js to load .env first. Different posture, both defensible. Document for new contributors.
- Action: Either standardize (call dotenv in both, harmless) or document the convention. Cleanup item.

---

**Layer 3 progress:** 82% → ~88%. Files remaining: email.js, pricing.js, cost-tracker.js.

**Pre-pilot work added:** F135 + F136 + F137 require a db-init.js rewrite on Day 17 AM. ~1-2 hours. This is a NEW pre-pilot blocker not previously scoped.

**Revised Phase 2 estimate:** 8.5h subscription/operational + 1.5h db-init rewrite (separable, Day 17 AM) = ~10h total. Day 17-18 timeline still holds but tighter.

**Schema drift problem (architectural):** No canonical schema source exists in the codebase. Production was built by hand. Recommend: post-pilot, adopt a migration tool (Knex / node-pg-migrate / Prisma migrate) so every schema change is captured as a numbered migration file in version control. For pilot, a regenerated db-init.js from `pg_dump --schema-only` is sufficient.

---

## Day 16 AM (cont.) — Layer 3 server audit, email.js (May 5, 2026)

**File:** server/email.js (~95 lines, Resend-based transactional email)
**Findings:** 7 (F143-F149). 0 CRITICAL, 1 HIGH, 2 MEDIUM, 4 LOW.
**Net read:** Cleanest Layer 3 file audited so far. Two well-formed functions, minimal magic, sensible structure. F143 verified against Resend dashboard during audit session — comment-only bug, no deliverability issue. Rest is hardening.

### F143 — Stale "Domain:" comment in file header (RESOLVED in session)
- Severity: LOW
- Confidence: CONFIRMED
- Location: email.js line 3 header comment originally read `Domain: send.xjobsfinder.com (verified)`. Actual verified domain in Resend is `xjobsfinder.com`.
- Pattern: Stale comment, likely leftover from initial setup planning. FROM_ADDRESS (`noreply@xjobsfinder.com`) matches the actually-verified domain. No deliverability impact.
- Action: Comment fixed in audit session via sed. Verified emails route correctly. No further action.

### F144 — No rate limiting on password reset email (cross-file)
- Severity: HIGH
- Confidence: CONFIRMED
- Location: Concern lives in route handler that invokes sendPasswordReset (likely server/auth.js), not in email.js itself. Logged here because it's the cross-cutting concern surfaced by this file's audit.
- Pattern: Attacker hits `/auth/request-password-reset` repeatedly with any email. Each call triggers a Resend API call (costs money, hurts sender reputation) and reveals via timing/response whether email is registered (user enumeration). At pilot scale, an attacker could exhaust Resend quota in minutes.
- Action: Phase 2. Rate-limit by IP (e.g., 3/hour) AND by target email (e.g., 3/hour). Use existing rate-limiter middleware confirmed in F92 / auth-middleware.js inventory. Cross-reference with F74 (auth gaps).
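
The dual-key rule (by IP and by target email) can be illustrated with a small in-memory limiter. This is only a sketch of the logic, not a replacement for the existing rate-limiter middleware: state is per-process and lost on restart, and `createLimiter` and `allowResetRequest` are hypothetical names.

```javascript
// Sketch only: sliding-window counter keyed by an arbitrary string.
function createLimiter({ limit, windowMs }) {
  const hits = new Map(); // key -> timestamps (ms) of accepted requests
  return function allow(key, now = Date.now()) {
    const recent = (hits.get(key) || []).filter((t) => now - t < windowMs);
    const ok = recent.length < limit;
    if (ok) recent.push(now);
    hits.set(key, recent);
    return ok;
  };
}

const HOUR_MS = 60 * 60 * 1000;
const allowIp = createLimiter({ limit: 3, windowMs: HOUR_MS });
const allowEmail = createLimiter({ limit: 3, windowMs: HOUR_MS });

function allowResetRequest(ip, email, now = Date.now()) {
  // Check the IP first so a blocked IP cannot burn a victim's email quota.
  if (!allowIp(ip, now)) return false;
  return allowEmail(email.trim().toLowerCase(), now);
}
```

To avoid the enumeration leak noted above, the route would return the same response whether or not the request was allowed or the email exists.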

### F145 — HTML interpolation without escaping
- Severity: MEDIUM
- Confidence: CONFIRMED
- Locations: email.js line ~21 `${resetUrl}` in href, line ~63 `${firstName}` in `<h2>`
- Pattern: Both interpolated directly into HTML without HTML-escape. `firstName` is derived from `full_name` DB column → user-controlled. Email-client sandboxing makes XSS unlikely (Gmail/Outlook strip scripts), but defense-in-depth says escape. Also: a stray quote in resetUrl breaks the href entirely.
- Action: Add a small `escapeHtml(s)` helper, wrap interpolations. Phase 2 hygiene.
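
A minimal version of the helper named in the Action above. The `greetingHtml` wrapper is purely illustrative; the actual template markup in email.js is not reproduced in this audit.

```javascript
// Escapes the five characters that matter in HTML text and attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Illustrative use only: wrap every interpolated value.
function greetingHtml(firstName) {
  return `<h2>Welcome, ${escapeHtml(firstName)}!</h2>`;
}
```

The same wrapper applied to `resetUrl` inside the `href` attribute keeps a stray quote from terminating the attribute.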

### F146 — No retry / queue for transactional email sends
- Severity: MEDIUM
- Confidence: CONFIRMED
- Location: Both functions in email.js — single send attempt, on failure either throws (sendPasswordReset) or returns null (sendWelcome)
- Pattern: Resend transient failure (rate limit, API blip, regional outage) → password reset email never arrives. User waits an hour assuming it's coming, eventually contacts support. At pilot scale this is acceptable; longer-term needs a job queue.
- Action: Phase 2 — add simple in-process retry (3 attempts, exponential backoff). Phase 4 — proper job queue (BullMQ on Redis or pg-boss on existing Postgres). Pilot can ship without this if F143 is clean.
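
A sketch of the in-process retry proposed for Phase 2: three attempts with exponential backoff. `withRetry` is a hypothetical helper, the 500ms base delay is an assumption, and `sleep` is injectable only so the delays can be observed without waiting.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Sketch only: retries fn until it resolves or attempts are exhausted.
async function withRetry(fn, options = {}) {
  const { attempts = 3, baseDelayMs = 500, sleep = defaultSleep } = options;
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Back off 500ms, 1000ms, ... but do not wait after the final failure.
      if (attempt < attempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

A caller would wrap only the Resend send call, so the existing throw-versus-null contracts of the two functions (F148) stay unchanged.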

### F147 — RESEND_API_KEY not validated at boot (F129/F130 family)
- Severity: LOW
- Confidence: CONFIRMED
- Location: email.js line 7 — `new Resend(process.env.RESEND_API_KEY)` runs at module load with potentially undefined value
- Pattern: First send fails with cryptic error instead of failing loud at boot. Same family as F129 (UNLIMITED_PRICE_ID), F130 (DATABASE_URL).
- Action: Add to boot-time env validation block in index.js. Trivial. Bundle the whole env-check fix together.

### F148 — Inconsistent error semantics between sendPasswordReset and sendWelcome
- Severity: LOW
- Confidence: CONFIRMED
- Locations: sendPasswordReset throws on missing key and on Resend error. sendWelcome returns null on missing key, on error, and on exception (try/catch).
- Pattern: Defensible asymmetry (password reset is auth-critical, welcome is nice-to-have) but undocumented. Looks like two different authors or evolution over time.
- Action: Add JSDoc to both functions documenting the contract. Cleanup item.

### F149 — Hardcoded "15 minutes" expiry text in email body
- Severity: LOW
- Confidence: CONFIRMED
- Location: email.js line ~19 (HTML), line ~25 (text)
- Pattern: Text claims expiry is 15 minutes. Actual JWT expiry is set in auth.js (or wherever the token is signed). If those drift, the email lies to the user.
- Action: Pass expiry minutes as parameter to sendPasswordReset, or pull from shared config. Trivial. Bundle with F148.

---

**Layer 3 progress:** 88% → ~93%. Files remaining: pricing.js, cost-tracker.js.

**F143 resolved in audit session** — Resend domain verified as `xjobsfinder.com`, comment fixed. No pilot blocker.

**Cross-file env-validation cluster:** F129 (UNLIMITED_PRICE_ID) + F130 (DATABASE_URL) + F147 (RESEND_API_KEY) + likely more in pricing.js / cost-tracker.js. Bundle into one ~10-line boot-time check block in index.js. Estimated effort: 15 minutes total. Phase 2.
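
One possible shape for the bundled check, as a sketch. `missingEnv` and `assertEnv` are hypothetical helper names; the variable list is passed in by the caller, so the exact names should be taken from the code rather than from the audit notes.

```javascript
// Sketch only: returns the names that are unset or blank.
function missingEnv(names, env = process.env) {
  return names.filter((name) => !env[name] || String(env[name]).trim() === '');
}

// Throws once, listing every missing variable, so boot fails loud.
function assertEnv(names, env = process.env) {
  const missing = missingEnv(names, env);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(', ')}`);
  }
}
```

Called right after `require('dotenv').config()`, for example `assertEnv(['DATABASE_URL', 'RESEND_API_KEY'])`, this reports all missing variables in one error instead of one per restart.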

---

# DAY 16 — Layer 4 + Layer 5 audit (May 5, 2026)

**Phase 1 status: 100% complete.** Layers 4 + 5 closed in this session. Env-validation cluster (F129/F130/F147) closed via 6-line boot-time check block in server/index.js. 9 new findings logged (F148–F153 Layer 4, F154–F156 Layer 5). Note that F148 and F149 reuse IDs already assigned to email.js findings above; within this section they refer to the Layer 4 items. Two CRITICAL findings on schema documentation drift. One HIGH finding on duplicate Stripe webhook handlers: the billing-system equivalent of F15 (cousin-contamination pattern).

## Step 1 — Env-check block landed ✅

Closes F129 (STRIPE_UNLIMITED_PRICE_ID) + F130 (DATABASE_URL) + F147 (RESEND_API_KEY) atomically. Block placed between `require('dotenv').config();` and the express requires in server/index.js. Smoke-tested both paths: validator fires with correct error message when vars missing; server boots cleanly when vars present. Backup retained at server/index.js.bak.20260505_220625.

Noted during placement: env var name in code is `STRIPE_UNLIMITED_PRICE_ID` (not `UNLIMITED_PRICE_ID` as audit-note F129 originally read). Confirmed via `grep -rn "UNLIMITED_PRICE_ID" server/` — only used in subscription.js:6, sourced correctly.

## Layer 4 audit — Postgres

### F148 (CRITICAL) — Schema documentation is fiction

Both `server/db-init.js` (JS template literal schema) and `server/db/001_full_schema.sql` are out of date. Live database is canonical, neither file matches it.

- **Live `users` table: 33 columns.** db-init.js describes 14. SQL file describes 21.
- **Live database: 17 tables.** db-init.js creates 5. SQL file creates 8.
- **Undocumented live columns in `users`:** industry, target_role, location_city, location_state, latitude, longitude, profile_complete, email_verified, cycles_remaining, cycles_rollover, free_run_used, usage_credits.
- **Undocumented live tables:** password_reset_tokens, plan_limits, session_tokens, subscription_cycles, subscription_usage.
- **Migration history is invisible.** No record of how live state diverged from documented files.

If anyone runs `npm run db:init --reset` against a fresh database, the resulting app is broken — missing 12 columns and 6 tables that production code depends on.

**Resolution: deferred to post-pilot Phase 4.** Generate `pg_dump --schema-only` of live DB, replace both fiction files with it, delete or rewrite db-init.js. Live schema snapshot captured at `Documentation/Audit/live-schema-snapshot.md` (178 columns across all tables) — interim source of truth.

### F149 (CRITICAL) — Duplicate name columns in `users`

Live table has both `full_name` (from db-init.js lineage) and `name` (from SQL file lineage). One came from each schema attempt. Code is reading/writing one of them; the other holds dead data. Need to identify which is canonical before deletion. Phase 4 cleanup task.

### F153 (HIGH) — Dead-code tables exist in production

`feature_usage` and `gmail_tokens` are defined in db-init.js OUTSIDE the `init()` schema template, as floating SQL after the `init();` call (the same naked SQL that makes the file fail `node --check` in F135). This means **the table-creation never executes** when `npm run db:init` runs. Yet both tables exist in the live database. Someone created them manually with no migration record. Same Phase 4 cleanup target as F148.

### F150 (MEDIUM) — Missing FK cascades in 001_full_schema.sql

- `payments.campaign_id` — no `ON DELETE` clause (NULL allowed, probably fine for refund accounting)
- `provider_costs.user_id` — no cascade, orphans on user delete
- `api_usage_log.user_id` — same

Note: this finding describes the SQL-file schema, not necessarily live state. Verify against live DB before fixing.

### F151 (MEDIUM) — Missing indexes on hot paths

- `payments.stripe_session_id` — webhook lookups
- `users.google_id` — OAuth callback every login
- `interview_sessions.user_id`
- `cheatsheets.user_id`
- `gmail_tokens.user_id` — every Gmail call
- `feature_usage.user_id` — usage queries

Verify which already exist on live DB (they may; documentation just doesn't show them).

### F152 (LOW) — `ssl: { rejectUnauthorized: false }` in db.js

Standard for Railway/Heroku-style deploys. Not a fix needed pre-pilot. Defer.

## Layer 5 audit — vendors

Vendor surface mapped via `grep -rn -E "@anthropic|elevenlabs|googleapis|google-auth|stripe|resend|cloudflare" *.js` across server/. 7 external services in scope: Anthropic, ElevenLabs, Google, Stripe, Resend, Cloudflare, Railway. Cloudflare and Railway are infrastructure-only (no code-level dependencies). 5 vendors with code touchpoints; all reviewed.

### F154 (HIGH) — Two parallel Stripe implementations

**`server/stripe.js` and `server/subscription.js` both register `/api/stripe/webhook`.** Both initialize Stripe via `process.env.STRIPE_SECRET_KEY`. Both create checkout sessions. Both write to the `payments` table. Both call `stripe.webhooks.constructEvent` with `STRIPE_WEBHOOK_SECRET`.

If both are loaded, Express keeps both registrations but dispatches in registration order: the handler registered first receives the request, and the second runs only if the first calls `next()`. In practice the second handler's logic never runs. **This is the billing-system equivalent of F15 (`finalizeResume` defined twice). Same cousin-contamination pattern that drove the May 2 strategic pivot.**

Smoking-gun evidence:
- `stripe.js:19` — `app.post('/api/stripe/webhook', ...)`
- `subscription.js:100` — same path, different code

**Severity rationale:** if a previously-shipped Stripe fix has been silently overridden, real customer billing has been affected. Highest-priority post-audit finding for Phase 2 work. Resolution: read both files end-to-end, identify which has the correct logic, delete the other, single source of truth for billing routes.

### F155 (MEDIUM) — `googleapis` required twice in auth.js

Lines 302 and 326 both call `const { google } = require('googleapis');` inside function bodies. Node module cache memoizes the require, so this is harmless at runtime — but it's a code smell signaling refactor leftovers. Trivial Phase 2 cleanup: hoist to top-level require, remove the two function-local copies.

### F156 (MEDIUM) — ElevenLabs SDK not used; raw fetch instead

`api.js:1649` and `api.js:1702` call ElevenLabs via raw `fetch` to `https://api.elevenlabs.io/v1/text-to-speech/...`. No official `elevenlabs` SDK import anywhere. Consequences: streaming response handling is hand-rolled, there is no retry logic or rate-limit handling, and nothing shields the code if ElevenLabs changes its response shape. Pre-pilot acceptable; post-pilot consider migrating to official SDK.

### Vendor health summary

| Vendor | Status | Notes |
|---|---|---|
| Anthropic | ✅ Clean | Single SDK import (api.js:19). No duplicates. |
| Resend | ✅ Clean | Single import, single client (email.js:6, 8). Env-validated by Step 1. |
| Google OAuth | 🟡 Minor | F155 — double require inside auth.js. Harmless but ugly. |
| Stripe | 🔴 HIGH | F154 — duplicate webhook handler across stripe.js + subscription.js. |
| ElevenLabs | 🟡 Minor | F156 — raw fetch instead of SDK. |
| Cloudflare | ✅ N/A | Infrastructure-only. No code touchpoints. |
| Railway | ✅ N/A | Infrastructure-only. No code touchpoints. |

## Phase 1 — 100% complete ✅

All 5 architectural layers audited:

- Layer 1 (Browser): N/A — out of scope per Day 12 architecture decision
- Layer 2 (Frontend / app.html): F1–F55 — closed in Day 14
- Layer 3 (Server / api, auth, email, index, stripe, subscription, cost-tracker, pricing): F129, F130, F143, F147 — closed Day 15. Cost-tracker.js + pricing.js cleared of env-validation cluster. Env-check block deployed.
- Layer 4 (Postgres): F148, F149, F150, F151, F152, F153 — closed Day 16. Live schema snapshot captured.
- Layer 5 (Vendors / 5 SDKs): F154, F155, F156 — closed Day 16.

**Total findings catalogued:** 156. Severity breakdown to be re-counted at end of Phase 1.5.

## Phase 1.5 deletion candidates — confirmed list

- F5 — duplicate ID `templatesModal` (app.html lines 4667 + 4805). Invalid HTML, unreachable from JS.
- F9 — duplicate ID `jobSourceSelector` (app.html lines 9443 + 9476). Same.
- F11 — duplicate ID `regWallError` (app.html lines 12969 + 13114). Same.
- F13 — exact 7-line duplicate, splash progress (app.html lines 14914-14920 vs 14953-14959).
- `server/admin copy.js` — file with space in name. Almost certainly duplicate of admin.js. Verify before deleting.
- `server/email.js.bak` — leftover backup file. Safe to delete.

Phase 1.5 begins next session block.


## Phase 1.5 deletion attempts — Day 16 PM session corrections

### F5 / F9 / F11 — scope corrected, deletion deferred to Phase 2

Initial Phase 1.5 plan classified F5, F9, F11 as "safe deletes — invalid HTML means duplicate IDs are unreachable from JS." Surgical inspection of F5 (templatesModal) on Day 16 disproved this:

**F5 — templatesModal block at line 4667 vs 4805 are NOT byte-identical.** Same modal definition, but the role-selection element differs:
- Line 4673: `<button onclick="selectAlternativeTemplate(...)">` (accessible, correct practice)
- Line 4811: `<div onclick="selectAlternativeTemplate(...)">` (older, less accessible)

This is the UI counterpart to F14 (selectAlternativeTemplate defined twice). One version is the fix, the other is the regression that silently overrode it via JavaScript's last-definition-wins semantics. **Deleting blindly risks shipping the regression into pilot.**

By extension, F9 (jobSourceSelector) and F11 (regWallError) likely have the same divergent-implementation pattern. Cannot be safely deleted without first identifying which version is winning at runtime (browser devtools inspection of rendered DOM + which function definitions are bound).

**Resolution path:** Defer F5, F9, F11 surgery to Phase 2 (AKA Identity Refactor), where the underlying duplicate function definitions (F14, F15, F16, F17) will already be under inspection. Bundle UI duplicate resolution with code duplicate resolution — single source of truth review per pair.

**Lesson logged:** "Invalid HTML = unreachable from JS = safe to delete" is too coarse a heuristic when the duplicates contain divergent code paths. Future Phase 1.5 candidates require byte-level diff verification before deletion. The Day 13 audit notes flagging F5/F9/F11 as "safe deletes" should be considered superseded.
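
The last-definition-wins behavior behind F14 can be reproduced in isolation. The sketch below is function-scoped so it runs anywhere; in app.html the duplicates sit at script scope, where the same rule applies. The return strings are placeholders, not the real function bodies.

```javascript
// Sketch only: two declarations with the same name in one scope.
function lastDefinitionWins() {
  // Both declarations are hoisted; the later one replaces the earlier binding.
  function selectAlternativeTemplate() { return 'first definition'; }
  function selectAlternativeTemplate() { return 'second definition'; }
  return selectAlternativeTemplate();
}

console.log(lastDefinitionWins()); // prints "second definition"
```

No error or warning is raised, which is why the winning version has to be confirmed by inspection rather than assumed from file order alone.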

### Phase 1.5 deletions completed today

- server/admin copy.js (93 lines, byte-identical to admin.js — verified via diff)
- server/email.js.bak (legacy backup file)

### F18 — 7 empty interview function stubs — TARGETED NEXT

F18 (lines 12481-12488 per Day 13 audit): beginFirstQuestion, askNextQuestion, selectChoice, evaluateAnswer, provideFeedback, startListening, stopListening — all defined as empty functions. Empty functions cannot contain divergent implementations, so this finding does NOT have the F5/F9/F11 nuance. Safe deletion target if line numbers verify and bodies confirmed empty.


### F18 — RESOLVED ✅

7 empty function stubs deleted from app.html via sed pass on Day 16. All 7 verified pre-deletion as having only 1 occurrence in the file (the definition itself, never called). Post-deletion verification: 0 occurrences for all 7 names, surrounding code intact, nextQuestion() (the real function) untouched.

Functions deleted: beginFirstQuestion, askNextQuestion, selectChoice, evaluateAnswer, provideFeedback, startListening, stopListening.

Backup retained: app.html.bak.20260505_22XXXX (timestamped during F5 inspection earlier in session).


---

## F154 — REFRAMED + PRE-SURGERY STATE CAPTURED (Day 16 PM)

### Initial framing (Day 15) was wrong; here is the correct picture

Original F154 (Day 15): "Two parallel Stripe webhook handlers, one wins at runtime, the other is silently dead — billing-system equivalent of F15 cousin pattern."

Corrected F154 (Day 16, after end-to-end inspection): subscription.js is not mounted anywhere. index.js has no require for ./subscription. Confirmed via grep across index.js, api.js, auth.js — zero references to the file. subscription.js is fully orphaned dead code at the file level.

This is worse than a duplicate handler. The dead file contains the more sophisticated webhook logic — handles 4 event types, syncs full subscription metadata to DB, writes payments table on subscription. The live file (stripe.js) handles only checkout.session.completed, only by email, doesn't write payments, doesn't track subscription_id, started_at, or ends_at.

### Timestamp evidence — subscription.js is the abandoned original

- stripe.js: 7067 bytes, last modified Apr 14 06:28
- subscription.js: 10045 bytes, last modified Apr 2 18:46

subscription.js is 12 days older. Pattern: original implementation built April 2, refactor begun April 14, refactor never finished, original never deleted. Fits the cousin-contamination pattern from May 2 — except in this case the rewrite dropped sophistication on the floor instead of adding regressions.

### Real customer impact (currently in production)

Live billing system (stripe.js only) is missing:
- subscription_id, subscription_started_at, subscription_ends_at not written on activation. DB doesn't know when subscriptions expire.
- No customer.subscription.updated handler. Renewals, plan changes, and cancel-at-period-end flags are not synced.
- No customer.subscription.deleted handler. Customers who cancel still show as subscription_status='active' indefinitely.
- No invoice.payment_failed handler. Failed payments don't flag user as past_due.

### F154 sub-findings (split for tracking)

- F154a (HIGH): subscription.js is dead code. Delete the file.
- F154b (HIGH): Three subscription-activation paths write three different subscription_plan strings ('pro', 'monthly_unlimited', 'pro_monthly'). Standardization needed. Defer to Phase 2 polish — pick 'pro' as canonical.
- F154c (HIGH): Missing webhook event handlers in stripe.js for customer.subscription.updated, customer.subscription.deleted, invoice.payment_failed. Resolution: lift from dead subscription.js into live stripe.js before deleting the dead file.
- F154d (MEDIUM): paidSessions is in-memory Set, lost on restart. Defer to Phase 3 polish — move to Postgres or Redis.
- F154e (MEDIUM): No DB transactions around webhook updates. If subscription update succeeds but payments insert fails, DB is inconsistent. Defer to Phase 2 polish.

### Pre-surgery state (Day 16 PM)

- stripe.js: 7067 bytes, mounted via registerStripeRoutes(app) at index.js line 25
- subscription.js: 10045 bytes, NOT mounted, fully orphaned
- Stripe CLI: not installed at session start. Installation in progress for webhook testing.
- Backup paths to be recorded here after surgery begins
- Webhook URL configured in Stripe Dashboard: to be confirmed via CLI

### Surgery plan (Day 16 PM, in progress)

1. Install Stripe CLI via brew, authenticate test mode
2. Confirm Stripe Dashboard webhook URL matches /api/stripe/webhook
3. Backup stripe.js and subscription.js with timestamps
4. Lift 4 missing event handlers from subscription.js into stripe.js webhook block
5. Reconcile user lookup: keep email-based for backwards compat, add metadata.xjobs_user_id as primary
6. Smoke test: start server, fire stripe trigger test events for each new event type, verify DB state changes
7. Delete subscription.js
8. Final smoke test post-deletion
9. Document outcome in this file and in Day 16 PM close-out
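
Step 5 of the plan (ID first, email as fallback) could look roughly like the sketch below. `resolveUserLookup` and its `{ by, value }` return shape are hypothetical. `metadata`, `customer_email`, and `customer_details.email` are fields of a Stripe Checkout Session object. Lower-casing the email is an assumption about how emails are stored.

```javascript
// Sketch for surgery step 5: prefer metadata.xjobs_user_id, fall back to email.
// The return shape ({ by, value }) is hypothetical.
function resolveUserLookup(session) {
  const metadata = session.metadata || {};
  if (metadata.xjobs_user_id) {
    return { by: 'id', value: metadata.xjobs_user_id };
  }
  const email =
    session.customer_email ||
    (session.customer_details && session.customer_details.email);
  if (email) {
    // Normalizing case is an assumption about how emails are stored.
    return { by: 'email', value: email.toLowerCase() };
  }
  return null; // caller should log and acknowledge without a DB write
}
```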

### Out of scope tonight (deferred)

- F154b standardization — different commit
- F154d in-memory paidSessions migration — Phase 3
- F154e DB transactions — Phase 2 polish
- F148 schema fiction — Phase 4

### Why we are doing surgery tonight (founder's call)

Path B chosen over my recommendation of Path A (pre-flight tonight, surgery tomorrow). Founder's reasoning preserved verbatim from Day 16 PM session:

"I gave myself ample runway but this is the time to forge ahead to save for the rainy days and I know they are coming. I have a firm commitment to myself and other potential investors and I prefer to show ahead of time rather than have to explain myself why I need an extra week. You and I are supposed to be the experts and this is the time to capitalize on the slack."

Recorded for the record. Surgery proceeding.


---

## Day 16 PM continuation — F161 discovered + fixed, F157 GCF backend surfaced

Path B chosen earlier in session (push F154 surgery forward tonight). Surgery preparation surfaced two new findings BIGGER than F154 itself. F161 is now fixed and verified. F157 is a new audit gap deferred to Phase 2 work.

### F161 (CRITICAL, NEW + RESOLVED) — Express middleware order broke ALL Stripe webhook signature verification

Discovered during F154 surgery preparation. Stripe CLI was paired and used to forward a test event to localhost. The existing checkout.session.completed handler in stripe.js rejected the event with: "Webhook payload must be provided as a string or a Buffer instance representing the raw request body. Payload was provided as a parsed JavaScript object instead."

Root cause: line 34 of index.js had `app.use(express.json({ limit: '10mb' }));` running as global middleware before the Stripe webhook route mounted. By the time `express.raw({ type: 'application/json' })` inside stripe.js tried to act, req.body was already a parsed JS object, not a Buffer. Stripe signs the raw HTTP body bytes; signature verification failed because the original signed material was no longer accessible.
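
Why a parsed body cannot be rescued by re-serializing it: Stripe's documented scheme is an HMAC-SHA256 over the timestamp, a dot, and the exact raw body. The sketch below uses Node's built-in crypto module; the secret, timestamp, and event body are placeholders, and `sign` is a hypothetical helper rather than the Stripe library's own verifier.

```javascript
const { createHmac } = require('crypto');

// Stripe's documented scheme: HMAC-SHA256 over `${timestamp}.${rawBody}`.
function sign(secret, timestamp, body) {
  return createHmac('sha256', secret)
    .update(`${timestamp}.${body}`, 'utf8')
    .digest('hex');
}

const secret = 'whsec_example'; // placeholder, not a real secret
const timestamp = 1700000000;   // placeholder
const rawBody = '{\n  "id": "evt_example",\n  "type": "checkout.session.completed"\n}';

// After a global express.json(), the handler only has an object,
// which can at best be re-serialized into different bytes.
const reserialized = JSON.stringify(JSON.parse(rawBody));

const expected = sign(secret, timestamp, rawBody);
const fromParsed = sign(secret, timestamp, reserialized);

console.log(expected === fromParsed); // false: the whitespace differs
```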

Production impact: every Stripe webhook for the past N weeks has been failing signature verification and returning 400. Stripe has been retrying them, eventually giving up. The single working handler (checkout.session.completed) was rejecting every event. Subscriptions activated only via the in-memory paidSessions Set + /verify-session retrieve fallback (which works because it does not go through the webhook signature path).

Fix applied: middleware-level URL exception in index.js. Replaces the single-line global JSON parser with a 7-line conditional that skips JSON parsing for /api/stripe/webhook and applies it normally to all other routes. Industry-standard Stripe + Express integration pattern, documented in Stripe official docs.

Diff applied (conceptually):
- OLD: `app.use(express.json({ limit: '10mb' }));`
- NEW: `app.use((req, res, next) => req.originalUrl === '/api/stripe/webhook' ? next() : express.json({ limit: '10mb' })(req, res, next));`
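
The same conditional can be written as a small reusable helper. This is a sketch, not the code in index.js: `skipFor` is a hypothetical name, and `parser` stands in for `express.json({ limit: '10mb' })`.

```javascript
// Sketch of the F161 fix as a reusable helper.
// skipFor is a hypothetical name; `parser` would be express.json({ limit: '10mb' }).
function skipFor(path, parser) {
  return (req, res, next) => {
    if (req.originalUrl === path) {
      return next(); // leave the raw body for express.raw() in stripe.js
    }
    return parser(req, res, next);
  };
}

// In index.js this would be wired roughly as:
//   app.use(skipFor('/api/stripe/webhook', express.json({ limit: '10mb' })));
```

One caveat of the exact-match comparison: `req.originalUrl` includes any query string, so a request to the webhook path with query parameters would still be JSON-parsed. The sketch assumes Stripe posts to the configured URL unchanged.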

Verification: server restarted with override STRIPE_WEBHOOK_SECRET set to Stripe CLI session signing secret. stripe trigger checkout.session.completed fired through the listener tunnel. Listener tab showed [200] POST for all 7 fixture events (product.created, price.created, payment_intent.created, customer.created, payment_intent.succeeded, charge.succeeded, checkout.session.completed). Server tab showed [Stripe] Paid log line confirming the existing handler executed correctly. End-to-end webhook signature verification confirmed working.

Backups taken before fix:
- /Users/jorgereyes/xjobs-final/server/index.js.bak.f161.20260505_232720
- /Users/jorgereyes/xjobs-final/server/index.js.bak.20260505_220625 (from earlier env-check work)

Status: RESOLVED. Pre-pilot blocker for billing eliminated.

### F157 (CRITICAL, NEW) — Split-brain Stripe webhook routing across two backends

Discovered during F154 surgery prep when stripe webhook_endpoints list was run via CLI in test mode and live mode. XJobsFinder live billing is split across two webhook backends:

Backend 1 (Railway / Express server, the one we have been auditing):
- URL: https://xjobs-final-production.up.railway.app/api/stripe/webhook
- Status: enabled (live mode)
- API version: 2020-08-27 (5 years old)
- Subscribed events: checkout.session.completed, customer.subscription.created, customer.source.updated, customer.card.updated, customer.bank_account.updated, customer.subscription.deleted, invoice.payment_succeeded
- Code lives in server/stripe.js
- F161 fix applies here

Backend 2 (Google Cloud Functions, completely orthogonal codebase):
- URL: https://us-central1-xjobs-app.cloudfunctions.net/apidev/payment/webhook
- Status: live mode = enabled, test mode = auto-disabled by Stripe after 9 days of HTTP 500 failures (93 failed requests)
- Subscribed events: payment_intent.succeeded, payment_intent.payment_failed
- Code location: UNKNOWN. Not in this repo. Different project: xjobs-app on GCF (separate from xjobs-final on Railway).
- Created January 2025 per timestamp evidence

Implications:
- Two parallel backends are receiving real customer Stripe events right now in production
- Subscription lifecycle events go to Railway; payment-intent events go to GCF
- Neither backend is wrong on its own, but the split is undocumented and creates ambiguity about source of truth for billing state
- Auto-disabled test-mode endpoint suggests the GCF backend has been broken for at least 9 days; if live mode has the same bug, payment intent processing on the live side may also be broken, just not auto-disabled yet because it has not crossed Stripe failure-rate thresholds
- The Spanish description on the live mode endpoint ("Punto de conexión en live mode") suggests a previous developer or earlier project iteration set this up

Resolution path (deferred to Phase 2):
1. Locate GCF source code (separate repo? archived? local-only?)
2. Audit GCF webhook handler — confirm what events it processes, what state it writes
3. Decide architecture: revive GCF, retire GCF (move payment_intent handling to Express), or rebuild from scratch
4. If retiring GCF: subscribe payment_intent.* events to Railway endpoint, add handlers in stripe.js, remove GCF endpoint from Stripe Dashboard
5. Test consolidation end-to-end before tagging v2.0-baseline

### F158 (CRITICAL, NEW) — Six unhandled webhook event types on Railway

Stripe is sending 7 event types to the Railway endpoint. Code in stripe.js only handles checkout.session.completed. The other 6 (customer.subscription.created, customer.source.updated, customer.card.updated, customer.bank_account.updated, customer.subscription.deleted, invoice.payment_succeeded) arrive at the webhook, return 200 (post-F161 fix), and produce no DB or business-logic effect.

Customer impact:
- customer.subscription.deleted: cancellations are not synced to users.subscription_status. Customers who cancel still show as active indefinitely.
- invoice.payment_succeeded: renewal confirmations are not recorded. No record of which subscription cycles paid successfully.
- customer.source/card/bank_account.updated: payment method changes are not tracked. Lower-priority, but customers expect their billing settings to reflect current state.

This was the original F154 framing; it has now been refined. F154 is the work item ("delete dead subscription.js + lift its handlers into stripe.js"); F158 is the outcome that work produces (handlers active in the production webhook flow).

Resolution: scheduled for Day 17 (May 6) F154 surgery. Handlers exist in dead subscription.js — surgery is to lift them into stripe.js with reconciliation, then delete subscription.js.

### F154 — refined scope

Original Day 15 framing: "Two parallel Stripe webhook handlers, one wins at runtime, the other silently dead — billing-system equivalent of F15 cousin pattern."

After end-to-end inspection (Day 16 AM): "subscription.js is dead code (not mounted anywhere), contains the more sophisticated logic that the live stripe.js is missing."

After F161 discovery (Day 16 PM): "The dead-code surgical lift is meaningful only NOW that F161 is fixed. Pre-F161, every webhook was rejected at signature verification, so adding more handlers would have been useless. Post-F161, lifting handlers from subscription.js to stripe.js becomes a real customer-impact fix."

F154 sub-findings stand:
- F154a: subscription.js is dead code → delete after lift
- F154b: three different subscription_plan strings written by different paths → standardize to "pro" in Phase 2
- F154c: missing event handlers → resolved by lifting from subscription.js (Day 17 surgery)
- F154d: paidSessions in-memory Set → defer to Phase 3
- F154e: no DB transactions around webhook updates → defer to Phase 2 polish

Status: surgery deferred to Day 17 morning. F161 fix tonight gates the value of F154 surgery; both are necessary, F161 was first.

### Tools added this session

Stripe CLI v1.40.9 installed via direct binary (Homebrew install blocked by outdated macOS Command Line Tools — sidestepped via curl + tar + sudo mv to /usr/local/bin/stripe). CLI authenticated against the live Stripe account in test mode. Documentation captured in Documentation/Tools/stripe-cli.md.

Use cases established this session:
- stripe webhook_endpoints list — surfaced F157 (the GCF backend)
- stripe listen --forward-to localhost:4000/api/stripe/webhook — created session-scoped tunnel for testing
- stripe trigger checkout.session.completed — verified F161 fix end-to-end

End-of-session security TODO carried forward: rotate test mode API key (was visible in stripe config --list output earlier in session). Live mode key was only partially visible (prefix + last 4 chars), no rotation needed.

### Day 16 PM session summary

Findings opened: F148 (Layer 4, schema fiction), F149 (Layer 4, duplicate name columns), F150-F153 (Layer 4 medium/low cleanup), F154 (refined Layer 5, dead Stripe file), F155 (googleapis double-require), F156 (ElevenLabs raw fetch), F157 (GCF split-brain), F158 (six unhandled events), F159 (GCF unaudited), F160 (Stripe API version 5 years old), F161 (middleware order, FIXED).

Findings resolved: F129 + F130 + F147 (env-cluster, single fix), F143 (Resend domain comment, fixed earlier in week), F18 (7 empty interview stubs deleted), F161 (middleware order fixed).

Phase 1.5 deletions today: server/admin copy.js, server/email.js.bak, F18 stubs from app.html.

F5/F9/F11 deferred to Phase 2 with scope correction logged (UI duplicates contain divergent code, not safe deletes).

Phase 1: 100% complete. Phase 1.5: started, ~30% done. Phase 2 surgery (F154 lift + F157 GCF audit) queued for Day 17.


---

## Day 17 AM — F154 surgery COMPLETE

F154 resolved end-to-end. Surgery sequence:

1. Backup taken: server/stripe.js.bak.f154.20260506_044117
2. Patch applied to stripe.js webhook block — added 4 new handlers BEFORE the existing checkout.session.completed handler:
   - customer.subscription.deleted: sets subscription_status to free, clears subscription_id, subscription_plan, subscription_ends_at
   - customer.subscription.updated: syncs subscription_status (active/past_due/canceled based on cancel_at_period_end + sub.status) and subscription_ends_at
   - invoice.payment_failed: flags subscription_status as past_due
   - customer.source/card/bank_account.updated: no-op acknowledgment, logs only

3. Each handler tested individually via stripe trigger and verified in server logs:
   - stripe trigger customer.subscription.deleted -> [Stripe] Subscription canceled for customer: cus_xxx
   - stripe trigger customer.subscription.updated -> [Stripe] Subscription updated for customer: cus_xxx status: active
   - stripe trigger invoice.payment_failed -> [Stripe] Payment failed, marked past_due for customer: cus_xxx
   - stripe trigger customer.source.updated -> [Stripe] Payment method updated: customer.source.updated — acknowledged, no DB write

4. server/subscription.js deleted. Backup retained at subscription.js.bak.20260505_231636.

5. Server restart confirmed nothing in the codebase required subscription.js. Boots clean.

6. Final baseline smoke test: stripe trigger checkout.session.completed -> [Stripe] Paid line confirmed. No regression on existing handler.
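
The handler behavior listed in step 2 can be summarized as a pure mapping from event to intended DB change. This is a sketch only: `planUserUpdate` and its return shape are hypothetical, and the precedence used to derive the status for customer.subscription.updated is an assumption, since the exact rule lives in stripe.js.

```javascript
// Sketch: map a Stripe event to the users-table change described in step 2.
// Return shape and the subscription.updated status precedence are assumptions.
function planUserUpdate(event) {
  const obj = event.data.object;
  switch (event.type) {
    case 'customer.subscription.deleted':
      return {
        customer: obj.customer,
        set: {
          subscription_status: 'free',
          subscription_id: null,
          subscription_plan: null,
          subscription_ends_at: null,
        },
      };
    case 'customer.subscription.updated': {
      let status = 'active';
      if (obj.status === 'past_due') status = 'past_due';
      else if (obj.status === 'canceled' || obj.cancel_at_period_end) status = 'canceled';
      return {
        customer: obj.customer,
        set: {
          subscription_status: status,
          // current_period_end is a Unix timestamp in seconds.
          subscription_ends_at: new Date(obj.current_period_end * 1000),
        },
      };
    }
    case 'invoice.payment_failed':
      return { customer: obj.customer, set: { subscription_status: 'past_due' } };
    case 'customer.source.updated':
    case 'customer.card.updated':
    case 'customer.bank_account.updated':
      return null; // acknowledged and logged, no DB write
    default:
      return null; // unhandled types are acknowledged with no effect
  }
}
```

The sketch assumes `current_period_end` is present on the subscription object, which holds for the 2020-08-27 API version recorded for the Railway endpoint.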

### Customer impact (active in code, ships to prod with v2.0-baseline on May 7)

- Cancellations now sync to DB (users.subscription_status set to free)
- Renewals and plan changes sync
- Failed payments flag past_due
- Payment method updates acknowledged

### Sub-finding status

- F154a (subscription.js dead code): RESOLVED
- F154b (three different subscription_plan strings): NOT addressed in this surgery — Phase 2 polish task
- F154c (missing event handlers): RESOLVED
- F154d (paidSessions in-memory Set): NOT addressed — Phase 3 polish
- F154e (no DB transactions around webhook updates): NOT addressed — Phase 2 polish

### F158 (six unhandled events) status

5 of 6 events now have handlers. The 6th, invoice.payment_succeeded, currently returns 200 with no log line (silent acknowledgment). Renewal billing reconciliation can be added as a small follow-up if needed; not blocking pilot.

### Outstanding from F154 cluster

F157 (GCF backend split-brain) — separate audit task, deferred. Day 17 AM-2.


---

## F162 (HIGH, NEW) — Infrastructure documentation gap

Surfaced during F157 GCF backend hunt on Day 17 AM. The Google Cloud Functions backend at xjobs-app project, us-central1, /apidev/payment/webhook has been receiving real customer Stripe payment_intent events in production for over a year (created Jan 30, 2025). It is currently the only handler for payment_intent.succeeded and payment_intent.payment_failed events in live mode.

Documentation search for this infrastructure component returned only the audit file itself:
  grep -rlE "xjobs-app|cloudfunctions|us-central1" Documentation/ -> only phase1audit.md matches

Translation: aside from the entry written into phase1audit.md last night when we surfaced F157, no prior architecture doc, infrastructure registry, vendor record, or system map mentions the GCF backend. A new engineer, new agent, or six-months-future founder would have no way to know it exists without looking at the Stripe dashboard.

This is a Phase 1 audit gap. The Day 13 audit was scoped to the local codebase (5 architectural layers: browser, frontend, server, data, vendors). It did not include deployed infrastructure inventory. The codebase audit reasonably caught the 5 vendors with code-level dependencies (Anthropic, ElevenLabs, Google, Stripe, Resend). It did not catch the 6th component: a parallel Cloud Functions deployment processing real customer billing events.

### Resolution path

Two items, sequenced:

1. (Today, AM-2 continuation): locate the GCF source code. Audit it. Decide consolidation path (revive on GCF, retire and route events to Express, or rebuild from scratch). Same plan as F157 — F162 is the documentation finding, F157 is the technical finding. Both close together.

2. (Phase 4 or earlier if time permits): create an Infrastructure Registry document. It lists every deployed component, not only the obvious ones (Railway hosting, Postgres, Cloudflare DNS) but also vendors, OAuth integrations, deployed serverless functions, scheduled jobs, third-party webhooks, monitoring agents, and anything else with an externally-routable URL or scheduled trigger. For each: name, owner, source code location, last-known-good state, runtime dependencies. Goal: any future audit can verify reality against documentation in one pass.

### Why this matters for pilot

Pilot launch ships against a v2.0-baseline that is supposed to represent the canonical clean state of the project. If F162 is unresolved at pilot, that baseline silently excludes a billing-critical component. Any future incident touching payment_intent events could surface a backend nobody on the team knows about, has access to, or can fix. Pilot resilience requires this gap closed, even if the GCF code itself ends up retired or rebuilt.


---

## F157, F158, F162 — severity reframed (Day 17 AM)

Founder confirmed during F157 GCF investigation: project has been operating exclusively in Stripe test mode (sandbox) since inception. No real customer payments have ever flowed through any Stripe webhook in this codebase or the GCF backend. Pilot launch (May 26, 2026) is the FIRST time live-mode billing will be active.

Implications for findings opened Day 16 PM:

F157 (split-brain Stripe routing) — severity dropped from CRITICAL to MEDIUM. The Stripe live-mode dashboard shows two webhook endpoints (Railway + GCF), both enabled. But because no live-mode payments have ever been processed, the split-brain routing has had zero customer impact. The architectural finding still holds — at pilot launch the live-mode config must be clean, single source of truth — but it is no longer "real customer money may be affected."

F158 (six unhandled Railway webhook events) — severity dropped from CRITICAL to MEDIUM. The events arriving at Railway have been TEST mode events (from CLI testing or test-mode customer simulations). No real customer subscription state was being silently dropped. Resolved by F154 surgery this morning, regardless.

F162 (infrastructure documentation gap) — severity HOLDS at HIGH. The lesson is independent of whether money flowed: undocumented production infrastructure exists. Pre-pilot it must be inventoried.

### Pre-pilot live-mode billing prep (added to roadmap)

Before May 26 pilot launch, live-mode Stripe configuration must be cleaned:

1. Remove the GCF webhook endpoint from Stripe live-mode (https://us-central1-xjobs-app.cloudfunctions.net/apidev/payment/webhook). It points at infrastructure not owned by the jorgenoelreyes@gmail.com Google account and currently returns HTTP 500s.
2. Verify the Railway live-mode webhook endpoint is configured with the F154-surgery handler set (matching test-mode coverage).
3. Subscribe payment_intent.* events to the Railway endpoint if needed (currently only subscription/checkout events are subscribed in live mode).
4. Document the final live-mode config in an Infrastructure Registry doc (Phase 4 deliverable).

Estimated effort: 1 hour. Schedule for Day 18 (May 7) alongside the v2.0-baseline tag, OR May 25 as final pre-pilot cleanup, whichever the founder prefers.

### Key lesson logged

The GCP project xjobs-app is not visible in the jorgenoelreyes@gmail.com Google Cloud account. It was likely created under a different account, possibly by a previous developer, a forgotten Google Workspace account, or "cousin." Locating who owns it is non-blocking for pilot (test-mode only operation continues until May 26), but should be resolved before live-mode webhook cleanup.


---

## Day 17 PM — F15 surgery COMPLETE (rename approach)

F15 resolved via rename (Path A) rather than deletion. Reasoning: static analysis showed that the finalizeResume at line 4872 was already silently overridden at runtime by the version at line 5950 (JavaScript function hoisting + last-definition-wins). Only one call site exists in the codebase (the setTimeout at line 5934), and it has always called the line 5950 version, so line 4872 was definitively dead code. However, the two functions (68 and 75 lines) follow distinctly different code paths: line 4872 reads from window.onboardingData and loops through job history (Path 1 multi-step builder), while line 5950 reads from builderResumeData and runs ATS scoring (Path 2 template builder). This is paradigm-split code, not duplicate code. F15 is therefore the same finding as F7 (three resume-builder paradigms not yet consolidated), surfaced at the function level.
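
The hoisting + last-definition-wins behavior is easy to reproduce in isolation. In the sketch below the two bodies are stand-ins, not the real finalizeResume code, and the declarations are wrapped in a function so the demo behaves the same under any module system.

```javascript
// Two declarations share a name inside one scope; the later one wins,
// even for a call that appears before both.
function demo() {
  const result = finalizeResume();

  function finalizeResume() {
    return 'Path 1 (first definition)';
  }

  function finalizeResume() {
    return 'Path 2 (second definition)';
  }

  return result;
}

console.log(demo()); // "Path 2 (second definition)"
```

No error or warning is raised at runtime, which is why the override stayed silent until static analysis caught it.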

Surgery:
1. Backup taken: app.html.bak.f15.20260506_051257
2. Line 4872 renamed: function finalizeResume() -> function finalizeResumeOrphan(). Inline comment added flagging the rename + reason.
3. Verified line 5934 (the only call site) and line 5950 (the winning function) untouched.
4. Behavioral guarantee: zero runtime change. Line 5950 won before the rename; line 5950 wins after.

Why rename rather than delete:
- Reversible (deletion is not as easily reversible if we later discover the abandonment was undeserved)
- Doesn't force the F7 paradigm decision (which is queued for Days 19-21 polish)
- Surfaces the orphan visibly to any future reader so cleanup is easier when F7 decision is made
- Closes F15's "silent override" smoking-gun pattern without product implications

What this finding actually reveals:
The cousin-pattern hypothesis (someone fixed finalizeResume in one place, didn't realize a second copy existed, second copy silently overrode the fix) was partially correct. But the deeper truth is that finalizeResume was never one function with two versions — it was two completely different functions belonging to two different resume-builder paradigms that happened to share a name. The "silent override" was JavaScript hoisting + last-definition-wins ratifying a paradigm-abandonment that nobody documented. F7 paradigm consolidation work will surface which path actually shipped to production users and which one was abandoned.

Outstanding F15-adjacent items:
- F14 (selectAlternativeTemplate + viewAlternativeTemplates each defined twice): same Phase 2 pattern, similar surgery applies
- F16 (avaCapitalizeName defined twice, 6 lines apart): likely real duplicate, smaller surgery
- F17 (getInterviewPrice defined twice, lines 9154 + 11854): MEDIUM, scope unclear

These remain on the Phase 2 list. Estimated 30-60 minutes total for the three combined.


---

## Day 17 PM — F16 surgery COMPLETE (delete approach)

F16 (avaCapitalizeName defined twice at lines 6615 + 6621, 6 lines apart) resolved via deletion of the first copy.

Diff confirmed byte-identical functions — pure copy-paste accident, not paradigm split. Deletion was safe because the second copy (line 6621) was what JavaScript runtime used anyway (function hoisting + last-definition-wins). Removing the first copy removes only dead code.

Surgery:
1. Backup taken: app.html.bak.f15.20260506_051257 (covers both F15 and F16, same session)
2. Lines 6615-6620 deleted via Python script with strict expected-content check
3. Verified: single avaCapitalizeName definition now exists, at line 6615 (shifted up 6 lines after deletion)
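
The session used a Python script for the deletion; a JavaScript analogue of the same "strict expected-content check" idea is sketched below. `deleteLinesChecked` is a hypothetical helper, with 1-indexed inclusive line numbers to match editor line numbers.

```javascript
// Sketch of a guarded line deletion: refuse to cut unless the target
// lines contain exactly what the caller expects.
function deleteLinesChecked(text, start, end, expected) {
  const lines = text.split('\n');
  const actual = lines.slice(start - 1, end);
  const matches =
    actual.length === expected.length &&
    actual.every((line, i) => line === expected[i]);
  if (!matches) {
    throw new Error(`Lines ${start}-${end} do not match expected content; aborting`);
  }
  lines.splice(start - 1, end - start + 1);
  return lines.join('\n');
}
```

Because the function returns a new string and throws before modifying anything, a wrong line number leaves the file on disk untouched.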

Confidence: 99%. Byte-identical duplicates carry no behavioral risk on deletion.

Time elapsed: ~10 minutes. F15 (rename) + F16 (delete) together in ~25 minutes of focused work.


---

## Day 17 PM — F17 reframed (NOT a duplicate-function bug)

F17 (getInterviewPrice defined twice, lines 9148 + 11849) investigated and found NOT to be a duplicate-function code-cleanup task. It is a real product finding: two different pricing models coexist for interview coaching.

The two models:

Line 9148 (global): returns pricingTiers[userSalaryTier].interviewPrice
- Salary-tier pricing (basic / professional / executive based on userSalaryTier)
- Vestigial of an earlier pricing strategy

Line 11849 (nested inside interview gap-selection function): returns 5 / 10 / 15 based on selected gap count (1-3, 4-6, 7+)
- Gap-count pricing — matches backend subscription.js pricing dispatch
- CANONICAL per founder decision (Day 17 PM): "we started thinking we could charge $5 for the first 3 gaps, increase another $5 for the next 3 and finally do an all you can eat for $15"

Both are actively called:
- 9148 (global) called at lines 1775, 12060, 13302, 13925 (outside the parent function of the nested version)
- 11849 (nested) called at lines 11858, 11934 (inside its parent function only)

The 11849 version is NOT a duplicate of 9148. It is a legitimately scoped local override returning a completely different result (gap-count vs salary-tier).

Implication: users in the interview gap-selection flow see gap-count pricing. Users in OTHER UI contexts (data-price-interview elements at line 1775, resume-display screens, general feature pricing dispatch in featurePriceMap at line 13925) see salary-tier pricing. Backend charges gap-count. **Therefore: there is likely a visible price-display inconsistency in the product right now.** Some screens may show one price, the gap-selection screen shows another, and the actual charge matches the gap-count model.

Resolution path (canonical model now confirmed):

1. ✅ Pricing strategy decided: gap-count ($5 / $10 / $15)
2. Migration: replace all 4 global call sites with the gap-count function logic
3. Hoist the gap-count function from nested scope to global scope (rename to getInterviewBundlePrice or similar)
4. Delete the now-orphaned salary-tier global
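
A hoisted, global version of the gap-count model might look like the sketch below. The name `getInterviewBundlePrice` is the candidate from step 3. Rejecting counts below 1 is an assumption, since the source does not say what a zero-gap selection should cost.

```javascript
// Sketch of the canonical gap-count pricing (USD): 1-3 gaps = 5, 4-6 = 10, 7+ = 15.
// Rejecting non-positive or non-integer counts is an assumption.
function getInterviewBundlePrice(gapCount) {
  if (!Number.isInteger(gapCount) || gapCount < 1) {
    throw new RangeError('gapCount must be a positive integer');
  }
  if (gapCount <= 3) return 5;
  if (gapCount <= 6) return 10;
  return 15;
}
```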

Estimated effort post-decision: 30-60 minutes. Was a 1-2 hour task before the product call clarified canonical pricing.

F17 status: severity HIGH (visible user-facing pricing inconsistency). Resolution: scoped, deferred to a separate clean session. NOT today.


---

## Day 18 AM — Phase 2.5 regression audit (in progress)

### Sprint commit audits — 3 of 15 complete

#### #1 (be6f790, SECURITY: remove .env fallback in /gmail/oauth2callback) — VERIFIED INTACT
HEAD api.js line 443 has the security fix in place. Anonymous Gmail flow correctly logs warning and skips token save. No regression.

#### #2 (7772dc8, OAuth callback env var) — VERIFIED INTACT
HEAD auth.js lines 306 + 330 use `process.env.GOOGLE_CALLBACK_URL || <localhost-fallback>` pattern in both call sites. No regression.

#### #3 (8e63647, null guard in launchDashboard) — VERIFIED INTACT + REGRESSION FOUND
The original fix at HEAD line 10257 is intact. However, audit surfaced TWO additional unguarded call sites of renderATSCompliance(resumeData.atsScore) at lines 9523 and 10638 that need the same defensive treatment. Logged as F163 below.

### F163 (NEW, MEDIUM, RESOLVED) — Two unguarded renderATSCompliance call sites surfaced by Phase 2.5

The April 29 fix (commit 8e63647) added a null guard at the launchDashboard call site to prevent crashes when first-time Gmail users reach the dashboard before uploading a resume (resumeData is null). The same defensive pattern was needed at two other call sites that escaped the original fix:

- Line 9523: dashboard render path (after loadUserBarUsage). resumeData could be null in same first-time Gmail flow that the original fix addressed. Real regression risk.
- Line 10638: setupSalaryTierDetection input event handler. resumeData likely non-null in practice but no guarantee in async event handlers; defensive guard appropriate.

Surgery applied:

Both lines transformed from:
  renderATSCompliance(resumeData.atsScore);
to:
  if(resumeData && resumeData.atsScore != null) renderATSCompliance(resumeData.atsScore);

Backup taken: app.html.bak.f163.20260506_111437

Verification: grep confirms 3 of 4 call sites now guarded (line 13641 uses a different variable newScore in a different code path, legitimately not part of F163 scope).
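
One way to keep future call sites from drifting is to route them through a shared guard. This is a sketch, not what was applied: `renderIfScored` is a hypothetical helper, and `render` stands in for renderATSCompliance.

```javascript
// Sketch: one guard shared by every renderATSCompliance call site.
function renderIfScored(resumeData, render) {
  // `!= null` rejects both null and undefined but still allows a score of 0.
  if (resumeData && resumeData.atsScore != null) {
    render(resumeData.atsScore);
    return true;
  }
  return false;
}
```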

#### Audit cadence note

Phase 2.5 has surfaced 1 active regression in 3 commits audited. If the rate holds across the remaining 12 commits, expect ~3-4 more findings over the next 90 minutes of audit work.


---

## Day 18 PM — F13 inspection update (deletion deferred)

### F13 reframed — duplicate region is larger than originally catalogued

Day 13 audit recorded F13 as a "7-line exact duplicate" at lines 14914-14920 vs 14953-14959. Day 18 PM byte-level verification confirmed those 7 lines are byte-identical (diff returned empty). However, surrounding code inspection revealed the duplicate region extends well beyond the 7 lines:

- `updateSplashStep` function defined twice (the original 7-line catch, body of the function)
- `hideGmailSplash` function defined twice in the same region
- `showGmailSplash` function defined twice in the same region

The full extent of the duplicated region is approximately lines 14952 through 14975+ (exact end-of-duplicate boundary not yet determined). What looked like a 7-line zombie is more likely a 20+ line zombie covering 3 splash-related function definitions in sequence.

### Why deletion is deferred

Same lesson the audit logged for F5 on Day 16: surgical assumptions need byte-level verification before cutting. F13's audit catch was correct but understated the scope. Comprehensive deletion of the whole duplicate region requires careful boundary identification, exactly the kind of careful reading that degrades with end-of-day fatigue.

Deletion deferred to Day 19 (fresh head) or Day 21 polish window. Plan:
1. Identify exact start + end lines of the second duplicate block
2. Verify by byte-level diff that the entire block is byte-identical to the first occurrence
3. Confirm no unique code is sandwiched inside (i.e., the duplicate block has no "second-copy-only" content)
4. Backup, delete, grep verify, commit

### Phase 1.5 progress unchanged

F13 status: still PENDING. Phase 1.5 progress remains at ~50% (no surgery completed Day 18 PM beyond F163 catch which falls under Phase 2.5).

### Lesson — second time we've underestimated a Phase 1.5 finding

Day 16: F5 ("safe delete") turned out to have divergent code paths — deferred. F9, F11 retroactively flagged as same pattern.
Day 18: F13 ("7-line duplicate") turned out to be 20+ line duplicate — deferred.

Pattern: Day 13 audit catalogued the surface signal (duplicate ID, byte-identical block) but did not always inspect the full extent or surrounding context. Phase 1.5 surgical work needs an extra pre-surgery investigation step — not just "is this a duplicate?" but "what is the FULL extent of the duplicate region, and is there any non-duplicate code mixed in?"

Carry-forward Phase 1.5 candidates (all need re-inspection before surgery):
- F1 — quota display 3rd duplicate (suspected, not confirmed)
- F5 — templatesModal divergent code paths (deferred to Phase 2)
- F7 — competing resume-builder paradigm (suspected major zombie)
- F9 — jobSourceSelector (likely F5 pattern)
- F10 — ATS-check duplication (suspected)
- F11 — regWallError (likely F5 pattern)
- F13 — splash functions duplicate region (boundaries to be determined)


---

## Day 18 PM (continued — late session) — F13 reframed AGAIN

### Third inspection round confirms F13 is F5-pattern, not safe-delete

Day 13 audit: F13 logged as "7-line exact duplicate, splash progress."
Day 18 PM (earlier): inspection revealed duplicate region extends to ~3 functions, deferred for fresh-head boundary identification.
Day 18 PM (late session): boundary identification proceeded with the cleaner-energy approach. Sanity-checked Python patch caught a line-counting error and aborted before any damage. Re-investigation with precise grep revealed the duplicate region contains THREE function definitions, not two: showGmailSplash, updateSplashStep, hideGmailSplash — all duplicated.

Critical finding: byte-level diff between the two showGmailSplash definitions (lines 14893 vs 14932) returned NON-EMPTY output. The two copies are NOT byte-identical:

- First copy (line 14893+): uses `+ 'string'` syntax (operator at START of next line)
- Second copy (line 14932+): uses `'string' +` syntax (operator at END of previous line)

Same runtime behavior, different code style. This is the F5 pattern — two stylistic versions of the same function coexisting. Whichever appears later in the file wins at runtime via JS hoisting + last-definition-wins. Deleting the second copy means silently reverting to the first version's syntax, possibly throwing away an intentional edit.

The other two duplicates (updateSplashStep at 14913 vs 14952, hideGmailSplash at 14926 vs 14965) ARE byte-identical. But because they live inside the same duplicate region as the divergent showGmailSplash, they cannot be cleanly excised without strategic decision on which entire trio is canonical.

### F13 reclassified — deletion deferred to Phase 2 cohort

F13 moves from "byte-identical zombie, surgical delete" to F5-pattern: divergent code paths requiring code archaeology to identify canonical version. Bundle with F5/F9/F11 Phase 2 cohort.

Phase 1.5 progress unchanged at 50%. No surgery completed.

### Lesson — Day 13 audit catalog systematically understated complexity

Three instances (two confirmed, one suspected by extension):
- F5 (Day 16): "safe delete" → divergent code paths discovered → deferred
- F11/F9 (Day 16, by extension): same pattern suspected, deferred
- F13 (Day 18): "7-line exact duplicate" → 3-function region with one divergent definition → deferred

Pattern: the Day 13 audit identified surface signals (duplicate IDs, byte-identical small blocks) but did not always inspect:
1. Full extent of duplicate region (F13)
2. Surrounding function definitions in same region (F13)
3. Whether all duplicates within a region are truly byte-identical (F13)
4. Whether duplicates contain divergent code that produces identical runtime output (F5, F13)

### Implication for remaining Phase 1.5 candidates

Every finding in the carry-forward list (F1, F5, F7, F9, F10, F11, F13) now requires the same multi-step pre-surgery investigation:

1. Grep all occurrences of the duplicated symbol/ID across the file
2. Identify exact line ranges of each occurrence
3. Byte-level diff between every pair of occurrences
4. If any pair returns non-empty diff: reclassify as F5-pattern, defer
5. If all pairs are byte-identical: examine surrounding context for unique code mixed in
6. Only proceed to surgical deletion if all checks pass

Estimated time per finding: 15-25 minutes of investigation BEFORE any deletion.
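
Steps 1 through 4 could be partly automated. The sketch below is a hypothetical helper, not the tooling used in this audit (which relied on grep, diff, and a Python patch script). It assumes plain `function name(` declarations, uses naive brace counting that ignores braces inside strings and comments, and treats string equality as a stand-in for a byte-level diff.

```javascript
// Find every `function <name>(` declaration and capture its text by brace counting.
function extractFunctionBlocks(source, name) {
  const marker = 'function ' + name + '(';
  const blocks = [];
  let from = 0;
  while (true) {
    const start = source.indexOf(marker, from);
    if (start === -1) break;
    const open = source.indexOf('{', start);
    if (open === -1) break;
    let depth = 0;
    let end = -1;
    for (let i = open; i < source.length; i++) {
      if (source[i] === '{') depth++;
      else if (source[i] === '}') {
        depth--;
        if (depth === 0) { end = i + 1; break; }
      }
    }
    if (end === -1) break; // unbalanced braces: stop rather than guess
    const line = source.slice(0, start).split('\n').length; // 1-indexed start line
    blocks.push({ line, text: source.slice(start, end) });
    from = end;
  }
  return blocks;
}

// Step 4: any difference between copies means "reclassify as F5-pattern, defer".
function classifyDuplicates(source, name) {
  const blocks = extractFunctionBlocks(source, name);
  const lines = blocks.map((b) => b.line);
  if (blocks.length < 2) return { name, lines, verdict: 'not-duplicated' };
  const identical = blocks.every((b) => b.text === blocks[0].text);
  return { name, lines, verdict: identical ? 'identical' : 'divergent' };
}

// Made-up sample mirroring the F13 shape: one divergent pair, one identical pair.
const sample = [
  "function showGmailSplash() {",
  "  return 'a'",
  "    + 'b';",
  "}",
  "function hideGmailSplash() { return null; }",
  "function showGmailSplash() {",
  "  return 'a' +",
  "    'b';",
  "}",
  "function hideGmailSplash() { return null; }",
].join('\n');

const showReport = classifyDuplicates(sample, 'showGmailSplash');
const hideReport = classifyDuplicates(sample, 'hideGmailSplash');
console.log(showReport.verdict, showReport.lines); // divergent [ 1, 6 ]
console.log(hideReport.verdict, hideReport.lines); // identical [ 5, 10 ]
```

An 'identical' verdict only clears step 4. Step 5 (unique code mixed into the surrounding region) still needs a human read.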

### What survived this round of inspection

The discipline did. Three sanity-check abort points worked exactly as designed:
- Sanity check on the Python patch caught the line-counting error (no file modified)
- Byte-level diff check caught the F5-pattern in showGmailSplash (no deletion attempted)
- Audit pattern recognition mapped this to F5 instead of pushing through (no commit made)

Phase 2.5 audit framework + Phase 1.5 surgical discipline proved themselves. Slow is fast.


---

## Day 18 PM (continued) — Late-night session 22:45 ET → 01:30 ET (May 6 → May 7)

Session continuation after Maria's handoff at 22:44. Founder-led JWT investigation. Resolved that thread plus one MEDIUM/HIGH side-finding plus one diagnostic false-alarm worth recording for the methodology.

Session opener: founder requested an audit-of-the-audit on Maria's handoff before any code work. Audit-of-audit produced four findings worth carrying forward (F179 fix shipped on a diagnosis later invalidated; F82/F83 latent JWT findings sitting open since Day 15; pre-pilot CRITICAL security stack accumulating with discovery outpacing closure; meta-finding that Day 13 audit catalog has systematically understated complexity three times).

### F181 (NEW, CRITICAL, RESOLVED) — Recurring Gmail OAuth identity bypass — server-side fix supersedes all frontend click-site patches

Diagnosis sequence:

1. F82 hypothesis (JWT_SECRET fallback mismatch) inspected. Confirmed real divergence — auth.js line 11 falls back to 'dev-secret', auth-middleware.js line 4 falls back to 'xjobs_dev_secret', api.js inlines 'dev-secret' seven times. Real inconsistency but latent in production (Railway env var is set, all fallbacks bypassed). Not the active bug.

2. F83 hypothesis (payload mismatch between two jwt.sign sites) inspected. Both signers carry id and email. Payloads differ (auth.js adds name, auth-middleware.js adds role) but neither omits the field the Gmail flow needs. Real divergence, not the active bug.

3. Active bug located at server/api.js line 625 (the /gmail/authorize route handler). The route reads userId from query string with empty-string fallback. No JWT verification. No auth middleware. Trusted blindly. Six call sites in app.html each manually pass ?userId=<X>; missing one click site's parameter silently produces orphan tokens. Maria's F179 fix patched line 10745. Five other sites still vulnerable.

Surgery applied (server-side fix that supersedes any frontend patch):

server/api.js line 625 — replaced 15-line block with 35-line JWT-verifying version. Route now reads JWT from ?token= query param. Verified via jwt.verify(token, process.env.JWT_SECRET || 'dev-secret'). userId derived from decoded.id. Three failure modes redirect cleanly: missing token to /?error=login_required, missing id to /?error=session_invalid, invalid token to /?error=session_expired.
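
The decision logic of the new route can be sketched as a pure function. This is an illustration under assumptions, not the shipped 35-line block: resolveGmailAuthorize and fakeVerify are hypothetical names, and the verifier is injected so the sketch runs without a JWT library. In the real route the verifier would wrap the jwt.verify call quoted above, which (assuming jwt is the jsonwebtoken package) throws on an invalid or expired token when called without a callback.

```javascript
// Returns either { userId } or { redirect }, mirroring the three failure modes.
function resolveGmailAuthorize(query, verifyToken) {
  const token = query && query.token;
  if (!token) return { redirect: '/?error=login_required' };

  let decoded;
  try {
    decoded = verifyToken(token); // expected to throw on a bad or expired token
  } catch (err) {
    return { redirect: '/?error=session_expired' };
  }

  if (!decoded || !decoded.id) return { redirect: '/?error=session_invalid' };

  // userId comes from the verified payload, never from the query string.
  return { userId: decoded.id };
}

// Hypothetical stand-in verifier for the demo.
function fakeVerify(token) {
  if (token === 'good') return { id: 38, email: 'user@example.com' };
  if (token === 'no-id') return { email: 'user@example.com' };
  throw new Error('invalid signature');
}

console.log(resolveGmailAuthorize({}, fakeVerify));                 // login_required
console.log(resolveGmailAuthorize({ token: 'no-id' }, fakeVerify)); // session_invalid
console.log(resolveGmailAuthorize({ token: 'bad' }, fakeVerify));   // session_expired
console.log(resolveGmailAuthorize({ token: 'good', userId: 99 }, fakeVerify)); // { userId: 38 }
```

The last call shows the point of the fix: a stray ?userId= in the query is ignored because identity is read only from the verified token.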

app.html — five frontend call sites updated via single sed pass to send ?token=<JWT> instead of ?userId=<id>:
- Line 3185: connectGmail() function — localStorage.getItem('xjobs_token')
- Line 10113: startGmailOAuth() function — localStorage.getItem('xjobs_token')
- Line 10747: popup window.open from Gmail step 4 — localStorage.getItem('xjobs_token')
- Line 13145: login flow that just received data.token — data.token (in scope)
- Line 14103: setTimeout-chained pending Gmail connect — localStorage.getItem('xjobs_token')

Line 13077 left untouched — comment, not executable code.
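
One way to keep the click sites from drifting again would be a single URL builder that all of them call. This is a hypothetical helper, not code that exists in app.html; the path constant is an assumption, since the source names the route only as /gmail/authorize.

```javascript
// Assumed path; the real mount point (for example an /api prefix) is not stated here.
const GMAIL_AUTHORIZE_PATH = '/gmail/authorize';

// Returns the authorize URL, or null when there is no usable token,
// so a caller can send the user to login instead of opening a broken popup.
function buildGmailAuthorizeUrl(token, basePath = GMAIL_AUTHORIZE_PATH) {
  if (typeof token !== 'string' || token.length === 0) {
    return null;
  }
  return basePath + '?token=' + encodeURIComponent(token);
}

console.log(buildGmailAuthorizeUrl('abc.def.ghi')); // /gmail/authorize?token=abc.def.ghi
console.log(buildGmailAuthorizeUrl(null));          // null
```

A call site would then pass either localStorage.getItem('xjobs_token') or the in-scope data.token, matching the two token sources listed above.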

Backups taken: server/api.js.bak.gmailauth.20260506_231525, app.html.bak.gmailauth.20260506_231525, app.html.bak.f171.20260507_002245, server/api.js.bak.WORKING_VERIFIED.20260507_010346, app.html.bak.WORKING_VERIFIED.20260507_010346. Git tag pre-jwt-f171-push placed on origin/main as instant-rollback point.

Verification: Server boots clean. Gmail connect shows Google account picker (not cold sign-in form). OAuth completes, returns to app. gmail_tokens table receives row with correct user_id and token_len > 0.

Confidence: High — 95%. Fix moves trust boundary from client to server. Cannot recur from frontend drift because server refuses to issue OAuth URLs for unauthenticated requests, regardless of what any click site does.

What this finding closes: F179 (Maria's frontend-only fix at line 10745) — superseded; piece of F74 family (the route had no auth check).

What it leaves open: F82 (JWT_SECRET fallback divergence) and F83 (generateToken payload divergence) — both latent, deferred post-pilot.
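
When F82 is picked up, one possible shape is a single resolver that every signer and verifier imports. A sketch under assumptions: resolveJwtSecret is a hypothetical name, the NODE_ENV check is an assumption about how production is detected, and 'dev-secret' is chosen only because it is the more common of the two existing fallbacks.

```javascript
// Single source of truth for the JWT secret, so fallbacks cannot diverge.
function resolveJwtSecret(env) {
  if (env.JWT_SECRET) return env.JWT_SECRET;
  if (env.NODE_ENV === 'production') {
    // Fail loudly rather than sign production tokens with a known string.
    throw new Error('JWT_SECRET must be set in production');
  }
  return 'dev-secret'; // one shared development fallback
}

console.log(resolveJwtSecret({ JWT_SECRET: 'from-railway' })); // from-railway
console.log(resolveJwtSecret({ NODE_ENV: 'development' }));    // dev-secret
```

In server code the argument would be process.env.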

### F171 (RESOLVED) — Double Gmail scan on OAuth return

Logged in Maria's Day 18 handoff. The OAuth-return code path at app.html line 10835+ called fetchGmailJobs() to "preview" jobs, then conditionally called startGmailScan() which itself calls fetchGmailJobs() internally. Result: every Gmail connect produced two consecutive full-inbox scans, doubling Gmail API quota cost per user.

Confirmed via server log: "Gmail scan complete: 6 jobs from 50 emails" appearing twice consecutively for the same user_id with the same filter.

Surgery applied via Python script with strict expected-content match. Old code (39 lines, lines 10835-10873): setInterval waiting for token, then async IIFE doing fetchGmailJobs(), then conditional startGmailScan() in success branch AND a redundant boards-fallback branch that ALSO ended with startGmailScan(). Two paths to scan #2.

New code (14 lines): setInterval waiting for token, then directly call startGmailScan(). Lets the canonical entry point handle its own splash UI, fetch, boards fallback, matching, and TTS.

Net change: -25 lines.
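
The double scan is easy to reproduce with a counter. The sketch below uses simplified synchronous stand-ins (the real functions are in app.html and are presumably async); makeScanner, oldOAuthReturn, and newOAuthReturn are hypothetical names for illustration.

```javascript
// Stand-in scanner that counts how many full-inbox fetches happen.
function makeScanner() {
  let fetches = 0;
  const fetchGmailJobs = () => { fetches++; return ['job-1']; };
  // The canonical entry point fetches internally, as described above.
  const startGmailScan = () => fetchGmailJobs();
  return { fetchGmailJobs, startGmailScan, count: () => fetches };
}

// Old shape: preview fetch, then the canonical scan fetches again.
function oldOAuthReturn(scanner) {
  const preview = scanner.fetchGmailJobs();
  if (preview.length > 0) scanner.startGmailScan();
}

// New shape: hand off to the canonical entry point and nothing else.
function newOAuthReturn(scanner) {
  scanner.startGmailScan();
}

const oldRun = makeScanner();
oldOAuthReturn(oldRun);
const newRun = makeScanner();
newOAuthReturn(newRun);
console.log(oldRun.count(), newRun.count()); // 2 1
```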

Verification: server log on next connect showed ONE "Gmail scan complete" line. Browser dashboard rendered with varied match scores (85%, 80%, 80%, 63%) and skill-by-skill green/red breakdowns intact.

### F180 (NEW, INVESTIGATIVE FINDING — methodology-relevant) — Stale app_*.html files in Downloads cause diagnostic false alarms

After F171 ship, founder's smoke test showed all jobs scoring 50% with 0/N skills breakdowns. Initial reaction: "matching is broken and has never failed before, please fix tonight."

Investigation: server-side audit ruled out the JWT fix and F171 fix as causes (diff showed 99 lines of changes were entirely in /gmail/authorize route + 5 frontend click sites + F171 OAuth-return block — none touch matching, scoring, or resume parsing). git diff origin/main -- app.html | wc -l returned 99, confirming local matching code byte-identical to production HEAD c015542. Founder ran prod smoke test and reported "broken in prod as well."

Then founder reloaded against the LOCAL server and got real working output: [DEBUG] scoreJob - requirements: 4 resumeSkills: 119 gaps: 2. Matching working perfectly with 119 skills extracted, varied gap counts, 3 jobs above 80% / 2 below.

Root cause: prior smoke test had been running against file:///Users/jorgereyes/Downloads/app_20.html — a copy of app.html from March 27, 2026. Six weeks old. Stale API contracts. Browser had loaded the old file via file:// instead of localhost:4000.

Investigation rabbit hole burned ~30 minutes. Entirely avoidable.

Methodology lesson: when symptom is "this used to work and now doesn't" and the diff against origin/main is small, first-line check is to confirm URL bar shows localhost:4000 or production URL, NOT a file:// path. Costs 2 seconds. Saves hours.
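
The URL-bar check can also be done from the DevTools console. A minimal sketch: describePageOrigin is a hypothetical helper, and the expected hosts are the two named in this document (localhost:4000 and the production Railway host).

```javascript
const EXPECTED_HOSTS = ['localhost:4000', 'xjobs-final-production.up.railway.app'];

// Classifies a page URL as served-by-expected-host, loaded-from-disk, or unknown host.
function describePageOrigin(href, expectedHosts = EXPECTED_HOSTS) {
  const url = new URL(href);
  if (url.protocol === 'file:') {
    return { ok: false, reason: 'loaded from disk: ' + url.pathname };
  }
  if (!expectedHosts.includes(url.host)) {
    return { ok: false, reason: 'unexpected host: ' + url.host };
  }
  return { ok: true, reason: 'served by ' + url.host };
}

console.log(describePageOrigin('file:///Users/jorgereyes/Downloads/app_20.html').ok); // false
console.log(describePageOrigin('http://localhost:4000/app.html').ok);                 // true
```

In the browser the argument would be window.location.href.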

Cleanup recommendation: Day 19 housekeeping task — rm ~/Downloads/app_*.html.

### Process findings worth recording

The "fix at one click site" anti-pattern is now confirmed the dominant codebase bug-class. F5, F13, F163, F168, F179 have all manifested it. F181's resolution this session was different in kind: instead of patching the latest click site, the trust boundary moved from client to server, eliminating the recurrence vector entirely. Worth promoting "move the validation server-side" as the canonical answer for any future multi-call-site auth/identity finding.

Diagnostic discipline held under fatigue. Session ran 6 hours past Maria's 22:44 handoff close. Two near-misses where the founder pushed for fast-and-wrong (push without testing in prod first; chase the matching bug at 1am) were redirected to the slower-but-correct path. The redirect cost 5 minutes of pushback and saved at least 60 minutes of avoidable work.

Audit-of-audit produces real findings. The session opener pass on Maria's handoff surfaced four signals all subsequently validated. Audit-of-audit is cheap and high-yield.

### Late-session governance work (post-fix)

After the code work was verified, session pivoted to creating two permanent governance artifacts at repo root:

- **AGENT-ONBOARDING.md** (256 lines) — the contract every future agent reads first. Codifies founder communication preferences, artifact conventions, the 5 recurring patterns to avoid, session opener/closer templates, the audit framework discipline, and Section 17 (production state to document each session — GitHub + Railway sub-fields).
- **HANDOFF.md** (292 lines) — LIVING handoff document. CURRENT STATE block at top (snapshot, overwritten each session). SESSION LOG below (append-only, newest at top). Each session entry includes GitHub action, Railway action, production verification, what shipped, what I got wrong, what I got right, open thread for next agent, files touched. Backfilled with last 6 sessions of work.

Why these were needed: founder has been re-explaining context to every new agent for 18+ days. Documentation-as-memory pattern fills the cross-session memory gap. Session-log "what I got wrong" entries are explicitly institutional memory, not performance review — agents document their own mistakes honestly so the next agent doesn't repeat them.


### F182 (NEW, MEDIUM, OPEN) — Incognito session leak across fresh windows

Day 19 PM smoke test surfaced this. Founder opened a "fresh" incognito window, navigated to production, and was authenticated as user 38 (created May 1, 6 days old). No login attempted; session was already active. Confirmed via DevTools: window.currentUser?.id returned 38.

Origin tracing: Maria's Day 18 PM handoff logged this as F177 with status "INVESTIGATE — initially flagged, then dismissed as tab confusion, then re-observed." Tonight's reproduction confirms it is NOT tab confusion. State persists somehow across incognito windows.

Most likely root cause: Chrome's multi-window-incognito state sharing. When multiple incognito windows are open simultaneously, they share localStorage and cookies. Closing one window does not clear state for the others. A "fresh" incognito window opened while another is still open inherits the existing session.

Less likely but possible: a long-lived cookie or URL parameter silently carrying the JWT. Worth checking with a truly cold incognito session (all incognito windows closed first, then Cmd+Shift+N), but the founder did not need to re-test because the hypothesis was sufficient to explain the observed behavior.

Pilot impact: LOW. Testers will not be stacking incognito windows during real testing.

Post-pilot consideration: an explicit /logout endpoint that clears the server-side session AND issues a Set-Cookie header with Max-Age=0 for any cookies the app sets, so users have a clear "log out" path that doesn't depend on browser session lifecycle. Maria's audit already had F55 logged for "missing session housekeeping pattern"; this is the same architectural gap.

### F183 (NEW, CRITICAL, OPEN — pre-pilot blocker) — Resume state lost during Gmail OAuth round-trip

REVISED from initial framing. Original framing assumed "user reached dashboard without uploading a resume" was the underlying state. Founder pushed back: signup flow enforces resume upload, so that case shouldn't exist. Re-diagnosis with founder's lead produced the correct framing below.

Symptom (visible): results dashboard shows all jobs at 50% match, skills breakdowns 0/N across the board. Same visual as F180 (stale file) but with a different root cause.

Diagnosis (founder-led, agent confirmed via console + psql):
- Fresh user signs up → uploads resume → resume parse succeeds → window.resumeData populated in memory.
- User clicks Connect Gmail → OAuth round-trip (popup or redirect to Google → consent → callback to server → return to app).
- Somewhere in the OAuth round-trip, the resume reference is dropped. By the time the dashboard renders results, window.resumeData is null.
- Matching engine correctly defaults to 50% / 0/N because there is nothing to compare jobs against.
- DB symptom: user 38 (jorgenoelreyes+test1@gmail.com, May 1) shows has_resume_in_db = false despite a signup flow that enforces resume upload. Suggests either DB write failed silently OR resume was written to a different user_id than the one currently authenticated post-OAuth.

This is the same architectural pattern as F181: state crossing a trust boundary (client → server → Google → server → client) and not surviving the round-trip. F181 was about user identity; F183 is about resume reference. Same bug class, different payload.

Four possible mechanisms (to be disambiguated by upcoming console + psql diagnostic):

1. Full page reload after OAuth callback wipes window.resumeData (in-memory only, page doesn't re-fetch on init).
2. OAuth popup completion triggers a state refresh in the main window that resets resumeData.
3. Server-side session containing the resume reference rotates during OAuth, orphaning the resume.
4. Pre-OAuth signup creates a temporary user_id with the resume; OAuth callback creates a "real" user with the Gmail email; the resume is stored on the orphan user_id, the JWT now points to the new user.

Hypothesis 1 is the most common in single-page-app + OAuth patterns. Hypothesis 4 is the most consistent with user 38 having no DB resume despite a forced-upload flow.

Pilot impact: CRITICAL. Every fresh-user pilot day-1 flow is affected. Testers will sign up Friday May 8, complete resume upload, connect Gmail, and see "matching broken" output. They will report it as a bug. They will be correct. F183 has to be fixed before tester re-engagement.

Diagnostic plan (next session):
1. Fresh signup → resume upload. Console check: window.resumeData populated? psql check: resume row in DB attached to current user_id?
2. Click Connect Gmail. Complete OAuth. Land on results.
3. Console check: window.resumeData? localStorage state? user_id?
4. psql check: same user_id has resume row? OR different user_id created during OAuth?

Outcome of step 1 vs step 3 vs step 4 distinguishes the four hypotheses. Fix shape becomes obvious from there.

Status: priority Day 19 PM continued or Day 20 AM. Pre-pilot blocker. Audit-of-audit lesson: the original F183 framing assumed a state that should not exist (user reaching dashboard without resume); founder caught it; re-diagnosis produced the correct framing. Worth noting that initial agent framings can subtly mislead — founder's domain knowledge is the corrective.


### F184 (NEW, CRITICAL pre-pilot blocker, OPEN) — Signup flow does not enforce resume upload

Day 19 PM continued surfaced this. User 39 (jorge+pilot5@gmail.com, created May 8 00:42 UTC) reached the dashboard and matching results without ever uploading a resume. psql confirmed has_resume_in_db = false. Founder explicitly stated the intended flow: "you cannot go to the next page unless you create or upload a resume." Current production does not enforce this.

Likely cause: the gate that should block dashboard navigation when no resume is present is either missing, conditionally bypassed, or runs after dashboard render has already started.

Pilot impact: CRITICAL. Combined with F183 (localStorage activeResume not user-scoped), a fresh tester can sign up, skip resume upload, reach the dashboard, and see a stranger's leftover resume data populating the UI. Worst-case privacy + worst-case UX.

Fix shape: hard gate at the dashboard entry point (or post-signup routing). User without DB resume row -> redirect to upload flow. No bypass.
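
A fail-closed version of that gate could look like the sketch below. The function name, the hasResumeInDb field, and both route strings are hypothetical; the real entry point and routes in app.html are not identified yet.

```javascript
// Fail closed: anything other than an explicit `true` goes to the upload flow.
function routeAfterAuth(user) {
  if (!user || user.hasResumeInDb !== true) {
    return '/upload-resume'; // hypothetical route name
  }
  return '/dashboard'; // hypothetical route name
}

console.log(routeAfterAuth({ id: 39, hasResumeInDb: false })); // /upload-resume
console.log(routeAfterAuth({ id: 38 }));                       // /upload-resume
console.log(routeAfterAuth({ id: 40, hasResumeInDb: true }));  // /dashboard
```

Treating a missing or undefined flag the same as false is the design choice that matters: an unknown state never reaches the dashboard.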

Status: pre-pilot blocker. Day 20 work.

### F185 (NEW, MEDIUM, OPEN) — Dashboard renders with stale data when resume missing

Day 19 PM continued. When the system detects "resume not available" mid-flow, it surfaces the message but ALSO renders the dashboard with leftover localStorage data instead of sending the user back to upload. Founder reported seeing the "resume not available" message both today and last night: the system has been correctly DETECTING the issue but proceeding to render anyway.

Pilot impact: MEDIUM. Combined with F183 and F184, contributes to misleading UX. Standalone, it's a fail-safe that fails open.

Fix shape: when "resume not available" detected, halt dashboard render. Redirect to upload step. Do not let the error message be informational only.

Status: pre-pilot fix candidate. Bundled with F183/F184 surgery.

### Diagnostic plan for F183 (saved as Documentation/diag-resume-flow.js)

A monkey-patch instrumentation script has been saved to Documentation/diag-resume-flow.js. It hooks 7 functions in app.html (connectGmail, fetchGmailJobs, startGmailScan, startGmailOAuth, launchDashboard, autoFlow, runMatching) and logs breadcrumbs at ENTRY and EXIT of each. Each breadcrumb captures: window.resumeData state, window.currentUser id, localStorage activeResume email, timestamp delta from start.
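
For readers without the script open, the hooking technique looks roughly like this. It is an illustrative sketch, not the contents of diag-resume-flow.js: hookFunction, the breadcrumb text, and the fake window object are all assumptions.

```javascript
// Replace target[name] with a wrapper that logs state before and after the call.
function hookFunction(target, name, snapshot, log) {
  const original = target[name];
  if (typeof original !== 'function') return false; // nothing to hook
  target[name] = function hooked(...args) {
    log('[BC ENTRY] ' + name + ' ' + JSON.stringify(snapshot()));
    try {
      return original.apply(this, args);
    } finally {
      // Runs on normal return and on throw, so the EXIT breadcrumb is never lost.
      log('[BC EXIT] ' + name + ' ' + JSON.stringify(snapshot()));
    }
  };
  return true;
}

// Fake window whose connectGmail drops the resume, to show the gap appearing.
const fakeWindow = {
  resumeData: { skills: ['sql'] },
  connectGmail() { this.resumeData = null; return 'redirecting'; },
};
const breadcrumbs = [];
const hooked = hookFunction(
  fakeWindow,
  'connectGmail',
  () => ({ hasResume: fakeWindow.resumeData !== null }),
  (line) => breadcrumbs.push(line)
);
const returned = fakeWindow.connectGmail();
console.log(breadcrumbs.join('\n'));
```

For an async function the EXIT breadcrumb fires when the promise is returned, not when it settles, so a drop that happens after an await would show up in a later function's ENTRY line instead.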

Test procedure for tomorrow:
1. Quit Chrome completely (Cmd+Q).
2. Reopen Chrome, open fresh incognito window (Cmd+Shift+N).
3. Navigate to https://xjobs-final-production.up.railway.app, wait for landing page to load.
4. Open DevTools (Cmd+Option+I), click Console tab.
5. Open Documentation/diag-resume-flow.js in a text editor, copy entire contents.
6. Paste into DevTools Console (NOT Mac terminal). Press Enter.
7. Confirm output shows "hooked:" lines for all 7 functions plus the INIT breadcrumb.
8. Run the flow normally: sign up fresh user, upload real resume, click Connect Gmail, complete OAuth, land on results page.
9. Stop. Don't reload. Copy ALL the [BC ...] breadcrumb lines from the console.
10. The last breadcrumb showing resume present and the first showing it null/undef define the gap where the resume dies.

That gap is the surgical target for the F183 architectural fix.


### F186 (NEW, CRITICAL pre-pilot blocker, RESOLVED tonight) - Session housekeeping pattern - F177/F182 architectural fix

Day 19 PM continued (May 7 night session, ~21:30-23:50 ET).

This finding subsumes F177 and F182 (incognito session leak observations) and addresses the architectural defect logged on Day 14 as F55 (missing session housekeeping pattern). Tonight the symptoms manifested as: fresh incognito window auto-authenticated as a 6-day-old user without active login, prior user's resume data bleeding into new user's session, untagged localStorage caches surviving across signups.

Diagnosis (founder-led):
- localStorage held auth + user-scoped data with no automatic cleanup on logout, on auth failure, or on page-load init
- No single function existed to clear user-scoped state coherently
- refreshUserProfile() existed but was never called on page load - only after fresh login form submit
- Stale tokens from prior sessions auto-authenticated fresh window loads

Architectural fix designed in three parts. Parts A and B shipped tonight, plus a verified hotfix (Part B-1); Part C is deferred to Day 20.

Part A - clearAllUserState() function inserted before logout() at app.html line 14104. Allowlist-based: wipes all localStorage keys except explicit allowlist (theme, pref_*). Clears window.resumeData, module-scope resumeData, window.currentUser, jobPool, discoveredJobs, interviewJobs. Logs every wipe with reason and key list for diagnostic visibility.
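
The allowlist approach can be sketched as follows. This is not the shipped function: the storage and memory parameters are added here so the sketch runs outside a browser, MemoryStorage is a stand-in for localStorage, and the reset values (null versus empty array) are assumptions.

```javascript
const KEEP_EXACT = ['theme'];
const KEEP_PREFIXES = ['pref_'];

function isAllowlisted(key) {
  return KEEP_EXACT.includes(key) || KEEP_PREFIXES.some((p) => key.startsWith(p));
}

function clearAllUserState(reason, storage, memory, log = console.log) {
  // Snapshot keys first: removing while indexing storage.key(i) would skip entries.
  const keys = [];
  for (let i = 0; i < storage.length; i++) keys.push(storage.key(i));
  const wiped = keys.filter((k) => k !== null && !isAllowlisted(k));
  wiped.forEach((k) => storage.removeItem(k));

  // In-memory state named in Part A; reset values are assumptions.
  memory.resumeData = null;
  memory.currentUser = null;
  memory.jobPool = [];
  memory.discoveredJobs = [];
  memory.interviewJobs = [];

  log('[housekeeping] wipe (' + reason + '): ' + wiped.join(', '));
  return wiped;
}

// Minimal in-memory stand-in exposing the parts of the Storage API used above.
class MemoryStorage {
  constructor(initial = {}) { this.map = new Map(Object.entries(initial)); }
  get length() { return this.map.size; }
  key(i) { const k = Array.from(this.map.keys())[i]; return k === undefined ? null : k; }
  getItem(k) { return this.map.has(k) ? this.map.get(k) : null; }
  setItem(k, v) { this.map.set(k, String(v)); }
  removeItem(k) { this.map.delete(k); }
}

const demoStorage = new MemoryStorage({
  xjobs_token: 'stale', activeResume: '{}', theme: 'dark', pref_lang: 'en',
});
const demoMemory = { resumeData: { skills: [] }, currentUser: { id: 38 } };
const wipedKeys = clearAllUserState('manual test', demoStorage, demoMemory);
console.log(wipedKeys); // [ 'xjobs_token', 'activeResume' ]
```

An allowlist, rather than a list of keys to delete, means any new cache key added later is wiped by default unless someone deliberately exempts it.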

Part B - Three integration points:
1. logout() rewritten to call clearAllUserState('user logout') then redirect to /
2. refreshUserProfile() auth-failure path: when /api/auth/me returns non-OK, wipe and bail
3. refreshUserProfile() no-token path: defensive wipe if localStorage non-empty but no token

Part B-1 (hotfix) - Page-load auto-init. Discovered during Part B testing that refreshUserProfile() was never called on page load with stale token, only after fresh login. Added DOMContentLoaded listener (with readyState gate matching line 14203's pattern) that fires refreshUserProfile() on every page load. Without this, the auth-failure auto-wipe never had a chance to execute on hard refresh.
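
A readyState gate of the kind described generally has the shape below. The exact pattern at line 14203 is not reproduced in this document, so this is an assumed equivalent; runOnPageLoad and the fake document objects are hypothetical.

```javascript
// Run callback once the DOM is ready, whether or not the event already fired.
function runOnPageLoad(doc, callback) {
  if (doc.readyState === 'loading') {
    // Still parsing: wait for DOMContentLoaded.
    doc.addEventListener('DOMContentLoaded', callback, { once: true });
    return 'deferred';
  }
  // 'interactive' or 'complete': the event has already fired, so call directly.
  callback();
  return 'immediate';
}

// Fake documents standing in for the browser's `document`.
let calls = 0;
const registered = [];
const loadingDoc = {
  readyState: 'loading',
  addEventListener: (type, fn) => registered.push({ type, fn }),
};
const completeDoc = { readyState: 'complete', addEventListener: () => {} };

const first = runOnPageLoad(loadingDoc, () => { calls++; });
const second = runOnPageLoad(completeDoc, () => { calls++; });
console.log(first, second, calls); // deferred immediate 1
```

Without the gate, a listener attached after DOMContentLoaded has already fired would never run, which would silently recreate the defect Part B-1 fixes.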

Part C deferred to Day 20: user-scoping the 6 remaining untagged resume writes (activeResume, resumeData at lines 4557, 6486, 10597, 14544, 14553, 14562, 14571, 14580, 14853) and the 4 Gmail-related writes. With Parts A+B+B1 in place, the remaining bleed is narrow (requires a still-valid prior-user token to trigger), and Part B's auto-wipe clears it on the next page load anyway. Part C is polish that closes the last narrow window, not a critical pre-pilot blocker.

Verification (all on localhost, four console-tested scenarios):
- clearAllUserState('manual test') wipes 5 keys cleanly, returns array of wiped keys
- Logout button click triggers clearAllUserState('user logout'), localStorage becomes empty
- Bogus token + hard refresh: page-load auto-init calls refreshUserProfile, server returns 401, auto-wipe fires, localStorage becomes empty, lands on landing page
- Valid token + hard refresh: F183 v2 hydration fires alongside (resume hydrated both scopes, skills: 119)

Backups retained for rollback:
- server/api.js.bak.WORKING_VERIFIED.20260507_235104
- app.html.bak.WORKING_VERIFIED.20260507_235104
- Plus per-part backups: app.html.bak.partA.20260507_225825, app.html.bak.partB.20260507_230542, app.html.bak.partB1.20260507_234004

Pilot impact: HIGH POSITIVE. F177/F182 closed. Fresh-user pilot day-1 flow no longer inherits prior session state. The architectural defect logged Day 14 as F55 is finally addressed.

Methodology lessons:
- Diagnosis-before-fix worked. Founder pushed back on band-aid proposals (resume guard at matching trigger, modal asking user to upload). Forced re-framing to architectural cause: state lives in too many untagged places with no clean-on-init.
- Test gates between parts caught a real defect (Part B without Part B-1 didn't fire on hard refresh). Without per-part testing we'd have shipped Part B thinking it worked.
- Part C explicitly scoped and deferred rather than forced into tonight's session. Honest scope management.

### F183 v2 update - confirmed working in localhost

Earlier tonight's F183 surgery v1 was incomplete - only wrote to window.resumeData, missed the module-scope resumeData variable that scoreJob (line 11121) actually reads. v1 was rolled back via the .bak.f183.20260507_215545 backups. v2 corrected the scope issue, now writes to both window.resumeData AND module-scope resumeData, mirroring the existing pattern at line 14748.
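
The v1 miss comes down to two separate bindings that happen to share a name. A minimal sketch with stand-ins: fakeWindow replaces the browser window, the scoring formula is invented for illustration (only the 50% / 0-of-N default comes from the findings above), and hydrateV1 / hydrateV2 are hypothetical names.

```javascript
const fakeWindow = {};  // stand-in for the browser's window object
let resumeData = null;  // module-scope binding; this is what the scorer reads

// Invented scoring: share of requirements found in the resume, or the 50% default.
function scoreJob(job) {
  const total = job.requirements.length;
  if (!resumeData || !Array.isArray(resumeData.skills) || total === 0) {
    return { score: 50, matched: 0, total };
  }
  const have = new Set(resumeData.skills);
  const matched = job.requirements.filter((r) => have.has(r)).length;
  return { score: Math.round((matched / total) * 100), matched, total };
}

// v1 shape: writes the window property only, so scoreJob never sees it.
function hydrateV1(resume) {
  fakeWindow.resumeData = resume;
}

// v2 shape: writes both scopes.
function hydrateV2(resume) {
  fakeWindow.resumeData = resume;
  resumeData = resume;
}

const job = { requirements: ['sql', 'python', 'go', 'aws'] };
const resume = { skills: ['sql', 'python', 'aws'] };

hydrateV1(resume);
const afterV1 = scoreJob(job);
hydrateV2(resume);
const afterV2 = scoreJob(job);
console.log(afterV1); // { score: 50, matched: 0, total: 4 }
console.log(afterV2); // { score: 75, matched: 3, total: 4 }
```

In a classic script a top-level `let` or `const` does not become a property of window, so assigning window.resumeData leaves such a binding untouched; whether app.html declares its module-scope resumeData that way is an assumption here.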

Verified via DevTools console after fresh signup + resume upload + Gmail OAuth round-trip:
- [F183] resume hydrated (both scopes), skills: 119 (smoking gun confirmation in console)
- window.resumeData?.skills?.length === 119
- resumeData?.skills?.length === 119 (module scope)
- Match scores varied with multiple jobs above 80%

Server endpoint added: GET /api/resume/current at server/api.js line 1413. Returns the authenticated user's base resume from the resumes table.

F183 status: RESOLVED in localhost. Pending production push.


### F186 + F183 v2 production verification - May 7, 2026 ~23:55 ET

Pushed commit 30ac9a5 to origin/main. Railway auto-deploy succeeded.

Production smoke test on https://xjobs-final-production.up.railway.app:
- Quit Chrome completely, reopened fresh incognito
- Navigated to production URL
- Sign up flow as fresh user
- Resume upload completed successfully
- Click Connect Gmail -> Google account picker shown (F181 still working)
- OAuth completed -> single Gmail scan in server log (F171 still working)
- Results page rendered with real varied match scores
- Console confirmed: [housekeeping] page-load auto-init fired
- Console confirmed: [F183] resume hydrated (both scopes), skills: 119
- Dashboard showed multiple jobs above 80% with real green/red skill breakdowns

F183 v2: PRODUCTION VERIFIED.
F186 (Parts A + B + B-1): PRODUCTION VERIFIED.

Production HEAD post-deploy: 30ac9a5.
No rollback triggered. Two architectural fixes shipped successfully on Day 19.

The recurring family of bugs (F55, F177, F182, F183, parts of F184) traceable to "state lives in too many places with no clean-on-init pattern" is now closed at the architectural level. Part C (user-scoping the 6 remaining untagged resume writes + 4 gmail writes) is deferred to Day 20 as polish that closes the last narrow window.


### Documentation consolidation decision — May 8, 2026 ~03:30 ET (Day 19 very-late close)

Founder identified document sprawl as a real risk at end of Day 19. The cumulative effect of producing one new doc per session was creating contradiction risk and update burden.

Decision: consolidate to four canonical documents (AGENT-ONBOARDING.md, HANDOFF.md, phase1audit.md, SPRINT.md). Retire daily DAY-N-MASTER docx pattern in favor of SPRINT.md day cards. Weekly summary docx kept as the only recurring Word artifact. Document sprawl rule (one-in, one-out) codified in AGENT-ONBOARDING.md Section 19.

Rationale: the daily 60-minute master-doc generation was producing redundancy with HANDOFF.md and the dashboard. SPRINT.md replaces that with a 15-25 minute daily ritual and keeps the audit trail intact. Section 19 in AGENT-ONBOARDING.md codifies the streamlined ritual so future agents inherit the discipline.

No findings opened or closed by this decision. Pure governance / documentation infrastructure change.