How It Works
8 peer-reviewed papers. 7 clinical ratios. One trajectory pipeline that turns blood test snapshots into a health story.
Users upload blood test PDFs via a protected upload form, triggering a server action that stores the files in Cloudflare R2 and parses them with the Unstructured API. The parsed markers are inserted into the blood_markers table via Drizzle ORM, and Qwen generates a 1024-dimensional embedding for each marker and test summary, stored in Neon pgvector tables. For AI Q&A, raw SQL queries issued through Drizzle run cosine similarity searches over the stored embeddings, and the retrieved context is passed to QwenClient.chat to generate a response. Research paper discovery queries the Semantic Scholar API based on abnormal markers and displays the results with summaries and links.
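To make the retrieval step concrete, here is a minimal in-memory sketch of ranking stored embeddings by cosine similarity. It is an illustration only: in the app this comparison runs inside Postgres via pgvector, and the `EmbeddedMarker` shape and the `topKBySimilarity` helper are hypothetical names, not taken from the codebase.

```typescript
// Hypothetical row shape for illustration; the real table columns may differ.
interface EmbeddedMarker {
  markerName: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every row against the query vector and keep the k most similar.
function topKBySimilarity(
  query: number[],
  rows: EmbeddedMarker[],
  k: number
): { markerName: string; score: number }[] {
  return rows
    .map((row) => ({
      markerName: row.markerName,
      score: cosineSimilarity(query, row.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Note that pgvector's `<=>` operator returns cosine distance, which is 1 minus the similarity computed here, so a database query orders by ascending distance rather than descending score.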
Key findings:
- 1024-dim: embedding vector dimensionality for semantic search (Qwen text-embedding-v4 model configuration).
- 7: predefined clinical ratios with published thresholds, e.g., TG/HDL and NLR (domain-specific implementation in trajectory tracking).
- 4: API fallback layers for research paper retrieval, Semantic Scholar → OpenAlex → CrossRef → CORE (multi-source design in lib/semantic-scholar.ts).
- 3: embedding granularity levels, namely test, marker, and condition (multi-level embedding strategy in lib/embeddings.ts).
- O(log n): approximate query scaling for vector similarity searches with pgvector HNSW indexes (PostgreSQL vector indexing for cosine similarity).
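As an illustration of how a clinical ratio such as TG/HDL or NLR might be evaluated against a threshold, consider the sketch below. The `evaluateRatio` function and its result types are hypothetical, and the threshold is deliberately left as a caller-supplied parameter: no clinical cut-off values are asserted here.

```typescript
type RatioStatus = "within" | "elevated" | "unavailable";

interface RatioResult {
  name: string;
  value: number | null;
  status: RatioStatus;
}

// Hypothetical helper: divides two marker values and compares the result with
// an upper threshold supplied by the caller (assumed to come from the literature).
function evaluateRatio(
  name: string,
  numerator: number | undefined,
  denominator: number | undefined,
  upperThreshold: number
): RatioResult {
  // A missing marker or a zero denominator means the ratio cannot be computed.
  if (numerator === undefined || denominator === undefined || denominator === 0) {
    return { name, value: null, status: "unavailable" };
  }
  const value = numerator / denominator;
  return { name, value, status: value > upperThreshold ? "elevated" : "within" };
}
```

Both inputs need to be in the units the threshold was published for. A TG/HDL cut-off stated for mg/dL, for example, does not carry over unchanged to mmol/L values.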
Technical Foundations
- Next.js 15 App Router — Vercel (2024). Finding: Server-side rendering by default with React Server Components, enabling efficient data fetching and reduced client-side JavaScript. Relevance: Used for all pages in app/protected/ (e.g., blood-tests, appointments) with async data fetching, Suspense boundaries for loading states, and server actions like uploadBloodTest. [link]
- Neon PostgreSQL + Drizzle ORM — Neon / Drizzle Team (2024). Finding: Serverless PostgreSQL with branching, autoscaling, and pgvector support, paired with a type-safe ORM for schema management. Relevance: Stores core tables like blood_tests, blood_markers, and appointments via Drizzle schema, with pgvector HNSW indexes for embedding similarity search and Cloudflare R2 for file storage. [link]
- Qwen Embeddings — Alibaba Cloud (2024). Finding: text-embedding-v4 model generates 1024-dimensional vectors for semantic search and retrieval-augmented generation (RAG). Relevance: Powers embedding generation via QwenClient for test summaries (formatTestForEmbedding) and individual markers (formatMarkerForEmbedding), stored in blood_test_embeddings and blood_marker_embeddings tables. [link]
- Better Auth — Better Auth (2024). Finding: Framework-agnostic authentication library with Drizzle adapter, email/password support, and Next.js cookie management. Relevance: Handles all authentication via lib/auth.ts with Drizzle adapter, providing server-side session checks via withAuth() and client-side auth via authClient hooks. [link]
- Radix UI Themes — Radix UI (2024). Finding: Accessible component library with built-in dark theme support and primitive building blocks. Relevance: Provides UI components like Dialog, Dropdown, and Skeleton across pages (e.g., app/protected/blood-tests/page.tsx) and enables theme management via next-themes. [link]
- Unstructured API — Unstructured (2024). Finding: Document parsing service that extracts structured data from PDFs and other file formats. Relevance: Parses uploaded blood test PDFs in the uploadBloodTest server action, converting them into marker data for insertion into the blood_markers table. [link]
- Semantic Scholar API — Allen Institute for AI (2024). Finding: Academic search engine providing access to millions of research papers with metadata and summaries. Relevance: Used in lib/semantic-scholar.ts for research paper discovery, querying based on abnormal markers with fallbacks to OpenAlex, CrossRef, and CORE APIs. [link]
- Cloudflare R2 — Cloudflare (2024). Finding: S3-compatible object storage with zero egress fees, used for storing blood test PDF uploads. Relevance: Replaces Supabase Storage for file uploads via lib/storage.ts using @aws-sdk/client-s3, with files stored in the healthcare-blood-tests bucket. [link]
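The multi-source fallback noted for lib/semantic-scholar.ts can be sketched as an ordered chain of sources tried in turn. This is an assumption about the general pattern, not the file's actual contents: `PaperSource`, `Paper`, and `searchWithFallback` are hypothetical names, and each real source would wrap an HTTP call to its own API.

```typescript
interface Paper {
  title: string;
  url: string;
}

// Hypothetical adapter: one per provider (Semantic Scholar, OpenAlex, CrossRef, CORE).
interface PaperSource {
  name: string;
  search: (query: string) => Promise<Paper[]>;
}

// Try each source in order and return the first non-empty result set.
// Returns null when every source fails or comes back empty.
async function searchWithFallback(
  query: string,
  sources: PaperSource[]
): Promise<{ source: string; papers: Paper[] } | null> {
  for (const source of sources) {
    try {
      const papers = await source.search(query);
      if (papers.length > 0) {
        return { source: source.name, papers };
      }
    } catch {
      // A failing source (network error, rate limit) falls through to the next one.
    }
  }
  return null;
}
```

Treating an empty result the same as an error is a design choice in this sketch: it keeps the primary source authoritative when it has answers, while still letting later sources cover narrow queries.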
Pipeline
- PDF Upload and Parsing — Users upload blood test PDFs via the UploadForm component at /app/protected/blood-tests/upload-form, triggering the uploadBloodTest server action. This stores the file in Cloudflare R2, then calls the Unstructured API to parse the PDF into structured marker data. The parsed data is inserted into the blood_markers table via Drizzle ORM. Research basis: Unstructured Client for document parsing, Cloudflare R2 for file storage.
- Embedding Generation and Storage — After parsing, the system generates embeddings using QwenClient: test-level embeddings via formatTestForEmbedding() and marker-level embeddings via formatMarkerForEmbedding(). These 1024-dimensional vectors are stored in the blood_test_embeddings and blood_marker_embeddings tables, enabling semantic search capabilities. Research basis: Qwen text-embedding-v4 model for vector generation, PostgreSQL for vector storage.
- AI Health Q&A Retrieval — When a user asks a question, raw SQL queries issued through Drizzle search the embeddings stored in Neon. A hybrid search combining full-text search (FTS) with pgvector cosine similarity runs on blood_marker_embeddings to find the top-k relevant markers. This context is combined with the question and passed to QwenClient.chat() using the qwen-plus model. Research basis: Retrieval-Augmented Generation (RAG) pattern with vector similarity search.
- Research Paper Discovery — Abnormal markers flagged in blood tests trigger queries to the Semantic Scholar API via lib/semantic-scholar.ts. The query is built from marker names, values, and flags, using bulk search with filters for year and citation count. Results are ranked by relevance and displayed in the ResearchSection component with TLDR summaries and PDF links. Research basis: Semantic Scholar API for academic paper retrieval, multi-source fallback design.
- Trajectory Tracking and Alerts — The system calculates health trajectories by comparing 1024-dimensional embeddings across time using cosine similarity in the database. Velocity alerts are generated by computing per-day rate-of-change for each biomarker, and clinical ratios (e.g., TG/HDL, NLR) are evaluated against published thresholds to detect early trends. Research basis: Vector mathematics for pattern detection, clinical ratio integration.
- Appointment Management — Users manage health appointments via the appointments module at /app/protected/appointments/page.tsx, using the AddAppointmentForm for creation and deleteAppointment server action for deletion. Data is stored in the appointments table via Drizzle ORM with fields like title, provider, and appointmentDate. Research basis: Drizzle ORM for type-safe queries, Next.js server actions for mutations.
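The per-day rate-of-change behind the velocity alerts in the Trajectory Tracking step can be sketched as follows. This is a minimal illustration under stated assumptions: the `Reading` shape, the function names, and the alert limit parameter are hypothetical, and the limit value is supplied by the caller, not taken from the source.

```typescript
// Hypothetical reading shape: an ISO 8601 date string and a numeric marker value.
interface Reading {
  date: string;
  value: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Change in marker value per day between two readings.
// Returns null when the dates are invalid, identical, or out of order.
function perDayVelocity(previous: Reading, current: Reading): number | null {
  const days = (Date.parse(current.date) - Date.parse(previous.date)) / MS_PER_DAY;
  if (!Number.isFinite(days) || days <= 0) return null;
  return (current.value - previous.value) / days;
}

// Flags a velocity whose magnitude exceeds a caller-supplied per-day limit.
function exceedsVelocityLimit(velocity: number | null, maxAbsPerDay: number): boolean {
  return velocity !== null && Math.abs(velocity) > maxAbsPerDay;
}
```

For example, a marker that rises from 100 to 120 over ten days has a velocity of 2 units per day, which would trip an alert under any limit below 2.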