How It Works

A Next.js 15 platform that transforms blood test PDFs into AI-driven health insights using Neon pgvector, Qwen embeddings, and Drizzle ORM.

Users upload blood test PDFs through a protected upload form, which triggers a server action that stores each file in Cloudflare R2 and parses it with the Unstructured API. The parsed markers are inserted into the blood_markers table via Drizzle ORM, and Qwen generates 1024-dimensional embeddings for each marker and test summary, stored in Neon pgvector tables. For AI Q&A, Drizzle raw SQL queries run cosine similarity searches over these embeddings, and the retrieved markers are passed as context to QwenClient.chat to generate a response. Research paper discovery queries the Semantic Scholar API based on abnormal markers and displays results with summaries and links.

Key findings:
1024-dim embedding vectors for semantic search (Qwen text-embedding-v4 model configuration).
7 predefined clinical ratios with published thresholds, e.g., TG/HDL and NLR (domain-specific implementation in trajectory tracking).
4 API fallback layers for research paper retrieval: Semantic Scholar → OpenAlex → CrossRef → CORE (multi-source design in lib/semantic-scholar.ts).
3 embedding granularity levels: test, marker, and condition (multi-level embedding strategy in lib/embeddings.ts).
Approximately O(log n) query time for vector similarity searches with pgvector HNSW indexes (PostgreSQL vector indexing for cosine similarity).

Technical Foundations

Next.js 15 App Router — Vercel (2024). Finding: Server-side rendering by default with React Server Components, enabling efficient data fetching and reduced client-side JavaScript. Relevance: Used for all pages in app/protected/ (e.g., blood-tests, appointments) with async data fetching, Suspense boundaries for loading states, and server actions like uploadBloodTest.
Neon PostgreSQL + Drizzle ORM — Neon / Drizzle Team (2024). Finding: Serverless PostgreSQL with branching, autoscaling, and pgvector support, paired with a type-safe ORM for schema management. Relevance: Stores core tables like blood_tests, blood_markers, and appointments via the Drizzle schema, with pgvector HNSW indexes for embedding similarity search; uploaded files are kept separately in Cloudflare R2.
Qwen Embeddings — Alibaba Cloud (2024). Finding: text-embedding-v4 model generates 1024-dimensional vectors for semantic search and retrieval-augmented generation (RAG). Relevance: Powers embedding generation via QwenClient for test summaries (formatTestForEmbedding) and individual markers (formatMarkerForEmbedding), stored in blood_test_embeddings and blood_marker_embeddings tables.
Better Auth — Better Auth (2024). Finding: Framework-agnostic authentication library with Drizzle adapter, email/password support, and Next.js cookie management. Relevance: Handles all authentication via lib/auth.ts with Drizzle adapter, providing server-side session checks via withAuth() and client-side auth via authClient hooks.
Radix UI Themes — Radix UI (2024). Finding: Accessible component library with built-in dark theme support and primitive building blocks. Relevance: Provides UI components like Dialog, Dropdown, and Skeleton across pages (e.g., app/protected/blood-tests/page.tsx) and enables theme management via next-themes.
Unstructured API — Unstructured (2024). Finding: Document parsing service that extracts structured data from PDFs and other file formats. Relevance: Parses uploaded blood test PDFs in the uploadBloodTest server action, converting them into marker data for insertion into the blood_markers table.
Semantic Scholar API — Allen Institute for AI (2024). Finding: Academic search engine providing access to millions of research papers with metadata and summaries. Relevance: Used in lib/semantic-scholar.ts for research paper discovery, querying based on abnormal markers with fallbacks to OpenAlex, CrossRef, and CORE APIs.
Cloudflare R2 — Cloudflare (2024). Finding: S3-compatible object storage with zero egress fees, used for storing blood test PDF uploads. Relevance: Replaces Supabase Storage for file uploads via lib/storage.ts using @aws-sdk/client-s3, with files stored in the healthcare-blood-tests bucket.

Pipeline

PDF Upload and Parsing — Users upload blood test PDFs via the UploadForm component at /app/protected/blood-tests/upload-form, triggering the uploadBloodTest server action. This stores the file in Cloudflare R2, then calls the Unstructured API to parse the PDF into structured marker data. The parsed data is inserted into the blood_markers table via Drizzle ORM. Research basis: Unstructured Client for document parsing, Cloudflare R2 for file storage.
Embedding Generation and Storage — After parsing, the system generates embeddings using QwenClient: test-level embeddings via formatTestForEmbedding() and marker-level embeddings via formatMarkerForEmbedding(). These 1024-dimensional vectors are stored in the blood_test_embeddings and blood_marker_embeddings tables, enabling semantic search capabilities. Research basis: Qwen text-embedding-v4 model for vector generation, PostgreSQL for vector storage.
AI Health Q&A Retrieval — When a user asks a question, Drizzle raw SQL queries retrieve relevant embeddings from Neon using pgvector cosine similarity. A hybrid search that combines full-text search (FTS) with vector similarity runs on blood_marker_embeddings to find the top-k relevant markers. This context is combined with the question and passed to QwenClient.chat() using the qwen-plus model. Research basis: Retrieval-Augmented Generation (RAG) pattern with vector similarity search.
Research Paper Discovery — Abnormal markers flagged in blood tests trigger queries to the Semantic Scholar API via lib/semantic-scholar.ts. The query is built from marker names, values, and flags, using bulk search with filters for year and citation count. Results are ranked by relevance and displayed in the ResearchSection component with TLDR summaries and PDF links. Research basis: Semantic Scholar API for academic paper retrieval, multi-source fallback design.
Trajectory Tracking and Alerts — The system calculates health trajectories by comparing 1024-dimensional embeddings across time using cosine similarity in the database. Velocity alerts are generated by computing per-day rate-of-change for each biomarker, and clinical ratios (e.g., TG/HDL, NLR) are evaluated against published thresholds to detect early trends. Research basis: Vector mathematics for pattern detection, clinical ratio integration.
Appointment Management — Users manage health appointments via the appointments module at /app/protected/appointments/page.tsx, using the AddAppointmentForm for creation and deleteAppointment server action for deletion. Data is stored in the appointments table via Drizzle ORM with fields like title, provider, and appointmentDate. Research basis: Drizzle ORM for type-safe queries, Next.js server actions for mutations.
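
The trajectory-tracking step above rests on two small calculations: a per-day rate of change for a biomarker, and a ratio of two markers compared against a threshold. The following is a minimal sketch of that arithmetic only, not the project's implementation. The MarkerReading shape and all function names are hypothetical, and the threshold is left as a caller-supplied parameter because the published values the project uses are not given here.

```typescript
// Hypothetical shape; the project's real marker type is not shown in this document.
interface MarkerReading {
  value: number;    // numeric lab result, in the lab's unit
  testDate: string; // ISO date such as "2024-01-01"
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Per-day rate of change between two readings of the same biomarker.
function velocityPerDay(earlier: MarkerReading, later: MarkerReading): number {
  const days = (Date.parse(later.testDate) - Date.parse(earlier.testDate)) / MS_PER_DAY;
  if (!(days > 0)) {
    throw new Error("the later reading must be dated after the earlier one");
  }
  return (later.value - earlier.value) / days;
}

// Ratio of two markers, e.g. TG/HDL or NLR (neutrophils / lymphocytes).
function clinicalRatio(numerator: number, denominator: number): number {
  if (denominator === 0) {
    throw new Error("denominator must be non-zero");
  }
  return numerator / denominator;
}

// The threshold is supplied by the caller; no clinical value is assumed here.
function exceedsThreshold(ratio: number, threshold: number): boolean {
  return ratio > threshold;
}
```

Keeping the threshold as a parameter lets each of the predefined ratios carry its own published cut-off without changing the arithmetic.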

System Architecture

The app uses the Next.js 15 App Router, with server components by default for pages such as /app/protected/blood-tests/ and Suspense boundaries for loading states. The code lives in a monorepo bundled with Turbopack and integrates Neon PostgreSQL via Drizzle ORM for type-safe queries and Better Auth for authentication. AI features are built on a custom QwenClient for embeddings and chat, and external APIs such as Semantic Scholar handle research retrieval.
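
The research retrieval path falls back across several sources (Semantic Scholar → OpenAlex → CrossRef → CORE). A fallback chain of that kind could be sketched as below. This is an illustration of the pattern under stated assumptions, not the code in lib/semantic-scholar.ts: the Paper and PaperSource types and the searchWithFallback function are hypothetical, and each source is injected as a plain async function so that no real API call is shown.

```typescript
// Hypothetical types; the real response shapes of the four APIs are not modeled here.
interface Paper {
  title: string;
  url?: string;
}

interface PaperSource {
  label: string;
  search: (query: string) => Promise<Paper[]>;
}

// Try each source in order and return the first non-empty result.
// A source that throws or returns nothing is skipped in favor of the next one.
async function searchWithFallback(
  query: string,
  sources: PaperSource[],
): Promise<{ source: string | null; papers: Paper[] }> {
  for (const source of sources) {
    try {
      const papers = await source.search(query);
      if (papers.length > 0) {
        return { source: source.label, papers };
      }
    } catch {
      // Network or API error: fall through to the next source.
    }
  }
  return { source: null, papers: [] };
}
```

Passing the sources as an ordered array keeps the priority explicit and makes the chain easy to exercise with stub functions.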

Database Design

Core tables include blood_tests (id, user_id, status), blood_markers (test_id, name, value, flag), and appointments (user_id, title, appointment_date), defined in Drizzle schema at lib/db/schema.ts. Vector tables like blood_test_embeddings and blood_marker_embeddings store 1024-dimensional pgvector embeddings for semantic search with HNSW indexes. All queries are scoped to the authenticated user via withAuth(), and indexes are applied for performance on user_id and test_date columns.
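
To make the semantic search on the vector tables concrete, the sketch below computes cosine similarity and a top-k ranking in memory. In the database this work is done by pgvector, whose `<=>` operator returns cosine distance (1 minus cosine similarity), so this code only illustrates the math. The EmbeddedMarker shape and function names are hypothetical, and the short vectors in the usage note stand in for the 1024-dimensional ones.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // treat a zero vector as having no similarity to anything
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical row shape for an embedded marker.
interface EmbeddedMarker {
  markerName: string;
  embedding: number[];
}

// Rank rows by similarity to the query vector and keep the best k.
function topKBySimilarity(query: number[], rows: EmbeddedMarker[], k: number) {
  return rows
    .map((row) => ({
      markerName: row.markerName,
      score: cosineSimilarity(query, row.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For example, with a query of [1, 0] and rows embedded at [0, 1], [1, 0], and [1, 1], the top two results are the [1, 0] row followed by the [1, 1] row. An HNSW index lets the database approximate this ranking without scoring every row.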

Security & Auth

Authentication is handled by Better Auth with email/password, using a Drizzle adapter in lib/auth.ts. Middleware checks session cookies for route protection, and each page and action calls withAuth() server-side, redirecting unauthenticated users to /auth/login. All queries are scoped to the authenticated userId, and server actions like uploadBloodTest include validation. API keys for DashScope and other services are supplied through environment variables.
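
The guard-then-scope pattern described above might look roughly like the following. This is a framework-free sketch, not the project's withAuth(): the AuthSession type, requireSession, RedirectError, and scopeToUser are all hypothetical names, and the session lookup is injected as a function. In a real Next.js server component the redirect would typically go through redirect() from next/navigation rather than a thrown custom error, and the user filter would be a WHERE clause on user_id rather than an in-memory filter.

```typescript
// Hypothetical session shape; Better Auth's real session object is richer.
interface AuthSession {
  userId: string;
}

// Stand-in for a framework redirect, so the sketch runs without Next.js.
class RedirectError extends Error {
  constructor(public readonly location: string) {
    super(`redirect to ${location}`);
  }
}

// Resolve the current session or signal a redirect to the login page.
async function requireSession(
  getSession: () => Promise<AuthSession | null>,
  loginPath: string = "/auth/login",
): Promise<AuthSession> {
  const session = await getSession();
  if (!session) {
    throw new RedirectError(loginPath);
  }
  return session;
}

// Keep only rows owned by the authenticated user.
function scopeToUser<T extends { userId: string }>(rows: T[], session: AuthSession): T[] {
  return rows.filter((row) => row.userId === session.userId);
}
```

Resolving the session once and threading it into every query helper makes it harder to write a query that forgets the user scope.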

Deployment & Infrastructure

The app is deployed on Vercel with Neon PostgreSQL for the database, Better Auth for authentication, and Cloudflare R2 for file storage. The monorepo structure allows standalone deployments. Drizzle Kit manages schema migrations via `drizzle-kit push`, and HNSW vector indexes are created via raw SQL for optimal embedding search performance.
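
As an illustration of the raw SQL step, the helper below builds a pgvector HNSW index statement using the cosine operator class (vector_cosine_ops). It is a sketch under assumptions: the function name, the index naming scheme, and the embedding column name used in the usage note are hypothetical, and the project's actual migration SQL may differ.

```typescript
// Accept only plain lowercase SQL identifiers, since the values are interpolated into DDL.
const SAFE_IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

// Build a CREATE INDEX statement for an HNSW index over a pgvector column.
function hnswIndexSql(table: string, column: string): string {
  for (const identifier of [table, column]) {
    if (!SAFE_IDENTIFIER.test(identifier)) {
      throw new Error(`unsafe identifier: ${identifier}`);
    }
  }
  const indexName = `${table}_${column}_hnsw_idx`;
  return `CREATE INDEX IF NOT EXISTS ${indexName} ON ${table} USING hnsw (${column} vector_cosine_ops);`;
}
```

Calling hnswIndexSql("blood_marker_embeddings", "embedding") yields a statement that could be run once per vector table after `drizzle-kit push` has created the tables.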

AI Integration

AI capabilities are centered on Qwen models: text-embedding-v4 for generating 1024-dim vectors and qwen-plus for chat. Embeddings are created at test, marker, and condition levels via functions like formatTestForEmbedding(). RAG patterns enable health Q&A with cosine similarity searches on vector tables. Evaluation frameworks like promptfoo and Braintrust are used for LLM evaluation and experiment tracking.
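
The RAG step ends by turning retrieved markers into text and combining them with the user's question. A minimal sketch of that assembly follows. The MarkerContext and ChatMessage types, both function names, the unit field, and the system prompt wording are assumptions for illustration; the actual formatMarkerForEmbedding() output and the QwenClient.chat() message format are not shown in this document.

```typescript
// Hypothetical shapes for a retrieved marker and a chat message.
interface MarkerContext {
  name: string;
  value: number;
  unit: string;
  flag: "low" | "normal" | "high";
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Render one marker as a single line of context.
function markerToText(marker: MarkerContext): string {
  return `${marker.name}: ${marker.value} ${marker.unit} (${marker.flag})`;
}

// Combine the retrieved markers and the question into chat messages.
function buildRagMessages(question: string, retrieved: MarkerContext[]): ChatMessage[] {
  const context = retrieved.map(markerToText).join("\n");
  return [
    {
      role: "system",
      content: `Answer using only the blood test context below.\n\nContext:\n${context}`,
    },
    { role: "user", content: question },
  ];
}
```

Keeping the retrieved context in the system message and the question in the user message keeps the two inputs separate, which makes prompts easier to compare in evaluation runs.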

UI/UX Design

The interface uses Radix UI Themes for components like Dialog and Skeleton, with Geist font and lucide-react icons. Pages feature status badges (done, error) and flag indicators (low, normal, high) in tabular displays. Progressive enhancement is achieved through Suspense and skeleton loaders, and theme management is handled by next-themes for dark/light mode support.