A RAG-grounded chat interface for interviewing synthetic research personas derived from qualitative study data, built using the PersonaCite methodology and grounded in an interview study of genomics researchers. Each persona responds only when relevant evidence exists.
The system constructs 7 synthetic evaluator personas across four groups — Biologists, Computational Biologists, Bioinformaticians, and Software Engineers — based on participant data from a qualitative interview study. When asked a question, each persona retrieves semantically similar quotes from its evidence pool and generates a response grounded in that evidence.
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API key
Configure your LLM provider in .env:
OPENAI_API_KEY=your-key-here
# or
ANTHROPIC_API_KEY=your-key-here
LLM_PROVIDER=openai # or anthropic
python server.py
Open http://localhost:8000 in your browser. API docs are at http://localhost:8000/docs.
The pipeline scripts build the persona data and evidence store from source files:
| Script | Description |
|---|---|
step1_parse_personas.py |
Parse personas.xlsx → data/personas.json |
step2_parse_evidence.py |
Extract quotes and codes from interview transcripts |
step3_retrieve.py |
Embed evidence and build the retrieval index |
step4_build_prompts.py |
Construct system prompts and evidence blocks |
step4b_validate.py |
Filter retrieved evidence by relevance to the question |
step5_chat.py |
CLI chat interface |
step6_test.py |
Batch question testing |
step7_interview.py |
Structured interview runner |
step8_evaluators.py |
Define the 7 evaluator personas |
provider fieldstatic/, streaming responses via SSE