Maahir Garg

Entry 02 / 09

Oct 2025

Domain-Specific LLM Reasoning with PEFT

Built a PEFT pipeline (BitFit + few-shot) for health insurance claim reasoning, fine-tuning <0.1% of parameters (bias terms only). Improved reasoning accuracy from 47% to 73% on clause-extracted scenarios.

Stack

LLM · PEFT · BitFit · Few-Shot Learning · AI

Links

🔒 Private repo

Notes

A study in how little you can actually train and still meaningfully shift behavior. BitFit touches only the bias terms, under a tenth of a percent of parameters, so this was as much a diagnostic as a fine-tune: is the base model close enough that a nudge gets us there? The unexpected finding was that domain coverage of the few-shot examples mattered far more than count; three well-chosen scenarios beat ten near-duplicates. I wouldn't ship BitFit alone in production, but it's now my first step before reaching for LoRA: it tells you whether the problem is reachable from where the base model already sits.
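The core of a BitFit setup is a freeze rule: disable gradients everywhere, then re-enable them on bias parameters only. A minimal sketch in PyTorch, using a toy transformer encoder as a stand-in (the real pipeline used an LLM, where the bias fraction is far smaller because the weight matrices dominate):

```python
import torch.nn as nn

# Toy stand-in for the base model; any nn.Module works the same way.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# BitFit: freeze everything, then unfreeze only the bias terms.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total} ({100 * trainable / total:.2f}%)")
```

The optimizer then only needs the trainable subset, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`; everything else stays at its pretrained values, which is what makes the run a cheap diagnostic.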