Personalized LLM arXiv 2025

A Survey of Personalized Large Language Models: Progress and Future Directions

A systematic taxonomy of personalized LLMs across prompting, adaptation, alignment, evaluation, and future directions.

Authors: Jiahong Liu¹, Zexuan Qiu¹, Zhongyang Li², Quanyu Dai², Wenhao Yu¹, Jieming Zhu², Minda Hu¹, Menglin Yang³, Tat-Seng Chua⁴, Irwin King¹

Affiliations: ¹CUHK, ²Huawei, ³HKUST(GZ), ⁴NUS

Paper: arXiv:2502.11528 (latest version updated September 20, 2025)

Why personalization?

General LLMs know a lot, but they do not know you.

A general-purpose LLM is usually optimized for population-level usefulness. That is powerful, but it also produces one-size-fits-all behavior: the same query tends to receive the same generic answer even when users differ in taste, history, language style, goals, and constraints. Personalized Large Language Models (PLLMs) aim to move from this one-size-fits-all setting toward one-size-fits-one systems.

Figure: Comparison between general LLMs and personalized LLMs. Personalized LLMs adapt to different users instead of forcing diverse preferences into one shared response pattern.

What This Survey Organizes

Personalized Data

Profiles, relationships, historical dialogues, historical content, interactions, and preference signals.

Technical Levels

Input-level prompting, model-level adaptation, and objective-level alignment.

Evaluation Landscape

Benchmarks across extraction, abstraction, generalization, classification, generation, and recommendation.

The taxonomy

Three places to inject personalization

The paper frames PLLM methods around the personalization operation: how user-specific data is turned into behavior that changes a model's response. Personalization can happen at three points: before the model sees the input, inside the model through adapted parameters or modules, or in the learning objective that defines which responses are preferred.

Figure: Taxonomy framework for personalized large language models. The survey groups PLLM techniques into Personalized Prompting, Personalized Adaptation, and Personalized Alignment, while also tracking personalized data types, query types, and task families.

Input Level: Prompting

Keep the base LLM fixed. Build prompts, retrieved memories, soft prompts, or contrastive steering signals from user data.

Model Level: Adaptation

Change model behavior through PEFT modules, user embeddings, LoRA variants, MoE-style routing, or per-user adapters.

Objective Level: Alignment

Optimize or decode with user-specific preferences, reward models, model merging, ensembles, or test-time feedback.

What does a personalized query ask?

Extraction, abstraction, and generalization require different memory behavior

Not every personalized query is equally hard. Some questions ask the system to extract an explicit fact from a user's history. Others require abstraction, where the model must summarize or infer higher-level preferences. The hardest cases often require generalization: the model must use personal evidence plus external knowledge to produce a response that fits the user but is not directly stated in the history.
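The three query types can be made concrete with a toy sketch. Everything here is a hypothetical illustration rather than anything defined in the survey: the `HISTORY` records, the `EXTERNAL_CATALOG` dictionary, and the function names are all assumptions, and a real system would use an LLM instead of lookups and counters.

```python
from collections import Counter

# Hypothetical user history: (title, genre, rating) records.
HISTORY = [
    ("Dune", "sci-fi", 5),
    ("Neuromancer", "sci-fi", 4),
    ("Emma", "romance", 2),
]

# Hypothetical external knowledge that the user history never mentions.
EXTERNAL_CATALOG = {
    "sci-fi": ["Foundation", "Dune", "Hyperion"],
    "romance": ["Persuasion", "Emma"],
}


def extract_rating(title):
    """Extraction: look up a fact stated explicitly in the history."""
    for seen_title, _genre, rating in HISTORY:
        if seen_title == title:
            return rating
    return None


def abstract_favorite_genre():
    """Abstraction: infer a higher-level preference by aggregating ratings."""
    totals = Counter()
    for _title, genre, rating in HISTORY:
        totals[genre] += rating
    return totals.most_common(1)[0][0]


def generalize_recommendation():
    """Generalization: combine the inferred preference with external knowledge."""
    seen = {title for title, _genre, _rating in HISTORY}
    genre = abstract_favorite_genre()
    return [title for title in EXTERNAL_CATALOG[genre] if title not in seen]


print(extract_rating("Dune"))
print(abstract_favorite_genre())
print(generalize_recommendation())
```

Each step depends on more than the one before it: extraction needs only a matching record, abstraction needs the whole history, and generalization also needs knowledge from outside it.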

Figure: Examples of personalized data and query types. The survey distinguishes query types by how directly the answer can be grounded in user data and whether external knowledge is needed.

Path 1

Personalized Prompting: efficient, flexible, but bounded by context

Prompting-based methods keep the LLM frozen and place personalization around it. They are attractive because they are cheap to deploy, work well with black-box models, and can update memories without retraining the generator. The survey separates this family into four subtypes: profile-augmented prompting, retrieval-augmented prompting, soft-fused prompting, and contrastive prompting.
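As a minimal sketch of the retrieval-augmented subtype, the snippet below ranks stored memories against a query and concatenates the best matches into a prompt. It rests on stated assumptions: the bag-of-words cosine similarity stands in for a learned retriever, and the function names, prompt template, and example memories are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a stand-in for a learned text encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(memories, query, k=2):
    """Return the k memories most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(memories, key=lambda m: cosine(bag_of_words(m), q), reverse=True)
    return ranked[:k]


def build_prompt(memories, query, k=2):
    """Concatenate retrieved memories with the query for a frozen LLM."""
    lines = [f"- {m}" for m in retrieve(memories, query, k)]
    return "Relevant user history:\n" + "\n".join(lines) + f"\n\nUser query: {query}"


# Hypothetical memory store for one user.
memories = [
    "I am allergic to peanuts.",
    "My favorite cuisine is Thai food.",
    "I commute by bicycle every day.",
]
print(build_prompt(memories, "Suggest a Thai dinner without peanuts", k=2))
```

Adding a memory is just an append to the list, which is why this family can update user memory without retraining the generator.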

Figure: Personalized prompting methods. Prompting methods differ in how they transform user history into context: summaries, retrieved records, soft embeddings, or contrastive steering signals.
| Method Family | How It Works | Strength | Main Risk |
|---|---|---|---|
| Profile-Augmented | Summarize user history into profile tokens. | Efficient and easy to cache. | Compression may lose useful details. |
| Retrieval-Augmented | Retrieve relevant memories and concatenate them with the query. | Good fit for long-term memory and explicit facts. | Retrieval can be noisy or expensive. |
| Soft-Fused | Encode user data into embeddings, prefixes, attention signals, or logits. | Captures semantic nuance beyond text summaries. | Less interpretable and often harder for black-box deployment. |
| Contrastive | Compare model states with and without personal context. | More controllable and interpretable. | Sensitive to steering scale and hyperparameters. |
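The sensitivity of contrastive methods to steering scale can be shown with one common form of contrastive steering on next-token logits: amplify the difference between the logits computed with and without personal context. This is a sketch under assumptions, not the formula of any specific method in the survey. The logit values, the three-word vocabulary, and the `alpha` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(with_ctx, without_ctx, alpha):
    """Amplify the shift that personal context induces in the logits."""
    return [w + alpha * (w - wo) for w, wo in zip(with_ctx, without_ctx)]


# Hypothetical next-token logits over a three-word vocabulary.
vocab = ["formal", "casual", "neutral"]
without_ctx = [2.0, 1.0, 1.5]  # model output without user history
with_ctx = [1.8, 1.6, 1.5]     # model output with user history in the prompt

for alpha in (0.0, 1.0, 3.0):
    probs = softmax(contrastive_logits(with_ctx, without_ctx, alpha))
    best = vocab[probs.index(max(probs))]
    print(f"alpha={alpha}: top token = {best}, probs = {[round(p, 3) for p in probs]}")
```

With `alpha=0.0` the top token is unchanged from ordinary prompting. Larger values push probability toward the tokens the personal context favors, so an overly large scale can distort the distribution.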

Path 2

Personalized Adaptation: deeper personalization with parameter trade-offs

Adaptation-based methods modify a small set of parameters or modules, often through PEFT. This makes them more capable than pure prompting when the target behavior is implicit, stylistic, or hard to express as retrieved text. The core design choice is whether all users share one personalized module or each user owns a separate module.
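The shared-versus-per-user choice can be sketched with a low-rank (LoRA-style) update on a frozen weight matrix, computing `W x + scale * B (A x)`. The tiny matrices, the user IDs, and the fallback to a shared adapter for unseen users are assumptions made for illustration; real adapters sit inside transformer layers and are trained rather than hand-written.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(base_w, adapter, x, scale=1.0):
    """Compute W x + scale * B (A x) with a frozen base weight W."""
    a_mat, b_mat = adapter
    base_out = matvec(base_w, x)
    delta = matvec(b_mat, matvec(a_mat, x))
    return [b + scale * d for b, d in zip(base_out, delta)]


# Frozen base weight (2x2 identity) shared by every user.
BASE_W = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical rank-1 adapters: A is 1x2, B is 2x1.
USER_ADAPTERS = {
    "alice": ([[1.0, 0.0]], [[0.5], [0.0]]),
    "bob": ([[0.0, 1.0]], [[0.0], [-0.5]]),
}
SHARED_ADAPTER = ([[1.0, 1.0]], [[0.1], [0.1]])


def personalized_output(user_id, x):
    """Use the user's own adapter if one exists, else the shared one."""
    adapter = USER_ADAPTERS.get(user_id, SHARED_ADAPTER)
    return lora_forward(BASE_W, adapter, x)


x = [2.0, 4.0]
for user in ("alice", "bob", "new_user"):
    print(user, personalized_output(user, x))
```

The base weight is never modified, so switching users only swaps the small `A` and `B` matrices.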

Figure: Personalized adaptation methods. Shared adapters improve scalability; per-user adapters improve isolation and personalization depth but introduce storage and training overhead.
| Adaptation Strategy | Best For | Pros | Cons |
|---|---|---|---|
| One PEFT for All Users | Large-scale services with many users and limited adapter budget. | Parameter-efficient, scalable, easier to maintain. | May blur individual differences and depend heavily on user data encoding. |
| One PEFT Per User | High-stakes or private settings where user isolation matters. | Stronger personalization, better separation between users. | Higher storage, training, synchronization, and cold-start cost. |
| Collaborative or Federated PEFT | Settings that need both personalization and cross-user transfer. | Can share useful population-level signals without raw data sharing. | Must balance privacy leakage, communication cost, and robustness. |
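The storage cost of per-user adapters can be estimated with simple arithmetic. The sketch below counts parameters for LoRA-style adapters on square weight matrices. All configuration values are hypothetical and chosen only for illustration: hidden size 4096, 32 layers, two adapted matrices per layer, rank 8, 16-bit weights, and one million users.

```python
def lora_param_count(hidden_size, num_layers, matrices_per_layer, rank):
    """Parameters in LoRA adapters on hidden_size x hidden_size matrices."""
    per_matrix = rank * (hidden_size + hidden_size)  # A is r x d, B is d x r
    return num_layers * matrices_per_layer * per_matrix


def storage_bytes(num_adapters, params_per_adapter, bytes_per_param=2):
    """Total adapter storage, assuming 2 bytes per parameter by default."""
    return num_adapters * params_per_adapter * bytes_per_param


# Hypothetical configuration; none of these values come from the survey.
params = lora_param_count(hidden_size=4096, num_layers=32, matrices_per_layer=2, rank=8)
shared = storage_bytes(num_adapters=1, params_per_adapter=params)
per_user = storage_bytes(num_adapters=1_000_000, params_per_adapter=params)

print(f"parameters per adapter: {params:,}")
print(f"one shared adapter:     {shared / 2**20:.1f} MiB")
print(f"one adapter per user:   {per_user / 1e12:.2f} TB")
```

Under these assumptions, each adapter holds about 4.2 million parameters (8 MiB), so a million per-user adapters need roughly 8.4 TB while a single shared adapter needs 8 MiB.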

Path 3

Personalized Alignment: preferences are not universal

Generic alignment optimizes for broad human preferences. Personalized alignment asks a different question: what if users disagree about style, values, depth, risk tolerance, or decision behavior? The survey treats this as a preference modeling problem, often connected to multi-objective reward learning, decoding-time model combination, and test-time feedback.
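Multi-objective preference modeling can be read as scalarizing several reward dimensions with user-specific weights. The sketch below uses such a mixture to rerank two candidate responses. The reward dimensions, scores, and weights are hypothetical, and the same mixed reward could instead serve as a training signal; here it only selects among fixed candidates.

```python
def mixed_reward(scores, weights):
    """Scalarize per-dimension rewards with user-specific weights."""
    return sum(weights[dim] * scores[dim] for dim in weights)


# Hypothetical per-dimension reward scores for two candidate responses.
CANDIDATES = {
    "short answer": {"concise": 0.9, "detailed": 0.2},
    "long answer": {"concise": 0.3, "detailed": 0.9},
}


def best_response(weights):
    """Pick the candidate with the highest user-weighted reward."""
    return max(CANDIDATES, key=lambda name: mixed_reward(CANDIDATES[name], weights))


# Two hypothetical users who disagree about depth.
user_a = {"concise": 0.8, "detailed": 0.2}
user_b = {"concise": 0.2, "detailed": 0.8}

print("user A prefers:", best_response(user_a))
print("user B prefers:", best_response(user_b))
```

The candidates and reward dimensions are identical for both users; only the weights differ, and that alone changes which response wins.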

Figure: Personalized alignment methods. Personalized alignment can happen through multi-objective RLHF, user-weighted model merging, or ensembles of specialized policies.
| Alignment Route | Personalization Mechanism | Strength | Limitation |
|---|---|---|---|
| Training-Time Personalization | Use user-specific reward mixtures during policy optimization. | Strong personalization and efficient inference. | High training cost and less flexibility after training. |
| Decoding-Time Personalization | Merge or ensemble policies using user-specific weights at inference. | Flexible and can adapt without retraining the base model. | Extra storage and inference overhead. |
| Test-Time Feedback | Update prompts, personas, or reward signals from live interactions. | Promising for evolving user preferences. | Benchmarks and stability guarantees remain underdeveloped. |
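The decoding-time route of merging policies with user-specific weights can be sketched as a weighted average in parameter space. This is a toy illustration under assumptions: each "policy" is a three-number list standing in for full model tensors of identical shape, and the policy names and weights are hypothetical.

```python
def merge_policies(policies, user_weights):
    """Weighted average of parameter vectors, with weights normalized to sum to 1."""
    total = sum(user_weights.values())
    size = len(next(iter(policies.values())))
    merged = [0.0] * size
    for name, weight in user_weights.items():
        for i, value in enumerate(policies[name]):
            merged[i] += (weight / total) * value
    return merged


# Hypothetical specialized policies, each trained for one preference dimension.
POLICIES = {
    "concise": [1.0, 0.0, 2.0],
    "friendly": [0.0, 1.0, 4.0],
}

# A hypothetical user who weights conciseness three times as much as friendliness.
print(merge_policies(POLICIES, {"concise": 3, "friendly": 1}))
```

Every specialized policy must be kept in storage so it can be merged on demand. That is the storage overhead the table lists, though no retraining is needed when a user's weights change.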

Evaluation

Personalization should be evaluated by data type, query type, and task

The benchmark landscape is broad because personalization itself is broad. Dialogue-based benchmarks often test memory extraction from conversation histories. Content-based benchmarks such as LaMP and LongLaMP test whether the model can incorporate a user's historical text. Preference-based benchmarks emphasize subjective alignment. Interaction-based benchmarks connect personalization to recommendation and user behavior modeling.

| Benchmark Group | Personalized Data | Typical Query | Common Metrics |
|---|---|---|---|
| MemoryBank, PerLTQA, LoCoMo, LongMemEval, MMRC, IMPLEXCONV, MemBench | User profiles and dialogues | Mostly extraction, with some abstraction and generalization | LLM-E, F1, Recall, Human-E, Acc |
| LaMP, LongLaMP, PEFT-U, pGraphRAG, LaMP-QA, DPL, PERSONABench | Historical content | Abstraction and generalization | Acc, F1, MAE, RMSE, ROUGE, BLEU, METEOR, LLM-E |
| PRISM, PersonalLLM, ALOE, HiCUPID | Human preferences and personas | Preference-aware generation | BLEU, ROUGE-L, LLM-E, human evaluation |
| REGEN, PersonalWAB, RecBench+ | User interactions and profiles | Recommendation or behavior-grounded generation | Acc, Precision, Recall, ROUGE-L, BLEU, SBERT |
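Three of the metrics listed above (F1, MAE, and RMSE) are simple enough to sketch directly. These are generic textbook definitions, with whitespace tokenization assumed for F1; individual benchmarks may normalize or tokenize differently, so treat the code as illustrative rather than as any benchmark's official scorer.

```python
import math
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mae(preds, targets):
    """Mean absolute error, e.g. for predicted versus true ratings."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def rmse(preds, targets):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))


print(token_f1("the user likes tea", "user likes green tea"))
print(mae([4, 2], [5, 4]))
print(rmse([4, 2], [5, 4]))
```

None of these metrics looks at the user. A generic answer that happens to overlap with the reference scores just as well as a personalized one, which is why metric choice matters for personalization.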

Future directions

The next frontier is memory that can remember, adapt, and evolve

The survey's forward-looking view is that PLLMs should not only retrieve user-specific facts, but also abstract from long-term evidence and evolve as users change. This creates a difficult trilemma among personalization quality, privacy, and deployment cost: stronger personalization tends to need more computation or more private data; stronger privacy often limits cross-user transfer; and scalable deployment on edge devices tightens both constraints.

Figure: Future directions for personalized large language models. Future PLLMs should improve efficacy, efficiency, and trustworthiness across extraction, abstraction, generalization, and lifelong evolution.

Complex User Data

Move beyond text-only histories toward multi-source, graph-like, and multimodal user signals.

Edge Computing

Support lightweight personalization on phones and local devices through small models, quantization, and distillation.
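As one illustration of the quantization mentioned here, the sketch below applies symmetric 8-bit quantization to a small weight list, as might be done to shrink a per-user module for on-device storage. The scheme (one scale per tensor, values clipped to [-127, 127]) is a common textbook choice assumed for this example, not something the survey prescribes, and the weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical adapter weights.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

print("codes:", codes)
print("max reconstruction error:", max(abs(w - r) for w, r in zip(weights, restored)))
```

Each weight then takes one byte instead of two or four, at the cost of a rounding error of at most half the scale per weight.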

Edge-Cloud Collaboration

Balance local privacy with cloud-scale capability while reducing synchronization cost.

Model Updates

Update user-specific modules when base LLMs change without retraining everything from scratch.

Lifelong Updating

Let personal memories change over time without catastrophic forgetting or stale preferences.
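One simple way to keep stale preferences from dominating is to decay a memory's retrieval score with its age. The sketch below uses exponential decay with a half-life. The decay form, the 30-day half-life, and the example memories are assumptions chosen for illustration, not a mechanism specified by the survey.

```python
import math


def memory_score(relevance, age_days, half_life_days=30.0):
    """Down-weight relevance by exponential decay with the given half-life."""
    return relevance * math.pow(0.5, age_days / half_life_days)


# Hypothetical memories: (text, relevance to the current query, age in days).
MEMORIES = [
    ("User drinks coffee every morning.", 0.9, 90),
    ("User recently switched from coffee to tea.", 0.8, 3),
]


def rank_memories(memories, half_life_days=30.0):
    """Sort memories by decayed score, highest first."""
    return sorted(
        memories,
        key=lambda m: memory_score(m[1], m[2], half_life_days),
        reverse=True,
    )


for text, relevance, age in rank_memories(MEMORIES):
    print(f"{memory_score(relevance, age):.3f}  {text}")
```

The older memory is down-weighted rather than deleted, so it can still surface when nothing newer is relevant.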

Takeaways

A practical reading map

Use prompting when personalization mainly means retrieving explicit facts or adding lightweight context.

Use adaptation when the model must internalize implicit style, preferences, and behavior patterns.

Use alignment when the key problem is subjective preference, value trade-offs, or dynamic feedback.

Evaluate carefully because success depends on data type, query type, task type, and whether the metric actually measures personalization rather than generic generation quality.

Citation

BibTeX
@article{liu2025survey,
  title={A Survey of Personalized Large Language Models: Progress and Future Directions},
  author={Liu, Jiahong and Qiu, Zexuan and Li, Zhongyang and Dai, Quanyu and Yu, Wenhao and Zhu, Jieming and Hu, Minda and Yang, Menglin and Chua, Tat-Seng and King, Irwin},
  journal={arXiv preprint arXiv:2502.11528},
  year={2025}
}