A Survey of Personalized Large Language Models: Progress and Future Directions

—— A systematic taxonomy of personalized LLMs across prompting, adaptation, alignment, evaluation, and future directions.

Authors Jiahong Liu¹, Zexuan Qiu¹, Zhongyang Li², Quanyu Dai², Wenhao Yu¹, Jieming Zhu², Minda Hu¹, Menglin Yang³, Tat-Seng Chua⁴, Irwin King¹

affiliations ¹CUHK ²Huawei ³HKUST(GZ) ⁴NUS

Paper arXiv:2502.11528, latest version updated on September 20, 2025

Paper (arXiv PDF) arXiv Page Resources BibTeX

Why personalization?

General LLMs know a lot, but they do not know you.

A general-purpose LLM is usually optimized for population-level usefulness. That is powerful, but it also creates a one-size-fits-all behavior: the same query tends to receive a generic answer even when users differ in taste, history, language style, goals, and constraints. Personalized Large Language Models (PLLMs) aim to move from this one-size-fits-all setting toward one-size-fits-one systems.

Comparison between general LLMs and personalized LLMs — Personalized LLMs adapt to different users instead of forcing diverse preferences into one shared response pattern.

What This Survey Organizes

Personalized Data

Profiles, relationships, historical dialogues, historical content, interactions, and preference signals.

Technical Levels

Input-level prompting, model-level adaptation, and objective-level alignment.

Evaluation Landscape

Benchmarks across extraction, abstraction, generalization, classification, generation, and recommendation.

The taxonomy

Three places to inject personalization

The paper frames PLLM methods around the personalization operation: how user-specific data is turned into behavior that changes a model's response. The key organizing idea is simple and useful. Personalization can happen before the model sees the input, inside the model through adapted parameters or modules, or in the learning objective that defines which responses are preferred.

Taxonomy framework for personalized large language models — The survey groups PLLM techniques into Personalized Prompting, Personalized Adaptation, and Personalized Alignment, while also tracking personalized data types, query types, and task families.

Input Level: Prompting

Keep the base LLM fixed. Build prompts, retrieved memories, soft prompts, or contrastive steering signals from user data.

Model Level: Adaptation

Change model behavior through PEFT modules, user embeddings, LoRA variants, MoE-style routing, or per-user adapters.

Objective Level: Alignment

Optimize or decode with user-specific preferences, reward models, model merging, ensembles, or test-time feedback.

What does a personalized query ask?

Extraction, abstraction, and generalization require different memory behavior

Not every personalized query is equally hard. Some questions ask the system to extract an explicit fact from a user's history. Others require abstraction, where the model must summarize or infer higher-level preferences. The hardest cases often require generalization: the model must use personal evidence plus external knowledge to produce a response that fits the user but is not directly stated in the history.

Examples of personalized data and query types — The survey distinguishes query types by how directly the answer can be grounded in user data and whether external knowledge is needed.

Path 1

Personalized Prompting: efficient, flexible, but bounded by context

Prompting-based methods put personalization around the frozen LLM. They are attractive because they are cheap to deploy, work well with black-box models, and can update memories without retraining the generator. The survey separates this family into four subtypes: profile-augmented prompting, retrieval-augmented prompting, soft-fused prompting, and contrastive prompting.

Personalized prompting methods — Prompting methods differ in how they transform user history into context: summaries, retrieved records, soft embeddings, or contrastive steering signals.

Method Family	How It Works	Strength	Main Risk
Profile-Augmented	Summarize user history into profile tokens.	Efficient and easy to cache.	Compression may lose useful details.
Retrieval-Augmented	Retrieve relevant memories and concatenate them with the query.	Good fit for long-term memory and explicit facts.	Retrieval can be noisy or expensive.
Soft-Fused	Encode user data into embeddings, prefixes, attention signals, or logits.	Captures semantic nuance beyond text summaries.	Less interpretable and often harder for black-box deployment.
Contrastive	Compare model states with and without personal context.	More controllable and interpretable.	Sensitive to steering scale and hyperparameters.

Path 2

Personalized Adaptation: deeper personalization with parameter trade-offs

Adaptation-based methods modify a small set of parameters or modules, often through PEFT. This makes them more capable than pure prompting when the target behavior is implicit, stylistic, or hard to express as retrieved text. The core design choice is whether all users share one personalized module or each user owns a separate module.

Personalized adaptation methods — Shared adapters improve scalability; per-user adapters improve isolation and personalization depth but introduce storage and training overhead.

Adaptation Strategy	Best For	Pros	Cons
One PEFT for All Users	Large-scale services with many users and limited adapter budget.	Parameter-efficient, scalable, easier to maintain.	May blur individual differences and depend heavily on user data encoding.
One PEFT Per User	High-stakes or private settings where user isolation matters.	Stronger personalization, better separation between users.	Higher storage, training, synchronization, and cold-start cost.
Collaborative or Federated PEFT	Settings that need both personalization and cross-user transfer.	Can share useful population-level signals without raw data sharing.	Must balance privacy leakage, communication cost, and robustness.

Path 3

Personalized Alignment: preferences are not universal

Generic alignment optimizes for broad human preferences. Personalized alignment asks a different question: what if users disagree about style, values, depth, risk tolerance, or decision behavior? The survey treats this as a preference modeling problem, often connected to multi-objective reward learning, decoding-time model combination, and test-time feedback.

Personalized alignment methods — Personalized alignment can happen through multi-objective RLHF, user-weighted model merging, or ensembles of specialized policies.

Alignment Route	Personalization Mechanism	Strength	Limitation
Training-Time Personalization	Use user-specific reward mixtures during policy optimization.	Strong personalization and efficient inference.	High training cost and less flexibility after training.
Decoding-Time Personalization	Merge or ensemble policies using user-specific weights at inference.	Flexible and can adapt without retraining the base model.	Extra storage and inference overhead.
Test-Time Feedback	Update prompts, personas, or reward signals from live interactions.	Promising for evolving user preferences.	Benchmarks and stability guarantees remain underdeveloped.

Evaluation

Personalization should be evaluated by data type, query type, and task

The benchmark landscape is broad because personalization itself is broad. Dialogue-based benchmarks often test memory extraction from conversation histories. Content-based benchmarks such as LaMP and LongLaMP test whether the model can incorporate a user's historical text. Preference-based benchmarks emphasize subjective alignment. Interaction-based benchmarks connect personalization to recommendation and user behavior modeling.

Benchmark Group	Personalized Data	Typical Query	Common Metrics
MemoryBank, PerLTQA, LoCoMo, LongMemEval, MMRC, IMPLEXCONV, MemBench	User profiles and dialogues	Mostly extraction, with some abstraction and generalization	LLM-E, F1, Recall, Human-E, Acc
LaMP, LongLaMP, PEFT-U, pGraphRAG, LaMP-QA, DPL, PERSONABench	Historical content	Abstraction and generalization	Acc, F1, MAE, RMSE, ROUGE, BLEU, METEOR, LLM-E
PRISM, PersonalLLM, ALOE, HiCUPID	Human preferences and personas	Preference-aware generation	BLEU, ROUGE-L, LLM-E, human evaluation
REGEN, PersonalWAB, RecBench+	User interactions and profiles	Recommendation or behavior-grounded generation	Acc, Precision, Recall, ROUGE-L, BLEU, SBERT

Future directions

The next frontier is memory that can remember, adapt, and evolve

The survey's forward-looking view is that PLLMs should not only retrieve user-specific facts, but also abstract from long-term evidence and evolve as users change. This creates a difficult trilemma: stronger personalization tends to need more computation or more private data; stronger privacy often limits cross-user transfer; and scalable deployment on edge devices makes both constraints sharper.

Future directions for personalized large language models — Future PLLMs should improve efficacy, efficiency, and trustworthiness across extraction, abstraction, generalization, and lifelong evolution.

Complex User Data

Move beyond text-only histories toward multi-source, graph-like, and multimodal user signals.

Edge Computing

Support lightweight personalization on phones and local devices through small models, quantization, and distillation.

Edge-Cloud Collaboration

Balance local privacy with cloud-scale capability while reducing synchronization cost.

Model Updates

Update user-specific modules when base LLMs change without retraining everything from scratch.

Lifelong Updating

Let personal memories change over time without catastrophic forgetting or stale preferences.

Takeaways

A practical reading map

Use prompting when personalization mainly means retrieving explicit facts or adding lightweight context.

Use adaptation when the model must internalize implicit style, preferences, and behavior patterns.

Use alignment when the key problem is subjective preference, value trade-offs, or dynamic feedback.

Evaluate carefully because success depends on data type, query type, task type, and whether the metric actually measures personalization rather than generic generation quality.

Citation

BibTeX

@article{liu2025survey,
  title={A Survey of Personalized Large Language Models: Progress and Future Directions},
  author={Liu, Jiahong and Qiu, Zexuan and Li, Zhongyang and Dai, Quanyu and Yu, Wenhao and Zhu, Jieming and Hu, Minda and Yang, Menglin and Chua, Tat-Seng and King, Irwin},
  journal={arXiv preprint arXiv:2502.11528},
  year={2025}
}