Idea: define the smallest unit of analysis, then perform the following analyses.
uneven attention/weight/significance
- unit: token level
- long interactions: LLMs in long interactions show a strong attention sink on the initial tokens; reintroducing the first few tokens into the sliding window improves long-form generation.
- reasoning: certain tokens play a pivotal role in steering the reasoning trajectory toward an incorrect outcome (method: contrastive.md).
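The sink-plus-window idea from the long-interactions bullet can be sketched as a KV-cache retention policy. This is a minimal illustration, not anyone's actual implementation; `n_sink` and `window` are assumed parameter names:

```python
def sink_window_indices(seq_len: int, n_sink: int = 4, window: int = 8) -> list[int]:
    """Token positions kept in the KV cache: the first `n_sink` tokens
    (attention sinks) plus a sliding window of the most recent `window`
    tokens. Everything in between is evicted."""
    if seq_len <= n_sink + window:
        # Sequence still fits entirely; evict nothing.
        return list(range(seq_len))
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))

# With 20 tokens, keep sinks 0-3 and the recent window 12-19.
print(sink_window_indices(20))
```

Without the sink positions this degenerates to a plain sliding window, which is exactly the setting where long-generation quality degrades.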
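contrastive.md is referenced but not included here. As a hypothetical sketch of the pivotal-token idea, one could estimate the probability of reaching a correct final answer after each prefix (e.g. via rollouts) and flag tokens that shift that estimate sharply; `success_prob` and `threshold` are assumed names:

```python
def pivotal_tokens(success_prob: list[float], threshold: float = 0.3) -> list[tuple[int, float]]:
    """success_prob[i] = estimated probability of a correct final answer
    given the prefix up to and including token i. A token is pivotal when
    appending it shifts this estimate by more than `threshold`; a large
    negative delta marks a token driving the trajectory toward error."""
    deltas = [success_prob[i] - success_prob[i - 1] for i in range(1, len(success_prob))]
    return [(i, d) for i, d in enumerate(deltas, start=1) if abs(d) > threshold]

# Token 2 drops the success estimate from 0.75 to 0.2: pivotal.
print(pivotal_tokens([0.8, 0.75, 0.2, 0.25]))
```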