Idea: Define the smallest unit of analysis, then perform the following analyses at that granularity.

uneven attention/weight/significance across units

  • unit: token level
    • long-interactions: long-interaction LLMs show a strong attention sink on the initial tokens; reintroducing the first few tokens into the sliding window improves long-generation stability.
    • reasoning: certain tokens play a pivotal role in steering the reasoning trajectory toward an incorrect outcome (method: contrastive.md).
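The sink-plus-window idea in the first bullet can be sketched as a KV-cache eviction policy: keep the first few "attention sink" tokens plus a sliding window of the most recent tokens. This is a minimal illustrative sketch; the function name and the sizes `n_sink=4`, `window=8` are assumptions, not from the note.

```python
def keep_indices(seq_len: int, n_sink: int = 4, window: int = 8) -> list[int]:
    """Token positions to retain in the KV cache: the first n_sink
    'attention sink' tokens plus the most recent `window` tokens."""
    sinks = list(range(min(n_sink, seq_len)))
    # Start the window after the sinks so positions are never duplicated.
    recent = list(range(max(n_sink, seq_len - window), seq_len))
    return sinks + recent

# At step 20, the cache holds tokens 0-3 plus the last 8 tokens:
# keep_indices(20) -> [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

Evicting everything outside these indices keeps memory constant while, per the note, preserving the sink tokens that stabilize long generation.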