Textual Frequency Law (TFL) proposes preferring higher-frequency data for LLM training when meanings are identical. The authors introduce TFL, frequency distillation, and curriculum training methods, achieving up to 8% accuracy gains in math reasoning and BLEU improvements across 99/100 translation…
Core Problem
Which data should be favored during LLM training and prompting when computational resources are limited, specifically whether higher-frequency paraphrases outperform lower-frequency ones when meanings are identical.
Core Method
Approach: The authors construct a Textual Frequency Paired Dataset from GSM8K, FLORES-200, and CommonsenseQA, using GPT-4o-mini to generate high- and low-frequency paraphrases that are then validated by human annotators. Sentence-level frequency is estimated by a position-unaware multiplication of word-level Zipf frequencies taken from existing corpora. Experiments cover both prompting and fine-tuning, on a closed-source LLM (GPT-4o-mini) and open-source LLMs (DeepSeek-V3, Llama-3.3-70B-Instruct, qwen2.5-7b-instruct).
Key components:
- Paraphrasing is useful for evaluating language models, mitigating data contamination, and data augmentation.
- Computational budgets for training and prompting are usually limited, which raises the question of which paraphrase to select.
- Results suggest that high-frequency paraphrases should be preferred for both prompting and fine-tuning.
Source section: sec_3
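The frequency estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it reads "position-unaware multiplication of word-level Zipf frequencies" literally as the product of per-word Zipf values (so word order has no effect), uses a hypothetical toy Zipf table in place of a real corpus, and assigns unknown words a hypothetical `floor` value. The names `TOY_ZIPF`, `sentence_frequency`, and `pick_higher_frequency` are illustrative only. If the paper instead multiplies raw word probabilities, that corresponds to summing Zipf (log-scale) values rather than multiplying them.

```python
import math
import re

# Hypothetical toy Zipf table (log10 occurrences per billion words).
# Real values would come from a large corpus, e.g. wordfreq's
# zipf_frequency(word, "en"); these numbers are illustrative only.
TOY_ZIPF = {
    "she": 6.5, "bought": 5.0, "purchased": 4.2, "acquired": 4.0,
    "three": 5.8, "apples": 4.3,
}


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_frequency(sentence: str, zipf: dict[str, float], floor: float = 1.0) -> float:
    """Position-unaware sentence score: product of word-level Zipf values.

    Multiplication is commutative, so every permutation of the same words
    gets the same score. Unknown words receive `floor` (a hypothetical
    default) so that one unseen word does not zero out the product.
    """
    return math.prod(zipf.get(tok, floor) for tok in tokenize(sentence))


def pick_higher_frequency(a: str, b: str, zipf: dict[str, float]) -> str:
    """Given two paraphrases assumed to share a meaning, keep the higher-frequency one."""
    return a if sentence_frequency(a, zipf) >= sentence_frequency(b, zipf) else b


if __name__ == "__main__":
    high = "She bought three apples."
    low = "She purchased three apples."
    print(pick_higher_frequency(high, low, TOY_ZIPF))  # prints: She bought three apples.
```

Because Zipf values for common words are typically above 1, this literal product grows with sentence length, so the score is most meaningful when comparing paraphrases of similar length, as in the paired dataset described above.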
Claim Verification
Reproducibility Assessment
Low reproducibility (0%)
Missing reproduction details
- Specific LLM models/architectures used in experiments
- Dataset details: which datasets were used, and how they were sourced and processed
- How "frequency" was defined and calculated for paraphrases
- Paraphrase generation methodology: how paraphrases were created or collected
- Hyperparameters for fine-tuning (learning rate, batch size, epochs, optimizer settings)
- Prompting configurations (prompt templates, number of shots, temperature, etc.)
- Training/fine-tuning procedures and implementation details
- Evaluation metrics and their exact implementation
- Hardware specifications and computational environment
- Random seeds and number of experimental runs
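Limitations (stated by the authors)
The paper does not explicitly list any limitations.
This analysis was generated automatically by the PDF Reading Assistant and is provided for reference only; it does not constitute an academic review. The claim verification and reproducibility assessment are based on automated analysis of the paper text and may be inaccurate. For the original paper, see arXiv.
Analysis time: 2026-04-08T13:18:09+00:00 · Data source: Paper Collector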