The Giant Panda's Little Nest

What should I write about~

My First Submission

Thanks to the senior folks for AReaL and for cmd's many ingenious optimizations.

Communication with the reviewers has wrapped up. The one below is the reviewer I spent the most effort on, haha.

Overall Recommendation: 4: Weak accept: Technically solid paper that advances at least one sub-area of AI, with a contribution that others are likely to build on, but with some weaknesses that limit its impact (e.g., limited evaluation). Please use sparingly.

Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Compliance With LLM Reviewing Policy: Affirmed.

Code Of Conduct Acknowledgement: Affirmed.

Final Justification:

The DFS-based prefix tree traversal for RL training is a novel and well-implemented idea. However, its effectiveness is fundamentally tied to the prefix compression ratio of the workload. The evaluation is limited to τ²-bench, which has an unusually high compression rate (~9.4×) due to shared system prompts and multi-turn structure. The rebuttal confirmed that the GPU split comparison is fair, but also revealed that on more widely used RL benchmarks like GSM8K and AIME, where prefix sharing is minimal, the method provides much less training speedup or even slows down. This significantly narrows the practical scope, and I believe the practical importance of these high-prefix-sharing scenarios needs stronger justification. Overall, this is well-executed systems work, but with effectiveness limited to a class of high-prefix-sharing workloads. My recommendation is weak reject.
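The review hinges on the workload's prefix compression ratio. A minimal sketch of how such a ratio can be estimated (my own illustration, not the paper's code, with made-up toy data): insert all sampled sequences into a prefix tree and compare total tokens against unique tree nodes, since shared prefixes occupy a node only once.

```python
def prefix_compression_ratio(sequences):
    """Estimate prefix sharing: total tokens / unique prefix-tree nodes."""
    trie = {}  # nested dicts: one node per unique prefix token
    total_tokens = 0
    unique_nodes = 0
    for seq in sequences:
        node = trie
        for tok in seq:
            total_tokens += 1
            if tok not in node:  # first time this prefix is extended with tok
                node[tok] = {}
                unique_nodes += 1
            node = node[tok]
    return total_tokens / unique_nodes if unique_nodes else 1.0

# Multi-turn rollouts sharing a long system prompt compress well...
rollouts = [["sys"] * 90 + [f"turn{i}"] for i in range(10)]
print(prefix_compression_ratio(rollouts))  # 910 tokens / 100 nodes -> 9.1

# ...while independent single-turn prompts (GSM8K-style) barely compress.
singles = [[f"q{i}", "a"] for i in range(10)]
print(prefix_compression_ratio(singles))  # 20 tokens / 20 nodes -> 1.0
```

This is exactly the asymmetry the reviewer describes: the high ratio only materializes when many rollouts descend from a long common prefix, as in τ²-bench's shared system prompt.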


Comments

One comment on "My First Submission"

  1. We also have to thank tau-bench; it's rare to get such a long system prompt.
