● builder: EnterpriseClawBench and PlanBench-XL offer evaluation frameworks grounded in real workplace sessions and large tool ecosystems, useful for benchmarking your own agent systems.
● researcher: Qwen-AgentWorld and Wan-Streamer represent Alibaba's push into world models and streaming inference, and are worth tracking for architecture and training details.