Topic: NVIDIA’s Wolf: World Summarization Framework Beats GPT-4V on Video Captioning by 55.6%