Alexander Doria
dorialexander.bsky.social
did:plc:vg3thtvfbgfrr3u6pf6hy3yk
In case it interests anyone, I managed to set up a demo of GRPO RL training in Colab. It’s an adaptation of Will Brown’s instant classic for math reasoning, with Llama 1B replaced by Qwen 0.5B and inference handled by vLLM. Full training takes about 2 hours.
https://colab.research.google.com/drive/1bfhs1FMLW3FGa8ydvkOZyBNxLYOu0Hev?usp=sharing
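For readers new to GRPO, here is a minimal sketch of the group-relative advantage step that gives the method its name. It is not taken from the notebook: the function name, the `eps` value, the use of population standard deviation, and the reward numbers are all assumptions for illustration only.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Score each completion relative to the other completions for the same prompt.

    Assumption: population standard deviation and a small eps to avoid
    division by zero; real training libraries may differ on both choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one math prompt
# (for example, 2.0 for a correct final answer and 0.0 otherwise).
rewards = [2.0, 0.0, 0.0, 2.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.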
2025-02-02T13:49:21.453Z