@dorialexander.bsky.social on Bluesky



Alexander Doria

dorialexander.bsky.social

did:plc:vg3thtvfbgfrr3u6pf6hy3yk

In case it interests anyone, I managed to set up a demo of GRPO RL training in Colab. It’s an adaptation of Will Brown’s instant classic for math reasoning. It replaces Llama 1B with Qwen 0.5B and runs inference with vLLM. Full training takes about 2 hours. https://colab.research.google.com/drive/1bfhs1FMLW3FGa8ydvkOZyBNxLYOu0Hev?usp=sharing
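For readers new to GRPO, here is a minimal sketch of its central step, the group-relative advantage: several completions are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. This is not taken from the linked notebook. The function name, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper: eps and the population standard deviation are
    illustrative choices, not settings read from the notebook.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions above the group mean get a positive advantage,
    # completions below it get a negative one.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled answers to the same math problem
# (1.0 = correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

If every completion in a group receives the same reward, all advantages come out as zero, so that group contributes no learning signal for the step.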

2025-02-02T13:49:21.453Z

	