How offline RL regularizers interfere with online training.
Why it works so well and where its limitations lie.