Post from 2025-08-27 14:52:29
"Our safeguards work **more reliably** in **common**, **short** exchanges. We have learned over time that these safeguards can **sometimes** be **less reliable** in **long** interactions: as the back-and-forth grows, **parts** of the model’s safety training **may** degrade."