Technical deep dives, research updates, and perspectives on AGI safety from the TAL Corp team. We write about what we are learning — including the parts that did not go as planned.
We cannot align what we cannot understand. Mechanistic interpretability is not a nice-to-have — it is the foundation everything else rests on.
Naive corrigibility constraints degrade as capability increases. Here is what we found — and what we are doing about it.
We are open-sourcing a 1,240-task benchmark suite for evaluating whether AI agents respect their autonomy envelopes.
When multiple AI agents coordinate, new safety failure modes emerge that are invisible at the level of any individual agent.
We do not know exactly when AGI will arrive. But the work required to make it safe takes years, and that work has to start before it is needed.
We publish infrequently and only when we have something worth saying. Subscribe to get new posts directly — no marketing, no digests, no noise.