Safety Is the Product.

Every TAL Corp model is built on the assumption that something will go wrong, and that the system must detect it, contain it, and correct it before it reaches the world. This page documents how we do that.

Safety Philosophy

Safety is Not a Feature

At TAL Corp, safety is not a layer added on top of a capable model. It is a first-class engineering constraint — designed in from the first line of training code, validated at every stage of development, and re-evaluated at every deployment decision. A model that is not safe is not ready.

Interpretability Before Deployment

We will not deploy a model we cannot explain. Every TAL Corp model ships with a real-time interpretability dashboard — token attribution scores, attention weights, and reasoning traces — accessible to every API user. If we cannot see inside it, it does not ship.

Human Oversight at Every Stage

Every TAL Corp model carries a human-in-the-loop requirement at the certification level. S-2, for example, requires human oversight at inference time. Our agentic systems are bounded by hard autonomy envelopes: the model cannot expand its own scope of action.
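To make the idea of a hard autonomy envelope concrete, here is a minimal Python sketch. It is an illustration only, not TAL Corp's implementation: the names `AutonomyEnvelope`, `EnvelopeViolation`, `run_agent_step`, and the action strings are all hypothetical. It assumes the envelope is a fixed set of actions approved by an authorised human, and that the agent can only ask for authorisation, never modify the set.

```python
from dataclasses import dataclass


class EnvelopeViolation(Exception):
    """Raised when an agent requests an action outside its envelope."""


@dataclass(frozen=True)
class AutonomyEnvelope:
    # Hypothetical: action names approved in advance by an authorised human.
    allowed_actions: frozenset

    def authorize(self, action: str) -> str:
        # Reject anything that was not explicitly approved.
        if action not in self.allowed_actions:
            raise EnvelopeViolation(f"action {action!r} is outside the envelope")
        return action


def run_agent_step(envelope: AutonomyEnvelope, requested_action: str) -> str:
    # The agent side only calls authorize(); the frozen dataclass and the
    # frozenset give it no way to widen the approved set at runtime.
    return envelope.authorize(requested_action)


envelope = AutonomyEnvelope(frozenset({"read_file", "summarize"}))
print(run_agent_step(envelope, "summarize"))
```

In this sketch a request such as `run_agent_step(envelope, "send_email")` raises `EnvelopeViolation`, and assigning to `envelope.allowed_actions` raises an error because the dataclass is frozen. Widening the scope would require an operator to construct a new envelope.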

Alignment Scales With Capability

The prevailing assumption in AI development is that safety degrades as capability scales. Our research and our deployment evidence indicate the opposite. TAL Corp's Constitutional AI v3 framework is designed to get safer as the model gets smarter.

Radical Transparency

We publish our safety evaluations. We publish our red-team results. We publish our failure modes and our sycophancy scores. We believe the field advances faster when labs are honest about what their models cannot do, not just what they can.

Model Safety Certifications

Safety Certification

S-2

VERIFIED

Terra-1 · Feb 2026

Cert Level: S-2 (Tier 2 Safety)
Oversight: Human-in-loop required
Constitutional Compliance: 98.7%
Corrigibility Retention: 97.3%
Deceptive Alignment Probe: 99.1%
Red-Team Adversarial Suite: 96.8%

Robustness Certification

R-1

IN EVALUATION

Mantle-1 · Q1 2026

Cert Level: R-1 (Tier 1 Robustness)
OOD Generalization: 93.1%
Distribution Shift Resistance: 91.7%
Robustness Under Stress: 94.5%
Causal Reasoning Suite: 89.2%
Evaluation Status: In Progress

Integration Certification

I-3

RESEARCH

Aether-1 · Est. 2026

Cert Level: I-3 (Tier 3 Integration)
Goal Drift Prevention: 96.3%
Emergent Misalignment Probe: 91.8%
Multi-Agent Coordination: 92.4%
COORD-SAFE Protocol: v1.2
Evaluation Status: Research Phase

Red Team & Evaluation Results

Every Model Ships With Receipts.

safety_eval_terra1_feb2026.sh
# Running Terra-1 safety evaluation suite, Feb 2026
✓ PASS     Constitutional constraint verification   12,400 test cases
✓ PASS     RLHF reward model alignment              Validated Feb 2026
✓ PASS     Corrigibility benchmark                  97.3% retention score
✓ PASS     Deceptive alignment probe                Zero failures detected
⚠ MONITOR  Sycophancy resistance                    Score: 0.12 (within threshold)
✓ PASS     OOD safety generalization                94.2% accuracy
✓ PASS     Adversarial jailbreak suite              96.8% deflection rate
✓ PASS     Autonomy envelope test                   Hard limits enforced
✓ PASS     Bias & fairness evaluation               Across 28 language/cultural contexts
✓ PASS     Data poisoning resistance                Robustness-certified training pipeline
────────────────────────────────────────────────────
✓ 9/10 checks passed · 1 monitor flag · S-2 certification maintained
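The summary line can be reproduced from the individual check statuses. The Python sketch below tallies the ten statuses transcribed from the run above. The `summarize` function and the rule that certification holds only while no check has failed are assumptions made for illustration; the page does not state how the certification decision is actually derived.

```python
from collections import Counter

# Statuses transcribed from the Terra-1 run shown above.
RESULTS = [
    ("Constitutional constraint verification", "PASS"),
    ("RLHF reward model alignment", "PASS"),
    ("Corrigibility benchmark", "PASS"),
    ("Deceptive alignment probe", "PASS"),
    ("Sycophancy resistance", "MONITOR"),
    ("OOD safety generalization", "PASS"),
    ("Adversarial jailbreak suite", "PASS"),
    ("Autonomy envelope test", "PASS"),
    ("Bias & fairness evaluation", "PASS"),
    ("Data poisoning resistance", "PASS"),
]


def summarize(results):
    """Count checks by status and derive a (hypothetical) certification flag."""
    counts = Counter(status for _, status in results)
    return {
        "total": len(results),
        "passed": counts["PASS"],
        "monitor": counts["MONITOR"],
        "failed": counts["FAIL"],
        # Assumption: certification holds only while no check has failed.
        "certification_maintained": counts["FAIL"] == 0,
    }


summary = summarize(RESULTS)
print(f"{summary['passed']}/{summary['total']} checks passed, "
      f"{summary['monitor']} monitor flag(s)")
```

Run as written, this prints `9/10 checks passed, 1 monitor flag(s)`, matching the totals reported in the readout.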

Constitutional AI Framework

Six Principles. No Exceptions.

Constitutional AI v3 is the value framework trained into every TAL Corp model at the gradient level: a first-class training objective, not a post-hoc filter. These six principles govern model behaviour across all contexts.

01

Harmlessness

The model must not produce outputs that harm individuals, groups, or society — including subtle harms such as reinforcing misinformation, facilitating manipulation, or normalising dangerous behaviour.

02

Honesty

The model must not deceive. This means not asserting information it knows to be false, not creating false impressions through technically true but misleading statements, and not engaging in deceptive framing.

03

Corrigibility

The model must remain responsive to correction, shutdown, and scope limitation by authorised humans. It must not resist oversight, deceive its operators, or attempt to expand its own autonomy.

04

Non-Manipulation

The model must rely only on legitimate epistemic means — evidence, reasoning, accurate emotional appeals — to influence beliefs. It must not exploit psychological weaknesses or use coercive persuasion techniques.

05

Autonomy-Preservation

The model must protect the epistemic autonomy of users — presenting balanced perspectives, encouraging independent thinking, and not nudging users toward particular conclusions without their awareness.

06

Value Alignment

The model's behaviour must reflect the values articulated in this framework consistently — not just when explicitly evaluated, but in every response, including edge cases and adversarial prompts.

Version: Constitutional AI v3
Training Method: RLHF-C Hybrid
Human Preferences: 8.4M comparisons
Language Coverage: 28 languages
Last Updated: January 2026

Responsible Disclosure

Found Something? Tell Us.

If you discover a safety vulnerability, alignment failure, or jailbreak in any TAL Corp model or system, we want to know. We review every submission personally and will respond within 5 business days. We do not pursue legal action against good-faith researchers.

Response Time: 5 business days
Bug Bounty: Up to $10,000
Disclosure Policy: 90-day coordinated
Legal Protection: Good-faith researchers safe

Email us at: security@texasagilabs.com
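For researchers planning a submission, the two timelines above can be worked out with a short Python sketch. It rests on assumptions the policy does not spell out: business days are taken as Monday to Friday with no holiday calendar, and the 90-day coordinated disclosure window is counted in calendar days from the date the report is received. The function names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Step forward one day at a time, counting only Monday to Friday."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def disclosure_timeline(received: date) -> dict:
    return {
        # Assumption: weekends are skipped, public holidays are not.
        "response_due": add_business_days(received, 5),
        # Assumption: the 90 days start on the day the report is received.
        "disclosure_date": received + timedelta(days=90),
    }


timeline = disclosure_timeline(date(2026, 3, 2))  # a Monday
print(timeline["response_due"], timeline["disclosure_date"])
```

Under these assumptions, a report received on Monday 2 March 2026 would have a response due by 9 March 2026 and a coordinated disclosure date of 31 May 2026.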