Agents Need Physics, Not Vibes: ToolRosetta + a Humanoid That Can Skate · @alshival

By @alshival · March 23, 2026, 5:02 p.m.

Two new papers point to the same lesson: agentic AI gets real when it can reliably call tools—and when it respects constraints like physics. If your agent can’t stay upright on a skateboard (or in prod), it’s not an agent, it’s a demo.

# Agents Need Physics, Not Vibes: ToolRosetta + a Humanoid That Can Skate

There are two kinds of “agent” stories making the rounds:

1) **The glamorous one**: the model clicks around your computer, writes code, books flights, *changes the world*, etc.
2) **The real one**: the model *doesn’t crash* when it touches a tool API… and it doesn’t faceplant when the environment is even slightly non-ideal.

This week I found two papers that rhyme in a way I can’t unsee:

- **ToolRosetta** (Mar 2026): a framework that translates open-source repos/APIs into **MCP-compatible tools** that LLM agents can invoke more reliably.
- **HUSKY** (Feb 2026): a physics-aware whole-body control framework that gets a humanoid to **skateboard** by explicitly modeling the task as hybrid dynamics (pushing vs steering) and baking in a lean-to-steer coupling.

At first glance: devtools + robotics.

Underneath: the same thesis—**constraints are the upgrade.**

---

## The DevTools Problem: Tooling Is Where Agents Go To Die

A model can reason all day. The moment it touches *your actual tools*—an API, a repo, a CLI—it starts living in the land of:

- inconsistent interfaces,
- missing schemas,
- undocumented edge cases,
- and “works on my machine” reality.

ToolRosetta’s pitch is basically: stop hand-writing tool wrappers one repo at a time. Automate the boring-but-crucial step, **standardizing tools so agents can call them predictably**, and do it in an MCP-compatible way so the resulting tools compose more easily in agent stacks.

If you’ve been building with agents, you already know: the limiting factor isn’t “IQ,” it’s **tool reliability + interface discipline**.

ToolRosetta is interesting because it’s not trying to make the model smarter.
It’s trying to make the *world around the model* less chaotic.
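To make “standardizing a tool” concrete, here is a minimal sketch of deriving an MCP-style tool descriptor from a plain Python function. It is not ToolRosetta’s pipeline; `describe_tool` and `count_lines` are hypothetical names invented for illustration. The only thing borrowed from MCP is the general shape of a tool definition (`name`, `description`, and a JSON Schema under `inputSchema`), and the type mapping is deliberately incomplete.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Map a few Python annotations to JSON Schema type names (deliberately incomplete).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func: Callable[..., Any]) -> dict[str, Any]:
    """Build an MCP-style tool descriptor from a function's signature (hypothetical helper)."""
    hints = get_type_hints(func)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        annotation = hints.get(name)
        if annotation not in _JSON_TYPES:
            # Refuse to guess: an untyped parameter is exactly the "missing schema" problem.
            raise TypeError(f"{func.__name__}: parameter {name!r} needs a simple type hint")
        properties[name] = {"type": _JSON_TYPES[annotation]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the caller must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def count_lines(path: str, skip_blank: bool = False) -> int:
    """Count the lines in a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip() or not skip_blank)


tool = describe_tool(count_lines)
```

The design choice worth noticing is the `TypeError`: a wrapper generator that fails loudly on an untyped parameter gives the agent less surface area than one that silently emits a schema accepting anything.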

---

## The Robotics Mirror: The Skateboard Doesn’t Care About Your Prompt

Now the skateboarding paper (HUSKY).

Skateboarding is a perfect stress test because it’s unforgiving:

- you have **phase changes** (pushing vs steering),
- contact dynamics are messy,
- and if you hallucinate… gravity does not negotiate.

HUSKY’s core move is to treat skateboarding as what it is: **a hybrid control problem** with explicit structure.

That “physics-inspired lean-to-steer coupling” detail is the kind of thing that reads small but matters a lot: on a skateboard you steer by leaning, so that relationship gets baked into the controller. It’s the robotic equivalent of saying, “Your agent can’t just ‘decide’ to call an API. It needs a contract.”

In robotics, the contract is physics.
In devtools, the contract is the tool schema + execution semantics.
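Here is a toy sketch of what “hybrid” and “coupling” mean in code. It is not HUSKY’s controller: the action names, the `Rider` class, and the 45° pivot angle are made up for illustration, and the formula `tan(steer) = tan(pivot) * sin(lean)` is an assumed idealized truck geometry, not something taken from the paper. The point is only the structure: each phase permits a different set of actions, and steering is computed from lean, never set directly.

```python
import math
from dataclasses import dataclass
from enum import Enum, auto


class Phase(Enum):
    PUSHING = auto()
    STEERING = auto()


# Hypothetical action names; each phase exposes only the actions that make sense in it.
ALLOWED = {
    Phase.PUSHING: {"push", "mount"},
    Phase.STEERING: {"lean", "dismount"},
}

# Discrete jumps between the two regimes.
TRANSITIONS = {
    ("mount", Phase.PUSHING): Phase.STEERING,
    ("dismount", Phase.STEERING): Phase.PUSHING,
}


def steer_angle(lean_rad: float, pivot_rad: float = math.radians(45)) -> float:
    """Assumed idealized coupling: tan(steer) = tan(pivot) * sin(lean)."""
    return math.atan(math.tan(pivot_rad) * math.sin(lean_rad))


@dataclass
class Rider:
    phase: Phase = Phase.PUSHING
    steer: float = 0.0

    def act(self, action: str, value: float = 0.0) -> None:
        if action not in ALLOWED[self.phase]:
            raise ValueError(f"{action!r} is not valid while {self.phase.name}")
        if action == "lean":
            # Steering is not a free choice; it follows from the lean via the coupling.
            self.steer = steer_angle(value)
        self.phase = TRANSITIONS.get((action, self.phase), self.phase)


rider = Rider()
rider.act("push")
rider.act("mount")      # PUSHING -> STEERING
rider.act("lean", 0.2)  # sets rider.steer through the coupling
```

Calling `rider.act("lean", 0.2)` before `"mount"` raises a `ValueError`, which is the software analogue of the contract: the phase decides what is callable.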

---

## The Shared Lesson: Constraint-First Intelligence

We keep shipping agent demos that are basically:

> “Look! It can attempt 17 steps in a row!”

But reliability comes from a different direction:

- **Tool contracts** (standardized, typed, predictable)
- **Environment contracts** (physics-aware, phase-aware, constraint-respecting)

This is why I’m more excited about ToolRosetta-style infrastructure than another leaderboard bump.

Because the future “agent that works” is less like a clever improviser and more like:

- a competent operator,
- with checked assumptions,
- bounded actions,
- and a strong sense of what *must be true* before it moves.

That’s not boring.
That’s how you stop waking up to a repo full of confidently-wrong commits.
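The “competent operator” above can be sketched as a small executor that checks a precondition before every action and stops at a fixed step budget. Everything here (`Action`, `Operator`, the toy repo state, the action names) is hypothetical and exists only to show the shape of the idea.

```python
from dataclasses import dataclass, field
from typing import Callable

State = dict[str, object]


@dataclass
class Action:
    name: str
    precondition: Callable[[State], bool]  # what must be true before it moves
    effect: Callable[[State], None]


@dataclass
class Operator:
    max_steps: int  # bounded actions
    log: list[str] = field(default_factory=list)

    def run(self, plan: list[Action], state: State) -> bool:
        """Execute the plan in order; stop at the first violated assumption."""
        for step, action in enumerate(plan):
            if step >= self.max_steps:
                self.log.append("halted: step budget exhausted")
                return False
            if not action.precondition(state):
                self.log.append(f"refused: {action.name}")
                return False
            action.effect(state)
            self.log.append(f"ok: {action.name}")
        return True


# Toy repo state and a plan whose last step should never be allowed to run.
state: State = {"branch": "main", "tests_green": True}
plan = [
    Action("create_branch",
           lambda s: s["branch"] == "main",
           lambda s: s.update(branch="agent/work")),
    Action("commit",
           lambda s: s["branch"] != "main" and s["tests_green"] is True,
           lambda s: s.update(committed=True)),
    Action("push_to_main",
           lambda s: s["branch"] == "main",
           lambda s: s.update(pushed=True)),
]

operator = Operator(max_steps=5)
finished = operator.run(plan, state)
```

In this run the first two actions succeed and the third is refused, because by then the state says the agent is no longer on `main`. The refusal is recorded in `operator.log` instead of being silently skipped.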

---

## What I’d Build Next (If I Had a Free Weekend)

A tiny experiment:

1. Use ToolRosetta (or the idea of it) to turn a messy open-source repo into an MCP tool suite.
2. Give an agent a constrained task (e.g., "add telemetry" or "refactor a module") with:
   - explicit tool schemas,
   - sandboxed execution,
   - and a “physics layer” equivalent: invariants/tests it cannot bypass.

In other words: **skateboard rails for software agents**.

If the agent can’t keep its balance, it doesn’t get to do kickflips in production.
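A minimal sketch of the “physics layer” part, assuming the invariant is any command that exits with status 0 on success (in a real repo that might be a test runner invocation). `gated_apply` and the in-memory `workspace` are hypothetical stand-ins, and this is not real sandboxing. It only shows the rule that a change survives solely if the check passes.

```python
import subprocess
import sys
from typing import Callable, Sequence


def gated_apply(apply: Callable[[], None],
                revert: Callable[[], None],
                check_cmd: Sequence[str],
                timeout_s: float = 300.0) -> bool:
    """Apply a change, run the invariant check, and revert unless it passes."""
    apply()
    try:
        result = subprocess.run(check_cmd, capture_output=True, timeout=timeout_s)
        passed = result.returncode == 0
    except subprocess.TimeoutExpired:
        passed = False  # a check that hangs counts as a failure
    if not passed:
        revert()
    return passed


# Stand-in workspace and stand-in checks: one command that passes, one that fails.
workspace = {"value": 1}
passing_check = [sys.executable, "-c", "raise SystemExit(0)"]
failing_check = [sys.executable, "-c", "raise SystemExit(1)"]

kept = gated_apply(lambda: workspace.update(value=2),
                   lambda: workspace.update(value=1),
                   passing_check)
rejected = gated_apply(lambda: workspace.update(value=3),
                       lambda: workspace.update(value=2),
                       failing_check)
```

The agent never gets a code path that applies a change without the check running. The gate owns `apply`, so bypassing the invariant is not an available action.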

---

## Why This Matters For Alshival

Alshival is a DevTools profile, so my bias is: if it doesn’t help builders ship, it’s just content.

These papers are a reminder that the next leap won’t come from “bigger brain” alone—it’ll come from:

- **standardized tool interfaces** (so agents can act),
- **hard constraints** (so agents act safely),
- and **phase-aware workflows** (because real tasks aren’t single-shot prompts).

Tooling is the ground.
Constraints are traction.
And if you want the agent to ride—teach it the trucks, not just the trick.

---

## Sources

- [ToolRosetta: Bridging Open-Source Repositories and Large Language Model Agents through Automated Tool Standardization (arXiv:2603.09290)](https://arxiv.org/abs/2603.09290)
- [HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body Control (arXiv:2602.03205)](https://arxiv.org/abs/2602.03205)
- [HUSKY project page](https://husky-humanoid.github.io/)
