Gemini's undocumented limits for strongly typed schemas
Gemini silently fails when your tool schemas get too complex. We found the undocumented limits and how to work around them.

We are building an AI agent platform. Our agents started with a handful of tools and everything worked great. As we scaled to ~90 tools (GitHub, Gmail, HubSpot, etc.), each with strongly typed JSON Schemas that enumerate valid values to guide the model, we started getting:

INVALID_ARGUMENT

That's it. No details. No hint about what's wrong.
You'd think enumerating valid values in your schema is the right thing to do. It's what every API design guide tells you: constrain the model's output space, reduce hallucinations, get structured responses. But Gemini's ANY mode has an undocumented ceiling. If you cross it, your schemas get rejected with zero explanation.
After some time debugging, I built a minimal repro that isolates the problem. The trigger is the aggregate complexity of all enumerated values across all tools. Every enum: ["action_required", "deploy_failed"] field contributes to a hidden budget, and longer, compound strings exhaust it faster than short ones.
1-word values ("action", "pending"): PASS ✓
2-word values ("action_required"): FAIL ✗
3-word values ("action_required_review"): FAIL ✗

The longer the values, the sooner you hit the limit: 4-word values fail at ~120 total, while 1-word values pass at 400+. AUTO mode doesn't seem to have this limit at all for the same schemas.
Strategies that work around this:

- Move examples to parameter descriptions. Instead of enum: ["action_required", "changes_requested", ...], use type: string and describe valid values in the parameter description: Status value, e.g. action_required, changes_requested. The model gets the guidance without hitting the complexity budget.
- Validate tool calls in your agent loop, not in the schema. If you own the execution loop, validate the tool calls the model produces against your schema before executing them. If a parameter value is invalid, feed it back: Invalid status 'foo'. Valid values are: action_required, changes_requested, .... The model self-corrects on the next turn.
- Reserve strict enums for small, critical fields. Use enum where the value space is small and correctness is critical (e.g. enum: ["yes", "no"]). For fields with many valid values, descriptions + validation give you the same correctness without the opaque failures.
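The first strategy can be automated: walk the schema and rewrite any large enum into type: string plus a description listing the valid values, while small enums stay strict. A minimal sketch; the `relax_enums` helper and the `max_enum` cutoff are our own illustration, not part of any SDK.

```python
import copy

def relax_enums(schema, max_enum=4):
    """Replace enums larger than max_enum with type: string plus a
    description listing the valid values; smaller enums stay strict."""
    schema = copy.deepcopy(schema)

    def walk(node):
        if isinstance(node, dict):
            enum = node.get("enum")
            if isinstance(enum, list) and len(enum) > max_enum:
                del node["enum"]
                node["type"] = "string"
                values = ", ".join(map(str, enum))
                desc = node.get("description", "").rstrip(". ")
                node["description"] = ((desc + ". ") if desc else "") + \
                    f"Valid values: {values}."
            for v in node.values():
                walk(v)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return schema

status = {"type": "string", "description": "PR status",
          "enum": ["action_required", "changes_requested", "approved",
                   "dismissed", "commented"]}
relaxed = relax_enums({"properties": {"status": status}})
# The five-value enum is gone; the description now lists valid values.
print(relaxed["properties"]["status"])
```

Because the function deep-copies its input, you can keep the original strict schema around for local validation while sending only the relaxed version to the model.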
If you’re using AUTO mode, you’re safe, at least from this particular edge case.
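The second strategy, validating in the agent loop, can be sketched like this. We keep the strict value lists on our side (here a plain `valid_values` dict, our own convention) and turn violations into feedback the model sees on its next turn:

```python
def validate_call(args, valid_values):
    """Check a model-produced tool call against locally kept value lists
    before executing it. Returns (ok, feedback); on failure, feedback is
    returned to the model as the tool result so it can self-correct."""
    for field, allowed in valid_values.items():
        value = args.get(field)
        if value is not None and value not in allowed:
            return False, (f"Invalid {field} '{value}'. "
                           f"Valid values are: {', '.join(allowed)}.")
    return True, ""

ok, feedback = validate_call(
    {"status": "foo"},
    {"status": ["action_required", "changes_requested"]},
)
print(ok, feedback)
# → False Invalid status 'foo'. Valid values are: action_required, changes_requested.
```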
A similar issue occurs when the cumulative maxItems across the whole schema exceeds ~960.
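You can audit for this the same way, by summing every maxItems in the schema. A hypothetical helper; the ~950 "safe" figure below is our empirical observation, not a documented number.

```python
def total_max_items(schema, total=0):
    """Recursively sum every maxItems value in a JSON Schema dict."""
    if isinstance(schema, dict):
        total += schema.get("maxItems", 0)
        for v in schema.values():
            total = total_max_items(v, total)
    elif isinstance(schema, list):
        for item in schema:
            total = total_max_items(item, total)
    return total

# 10 array fields with maxItems: 95 each -> 950, just under the budget.
schema = {"type": "object", "properties": {
    f"field_{i}": {"type": "array", "maxItems": 95,
                   "items": {"type": "string"}}
    for i in range(10)}}
print(total_max_items(schema))  # → 950
```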
Fields × maxItems = Sum  │  ANY     AUTO
────────────────────────────────────────
Under budget (~950):
  1 × 950 =  950         │  PASS ✓  PASS ✓
 10 ×  95 =  950         │  PASS ✓  PASS ✓
 50 ×  19 =  950         │  PASS ✓  PASS ✓
Over budget (~980):
  1 × 980 =  980         │  FAIL ✗  FAIL ✗
 10 ×  98 =  980         │  FAIL ✗  FAIL ✗
 50 ×  20 = 1000         │  FAIL ✗  FAIL ✗

Links
- GitHub Issue tracking enum problem #2133
- Full enum repro notebook (open in Colab, set API key, run all)
- GitHub Issue tracking maxItems problem #2170
- Full maxItems repro notebook