Understand token usage
The TypeScript and Python SDKs expose the same usage data with different field names:
- TypeScript provides per-step token breakdowns on each assistant message (message.message.id, message.message.usage), per-model cost via modelUsage on the result message, and a cumulative total on the result message.
- Python provides per-step token breakdowns on each assistant message (message.usage, message.message_id), per-model cost via model_usage on the result message, and the accumulated total on the result message (total_cost_usd and the usage dict).
- query() call: one invocation of the SDK’s query() function. A single call can involve multiple steps (Claude responds, uses tools, gets results, responds again). Each call produces one result message at the end.
- Step: a single request/response cycle within a query() call. Each step produces assistant messages with token usage.
- Session: a series of query() calls linked by a session ID (using the resume option). Each query() call within a session reports its own cost independently.
Within a single query() call, token usage is reported at each step and the authoritative total arrives at the end.
Each step produces assistant messages
When Claude responds, it sends one or more assistant messages. In TypeScript, each assistant message contains a nested BetaMessage (accessed via message.message) with an id and a usage object with token counts (input_tokens, output_tokens). In Python, the AssistantMessage dataclass exposes the same data directly via message.usage and message.message_id. When Claude uses multiple tools in one turn, all messages in that turn share the same ID, so deduplicate by ID to avoid double-counting.
The result message provides the authoritative total
When the query() call completes, the SDK emits a result message with total_cost_usd and cumulative usage. This is available in both TypeScript (SDKResultMessage) and Python (ResultMessage). If you make multiple query() calls (for example, in a multi-turn session), each result only reflects the cost of that individual call. If you only need the total cost, you can ignore the per-step usage and read this single value.
Get the total cost of a query
The result message (TypeScript, Python) is the last message in every query() call. It includes total_cost_usd, the cumulative cost across all steps in that call. This works for both success and error results. If you use sessions to make multiple query() calls, each result only reflects the cost of that individual call.
The following examples iterate over the message stream from a query() call and print the total cost when the result message arrives:
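A minimal Python sketch of that loop, under stated assumptions: the ResultMessage class below is a local stand-in that carries only the fields named above, not the SDK's own class, and the sample stream is made up. In a real application the messages would come from iterating the query() stream.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class ResultMessage:
    """Local stand-in for the SDK's result message (assumption: only these fields)."""
    subtype: str
    total_cost_usd: Optional[float] = None


def total_cost(messages: Iterable[Any]) -> Optional[float]:
    """Return total_cost_usd from the result message, or None if none arrived."""
    cost = None
    for message in messages:
        # Only the result message carries the authoritative total.
        if isinstance(message, ResultMessage):
            cost = message.total_cost_usd
    return cost


# Made-up stream: two placeholder step messages, then the final result message.
sample_stream = [
    {"type": "assistant"},
    {"type": "assistant"},
    ResultMessage(subtype="success", total_cost_usd=0.0123),
]

cost = total_cost(sample_stream)
if cost is not None:
    print(f"Total cost: ${cost:.4f}")
```

The helper does not inspect the subtype, so it returns the cost for success and error results alike.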
Track per-step and per-model usage
The examples in this section use TypeScript field names. In Python, the equivalent fields are AssistantMessage.usage and AssistantMessage.message_id for per-step usage, and ResultMessage.model_usage for per-model breakdowns.
Track per-step usage
Each assistant message contains a nested BetaMessage (accessed via message.message) with an id and usage object with token counts. When Claude uses tools in parallel, multiple messages share the same id with identical usage data. Track which IDs you’ve already counted and skip duplicates to avoid inflated totals.
The following example accumulates input and output tokens across all steps, counting each unique message ID only once:
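The sketch below is written in Python, so it uses the Python field names (message.usage, message.message_id) rather than the TypeScript ones. The AssistantMessage class is a local stand-in, and treating usage as a dict keyed by token field name is an assumption; the sample data is made up.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Tuple


@dataclass
class AssistantMessage:
    """Local stand-in (assumption): usage is a dict keyed by token field name."""
    message_id: str
    usage: Dict[str, int]


def sum_step_tokens(messages: Iterable[Any]) -> Tuple[int, int]:
    """Sum input and output tokens, counting each message ID once."""
    seen_ids = set()
    input_tokens = 0
    output_tokens = 0
    for message in messages:
        if not isinstance(message, AssistantMessage):
            continue
        if message.message_id in seen_ids:
            continue  # Same step reported again (parallel tool use); skip it.
        seen_ids.add(message.message_id)
        input_tokens += message.usage.get("input_tokens", 0)
        output_tokens += message.usage.get("output_tokens", 0)
    return input_tokens, output_tokens


# Made-up data: the first two messages belong to the same step.
steps = [
    AssistantMessage("msg_1", {"input_tokens": 100, "output_tokens": 20}),
    AssistantMessage("msg_1", {"input_tokens": 100, "output_tokens": 20}),
    AssistantMessage("msg_2", {"input_tokens": 150, "output_tokens": 30}),
]
print(sum_step_tokens(steps))  # (250, 50)
```

Without the seen_ids check, the duplicated msg_1 entry would inflate the totals to (350, 70).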
Break down usage per model
The result message includes modelUsage, a map of model name to per-model token counts and cost. This is useful when you run multiple models (for example, Haiku for subagents and Opus for the main agent) and want to see where tokens are going.
The following example runs a query and prints the cost and token breakdown for each model used:
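A rough Python sketch of the reporting step only, operating on a plain dict shaped like ResultMessage.model_usage. The per-model key names (inputTokens, outputTokens, costUSD) and the model names are assumptions made for illustration; check the SDK reference for the actual keys before relying on them.

```python
from typing import Any, Dict, List


def format_model_usage(model_usage: Dict[str, Dict[str, Any]]) -> List[str]:
    """Build one summary line per model, most expensive first."""
    # Assumed per-model keys: "costUSD", "inputTokens", "outputTokens".
    by_cost = sorted(
        model_usage.items(),
        key=lambda item: item[1].get("costUSD", 0.0),
        reverse=True,
    )
    lines = []
    for model, usage in by_cost:
        lines.append(
            f"{model}: ${usage.get('costUSD', 0.0):.4f} "
            f"(in={usage.get('inputTokens', 0)}, out={usage.get('outputTokens', 0)})"
        )
    return lines


# Made-up values standing in for model_usage; the model names are placeholders.
sample_model_usage = {
    "example-haiku-model": {"inputTokens": 1200, "outputTokens": 300, "costUSD": 0.002},
    "example-opus-model": {"inputTokens": 4000, "outputTokens": 900, "costUSD": 0.1275},
}
for line in format_model_usage(sample_model_usage):
    print(line)
```

Sorting by cost puts the model that dominates spending at the top of the report.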
Accumulate costs across multiple calls
Each query() call returns its own total_cost_usd. The SDK does not provide a session-level total, so if your application makes multiple query() calls (for example, in a multi-turn session or across different users), accumulate the totals yourself.
The following examples run two query() calls sequentially, add each call’s total_cost_usd to a running total, and print both the per-call and combined cost:
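A minimal Python sketch of the accumulation, assuming each inner list below stands in for the message stream of one query() call. ResultMessage is a local stand-in and the costs are made up.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class ResultMessage:
    """Local stand-in for the SDK's result message (assumption)."""
    total_cost_usd: float


def call_cost(messages: Iterable[Any]) -> float:
    """Return the cost reported by one call's result message (0.0 if absent)."""
    cost = 0.0
    for message in messages:
        if isinstance(message, ResultMessage):
            cost = message.total_cost_usd or 0.0
    return cost


# Each inner list stands in for the message stream of one query() call.
calls: List[List[Any]] = [
    [{"type": "assistant"}, ResultMessage(total_cost_usd=0.012)],
    [{"type": "assistant"}, ResultMessage(total_cost_usd=0.034)],
]

running_total = 0.0
for index, stream in enumerate(calls, start=1):
    cost = call_cost(stream)
    running_total += cost
    print(f"Call {index} cost: ${cost:.4f}")
print(f"Combined cost: ${running_total:.4f}")
```

If the total must survive restarts or be split per user, the running total would live in your own storage keyed by session or user ID.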
Handle errors, caching, and token discrepancies
For accurate cost tracking, account for failed conversations, cache token pricing, and occasional reporting inconsistencies.
Resolve output token discrepancies
In rare cases, you might observe different output_tokens values for messages with the same ID. When this occurs:
- Use the highest value: the final message in a group typically contains the accurate total.
- Verify against total cost: the total_cost_usd in the result message is authoritative.
- Report inconsistencies: file issues at the Claude Code GitHub repository.
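The "use the highest value" rule can be sketched in Python as follows. The input is a list of (message ID, output_tokens) pairs that you would collect from assistant messages; the pairs shown are made up and the helper name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def max_output_tokens_by_id(reports: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Keep the highest output_tokens value seen for each message ID."""
    highest: Dict[str, int] = {}
    for message_id, output_tokens in reports:
        highest[message_id] = max(highest.get(message_id, 0), output_tokens)
    return highest


# Made-up pairs: "msg_1" is reported twice with different values.
reports = [("msg_1", 40), ("msg_1", 55), ("msg_2", 30)]
per_step = max_output_tokens_by_id(reports)
print(per_step)                # {'msg_1': 55, 'msg_2': 30}
print(sum(per_step.values()))  # 85
```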
Track costs on failed conversations
Both success and error result messages include usage and total_cost_usd. If a conversation fails mid-way, you still consumed tokens up to the point of failure. Always read cost data from the result message regardless of its subtype.
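One way to apply this is to build the same billing record for every result and only flag failures, rather than skipping them. The Python sketch below assumes that a successful result has the subtype "success" and uses an illustrative error subtype; ResultMessage is a local stand-in and cost_record is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ResultMessage:
    """Local stand-in for the SDK's result message (assumption)."""
    subtype: str
    total_cost_usd: Optional[float] = None


def cost_record(result: ResultMessage) -> Dict[str, Any]:
    """Record the cost for any result, flagging non-success subtypes."""
    return {
        "subtype": result.subtype,
        # Assumption: successful results use the subtype "success".
        "failed": result.subtype != "success",
        "cost_usd": result.total_cost_usd or 0.0,
    }


# Made-up results: one success and one failure that still consumed tokens.
results = [
    ResultMessage(subtype="success", total_cost_usd=0.02),
    ResultMessage(subtype="error_during_execution", total_cost_usd=0.015),
]
for record in map(cost_record, results):
    print(record)
```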
Track cache tokens
The Agent SDK automatically uses prompt caching to reduce costs on repeated content. You do not need to configure caching yourself. The usage object includes two additional fields for cache tracking:
- cache_creation_input_tokens: tokens used to create new cache entries (charged at a higher rate than standard input tokens).
- cache_read_input_tokens: tokens read from existing cache entries (charged at a reduced rate).
Compare these fields against input_tokens to understand caching savings. In TypeScript, these fields are typed on the Usage object. In Python, they appear as keys in the ResultMessage.usage dict (for example, message.usage.get("cache_read_input_tokens", 0)).
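A small Python sketch of that comparison, operating on a usage dict with the three keys named above. It assumes input_tokens counts only tokens that were neither read from nor written to the cache, so the three fields add up to the total input; the sample numbers are made up.

```python
from typing import Dict


def cache_summary(usage: Dict[str, int]) -> Dict[str, float]:
    """Summarize how much of the input was served from the prompt cache."""
    uncached = usage.get("input_tokens", 0)
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    # Assumption: the three fields are disjoint, so their sum is the total input.
    total = uncached + created + read
    return {
        "total_input_tokens": total,
        "cache_read_share": read / total if total else 0.0,
    }


# Made-up usage dict shaped like ResultMessage.usage.
sample_usage = {
    "input_tokens": 200,
    "cache_creation_input_tokens": 300,
    "cache_read_input_tokens": 1500,
}
summary = cache_summary(sample_usage)
print(f"{summary['cache_read_share']:.0%} of input tokens were read from cache")
```

A high read share means most input tokens were charged at the reduced cache-read rate.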
Related documentation
- TypeScript SDK Reference - Complete API documentation
- SDK Overview - Getting started with the SDK
- SDK Permissions - Managing tool permissions