Extended thinking needed to travel end-to-end through the API,
runtime, and CLI so the client can request a thinking budget,
preserve streamed reasoning blocks, and present them in a
collapsed text-first form. The implementation keeps thinking
strictly opt-in, adds a session-local toggle, and reuses the
existing flag/slash-command/reporting surfaces instead of
introducing a new UI layer.
Constraint: Existing non-thinking text/tool flows had to remain backward compatible by default
Constraint: Terminal UX needed a lightweight collapsed representation rather than an interactive TUI widget
Rejected: Heuristic CLI-only parsing of reasoning text | brittle against structured stream payloads
Rejected: Expanded raw thinking output by default | too noisy for normal assistant responses
Confidence: medium
Scope-risk: moderate
Reversibility: clean
Directive: Keep thinking blocks structurally separate from answer text unless the upstream API contract changes
Tested: cargo fmt --all; cargo clippy --workspace --all-targets -- -D warnings; cargo test -q
Not-tested: Live upstream thinking payloads against the production API contract
This upgrades Rust session compaction so summaries carry more than a flat timeline. The compacted state now calls out recent user requests, pending work signals, key files, and the current work focus so resumed sessions retain stronger execution continuity.
The change stays deterministic and local while moving the compact output closer to session-memory style handoff value.
Constraint: Keep compaction local and deterministic rather than introducing API-side summarization
Constraint: Preserve the existing resumable system-summary mechanism and compact command flow
Rejected: Add a full session-memory background extractor now | larger runtime change than needed for this incremental parity pass
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep future compaction enrichments biased toward actionable state transfer, not just verbose recap
Tested: cargo fmt; cargo clippy --all-targets --all-features -- -D warnings; cargo test -q
Not-tested: Long real-world sessions with deeply nested tool/result payloads
This adds token and estimated cost reporting to runtime usage tracking and surfaces it in the CLI status and turn output. It also upgrades compaction summaries so users see a clearer resumable summary and token savings after /compact.
The verification path required cleaning existing workspace clippy and test friction in adjacent crates so cargo fmt, cargo clippy -D warnings, and cargo test succeed from the Rust workspace root in this repo state.
Constraint: Keep the change incremental and user-visible without a large CLI rewrite
Constraint: Verification must pass with cargo fmt, cargo clippy --all-targets --all-features -- -D warnings, and cargo test
Rejected: Implement a full model-pricing table now | would add more surface area than needed for this first UX slice
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: If pricing becomes model-specific later, keep the current estimate labeling explicit rather than implying exact billing
Tested: cargo fmt; cargo clippy --all-targets --all-features -- -D warnings; cargo test -q
Not-tested: Live Anthropic API interaction and real streaming terminal sessions