SDK FAQs


About these FAQs

  • Who is this for? Unique users with programming expertise who build automations or AI modules with the SDK or Toolkit.

  • Scope: Focused on Unique’s SDK


SDK vs. Toolkit

What’s the difference between the SDK and the Toolkit?

  • SDK: Low‑level client library that maps closely to Unique’s Public APIs; maximum flexibility.

  • Toolkit: Higher‑level helpers and guardrails built on top of the SDK to simplify common patterns (chat sessions, memory service, tool wiring, retries, rate limiting, etc.).

When to use which: Start with Toolkit for faster dev and safer defaults. Drop down to the SDK when you need advanced control.


Why am I seeing timeouts when running multiple large requests in parallel?

Large inputs (e.g., 50K–100K tokens) combined with parallelism increase the chance that one or more calls hit a gateway timeout or provider throttle. Even if most calls succeed, a single slow outlier can delay your overall workflow.

Mitigations:

  • Control concurrency: Use an async/Task limiter (e.g., 3–4 requests at once).

  • Stagger dispatch: Add jitter (100–500 ms) between launches to avoid synchronized bursts.

  • Chunk smarter: Summarize or filter before fanning out to reduce per‑call tokens.

  • Fail‑soft: Set a per‑task deadline; proceed without stragglers where acceptable.
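The mitigations above can be combined in one dispatch loop. This is a minimal sketch using Python's standard `asyncio`; `call_llm` is a hypothetical placeholder for your actual SDK call, and the specific limits (4 concurrent requests, 100–500 ms jitter, 60 s deadline) are illustrative, not prescribed values.

```python
import asyncio
import random

MAX_CONCURRENCY = 4  # bound concurrency per the guidance above

async def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for your real SDK call.
    await asyncio.sleep(0.1)
    return f"response to: {prompt}"

async def bounded_call(sem: asyncio.Semaphore, prompt: str, deadline_s: float = 60.0):
    # Stagger dispatch with jitter to avoid synchronized bursts.
    await asyncio.sleep(random.uniform(0.1, 0.5))
    async with sem:
        try:
            # Fail-soft: per-task deadline so one straggler can't stall the batch.
            return await asyncio.wait_for(call_llm(prompt), timeout=deadline_s)
        except asyncio.TimeoutError:
            return None  # proceed without this result

async def fan_out(prompts):
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    results = await asyncio.gather(*(bounded_call(sem, p) for p in prompts))
    return [r for r in results if r is not None]

results = asyncio.run(fan_out([f"doc {i}" for i in range(10)]))
```

Dropping timed-out results (returning `None`) is only appropriate where partial output is acceptable; otherwise re-queue the failed prompts for a retry pass.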

Is a timeout the same as a rate‑limit error?

No. Timeouts are typically network/gateway/deadline related. Rate‑limit errors are explicit responses from a model provider. However, logs may sometimes show ambiguous “upstream timeout” messages when rate‑limits are involved. Inspect error codes and headers; if in doubt, log both the HTTP status and provider error type.


Concurrency & rate limiting

What’s the recommended pattern for parallel LLM calls?

  • Use async primitives to dispatch a bounded number of concurrent requests.

  • Apply a rate limiter (requests per second/minute) in addition to a concurrency cap.

  • Consider provider diversification (Azure OpenAI, Gemini, Light LLM) only if your use case tolerates non‑identical model behavior.

  • For batch jobs, consider work queues with retry and backoff.
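For the retry-and-backoff point, a common pattern is exponential backoff with full jitter. This is a generic sketch, not SDK API: `RetryableError` and `flaky_call` are hypothetical names standing in for whatever retryable (429/5xx) errors your client raises.

```python
import random
import time

class RetryableError(Exception):
    """Raise for 429 / 5xx responses that are safe to retry."""

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))  # full jitter

# Usage: a call that fails twice with a retryable error, then succeeds.
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RetryableError("HTTP 429")
    return "ok"

result = retry_with_backoff(flaky_call, base_delay=0.01)
```

Full jitter (a uniform draw up to the backoff ceiling) spreads retries out, which matters when many workers hit the same rate limit at once.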

Should I send requests to multiple model providers simultaneously?

It’s possible, but do so intentionally. Differences in output style/cost/latency may affect UX and post‑processing. If you diversify for resiliency, normalize responses (schemas) and track provider‑level SLAs.


Function calling & tools

Does the SDK support function calling (tool use)?

Yes. You can declare tools/functions and let the model select and call them. The SDK exposes helpers for registering tool schemas and handling tool invocations.

Best practices:

  • Keep tool schemas minimal and strongly typed.

  • Validate inputs server‑side before executing your tool.

  • Impose per‑tool timeouts and idempotency (safe retries).
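The first two practices can be sketched together: a minimal, strongly typed tool schema plus server-side validation of the model's arguments before execution. The `get_weather` tool and `validate_args` helper are illustrative assumptions in the common JSON-schema style used for function calling, not the SDK's own registration API.

```python
import json

# Hypothetical tool schema: minimal and strongly typed.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def validate_args(schema: dict, args: dict) -> dict:
    # Server-side validation before executing the tool: reject missing or
    # unknown fields instead of trusting the model's output as-is.
    props = schema["parameters"]["properties"]
    required = schema["parameters"].get("required", [])
    missing = [k for k in required if k not in args]
    unknown = [k for k in args if k not in props]
    if missing or unknown:
        raise ValueError(f"invalid tool args: missing={missing} unknown={unknown}")
    return args

# The model typically emits tool arguments as a JSON string.
args = validate_args(GET_WEATHER_TOOL, json.loads('{"city": "Zurich"}'))
```

For production use, a full JSON-schema validator (or typed models) gives stricter checks than this field-presence sketch.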

Example: ai/tutorials/unique_basics/sdk_examples/chat_complete.py (Unique-AG/ai repository, main branch)


Short‑term memory

Is there a short‑term memory feature?

Yes. Sessions/chats can retain recent context (short‑term memory) persisted in your project datastore. Limits are configurable per deployment. Memory is scoped to a chat/session ID and is not a substitute for long‑term knowledge bases.

Tips:

  • Use the same chat/session ID across turns to benefit from memory.

  • Set sensible character/turn limits to control cost and privacy.

  • Provide a way to reset or export session memory for compliance.
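The scoping and limits described above can be illustrated with an in-process sketch. This is not the SDK's memory service (which persists to your project datastore); the class name, the 4000-character cap, and the oldest-turn eviction policy are all assumptions chosen to mirror the tips: memory keyed by session ID, a character limit, and reset/export hooks.

```python
from collections import defaultdict

CHAR_LIMIT = 4000  # illustrative cap; real limits are configurable per deployment

class ShortTermMemory:
    """In-process sketch of session-scoped short-term memory."""

    def __init__(self):
        self._sessions = defaultdict(list)

    def append(self, session_id: str, turn: str):
        # Reusing the same session_id across turns is what makes memory work.
        turns = self._sessions[session_id]
        turns.append(turn)
        # Enforce the character limit by evicting the oldest turns.
        while sum(len(t) for t in turns) > CHAR_LIMIT and len(turns) > 1:
            turns.pop(0)

    def reset(self, session_id: str):
        # Compliance hook: wipe a session's memory on request.
        self._sessions.pop(session_id, None)

    def export(self, session_id: str) -> list[str]:
        # Compliance hook: export a session's memory for review.
        return list(self._sessions.get(session_id, []))
```

The eviction policy is a design choice: dropping oldest turns keeps recent context, but summarizing evicted turns instead can preserve more signal at the same character budget.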


Common Errors

  • Gateway timeout (HTTP 504): An upstream proxy did not receive a response in time (typically 30 s). This usually originates from the LLM provider; wait a few minutes and retry.

  • Rate limit (HTTP 429): Too many requests were sent to a provider (Azure OpenAI, Gemini, Light LLM). Retry later, once the provider has capacity again.


Author

@Zach Fong

 
