SDK FAQs
About these FAQs
Who is this for? Unique users with programming expertise who build automations or AI modules with the SDK or Toolkit.
Scope: Focused on Unique’s SDK
SDK vs. Toolkit
What’s the difference between the SDK and the Toolkit?
SDK: Low‑level client library that maps closely to Unique’s Public APIs; maximum flexibility.
Link: unique_sdk
Toolkit: Higher‑level helpers and guardrails built on top of the SDK to simplify common patterns (chat sessions, memory service, tool wiring, retries, rate limiting, etc.).
When to use which: Start with Toolkit for faster dev and safer defaults. Drop down to the SDK when you need advanced control.
Why am I seeing timeouts when running multiple large requests in parallel?
Large inputs (e.g., 50K–100K tokens) combined with parallelism increase the chance that one or more calls hit a gateway timeout or provider throttle. Even if most calls succeed, a single slow outlier can delay your overall workflow.
Mitigations:
Control concurrency: Use an async/Task limiter (e.g., 3–4 requests at once).
Stagger dispatch: Add jitter (100–500 ms) between launches to avoid synchronized bursts.
Chunk smarter: Summarize or filter before fanning out to reduce per‑call tokens.
Fail‑soft: Set a per‑task deadline; proceed without stragglers where acceptable.
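The mitigations above can be combined in one place. The sketch below is a minimal, self-contained illustration using `asyncio`; `call_llm` is a hypothetical stand-in for your actual SDK request function, and the concurrency/deadline numbers are examples, not recommendations for every workload.

```python
import asyncio
import random

# Hypothetical stand-in for an SDK request; replace with your real call.
async def call_llm(payload: str) -> str:
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulate network latency
    return f"result:{payload}"

async def bounded_fan_out(payloads, max_concurrency=4, deadline=10.0):
    sem = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def one(payload):
        # Jitter: stagger launches so requests don't burst in lockstep.
        await asyncio.sleep(random.uniform(0.1, 0.5))
        async with sem:
            try:
                # Fail-soft: each task gets its own deadline.
                return await asyncio.wait_for(call_llm(payload), timeout=deadline)
            except asyncio.TimeoutError:
                return None  # proceed without the straggler where acceptable

    return await asyncio.gather(*(one(p) for p in payloads))

results = asyncio.run(bounded_fan_out(["a", "b", "c"]))
```

Because `gather` preserves input order, a `None` at position *i* tells you exactly which payload timed out, which makes fail-soft bookkeeping straightforward.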
Is a timeout the same as a rate‑limit error?
No. Timeouts are typically network/gateway/deadline related. Rate‑limit errors are explicit responses from a model provider. However, logs may sometimes show ambiguous “upstream timeout” messages when rate‑limits are involved. Inspect error codes and headers; if in doubt, log both the HTTP status and provider error type.
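A small classification helper makes this distinction explicit in your logs. This is a generic sketch (not an SDK API); the status-code mapping reflects common gateway behavior, and the comment notes the ambiguity described above.

```python
def classify_error(status_code: int) -> str:
    """Classify upstream failures so logs separate throttling from timeouts."""
    if status_code == 429:
        return "rate_limit"    # explicit provider throttle; back off and retry
    if status_code in (504, 408):
        return "timeout"       # gateway/deadline; may mask throttling upstream
    if status_code >= 500:
        return "server_error"
    return "other"
```

Log the classification alongside the raw status and any provider error body so ambiguous "upstream timeout" messages can be correlated later.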
Concurrency & rate limiting
What’s the recommended pattern for parallel LLM calls?
Use async primitives to dispatch a bounded number of concurrent requests.
Apply a rate limiter (requests per second/minute) in addition to a concurrency cap.
Consider provider diversification (Azure OpenAI, Gemini, Light LLM) only if your use case tolerates non‑identical model behavior.
For batch jobs, consider work queues with retry and backoff.
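As a sketch of the first two points, a requests-per-second limiter can sit in front of a concurrency cap. The class below is a simple illustrative spacing limiter (assumed helper, not part of the SDK); production jobs may prefer an established rate-limiting library.

```python
import asyncio
import time

class RateLimiter:
    """Spaces out request starts to enforce a requests-per-second budget."""
    def __init__(self, rate_per_sec: float):
        self.interval = 1.0 / rate_per_sec
        self._next = 0.0
        self._lock = asyncio.Lock()

    async def acquire(self):
        async with self._lock:
            now = time.monotonic()
            wait = max(0.0, self._next - now)
            self._next = max(now, self._next) + self.interval
        if wait:
            await asyncio.sleep(wait)

async def demo():
    limiter = RateLimiter(rate_per_sec=50)
    sem = asyncio.Semaphore(4)  # concurrency cap on top of the rate limit
    async def task(i):
        await limiter.acquire()   # respects the RPS budget
        async with sem:           # respects the in-flight cap
            return i * 2
    return await asyncio.gather(*(task(i) for i in range(10)))

out = asyncio.run(demo())
```

The two controls are complementary: the semaphore bounds how many requests are in flight, while the limiter bounds how fast new ones start.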
Should I send requests to multiple model providers simultaneously?
It’s possible, but do so intentionally. Differences in output style/cost/latency may affect UX and post‑processing. If you diversify for resiliency, normalize responses (schemas) and track provider‑level SLAs.
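Normalizing into one schema might look like the sketch below. The raw payload shapes shown are approximations of typical Azure OpenAI chat and Gemini responses and may differ from what your deployment returns; treat the field paths as assumptions to verify.

```python
from dataclasses import dataclass

@dataclass
class NormalizedResponse:
    provider: str
    text: str
    latency_ms: float

def normalize(provider: str, raw: dict, latency_ms: float) -> NormalizedResponse:
    # Field paths below approximate common provider payloads; verify per deployment.
    if provider == "azure_openai":
        text = raw["choices"][0]["message"]["content"]
    elif provider == "gemini":
        text = raw["candidates"][0]["content"]["parts"][0]["text"]
    else:
        text = raw.get("text", "")
    return NormalizedResponse(provider=provider, text=text, latency_ms=latency_ms)
```

Carrying `provider` and `latency_ms` through the normalized record also gives you the data needed to track provider-level SLAs.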
Function calling & tools
Does the SDK support function calling (tool use)?
Yes. You can declare tools/functions and let the model select and call them. The SDK exposes helpers for registering tool schemas and handling tool invocations.
Best practices:
Keep tool schemas minimal and strongly typed.
Validate inputs server‑side before executing your tool.
Impose per‑tool timeouts and idempotency (safe retries).
Link: ai/tutorials/unique_basics/sdk_examples/chat_complete.py at main · Unique-AG/ai
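The best practices above can be sketched generically. The schema below uses the common OpenAI-style tool declaration format as an assumption; the SDK's own registration helpers may differ, and `get_quote` is a hypothetical tool used only for illustration.

```python
import json

# Hypothetical tool declaration (OpenAI-style schema): minimal and strongly typed.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_quote",
        "description": "Fetch the latest price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def handle_tool_call(name: str, arguments_json: str) -> str:
    """Validate the model's tool invocation server-side before executing."""
    args = json.loads(arguments_json)
    if name != "get_quote" or not isinstance(args.get("ticker"), str):
        raise ValueError("invalid tool call")
    # Stub result; a real tool would also enforce a timeout and be idempotent.
    return json.dumps({"ticker": args["ticker"], "price": 101.5})
```

Validating on the server side matters because model-generated arguments are untrusted input, even when the schema is declared up front.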
Short‑term memory
Is there a short‑term memory feature?
Yes. Sessions/chats can retain recent context (short‑term memory) persisted in your project datastore. Limits are configurable per deployment. Memory is scoped to a chat/session ID and is not a substitute for long‑term knowledge bases.
Tips:
Use the same chat/session ID across turns to benefit from memory.
Set sensible character/turn limits to control cost and privacy.
Provide a way to reset or export session memory for compliance.
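The tips above can be sketched with a toy in-memory client. This is purely illustrative: `ChatClient`, `send`, and `reset` are hypothetical names, not the real `unique_sdk` API, and the 10-turn cap stands in for whatever limit your deployment configures.

```python
class ChatClient:
    """Toy model of session-scoped short-term memory (hypothetical API)."""
    def __init__(self):
        self._memory: dict[str, list[str]] = {}

    def send(self, chat_id: str, message: str) -> list[str]:
        # Reusing the same chat_id across turns is what accumulates context.
        turns = self._memory.setdefault(chat_id, [])
        turns.append(message)
        return turns[-10:]  # turn limit stands in for a deployment-configured cap

    def reset(self, chat_id: str) -> None:
        self._memory.pop(chat_id, None)  # compliance: allow memory reset
```

Note how memory is keyed by `chat_id` alone: a new ID starts from a clean slate, which is exactly why reusing the ID across turns matters.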
Common Errors
Gateway timeout (HTTP 504): An upstream proxy didn’t get a response in time (typically ~30s). This usually originates at the LLM provider, so wait a few minutes and retry.
Rate limit (HTTP 429): Too many requests sent to a provider (Azure OpenAI, Gemini, Light LLM). Retry later, once the provider has capacity again.
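Both errors are transient, so a retry loop with exponential backoff and jitter is the usual response. The sketch below is a generic helper (not an SDK API); `fn` is assumed to return an `(http_status, result)` pair.

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base=0.5, retryable=(429, 504)):
    """Retry transient 429/504 failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        status, result = fn()
        if status not in retryable:
            return status, result  # success or a non-retryable error
        if attempt < max_attempts - 1:
            # Delays grow as base * 2^attempt; jitter avoids retry stampedes.
            time.sleep(base * (2 ** attempt) + random.uniform(0, 0.1))
    return status, result  # exhausted retries; surface the last failure
```

For 429s, prefer the provider's `Retry-After` header over a computed delay when it is present.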
Author: @Zach Fong
© 2025 Unique AG. All rights reserved.