# Implementing Streaming Output with NixAPI: A Complete Guide to the Typewriter Effect
Learn how to use NixAPI's streaming API to display LLM responses token-by-token in real time. Covers Python, Node.js, and browser-side React implementations with full runnable code.
The “typewriter effect” you see in ChatGPT — where text appears character by character — is called streaming output. It uses the HTTP Server-Sent Events (SSE) protocol under the hood.
This guide walks you through the concept and implementation end-to-end.
## Why Use Streaming?
A standard API call waits until the model has finished generating all content before returning a response. If the model is writing a 1,000-word article, you could be staring at a blank screen for 10–20 seconds.
Streaming pushes each token to the client as soon as it’s generated, so users see content appearing immediately — a dramatically better experience.
```
Standard mode:  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ → display all at once
Streaming mode: token → token → token → real-time display
```
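On the wire, a streaming response is a sequence of SSE events: each event is a `data:` line carrying a JSON delta, and the stream ends with a `data: [DONE]` sentinel. A trimmed-down illustration (fields abbreviated for readability):

```
data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":"Hel"}}]}

data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":"lo"}}]}

data: [DONE]
```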
## Python Streaming
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-NixAPI-key",
    base_url="https://api.nixapi.com/v1",
)

# Enable streaming with stream=True
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem about autumn"}],
    stream=True,  # ← key parameter
)

# Print each chunk as it arrives
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

print()  # newline at the end
```
Run this and you’ll see the text appear token by token. `flush=True` prevents output buffering from delaying the display.
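In practice you often want the complete reply as a string in addition to the live printout. A small helper can do both; this is an illustrative sketch (the function name is not part of the SDK) that works with any iterable of OpenAI-style chunks, which is also what makes it easy to test with simulated data:

```python
from types import SimpleNamespace

def collect_stream(stream):
    """Print each delta as it arrives and return the assembled full reply.

    Works with any iterable of OpenAI-style chunks
    (i.e. objects exposing chunk.choices[0].delta.content).
    """
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)  # live typewriter output
            parts.append(delta.content)
    print()
    return "".join(parts)

# Simulated chunks standing in for a real stream=True call
fake = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["Autumn ", "leaves ", "fall."]
]
print(repr(collect_stream(fake)))  # → 'Autumn leaves fall.'
```

Pass the real `stream` object from `client.chat.completions.create(..., stream=True)` in place of `fake` and the behavior is identical.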
## Node.js / TypeScript Streaming
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-NixAPI-key',
  baseURL: 'https://api.nixapi.com/v1',
});

async function streamChat() {
  const stream = client.chat.completions.stream({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Explain what a vector database is' }],
  });

  // Approach 1: async iterator
  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content ?? '';
    process.stdout.write(text);
  }

  // Get the final complete result
  const finalCompletion = await stream.finalChatCompletion();
  console.log('\nTotal tokens used:', finalCompletion.usage?.total_tokens);
}

streamChat();
```
## Browser / React Frontend
```tsx
import { useState } from 'react';

export default function StreamingChat() {
  const [output, setOutput] = useState('');
  const [loading, setLoading] = useState(false);

  async function handleAsk() {
    setOutput('');
    setLoading(true);
    try {
      const response = await fetch('https://api.nixapi.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${import.meta.env.VITE_NIXAPI_KEY}`,
        },
        body: JSON.stringify({
          model: 'gpt-4o',
          messages: [{ role: 'user', content: 'Explain quantum entanglement simply' }],
          stream: true,
        }),
      });

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // { stream: true } keeps multi-byte characters split across chunks intact
        const lines = decoder.decode(value, { stream: true }).split('\n');
        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          const data = line.slice(6);
          if (data === '[DONE]') break;
          try {
            const json = JSON.parse(data);
            const text = json.choices[0]?.delta?.content ?? '';
            setOutput(prev => prev + text);
          } catch {
            // ignore partial JSON events split across network chunks
          }
        }
      }
    } finally {
      setLoading(false); // re-enable the button even if the request fails
    }
  }

  return (
    <div>
      <button onClick={handleAsk} disabled={loading}>
        {loading ? 'Generating...' : 'Ask'}
      </button>
      <p style={{ whiteSpace: 'pre-wrap' }}>{output}</p>
    </div>
  );
}
```
Security note: API keys in frontend code are visible to users. In production, proxy requests through your backend instead.
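If you do move the call behind your own backend, the same `data:`-line parsing shown in the React loop applies there too. Here is that logic as a standalone Python helper (the function name is illustrative, not a library API), useful for a proxy or CLI client:

```python
import json

def extract_sse_text(lines):
    """Pull delta text out of raw OpenAI-style SSE 'data:' lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and non-data SSE fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        try:
            event = json.loads(payload)
            parts.append(event["choices"][0].get("delta", {}).get("content") or "")
        except json.JSONDecodeError:
            pass  # a robust client buffers partial events split across chunks
    return "".join(parts)

sample = [
    'data: {"choices":[{"delta":{"content":"Hi "},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"there"},"index":0}]}',
    'data: [DONE]',
]
print(extract_sse_text(sample))  # → Hi there
```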
## Things to Keep in Mind
- Timeouts: Streaming requests can run for a while. Set your HTTP timeout to at least 120s.
- Error handling: Catch exceptions on network drops and prompt the user to retry.
- Token usage: In streaming mode, the `usage` field only appears in the final chunk.
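For the error-handling point, a minimal stdlib retry wrapper looks like this (an illustrative sketch; the official `openai` Python SDK also accepts `timeout` and `max_retries` arguments when constructing the client):

```python
import time

def with_retries(make_stream, attempts=3, base_delay=1.0):
    """Call make_stream(), retrying on dropped connections with linear backoff."""
    for attempt in range(attempts):
        try:
            return make_stream()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller/UI
            time.sleep(base_delay * (attempt + 1))

# Simulated flaky call: fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network drop")
    return "stream-handle"

print(with_retries(flaky, base_delay=0.01))  # → stream-handle
```

In a real client, `make_stream` would be a zero-argument lambda wrapping your `client.chat.completions.create(..., stream=True)` call.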
## Summary
| | Standard Mode | Streaming Mode |
|---|---|---|
| Parameter | default | `stream: true` |
| Response format | Single JSON object | SSE data stream |
| User experience | Wait, then display all | Real-time token-by-token |
| Best for | Batch processing, background tasks | Chat UI, content generation |
👉 Sign up for NixAPI and try streaming for free — a reliable LLM API relay for OpenAI, Claude, Gemini, DeepSeek, Qwen, and Grok with ¥1 = $1 top-up.