FA Analysis · Faiz Kasman

A stock analysis call through OpenAI costs money and takes time. Both can be avoided when the underlying financial data has not changed. If it has changed, even slightly, the analysis has to run again. That tension drove every interesting decision in FA Analysis.

Context

What FA Analysis is

FA Analysis is a stock analytics platform that pairs real-time market data with LLM-generated narrative. A user enters a ticker, and the platform fetches price history, fundamentals, and trading activity from Yahoo Finance and Alpha Vantage, then passes a structured data payload to OpenAI. The response comes back as six named sections with a sentiment score attached. Charts render via Lightweight Charts, the HTML5 canvas charting library from TradingView.

The stack is Next.js 15, TypeScript, shadcn/ui, and the OpenAI SDK. All AI calls happen server-side. The client receives structured JSON and renders it. There is no OpenAI key in the browser bundle and no OpenAI latency in the client critical path.

The interesting engineering is not the OpenAI call itself. Any application can call OpenAI. What makes the difference between a prototype and something that holds up under repeated use is everything around the call: how you decide whether to make it at all, how you handle the two external data sources that feed into it, and how you structure the prompt so the response stays useful as input to downstream rendering code rather than freeform prose that breaks on every minor model variation.

Architecture

Server-first, parallel by default

The architecture is server-first throughout. Next.js Server Components handle the data fetching layer, which means the OpenAI SDK, the Yahoo Finance client, and the Alpha Vantage client never appear in the browser bundle. The client receives pre-rendered HTML for the analysis text and structured JSON for the charts. Lightweight Charts renders in a client component onto an HTML5 canvas, which keeps 1,250 data points across five years of daily price history at 60fps.

The two financial data providers run in parallel. Yahoo Finance carries the price, fundamentals, and company profile. Alpha Vantage carries supplementary technical data. Before parallel fetching was introduced, the sequential calls added up to roughly five seconds per request. Running both with Promise.all brought that to around 1.5 seconds. The parallel result feeds into a hashing step: the combined data payload is hashed, and that hash is compared against what was used for the previous analysis call. If the hash matches, the cached analysis is returned and OpenAI is never called. If it does not match, a new call is made and the result replaces the cache entry.

Cache TTLs are tiered by computation cost. Market sentiment, which derives from VIX data and requires a lighter computation pass, expires after 30 minutes. The full AI analysis, which involves the complete OpenAI call, expires after 24 hours. The stale-while-error pattern handles OpenAI downtime: if the provider is unreachable, the most recent cached analysis is returned rather than surfacing a broken state. The user gets slightly stale data rather than an error page.
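The tiered TTLs and the stale-while-error fallback can be sketched roughly as follows. This is a minimal illustration rather than the project's code: the in-memory Map, the getWithStaleOnError helper, the stale flag, and the injectable now parameter are all assumptions made for the example. Only the 30-minute and 24-hour values come from the description above.

```typescript
// Hypothetical sketch: TTL-tiered cache with a stale-while-error fallback.
// Only the two TTL values come from the article; every name here is illustrative.
const TTL_MS = {
  sentiment: 30 * 60 * 1000, // 30 minutes
  analysis: 24 * 60 * 60 * 1000, // 24 hours
} as const;

type CacheKind = keyof typeof TTL_MS;

interface Entry<T> {
  value: T;
  storedAt: number;
}

interface Result<T> {
  value: T;
  stale: boolean;
}

const store = new Map<string, Entry<unknown>>();

async function getWithStaleOnError<T>(
  kind: CacheKind,
  key: string,
  compute: () => Promise<T>,
  now: number = Date.now(),
): Promise<Result<T>> {
  const cacheKey = `${kind}:${key}`;
  const hit = store.get(cacheKey) as Entry<T> | undefined;

  // Fresh entry: serve it without recomputing.
  if (hit && now - hit.storedAt < TTL_MS[kind]) {
    return { value: hit.value, stale: false };
  }

  try {
    const value = await compute();
    store.set(cacheKey, { value, storedAt: now });
    return { value, stale: false };
  } catch (err) {
    // Provider unreachable: fall back to the expired entry if one exists.
    if (hit) return { value: hit.value, stale: true };
    // Nothing cached, so the failure has to surface.
    throw err;
  }
}
```

Returning a stale flag alongside the value is one way to let the UI label the analysis as out of date instead of silently presenting it as current.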

01 · Model choice and temperature

The choice of gpt-4o-mini over gpt-4 came from thinking clearly about what the task actually is. FA Analysis does not ask the model to reason creatively, infer novel conclusions, or do anything that requires the full capacity of a larger model. The task is summarisation: take a structured data payload about a stock and produce a readable narrative organised into six named sections. gpt-4o-mini handles this at quality that is, in practice, indistinguishable from gpt-4 for this specific task, at a fraction of the cost per call.

Temperature is set to 0.3. A higher temperature would introduce variation in outputs that is actively harmful in a finance context. The model should not paraphrase the same fundamental data differently on two calls that happen to land at different random seeds. Low temperature keeps the output stable and grounded in the numbers passed in.

openai-service.ts

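// Summarisation call: small model, low temperature, capped response length.
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: systemPrompt },    // instructions, including the section format
    { role: 'user', content: analysisPayload },   // the structured market data to narrate
  ],
  temperature: 0.3,  // keeps repeated calls on the same data close to each other
  max_tokens: 1500,  // budget guardrail for the six-section output
})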

The max_tokens cap at 1,500 is a budget guardrail. The six-section output fits comfortably within that limit. Anything larger would either be padding or indicate the model drifting from the structured format.

02 · Parallel fetch and exponential backoff

Both data providers are called with Promise.all. If Yahoo Finance takes 900ms and Alpha Vantage takes 1,100ms, the total wait is 1,100ms, not 2,000ms. Before this change the calls were sequential, and the combined wait was often closer to five seconds depending on API response times at any given moment.
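The timing argument can be illustrated with a small sketch. The two fetch functions below are hypothetical stand-ins for the real provider clients, and their delays are scaled-down versions of the 900ms and 1,100ms figures so the example runs quickly.

```typescript
// Hypothetical stand-ins for the two provider clients; delays are scaled down 10x.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchYahoo(ticker: string) {
  await delay(90);
  return { source: 'yahoo', ticker };
}

async function fetchAlphaVantage(ticker: string) {
  await delay(110);
  return { source: 'alphaVantage', ticker };
}

async function fetchMarketData(ticker: string) {
  // Both requests start immediately; the await resolves when the slower one finishes,
  // so the wait is roughly max(90, 110) ms rather than 90 + 110 ms.
  const [yahoo, alphaVantage] = await Promise.all([
    fetchYahoo(ticker),
    fetchAlphaVantage(ticker),
  ]);
  return { yahoo, alphaVantage };
}
```

Promise.all rejects as soon as either promise rejects, so a failure in one provider fails the whole fetch. Promise.allSettled would be the alternative if partial data were ever acceptable.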

Retry logic uses exponential backoff: 1s, then 2s, then 4s, capping at 10s total wait across retries. The ValidationError class short-circuits this entirely. If a ticker input fails the regex validation (/^[A-Za-z.-]+$/) before any API call is made, a ValidationError is thrown and no retry cycle starts. A malformed ticker is not a transient network failure and retrying it would just burn API budget on a request that will fail the same way every time.

The backoff applies to network-level failures only. A ValidationError exits immediately. A 4xx from either provider that indicates a bad symbol also exits without retrying. Retries are reserved for cases where a second attempt has a reasonable chance of succeeding.
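A minimal sketch of that retry policy, under stated assumptions: the withBackoff and validateTicker helpers and the injectable sleep parameter are hypothetical names chosen for illustration, and the 10s cap and the 4xx handling are left out to keep the example short. The regex and the 1s, 2s, 4s delays are the ones quoted above.

```typescript
// Hypothetical sketch of retry with exponential backoff and a validation short-circuit.
class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ValidationError';
  }
}

const TICKER_PATTERN = /^[A-Za-z.-]+$/; // the pattern quoted in the text

function validateTicker(input: string): string {
  if (!TICKER_PATTERN.test(input)) {
    throw new ValidationError(`Invalid ticker: ${input}`);
  }
  return input;
}

async function withBackoff<T>(
  task: () => Promise<T>,
  delaysMs: number[] = [1000, 2000, 4000],
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;

  // First attempt runs immediately; each later attempt waits for its delay first.
  for (const waitMs of [0, ...delaysMs]) {
    if (waitMs > 0) await sleep(waitMs);
    try {
      return await task();
    } catch (err) {
      // Invalid input fails the same way every time, so it is never retried.
      if (err instanceof ValidationError) throw err;
      lastError = err;
    }
  }

  // All attempts failed: surface the last error.
  throw lastError;
}
```

Passing sleep in as a parameter is only there so the policy can be exercised without real waiting; a production version would not need it.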

03 · Input-hashed caching and triple-hash delimiters

The caching mechanism ties the cache key to the content of the data, not to time. When the analysis call runs, the combined Yahoo Finance and Alpha Vantage payload is serialised and hashed. That hash is stored alongside the analysis result. On the next request for the same ticker, the new payload is hashed and compared. If the hashes match, the data has not changed, and the cached analysis is returned without calling OpenAI. Only a change in the underlying financial data triggers a new call.

This matters most for frequently requested tickers. A popular stock queried ten times in an hour on a quiet market day would generate ten identical OpenAI calls if nothing were cached, and a TTL-only cache would still re-run the analysis on expiry whether or not the data had moved. With input-hashed caching, those ten requests generate one call.
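The hash-and-compare step can be sketched as below. This is an illustration under assumptions, not the project's implementation: SHA-256 via Node's crypto module, the sorted-key serialiser, the in-memory Map, and the runModel callback (a stand-in for the OpenAI call) are all choices made for the example.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical sketch of input-hashed caching; all names are illustrative.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Serialise with sorted keys so logically equal payloads always produce the same string.
function stableStringify(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

function hashPayload(payload: Json): string {
  return createHash('sha256').update(stableStringify(payload)).digest('hex');
}

interface CachedAnalysis {
  inputHash: string;
  analysis: string;
}

const analysisCache = new Map<string, CachedAnalysis>();

async function getAnalysis(
  ticker: string,
  payload: Json,
  runModel: (payload: Json) => Promise<string>, // stand-in for the OpenAI call
): Promise<string> {
  const inputHash = hashPayload(payload);
  const cached = analysisCache.get(ticker);

  // Same data as last time: reuse the stored analysis and skip the model call.
  if (cached && cached.inputHash === inputHash) return cached.analysis;

  const analysis = await runModel(payload);
  analysisCache.set(ticker, { inputHash, analysis });
  return analysis;
}
```

Sorting keys before hashing matters because plain JSON.stringify follows insertion order, so two payloads with the same values assembled in a different order would otherwise hash differently and trigger an unnecessary call.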

The prompt uses triple-hash (###) delimiters to mark each section boundary, which makes the response machine-parseable. The six sections are:

### Market Summary / ### Trading Activity / ### Financial Health / ### Technical Signals / ### Risk Factors / ### Growth Drivers

The parser splits on ### and maps section headers to keys. This is more robust than asking the model to return JSON, which breaks if the model emits any preamble or explanation before the opening brace. Triple-hash delimiters survive minor output variation that would cause JSON parsing to fail.
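A parser along those lines might look like this. The camelCase key names and the parseSections function are hypothetical; the six headers are the ones listed above.

```typescript
// Hypothetical parser sketch; key names are illustrative, headers come from the section list.
type SectionKey =
  | 'marketSummary'
  | 'tradingActivity'
  | 'financialHealth'
  | 'technicalSignals'
  | 'riskFactors'
  | 'growthDrivers';

const SECTION_KEYS: ReadonlyMap<string, SectionKey> = new Map<string, SectionKey>([
  ['Market Summary', 'marketSummary'],
  ['Trading Activity', 'tradingActivity'],
  ['Financial Health', 'financialHealth'],
  ['Technical Signals', 'technicalSignals'],
  ['Risk Factors', 'riskFactors'],
  ['Growth Drivers', 'growthDrivers'],
]);

function parseSections(raw: string): Partial<Record<SectionKey, string>> {
  const sections: Partial<Record<SectionKey, string>> = {};

  for (const chunk of raw.split('###')) {
    // The first line of each chunk is the header; the rest is the section body.
    const newline = chunk.indexOf('\n');
    const header = (newline === -1 ? chunk : chunk.slice(0, newline)).trim();
    const body = newline === -1 ? '' : chunk.slice(newline + 1).trim();

    // Chunks with an unrecognised header, such as a preamble before the
    // first delimiter, are skipped rather than treated as errors.
    const key = SECTION_KEYS.get(header);
    if (key) sections[key] = body;
  }

  return sections;
}
```

Returning a partial record means a missing section shows up as an absent key, which the rendering code can check for explicitly instead of failing on the whole response.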

Sentiment scores run from 0 to 100. Scores under 40 are bearish, 40 to 60 neutral, above 60 bullish.
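The thresholds translate into a small classifier. The function name and the range check are assumptions; the sketch treats both 40 and 60 as neutral, following the wording above.

```typescript
// Hypothetical helper mapping a 0-100 sentiment score to its label.
type SentimentLabel = 'bearish' | 'neutral' | 'bullish';

function classifySentiment(score: number): SentimentLabel {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`Sentiment score must be between 0 and 100, got ${score}`);
  }
  if (score < 40) return 'bearish'; // under 40
  if (score <= 60) return 'neutral'; // 40 to 60 inclusive
  return 'bullish'; // above 60
}
```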

5s to 1.5s · latency drop from sequential to parallel fetch
30min · market sentiment cache TTL
24h · AI analysis cache TTL
6 · structured output sections via triple-hash delimiters
0.3 · OpenAI temperature for factual grounding

Learnings

Q2 2024
Picking gpt-4o-mini over gpt-4 was a cost-versus-quality call, not a compromise. For a summarisation task with well-structured input, the smaller model produces output that is functionally equivalent. The right question is not 'which model is best?' but 'which model is sufficient for this specific task?'. Those are different questions and they lead to different answers.
Q2 2024
The stale-while-error pattern prevents OpenAI downtime from breaking the UI. Instead of surfacing an error state when the provider is unreachable, the most recent cached analysis is returned. Users get slightly stale data, which is almost always better than no data. The pattern is simple to implement and the failure mode it prevents is obvious in retrospect.
Q3 2024
The ValidationError short-circuit was added after noticing retry cycles burning API budget on malformed tickers. A ticker like '12345' will fail every time; retrying it is not helpful. Distinguishing between 'this input is invalid' and 'this request failed transiently' is a small code change with meaningful cost impact at any real usage volume.
Q3 2024
Lightweight Charts (TradingView's HTML5 canvas charting library) handles 1,250+ daily data points at 60fps without any optimisation work on the rendering side. Chart.js at that data volume requires downsampling or virtualisation to stay smooth. The switch was driven by performance need, but the API is also significantly cleaner for financial chart types.

FAQ

Why not Claude or Gemini?

OpenAI had the most stable API and the clearest pricing at the time FA Analysis was built. The structured output requirement (six sections, consistent delimiter format) is achievable with any capable LLM, but gpt-4o-mini was the known quantity. The input-hashed caching and triple-hash parsing approach is model-agnostic; switching providers would require only a client swap and possibly a prompt adjustment.

How do you handle hallucinations in finance?

The approach is to constrain the model's task so it cannot easily hallucinate. The prompt passes the actual numbers and asks the model to narrate them, not to recall or reason about the company from training data. Temperature 0.3 reduces the variance in how the model interprets that data. The model is not being asked what it knows about a stock; it is being asked to describe a specific dataset. That is a meaningfully different task.

Why server-first for the AI calls?

Two reasons. First, the OpenAI API key cannot be in the browser bundle without being exposed to anyone who looks at the network tab. Second, the AI response needs to go through the caching and hashing layer before it reaches the client. Both of those things are server concerns. Putting the OpenAI call on the client would require either a proxy route or leaking the key, and a proxy route is effectively just moving the server logic to a separate file.

What happens if both Yahoo Finance and Alpha Vantage go down at the same time?

The analysis call fails and the error surfaces. The stale-while-error pattern handles OpenAI downtime, not data source downtime. If the underlying data cannot be fetched, there is nothing to hash and nothing to compare, so returning cached analysis would mean serving analysis computed against different data. The right behaviour is to surface the failure and let the user try again rather than serve potentially misleading analysis.