Technology & Architecture

Enterprise AI infrastructure for reliable, scalable and privacy-compliant form assistants.

Architecture Overview

The FINO Suite follows a modular cloud architecture with clear layer separation. Each component is independently scalable and replaceable.

  • Frontend - Web Components (framework-independent, integrable into any website)
  • API Layer - REST API with authentication, rate limiting and tenant separation
  • AI Orchestration - Prompt management, context preparation and response validation
  • RAG Pipeline - Retrieval Augmented Generation with vector and hybrid search
  • Knowledge Layer - MCP servers, knowledge bases, document indices
Cloud infrastructure: Operated by default in EU data centres. Optionally available on STACKIT (Schwarz Group) as a sovereign cloud alternative, on customer request.
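
The layer separation above can be pictured as a single request path. A minimal sketch; the function names and data shapes are illustrative assumptions, not FINO's internal API:

```typescript
// Illustrative request path through the FINO layers (hypothetical names).
type Query = { tenantId: string; text: string };

function authenticate(tenantId: string): void {
  // API layer: authentication and tenant separation
  if (!tenantId) throw new Error("unauthenticated");
}

async function retrieve(q: Query): Promise<string[]> {
  // RAG pipeline: vector/hybrid search over the tenant's knowledge base
  return [`doc relevant to: ${q.text}`];
}

function orchestrate(question: string, docs: string[]): string {
  // AI orchestration: context preparation and prompt management
  return `${docs.join("\n")}\nQuestion: ${question}`;
}

async function generate(prompt: string): Promise<string> {
  // Model call and response validation would happen here
  return `answer grounded in:\n${prompt}`;
}

async function handleQuery(q: Query): Promise<string> {
  authenticate(q.tenantId);
  const docs = await retrieve(q);
  return generate(orchestrate(q.text, docs));
}
```

Because each step is a separate function behind a narrow interface, each layer can be scaled or replaced independently, as described above.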

Language Models (LLMs)

FINO is model-agnostic and supports various Large Language Models. The choice of model can be configured per tenant and use case.

Provider  | Models                     | Use Case                                                          | EU Hosting
Anthropic | Claude Sonnet family       | Dialogue management, form assistance, complex follow-up questions | ✅ via EU infrastructure
Amazon    | Nova Pro, Nova Lite, Titan | Document analysis, image processing, embeddings                   | ✅ EU (Frankfurt)
Amazon    | Nova Sonic                 | Speech processing (FINO Voice)                                    | ✅ EU (Stockholm)
Others    | Configurable on request    | Customer-specific requirements                                    | Depends on provider
Important: No model is trained with customer data. All requests are processed statelessly - no conversation data is permanently stored with the model providers.
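
Per-tenant model selection could look like the following sketch. All keys, model identifiers and region values here are illustrative assumptions, not FINO's actual configuration schema:

```typescript
// Hypothetical per-tenant model configuration (illustrative only).
interface TenantModelConfig {
  tenantId: string;
  models: {
    dialogue: string;   // e.g. a Claude Sonnet model for form assistance
    documents: string;  // e.g. an Amazon Nova model for document analysis
    embeddings: string; // embedding model feeding the RAG pipeline
  };
  region: "eu-central-1" | "eu-north-1"; // EU hosting regions
}

const exampleTenant: TenantModelConfig = {
  tenantId: "city-of-example",
  models: {
    dialogue: "claude-sonnet",
    documents: "amazon-nova-lite",
    embeddings: "amazon-titan-embeddings",
  },
  region: "eu-central-1", // Frankfurt
};
```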

RAG - Retrieval Augmented Generation

FINO uses RAG to base AI responses on verified facts rather than general model knowledge. The result: professionally accurate, up-to-date and traceable answers.

Retrieval (Knowledge Retrieval)

For each user query, relevant information is retrieved from the connected knowledge bases.

  • Vector search: Semantic matching via embeddings
  • Hybrid search: Combination of semantic and keyword search
  • Ranking: Relevance scoring and filtering of results
  • Source references: Every piece of information is traceable to its source
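
One common way to combine the semantic and keyword rankings in a hybrid search is reciprocal rank fusion. A minimal sketch of that technique; FINO's actual scoring and filtering are not specified here:

```typescript
// Reciprocal rank fusion: merge several ranked result lists into one.
interface Hit { docId: string; }

function reciprocalRankFusion(
  rankings: Hit[][], // e.g. [vectorResults, keywordResults]
  k = 60             // damping constant commonly used with RRF
): { docId: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((hit, rank) => {
      // Documents appearing high in several rankings accumulate more score.
      scores.set(hit.docId, (scores.get(hit.docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .map(([docId, score]) => ({ docId, score }))
    .sort((a, b) => b.score - a.score);
}
```

A document found by both the vector and the keyword search outranks one found by only a single method, which matches the intent of hybrid search.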

Generation (Response Generation)

The language model generates a response based on the retrieved information and conversation context.

  • Context window: Relevant documents are provided to the model
  • Prompt engineering: Domain-specific instructions control tone and accuracy
  • Validation: Responses are checked for consistency
  • Multilingual: Response in the user's language, form in German
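
How retrieved passages and source references might be assembled into the model's context window can be sketched as follows. The prompt wording and field names are illustrative assumptions, not FINO's actual prompt format:

```typescript
// Sketch: build a grounded prompt that keeps a source reference per passage.
interface Passage { text: string; source: string; }

function buildPrompt(question: string, passages: Passage[]): string {
  // Number each passage so the model can cite it as [n].
  const context = passages
    .map((p, i) => `[${i + 1}] (${p.source}) ${p.text}`)
    .join("\n");
  return [
    "Answer only from the numbered sources below.",
    "Cite sources as [n]. If the answer is not in the sources, say so.",
    "",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```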

Why RAG instead of pure LLM?

Without RAG (pure LLM):

  • Responses based on training data (outdated)
  • Hallucinations possible
  • No source references
  • Not tenant-specific

With RAG (FINO):

  • Responses based on current, verified sources
  • Fact-based and traceable
  • Source references with every response
  • Individual knowledge base per tenant

MCP - Model Context Protocol

FINO uses the Model Context Protocol (MCP) as an open standard for communication between AI systems and knowledge sources. This enables a flexible, extensible architecture.

Modularity

  • Knowledge sources as independent MCP servers
  • Easy addition and removal of data sources
  • Independent scaling per source
  • Standardised interfaces

Distributed Architecture

  • Multiple knowledge sources queryable in parallel
  • Multi-tenant configuration
  • Real-time knowledge base updates
  • Interoperability with various AI models
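
Querying several knowledge sources in parallel can be sketched like this. `KnowledgeSource` is a stand-in abstraction over an MCP server connection; the real MCP client API differs:

```typescript
// Sketch: fan a query out to all sources concurrently; a failing source
// does not block the others.
interface KnowledgeSource {
  name: string;
  query(text: string): Promise<string[]>;
}

async function queryAll(
  sources: KnowledgeSource[],
  text: string
): Promise<{ source: string; text: string }[]> {
  const settled = await Promise.allSettled(sources.map((s) => s.query(text)));
  return settled.flatMap((result, i) =>
    result.status === "fulfilled"
      ? result.value.map((t) => ({ source: sources[i].name, text: t }))
      : [] // skip sources that errored; they can be retried independently
  );
}
```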

Integration & Interfaces

Frontend Integration

FINO is provided as a Web Component - a single HTML tag is all you need for integration.

<smart-chat default-language="en"></smart-chat>
  • No framework required
  • Works in any website
  • Responsive and accessible
  • Customisable design

Backend Interfaces

Standardised APIs are available for deeper integrations.

  • REST API: Standard HTTP interface for all products
  • MCP Protocol: For knowledge base connectivity
  • Webhooks: Event-based notifications
  • Form mapping: Automatic mapping of AI responses to form fields
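
The form-mapping step can be sketched as matching values extracted by the assistant against the fields the target form actually has. The field names and the shape of `extracted` are illustrative assumptions, not FINO's actual schema:

```typescript
// Sketch: keep only extracted values that correspond to a real form field.
type FormValues = Record<string, string>;

function mapToForm(
  extracted: Record<string, string>, // values the assistant pulled from the dialogue
  fieldNames: string[]               // fields defined by the target form
): FormValues {
  const mapped: FormValues = {};
  for (const field of fieldNames) {
    if (field in extracted) mapped[field] = extracted[field];
  }
  return mapped;
}
```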

CMS Plugins

WordPress (available)

Ready-to-use plugin with graphical configuration interface. Installation via WordPress admin, branding customisation without code changes.

  • All branding options (colours, texts, logos)
  • Multilingual configuration
  • Page-level visibility control

Other CMS (on request)

Integrations for Drupal, Joomla and Shopware are planned. Contact us for your specific use case.

Performance & Scalability

Performance

  • Response times: Typically 3–5 seconds for complex queries
  • Caching: Intelligent caching for frequent queries
  • Streaming: Responses are streamed in real-time
  • Availability: 24/7 operation with automatic failover
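
Caching frequent queries is often done with a time-to-live per entry. A minimal sketch of that pattern; FINO's actual caching strategy is not specified here:

```typescript
// Minimal TTL cache: entries expire after a fixed time and are evicted
// lazily on read.
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();

  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      this.store.delete(key); // stale entry: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```

A TTL keeps cached answers consistent with the "current, verified sources" goal of the RAG pipeline: a stale answer can live at most `ttlMs` milliseconds.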

Scalability

  • Horizontal: Automatic scaling during peak loads
  • Multi-tenant: Hundreds of tenants on one infrastructure
  • Modular: Individual components independently scalable
  • From pilot to production: Same architecture, different sizing

Technical questions?

We are happy to explain the architecture in detail and show how FINO fits into your system landscape.