Cloud software
Overview
Cloud software, often delivered as Software-as-a-Service (SaaS) and sometimes called web-based software, is an application delivery model in which users access programs and data through internet-connected devices without installing the software locally or maintaining the underlying infrastructure. The software runs on remote servers operated by a provider, and users interact with it primarily through web browsers or lightweight client applications. This model shifts infrastructure and platform concerns from end users to the provider; users instead pay subscription or usage-based fees for on-demand access.
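The following Python sketch makes "usage-based fees" concrete. It records token counts for each request and adds them up into a monthly charge. The per-million-token prices and the `Usage` record are hypothetical placeholders. They are not any provider's actual rates or billing API.

```python
from dataclasses import dataclass

# Hypothetical prices in US dollars per million tokens (illustrative only;
# real providers publish their own rates).
PRICE_PER_MILLION_INPUT = 3.00
PRICE_PER_MILLION_OUTPUT = 15.00


@dataclass
class Usage:
    """Token counts recorded for one request (hypothetical record format)."""
    input_tokens: int
    output_tokens: int


def request_cost(usage: Usage) -> float:
    """Charge for one request: input and output tokens are billed at separate rates."""
    return (
        usage.input_tokens * PRICE_PER_MILLION_INPUT
        + usage.output_tokens * PRICE_PER_MILLION_OUTPUT
    ) / 1_000_000


def monthly_bill(requests: list[Usage]) -> float:
    """Usage-based billing: the bill is the sum of per-request charges."""
    return round(sum(request_cost(u) for u in requests), 6)


if __name__ == "__main__":
    month = [Usage(2_000, 500), Usage(10_000, 1_000)]
    print(monthly_bill(month))
```

The totals under these assumed prices work out as follows:

- A request with 2,000 input and 500 output tokens costs $0.0135.
- A request with 10,000 input and 1,000 output tokens costs $0.045.
- The monthly bill is therefore $0.0585.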
In the context of LLM-era AI systems, cloud software infrastructure underpins the delivery of large language models, foundation models, and AI services. Inference infrastructure for production LLM deployments typically operates as cloud software, enabling batch inference, real-time API access, and multi-tenant resource sharing. Organizations use cloud software platforms to host answer generation systems, AI search engines, and multi-agent orchestration workflows.
Cloud software architecture introduces particular considerations for AI transparency and governance. Model cards, transparency reports, and AI Bills of Materials must reflect the cloud deployment context, including data handling practices and access controls. Content filtering, guardrails, and Acceptable Use Policies are typically enforced at the cloud service layer rather than on the user's device.
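The sketch below shows why enforcement happens at the service layer. The provider wraps its model call, so every request from any client passes the same policy checks on both the prompt and the generated output. The regex blocklist, the `call_model` stub, and the refusal message are hypothetical. Production guardrails typically rely on trained classifiers rather than keyword patterns.

```python
import re
from typing import Callable

# Hypothetical policy: phrases the provider refuses to process or emit.
BLOCKED_PATTERNS = [re.compile(r"\bmalware payload\b", re.IGNORECASE)]
REFUSAL = "Request declined under the Acceptable Use Policy."


def violates_policy(text: str) -> bool:
    """Return True if any blocked pattern appears in the text."""
    return any(p.search(text) for p in BLOCKED_PATTERNS)


def with_guardrails(model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a model so that prompts and outputs are both checked server-side."""
    def guarded(prompt: str) -> str:
        if violates_policy(prompt):
            return REFUSAL  # input filter: the model is never invoked
        output = model(prompt)
        if violates_policy(output):
            return REFUSAL  # output filter: generated text is withheld
        return output
    return guarded


def call_model(prompt: str) -> str:
    """Stub standing in for the hosted model's inference call."""
    return f"echo: {prompt}"


# Every client reaches the model only through this guarded entry point.
serve = with_guardrails(call_model)
```

Because the check lives inside `serve`, a client cannot bypass it by using a different front end.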
The distributed nature of cloud software creates a trade-off between latency and throughput. A single request may take longer than local execution because of network round trips and server-side queueing. In exchange, cloud infrastructure can process many requests in parallel and scale resources to match demand. For applications that need long-term memory or persistent agent memory, cloud databases provide centralized storage that persists across sessions and devices; a purely local model would have to manage such storage itself.
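The latency/throughput trade-off can be sketched with a simple, assumed cost model. Each batched forward pass has two parts:

- a fixed cost shared by every request in the batch;
- a small additional cost per request.

The constants are illustrative and are not measurements of any real system. Network delay and queueing are ignored.

```python
# Hypothetical cost model: one forward pass over a batch of b requests takes
# FIXED_MS + PER_REQUEST_MS * b milliseconds. The fixed part (for example,
# reading model weights from memory) is paid once per batch, not per request.
FIXED_MS = 40.0
PER_REQUEST_MS = 5.0


def batch_time_ms(batch_size: int) -> float:
    return FIXED_MS + PER_REQUEST_MS * batch_size


def latency_ms(batch_size: int) -> float:
    """Time for one request in the batch to finish (no network or queueing delay)."""
    return batch_time_ms(batch_size)


def throughput_rps(batch_size: int) -> float:
    """Requests completed per second when batches run back to back."""
    return batch_size / batch_time_ms(batch_size) * 1000


for b in (1, 8, 32):
    print(b, round(latency_ms(b)), round(throughput_rps(b), 1))
```

Under these assumptions, larger batches raise throughput while each request waits longer:

| Batch size | Latency per request | Throughput |
|---|---|---|
| 1 | 45 ms | about 22 requests/s |
| 8 | 80 ms | 100 requests/s |
| 32 | 200 ms | 160 requests/s |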
How it works
Cloud software operates through a client-server architecture with several layers:
- Presentation layer: Users interact with a web interface (HTML/CSS/JavaScript rendered in a browser) or a native client application that communicates with remote servers via HTTP(S) or other network protocols.
- Application layer: Business logic and algorithms run on cloud provider servers, often distributed across multiple machines for fault tolerance and load balancing. For LLM applications, this includes the inference engine that executes model forward passes and manages context windows.
- Data layer: User data, model weights, and application state are persisted in cloud databases and storage systems. Multi-tenancy is typically achieved through logical isolation (separate databases, encryption keys, or schema partitioning) rather than dedicated physical hardware.
- Infrastructure layer: Virtual machines, containers, or managed services abstract underlying physical hardware. The cloud provider provisions, monitors, and scales resources automatically based on demand.
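Logical multi-tenant isolation can be illustrated with a shared store in which every row is keyed by tenant ID. This is a simplified form of schema partitioning. `TenantStore` and the tenant names are hypothetical. Real services add authentication, access control, and per-tenant encryption keys on top of this kind of partitioning.

```python
from typing import Optional


class TenantStore:
    """Shared key-value store with logical per-tenant isolation.

    All tenants share one physical dictionary, but every row is keyed by
    (tenant_id, key), so each lookup is confined to the caller's own rows.
    """

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], str] = {}

    def put(self, tenant_id: str, key: str, value: str) -> None:
        self._rows[(tenant_id, key)] = value

    def get(self, tenant_id: str, key: str) -> Optional[str]:
        # The tenant ID is always part of the lookup, so another tenant's row
        # with the same key is never returned.
        return self._rows.get((tenant_id, key))

    def keys(self, tenant_id: str) -> list[str]:
        """List only the keys that belong to this tenant."""
        return sorted(k for (t, k) in self._rows if t == tenant_id)


store = TenantStore()
store.put("acme", "agent_memory", "prefers concise answers")
store.put("globex", "agent_memory", "prefers detailed answers")
```

Both tenants use the same key, `agent_memory`, yet each sees only its own value.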
Alongside browser-based interfaces, API-based cloud software enables programmatic access. An AI crawler or search engine, for example, may call LLM cloud services through REST or gRPC APIs and receive structured responses. Those responses may be grounded through contextual retrieval or knowledge graphs.
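The following sketch shows programmatic REST access using only Python's standard library. The client sends a JSON POST with a bearer token and decodes a JSON response. Several details are assumptions for illustration:

- the endpoint URL;
- the `prompt`/`max_tokens` payload fields;
- the `Bearer` header scheme.

Each provider documents its own schema. gRPC access would use generated client stubs instead.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the URL from your provider's API reference.
API_URL = "https://api.example.com/v1/generate"


def build_generate_request(api_key: str, prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a hosted LLM endpoint."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(api_key: str, prompt: str) -> dict:
    """Send the request over HTTPS and decode the structured JSON response."""
    req = build_generate_request(api_key, prompt)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without network access.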
For inference, cloud software typically implements request queueing, batching, and caching to optimize throughput for multiple concurrent users. Embedding services and dense retrieval pipelines run as cloud microservices, often backed by vector databases.
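The following sketch combines queueing, batching, and caching in their simplest forms:

1. Requests wait in a queue.
2. Prompts that have already been answered are served from a cache.
3. The remaining prompts are grouped into fixed-size batches, each handled by a single inference call.

`BatchingServer` and the `model_batch` function are hypothetical. Production inference servers use more elaborate schemes such as continuous batching. Exact-prompt caching is only one of several caching strategies.

```python
from collections import deque
from typing import Callable


class BatchingServer:
    """Queue prompts, answer repeats from a cache, and run the rest in batches.

    `model_batch` is a hypothetical batched inference function that maps a
    list of prompts to a list of outputs in the same order.
    """

    def __init__(self, model_batch: Callable[[list[str]], list[str]], max_batch: int = 4) -> None:
        self.model_batch = model_batch
        self.max_batch = max_batch
        self.queue: deque[str] = deque()
        self.cache: dict[str, str] = {}
        self.batches_run = 0

    def submit(self, prompt: str) -> None:
        """Enqueue a prompt; nothing is computed until drain() is called."""
        self.queue.append(prompt)

    def drain(self) -> dict[str, str]:
        """Process every queued prompt and return a mapping of prompt to output."""
        results: dict[str, str] = {}
        while self.queue:
            batch: list[str] = []
            while self.queue and len(batch) < self.max_batch:
                prompt = self.queue.popleft()
                if prompt in self.cache:
                    results[prompt] = self.cache[prompt]  # cache hit: no inference
                elif prompt not in batch:
                    batch.append(prompt)  # duplicates within a batch run once
            if batch:
                outputs = self.model_batch(batch)  # one call for the whole batch
                self.batches_run += 1
                for prompt, output in zip(batch, outputs):
                    self.cache[prompt] = output
                    results[prompt] = output
        return results
```

Suppose a server with `max_batch=2` receives the prompts `a`, `b`, `c`, `a`. It runs two batches: `[a, b]`, then `[c]`. The second `a` is answered from the cache without another model call.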
| Term | Distinction |
|---|---|
| On-premises software | Installed and executed on user-controlled hardware; user manages infrastructure, updates, and data. Cloud software outsources these responsibilities to a third-party provider. |
| Open-source LLM | Refers to the licensing and availability of model code and weights. An open-source LLM may be served via cloud software (hosted by a provider) or run locally; cloud software is a delivery mechanism, not a licensing model. |
| Foundation model | A large pre-trained neural network. Cloud software is the delivery platform for accessing or hosting a foundation model; the model itself is the artifact being delivered. |
| Edge computing | Computation performed on local or nearby devices to reduce latency. Cloud software centralizes computation remotely; edge deployments complement or replace cloud approaches for latency-sensitive scenarios. |
| Inference infrastructure | The technical stack (servers, GPUs, frameworks) required to run model inference. Cloud software is a commercial and operational model built atop inference infrastructure. |
Examples
- OpenAI API and ChatGPT: ChatGPT is a cloud software application that delivers foundation model inference through a web interface. The OpenAI API provides programmatic cloud software access to the same family of models for third-party applications.
- Google Vertex AI Studio (formerly Generative AI Studio): A cloud software platform that lets users interact with multimodal LLMs (text and image) and fine-tune models without managing inference infrastructure.
- Anthropic Claude API and Claude.ai: Claude is accessible as cloud software through a web application (Claude.ai) and a RESTful API, with context windows of up to 200K tokens and support for in-context learning.
- Hugging Face Inference API: A cloud software layer that hosts open-source LLMs (such as Mistral and Llama models) alongside proprietary models. It provides batch and real-time inference without requiring users to provision GPUs.
See also
- Large language model
- Inference infrastructure
- Foundation model
- Acceptable Use Policy (AI)
- Model transparency report
- Latency vs throughput (LLM)