Salt & Steel: Performance and Optimization

Document type: Technical Architecture — Canonical
Status: Canonical
Last updated: 2026-04
See also: Client Architecture | Server Architecture | Networking Model

Overview

Salt & Steel's performance targets are simultaneously ambitious and non-negotiable:

60 fps on mid-range hardware (RTX 3060 / RX 6600 equivalent, 1080p, Medium-High settings)
30 fps minimum on low-end hardware (GTX 1060 / RX 580 equivalent, 1080p, Low settings)
60 fps on PlayStation 5 / Xbox Series X in Performance mode
30 fps on PlayStation 4 / Xbox One at reduced visual settings

These targets must hold across the game's most demanding scenarios: a naval combat engagement with 8+ ships exchanging broadsides in a storm, or a 6-player land party clearing a dense dungeon with overlapping skill effects. Achieving these targets requires systematic optimization across every domain of the engine — not as a late-phase polish step, but as a design constraint applied from day one.

The unique challenge Salt & Steel adds over a standard ARPG: the dual-domain nature means that optimization work must be performed twice, once for the land combat renderer (dense characters, particle skills, dungeon geometry) and once for the naval renderer (ocean surface simulation, ship fleet rendering, weather VFX). The transitions between domains must not cause visible frame-rate spikes.

Ocean Rendering Optimization

Ocean rendering is the single most GPU-expensive system in Salt & Steel. A naively implemented ocean covering a 4km × 4km sea instance at full quality would be intractable. Every optimization below is necessary; none are optional.

Tessellation LOD

The ocean surface is rendered as a tessellated mesh using a concentric ring LOD system centered on the camera:

Ring configuration:

Ring 0 (0-50m radius): 4m tessellation spacing, full Tessendorf FFT displacement
Ring 1 (50-200m radius): 16m spacing, full FFT displacement, reduced normal map contribution
Ring 2 (200-800m radius): 64m spacing, displacement replaced by scrolling normal map animation
Ring 3 (800-4000m radius): 256m spacing, static mesh with animated color/foam texture

Ring transitions use smooth blending driven by distance from the camera to prevent visible edge artifacts. Total triangle budget for ocean geometry at any given frame: approximately 120,000-250,000 triangles, compared to a naively uniform-tessellation approach that would require 40+ million triangles for the same area.

GPU tessellation budget: The hardware tessellation factor is capped per ring at a maximum of 64. On high-end hardware, the factor is allowed to rise to the cap set by the active quality preset. On low-end hardware, the Ring 0 radius is reduced to 25m and the maximum tessellation factor is capped at 32.
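The ring table can be read as a distance lookup. The following is a minimal sketch, not the engine's implementation: `OceanRing`, `build_rings`, and `select_ring` are hypothetical names, and it assumes that on low-end hardware Ring 1 simply begins where the shortened Ring 0 ends (this document does not say how the 25-50m band is handled).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OceanRing:
    index: int
    outer_radius_m: float   # ring covers camera distances up to this radius
    spacing_m: float        # tessellation spacing from the ring table
    fft_displacement: bool  # True where full FFT displacement is applied


def build_rings(low_end: bool = False) -> List[OceanRing]:
    # Low-end hardware shrinks Ring 0 from 50m to 25m.
    ring0_radius = 25.0 if low_end else 50.0
    return [
        OceanRing(0, ring0_radius, 4.0, True),
        OceanRing(1, 200.0, 16.0, True),
        OceanRing(2, 800.0, 64.0, False),
        OceanRing(3, 4000.0, 256.0, False),
    ]


def max_tessellation_factor(low_end: bool = False) -> int:
    # 64 is the general cap; low-end hardware is capped at 32.
    return 32 if low_end else 64


def select_ring(distance_m: float, rings: List[OceanRing]) -> Optional[OceanRing]:
    """Return the innermost ring containing the distance, or None beyond Ring 3."""
    for ring in rings:
        if distance_m < ring.outer_radius_m:
            return ring
    return None
```

For example, `select_ring(30.0, build_rings(low_end=True))` falls in Ring 1, whereas the same distance on default hardware falls in Ring 0.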

FFT Ocean Compute Optimization

The Tessendorf FFT runs on the GPU as a compute shader:

FFT texture resolution: 512×512 (high quality), 256×256 (medium quality), 128×128 (low quality)
The FFT is computed once per update at the FFT texture resolution above, then resampled for each ring's LOD level; it is not recomputed per LOD
The FFT result is cached in a GPU texture that persists across frames; only the frequency-domain update (wind-speed-driven spectrum modification) runs every frame. The FFT itself runs at half frame rate (30 Hz) — interpolated between two computed frames to maintain smooth motion at 60 fps
On mid-range hardware, the frequency update runs every other frame (30 Hz) and the FFT itself runs at 15 Hz with temporal blending — reducing compute cost by 50% with minimal visual impact at distance
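The interpolation between two computed FFT results can be sketched on the CPU as follows. This is an illustration only: the real blend would run on the GPU over the displacement texture, and `blend_factor` and `interpolate_heights` are hypothetical helpers. It assumes the displayed surface moves from the older result toward the newest one over a single FFT period, which implies one period of latency.

```python
from typing import List


def blend_factor(render_time_s: float, newest_fft_time_s: float, fft_rate_hz: float) -> float:
    """Fraction of the way from the older FFT result to the newest, clamped to [0, 1]."""
    period_s = 1.0 / fft_rate_hz
    t = (render_time_s - newest_fft_time_s) / period_s
    return min(max(t, 0.0), 1.0)


def interpolate_heights(older: List[float], newer: List[float], alpha: float) -> List[float]:
    """Linear blend of two height fields sampled at the same grid points."""
    return [a + (b - a) * alpha for a, b in zip(older, newer)]


# A 60 fps frame rendered halfway through a 30 Hz FFT period blends 50/50.
alpha = blend_factor(render_time_s=1.0 / 60.0, newest_fft_time_s=0.0, fft_rate_hz=30.0)
heights = interpolate_heights([0.0, 2.0], [2.0, 4.0], alpha)
```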

Performance budget target (mid-range GPU, 1080p):

Ocean surface compute (FFT): 0.8ms per frame
Ocean vertex tessellation: 0.5ms per frame
Ocean shading (subsurface, foam, reflections): 1.8ms per frame
Total ocean: 3.1ms of the 16.7ms frame budget (60 fps)

Reflection Optimization

Ocean reflections use a multi-tier approach:

Screen-space reflections (SSR): Captures reflections of geometry visible in the current frame. Runs at half resolution, upscaled with TAA. Cost: ~0.6ms. Works for ships, cliffs, and clouds visible to the camera.
Planar reflection pass: A low-frequency (every 4th frame) reflection render of the scene above the waterline into a 512×512 render target. Used when the camera is close to the water surface. Cost: ~1.2ms every 4th frame (averaged: 0.3ms/frame).
Precomputed sky reflection: The current sky/weather state is rendered into a cubemap at 128×128 resolution, updated every 2 seconds. Used for the base ocean reflection color at distance. Cost: negligible per frame.

SSR and planar reflections blend at the waterline: SSR dominates at close camera distances where precision matters; planar reflection dominates further out. This avoids the artifacts of each technique being used outside its ideal range.

Multi-Entity Naval Combat Optimization

A full naval engagement with 8 player ships and their crews is the peak render load scenario. Each ship has 40-60 crew members, 6-12 cannon mountings, cloth-simulated sails, and active particle effects from cannon fire. Without aggressive optimization, this scenario would require a gaming PC that doesn't exist yet.

GPU Instanced Ship Rendering

All ships of the same class share a base mesh. Their component variations (hull skin, cannon type, sail design, figurehead) are expressed as per-instance material parameters and attachment point offsets, not as unique meshes.

Draw call accounting (8 ships, worst case):

Hull geometry: 8 instances of shared hull mesh per class, 3 draw calls (3 hull classes represented)
Sail geometry: 8 instances per mast count variant, 4 draw calls (cloth simulated, unique vertex buffer per ship but shared shader)
Cannon geometry: GPU instanced across all cannon mounts of the same type, 6 draw calls for all cannon types
Crew geometry: GPU instanced skinned mesh, 4 draw calls (crew skeleton types × 2 for attacking/defending animation sets)
Figurehead: 8 individual draw calls (unique per ship) — the one non-instanced element
Rigging: Billboard or geometry, 2 draw calls per ship (LOD-dependent)
Total for 8 ships: approximately 45-60 draw calls (the lines above itemize 41: 3 + 4 + 6 + 4 + 8, plus 16 for rigging at 2 per ship)

Compare to a non-instanced approach: 8 ships × 80+ draw calls = 640+ draw calls, exceeding the draw call budget for the entire scene.
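The accounting above can be reproduced with a few lines of arithmetic. This sketch only restates the listed figures; the split into "shared" and "per-ship" buckets and the function names are assumptions made for illustration. Note that the itemized lines sum to 41, slightly under the quoted 45-60 range.

```python
# Draw calls that do not grow with ship count (instanced across ships).
SHARED_DRAW_CALLS = {"hull": 3, "sail": 4, "cannon": 6, "crew": 4}
# Draw calls issued once per ship (figurehead is unique, rigging is 2 per ship).
PER_SHIP_DRAW_CALLS = {"figurehead": 1, "rigging": 2}


def instanced_draw_calls(ship_count: int = 8) -> int:
    shared = sum(SHARED_DRAW_CALLS.values())
    per_ship = sum(PER_SHIP_DRAW_CALLS.values())
    return shared + ship_count * per_ship


def naive_draw_calls(ship_count: int = 8, calls_per_ship: int = 80) -> int:
    return ship_count * calls_per_ship


print(instanced_draw_calls(), naive_draw_calls())  # 41 640
```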

Crew Rendering — GPU Skinned Instancing

Crew members use a GPU instanced skinning approach where the bone transform palette for all instances of a given skeleton type is stored in a compute buffer:

Bone transforms are computed in a pre-pass compute shader: one dispatch per crew animation set (attacking, idle, cannon reload, fire-fighting)
The vertex shader for crew meshes reads per-instance bone transforms from the compute buffer rather than from a CPU-uploaded uniform buffer
Animation blending is performed in the compute pre-pass
At 8 ships × 50 crew = 400 crew figures, the pre-pass cost is approximately 0.3ms per frame — negligible given the visual complexity rendered

LOD for crew:

Ships within 100m (deck camera distance): Full skeletal animation, full material
Ships 100-300m: Simplified skeleton (12 bones vs. full 64 bones), lower-res skin
Ships 300m+: GPU particle impostor — crew replaced by small billboards that still convey movement and density without skeletal simulation cost

Cannon Fire Particle System

Naval cannon fire is the highest-particle-count moment in the game. A simultaneous broadside from 8 ships (8 × 10 cannons = 80 cannons firing simultaneously) requires careful particle budget management.

Per-cannon-fire effect:

Muzzle flash: 30 particles (GPU, 3-frame lifetime), 1 point light (screen-space, 0.5s lifetime)
Smoke plume: 150 particles (GPU, 4s lifetime, wind-advected)
Cannonball trail: 40 particles (GPU, per-projectile, 0.2s trail)
Water impact (for misses): 200 particles (GPU, 1.5s lifetime, spray + foam patch)
Ship impact (for hits): 350 particles (GPU, 3s lifetime, splinter debris + impact dust + smoke)

Worst case (80 cannons firing, all hitting):

Muzzle: 80 × 30 = 2,400 particles
Smoke: 80 × 150 = 12,000 particles (longest-lived)
Impact: 80 × 350 = 28,000 particles
Active projectiles: 80 × 40 = 3,200 particles
Total: ~45,600 particles in the peak frame

This is within budget for mid-range hardware (target: 100,000 simultaneous particles). However, smoke accumulates when broadsides follow one another in quick succession and their smoke lifetimes overlap: three overlapping generations bring the smoke particle count to 36,000 (12,000 × 3). The particle system therefore enforces a per-effect maximum budget: when the total smoke particle count for a given ship exceeds 5,000, smoke particles are culled starting with the oldest generation.
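The peak-frame arithmetic and the oldest-first smoke cap can be sketched as below. `ShipSmoke` is a hypothetical class that tracks live particle counts per generation rather than individual particles; the document does not specify how much of a generation is removed at a time, so the sketch assumes it removes exactly the excess.

```python
from collections import deque

CANNONS = 80
PER_CANNON = {"muzzle": 30, "smoke": 150, "trail": 40, "ship_impact": 350}
SMOKE_CAP_PER_SHIP = 5_000


def peak_particles(cannons: int = CANNONS) -> int:
    """Worst case: every cannon fires and every shot hits a ship."""
    return cannons * sum(PER_CANNON.values())


class ShipSmoke:
    """Live smoke particle counts for one ship, one entry per broadside (oldest first)."""

    def __init__(self, cap: int = SMOKE_CAP_PER_SHIP) -> None:
        self.cap = cap
        self.generations = deque()

    def total(self) -> int:
        return sum(self.generations)

    def emit(self, count: int) -> None:
        self.generations.append(count)
        excess = self.total() - self.cap
        # Shed particles from the oldest generation until back under the cap.
        while excess > 0 and self.generations:
            oldest = self.generations[0]
            removed = min(oldest, excess)
            if removed == oldest:
                self.generations.popleft()
            else:
                self.generations[0] = oldest - removed
            excess -= removed


smoke = ShipSmoke()
for _ in range(4):      # four overlapping broadsides from a 10-cannon ship
    smoke.emit(10 * PER_CANNON["smoke"])
print(peak_particles(), smoke.total())  # 45600 5000
```

With 10 cannons per ship, three overlapping broadsides produce 4,500 smoke particles per ship, so the cap first takes effect on the fourth.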

Particle LOD: Particles from ships beyond 500m use billboard sprites instead of geometry. Ships beyond 1km receive a single animated smoke texture overlay (no particles). This is applied smoothly using a distance-based LOD blend.

Particle Budget Management

Particle effects appear in five main contexts in Salt & Steel:

Context	Peak Particle Count	Budget
Naval cannon combat (8 ships)	45,000-80,000	100,000
Personal combat (6 players + monsters)	15,000-40,000	80,000
Weather (storm, rain, lightning)	100,000-200,000	200,000
Magic effects (spells, supernatural)	10,000-30,000	60,000
Environmental (waterfalls, fires, bubbles)	5,000-20,000	40,000

When multiple contexts are active simultaneously (a storm during naval combat), the particle manager arbitrates budget across systems. Priority order (highest to lowest budget access):

Player-character-attached effects (always highest priority — these are the most noticeable)
Weather effects (structural to world atmosphere)
Enemy/nearby combat effects
Environmental background effects
Distant/culled effects

Budget enforcement mechanism:

Particle emitters register with the particle manager and request a budget allocation in particles
The manager allocates from the current frame's total budget, priority-ordered
Emitters that exceed their allocation reduce their emission rate proportionally — this maintains the visual character of the effect while shedding particles
Low-priority distant emitters are culled entirely if the total system budget is exceeded by 20%+

Dynamic budget scaling: The particle manager samples GPU frame time from the previous frame. If GPU frame time exceeded 90% of the frame budget, the particle manager reduces total allocation by 15% for the next frame. This provides automatic quality adjustment during unexpectedly heavy scenes without requiring pre-defined quality levels for every possible scenario.
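A minimal sketch of the allocation and scaling rules described above, under stated assumptions: `EmitterRequest`, `allocate`, `emission_scale`, and `next_frame_budget` are hypothetical names, priority 4 stands for the "distant/culled" tier, and the document does not say how the budget recovers after a reduction, so recovery is left out.

```python
from dataclasses import dataclass
from typing import Dict, List

DISTANT_PRIORITY = 4  # lowest tier in the priority list (assumed numbering, 0 = highest)


@dataclass
class EmitterRequest:
    name: str
    priority: int
    requested: int  # particles requested for this frame


def allocate(requests: List[EmitterRequest], total_budget: int) -> Dict[str, int]:
    """Grant particles in priority order; cull distant emitters when demand is 20%+ over budget."""
    demand = sum(r.requested for r in requests)
    cull_distant = demand * 100 >= total_budget * 120
    remaining = total_budget
    grants: Dict[str, int] = {}
    for req in sorted(requests, key=lambda r: r.priority):
        if cull_distant and req.priority >= DISTANT_PRIORITY:
            grants[req.name] = 0
            continue
        granted = min(req.requested, remaining)
        grants[req.name] = granted
        remaining -= granted
    return grants


def emission_scale(requested: int, granted: int) -> float:
    """Proportional emission-rate multiplier for an emitter that was granted less than it asked for."""
    return 1.0 if requested == 0 else granted / requested


def next_frame_budget(current_budget: int, gpu_frame_ms: float, frame_budget_ms: float = 16.7) -> int:
    """Cut the total allocation by 15% when the previous GPU frame exceeded 90% of the frame budget."""
    if gpu_frame_ms > 0.9 * frame_budget_ms:
        return current_budget * 85 // 100
    return current_budget
```

An emitter granted half of its request would emit at `emission_scale(...) == 0.5`, which thins the effect without changing its shape or lifetime.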

Network Bandwidth Optimization for Ship State

The ship state synchronization protocol is designed to minimize bandwidth while maintaining sufficient fidelity for smooth rendering.

Delta Compression for Ship State

Ship state packets use delta compression: only fields that changed since the last transmitted state are included. A ship maintaining constant heading and speed (cruising between encounters) generates minimal delta packets:

Full ship state packet: 32 bytes
Ship cruising (no change except position): 6 bytes delta
Ship in combat (all fields changing): 28 bytes delta

The delta flag bitmask (1 byte) indicates which fields are present in the packet. The receiver applies only the present fields and retains unchanged fields from the previous state.
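The bitmask scheme can be illustrated with a small encoder and decoder. The field list, bit assignments, and byte layout below are assumptions for illustration only; the real packet carries more fields than shown here, so the sizes this sketch produces do not match the 6, 28, and 32 byte figures above.

```python
import struct

# (field name, bit in the 1-byte delta mask, struct format) - hypothetical layout
FIELDS = [
    ("position", 0x01, "<HH"),  # quantized x, y
    ("heading", 0x02, "<B"),
    ("speed", 0x04, "<B"),
    ("hull", 0x08, "<B"),
    ("crew", 0x10, "<B"),
]


def _pack(fmt: str, value) -> bytes:
    values = value if isinstance(value, tuple) else (value,)
    return struct.pack(fmt, *values)


def encode_delta(prev: dict, curr: dict) -> bytes:
    """Emit a mask byte followed by only the fields that changed."""
    mask = 0
    body = b""
    for name, bit, fmt in FIELDS:
        if curr[name] != prev[name]:
            mask |= bit
            body += _pack(fmt, curr[name])
    return bytes([mask]) + body


def decode_delta(prev: dict, packet: bytes) -> dict:
    """Apply the present fields on top of the previous state."""
    state = dict(prev)
    mask = packet[0]
    offset = 1
    for name, bit, fmt in FIELDS:
        if mask & bit:
            values = struct.unpack_from(fmt, packet, offset)
            offset += struct.calcsize(fmt)
            state[name] = values if len(values) > 1 else values[0]
    return state
```

Because the receiver retains unchanged fields from its previous state, this scheme depends on packets arriving in order and without loss, or on a periodic full-state packet to resynchronize.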

Quantization

All continuous values in ship state packets are quantized:

Position: 0.1m precision (at most 5 cm position error when rounding to the nearest step; imperceptible at ship scale)
Heading: 1.4° precision (256 discrete headings via 8-bit representation; 360° / 256 ≈ 1.41°)
Speed: 0.5-knot precision (practical ship speed range 0-20 knots fits in 6 bits)
Hull integrity: 1% precision (0-100 as uint8)
Crew count: 0-255 as uint8

At 4km × 4km sea instance with origin at (0,0), a 0.1m-precision position requires:

X: 40,000 positions → 16 bits
Y: 40,000 positions → 16 bits

This is 4 bytes for position vs. 8 bytes for full float32×2 — a 50% position data reduction with negligible precision loss.
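A short sketch of the position and speed quantization, assuming round-to-nearest and little-endian packing (neither is specified in this document). The helper names are hypothetical.

```python
import struct

POSITION_STEP_M = 0.1
SPEED_STEP_KNOTS = 0.5


def quantize(value: float, step: float, max_code: int) -> int:
    """Round to the nearest step and clamp to the representable range."""
    code = round(value / step)
    return min(max(code, 0), max_code)


def pack_position(x_m: float, y_m: float) -> bytes:
    """Two 16-bit codes: 4 bytes total instead of 8 for two float32 values."""
    qx = quantize(x_m, POSITION_STEP_M, 0xFFFF)
    qy = quantize(y_m, POSITION_STEP_M, 0xFFFF)
    return struct.pack("<HH", qx, qy)


def unpack_position(data: bytes):
    qx, qy = struct.unpack("<HH", data)
    return qx * POSITION_STEP_M, qy * POSITION_STEP_M


def quantize_speed(knots: float) -> int:
    """0-20 knots at 0.5-knot steps needs codes 0-40, which fit in 6 bits."""
    return quantize(knots, SPEED_STEP_KNOTS, 0x3F)
```

With round-to-nearest, the decoded position differs from the original by at most half a step (5 cm).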

Packet Batching

Multiple ship state updates within a single tick are batched into a single TCP segment. At 20 Hz with 8 ships, 8 ship state packets are combined into one segment, reducing per-packet TCP header overhead by 7/8.
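The header saving can be estimated with simple arithmetic. The sketch below assumes minimum-size IPv4 and TCP headers (20 bytes each, no options) and the 28-byte combat delta from earlier in this section; actual overhead depends on TCP options and link-layer framing, which are ignored here.

```python
TCP_HEADER_BYTES = 20   # minimum TCP header, no options
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options


def bytes_per_second(ships: int, payload_bytes: int, tick_hz: int, batched: bool) -> int:
    header = TCP_HEADER_BYTES + IPV4_HEADER_BYTES
    if batched:
        per_tick = header + ships * payload_bytes      # one segment per tick
    else:
        per_tick = ships * (header + payload_bytes)    # one segment per ship per tick
    return per_tick * tick_hz


unbatched = bytes_per_second(ships=8, payload_bytes=28, tick_hz=20, batched=False)
batched = bytes_per_second(ships=8, payload_bytes=28, tick_hz=20, batched=True)
print(unbatched, batched)  # 10880 5280
```

Under these assumptions the header bytes drop from 8 × 40 to 1 × 40 per tick, which is the 7/8 reduction stated above.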

Memory Management for Dual Land/Sea Systems

The core memory challenge: during a sea-to-land transition, both domains' assets must be available in memory at the same time (the sea instance continues running while the land instance loads). This creates a potential double-loading scenario.

Resident Pool Architecture

Always-resident (never swapped): ~1.1 GB

Engine and rendering core assets
Player character models + animation banks (current outfit + 2 cached alternates)
Player ship model + current component set
UI assets + UI font atlases
Core audio banks (music, universal SFX)
Item/modifier/skill definition databases

Sea-domain pool: ~800 MB target

Ocean surface rendering resources (FFT textures, foam simulation, caustic maps)
Weather particle systems + shader variants
All ship hull LOD meshes (sea instances may contain any ship class)
Sea creature models (prioritized: creatures in current instance)
Hostile NPC ship assets

Land-domain pool: ~600 MB target

Current tileset geometry and textures (biome-specific)
Monster models for current region
NPC character models (for populated areas)
Environment props and dressing
Dungeon/island-specific audio banks

Transition overlap: During the transition, both pools must be resident simultaneously, creating a peak of approximately 2.5 GB total resident memory. This is comfortably within modern GPU VRAM budgets (8 GB+ for RTX 3060 / RX 6600 tier) and system RAM requirements.

Low-end handling (GPU with 6 GB VRAM): On launch, if VRAM pressure is detected above 85%, the sea-domain pool's NPC ship LOD 0 meshes are pre-swapped to LOD 1 for ships beyond 300m. This reduces sea-domain pool from ~800 MB to ~600 MB, providing 200 MB headroom for the transition overlap.

Asset Streaming Priority

The streaming manager assigns priority levels to asset requests:

Priority	Asset Type	Behavior
Critical	Player's own ship hull (LOD 0)	Always in VRAM, never evicted
Critical	Player character (LOD 0)	Always in VRAM
High	Nearby entities (< 50m)	Immediate load, no eviction
Medium	Mid-range entities (50-300m)	Load as bandwidth allows
Low	Distant entities (300m+)	Load during idle time, first to evict
Background	Next-region pre-load	Lowest priority, aborted if VRAM pressure rises

Pre-load on approach: When the player's ship is within 1km of a chart region boundary, the streaming manager begins loading assets for the adjacent region's sea instance at background priority. By the time the boundary is crossed, most assets are already resident.
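The priority table and pre-load rule can be sketched as follows. The names are hypothetical, and two behaviors are assumptions because the table does not state them: Medium assets are treated as evictable after Low assets, and Background entries are treated as pending loads (aborted under pressure) rather than eviction candidates.

```python
from enum import IntEnum
from typing import Dict, List


class Priority(IntEnum):
    CRITICAL = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3
    BACKGROUND = 4


def entity_priority(distance_m: float) -> Priority:
    """Distance bands from the priority table (non-player entities only)."""
    if distance_m < 50.0:
        return Priority.HIGH
    if distance_m < 300.0:
        return Priority.MEDIUM
    return Priority.LOW


def eviction_order(resident: Dict[str, Priority]) -> List[str]:
    """Low assets are evicted first, then Medium; Critical and High are never evicted."""
    evictable = [
        (priority, name)
        for name, priority in resident.items()
        if priority in (Priority.MEDIUM, Priority.LOW)
    ]
    # Highest enum value (lowest importance) first; name breaks ties deterministically.
    evictable.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in evictable]


def should_preload_next_region(distance_to_boundary_m: float) -> bool:
    """Background pre-load starts within 1km of a chart region boundary."""
    return distance_to_boundary_m <= 1000.0
```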

Server-Side Performance Targets

Server optimization is as critical as client optimization. A simulation server that cannot maintain its tick rate will introduce visible stuttering in all connected clients.

Land Instance Simulation Budget (30 Hz target = 33ms per tick)

System	Budget (ms)	Notes
Monster AI / pathfinding	8ms	Up to 200 monsters; A* pathfinding batched across frame
GURPS combat resolution	6ms	Up to 30 simultaneous combat interactions per tick
Physics (projectile movement, collision)	4ms	Arrow/bullet/spell projectile motion
Status effect processing	2ms	Bleeding, poison, stun timers
State delta generation + serialization	5ms	Per-client AOI filtering + delta encoding
Overhead / slack	8ms	Spike headroom for heavy boss encounters

Combat resolution budget: the 6ms figure allots ~200 microseconds to each of 30 simultaneous combat interactions (30 × 200 microseconds = 6ms). A GURPS combat interaction consists of two 3d6 rolls (server RNG: ~10ns each), a hit location roll, a damage calculation (arithmetic: ~50ns), a DR lookup, wound modifier application, an HP update, and a status effect check. Estimated actual cost per interaction: ~1-5 microseconds. The 6ms budget is therefore extremely conservative: even at 5 microseconds each it covers 1,000+ simultaneous interactions, which exceeds any realistic land-combat scenario.
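The headroom claim follows directly from the figures in the paragraph above. A small check, using a hypothetical helper name:

```python
COMBAT_BUDGET_MS = 6.0


def interactions_within_budget(cost_us: float, budget_ms: float = COMBAT_BUDGET_MS) -> int:
    """How many interactions of the given cost (microseconds) fit in the budget."""
    return int((budget_ms * 1000.0) // cost_us)


print(interactions_within_budget(200.0))  # 30   (the budgeted allotment)
print(interactions_within_budget(5.0))    # 1200 (pessimistic estimated cost)
print(interactions_within_budget(1.0))    # 6000 (optimistic estimated cost)
```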

Sea Instance Simulation Budget (20 Hz target = 50ms per tick)

System	Budget (ms)	Notes
Ship physics (position, heading, momentum)	8ms	Up to 16 player ships + 16 NPC ships
Wind/weather effect on ships	3ms	Per-ship wind vector projection
Cannon projectile simulation	6ms	Up to 128 active projectiles
Collision detection (ship-ship, proj-ship)	5ms	Broadphase sweep + narrowphase for candidates
Boarding sub-contexts (0-3 active)	10ms	Each boarding runs its own GURPS mini-sim
NPC AI (merchant convoys, patrol routes)	5ms	Simple waypoint-following + engagement logic
Sea creature AI	4ms	Behavior trees for up to 20 active creatures
State delta generation + serialization	7ms	More entities than land, larger AOI
Overhead / slack	2ms	Tight; peak scenarios may drop to 15 Hz

The sea instance is compute-tight at 50ms per tick. The primary risk is boarding sub-contexts: each one runs its own GURPS simulation (6ms each at 30 Hz, consuming ~3ms average per boarding context at the sea instance's 20 Hz cadence). Three simultaneous boardings would consume 9ms, which fits inside the 10ms boarding allocation in the table above but leaves only the 2ms of slack beyond it. Four or more simultaneous boardings in the same instance require tick-rate reduction for the ship simulation (dropping from 20 Hz to 15 Hz lengthens the tick from 50ms to about 66.7ms, freeing roughly 16ms of budget).
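The sea-instance numbers can be checked with a short script. The dictionary keys are shorthand for the table rows, and the ~3ms per-boarding figure is taken from the text as given; the function names are hypothetical.

```python
SEA_BUDGET_MS = {
    "ship_physics": 8, "wind_weather": 3, "projectiles": 6, "collision": 5,
    "boarding": 10, "npc_ai": 5, "creature_ai": 4, "state_delta": 7, "slack": 2,
}
BOARDING_COST_MS = 3.0  # average per boarding context per sea tick (from the text)


def tick_ms(rate_hz: float) -> float:
    return 1000.0 / rate_hz


def boarding_fits(active_boardings: int) -> bool:
    """True if the boardings fit inside the table's boarding allocation."""
    return active_boardings * BOARDING_COST_MS <= SEA_BUDGET_MS["boarding"]


def freed_by_rate_drop(from_hz: float = 20.0, to_hz: float = 15.0) -> float:
    """Extra milliseconds per tick gained by lowering the tick rate."""
    return tick_ms(to_hz) - tick_ms(from_hz)


print(sum(SEA_BUDGET_MS.values()))      # 50
print(boarding_fits(3), boarding_fits(4))  # True False
print(round(freed_by_rate_drop(), 1))   # 16.7
```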

Multi-Threading Architecture

Client Threading Model

Thread	Responsibility
Main/Game thread	Input handling, game state updates, client-side prediction
Render thread	Command buffer recording (Vulkan / DX12), submission
Worker pool (N-2 threads)	Particle simulation, physics, asset decompression, audio
Network thread	TCP send/receive, packet parsing, state application
Streaming thread	Asset loading from disk, texture mip streaming
Ocean compute	Dedicated compute queue for FFT ocean simulation (GPU-side)

The main thread and render thread run with highest OS priority. The network thread runs at high priority to minimize latency between packet receipt and state application. Worker pool and streaming threads run at normal priority.

Frame structure:

Frame N:
  Main thread: Update game state for frame N → Record UI draw calls
  Render thread: Record scene draw calls for frame N-1 (double-buffered)
  Worker pool: Simulate particles for frame N+1, decompress streaming assets
  Network thread: Process incoming state packets, apply to game state

Frame N+1:
  GPU: Execute draw calls recorded in frame N
  Main thread: Update game state for frame N+1
  Render thread: Record scene draw calls for frame N

The double-buffered frame structure lets the GPU execute the draw calls recorded for one frame while the CPU records the draw calls for the next. This standard pipelining approach is essential for achieving GPU utilization above 80%.

Server Threading Model

Simulation servers use a single-threaded simulation loop (maintaining determinism) with parallel worker threads for independent subsystems:

Main simulation thread: Authoritative game state, GURPS resolution, entity state advancement
AI worker threads (N-1 cores): Monster AI computation (pathfinding, behavior trees) runs in parallel, results applied to main thread at tick boundary
Network I/O thread: Packet receive, parse, queue for main thread; state delta transmission
Checkpoint thread: Periodic state serialization and write to Account Authority (decoupled from main simulation to avoid stall)

The main simulation thread never blocks on I/O. AI results are applied as batch updates at the start of each tick. Network packets from the previous tick are applied as the first operation of the current tick. This design is deterministic and avoids the race conditions inherent in multi-threaded game simulation.
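The tick ordering described above can be sketched with thread-safe queues. This is an illustration, not the server's code: `SimulationServer` and its attributes are hypothetical, and sorting AI results by entity ID before applying them is an added assumption, since results from parallel workers may otherwise arrive in a different order on each run.

```python
import queue
from typing import Any, List


def drain(q: "queue.Queue") -> List[Any]:
    """Remove and return everything currently queued, without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


class SimulationServer:
    def __init__(self) -> None:
        self.inbound_packets = queue.Queue()  # filled by the network I/O thread
        self.ai_results = queue.Queue()       # filled by AI worker threads as (entity_id, decision)
        self.tick_count = 0
        self.log: List[tuple] = []            # stands in for authoritative state changes

    def tick(self) -> None:
        # 1. Packets received during the previous tick are applied first.
        for packet in drain(self.inbound_packets):
            self.log.append(("net", packet))
        # 2. AI results are applied as one batch, in a fixed order.
        for entity_id, decision in sorted(drain(self.ai_results), key=lambda r: r[0]):
            self.log.append(("ai", entity_id, decision))
        # 3. The single-threaded simulation step runs; it never waits on I/O.
        self.log.append(("simulate", self.tick_count))
        self.tick_count += 1
```

Because the main thread only drains what is already queued, a slow worker or network stall delays its own results by a tick rather than stalling the simulation.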

Performance Settings and Adaptive Quality

Salt & Steel provides six quality presets (Ultra, High, Medium, Low, Very Low, Custom) and an Adaptive Quality mode.

Adaptive Quality mode: The engine monitors GPU and CPU frame time each frame. If either exceeds 85% of the target frame budget for 3 consecutive frames, the Adaptive Quality manager reduces the lowest-priority visual settings by one step. If both are below 60% utilization for 5 consecutive seconds, it increases by one step. Visible settings changes are blended over 0.5 seconds to avoid jarring transitions.

Settings that Adaptive Quality controls (in order of reduction priority):

Particle budget ceiling (100% → 70% → 40%)
Ocean LOD aggressiveness (Normal → Aggressive → Very Aggressive)
Shadow resolution (2048 → 1024 → 512)
Screen-space reflection quality (Full → Half-res → Disabled)
Tessellation factor cap (64 → 32 → 16)
Post-processing quality (Full → Medium → Minimal)
Draw distance (Full → 75% → 50%)
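A minimal sketch of the Adaptive Quality controller follows. It assumes one interpretation of "by one step": the controller lowers the first setting in the list that can still be lowered, and raises settings in the reverse order. The class name and method names are hypothetical, and the 0.5-second blend is not modeled.

```python
SETTINGS = [
    ("particle_budget_ceiling", ("100%", "70%", "40%")),
    ("ocean_lod", ("Normal", "Aggressive", "Very Aggressive")),
    ("shadow_resolution", (2048, 1024, 512)),
    ("ssr_quality", ("Full", "Half-res", "Disabled")),
    ("tessellation_cap", (64, 32, 16)),
    ("post_processing", ("Full", "Medium", "Minimal")),
    ("draw_distance", ("Full", "75%", "50%")),
]


class AdaptiveQuality:
    def __init__(self, frame_budget_ms: float = 16.7) -> None:
        self.frame_budget_ms = frame_budget_ms
        self.levels = [0] * len(SETTINGS)  # index into each setting's steps
        self.over_frames = 0
        self.under_seconds = 0.0

    def current(self) -> dict:
        return {name: steps[lvl] for (name, steps), lvl in zip(SETTINGS, self.levels)}

    def on_frame(self, gpu_ms: float, cpu_ms: float, dt_s: float) -> None:
        load = max(gpu_ms, cpu_ms) / self.frame_budget_ms
        if load > 0.85:                      # either GPU or CPU over 85%
            self.under_seconds = 0.0
            self.over_frames += 1
            if self.over_frames >= 3:        # 3 consecutive frames
                self._step_down()
                self.over_frames = 0
        else:
            self.over_frames = 0
            if load < 0.60:                  # both GPU and CPU under 60%
                self.under_seconds += dt_s
                if self.under_seconds >= 5.0:
                    self._step_up()
                    self.under_seconds = 0.0
            else:
                self.under_seconds = 0.0

    def _step_down(self) -> None:
        for i, (_, steps) in enumerate(SETTINGS):
            if self.levels[i] < len(steps) - 1:
                self.levels[i] += 1
                return

    def _step_up(self) -> None:
        for i in reversed(range(len(SETTINGS))):
            if self.levels[i] > 0:
                self.levels[i] -= 1
                return
```

The asymmetric thresholds (3 frames to step down, 5 seconds to step up) act as hysteresis, so a scene hovering near the limit does not flip settings back and forth every few frames.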

Settings NOT subject to Adaptive Quality (always on, performance cost is fixed):

GURPS combat resolution (server-side, not a client cost)
Core ship and character rendering (non-negotiable for readability)
HUD and UI elements

Hardware detection and default preset assignment:

Hardware Tier	Default Preset
RTX 4080+ / RX 7900+	Ultra
RTX 3070-4070 / RX 6700-7700	High
RTX 3060 / RX 6600	Medium-High (custom)
GTX 1080 up to (not including) RTX 3060 / above RX 580 up to (not including) RX 6600	Medium
GTX 1060 / RX 480-580	Low
Below GTX 1060	Very Low (fallback to DX11)

Profiling and Performance Monitoring

An in-game performance overlay (activated via developer console or F10 shortcut) displays:

Frame time (ms) and current FPS
GPU frame time breakdown (ocean, shadows, particles, UI, other)
CPU frame time breakdown (game logic, AI, network, render recording)
Memory: VRAM used / total, RAM used / total
Particle count: current / budget
Draw call count
Network: current latency (RTT), bytes/sec in/out, packet loss rate

This overlay is available in all build types (not stripped in retail builds) to allow players to diagnose their own performance issues and report specific metrics when filing support tickets.

Telemetry: With player consent, anonymized per-frame performance statistics are transmitted to the backend analytics pipeline. This data identifies hardware configurations experiencing below-target performance, informs future optimization priorities, and validates that patch changes have the expected performance impact across the real player population.

Cross-references: design/10-technical-architecture/client-architecture.md, design/10-technical-architecture/server-architecture.md, design/10-technical-architecture/networking-model.md, design/11-visual-and-audio/rendering-pipeline.md