Protocol
The Pulse protocol defines explicit contracts for models and datasources, ensuring type safety and reproducibility.
Model Protocol
Every model in Pulse is defined by a protocol contract that specifies its inputs, outputs, training configuration, and inference behavior.
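Because a contract is plain data, its overall shape can be checked mechanically. The following is a minimal sketch that assumes the contract YAML has already been parsed into a Python dictionary, for example with a YAML library (not shown). The function name `check_model_contract`, the simplified semver pattern, and the decision to treat `model`, `version`, `runtime`, `input`, and `output` as mandatory are illustrative assumptions. This is not Pulse's validator. The three `snapshot` modes are taken from the contract structure below.

```python
import re

# Hypothetical structural check for a model contract already parsed into a dict.
# Pulse's own validator may behave differently.

# Simplified semver: MAJOR.MINOR.PATCH with optional -prerelease and +build parts.
SEMVER = re.compile(r"\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?")
# Assumption: these sections are mandatory; the source does not say which are optional.
REQUIRED_KEYS = ("model", "version", "runtime", "input", "output")
SNAPSHOT_MODES = {"required", "optional", "disabled"}


def check_model_contract(contract: dict) -> list[str]:
    """Return a list of problems found in the contract (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in contract:
            problems.append(f"missing top-level key: {key}")
    version = contract.get("version")
    if version is not None and not SEMVER.fullmatch(str(version)):
        problems.append(f"version is not semver: {version!r}")
    for section in ("input", "output"):
        schema = contract.get(section)
        if schema is not None and schema.get("type") != "object":
            problems.append(f"{section}.type must be 'object'")
    snapshot = contract.get("training", {}).get("snapshot")
    if snapshot is not None and snapshot not in SNAPSHOT_MODES:
        problems.append(f"training.snapshot must be one of {sorted(SNAPSHOT_MODES)}")
    return problems


# Hypothetical contract; the model name and runtime value are illustrative only.
contract = {
    "model": "churn-predictor",
    "version": "1.2.0",
    "runtime": "python3.11",
    "input": {"type": "object", "properties": {"user_id": {"type": "string"}}},
    "output": {"type": "object", "properties": {"churn_probability": {"type": "number"}}},
    "training": {"snapshot": "required"},
}
print(check_model_contract(contract))  # []
```

The function collects every problem and returns them together, so a single run can report all errors in a contract at once.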
Basic Structure
```yaml
model: <model-name>
version: <semver>
runtime: <runtime>
input:
  type: object
  properties:
    <field>:
      type: <type>
      description: <string>
      required: <boolean>
output:
  type: object
  properties:
    <field>:
      type: <type>
training:
  datasource: <datasource-ref>
  snapshot: required | optional | disabled
  schedule: <cron-expression>
inference:
  timeout: <duration>
  retries: <number>
  cache:
    ttl: <duration>
    key: [<fields>]
```
Input/Output Types
Pulse supports the core JSON Schema types plus two ML-specific extensions, tensor and embedding:
| Type | Description | Example |
|---|---|---|
| string | Text values | "hello" |
| number | Numeric values (int or float) | 42, 3.14 |
| boolean | True/false values | true |
| array | Lists of items | [1, 2, 3] |
| object | Nested structures | {"a": 1} |
| tensor | N-dimensional arrays with a declared shape | shape: [224, 224, 3] |
| embedding | Fixed-length vector embeddings with a declared dimension | dim: 768 |
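The type table can be read as a recursive check. The sketch below is hypothetical, since Pulse's validator is not shown in this document. It assumes a tensor value arrives as nested Python lists of numbers whose dimensions must equal the declared `shape`, and that an embedding arrives as a flat list of numbers whose length must equal `dim`. The name `matches_type` is illustrative.

```python
from numbers import Real


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so reject it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def _shape_of(value) -> list[int]:
    """Infer the shape of nested lists of numbers; raise ValueError if ragged or non-numeric."""
    if not isinstance(value, list):
        if not _is_number(value):
            raise ValueError("tensor leaves must be numbers")
        return []
    inner = [_shape_of(item) for item in value]
    if any(s != inner[0] for s in inner[1:]):
        raise ValueError("ragged tensor")
    return [len(value)] + (inner[0] if inner else [])


def matches_type(value, spec: dict) -> bool:
    """Hypothetical check of a value against a Pulse-style type spec."""
    kind = spec["type"]
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":  # int or float
        return _is_number(value)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "array":
        return isinstance(value, list)
    if kind == "object":
        return isinstance(value, dict)
    if kind == "embedding":
        return (isinstance(value, list) and len(value) == spec["dim"]
                and all(_is_number(x) for x in value))
    if kind == "tensor":
        try:
            return _shape_of(value) == list(spec["shape"])
        except ValueError:
            return False
    raise ValueError(f"unknown type: {kind}")


# A 224 x 224 x 3 "image" of zeros and a 768-dimensional embedding.
image = [[[0.0] * 3 for _ in range(224)] for _ in range(224)]
print(matches_type(image, {"type": "tensor", "shape": [224, 224, 3]}))  # True
print(matches_type([0.1] * 768, {"type": "embedding", "dim": 768}))   # True
print(matches_type(True, {"type": "number"}))                          # False
```

Nested lists keep the sketch dependency-free. A real system would more likely accept NumPy arrays and compare their `.shape` attribute directly.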
Datasource Protocol
Datasource contracts define how Pulse connects to and reads from your data infrastructure.
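Connection fields in the examples that follow use `${NAME}` placeholders, such as `${POSTGRES_HOST}` and `${AWS_ACCESS_KEY_ID}`, which presumably keep hosts and secrets out of the contract file. The source does not describe how these placeholders are resolved. The sketch below assumes they are read from environment variables when the contract is loaded. The helper name `resolve_placeholders` and the choice to fail on unset variables are both illustrative.

```python
import os
import re

# Matches ${NAME}, where NAME looks like a conventional environment variable.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_placeholders(node, env=None):
    """Recursively replace ${NAME} in strings with values from env (default: os.environ).

    Hypothetical helper: raises KeyError for an unset variable instead of
    silently substituting an empty string.
    """
    env = os.environ if env is None else env
    if isinstance(node, dict):
        return {key: resolve_placeholders(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_placeholders(item, env) for item in node]
    if isinstance(node, str):
        return PLACEHOLDER.sub(lambda m: env[m.group(1)], node)
    return node  # numbers, booleans and None pass through unchanged


connection = {"host": "${POSTGRES_HOST}", "port": 5432, "ssl": "required"}
print(resolve_placeholders(connection, {"POSTGRES_HOST": "db.internal"}))
# {'host': 'db.internal', 'port': 5432, 'ssl': 'required'}
```

If a variable is missing, the load fails immediately rather than attempting to connect to an empty host.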
PostgreSQL Example
```yaml
datasource: user-events
type: postgresql
connection:
  host: ${POSTGRES_HOST}
  port: 5432
  database: analytics
  ssl: required
  pool:
    min: 2
    max: 10
schema:
  table: events
  columns:
    - name: id
      type: uuid
      primary: true
    - name: user_id
      type: uuid
      index: true
    - name: event_type
      type: string
      enum: [click, view, purchase]
    - name: properties
      type: jsonb
    - name: created_at
      type: timestamp
      index: true
query:
  filter: "created_at >= NOW() - INTERVAL '90 days'"
  order_by: created_at DESC
snapshot:
  strategy: incremental
  column: created_at
  retention: 180d
  compression: zstd
```
S3 Example
```yaml
datasource: training-images
type: s3
connection:
  bucket: ml-training-data
  region: us-east-1
  credentials:
    access_key_id: ${AWS_ACCESS_KEY_ID}
    secret_access_key: ${AWS_SECRET_ACCESS_KEY}
schema:
  format: parquet
  partition_by: [date, category]
  columns:
    - name: image_path
      type: string
    - name: label
      type: string
    - name: confidence
      type: number
snapshot:
  strategy: full
  prefix: snapshots/
  retention: 365d
```
Snapshot Strategies
Pulse supports three snapshot strategies, each suited to a different kind of dataset:
full
Complete copy of the dataset. Best for small to medium datasets where incremental tracking is complex.
incremental
Captures only the changes since the last snapshot. Requires a timestamp or version column whose value increases as rows are added (for example, created_at). Best for large, append-only datasets.
cdc
Change Data Capture using database replication logs. Captures inserts, updates, and deletes. Best for mutable datasets.
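A toy table makes the difference between incremental and cdc easier to see. In this hypothetical sketch, rows are dictionaries keyed by `id`. An incremental snapshot pulls only the rows whose `created_at` is later than the previous snapshot's high-water mark. CDC, by contrast, replays an ordered log of insert, update, and delete events. The `(operation, row)` event format is an assumption made for illustration. Real replication logs, such as those from PostgreSQL logical decoding, carry considerably more metadata.

```python
from datetime import datetime


def incremental_snapshot(rows, watermark):
    """Return rows newer than the watermark, plus the new watermark.

    Assumes each new row's created_at is later than every row already captured
    (append-only, no ties); updates to existing rows are invisible here.
    """
    new_rows = [r for r in rows if watermark is None or r["created_at"] > watermark]
    new_watermark = max((r["created_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark


def apply_cdc(state, events):
    """Replay ordered (operation, row) events onto a dict of rows keyed by id."""
    state = dict(state)  # leave the caller's snapshot untouched
    for op, row in events:
        if op in ("insert", "update"):
            state[row["id"]] = row
        elif op == "delete":
            state.pop(row["id"], None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


t = datetime.fromisoformat
rows = [
    {"id": 1, "event_type": "click", "created_at": t("2024-01-01T10:00")},
    {"id": 2, "event_type": "view", "created_at": t("2024-01-02T10:00")},
]
batch, mark = incremental_snapshot(rows, None)   # first run captures everything
rows.append({"id": 3, "event_type": "purchase", "created_at": t("2024-01-03T10:00")})
batch, mark = incremental_snapshot(rows, mark)   # second run captures only id 3
print([r["id"] for r in batch])  # [3]

# CDC sees the update to id 1 and the deletion of id 2, which incremental cannot.
state = apply_cdc({r["id"]: r for r in rows}, [
    ("update", {"id": 1, "event_type": "view", "created_at": t("2024-01-01T10:00")}),
    ("delete", {"id": 2}),
])
print(sorted(state))  # [1, 3]
```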
Validation Rules
Add standard JSON Schema validation keywords (such as minimum, format, and enum) to property definitions to enforce data quality:
```yaml
input:
  type: object
  properties:
    age:
      type: number
      minimum: 0
      maximum: 150
    email:
      type: string
      format: email
    score:
      type: number
      multipleOf: 0.01
    tags:
      type: array
      minItems: 1
      maxItems: 10
      uniqueItems: true
    status:
      type: string
      enum: [pending, active, completed]
```
Inference Configuration
Fine-tune inference behavior with these options:
```yaml
inference:
  # Request timeout
  timeout: 100ms
  # Retry configuration
  retries: 3
  retry_backoff: exponential
  # Response caching
  cache:
    enabled: true
    ttl: 60s
    key: [user_id, request_type]
    invalidate_on: [model_deploy, drift_detected]
  # Batching for throughput
  batch:
    enabled: true
    max_size: 32
    max_wait: 10ms
  # Circuit breaker
  circuit_breaker:
    enabled: true
    failure_threshold: 5
    reset_timeout: 30s
  # Traffic splitting (canary and shadow deployments)
  traffic:
    canary: 10%
    shadow: true
```
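The following sketch shows the retry and circuit-breaker settings using common semantics for these patterns. It assumes that `retry_backoff: exponential` doubles the delay after each failed attempt. It also assumes the breaker opens after `failure_threshold` consecutive failures and lets a trial request through once `reset_timeout` has elapsed. Pulse's exact behavior (base delay, jitter, half-open handling) is not specified here. The duration format accepted by `parse_duration`, the 0.1 s base delay, the 5 s cap, and all function and class names are hypothetical.

```python
import re
import time

_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def parse_duration(text: str) -> float:
    """Convert strings like '100ms', '30s' or '180d' to seconds (hypothetical format)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h|d)", text.strip())
    if not match:
        raise ValueError(f"bad duration: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def backoff_delays(retries: int, base: float = 0.1, cap: float = 5.0) -> list[float]:
    """Exponential backoff: base, 2*base, 4*base, ..., capped; one delay per retry."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures and allows a trial
    request once `reset_timeout` seconds have passed since it opened."""

    def __init__(self, failure_threshold: int, reset_timeout: float, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: let a trial request through after the reset timeout.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the timer


# Wiring the example values: timeout: 100ms, retries: 3,
# failure_threshold: 5, reset_timeout: 30s.
timeout = parse_duration("100ms")
delays = backoff_delays(3)  # waits before retries 1, 2 and 3
breaker = CircuitBreaker(failure_threshold=5, reset_timeout=parse_duration("30s"))
```

The clock is passed in as a parameter, so the breaker can be tested without sleeping. A production version would also need thread safety, which this sketch omits.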