Protocol

The Pulse protocol defines explicit contracts for models and datasources. Typed input and output schemas provide type safety, and versioned contracts with data snapshots support reproducibility.

Model Protocol

Every model in Pulse is defined by a protocol contract that specifies its inputs, outputs, training configuration, and inference behavior.

Basic Structure

model: <model-name>
version: <semver>
runtime: <runtime>

input:
  type: object
  properties:
    <field>:
      type: <type>
      description: <string>
      required: <boolean>

output:
  type: object
  properties:
    <field>:
      type: <type>

training:
  datasource: <datasource-ref>
  snapshot: required | optional | disabled
  schedule: <cron-expression>
  
inference:
  timeout: <duration>
  retries: <number>
  cache:
    ttl: <duration>
    key: [<fields>]
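
The <duration> placeholders are filled elsewhere on this page with values such as 100ms, 60s, and 180d. The sketch below shows one way a client could normalize such strings. It assumes the grammar is an integer followed by a unit suffix; only ms, s, and d appear in this document, so the m and h suffixes and the name parse_duration are assumptions, not part of Pulse.

```python
import re
from datetime import timedelta

# Suffix -> timedelta keyword. Only ms, s and d appear in this document;
# m and h are assumed here for completeness.
_UNITS = {
    "ms": "milliseconds",
    "s": "seconds",
    "m": "minutes",
    "h": "hours",
    "d": "days",
}
_PATTERN = re.compile(r"^(\d+)(ms|s|m|h|d)$")


def parse_duration(text: str) -> timedelta:
    """Convert a string such as '100ms' or '180d' into a timedelta."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

Returning a timedelta rather than a raw number keeps the unit explicit, so a timeout in milliseconds and a retention period in days cannot be mixed up by accident.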

Input/Output Types

Pulse supports the standard JSON Schema types, plus ML-specific extensions such as tensor and embedding:

Type       Description                    Example
string     Text values                    "hello"
number     Numeric values (int or float)  42, 3.14
boolean    True/false values              true
array      Lists of items                 [1, 2, 3]
object     Nested structures              {"a": 1}
tensor     N-dimensional arrays           shape: [224, 224, 3]
embedding  Vector embeddings              dim: 768
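
To make the type table concrete, here is a minimal sketch of checking a decoded JSON value against a type declaration. It assumes tensors and embeddings arrive as nested Python lists, and that shape and dim are declared next to type as in the table's example column. The function names are hypothetical and this is not Pulse's actual validator.

```python
from numbers import Real


def tensor_shape(value):
    """Return the shape of a nested list, or None if it is ragged."""
    if not isinstance(value, list):
        return []  # a scalar has an empty shape
    if not value:
        return [0]
    shapes = [tensor_shape(item) for item in value]
    if shapes[0] is None or any(s != shapes[0] for s in shapes):
        return None
    return [len(value)] + shapes[0]


def matches_type(value, spec):
    """Check a decoded JSON value against a {'type': ...} declaration."""
    kind = spec["type"]
    if kind == "string":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, Real) and not isinstance(value, bool)
    if kind == "array":
        return isinstance(value, list)
    if kind == "object":
        return isinstance(value, dict)
    if kind == "tensor":
        # Only the shape is compared; element types are not inspected.
        return tensor_shape(value) == spec["shape"]
    if kind == "embedding":
        return tensor_shape(value) == [spec["dim"]]
    raise ValueError(f"unknown type: {kind}")
```

For example, matches_type([[1, 2], [3, 4]], {"type": "tensor", "shape": [2, 2]}) passes, while a ragged list such as [[1], [2, 3]] fails because it has no well-defined shape.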

Datasource Protocol

Datasource contracts define how Pulse connects to and reads from your data infrastructure.

PostgreSQL Example

datasource: user-events
type: postgresql

connection:
  host: ${POSTGRES_HOST}
  port: 5432
  database: analytics
  ssl: required
  pool:
    min: 2
    max: 10

schema:
  table: events
  columns:
    - name: id
      type: uuid
      primary: true
    - name: user_id
      type: uuid
      index: true
    - name: event_type
      type: string
      enum: [click, view, purchase]
    - name: properties
      type: jsonb
    - name: created_at
      type: timestamp
      index: true

query:
  filter: "created_at >= NOW() - INTERVAL '90 days'"
  order_by: created_at DESC

snapshot:
  strategy: incremental
  column: created_at
  retention: 180d
  compression: zstd
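
The connection block uses ${POSTGRES_HOST} rather than a literal host name. This page does not say how Pulse resolves such placeholders. Assuming they are filled from environment variables after the YAML has been parsed into dictionaries and lists, the substitution could look like the sketch below. The function name resolve_placeholders and the host db.internal are hypothetical.

```python
from string import Template


def resolve_placeholders(node, env):
    """Recursively replace ${NAME} in every string of a parsed contract.

    Note: string.Template treats any '$' as special, so a literal dollar
    sign in a value would have to be written as '$$'.
    """
    if isinstance(node, str):
        # substitute() raises KeyError if a referenced variable is missing,
        # which surfaces configuration mistakes early.
        return Template(node).substitute(env)
    if isinstance(node, dict):
        return {key: resolve_placeholders(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_placeholders(item, env) for item in node]
    return node  # numbers, booleans and None pass through unchanged


connection = {"host": "${POSTGRES_HOST}", "port": 5432, "database": "analytics"}
resolved = resolve_placeholders(connection, {"POSTGRES_HOST": "db.internal"})
```

Passing os.environ as env would read the real process environment. Failing on a missing variable, rather than silently substituting an empty string, is a deliberate choice in this sketch.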

S3 Example

datasource: training-images
type: s3

connection:
  bucket: ml-training-data
  region: us-east-1
  credentials:
    access_key_id: ${AWS_ACCESS_KEY_ID}
    secret_access_key: ${AWS_SECRET_ACCESS_KEY}

schema:
  format: parquet
  partition_by: [date, category]
  columns:
    - name: image_path
      type: string
    - name: label
      type: string
    - name: confidence
      type: number

snapshot:
  strategy: full
  prefix: snapshots/
  retention: 365d
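
Both datasource examples set a retention period for snapshots (180d and 365d). As a rough illustration of what that implies, the sketch below selects snapshots that have aged out. It assumes retention is measured in whole days from the snapshot's creation date and that a snapshot exactly at the cutoff is kept; neither detail is specified on this page, and the function name is hypothetical.

```python
from datetime import date, timedelta


def expired_snapshots(snapshot_dates, today, retention_days):
    """Return the snapshot dates older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    # Strictly older than the cutoff: a snapshot taken on the cutoff day is kept.
    return sorted(d for d in snapshot_dates if d < cutoff)
```

With retention: 365d and today set to 2025-01-01, the cutoff is 2024-01-02, so a snapshot from 2023-12-31 would be selected for removal while one from 2024-06-01 would be kept.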

Snapshot Strategies

Pulse supports three snapshot strategies, selected with snapshot.strategy in the datasource contract:

full

Complete copy of the dataset on every snapshot. Best for small to medium datasets where tracking changes incrementally would be complex.

incremental

Captures only the changes since the last snapshot. Requires a timestamp or version column, set with column as in the PostgreSQL example above. Best for large, append-only datasets.

cdc

Change Data Capture using database replication logs. Captures inserts, updates, and deletes. Best for mutable datasets.
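
The difference between the three strategies can be sketched over in-memory rows. This is a simplified model rather than Pulse's implementation: rows are plain dictionaries, the incremental watermark is the largest value of the tracking column seen so far, and the CDC change format (op, id, row) is invented for illustration.

```python
def full_snapshot(rows):
    """full: copy every row, regardless of what earlier snapshots hold."""
    return [dict(row) for row in rows]


def incremental_snapshot(rows, column, watermark):
    """incremental: copy rows whose tracking column is past the watermark.

    Returns the captured rows and the new watermark to store for next time.
    """
    captured = [
        dict(row) for row in rows
        if watermark is None or row[column] > watermark
    ]
    new_watermark = max((row[column] for row in captured), default=watermark)
    return captured, new_watermark


def apply_cdc(state, changes):
    """cdc: replay insert/update/delete events, keyed by primary key."""
    state = dict(state)  # leave the caller's mapping untouched
    for change in changes:
        if change["op"] == "delete":
            state.pop(change["id"], None)
        else:  # insert or update
            state[change["id"]] = change["row"]
    return state
```

Note that incremental_snapshot never sees a row that was edited or deleted without its tracking column advancing, which is why the watermark approach suits append-only data and mutable data calls for cdc.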

Validation Rules

Constrain input fields with JSON Schema validation keywords to ensure data quality:

input:
  type: object
  properties:
    age:
      type: number
      minimum: 0
      maximum: 150
    email:
      type: string
      format: email
    score:
      type: number
      multipleOf: 0.01
    tags:
      type: array
      minItems: 1
      maxItems: 10
      uniqueItems: true
    status:
      type: string
      enum: [pending, active, completed]
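
To show what each keyword in the example checks, here is a hand-rolled sketch covering only the keywords used above. It is not a full JSON Schema validator: it ignores the type keyword, and its email check is intentionally crude. The function name validate_field is hypothetical.

```python
from decimal import Decimal


def validate_field(value, rules):
    """Return the list of violated keywords for one value (empty if valid)."""
    errors = []
    if "minimum" in rules and value < rules["minimum"]:
        errors.append("minimum")
    if "maximum" in rules and value > rules["maximum"]:
        errors.append("maximum")
    if "multipleOf" in rules:
        # Decimal avoids binary floating-point rounding surprises
        # with steps such as 0.01.
        step = Decimal(str(rules["multipleOf"]))
        if Decimal(str(value)) % step != 0:
            errors.append("multipleOf")
    if "enum" in rules and value not in rules["enum"]:
        errors.append("enum")
    if "minItems" in rules and len(value) < rules["minItems"]:
        errors.append("minItems")
    if "maxItems" in rules and len(value) > rules["maxItems"]:
        errors.append("maxItems")
    # set() requires hashable items, which holds for a list of string tags.
    if rules.get("uniqueItems") and len(set(value)) != len(value):
        errors.append("uniqueItems")
    if rules.get("format") == "email" and "@" not in value:
        # Deliberately simplistic; real email validation is stricter.
        errors.append("format")
    return errors
```

For instance, validate_field(200, {"minimum": 0, "maximum": 150}) reports ["maximum"], and a tags list with a repeated entry reports ["uniqueItems"].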

Inference Configuration

Fine-tune inference behavior with these options:

inference:
  # Request timeout
  timeout: 100ms
  
  # Retry configuration
  retries: 3
  retry_backoff: exponential
  
  # Response caching
  cache:
    enabled: true
    ttl: 60s
    key: [user_id, request_type]
    invalidate_on: [model_deploy, drift_detected]
  
  # Batching for throughput
  batch:
    enabled: true
    max_size: 32
    max_wait: 10ms
  
  # Circuit breaker
  circuit_breaker:
    enabled: true
    failure_threshold: 5
    reset_timeout: 30s
  
  # Traffic splitting (canary and shadow)
  traffic:
    canary: 10%
    shadow: true
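
The circuit_breaker settings above map naturally onto the standard circuit breaker pattern. The sketch below is a generic version of that pattern, not Pulse's implementation. It assumes failure_threshold counts consecutive failures and that reset_timeout is the cooldown, in seconds, before a trial request is allowed; this page does not confirm either. The clock parameter exists only to make the class testable.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures, then allows a trial call after a cooldown."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        """Return True if a call may proceed."""
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        """A successful call closes the circuit and clears the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        """Count a failure; open (or re-open) the circuit at the threshold."""
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A caller would check allow_request() before each inference call, then report the outcome with record_success() or record_failure(). If the trial request after the cooldown fails, the circuit re-opens for another full reset_timeout.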