Models - Methods Center

Kalman Filter

The Kalman Filter endpoint processes time series data to reduce noise and extract smooth, accurate predictions. It supports two modes of operation: stateless filtering and stateful filtering with persistent data storage.

Endpoint

POST /api/v1/{account_id}/kalman

Authentication: Required (API Key via X-API-Key header)
Quota: Consumes 1 quota unit per request

Input Schema

KalmanInput

Field	Type	Required	Description
`results`	`array[array[float]]`	Yes	A 2D array where each inner list contains time series values
`save`	`boolean`	No (default: `false`)	Whether to save the data to the database for cumulative processing
`unique_identifier`	`string`	Conditional	Required when `save` is `true`. A unique identifier for grouping related data

Validation Rules

When save is true, unique_identifier must be provided
Each inner array in results must contain at least one value
All values must be valid floating-point numbers
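
The rules above can also be checked on the client before a request is sent. The following is a minimal sketch, not the server's validator: `validate_kalman_input` is a hypothetical helper name, the payload is assumed to be a plain Python dict, and rejecting an empty outer array is an assumption drawn from the "non-empty lists of observations" error message.

```python
from typing import Any, Dict, List


def validate_kalman_input(payload: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the payload looks valid."""
    errors: List[str] = []

    results = payload.get("results")
    if not isinstance(results, list) or not results:
        # Assumption: an empty outer array is treated as invalid.
        errors.append("results must be a non-empty 2D array")
    else:
        for i, row in enumerate(results):
            if not isinstance(row, list) or len(row) == 0:
                errors.append(f"results[{i}] must contain at least one value")
                continue
            for j, value in enumerate(row):
                # bool is a subclass of int in Python, so reject it explicitly.
                is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
                if not is_number:
                    errors.append(f"results[{i}][{j}] is not a valid float")

    # save defaults to false, so the identifier is only checked when save is truthy.
    if payload.get("save", False) and not payload.get("unique_identifier"):
        errors.append("unique_identifier is required when save is true")

    return errors
```

An empty return value means none of the three rules was violated; the server may still apply checks this sketch does not cover.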

Operation Modes

Mode 1: Stateless Filtering (Without ID)

Use Case: Process a single batch of time series data without persistence.

Characteristics:

save: false
unique_identifier: Not required (can be null or omitted)
Data is not stored in the database
Filters only the provided data in the current request
Ideal for one-off analysis or real-time processing

Example Request:

{
  "results": [[10.2, 10.5, 10.1, 9.8, 10.3, 10.0, 9.9, 10.4]],
  "save": false
}

Behavior:

Accepts the input time series data
Applies Kalman filtering to the provided data
Returns filtered results, raw state estimates, and smoothed state estimates
Does not persist any data to the database
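
For callers working in Python rather than curl, the stateless request could be assembled as sketched below. `build_kalman_request` is a hypothetical helper, not part of any client library; the base URL and account ID are taken from the curl examples elsewhere on this page, and the sketch only builds the request without sending it.

```python
import json
import urllib.request
from typing import List, Optional


def build_kalman_request(
    base_url: str,
    account_id: int,
    api_key: str,
    results: List[List[float]],
    save: bool = False,
    unique_identifier: Optional[str] = None,
) -> urllib.request.Request:
    """Assemble (but do not send) a POST request for the Kalman endpoint."""
    body = {"results": results, "save": save}
    if unique_identifier is not None:
        body["unique_identifier"] = unique_identifier
    return urllib.request.Request(
        url=f"{base_url}/api/v1/{account_id}/kalman",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Stateless mode: save stays false and no identifier is attached.
request = build_kalman_request(
    "http://localhost:8000",
    1,
    "your-api-key",
    [[10.2, 10.5, 10.1, 9.8, 10.3, 10.0, 9.9, 10.4]],
)
```

Passing `request` to `urllib.request.urlopen` would send it; the response body can then be decoded with `json.load`.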

Mode 2: Stateful Filtering (With ID)

Use Case: Accumulate and process historical data over time for a specific identifier.

Characteristics:

save: true
unique_identifier: Required (e.g., "sensor-001", "user-123")
Data is stored in the database with the provided identifier
Processes all historical data associated with the identifier, including the current request
Ideal for tracking trends over time, cumulative analysis, or multi-session processing

Example Request:

{
  "results": [[10.2, 10.5, 10.1, 9.8, 10.3]],
  "save": true,
  "unique_identifier": "sensor-001-2024"
}

Behavior:

Saves the incoming data to the database with the unique_identifier
Retrieves all previous data saved with the same unique_identifier for the account
Combines all historical data (ordered by creation time)
Applies Kalman filtering to the complete dataset
Returns filtered results based on all available data

Important Notes:

The filter processes data cumulatively, so each request includes all previous data with the same identifier
Results will change over time as more data is added
This is useful for progressive refinement of predictions as more observations become available

Data Format

Expected Input Format

The results field must be a 2D array (array of arrays). Each inner array represents a sequence of time series observations.

Single Time Series

{
  "results": [[10.2, 10.5, 10.1, 9.8, 10.3]]
}

This represents a single time series with 5 observations.

Multiple Time Series (Rows)

{
  "results": [
    [10.2, 10.5, 10.1],
    [9.8, 10.3, 10.0],
    [9.9, 10.4, 10.2]
  ]
}

This represents 3 rows, each with 3 observations. The Kalman filter processes the rows as sequential observations in the order provided.

Real-World Example: Sensor Data

{
  "results": [[23.5, 23.7, 23.4, 23.6, 23.8, 23.9, 23.5]],
  "save": true,
  "unique_identifier": "temperature-sensor-001"
}

Temperature readings from a sensor over 7 time points, saved for cumulative tracking.

Output Schema

KalmanOutput

Field	Type	Description
`filtered_data`	`array[float]`	The filtered time series values (predictions) after applying the Kalman filter
`raw_state`	`array[float]`	The raw state estimates from the forward pass of the Kalman filter
`smooth_state`	`array[float]`	The smoothed state estimates from the backward smoothing pass (RTS smoother)
`input_data`	`array[array[float]]`	Echo of the original input data from the request

Example Response

{
  "filtered_data": [10.2, 10.35, 10.25, 10.02, 10.15, 10.08, 10.01, 10.22],
  "raw_state": [10.2, 10.35, 10.25, 10.02, 10.15, 10.08, 10.01, 10.22],
  "smooth_state": [10.2, 10.33, 10.27, 10.05, 10.13, 10.09, 10.03, 10.2],
  "input_data": [[10.2, 10.5, 10.1, 9.8, 10.3, 10.0, 9.9, 10.4]]
}
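
A response like the one above can be loaded into a typed structure on the client. This is only a sketch: the `KalmanOutput` dataclass below mirrors the field names in the output table, but `parse_kalman_output` is a hypothetical helper and the API does not ship this class.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class KalmanOutput:
    filtered_data: List[float]
    raw_state: List[float]
    smooth_state: List[float]
    input_data: List[List[float]]


def parse_kalman_output(text: str) -> KalmanOutput:
    """Decode a JSON response body into a KalmanOutput (raises KeyError on missing fields)."""
    payload = json.loads(text)
    return KalmanOutput(
        filtered_data=payload["filtered_data"],
        raw_state=payload["raw_state"],
        smooth_state=payload["smooth_state"],
        input_data=payload["input_data"],
    )


# The example response from this page, used as sample input.
response_text = """
{
  "filtered_data": [10.2, 10.35, 10.25, 10.02, 10.15, 10.08, 10.01, 10.22],
  "raw_state": [10.2, 10.35, 10.25, 10.02, 10.15, 10.08, 10.01, 10.22],
  "smooth_state": [10.2, 10.33, 10.27, 10.05, 10.13, 10.09, 10.03, 10.2],
  "input_data": [[10.2, 10.5, 10.1, 9.8, 10.3, 10.0, 9.9, 10.4]]
}
"""
output = parse_kalman_output(response_text)
```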

Kalman Filter Algorithm

The Luna API uses a Rauch-Tung-Striebel (RTS) smoother, which consists of two passes:

1. Forward Pass (Prediction + Update)

For each observation:

Predict: Estimate the next state based on the current state
Update: Correct the prediction using the actual observation

Output: raw_state - state estimates from the forward pass

2. Backward Pass (Smoothing)

After the forward pass, the algorithm runs backward through the data to refine estimates using future observations.

Output: smooth_state - refined state estimates

Model Parameters

The filter uses the following matrices (defined in modelling/constants.py):

F (State Transition Matrix): [[1]] - assumes state remains constant
H (Observation Matrix): 28x1 matrix mapping latent state to observations
Q (Process Noise Covariance): [[0.1001]] - system dynamics noise
R (Observation Noise Covariance): 28x28 matrix - measurement noise
x0 (Initial State): [[0]] - starting state estimate
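
To make the two passes concrete, here is a scalar sketch of a forward Kalman filter followed by an RTS backward pass. It is not the service's implementation: it uses F, Q, and x0 from the list above, but replaces the 28x1 H and 28x28 R with scalar stand-ins (`h = 1.0`, `r = 1.0`) and assumes an initial covariance `p0 = 1.0`, none of which come from this page.

```python
from typing import List, Sequence, Tuple


def kalman_rts_scalar(
    observations: Sequence[float],
    f: float = 1.0,     # F: state transition (from the parameter list)
    h: float = 1.0,     # assumption: scalar stand-in for the 28x1 H
    q: float = 0.1001,  # Q: process noise (from the parameter list)
    r: float = 1.0,     # assumption: scalar stand-in for the 28x28 R
    x0: float = 0.0,    # x0: initial state (from the parameter list)
    p0: float = 1.0,    # assumption: initial state covariance
) -> Tuple[List[float], List[float]]:
    """Return (forward-pass estimates, RTS-smoothed estimates)."""
    x_pred: List[float] = []
    p_pred: List[float] = []
    x_filt: List[float] = []
    p_filt: List[float] = []

    # Forward pass: predict, then update with each observation.
    x, p = x0, p0
    for z in observations:
        xp = f * x
        pp = f * p * f + q
        k = pp * h / (h * pp * h + r)  # Kalman gain
        x = xp + k * (z - h * xp)
        p = (1 - k * h) * pp * (1 - k * h) + k * r * k  # Joseph form
        x_pred.append(xp)
        p_pred.append(pp)
        x_filt.append(x)
        p_filt.append(p)

    # Backward pass: refine each estimate using the smoothed estimate after it.
    x_smooth = list(x_filt)
    for t in range(len(x_filt) - 2, -1, -1):
        c = p_filt[t] * f / p_pred[t + 1]  # smoother gain
        x_smooth[t] = x_filt[t] + c * (x_smooth[t + 1] - x_pred[t + 1])

    return x_filt, x_smooth
```

In this sketch the first list plays the role of raw_state and the second the role of smooth_state. The last element of both lists is always identical, because the final observation has no later data to smooth against.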

Use Cases

Use Case 1: Student Dropout Prediction on Premise

Scenario: Connect your student portal programmatically to predict student dropout.

curl -X POST "http://localhost:8000/api/v1/1/kalman" \
     -H "X-API-Key: your-api-key" \
     -H "Content-Type: application/json" \
     -d '{
       "results": [[22.1, 22.5, 22.3, 22.7, 22.4]],
       "save": false
     }'

Use Case 2: Weekly Data Accumulation

Scenario: Submit weekly data and process it cumulatively over time.

Week 1:

{
  "results": [[10.2, 10.5, 10.1, 9.8, 10.3]],
  "save": true,
  "unique_identifier": "user-survey-1"
}

Week 2:

{
  "results": [[10.0, 9.9, 10.4, 10.2, 10.1]],
  "save": true,
  "unique_identifier": "user-survey-1"
}

The second request will process all 10 observations (5 from Week 1 + 5 from Week 2).

Use Case 3: Device-Specific Tracking

Scenario: Track data from multiple devices separately.

Device A:

{
  "results": [[23.5, 23.7, 23.4]],
  "save": true,
  "unique_identifier": "device-A"
}

Device B:

{
  "results": [[18.2, 18.5, 18.3]],
  "save": true,
  "unique_identifier": "device-B"
}

Each device maintains its own data history.

Error Responses

400 Bad Request

Missing Identifier:

{
  "detail": "unique_identifier is required when save is True"
}

Invalid Data:

{
  "detail": "Input data must contain non-empty lists of observations"
}

403 Forbidden

Quota Exceeded:

{
  "detail": "Quota exceeded. Please upgrade your plan."
}

404 Not Found

Account Mismatch:

{
  "detail": "Account not found."
}

500 Internal Server Error

Processing Error:

{
  "detail": "Error processing Kalman filter: <error details>"
}
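
Client code can branch on these status codes and surface the `detail` message. The sketch below is one possible approach; `explain_error` and the hint wording are hypothetical and not part of the API.

```python
import json


def explain_error(status: int, body: str) -> str:
    """Combine the status code, the server's detail message, and a short hint."""
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object: fall back to the raw text.
        detail = body

    hints = {
        400: "fix the request payload",
        403: "check your quota or plan",
        404: "check the account_id in the URL",
        500: "server-side failure while filtering",
    }
    hint = hints.get(status, "unexpected status")
    return f"{status}: {detail} ({hint})"


message = explain_error(404, '{"detail": "Account not found."}')
# message == "404: Account not found. (check the account_id in the URL)"
```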

Data Storage

When save is true, data is stored in the data table with the following structure:

Column	Type	Description
`id`	Integer	Auto-generated primary key
`unique_identifier`	String	The identifier provided in the request
`data`	JSONB	The raw time series data (`results` array)
`account_id`	Integer	Foreign key to the account
`created_at`	Timestamp	When the data was saved

Data Retrieval

When processing with save=true, the service:

Saves the new data with the current timestamp
Queries all records matching the unique_identifier and account_id
Orders results by created_at (chronological order)
Flattens all data arrays into a single combined dataset
Applies filtering to the complete dataset
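
The retrieval steps can be illustrated with an in-memory stand-in for the data table. This is a sketch under stated assumptions: `InMemoryStore` is hypothetical, a simple counter replaces the created_at timestamp, and flattening is assumed to run row by row in stored order.

```python
from typing import Dict, List


class InMemoryStore:
    """Toy replacement for the data table, keyed by account and identifier."""

    def __init__(self) -> None:
        self._rows: List[Dict] = []
        self._clock = 0  # assumption: a counter stands in for created_at

    def save(self, account_id: int, unique_identifier: str, results: List[List[float]]) -> None:
        self._clock += 1
        self._rows.append(
            {
                "account_id": account_id,
                "unique_identifier": unique_identifier,
                "data": results,
                "created_at": self._clock,
            }
        )

    def combined(self, account_id: int, unique_identifier: str) -> List[float]:
        """Query matching rows, order by created_at, and flatten into one series."""
        matches = [
            row
            for row in self._rows
            if row["account_id"] == account_id
            and row["unique_identifier"] == unique_identifier
        ]
        matches.sort(key=lambda row: row["created_at"])
        return [value for row in matches for series in row["data"] for value in series]


store = InMemoryStore()
store.save(1, "user-survey-1", [[10.2, 10.5, 10.1, 9.8, 10.3]])  # Week 1
store.save(1, "device-B", [[18.2, 18.5, 18.3]])                  # unrelated identifier
store.save(1, "user-survey-1", [[10.0, 9.9, 10.4, 10.2, 10.1]])  # Week 2
history = store.combined(1, "user-survey-1")  # 10 observations, Week 1 first
```

Records saved under a different identifier or account never enter the combined series, which is what keeps each device or user history separate.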

Best Practices

1. Choose the Right Mode

Use stateless mode (save=false) for:
- One-time analysis
- Real-time processing without history
- Testing and debugging
Use stateful mode (save=true) for:
- Longitudinal studies
- Progressive data collection
- Multi-session tracking

2. Identifier Naming Conventions

Use descriptive, hierarchical identifiers:

sensor-{device_id}-{location}
user-{user_id}-{metric_type}
experiment-{exp_id}-week-{week_number}
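
Identifiers following these patterns can be generated consistently with a small helper. The function below is a hypothetical convenience, and replacing spaces with underscores is an assumption rather than a rule imposed by the API.

```python
def make_identifier(*parts: object) -> str:
    """Join the given parts with hyphens, e.g. ("sensor", "001", "lab 2")."""
    cleaned = [str(part).strip().replace(" ", "_") for part in parts]
    if not cleaned or not all(cleaned):
        raise ValueError("identifier parts must be non-empty")
    return "-".join(cleaned)


sensor_id = make_identifier("sensor", "001", "lab 2")       # "sensor-001-lab_2"
experiment_id = make_identifier("experiment", 7, "week", 3)  # "experiment-7-week-3"
```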

3. Data Quality

Ensure consistent sampling rates
Handle missing data before submission (or use NaN values, which the filter handles)
Validate data ranges to avoid extreme outliers that could destabilize the filter

4. Quota Management

Monitor your quota using GET /api/v1/account/quota
Each Kalman filter request consumes 1 quota unit, regardless of data size
Plan data submission frequency according to your quota allocation

Technical Details

Missing Value Handling

The Kalman filter automatically handles missing values (NaN):

If the first observation is missing, it initializes with a default value of 2
For subsequent missing values, it samples from the last observed state distribution
Missing values are imputed using the predicted state before updating
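
The sketch below illustrates the imputation idea in a scalar forward pass. It is a simplification, not the service's code: it replaces a missing observation with the predicted state (or with the default of 2 when the first observation is missing) and does not reproduce the sampling step described above. The scalar `r = 1.0` and `p0 = 1.0` are assumptions.

```python
import math
from typing import List, Sequence


def filter_with_missing(
    observations: Sequence[float],
    q: float = 0.1001,           # Q from the model parameters
    r: float = 1.0,              # assumption: scalar observation noise
    x0: float = 0.0,             # x0 from the model parameters
    p0: float = 1.0,             # assumption: initial covariance
    first_default: float = 2.0,  # default used when the first value is missing
) -> List[float]:
    """Scalar forward pass (F = 1, H = 1) that imputes NaN observations."""
    x, p = x0, p0
    estimates: List[float] = []
    for t, z in enumerate(observations):
        xp, pp = x, p + q  # predict
        if math.isnan(z):
            # Impute before updating: default for the first step, prediction afterwards.
            z = first_default if t == 0 else xp
        k = pp / (pp + r)
        x = xp + k * (z - xp)  # an imputed z equal to xp leaves the estimate unchanged
        p = (1 - k) * pp
        estimates.append(x)
    return estimates


estimates = filter_with_missing([10.0, float("nan"), 10.0])
```

Because the imputed value equals the prediction, the estimate at a missing step simply carries the previous estimate forward.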

Numerical Stability

The filter uses:

Joseph form covariance update for numerical stability
Matrix inversion via np.linalg.inv (ensure observations are well-conditioned)
Covariance matrices are maintained as positive definite throughout
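
The benefit of the Joseph form can be seen in a scalar comparison with the shorter covariance update. The function names and numeric values below are illustrative assumptions; the service itself works with matrices.

```python
def joseph_update(p_pred: float, k: float, h: float, r: float) -> float:
    """Joseph form: (1 - k h) P (1 - k h) + k R k, non-negative for any gain k."""
    return (1.0 - k * h) * p_pred * (1.0 - k * h) + k * r * k


def simple_update(p_pred: float, k: float, h: float) -> float:
    """Short form: (1 - k h) P, valid only when k is the optimal gain."""
    return (1.0 - k * h) * p_pred


p_pred, h, r = 1.1001, 1.0, 1.0           # illustrative values
k_opt = p_pred * h / (h * p_pred * h + r)  # optimal Kalman gain

# With the optimal gain the two forms agree up to rounding error.
difference = abs(joseph_update(p_pred, k_opt, h, r) - simple_update(p_pred, k_opt, h))

# With a perturbed gain the short form can go negative; the Joseph form cannot.
k_bad = 1.5
joseph_bad = joseph_update(p_pred, k_bad, h, r)
simple_bad = simple_update(p_pred, k_bad, h)
```

A negative variance is not meaningful, so keeping the update as a sum of squared terms is what protects the covariance when the gain is slightly off due to rounding.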

Performance Considerations

Stateless mode: Processing time is O(n) where n = number of observations
Stateful mode: Processing time is O(N) where N = total historical observations
Large cumulative datasets may increase processing time; quota consumption stays at 1 unit per request regardless of data size

Related Endpoints

Check Quota: GET /api/v1/account/quota - Monitor remaining API calls
Health Check: GET /api/v1/health - Verify API availability

Example Workflow

# 1. Check your quota
curl -X GET "http://localhost:8000/api/v1/account/quota" \
     -H "X-API-Key: your-api-key"

# 2. Submit initial data with identifier
curl -X POST "http://localhost:8000/api/v1/1/kalman" \
     -H "X-API-Key: your-api-key" \
     -H "Content-Type: application/json" \
     -d '{
       "results": [[10.2, 10.5, 10.1, 9.8, 10.3]],
       "save": true,
       "unique_identifier": "sensor-001"
     }'

# 3. Add more data later (cumulative processing)
curl -X POST "http://localhost:8000/api/v1/1/kalman" \
     -H "X-API-Key: your-api-key" \
     -H "Content-Type: application/json" \
     -d '{
       "results": [[10.0, 9.9, 10.4]],
       "save": true,
       "unique_identifier": "sensor-001"
     }'

# 4. Process different data without saving
curl -X POST "http://localhost:8000/api/v1/1/kalman" \
     -H "X-API-Key: your-api-key" \
     -H "Content-Type: application/json" \
     -d '{
       "results": [[15.2, 15.5, 15.1]],
       "save": false
     }'

Summary

The Kalman Filter model provides flexible time series processing with two distinct modes:

Feature	Stateless (`save=false`)	Stateful (`save=true`)
Identifier Required	No	Yes
Data Persistence	No	Yes
Processing Scope	Current request only	All historical data with same ID
Use Case	One-time filtering	Cumulative tracking
Database Impact	None	Stores data in `data` table

Choose the appropriate mode based on your application requirements and data workflow.
