Skip to content

KOTA Query Language (KQL) Design

Overview

KQL is designed to be a natural, intuitive query language that bridges human thought patterns and AI cognitive processes. Unlike SQL, which was designed for tabular data, KQL natively understands documents, relationships, time, and meaning.

Design Philosophy

  1. Natural Language First: Queries should read like thoughts
  2. Context-Aware: Implicit understanding of current context
  3. Temporal by Default: Time is always a consideration
  4. Relationship-Centric: Everything connects to everything
  5. AI-Native: Designed for LLM generation and interpretation

Query Types

1. Natural Language Queries

The primary interface is natural language, processed by an LLM-powered parser:

"What did I learn about rust last week?"
"Show me all meetings with Greg from Cogzia"
"Find documents similar to distributed cognition"
"What are my productivity patterns?"
"When was the last time I felt energized after a meeting?"

2. Structured Queries

For precise control and programmatic access:

// Find related documents
{
  type: "graph",
  start: "projects/kota-ai/README.md",
  follow: ["related", "references"],
  depth: 2,
  filter: {
    tags: { $contains: "architecture" }
  }
}

// Semantic search with filters
{
  type: "semantic",
  query: "consciousness implementation",
  threshold: 0.7,
  filter: {
    created: { $gte: "2025-01-01" },
    path: { $match: "*/consciousness/*" }
  },
  limit: 10
}

// Temporal aggregation
{
  type: "temporal",
  aggregate: "count",
  groupBy: "day",
  filter: {
    tags: { $contains: "meeting" }
  },
  range: "last_month"
}

3. Hybrid Queries

Combining natural language with structured precision:

"meetings with Greg" WHERE {
  participants: { $contains: "Greg" },
  duration: { $gte: "30m" }
} ORDER BY created DESC

Query Syntax

Basic Structure

[NATURAL_LANGUAGE] [WHERE CONDITIONS] [ORDER BY fields] [LIMIT n]

Natural Language Processing

The NLP parser extracts: - Intent: search, analyze, summarize, etc. - Entities: people, projects, topics, dates - Modifiers: recent, important, related to - Context: current document, time, previous queries

Structured Conditions

Comparison Operators

  • $eq: Equals
  • $ne: Not equals
  • $gt, $gte: Greater than (or equal)
  • $lt, $lte: Less than (or equal)
  • $in: In array
  • $contains: Contains substring/element
  • $match: Regex/glob pattern match

Logical Operators

  • $and: All conditions must match
  • $or: Any condition must match
  • $not: Negation
  • $exists: Field exists

Special Operators

  • $similar: Semantic similarity
  • $near: Temporal/spatial proximity
  • $related: Graph relationship exists
  • $matches_pattern: Behavioral pattern matching

Field References

Standard fields: - path: File path - title: Document title - content: Full text content - tags: Tag array - created, updated: Timestamps - frontmatter.*: Any frontmatter field

Computed fields: - relevance: Relevance score - distance: Semantic distance - depth: Graph traversal depth - age: Time since creation

Query Examples

1. Content Discovery

# Natural language
"rust programming tutorials"

# Structured equivalent
{
  type: "text",
  query: "rust programming tutorials",
  boost: {
    title: 2.0,
    tags: 1.5,
    content: 1.0
  }
}

# With filters
"rust tutorials" WHERE {
  created: { $gte: "2024-01-01" },
  tags: { $contains: ["programming", "rust"] }
}

2. Relationship Navigation

# Find all documents connected to a project
GRAPH {
  start: "projects/kota-ai",
  follow: ["related", "implements", "references"],
  depth: 3,
  return: ["path", "title", "relationship_type"]
}

# Find collaboration patterns
"documents edited with Charlie" GRAPH {
  edge_filter: {
    type: "co-edited",
    participant: "Charlie"
  }
}

3. Temporal Analysis

# Activity timeline
TIMELINE {
  range: "last_month",
  events: ["created", "updated"],
  groupBy: "day",
  include: ["meetings", "code_changes", "notes"]
}

# Productivity patterns
"When am I most productive?" ANALYZE {
  metric: "documents_created",
  correlate_with: ["time_of_day", "recovery_score", "previous_activity"],
  period: "last_3_months"
}

4. Semantic Exploration

# Find similar concepts
SIMILAR TO "distributed cognition" {
  threshold: 0.7,
  expand: true,  // Include related concepts
  limit: 20
}

# Concept clustering
CLUSTER {
  algorithm: "semantic",
  min_similarity: 0.6,
  max_clusters: 10
}

5. Complex Queries

# Multi-step analysis
PIPELINE [
  // Step 1: Find all meetings
  { 
    type: "text",
    query: "meeting",
    filter: { tags: { $contains: "meeting" } }
  },

  // Step 2: Extract participants
  {
    type: "extract",
    field: "participants",
    unique: true
  },

  // Step 3: Analyze collaboration frequency
  {
    type: "aggregate",
    groupBy: "participant",
    count: "meetings",
    average: "duration"
  }
]

# Pattern detection
DETECT PATTERN {
  name: "breakthrough_after_struggle",
  sequence: [
    { tags: { $contains: "challenge" }, sentiment: "negative" },
    { tags: { $contains: "solution" }, sentiment: "positive" },
  ],
  within: "1 week",
  min_occurrences: 3
}

Query Processing Pipeline

1. Natural Language Understanding

pub struct NLUParser {
    // LLM for intent extraction
    llm: Box<dyn LanguageModel>,

    // Entity recognition
    entity_extractor: EntityExtractor,

    // Temporal expression parser
    temporal_parser: TemporalParser,

    // Context manager
    context: QueryContext,
}

impl NLUParser {
    pub async fn parse(&self, query: &str) -> Result<ParsedQuery> {
        // 1. Extract intent and entities
        let intent = self.extract_intent(query).await?;
        let entities = self.extract_entities(query)?;

        // 2. Resolve temporal expressions
        let temporal = self.parse_temporal(query)?;

        // 3. Build structured query
        self.build_query(intent, entities, temporal)
    }
}

2. Query Optimization

pub struct QueryOptimizer {
    // Statistics for cost estimation
    stats: DatabaseStatistics,

    // Index availability
    indices: IndexCatalog,

    // Rewrite rules
    rules: Vec<RewriteRule>,
}

impl QueryOptimizer {
    pub fn optimize(&self, query: Query) -> OptimizedQuery {
        // 1. Apply rewrite rules
        let rewritten = self.apply_rules(query);

        // 2. Choose optimal indices
        let index_plan = self.select_indices(&rewritten);

        // 3. Generate execution plan
        self.generate_plan(rewritten, index_plan)
    }
}

3. Query Execution

pub struct QueryExecutor {
    // Storage engine
    storage: StorageEngine,

    // Index manager
    indices: IndexManager,

    // Cache for repeated queries
    cache: QueryCache,
}

impl QueryExecutor {
    pub async fn execute(&self, plan: ExecutionPlan) -> QueryResult {
        // Check cache first
        if let Some(cached) = self.cache.get(&plan) {
            return cached;
        }

        // Execute plan steps
        let result = self.execute_plan(plan).await?;

        // Cache results
        self.cache.put(&plan, &result);

        result
    }
}

Context-Aware Features

1. Pronoun Resolution

"What did we discuss?" 
// Resolves 'we' based on current document participants

"Show me more like this"
// 'this' refers to currently viewed document

2. Temporal Context

"What happened next?"
// Continues from previous query time range

"Earlier meetings"
// Relative to last query results

3. Implicit Filters

// In consciousness session context
"recent insights"
// Automatically filters to consciousness-generated content

// In project context
"related issues"
// Scoped to current project

Query Result Types

1. Document Results

pub struct DocumentResult {
    // Core document data
    pub id: DocumentId,
    pub path: String,
    pub title: String,

    // Relevance and scoring
    pub score: f32,
    pub highlights: Vec<Highlight>,

    // Context
    pub breadcrumbs: Vec<String>,
    pub related: Vec<DocumentId>,
}

2. Graph Results

pub struct GraphResult {
    // Nodes
    pub nodes: Vec<Node>,

    // Edges
    pub edges: Vec<Edge>,

    // Traversal metadata
    pub paths: Vec<Path>,
    pub depths: HashMap<NodeId, u32>,
}

3. Analytical Results

pub struct AnalyticalResult {
    // Aggregations
    pub aggregates: HashMap<String, Value>,

    // Time series
    pub series: Option<TimeSeries>,

    // Statistics
    pub stats: Statistics,

    // Insights (LLM-generated)
    pub insights: Vec<Insight>,
}

Advanced Features

1. Query Macros

Define reusable query patterns:

DEFINE MACRO weekly_review AS {
  PIPELINE [
    { type: "temporal", range: "last_week" },
    { type: "aggregate", by: "day", count: "activities" },
    { type: "analyze", generate: "insights" }
  ]
}

// Use macro
EXECUTE weekly_review WHERE { tags: { $contains: "work" } }

2. Continuous Queries

Subscribe to ongoing results:

SUBSCRIBE TO "new insights" {
  filter: {
    type: "consciousness_session",
    created: { $gte: "now" }
  },
  notify: "webhook://localhost:8080/insights"
}

3. Query Learning

System learns from usage patterns:

pub struct QueryLearner {
    // Track query patterns
    query_history: Vec<QueryRecord>,

    // Learn common refinements
    refinement_patterns: HashMap<QueryPattern, Vec<Refinement>>,

    // Suggest improvements
    suggestion_engine: SuggestionEngine,
}

Integration with KOTA

1. Consciousness Queries

# Find patterns in consciousness sessions
CONSCIOUSNESS {
  analyze: "themes",
  period: "last_month",
  min_frequency: 3
}

# Track insight evolution
CONSCIOUSNESS EVOLUTION {
  concept: "distributed cognition",
  show: ["first_mention", "developments", "current_understanding"]
}

2. Health Correlations

# Correlate productivity with health
CORRELATE {
  metric1: "documents_created",
  metric2: "whoop.recovery_score",
  period: "last_3_months",
  lag: [0, 1, 2]  // days
}

3. Project Intelligence

# Project health check
PROJECT "kota-ai" ANALYZE {
  metrics: ["velocity", "complexity", "technical_debt"],
  compare_to: "baseline",
  suggest: "improvements"
}

Error Handling

Query Errors

{
  error: {
    type: "PARSE_ERROR",
    message: "Unexpected token 'WHER' - did you mean 'WHERE'?",
    position: 45,
    suggestion: "WHERE"
  }
}

Graceful Degradation

{
  warning: "Semantic index unavailable, falling back to text search",
  results: [...],  // Still returns results
  suggestions: ["Try again later for semantic results"]
}

Performance Considerations

1. Query Complexity Limits

[limits]
max_depth = 5           # Graph traversal
max_results = 10000     # Result set size
max_duration = 5000     # Query timeout (ms)
max_memory = 100        # Memory limit (MB)

2. Query Hints

"complex analysis" HINTS {
  use_index: "semantic",
  parallel: true,
  cache: false
}

Future Extensions

1. Multi-Modal Queries

"Find screenshots similar to [image]"
"Documents discussed in [audio_file]"

2. Federated Queries

FEDERATE {
  sources: ["local", "github", "google_drive"],
  query: "project documentation",
  merge_by: "similarity"
}

3. Predictive Queries

PREDICT {
  what: "next_document_needed",
  based_on: "current_context",
  confidence: 0.8
}

Conclusion

KQL is designed to grow with KOTA's cognitive capabilities. It bridges natural human expression with precise data operations, enabling true distributed cognition. The language will evolve based on usage patterns, becoming more intuitive and powerful over time.

The key innovation is treating queries not as database operations, but as cognitive requests - allowing KOTA to understand not just what you're looking for, but why you're looking for it.