# KOTA Query Language (KQL) Design

## Overview
KQL is designed to be a natural, intuitive query language that bridges human thought patterns and AI cognitive processes. Unlike SQL, which was designed for tabular data, KQL natively understands documents, relationships, time, and meaning.
## Design Philosophy
- Natural Language First: Queries should read like thoughts
- Context-Aware: Implicit understanding of current context
- Temporal by Default: Time is always a consideration
- Relationship-Centric: Everything connects to everything
- AI-Native: Designed for LLM generation and interpretation
## Query Types

### 1. Natural Language Queries

The primary interface is natural language, processed by an LLM-powered parser:

    "What did I learn about rust last week?"
    "Show me all meetings with Greg from Cogzia"
    "Find documents similar to distributed cognition"
    "What are my productivity patterns?"
    "When was the last time I felt energized after a meeting?"
### 2. Structured Queries

For precise control and programmatic access:

    // Find related documents
    {
      type: "graph",
      start: "projects/kota-ai/README.md",
      follow: ["related", "references"],
      depth: 2,
      filter: {
        tags: { $contains: "architecture" }
      }
    }

    // Semantic search with filters
    {
      type: "semantic",
      query: "consciousness implementation",
      threshold: 0.7,
      filter: {
        created: { $gte: "2025-01-01" },
        path: { $match: "*/consciousness/*" }
      },
      limit: 10
    }

    // Temporal aggregation
    {
      type: "temporal",
      aggregate: "count",
      groupBy: "day",
      filter: {
        tags: { $contains: "meeting" }
      },
      range: "last_month"
    }
### 3. Hybrid Queries

Combining natural language with structured precision:

    "meetings with Greg" WHERE {
      participants: { $contains: "Greg" },
      duration: { $gte: "30m" }
    } ORDER BY created DESC
## Query Syntax

### Basic Structure

### Natural Language Processing

The NLP parser extracts:

- Intent: search, analyze, summarize, etc.
- Entities: people, projects, topics, dates
- Modifiers: recent, important, related to
- Context: current document, time, previous queries
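As a concrete example, a query such as "What did I learn about rust last week?" might decompose into a structure like the one below. This is a minimal sketch: the `ParsedQuery` shape, the `Intent` variants, and the field names are assumptions for illustration, not KOTA's actual types.

    /// Hypothetical output of the natural-language parser.
    #[derive(Debug)]
    pub struct ParsedQuery {
        pub intent: Intent,
        pub entities: Vec<String>,
        pub modifiers: Vec<String>,
        pub temporal: Option<String>,
    }

    #[derive(Debug)]
    pub enum Intent {
        Search,
        Analyze,
        Summarize,
    }

    fn example() -> ParsedQuery {
        // "What did I learn about rust last week?"
        ParsedQuery {
            intent: Intent::Search,
            entities: vec!["rust".to_string()],
            modifiers: vec!["learned".to_string()],
            temporal: Some("last week".to_string()),
        }
    }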
### Structured Conditions

#### Comparison Operators

- `$eq`: Equals
- `$ne`: Not equals
- `$gt`, `$gte`: Greater than (or equal)
- `$lt`, `$lte`: Less than (or equal)
- `$in`: In array
- `$contains`: Contains substring/element
- `$match`: Regex/glob pattern match

#### Logical Operators

- `$and`: All conditions must match
- `$or`: Any condition must match
- `$not`: Negation
- `$exists`: Field exists

#### Special Operators

- `$similar`: Semantic similarity
- `$near`: Temporal/spatial proximity
- `$related`: Graph relationship exists
- `$matches_pattern`: Behavioral pattern matching

### Field References

Standard fields:

- `path`: File path
- `title`: Document title
- `content`: Full text content
- `tags`: Tag array
- `created`, `updated`: Timestamps
- `frontmatter.*`: Any frontmatter field

Computed fields:

- `relevance`: Relevance score
- `distance`: Semantic distance
- `depth`: Graph traversal depth
- `age`: Time since creation
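To make the operator and field semantics concrete, here is a minimal sketch of how a structured filter combining comparison, logical, and graph-related fields might be built programmatically. It assumes a JSON-style in-memory representation via the `serde_json` crate; the specific filter is hypothetical.

    use serde_json::json;

    fn main() {
        // Hypothetical filter: documents tagged "rust", created on or after
        // 2025-01-01, and no deeper than two hops in the graph traversal.
        let filter = json!({
            "$and": [
                { "tags": { "$contains": "rust" } },
                { "created": { "$gte": "2025-01-01" } },
                { "$not": { "depth": { "$gt": 2 } } }
            ]
        });

        println!("{}", serde_json::to_string_pretty(&filter).unwrap());
    }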
## Query Examples

### 1. Content Discovery

    # Natural language
    "rust programming tutorials"

    # Structured equivalent
    {
      type: "text",
      query: "rust programming tutorials",
      boost: {
        title: 2.0,
        tags: 1.5,
        content: 1.0
      }
    }

    # With filters
    "rust tutorials" WHERE {
      created: { $gte: "2024-01-01" },
      tags: { $contains: ["programming", "rust"] }
    }
### 2. Relationship Navigation

    # Find all documents connected to a project
    GRAPH {
      start: "projects/kota-ai",
      follow: ["related", "implements", "references"],
      depth: 3,
      return: ["path", "title", "relationship_type"]
    }

    # Find collaboration patterns
    "documents edited with Charlie" GRAPH {
      edge_filter: {
        type: "co-edited",
        participant: "Charlie"
      }
    }
### 3. Temporal Analysis

    # Activity timeline
    TIMELINE {
      range: "last_month",
      events: ["created", "updated"],
      groupBy: "day",
      include: ["meetings", "code_changes", "notes"]
    }

    # Productivity patterns
    "When am I most productive?" ANALYZE {
      metric: "documents_created",
      correlate_with: ["time_of_day", "recovery_score", "previous_activity"],
      period: "last_3_months"
    }
### 4. Semantic Exploration

    # Find similar concepts
    SIMILAR TO "distributed cognition" {
      threshold: 0.7,
      expand: true,  // Include related concepts
      limit: 20
    }

    # Concept clustering
    CLUSTER {
      algorithm: "semantic",
      min_similarity: 0.6,
      max_clusters: 10
    }
### 5. Complex Queries

    # Multi-step analysis
    PIPELINE [
      // Step 1: Find all meetings
      {
        type: "text",
        query: "meeting",
        filter: { tags: { $contains: "meeting" } }
      },
      // Step 2: Extract participants
      {
        type: "extract",
        field: "participants",
        unique: true
      },
      // Step 3: Analyze collaboration frequency
      {
        type: "aggregate",
        groupBy: "participant",
        count: "meetings",
        average: "duration"
      }
    ]

    # Pattern detection
    DETECT PATTERN {
      name: "breakthrough_after_struggle",
      sequence: [
        { tags: { $contains: "challenge" }, sentiment: "negative" },
        { tags: { $contains: "solution" }, sentiment: "positive" }
      ],
      within: "1 week",
      min_occurrences: 3
    }
## Query Processing Pipeline

### 1. Natural Language Understanding

    pub struct NLUParser {
        // LLM for intent extraction
        llm: Box<dyn LanguageModel>,
        // Entity recognition
        entity_extractor: EntityExtractor,
        // Temporal expression parser
        temporal_parser: TemporalParser,
        // Context manager
        context: QueryContext,
    }

    impl NLUParser {
        pub async fn parse(&self, query: &str) -> Result<ParsedQuery> {
            // 1. Extract intent and entities
            let intent = self.extract_intent(query).await?;
            let entities = self.extract_entities(query)?;

            // 2. Resolve temporal expressions
            let temporal = self.parse_temporal(query)?;

            // 3. Build structured query
            self.build_query(intent, entities, temporal)
        }
    }
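One concrete piece of the parse step is turning relative expressions such as "last week" into absolute ranges. Below is a minimal sketch of that resolution, assuming the `chrono` crate and a hypothetical `TimeRange` type; real parsing would handle far more phrasings, calendar boundaries, and time zones.

    use chrono::{DateTime, Duration, Utc};

    /// Hypothetical resolved range produced by the temporal parser.
    pub struct TimeRange {
        pub start: DateTime<Utc>,
        pub end: DateTime<Utc>,
    }

    /// Resolve a handful of relative expressions into absolute ranges.
    pub fn resolve_relative(expr: &str, now: DateTime<Utc>) -> Option<TimeRange> {
        match expr {
            "last week" => Some(TimeRange { start: now - Duration::weeks(1), end: now }),
            "last_month" | "last month" => Some(TimeRange { start: now - Duration::days(30), end: now }),
            "last_3_months" => Some(TimeRange { start: now - Duration::days(90), end: now }),
            _ => None,
        }
    }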
### 2. Query Optimization

    pub struct QueryOptimizer {
        // Statistics for cost estimation
        stats: DatabaseStatistics,
        // Index availability
        indices: IndexCatalog,
        // Rewrite rules
        rules: Vec<RewriteRule>,
    }

    impl QueryOptimizer {
        pub fn optimize(&self, query: Query) -> OptimizedQuery {
            // 1. Apply rewrite rules
            let rewritten = self.apply_rules(query);

            // 2. Choose optimal indices
            let index_plan = self.select_indices(&rewritten);

            // 3. Generate execution plan
            self.generate_plan(rewritten, index_plan)
        }
    }
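To illustrate what a rewrite rule might look like, here is a minimal sketch under the assumption that rules are query-to-query transformations. The simplified `Query` enum and the example rule (collapsing a single-clause `$and` into its inner condition) are hypothetical, not KOTA's actual AST.

    /// Hypothetical simplified query AST, for illustration only.
    #[derive(Clone, Debug, PartialEq)]
    pub enum Query {
        And(Vec<Query>),
        TagContains(String),
        Text(String),
    }

    /// A rewrite rule maps a query to an equivalent, hopefully cheaper, query.
    pub trait RewriteRule {
        fn rewrite(&self, query: Query) -> Query;
    }

    /// Collapse `And` nodes that contain a single clause.
    pub struct CollapseSingleAnd;

    impl RewriteRule for CollapseSingleAnd {
        fn rewrite(&self, query: Query) -> Query {
            match query {
                Query::And(mut clauses) => {
                    if clauses.len() == 1 {
                        clauses.remove(0)
                    } else {
                        Query::And(clauses)
                    }
                }
                other => other,
            }
        }
    }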
### 3. Query Execution

    pub struct QueryExecutor {
        // Storage engine
        storage: StorageEngine,
        // Index manager
        indices: IndexManager,
        // Cache for repeated queries
        cache: QueryCache,
    }

    impl QueryExecutor {
        pub async fn execute(&self, plan: ExecutionPlan) -> Result<QueryResult> {
            // Check cache first
            if let Some(cached) = self.cache.get(&plan) {
                return Ok(cached);
            }

            // Execute plan steps
            let result = self.execute_plan(&plan).await?;

            // Cache results
            self.cache.put(&plan, &result);
            Ok(result)
        }
    }
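A minimal sketch of what the cache above might look like, assuming plans are hashable and results are cheap to clone. Everything here, including the names, is illustrative rather than KOTA's implementation, and eviction is deliberately omitted.

    use std::collections::HashMap;
    use std::hash::Hash;

    /// Toy query cache keyed by the execution plan itself.
    pub struct QueryCache<P: Eq + Hash + Clone, R: Clone> {
        entries: HashMap<P, R>,
    }

    impl<P: Eq + Hash + Clone, R: Clone> QueryCache<P, R> {
        pub fn new() -> Self {
            Self { entries: HashMap::new() }
        }

        pub fn get(&self, plan: &P) -> Option<R> {
            self.entries.get(plan).cloned()
        }

        pub fn put(&mut self, plan: &P, result: &R) {
            self.entries.insert(plan.clone(), result.clone());
        }
    }

In practice the cache would also need interior mutability (for example a `Mutex` or `RwLock`) to be usable from `&self` as in the executor above.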
## Context-Aware Features

### 1. Pronoun Resolution

    "What did we discuss?"
    // Resolves 'we' based on current document participants

    "Show me more like this"
    // 'this' refers to currently viewed document

### 2. Temporal Context

    "What happened next?"
    // Continues from previous query time range

    "Earlier meetings"
    // Relative to last query results

### 3. Implicit Filters

    // In consciousness session context
    "recent insights"
    // Automatically filters to consciousness-generated content

    // In project context
    "related issues"
    // Scoped to current project
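These behaviours all depend on the state the parser carries between queries. Below is a minimal sketch of what that context might hold; every field name here is an assumption for illustration rather than KOTA's actual `QueryContext`.

    use chrono::{DateTime, Utc};

    /// Hypothetical context threaded through query parsing.
    pub struct QueryContext {
        /// Document currently open, used to resolve "this".
        pub current_document: Option<String>,
        /// Participants of the current document, used to resolve "we".
        pub participants: Vec<String>,
        /// Time range of the previous query, used for "next" / "earlier".
        pub last_range: Option<(DateTime<Utc>, DateTime<Utc>)>,
        /// Active scope (project or consciousness session) for implicit filters.
        pub scope: Option<String>,
    }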
## Query Result Types

### 1. Document Results

    pub struct DocumentResult {
        // Core document data
        pub id: DocumentId,
        pub path: String,
        pub title: String,
        // Relevance and scoring
        pub score: f32,
        pub highlights: Vec<Highlight>,
        // Context
        pub breadcrumbs: Vec<String>,
        pub related: Vec<DocumentId>,
    }

### 2. Graph Results

    pub struct GraphResult {
        // Nodes
        pub nodes: Vec<Node>,
        // Edges
        pub edges: Vec<Edge>,
        // Traversal metadata
        pub paths: Vec<Path>,
        pub depths: HashMap<NodeId, u32>,
    }

### 3. Analytical Results

    pub struct AnalyticalResult {
        // Aggregations
        pub aggregates: HashMap<String, Value>,
        // Time series
        pub series: Option<TimeSeries>,
        // Statistics
        pub stats: Statistics,
        // Insights (LLM-generated)
        pub insights: Vec<Insight>,
    }
## Advanced Features

### 1. Query Macros

Define reusable query patterns:

    DEFINE MACRO weekly_review AS {
      PIPELINE [
        { type: "temporal", range: "last_week" },
        { type: "aggregate", by: "day", count: "activities" },
        { type: "analyze", generate: "insights" }
      ]
    }

    // Use the macro
    EXECUTE weekly_review WHERE { tags: { $contains: "work" } }
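One plausible way to back this feature is a registry that stores macro bodies and splices a caller-supplied `WHERE` filter in at execution time. The sketch below is an assumption about that mechanism, not KOTA's actual design; `StructuredQuery` is a stand-in type.

    use std::collections::HashMap;

    /// Stand-in for a parsed structured query.
    #[derive(Clone, Debug)]
    pub struct StructuredQuery {
        pub body: String,
        pub filter: Option<String>,
    }

    /// Registry of named, reusable query templates.
    #[derive(Default)]
    pub struct MacroRegistry {
        macros: HashMap<String, StructuredQuery>,
    }

    impl MacroRegistry {
        pub fn define(&mut self, name: &str, template: StructuredQuery) {
            self.macros.insert(name.to_string(), template);
        }

        /// Expand a macro, attaching the caller's WHERE filter if given.
        pub fn execute(&self, name: &str, filter: Option<String>) -> Option<StructuredQuery> {
            self.macros.get(name).map(|template| StructuredQuery {
                body: template.body.clone(),
                filter: filter.or_else(|| template.filter.clone()),
            })
        }
    }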
### 2. Continuous Queries

Subscribe to ongoing results:

    SUBSCRIBE TO "new insights" {
      filter: {
        type: "consciousness_session",
        created: { $gte: "now" }
      },
      notify: "webhook://localhost:8080/insights"
    }
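Under the hood this amounts to evaluating each subscription's filter against newly indexed documents and pushing matches to the configured sink. A minimal, synchronous sketch of that loop follows; all types and the `matches` predicate are illustrative assumptions, and a real system would deliver to the `notify` target asynchronously rather than printing.

    /// Stand-in for a newly indexed document.
    pub struct Document {
        pub doc_type: String,
        pub title: String,
    }

    /// A registered continuous query.
    pub struct Subscription {
        pub name: String,
        /// Predicate compiled from the subscription's filter.
        pub matches: Box<dyn Fn(&Document) -> bool>,
        /// Where matches are delivered (webhook URL, channel, ...).
        pub notify: String,
    }

    /// Fan a batch of new documents out to matching subscriptions.
    pub fn dispatch(subscriptions: &[Subscription], new_docs: &[Document]) {
        for sub in subscriptions {
            for doc in new_docs {
                if (sub.matches)(doc) {
                    println!("notify {} -> {}: {}", sub.name, sub.notify, doc.title);
                }
            }
        }
    }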
### 3. Query Learning

The system learns from usage patterns:

    pub struct QueryLearner {
        // Track query patterns
        query_history: Vec<QueryRecord>,
        // Learn common refinements
        refinement_patterns: HashMap<QueryPattern, Vec<Refinement>>,
        // Suggest improvements
        suggestion_engine: SuggestionEngine,
    }
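For a sense of what one history entry might capture so that refinement patterns can be mined later, here is a sketch of a `QueryRecord`; the field names are guesses for illustration, not the actual schema.

    use chrono::{DateTime, Utc};

    /// Hypothetical record of one query and how the user refined it.
    pub struct QueryRecord {
        pub issued_at: DateTime<Utc>,
        /// The query as originally phrased.
        pub original: String,
        /// Follow-up query the user issued to narrow or correct the results.
        pub refinement: Option<String>,
        /// Whether the user opened any result (a crude success signal).
        pub result_clicked: bool,
    }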
## Integration with KOTA

### 1. Consciousness Queries

    # Find patterns in consciousness sessions
    CONSCIOUSNESS {
      analyze: "themes",
      period: "last_month",
      min_frequency: 3
    }

    # Track insight evolution
    CONSCIOUSNESS EVOLUTION {
      concept: "distributed cognition",
      show: ["first_mention", "developments", "current_understanding"]
    }
### 2. Health Correlations

    # Correlate productivity with health
    CORRELATE {
      metric1: "documents_created",
      metric2: "whoop.recovery_score",
      period: "last_3_months",
      lag: [0, 1, 2]  // days
    }
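The `lag` parameter shifts one series against the other before correlating, so that yesterday's recovery score can be compared with today's output. Below is a minimal sketch of lagged Pearson correlation over two equally spaced daily series; the statistic itself is standard, but treating it as KOTA's exact computation is an assumption.

    /// Pearson correlation between `x[i]` and `y[i + lag]` over the overlapping window.
    /// Returns None if the overlap is too short or a series has zero variance.
    pub fn lagged_pearson(x: &[f64], y: &[f64], lag: usize) -> Option<f64> {
        let n = x.len().min(y.len().saturating_sub(lag));
        if n < 2 {
            return None;
        }
        let xs = &x[..n];
        let ys = &y[lag..lag + n];

        let mean = |v: &[f64]| v.iter().sum::<f64>() / v.len() as f64;
        let (mx, my) = (mean(xs), mean(ys));

        let mut cov = 0.0;
        let mut var_x = 0.0;
        let mut var_y = 0.0;
        for i in 0..n {
            let (dx, dy) = (xs[i] - mx, ys[i] - my);
            cov += dx * dy;
            var_x += dx * dx;
            var_y += dy * dy;
        }
        if var_x == 0.0 || var_y == 0.0 {
            return None;
        }
        Some(cov / (var_x.sqrt() * var_y.sqrt()))
    }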
### 3. Project Intelligence

    # Project health check
    PROJECT "kota-ai" ANALYZE {
      metrics: ["velocity", "complexity", "technical_debt"],
      compare_to: "baseline",
      suggest: "improvements"
    }
## Error Handling

### Query Errors

    {
      error: {
        type: "PARSE_ERROR",
        message: "Unexpected token 'WHER' - did you mean 'WHERE'?",
        position: 45,
        suggestion: "WHERE"
      }
    }

### Graceful Degradation

    {
      warning: "Semantic index unavailable, falling back to text search",
      results: [...],  // Still returns results
      suggestions: ["Try again later for semantic results"]
    }
## Performance Considerations

### 1. Query Complexity Limits

    [limits]
    max_depth = 5        # Graph traversal
    max_results = 10000  # Result set size
    max_duration = 5000  # Query timeout (ms)
    max_memory = 100     # Memory limit (MB)
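These limits would naturally be loaded once at startup. A small sketch of deserializing them, assuming the `serde` and `toml` crates; the config path is hypothetical.

    use serde::Deserialize;

    #[derive(Debug, Deserialize)]
    struct Limits {
        max_depth: u32,
        max_results: u64,
        max_duration: u64,
        max_memory: u64,
    }

    #[derive(Debug, Deserialize)]
    struct QueryConfig {
        limits: Limits,
    }

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Hypothetical config path; adjust to wherever the KQL limits actually live.
        let raw = std::fs::read_to_string("kql.toml")?;
        let config: QueryConfig = toml::from_str(&raw)?;
        println!("graph traversal capped at depth {}", config.limits.max_depth);
        Ok(())
    }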
### 2. Query Hints
## Future Extensions

### 1. Multi-Modal Queries

### 2. Federated Queries

    FEDERATE {
      sources: ["local", "github", "google_drive"],
      query: "project documentation",
      merge_by: "similarity"
    }
### 3. Predictive Queries

## Conclusion
KQL is designed to grow with KOTA's cognitive capabilities. It bridges natural human expression with precise data operations, enabling true distributed cognition. The language will evolve based on usage patterns, becoming more intuitive and powerful over time.
The key innovation is treating queries not as database operations but as cognitive requests, allowing KOTA to understand not just what you're looking for, but why you're looking for it.