GFQL DateTime Filtering Examples#

This notebook shows how to filter graph data by dates and times using GFQL predicates.

Table of Contents#

Key Temporal Filtering Concepts:

  1. Basic DateTime Filtering - Filter by specific dates and times

  2. Date-Only Filtering - Ignore time components

  3. Time-of-Day Filtering - Filter by time patterns

  4. Complex Temporal Queries - Combine with other predicates

  5. Temporal Value Classes - Explicit temporal objects

  6. Timezone-Aware Filtering - Handle timezone conversions

  7. Chain Operations - Multi-hop temporal queries

  8. Wire Protocol Dicts - JSON-compatible configuration

Quick Reference:

  - gt(), lt(), ge(), le(): greater-than and less-than comparisons

  - between(): range queries

  - is_in(): match specific values

  - DateTimeValue, DateValue, TimeValue: explicit temporal types

[ ]:
# Standard Python datetime imports
import pandas as pd
import numpy as np
from datetime import datetime, date, time, timedelta
import pytz

# Graphistry imports
import graphistry
from graphistry import n, e_forward, e_reverse, e_undirected

# Temporal predicates
from graphistry.compute import (
    gt, lt, ge, le, eq, ne, between, is_in,
    DateTimeValue, DateValue, TimeValue
)

Setup: Create Sample Data#

Let’s create a sample dataset representing a transaction network with temporal data.

[2]:
# Generate sample transaction data
np.random.seed(42)

# Create nodes (accounts)
n_accounts = 100
accounts_df = pd.DataFrame({
    'account_id': [f'ACC_{i:04d}' for i in range(n_accounts)],
    'account_type': np.random.choice(['checking', 'savings', 'business'], n_accounts),
    'created_date': pd.date_range('2020-01-01', periods=n_accounts, freq='W'),
    'last_active': pd.date_range('2023-01-01', periods=n_accounts, freq='D') +
                   pd.to_timedelta(np.random.randint(0, 365, n_accounts), unit='D')
})

# Create edges (transactions)
n_transactions = 500
transactions_df = pd.DataFrame({
    'transaction_id': [f'TXN_{i:06d}' for i in range(n_transactions)],
    'source': np.random.choice(accounts_df['account_id'], n_transactions),
    'target': np.random.choice(accounts_df['account_id'], n_transactions),
    'amount': np.random.exponential(100, n_transactions).round(2),
    'timestamp': pd.date_range('2023-01-01', periods=n_transactions, freq='H') +
                 pd.to_timedelta(np.random.randint(0, 8760, n_transactions), unit='H'),
    'transaction_time': [time(np.random.randint(0, 24), np.random.randint(0, 60))
                        for _ in range(n_transactions)],
    'transaction_type': np.random.choice(['transfer', 'payment', 'deposit'], n_transactions)
})

print(f"Created {len(accounts_df)} accounts and {len(transactions_df)} transactions")
print(f"\nTransaction date range: {transactions_df['timestamp'].min()} to {transactions_df['timestamp'].max()}")
[3]:
# Create graphistry instance
g = graphistry.edges(transactions_df, 'source', 'target').nodes(accounts_df, 'account_id')
print(f"Graph: {len(g._nodes)} nodes, {len(g._edges)} edges")

1. Basic DateTime Filtering#

Filter transactions based on datetime values using edge predicates.

[4]:
# Filter transactions after a specific date
# First, filter the edges directly
cutoff_date = datetime(2023, 7, 1)
recent_edges = g._edges[gt(pd.Timestamp(cutoff_date))(g._edges['timestamp'])]
recent_g = g.edges(recent_edges)

print(f"Transactions after {cutoff_date}: {len(recent_g._edges)}")
recent_g._edges[['transaction_id', 'timestamp', 'amount']].head()
[5]:
# Alternative: Use chain with edge operations
# Start from all nodes, then follow edges with temporal filter
recent_chain = g.chain([
    n(),  # Start with all nodes
    e_forward({
        "timestamp": gt(pd.Timestamp(cutoff_date))
    })
])

print(f"Transactions after {cutoff_date} (chain): {len(recent_chain._edges)}")
[6]:
# Filter transactions in a specific month
march_edges = g._edges[
    between(
        datetime(2023, 3, 1),
        datetime(2023, 3, 31, 23, 59, 59)
    )(g._edges['timestamp'])
]
march_g = g.edges(march_edges)

print(f"Transactions in March 2023: {len(march_g._edges)}")
march_g._edges[['transaction_id', 'timestamp', 'amount']].head()
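
The March filter above ends at 23:59:59, so a timestamp with a fractional second inside the last second of the month would fall outside the range. The following is a minimal sketch of a half-open alternative in plain pandas, assuming a datetime column like `timestamp`. The variable names are illustrative, and the sketch uses pandas comparisons rather than GFQL predicates.

```python
import pandas as pd

# Illustrative timestamps around the boundaries of March 2023
timestamps = pd.Series([
    pd.Timestamp("2023-02-28 23:59:59"),
    pd.Timestamp("2023-03-01 00:00:00"),
    pd.Timestamp("2023-03-31 23:59:59"),
    pd.Timestamp("2023-03-31 23:59:59.500"),  # still March, but after 23:59:59
    pd.Timestamp("2023-04-01 00:00:00"),
])

start = pd.Timestamp("2023-03-01")
next_month = start + pd.DateOffset(months=1)  # 2023-04-01 00:00:00

# Closed range ending at 23:59:59, mirroring the cell above
closed_mask = timestamps.between(start, pd.Timestamp("2023-03-31 23:59:59"))

# Half-open range [start, next_month): covers every instant in March
half_open_mask = (timestamps >= start) & (timestamps < next_month)

print(int(closed_mask.sum()), int(half_open_mask.sum()))  # 2 3
```

The half-open form also avoids having to know the number of days in the month.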

2. Date-Only Filtering#

Filter nodes based on dates, ignoring time components.

[7]:
# Filter accounts created on or after a specific date (ge is inclusive)
new_accounts = g.chain([
    n(filter_dict={
        "created_date": ge(date(2021, 1, 1))
    })
])

print(f"Accounts created on or after 2021-01-01: {len(new_accounts._nodes)}")
new_accounts._nodes[['account_id', 'created_date', 'account_type']].head()
[8]:
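# Find accounts active in the last 90 days
# Note: the cutoff is relative to today; the sample last_active values end in 2024,
# so this filter can come back empty when the notebook is run later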
ninety_days_ago = datetime.now().date() - timedelta(days=90)
active_accounts = g.chain([
    n(filter_dict={
        "last_active": gt(pd.Timestamp(ninety_days_ago))
    })
])

print(f"Recently active accounts: {len(active_accounts._nodes)}")
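
Two details matter when filtering by day. A column that carries a time component has to be truncated before a day-level comparison, and a cutoff derived from `datetime.now()` changes on every run. The following is a minimal sketch in plain pandas, assuming a fixed reference date chosen only for illustration. It does not use GFQL predicates.

```python
import pandas as pd

# Illustrative last-activity timestamps with time components
last_active = pd.Series([
    pd.Timestamp("2023-10-01 23:15:00"),
    pd.Timestamp("2023-10-02 00:00:00"),
    pd.Timestamp("2023-10-02 18:45:00"),
    pd.Timestamp("2023-12-31 08:00:00"),
])

# Fixed reference date instead of the wall clock (an assumption for reproducibility)
reference = pd.Timestamp("2023-12-31")
cutoff = reference - pd.Timedelta(days=90)  # 2023-10-02

# Truncate each timestamp to midnight so only the calendar day is compared
last_active_day = last_active.dt.normalize()

active_mask = last_active_day >= cutoff
same_day_mask = last_active_day == pd.Timestamp("2023-10-02")

print(active_mask.tolist())      # [False, True, True, True]
print(int(same_day_mask.sum()))  # 2
```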

3. Time-of-Day Filtering#

Filter transactions based on time of day.

[9]:
# Find transactions during business hours (9 AM - 5 PM)
business_hours_edges = g._edges[
    between(
        time(9, 0, 0),
        time(17, 0, 0)
    )(g._edges['transaction_time'])
]
business_hours_g = g.edges(business_hours_edges)

print(f"Business hour transactions: {len(business_hours_g._edges)}")
print(f"Percentage of total: {len(business_hours_g._edges) / len(g._edges) * 100:.1f}%")
[10]:
# Find transactions at specific times (e.g., on the hour)
on_the_hour_times = [time(h, 0, 0) for h in range(24)]
on_hour_edges = g._edges[
    is_in(on_the_hour_times)(g._edges['transaction_time'])
]
on_hour_g = g.edges(on_hour_edges)

print(f"Transactions on the hour: {len(on_hour_g._edges)}")
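
A range such as 9 AM to 5 PM has its start before its end, so a single `between` covers it. A window that crosses midnight (for example 22:00 to 06:00) does not, because its start is later than its end. The helper below is a hypothetical plain-Python sketch of that logic and is not part of GFQL.

```python
from datetime import time


def in_time_window(t: time, start: time, end: time) -> bool:
    """Return True if t lies in the inclusive window [start, end].

    When start is later than end, the window is treated as wrapping past midnight.
    """
    if start <= end:
        return start <= t <= end
    # Wrap-around: late evening OR early morning
    return t >= start or t <= end


business = (time(9, 0), time(17, 0))
overnight = (time(22, 0), time(6, 0))

print(in_time_window(time(12, 30), *business))   # True
print(in_time_window(time(23, 15), *overnight))  # True
print(in_time_window(time(12, 30), *overnight))  # False
```

With boolean masks like those in the cells above, the wrap-around case corresponds to combining a "at or after start" mask and a "at or before end" mask with `|` instead of using one range predicate.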

4. Complex Temporal Queries#

Combine temporal predicates with other filters for complex queries.

[11]:
# Find large transactions (>$500) in Q4 2023
q4_mask = between(
    datetime(2023, 10, 1),
    datetime(2023, 12, 31, 23, 59, 59)
)(g._edges['timestamp'])
large_mask = gt(500)(g._edges['amount'])

q4_large_edges = g._edges[q4_mask & large_mask]
q4_large_g = g.edges(q4_large_edges)

print(f"Large Q4 2023 transactions: {len(q4_large_g._edges)}")
if len(q4_large_g._edges) > 0:
    print(f"Total value: ${q4_large_g._edges['amount'].sum():,.2f}")
    print(f"Average: ${q4_large_g._edges['amount'].mean():,.2f}")
[12]:
# Multi-hop query: Find accounts that received money recently
# and then sent money to business accounts
thirty_days_ago = datetime.now() - timedelta(days=30)

# First, find recent transactions
recent_edges = g._edges[gt(pd.Timestamp(thirty_days_ago))(g._edges['timestamp'])]
recent_g = g.edges(recent_edges)

# Use chain to find money flow pattern
money_flow = recent_g.chain([
    # Start with any node
    n(),
    # Follow incoming edges (as destination)
    e_reverse(),
    # Go to source nodes
    n(),
    # Follow outgoing edges
    e_forward(),
    # To business accounts
    n(filter_dict={"account_type": "business"})
])

# The chain result holds every node on a matching path, not only the final business accounts
print(f"Money flow pattern found: {len(money_flow._nodes)} nodes on matching paths")

5. Using Temporal Value Classes#

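Use the explicit temporal value classes when you want to state a value's type (datetime, date, or time) and, for datetimes, its timezone, instead of passing Python or pandas objects.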

[13]:
# Create temporal values with specific properties
dt_value = DateTimeValue("2023-06-15T14:30:00", "UTC")
date_value = DateValue("2023-06-15")
time_value = TimeValue("14:30:00")

# Use in predicates
specific_edges = g._edges[gt(dt_value)(g._edges['timestamp'])]
specific_g = g.edges(specific_edges)

print(f"Transactions after {dt_value.value}: {len(specific_g._edges)}")

6. Timezone-Aware Filtering#

Handle timezone-aware datetime comparisons.

[14]:
# Add timezone info to our data for this example
transactions_df_tz = transactions_df.copy()
transactions_df_tz['timestamp_utc'] = pd.to_datetime(transactions_df_tz['timestamp']).dt.tz_localize('UTC')
transactions_df_tz['timestamp_eastern'] = transactions_df_tz['timestamp_utc'].dt.tz_convert('US/Eastern')

g_tz = graphistry.edges(transactions_df_tz, 'source', 'target')

# Filter using Eastern time
eastern = pytz.timezone('US/Eastern')
eastern_cutoff = eastern.localize(datetime(2023, 7, 1, 9, 0, 0))  # 9 AM Eastern

eastern_morning_edges = g_tz._edges[
    gt(pd.Timestamp(eastern_cutoff))(g_tz._edges['timestamp_eastern'])
]
eastern_morning_g = g_tz.edges(eastern_morning_edges)

print(f"Transactions after 9 AM Eastern on July 1, 2023: {len(eastern_morning_g._edges)}")
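
Timezone-aware comparison works on instants, not on wall-clock digits. The sketch below uses only the standard library and a fixed UTC-4 offset. That offset is an assumption that holds for US Eastern daylight time on July 1, 2023, but not year-round. Libraries such as `pytz` handle the seasonal change, as in the cell above.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Eastern Daylight Time (valid for the July date used here)
EDT = timezone(timedelta(hours=-4), "EDT")

cutoff_eastern = datetime(2023, 7, 1, 9, 0, tzinfo=EDT)
cutoff_utc = cutoff_eastern.astimezone(timezone.utc)
print(cutoff_utc.isoformat())  # 2023-07-01T13:00:00+00:00

# 12:30 UTC is 08:30 Eastern, so it is before the 9 AM Eastern cutoff
txn_utc = datetime(2023, 7, 1, 12, 30, tzinfo=timezone.utc)
print(txn_utc > cutoff_eastern)  # False

# 13:30 UTC is 09:30 Eastern, so it is after the cutoff
later_txn_utc = datetime(2023, 7, 1, 13, 30, tzinfo=timezone.utc)
print(later_txn_utc > cutoff_eastern)  # True
```

Python refuses to order a timezone-naive datetime against an aware one, which is one reason to localize a column before filtering it with an aware cutoff.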

7. Chain Operations with Temporal Edge Filters#

Pass temporal predicates as edge filters to e_forward() inside a chain, alongside non-temporal filters such as amount.

[15]:
# Find paths through recent high-value transactions
recent_high_value = g.chain([
    # Start from all nodes
    n(),
    # Follow edges with temporal and amount filters
    e_forward({
        "timestamp": gt(datetime.now() - timedelta(days=7)),
        "amount": gt(200)
    }),
    # Reach destination nodes
    n()
])

print("Recent high-value transaction paths:")
print(f"  Nodes: {len(recent_high_value._nodes)}")
print(f"  Edges: {len(recent_high_value._edges)}")
[ ]:
# Wire protocol dicts in is_in predicates
# Useful for checking against multiple specific timestamps

important_dates = [
    {"type": "datetime", "value": "2023-01-01T00:00:00", "timezone": "UTC"},  # New Year
    {"type": "datetime", "value": "2023-07-04T00:00:00", "timezone": "UTC"},  # July 4th
    {"type": "datetime", "value": "2023-12-25T00:00:00", "timezone": "UTC"},  # Christmas
]

# Note: This checks for exact timestamp matches
# For date matching, you'd need to extract the date portion
holiday_pred = is_in(important_dates)

# The predicate is built here but not applied; it can be called on a column
# like the other predicates in this notebook, e.g. holiday_pred(g._edges['timestamp'])
# (In real data, you'd probably want to check date ranges instead)
print("Checking for transactions at midnight on holidays...")
print("(This is likely 0 unless transactions were specifically created at midnight)")
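
Because membership tests on full timestamps match only exact instants, matching whole days means comparing dates. The following is a minimal sketch in plain pandas, assuming a datetime column like `timestamp`. It truncates each value to midnight and tests membership with `Series.isin`. The data is illustrative.

```python
import pandas as pd

timestamps = pd.Series([
    pd.Timestamp("2023-01-01 00:00:00"),  # exactly midnight on a holiday
    pd.Timestamp("2023-07-04 15:42:10"),  # during a holiday
    pd.Timestamp("2023-07-05 09:00:00"),  # not a holiday
    pd.Timestamp("2023-12-25 23:59:59"),  # during a holiday
])

holidays = [
    pd.Timestamp("2023-01-01"),
    pd.Timestamp("2023-07-04"),
    pd.Timestamp("2023-12-25"),
]

# Exact-instant matching: only the midnight timestamp qualifies
exact_mask = timestamps.isin(holidays)

# Day-level matching: drop the time component first
day_mask = timestamps.dt.normalize().isin(holidays)

print(int(exact_mask.sum()), int(day_mask.sum()))  # 1 3
```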

Summary#

This notebook demonstrated:

  1. DateTime filtering with gt, lt, between predicates on edges

  2. Date-only filtering for day-level granularity on nodes

  3. Time-of-day filtering for patterns like business hours

  4. Complex queries combining temporal and non-temporal predicates

  5. Multi-hop queries with temporal constraints using chain operations

  6. Temporal value classes for explicit control

  7. Timezone-aware filtering

  8. Wire protocol dictionaries for JSON-compatible predicate configuration

  9. Proper chain syntax with edge filters in e_forward() and node filters in n()

Key takeaways:

  - Temporal predicates work seamlessly with pandas datetime types

  - Wire protocol dicts enable configuration-driven filtering: gt({"type": "datetime", "value": "2023-01-01T00:00:00", "timezone": "UTC"})

  - Timezone awareness is built-in for accurate cross-timezone comparisons

  - Complex temporal patterns can be expressed through chain operations

Temporal predicates in GFQL provide a powerful way to analyze time-series aspects of graph data, enabling complex temporal queries while maintaining the expressiveness of graph traversals.

[ ]:
# Build predicates programmatically with wire protocol dicts
def create_date_filter(year, month, day, comparison="gt"):
    """Create a date filter using wire protocol format"""
    date_dict = {
        "type": "date",
        "value": f"{year:04d}-{month:02d}-{day:02d}"
    }

    if comparison == "gt":
        return gt(date_dict)
    elif comparison == "lt":
        return lt(date_dict)
    elif comparison == "ge":
        return ge(date_dict)
    elif comparison == "le":
        return le(date_dict)
    else:
        raise ValueError(f"Unknown comparison: {comparison}")

# Use the programmatic filter
filter_2023 = create_date_filter(2023, 1, 1, "ge")
accounts_2023 = g.chain([
    n(filter_dict={
        "created_date": filter_2023
    })
])

print(f"Accounts created in 2023 or later: {len(accounts_2023._nodes)}")
[ ]:
# Example: Load predicate configuration from JSON
import json

# Simulate loading from a JSON config file
config_json = '''
{
    "filters": {
        "recent_transactions": {
            "timestamp": {
                "type": "gt",
                "value": {
                    "type": "datetime",
                    "value": "2023-10-01T00:00:00",
                    "timezone": "UTC"
                }
            }
        },
        "business_hours": {
            "transaction_time": {
                "type": "between",
                "start": {"type": "time", "value": "09:00:00"},
                "end": {"type": "time", "value": "17:00:00"}
            }
        }
    }
}
'''

config = json.loads(config_json)

# Use the wire protocol dict directly
recent_filter = config["filters"]["recent_transactions"]["timestamp"]["value"]
recent_edges = g._edges[gt(recent_filter)(g._edges['timestamp'])]
recent_g = g.edges(recent_edges)

print(f"Recent transactions (from JSON config): {len(recent_g._edges)}")
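
The cell above reads only the nested `value` and hard-codes `gt`, although the config also names the comparison in its `type` field. A hypothetical helper (`build_predicate`, not a GFQL function) could dispatch on that field through a name-to-constructor registry. The stand-in constructors below just return tuples so the sketch runs without graphistry. In the notebook the registry would presumably map to the imported predicates instead, and passing `start` and `end` positionally is an assumption based on the `between(...)` calls shown earlier.

```python
import json
from typing import Any, Callable, Dict


def build_predicate(spec: Dict[str, Any], registry: Dict[str, Callable[..., Any]]) -> Any:
    """Build a predicate from {"type": <name>, ...} using a constructor registry."""
    kind = spec["type"]
    if kind not in registry:
        raise ValueError(f"Unknown predicate type: {kind}")
    make = registry[kind]
    if kind == "between":
        return make(spec["start"], spec["end"])
    return make(spec["value"])


# Stand-ins for illustration only; with graphistry installed this could be
# something like {"gt": gt, "lt": lt, "ge": ge, "le": le, "between": between}
registry = {
    "gt": lambda value: ("gt", value),
    "between": lambda start, end: ("between", start, end),
}

column_specs = json.loads("""
{
    "timestamp": {
        "type": "gt",
        "value": {"type": "datetime", "value": "2023-10-01T00:00:00", "timezone": "UTC"}
    },
    "transaction_time": {
        "type": "between",
        "start": {"type": "time", "value": "09:00:00"},
        "end": {"type": "time", "value": "17:00:00"}
    }
}
""")

# One predicate per column, ready to be used as a filter dict
predicates = {col: build_predicate(spec, registry) for col, spec in column_specs.items()}
print(predicates["timestamp"][0], predicates["transaction_time"][0])  # gt between
```
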
[ ]:
# Wire protocol dictionaries work directly in Python
# These are equivalent:
pred1 = gt(pd.Timestamp("2023-07-01"))
pred2 = gt({"type": "datetime", "value": "2023-07-01T00:00:00", "timezone": "UTC"})

# Test they produce the same results
result1 = pred1(g._edges['timestamp'])
result2 = pred2(g._edges['timestamp'])
print(f"Results are identical: {result1.equals(result2)}")
print(f"Transactions after July 1, 2023: {result1.sum()}")

8. Using Wire Protocol Dictionaries#

You can pass wire protocol dictionaries directly to temporal predicates. This is useful for:

  - Loading predicate configurations from JSON files

  - Building predicates programmatically

  - Sharing predicate definitions between systems
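
To share predicate definitions between systems, temporal values have to be turned into JSON-compatible dicts and back. The helpers below (`to_wire`, `from_wire`) are hypothetical and use only the standard library. The field names follow the dicts shown in this notebook, and normalizing datetimes to UTC is a choice made for this sketch, not a documented requirement.

```python
import json
from datetime import date, datetime, time, timedelta, timezone
from typing import Any, Dict, Union

Temporal = Union[datetime, date, time]


def to_wire(value: Temporal) -> Dict[str, Any]:
    """Convert a Python temporal value into a JSON-compatible dict."""
    # datetime is a subclass of date, so it must be checked first
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("this sketch requires timezone-aware datetimes")
        utc = value.astimezone(timezone.utc).replace(tzinfo=None)
        return {"type": "datetime", "value": utc.isoformat(), "timezone": "UTC"}
    if isinstance(value, date):
        return {"type": "date", "value": value.isoformat()}
    if isinstance(value, time):
        return {"type": "time", "value": value.isoformat()}
    raise TypeError(f"Unsupported type: {type(value).__name__}")


def from_wire(data: Dict[str, Any]) -> Temporal:
    """Rebuild a Python temporal value from a dict produced by to_wire."""
    kind = data["type"]
    if kind == "datetime":
        if data.get("timezone") != "UTC":
            raise ValueError("this sketch only handles UTC datetimes")
        return datetime.fromisoformat(data["value"]).replace(tzinfo=timezone.utc)
    if kind == "date":
        return date.fromisoformat(data["value"])
    if kind == "time":
        return time.fromisoformat(data["value"])
    raise ValueError(f"Unknown temporal type: {kind}")


# 9 AM at UTC-4 becomes 13:00 UTC on the wire and survives a JSON round trip
original = datetime(2023, 7, 1, 9, 0, tzinfo=timezone(timedelta(hours=-4)))
wire = to_wire(original)
restored = from_wire(json.loads(json.dumps(wire)))
print(wire["value"], restored == original)  # 2023-07-01T13:00:00 True
```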

What’s Next?#

[16]:
# Complex multi-hop with temporal constraints
# Find 2-hop paths through recent transactions
two_hop_recent = g.chain([
    # Start from business accounts
    n(filter_dict={"account_type": "business"}),
    # First hop: recent outgoing transactions
    e_forward({
        "timestamp": gt(datetime.now() - timedelta(days=30))
    }, name="hop1"),
    # Intermediate nodes
    n(),
    # Second hop: any transaction
    e_forward(name="hop2"),
    # Final nodes
    n()
])

print("2-hop paths from business accounts through recent transactions:")
print(f"  Total edges: {len(two_hop_recent._edges)}")
print(f"  Hop 1 edges: {two_hop_recent._edges['hop1'].sum()}")
print(f"  Hop 2 edges: {two_hop_recent._edges['hop2'].sum()}")
