Domain Customization in stdexec

Design and Implementation Guide @ 2025

Sai Charan Arvapally
University of Alberta, Canada

Abstract

This paper provides a comprehensive guide to domain customization in stdexec, which lets scheduler authors provide optimized implementations of sender algorithms. Domain customization allows execution contexts to intercept and transform sender expressions before they are connected to receivers. Context-specific optimizations then apply without changes to user code and without giving up composability.

Table of Contents

Introduction
Motivation
Design Overview
Technical Specification
Implementation Guidance
Real-World Examples
Performance Considerations
Design Rationale
Best Practices
Conclusion
Special Thanks
References

Introduction

The sender/receiver model in stdexec provides a composable framework for asynchronous computation. However, generic algorithm implementations may not leverage the full capabilities of specialized execution contexts. Domain customization bridges this gap by allowing execution contexts to provide optimized implementations of sender algorithms.

Motivation

Problem Statement

Consider a parallel execution context that manages a thread pool optimized for bulk operations. When users compose sender chains involving bulk work:

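auto work = stdexec::schedule(parallel_sched)
  | stdexec::then(prepare_data)
  // bulk takes an execution policy, a shape, and a function that
  // receives the index and the predecessor's values by reference
  | stdexec::bulk(stdexec::par, 10000, [](std::size_t i, auto& data) {
      process_item(i, data);
    });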

The generic bulk algorithm may:

Not utilize the thread pool's work-stealing capabilities
Create unnecessary intermediate allocations
Miss opportunities for vectorization
Ignore the execution context's scheduling policies

Current Limitations (Before)

Without domain customization, bulk operations use generic implementations:

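// Generic bulk implementation (simplified illustration;
// generic_bulk_sender is a placeholder)
template<typename Sender, typename Shape, typename Func>
auto bulk(Sender&& snd, Shape shape, Func func) {
  // Calls func for each index, one after another, on whichever thread
  // completes snd: no work-stealing, no awareness of the execution context
  return generic_bulk_sender<std::remove_cvref_t<Sender>, Shape, Func>{
    std::forward<Sender>(snd), shape, std::move(func)
  };
}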

Problems:

Suboptimal performance on specialized hardware
Cannot leverage context-specific optimizations
One-size-fits-all approach limits efficiency

Domain Customization Approach (After)

With domain customization, execution contexts can provide optimized implementations:

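// Custom domain intercepts bulk operations
// (bulk_sender, optimized_bulk_sender, extract_scheduler and
// extract_params are placeholders)
struct thread_pool_domain : stdexec::default_domain {
  template<bulk_sender Sender>
  auto transform_sender(Sender&& snd) const {
    // The replacement sender may use work-stealing, vectorization,
    // or NUMA-aware placement
    auto sched = extract_scheduler(snd);
    return optimized_bulk_sender{
      sched, extract_params(std::forward<Sender>(snd))
    };
  }
};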

Benefits:

Seamless integration with existing sender chains
Context-aware optimizations (NUMA, cache hierarchy)
Maintains composability and type safety

Design Overview

Fundamentals

Domain customization in stdexec involves four key components:

Figure: Scheduler (`get_domain()`) → Domain (`transform_sender`) → Custom Senders → Optimized Implementation

Domain: Customization point that transforms senders
Scheduler: Associates with domain via get_domain() query
Transform Objects: Extract and repackage sender parameters
Custom Senders: Provide optimized implementations
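The four roles can be made concrete with a small toy model in plain C++17 that does not use stdexec. All names in it (`get_domain_t`, `bulk_sender`, `pool_bulk_sender`, `pool_domain`, `pool_scheduler`, `customize`) are hypothetical stand-ins, not stdexec API. The scheduler answers a domain query, the domain recognizes a bulk sender and repackages its parameters, and the result is a custom sender bound to the same scheduler. Execution itself is left out.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Toy model only: every name here is hypothetical, not stdexec API.
struct get_domain_t {};  // query tag playing the role of stdexec::get_domain

// Generic bulk sender that remembers the scheduler it completes on
template <class Sched, class Shape, class Fn>
struct bulk_sender {
  Sched sched;
  Shape shape;
  Fn fn;
};

// Custom sender that the domain substitutes for the generic one
template <class Sched, class Shape, class Fn>
struct pool_bulk_sender {
  Sched sched;
  Shape shape;
  Fn fn;
};

// Domain: recognizes bulk senders and repackages their parameters
// (in stdexec this repackaging is the job of a transform object)
struct pool_domain {
  template <class Sched, class Shape, class Fn>
  auto transform_sender(bulk_sender<Sched, Shape, Fn> s) const {
    return pool_bulk_sender<Sched, Shape, Fn>{s.sched, s.shape, std::move(s.fn)};
  }
};

// Scheduler: advertises its domain through a query member function
struct pool_scheduler {
  pool_domain query(get_domain_t) const noexcept { return {}; }
};

// Caller side: ask the sender's scheduler for its domain, then transform
template <class Sender>
auto customize(Sender s) {
  auto domain = s.sched.query(get_domain_t{});
  return domain.transform_sender(std::move(s));
}
```

For example, passing a `bulk_sender<pool_scheduler, int, F>` to `customize` yields a `pool_bulk_sender` with the same shape and function. stdexec performs a comparable lookup and substitution through `get_domain` and `transform_sender`.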

Transformation Patterns

Domain customization distinguishes two patterns, depending on where the scheduler appears in the sender/receiver pipeline:

Pattern 1: Completion Scheduler (`__completes_on`)

The __completes_on<Sender, Scheduler> concept detects whether a sender reports a specific scheduler as its completion scheduler. This happens when you call stdexec::schedule(sched): that scheduler becomes "baked into" the sender chain for all subsequent operations. (The double-underscore names used in this guide are internal stdexec helpers, not part of the public API.)

auto work = stdexec::schedule(my_sched)          // Sets completion scheduler
  | stdexec::bulk(stdexec::par, n, func);        // Transformed by domain

Pattern 2: Receiver Environment (`__starts_on`)

The __starts_on<Sender, Scheduler, Env> concept detects whether a sender will start execution on a specific scheduler when it is connected with a receiver whose environment is Env, that is, whether Env's get_scheduler query returns that scheduler. This happens when the sender is started through an adaptor such as stdexec::starts_on: the scheduler becomes part of the receiver environment.

auto work = stdexec::just()
  | stdexec::bulk(stdexec::par, n, func);

// starts_on places my_sched in the receiver environment of work
stdexec::sync_wait(stdexec::starts_on(my_sched, std::move(work)));

Technical Specification

Understanding Scheduler Detection

Before implementing domain customization, it's crucial to understand how stdexec detects schedulers in the sender/receiver pipeline:

The `__completes_on` Concept

my_scheduler sched;

// This sender "completes on" sched
auto sndr = stdexec::schedule(sched)  // <-- scheduler here
  | stdexec::then([](){ return 42; });

// Check: does sndr complete on a my_scheduler?
static_assert(stdexec::__completes_on<decltype(sndr), my_scheduler>);  // true

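When you call stdexec::schedule(sched), sched becomes the completion scheduler of the resulting sender. Adaptors that keep execution on the same context, such as then and bulk_chunked, report the same completion scheduler, so it propagates down the chain until an adaptor such as continues_on moves execution elsewhere.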

The `__starts_on` Concept

my_scheduler sched;

// This sender has NO completion scheduler
auto sndr = stdexec::just(42)
  | stdexec::bulk_chunked(stdexec::par, 1000,
      [](int begin, int end, int& val) {
        for (int i = begin; i < end; ++i) work(i, val);
      });

// ...but starting it on sched puts sched in the receiver environment
stdexec::sync_wait(stdexec::starts_on(sched, std::move(sndr)));

// Check: does sndr start on a my_scheduler in environment some_env?
// (some_env: any environment whose get_scheduler query returns a my_scheduler)
static_assert(stdexec::__starts_on<decltype(sndr), my_scheduler, some_env>);  // true

When a sender is started through an adaptor such as stdexec::starts_on(sched, sndr), sched becomes part of the receiver environment that sndr is connected with, and execution starts on it.
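Both checks are purely type-level. The sketch below models them in C++17 with the detection idiom. `completes_on_trait`, `starts_on_trait`, `toy_sender`, `sched_attrs` and `sched_env` are hypothetical simplifications, not stdexec's definitions, and plain member functions stand in for the real `get_env`, `get_completion_scheduler` and `get_scheduler` queries. The environment check never looks at the sender itself.

```cpp
#include <type_traits>
#include <utility>

// Toy model: hypothetical names, not stdexec's definitions.
struct my_scheduler {};
struct other_scheduler {};

// Sender attributes reporting a completion scheduler (like schedule(sched) | then(f))
template <class Sched>
struct sched_attrs {
  Sched sched;
  Sched get_completion_scheduler() const { return sched; }
};
struct empty_attrs {};  // like just(42): no completion scheduler

template <class Attrs>
struct toy_sender {
  Attrs attrs;
  Attrs get_env() const { return attrs; }
};

// Receiver environment naming a scheduler (like the child of starts_on(sched, ...))
template <class Sched>
struct sched_env {
  Sched sched;
  Sched get_scheduler() const { return sched; }
};
struct empty_env {};

// Models __completes_on: inspects only the sender's attributes
template <class S, class Sched, class = void>
struct completes_on_trait : std::false_type {};
template <class S, class Sched>
struct completes_on_trait<S, Sched,
    std::void_t<decltype(std::declval<const S&>().get_env().get_completion_scheduler())>>
    : std::is_same<std::decay_t<decltype(
          std::declval<const S&>().get_env().get_completion_scheduler())>, Sched> {};

// Models __starts_on: inspects only the receiver environment
template <class S, class Sched, class Env, class = void>
struct starts_on_trait : std::false_type {};
template <class S, class Sched, class Env>
struct starts_on_trait<S, Sched, Env,
    std::void_t<decltype(std::declval<const Env&>().get_scheduler())>>
    : std::is_same<std::decay_t<decltype(
          std::declval<const Env&>().get_scheduler())>, Sched> {};

using scheduled_sender = toy_sender<sched_attrs<my_scheduler>>;
using unscheduled_sender = toy_sender<empty_attrs>;

static_assert(completes_on_trait<scheduled_sender, my_scheduler>::value);
static_assert(!completes_on_trait<unscheduled_sender, my_scheduler>::value);
static_assert(starts_on_trait<unscheduled_sender, my_scheduler, sched_env<my_scheduler>>::value);
```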

Domain Interface

struct domain_concept {
  // Transform sender without environment context
  template<typename Sender>
  auto transform_sender(Sender&& snd) const noexcept -> /*sender*/;
  
  // Transform sender with environment context  
  template<typename Sender, typename Env>
  auto transform_sender(Sender&& snd, const Env& env) const noexcept -> /*sender*/;
};
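The two overloads can be pictured as two call sites: one with no environment, and one with the receiver's environment. When the chosen domain has no matching overload, the call falls back to a default domain, which is a simplification of what stdexec does. The C++17 sketch below models that rule with expression SFINAE. `call_transform`, `toy_default_domain`, `bulk_only_domain` and the sender types are hypothetical, and stdexec's own `transform_sender` dispatch handles more cases, so read this as an approximation.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Toy model: hypothetical senders, environment and result type.
struct bulk_sender { int shape = 0; };
struct then_sender {};
struct toy_env {};
struct optimized_bulk { int shape = 0; bool saw_env = false; };

// Fallback domain: returns the sender unchanged
struct toy_default_domain {
  template <class Sender, class... Env>
  Sender transform_sender(Sender s, const Env&...) const { return s; }
};

// Custom domain: only knows how to handle bulk_sender
struct bulk_only_domain {
  // Without an environment
  optimized_bulk transform_sender(bulk_sender s) const { return {s.shape, false}; }
  // With the receiver's environment
  template <class Env>
  optimized_bulk transform_sender(bulk_sender s, const Env&) const { return {s.shape, true}; }
};

// Preferred: the domain's own overload, if one accepts these arguments
template <class Domain, class Sender, class... Env>
auto transform_impl(int, const Domain& d, Sender s, const Env&... env)
    -> decltype(d.transform_sender(std::move(s), env...)) {
  return d.transform_sender(std::move(s), env...);
}

// Otherwise: fall back to the default domain (int -> long ranks lower)
template <class Domain, class Sender, class... Env>
auto transform_impl(long, const Domain&, Sender s, const Env&... env) {
  return toy_default_domain{}.transform_sender(std::move(s), env...);
}

template <class Domain, class Sender, class... Env>
auto call_transform(const Domain& d, Sender s, const Env&... env) {
  return transform_impl(0, d, std::move(s), env...);
}
```

With this rule, a domain only has to declare the overloads it cares about. Everything else passes through unchanged.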

Scheduler Integration

Schedulers associate with domains via the get_domain query:

class my_scheduler {
public:
  auto query(stdexec::get_domain_t) const noexcept {
    return my_domain{};
  }
  
  // ... other scheduler interface
};

Implementation Guidance

Flowchart 1: Sender Interception and Transformation

User writes: schedule(sched) | bulk_chunked(n, f)
↓
Domain sees: bulk_sender
↓
Transform object: extracts parameters and policy
↓
Domain creates: optimized_bulk_sender
↓
User gets: the same interface, backed by an implementation specialized for the scheduler

This is the core idea of domain customization.

Flowchart 2: Core Components Architecture

• Scheduler is queried for its Domain (get_domain)

• Domain uses Transform Objects

• Transform Objects create Custom Senders

• Custom Senders execute on the Scheduler

This shows how the four key components connect.

Step-by-Step Implementation

Step 1: Define Your Domain

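// A concept cannot be declared inside a class, so define it at namespace scope
template<typename Sender>
concept transformable_sender =
  stdexec::sender_expr_for<Sender, stdexec::bulk_chunked_t> ||
  stdexec::sender_expr_for<Sender, stdexec::bulk_unchunked_t>;

struct my_domain : stdexec::default_domain {
  // Pattern 1: no environment; the scheduler must come from the sender
  template<transformable_sender Sender>
  auto transform_sender(Sender&& snd) const noexcept;

  // Pattern 2: the receiver environment is available
  template<transformable_sender Sender, typename Env>
  auto transform_sender(Sender&& snd, const Env& env) const noexcept;
};

Step 2: Implement Transform Objects

// Handles both bulk_chunked_t and bulk_unchunked_t; the tag selects chunking
struct transform_bulk {
  template<typename Tag, typename Data, typename Previous>
  auto operator()(Tag, Data&& data, Previous&& prev) const {
    auto&& [policy, shape, func] = std::forward<Data>(data);

    // Extract execution policy and chunking at compile time
    using policy_t = std::remove_cvref_t<decltype(policy.__get())>;
    constexpr bool parallel = std::same_as<policy_t, stdexec::parallel_policy>;
    constexpr bool chunked = std::same_as<Tag, stdexec::bulk_chunked_t>;

    // func is copied because data may be an lvalue
    return my_bulk_sender<chunked, std::remove_cvref_t<Previous>,
                          std::remove_cvref_t<decltype(shape)>,
                          std::remove_cvref_t<decltype(func)>, parallel>{
      scheduler_, std::forward<Previous>(prev), shape, func
    };
  }

  my_scheduler scheduler_;
};

Step 3: Complete Domain Implementation

template <transformable_sender Sender>
auto my_domain::transform_sender(Sender&& snd) const noexcept {
  // Case 1: Scheduler in sender chain
  if constexpr (stdexec::__completes_on<Sender, my_scheduler>) {
    auto sched = stdexec::get_completion_scheduler<stdexec::set_value_t>(
      stdexec::get_env(snd));
    return stdexec::__sexpr_apply(
      std::forward<Sender>(snd), transform_bulk{sched});
  } else {
    static_assert(stdexec::__completes_on<Sender, my_scheduler>,
      "No my_scheduler found in sender's completion environment");
    return not_a_sender<stdexec::__name_of<Sender>>{};
  }
}

template <transformable_sender Sender, typename Env>
auto my_domain::transform_sender(Sender&& snd, const Env& env) const noexcept {
  if constexpr (stdexec::__completes_on<Sender, my_scheduler>) {
    // Case 1: the completion scheduler still takes precedence
    return transform_sender(std::forward<Sender>(snd));
  } else if constexpr (stdexec::__starts_on<Sender, my_scheduler, Env>) {
    // Case 2: Scheduler in receiver environment
    return stdexec::__sexpr_apply(
      std::forward<Sender>(snd), transform_bulk{stdexec::get_scheduler(env)});
  } else {
    static_assert(stdexec::__starts_on<Sender, my_scheduler, Env>,
      "No my_scheduler found in sender attributes or receiver environment");
    return not_a_sender<stdexec::__name_of<Sender>>{};
  }
}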
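The transform object from Step 2 can be exercised without stdexec. In the C++17 sketch below, the tags, policies, `my_scheduler`, `just_sender` and `my_bulk_sender` are hypothetical stand-ins, and the bulk data is a plain `std::tuple` rather than stdexec's internal data type. It shows how the transform object reads the policy and the tag at compile time and encodes both in the type of the replacement sender.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Toy model: hypothetical stand-ins for stdexec's tags, policies and scheduler
struct sequenced_policy {};
struct parallel_policy {};
struct bulk_chunked_tag {};
struct bulk_unchunked_tag {};
struct my_scheduler { int id = 0; };
struct just_sender { int value = 0; };  // plays the role of the child sender

// Replacement sender: chunking and parallelism are part of its type
template <bool Chunked, bool Parallel, class Child, class Shape, class Fn>
struct my_bulk_sender {
  my_scheduler sched;
  Child child;
  Shape shape;
  Fn fn;
};

// Transform object: receives (tag, data, child) and repackages them
struct transform_bulk {
  my_scheduler scheduler;

  template <class Tag, class Data, class Child>
  auto operator()(Tag, Data&& data, Child&& child) const {
    auto&& [policy, shape, fn] = std::forward<Data>(data);
    constexpr bool parallel =
        std::is_same_v<std::decay_t<decltype(policy)>, parallel_policy>;
    constexpr bool chunked = std::is_same_v<Tag, bulk_chunked_tag>;
    using result_t = my_bulk_sender<chunked, parallel, std::decay_t<Child>,
                                    std::decay_t<decltype(shape)>,
                                    std::decay_t<decltype(fn)>>;
    // fn is copied, which is safe whether data was an lvalue or an rvalue
    return result_t{scheduler, std::forward<Child>(child), shape, fn};
  }
};
```

Because `chunked` and `parallel` are template arguments, the replacement sender can pick its execution strategy without any runtime test of the policy.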

Error Handling Patterns

For unsupported configurations, provide clear diagnostics:

template<typename Sender>
struct not_a_sender {
  using sender_concept = stdexec::sender_t;
  
  // Note: static_assert(false) may not work in all compiler contexts;
  // use static_assert with a dependent condition instead
  static_assert(std::is_void_v<Sender>, 
    "Sender cannot be optimized: no compatible scheduler found");
};

Important Notes:

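Compiler compatibility: Before C++23, a non-dependent static_assert(false) inside a template was rejected even if the template was never instantiated. P2593, adopted as a defect report, relaxes this, and recent GCC and Clang accept it, but older compilers do not. Using a dependent condition like std::is_void_v<Sender> makes the assertion depend on the template parameter, so it is checked only when not_a_sender is instantiated.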
SFINAE considerations: The not_a_sender approach works well when you want hard compilation errors. For SFINAE-friendly detection where you want overloads to be quietly removed from consideration, use concepts or std::enable_if instead.
Alternative approaches: You can use requires clauses on the transform functions rather than error types for cleaner diagnostics.
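Both styles from the notes above can be seen in a small C++17 sketch that does not depend on stdexec. `always_false_v`, `diagnose_unsupported`, `has_fast_path`, `try_optimize` and `require_optimize` are hypothetical names. The `static_assert` in `diagnose_unsupported` depends on `T`, so it fires only if that template is instantiated. The `std::enable_if_t` overloads, by contrast, drop out quietly and let overload resolution pick the fallback.

```cpp
#include <type_traits>

// Toy model: hypothetical names, not stdexec API.
// A "dependent false": its value is unknown until T is known, so a
// static_assert on it is checked when the template is instantiated.
template <class T>
inline constexpr bool always_false_v = false;

template <class T>
struct diagnose_unsupported {
  static_assert(always_false_v<T>,
                "Sender cannot be optimized: no compatible scheduler found");
};

struct fast_sender {};
struct plain_sender {};

// Hypothetical trait: which senders have an optimized path
template <class T> struct has_fast_path : std::false_type {};
template <> struct has_fast_path<fast_sender> : std::true_type {};

// SFINAE-friendly: each overload drops out quietly when its condition fails
template <class S, std::enable_if_t<has_fast_path<S>::value, int> = 0>
constexpr int try_optimize(const S&) { return 1; }  // optimized path
template <class S, std::enable_if_t<!has_fast_path<S>::value, int> = 0>
constexpr int try_optimize(const S&) { return 0; }  // generic fallback

// Hard-error style: the diagnostic is instantiated only when no path exists
template <class S>
constexpr int require_optimize(const S& s) {
  if constexpr (has_fast_path<S>::value) {
    return try_optimize(s);
  } else {
    (void)diagnose_unsupported<S>{};  // instantiating this fires the static_assert
    return 0;
  }
}
```

Calling `require_optimize(plain_sender{})` would stop compilation with the message above, while `try_optimize(plain_sender{})` silently selects the fallback.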

Real-World Examples

System Context (stdexec)

The system context implementation demonstrates production-quality domain customization:

struct __parallel_scheduler_domain : stdexec::default_domain {
  template<__bulk_chunked_or_unchunked Sender>
  auto transform_sender(Sender&& __sndr) const noexcept {
    if constexpr (stdexec::__completes_on<Sender, parallel_scheduler>) {
      auto __sched = stdexec::get_completion_scheduler<stdexec::set_value_t>(
        stdexec::get_env(__sndr));
      return stdexec::__sexpr_apply(
        static_cast<Sender&&>(__sndr), 
        __transform_parallel_bulk_sender{__sched});
    } else {
      static_assert(stdexec::__completes_on<Sender, parallel_scheduler>,
        "No parallel_scheduler instance found");
      return __not_a_sender<stdexec::__name_of<Sender>>();
    }
  }
};

Key Features:

Precise sender matching using concepts
Compile-time error diagnostics
Integration with backend abstraction layer
Support for both chunked and unchunked bulk operations

HPX Thread Pool Scheduler

HPX provides another example, with policy-aware optimization:

template<typename Policy>  
struct thread_pool_domain : stdexec::default_domain {
  template<__bulk_chunked_or_unchunked Sender>
  auto transform_sender(Sender&& sndr) const noexcept {
    auto&& [tag, data, child] = sndr;
    auto&& [pol, shape, f] = data;
    
    return hpx::execution::experimental::detail::
      thread_pool_bulk_sender<Policy, /*...*/>{
        HPX_MOVE(sched), child, shape, f
      };
  }
  
  thread_pool_scheduler<Policy> sched;
};

Advantages:

Policy-driven optimization selection
Structured binding for clean parameter extraction
Integration with HPX's work-stealing runtime
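Policy-driven selection can be illustrated in isolation. The C++17 sketch below uses the policy type to choose, at compile time, between a sequential loop and a chunked loop on `std::thread`s. The policy tags, `run_bulk` and its fixed chunking are assumptions for illustration. HPX and stdexec schedule the work on their own thread pools instead of spawning threads per call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

// Toy model: hypothetical policy tags, not HPX or stdexec types
struct seq_policy {};
struct par_policy {};

// Run f(i) for every i in [0, n). The policy type selects the strategy at
// compile time; there is no runtime branch on the policy.
template <class Policy, class F>
void run_bulk(Policy, std::size_t n, F f, std::size_t workers = 4) {
  if constexpr (std::is_same_v<Policy, par_policy>) {
    workers = std::max<std::size_t>(1, std::min(workers, n));
    std::size_t chunk = (n + workers - 1) / workers;  // ceil(n / workers)
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
      std::size_t begin = w * chunk;
      std::size_t end = std::min(n, begin + chunk);
      if (begin >= end) break;
      // Each thread gets its own copy of f and a disjoint index range
      threads.emplace_back([=] {
        for (std::size_t i = begin; i < end; ++i) f(i);
      });
    }
    for (auto& t : threads) t.join();
  } else {
    for (std::size_t i = 0; i < n; ++i) f(i);
  }
}
```

With `par_policy`, each index is handled by exactly one thread, so a function that writes only its own element produces the same result as the sequential path.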

Performance Considerations

Compilation Impact

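The choice of transformation is made entirely at compile time:

No runtime dispatch: overload resolution and if constexpr select the transformation, so the only runtime work is constructing the replacement sender
Template instantiation cost proportional to sender complexity
Better code generation when the replacement sender is specialized for its context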

Memory Efficiency

Because the replacement sender is written for a specific execution context, it can provide, for example:

Allocation elimination through in-place execution
Cache-friendly layouts via memory pool integration
NUMA optimization through topology-aware scheduling

Design Rationale

Why Two Transform Overloads?

The dual overload design handles two distinct composition patterns based on the scheduler detection concepts:

Transformation Patterns

Pattern 1 (`__completes_on`): the scheduler is in the sender chain, as in schedule(sched) | bulk_chunked(n, f)

Pattern 2 (`__starts_on`): the scheduler is in the receiver environment, as in just(42) | bulk_chunked(n, f) started with starts_on(sched, work)

// Pattern 1: Scheduler in sender chain (__completes_on)
auto work1 = stdexec::schedule(my_sched) | algorithm(...);

// Pattern 2: Scheduler in receiver environment (__starts_on)
auto work2 = stdexec::just() | algorithm(...);
stdexec::sync_wait(stdexec::starts_on(my_sched, std::move(work2)));

Your domain must check both cases to catch all optimization opportunities:

// bulk_sender and optimize_with_scheduler are placeholders for your own
// sender constraint and transformation
template <bulk_sender Sender>
auto transform_sender(Sender&& snd) const noexcept {
  // Case 1: Scheduler in sender chain
  if constexpr (stdexec::__completes_on<Sender, my_scheduler>) {
    auto sched = stdexec::get_completion_scheduler<stdexec::set_value_t>(
      stdexec::get_env(snd));
    return optimize_with_scheduler(std::forward<Sender>(snd), sched);
  } else {
    // Case 1 failed: no optimization possible, pass the sender through
    return std::forward<Sender>(snd);
  }
}

template <bulk_sender Sender, typename Env>
auto transform_sender(Sender&& snd, const Env& env) const noexcept {
  // Case 1: Still try completion scheduler first
  if constexpr (stdexec::__completes_on<Sender, my_scheduler>) {
    return transform_sender(std::forward<Sender>(snd));
  }
  // Case 2: Scheduler in receiver environment
  else if constexpr (stdexec::__starts_on<Sender, my_scheduler, Env>) {
    auto sched = stdexec::get_scheduler(env);
    return optimize_with_scheduler(std::forward<Sender>(snd), sched);
  } else {
    // Both cases failed: no optimization possible, pass the sender through
    return std::forward<Sender>(snd);
  }
}

Key Insight:

__completes_on: Scheduler is "baked into" the sender chain
__starts_on: Scheduler comes from the execution environment
Domain customization: Must check both to catch all optimization opportunities!
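The decision order in the two overloads can be checked in a compact C++17 model. `transform_with_env`, `which_path`, and the attribute, environment and scheduler types are hypothetical, and member type aliases stand in for the real query-based checks. The sketch only shows the precedence: completion scheduler first, receiver environment second, and the sender left unchanged otherwise.

```cpp
#include <type_traits>

// Toy model: hypothetical names, not stdexec API.
struct my_scheduler {};
struct other_scheduler {};

// Which path a transform took (for illustration only)
enum class which_path { completion_scheduler, environment, unchanged };

// Sender attributes and receiver environments, modeled with member aliases
template <class Sched> struct attrs_with { using completion_scheduler = Sched; };
struct no_attrs {};
template <class Sched> struct env_with { using scheduler = Sched; };
struct no_env {};

template <class A, class = void> struct completion_of { using type = void; };
template <class A>
struct completion_of<A, std::void_t<typename A::completion_scheduler>> {
  using type = typename A::completion_scheduler;
};

template <class E, class = void> struct scheduler_of { using type = void; };
template <class E>
struct scheduler_of<E, std::void_t<typename E::scheduler>> {
  using type = typename E::scheduler;
};

template <class Attrs, class Env>
constexpr which_path transform_with_env() {
  if constexpr (std::is_same_v<typename completion_of<Attrs>::type, my_scheduler>) {
    return which_path::completion_scheduler;  // Case 1: scheduler in the sender chain
  } else if constexpr (std::is_same_v<typename scheduler_of<Env>::type, my_scheduler>) {
    return which_path::environment;           // Case 2: scheduler in the environment
  } else {
    return which_path::unchanged;             // neither: leave the sender alone
  }
}
```

A sender that already completes on `my_scheduler` takes the first path even if the environment names a different scheduler.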

Why `__sexpr_apply`?

Using __sexpr_apply instead of direct structured binding provides:

Correct forwarding: the tag, data, and children are passed to the transform object with the value category of the sender expression
Extensibility for future sender expression formats
Consistent interface across different sender types
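The forwarding point can be seen in a small C++17 model of an apply function over a sender expression. `toy_sexpr`, `toy_apply`, `take_child` and `tracker` are hypothetical. The real `__sexpr_apply` is an internal stdexec utility whose exact signature is not reproduced here. Because the model forwards the tag, data and child with the value category of the whole expression, an rvalue expression lets the transform move the child, while an lvalue expression forces a copy.

```cpp
#include <cassert>
#include <utility>

// Toy model: hypothetical names, not stdexec API.
// Records whether it was copied or moved into place
struct tracker {
  bool moved = false;
  tracker() = default;
  tracker(const tracker&) : moved(false) {}
  tracker(tracker&&) noexcept : moved(true) {}
};

struct bulk_tag {};

template <class Data, class Child>
struct toy_sexpr {
  bulk_tag tag;
  Data data;
  Child child;
};

// Pass the parts to fn with the value category of the whole expression
template <class Sexpr, class Fn>
decltype(auto) toy_apply(Sexpr&& s, Fn&& fn) {
  return std::forward<Fn>(fn)(std::forward<Sexpr>(s).tag,
                              std::forward<Sexpr>(s).data,
                              std::forward<Sexpr>(s).child);
}

// Transform object that builds a new child by forwarding the old one
struct take_child {
  template <class Tag, class Data, class Child>
  tracker operator()(Tag, Data&&, Child&& child) const {
    return tracker(std::forward<Child>(child));
  }
};
```

Applying `take_child` to an lvalue expression copies the child; applying it to `std::move(expr)` moves it.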

Error Handling Philosophy

Compile-time errors prevent runtime failures:

Static assertions provide clear diagnostics
not_a_sender types turn unsupported configurations into compile-time errors with readable messages
Graceful degradation when optimization unavailable

Best Practices

Domain Design

Do:

Inherit from stdexec::default_domain
Use concepts for precise sender matching
Implement both transform overloads
Provide clear error messages

Don't:

Transform senders you don't optimize
Break sender semantic contracts
Ignore lifetime and forwarding rules

Scheduler Integration

Do:

Associate domain via get_domain() query
Support standard completion signatures
Maintain environment forwarding

Don't:

Assume specific sender structures
Couple tightly to implementation details

Conclusion

Domain customization in stdexec lets execution contexts substitute their own implementations of sender algorithms while preserving composability. Those implementations can exploit what the context knows about its hardware and runtime.

Key takeaways:

Zero-cost abstraction: The transformation is selected at compile time, with no runtime dispatch
Maintained composability: Optimizations are transparent to user code
In practice: Used by stdexec's parallel scheduler and HPX's thread pool scheduler

Scheduler authors can leverage domain customization to unlock the full performance potential of their execution contexts while maintaining the elegant composability that makes sender/receiver programming so powerful.

Special Thanks

My deepest thanks go to Hartmut Kaiser and Isidoros Tsaousis-Seiras for their mentorship and invaluable insights throughout this project.

References

[P2300R10] std::execution -- Structured Concurrency
[stdexec] NVIDIA stdexec - Reference implementation
[HPX] HPX - High Performance ParalleX runtime system
[System Context] Production domain customization example in stdexec
