
Domain Customization in stdexec

Design and Implementation Guide @ 2025

Sai Charan Arvapally
University of Alberta, Canada

Abstract

This paper is a guide to domain customization in stdexec, the mechanism by which scheduler authors supply optimized implementations of sender algorithms. A domain lets an execution context intercept and transform sender expressions before they are connected to receivers, so context-specific optimizations can be applied without giving up composability.


Introduction

The sender/receiver model in stdexec provides a composable framework for asynchronous computation. However, generic algorithm implementations may not leverage the full capabilities of specialized execution contexts. Domain customization bridges this gap by allowing execution contexts to provide optimized implementations of sender algorithms.


Motivation

Problem Statement

Consider a parallel execution context that manages a thread pool optimized for bulk operations. When users compose sender chains involving bulk work:

auto work = stdexec::schedule(parallel_sched)
  | stdexec::then(prepare_data)
  | stdexec::bulk(stdexec::par, 10000, [](std::size_t i, auto data) {
      process_item(i, data);
    });

The generic bulk algorithm may:

Current Limitations (Before)

Without domain customization, bulk operations use generic implementations:

// Generic bulk implementation (simplified)
template<typename Sender, typename Shape, typename Func>
auto bulk(Sender&& snd, Shape shape, Func func) {
  // No awareness of the underlying execution context:
  // no work-stealing and no context-specific scheduling.
  return generic_bulk_sender{
    std::forward<Sender>(snd), shape, std::move(func)
  };
}

Problems:

Domain Customization Approach (After)

With domain customization, execution contexts can provide optimized implementations:

// Custom domain intercepts bulk operations (pseudocode)
struct thread_pool_domain : stdexec::default_domain {
  template<bulk_sender Sender>
  auto transform_sender(Sender&& snd) const {
    // The replacement sender can use work-stealing, vectorization,
    // and NUMA awareness. extract_scheduler and extract_params are
    // placeholders for unpacking the original sender.
    return optimized_bulk_sender{
      extract_scheduler(snd),
      extract_params(snd)
    };
  }
};

Benefits:


Design Overview

Fundamentals

Domain customization in stdexec rests on four components:

[Figure: Scheduler -> get_domain() -> Domain -> transform_sender -> Custom Senders -> Optimized Implementation]
  1. Domain: Customization point that transforms senders
  2. Scheduler: Associates with domain via get_domain() query
  3. Transform Objects: Extract and repackage sender parameters
  4. Custom Senders: Provide optimized implementations
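
The relationship between the four components can be sketched without stdexec at all. The following is a minimal toy model in standard C++, not the stdexec API: every name in it (toy_components, bulk_sender, pool_bulk_sender, pool_domain, apply_domain, and the member-function form of get_domain) is hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <utility>

namespace toy_components {

// Generic bulk description, standing in for the default algorithm's sender.
template <class Fn>
struct bulk_sender {
  std::size_t shape;
  Fn fn;
  static constexpr bool optimized = false;
};
template <class Fn>
bulk_sender(std::size_t, Fn) -> bulk_sender<Fn>;

// 4. Custom sender: what a domain substitutes for the generic one.
template <class Fn>
struct pool_bulk_sender {
  std::size_t shape;
  Fn fn;
  static constexpr bool optimized = true;
};

// 1. Domains: the default one passes senders through unchanged.
struct default_domain {
  template <class Sender>
  Sender transform_sender(Sender snd) const { return snd; }
};

struct pool_domain {
  // 3. Transform: extract the parameters and repackage them.
  template <class Fn>
  pool_bulk_sender<Fn> transform_sender(bulk_sender<Fn> snd) const {
    return {snd.shape, std::move(snd.fn)};
  }
};

// 2. Schedulers: each one names its domain.
struct pool_scheduler {
  pool_domain get_domain() const noexcept { return {}; }
};
struct plain_scheduler {
  default_domain get_domain() const noexcept { return {}; }
};

// Glue: ask the scheduler for its domain, then let the domain transform.
template <class Scheduler, class Sender>
auto apply_domain(const Scheduler& sched, Sender snd) {
  return sched.get_domain().transform_sender(std::move(snd));
}

inline bool optimized_on_pool() {
  auto snd = apply_domain(pool_scheduler{},
                          bulk_sender{std::size_t{8}, [](std::size_t) {}});
  return snd.optimized;
}

inline bool optimized_on_plain() {
  auto snd = apply_domain(plain_scheduler{},
                          bulk_sender{std::size_t{8}, [](std::size_t) {}});
  return snd.optimized;
}

}  // namespace toy_components
```

In this sketch the caller writes the same expression for both schedulers; only the domain reported by the scheduler decides which sender type comes back.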

Transformation Patterns

Domain customization supports two environment patterns based on where schedulers live in the sender/receiver pipeline:

Pattern 1: Completion Scheduler (__completes_on)

The __completes_on<Sender, Scheduler> concept detects whether a sender has a specific scheduler as its completion scheduler. This happens when you call stdexec::schedule(sched) - that scheduler becomes "baked into" the sender chain for all subsequent operations.

auto work = stdexec::schedule(my_sched)      // Sets completion scheduler
  | stdexec::bulk(stdexec::par, n, func);    // Transformed by domain

Pattern 2: Receiver Environment (__starts_on)

The __starts_on<Sender, Scheduler, Env> concept detects whether a sender will start execution on a specific scheduler when run in a given environment. This happens when the scheduler is supplied through the receiver environment rather than through the sender chain, for example when the work is wrapped in stdexec::starts_on(sched, ...).

auto work = stdexec::just()
  | stdexec::bulk(stdexec::par, n, func);

// starts_on connects `work` with an environment whose get_scheduler is my_sched
stdexec::sync_wait(stdexec::starts_on(my_sched, std::move(work)));

Technical Specification

Understanding Scheduler Detection

Before implementing domain customization, it's crucial to understand how stdexec detects schedulers in the sender/receiver pipeline:

The __completes_on Concept

// my_sched is an instance of my_scheduler.
// This sender "completes on" my_sched
auto sender = stdexec::schedule(my_sched)  // <-- scheduler here
  | stdexec::then([](){ return 42; });

// Check: does sender complete on a my_scheduler?
static_assert(stdexec::__completes_on<decltype(sender), my_scheduler>);  // true

When you call stdexec::schedule(sched), that scheduler becomes the "completion scheduler" for the sender chain built on top of it. Subsequent adaptors that do not change the execution context (then, bulk_chunked, etc.) complete on that scheduler.

The __starts_on Concept

// This sender has NO completion scheduler
auto sender = stdexec::just(42)
  | stdexec::bulk_chunked(stdexec::par, 1000, [](int begin, int end, int val) {
      for (int i = begin; i < end; ++i) work(i, val);
    });

// But when it is started through starts_on...
stdexec::sync_wait(stdexec::starts_on(my_sched, sender));  // <-- scheduler in environment

// Check: does sender start on a my_scheduler in this environment?
// (some_env stands for the receiver environment type supplied by starts_on)
static_assert(stdexec::__starts_on<decltype(sender), my_scheduler, some_env>);  // true

When a scheduler reaches the sender through the "receiver environment", as starts_on arranges, that scheduler is used to start execution.

Domain Interface

struct domain_concept {
  // Transform sender without environment context
  template<typename Sender>
  auto transform_sender(Sender&& snd) const noexcept -> /*sender*/;
  
  // Transform sender with environment context  
  template<typename Sender, typename Env>
  auto transform_sender(Sender&& snd, const Env& env) const noexcept -> /*sender*/;
};

Scheduler Integration

Schedulers associate with domains via the get_domain query:

class my_scheduler {
public:
  auto query(stdexec::get_domain_t) const noexcept {
    return my_domain{};
  }
  
  // ... other scheduler interface
};

Implementation Guidance

Flowchart 1: Sender Interception and Transformation

User Writes: schedule(sched) | bulk_chunked(n,f)
Domain Sees: bulk_sender
Transform Object: Extract params & policy
Domain Creates: optimized_bulk_sender
User Gets: Same interface, optimized implementation

This flow is the core of domain customization: the user's code is unchanged, and the domain swaps in the optimized sender.

Flowchart 2: Core Components Architecture

[Figure: Scheduler -> Domain -> Transform Objects -> Custom Senders -> Scheduler]
• Scheduler exposes its Domain through the get_domain query
• Domain uses Transform Objects
• Transform Objects create Custom Senders
• Custom Senders execute on the Scheduler

The diagram shows how the four key components connect.

Step-by-Step Implementation

Step 1: Define Your Domain

// Constrain which senders to transform.
// Concepts cannot be declared at class scope, so this one lives
// at namespace scope next to the domain.
template<typename Sender>
concept transformable_sender =
  stdexec::sender_expr_for<Sender, stdexec::bulk_chunked_t> ||
  stdexec::sender_expr_for<Sender, stdexec::bulk_unchunked_t>;

struct my_domain : stdexec::default_domain {
  template<transformable_sender Sender>
  auto transform_sender(Sender&& snd) const noexcept;

  template<transformable_sender Sender, typename Env>
  auto transform_sender(Sender&& snd, const Env& env) const noexcept;
};

Step 2: Implement Transform Objects

struct transform_bulk {
  template<typename Data, typename Previous>
  auto operator()(stdexec::bulk_chunked_t, Data&& data, Previous&& prev) const {
    auto&& [policy, shape, func] = std::forward<Data>(data);

    // Extract execution policy
    using policy_t = std::remove_cvref_t<decltype(policy.__get())>;
    constexpr bool parallel = std::same_as<policy_t, stdexec::parallel_policy>;

    // The tag is bulk_chunked_t, so the sender is built in chunked mode
    return my_bulk_sender</*chunked=*/true, Previous, decltype(shape),
                         decltype(func), parallel>{
      scheduler_, std::forward<Previous>(prev), shape, std::move(func)
    };
  }

  my_scheduler scheduler_;
};

Step 3: Complete Domain Implementation

template <transformable_sender Sender>
auto my_domain::transform_sender(Sender&& snd) const noexcept {
  // Case 1: Scheduler in sender chain
  if constexpr (stdexec::__completes_on<Sender, my_scheduler>) {
    auto sched = stdexec::get_completion_scheduler<stdexec::set_value_t>(
      stdexec::get_env(snd));
    return stdexec::__sexpr_apply(
      std::forward<Sender>(snd), transform_bulk{sched});
  } else {
    // Always fails in this branch; it exists to produce a readable diagnostic
    static_assert(stdexec::__completes_on<Sender, my_scheduler>,
      "No my_scheduler found in sender's completion environment");
    return not_a_sender<stdexec::__name_of<Sender>>{};
  }
}

Error Handling Patterns

For unsupported configurations, provide clear diagnostics:

template<typename Sender>
struct not_a_sender {
  using sender_concept = stdexec::sender_t;
  
  // Note: static_assert(false) may not work in all compiler contexts
  // Use static_assert with dependent condition instead
  static_assert(std::is_void_v<Sender>, 
    "Sender cannot be optimized: no compatible scheduler found");
};
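
The dependent-condition idiom can be tried in isolation. The sketch below is a generic standard C++ example with hypothetical names (dependent_false, chunk_size_for, and the two scheduler types). It shows one common way to make the assertion fire only when the unsupported branch is actually instantiated.

```cpp
#include <type_traits>

namespace toy_diag {

// Always false, but dependent on T, so it is only evaluated
// when the enclosing template is instantiated.
template <class>
inline constexpr bool dependent_false = false;

struct my_scheduler {};
struct unknown_scheduler {};

template <class Scheduler>
constexpr int chunk_size_for() {
  if constexpr (std::is_same_v<Scheduler, my_scheduler>) {
    return 64;
  } else {
    static_assert(dependent_false<Scheduler>,
                  "chunk_size_for: no optimized path for this scheduler type");
    return 0;
  }
}

// The supported case compiles and is usable at compile time.
static_assert(chunk_size_for<my_scheduler>() == 64);

// Uncommenting the next line stops compilation with the message above:
// constexpr int bad = chunk_size_for<unknown_scheduler>();

}  // namespace toy_diag
```

The supported path compiles cleanly, while an unsupported scheduler type fails at the point of use with a message that names the problem.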

Important Notes:


Real-World Examples

System Context (stdexec)

The system context implementation demonstrates production-quality domain customization:

struct __parallel_scheduler_domain : stdexec::default_domain {
  template<__bulk_chunked_or_unchunked Sender>
  auto transform_sender(Sender&& __sndr) const noexcept {
    if constexpr (stdexec::__completes_on<Sender, parallel_scheduler>) {
      auto __sched = stdexec::get_completion_scheduler<stdexec::set_value_t>(
        stdexec::get_env(__sndr));
      return stdexec::__sexpr_apply(
        static_cast<Sender&&>(__sndr), 
        __transform_parallel_bulk_sender{__sched});
    } else {
      static_assert(stdexec::__completes_on<Sender, parallel_scheduler>,
        "No parallel_scheduler instance found");
      return __not_a_sender<stdexec::__name_of<Sender>>();
    }
  }
};

Key Features:

HPX Thread Pool Scheduler

HPX provides another excellent example with policy-aware optimization:

template<typename Policy>  
struct thread_pool_domain : stdexec::default_domain {
  template<__bulk_chunked_or_unchunked Sender>
  auto transform_sender(Sender&& sndr) const noexcept {
    auto&& [tag, data, child] = sndr;
    auto&& [pol, shape, f] = data;
    
    return hpx::execution::experimental::detail::
      thread_pool_bulk_sender<Policy, /*...*/>{
        HPX_MOVE(sched), child, shape, f
      };
  }
  
  thread_pool_scheduler<Policy> sched;
};

Advantages:


Performance Considerations

Compilation Impact

Domain customization operates entirely at compile-time:

Memory Efficiency

Domain customization enables:


Design Rationale

Why Two Transform Overloads?

The dual overload design handles two distinct composition patterns based on the scheduler detection concepts:

[Figure: Transformation Patterns. Pattern 1 (__completes_on): scheduler in sender chain, schedule(sched) | bulk_chunked(n,f). Pattern 2 (__starts_on): scheduler in environment, just(42) | bulk_chunked(n,f) started on sched through the receiver environment.]

// Pattern 1: Scheduler in sender chain (__completes_on)
auto work1 = stdexec::schedule(my_sched) | algorithm(...);

// Pattern 2: Scheduler in receiver environment (__starts_on)
auto work2 = algorithm(...);
stdexec::sync_wait(stdexec::starts_on(my_sched, work2));

Your domain must check both cases to catch all optimization opportunities:

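template <bulk_sender Sender>
auto transform_sender(Sender&& snd) const noexcept {
  // Case 1: Scheduler in sender chain
  if constexpr (stdexec::__completes_on<Sender, my_scheduler>) {
    auto sched = stdexec::get_completion_scheduler<stdexec::set_value_t>(
      stdexec::get_env(snd));
    return optimize_with_scheduler(std::forward<Sender>(snd), sched);
  } else {
    // Case 1 failed - no optimization possible
    static_assert(stdexec::__completes_on<Sender, my_scheduler>,
      "No my_scheduler found in sender's completion environment");
    return not_a_sender<stdexec::__name_of<Sender>>{};
  }
}

template <bulk_sender Sender, typename Env>
auto transform_sender(Sender&& snd, const Env& env) const noexcept {
  // Case 1: Still try completion scheduler first
  if constexpr (stdexec::__completes_on<Sender, my_scheduler>) {
    auto sched = stdexec::get_completion_scheduler<stdexec::set_value_t>(
      stdexec::get_env(snd));
    return optimize_with_scheduler(std::forward<Sender>(snd), sched);
  }
  // Case 2: Scheduler in receiver environment
  else if constexpr (stdexec::__starts_on<Sender, my_scheduler, Env>) {
    auto sched = stdexec::get_scheduler(env);
    return optimize_with_scheduler(std::forward<Sender>(snd), sched);
  } else {
    // Both cases failed - no optimization possible
    static_assert(stdexec::__starts_on<Sender, my_scheduler, Env>,
      "No my_scheduler found in sender chain or receiver environment");
    return not_a_sender<stdexec::__name_of<Sender>>{};
  }
}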

Key Insight:

Why __sexpr_apply?

Using __sexpr_apply instead of direct structured binding provides:

Error Handling Philosophy

Compile-time errors prevent runtime failures:


Best Practices

Domain Design

Do:

Don't:

Scheduler Integration

Do:

Don't:


Conclusion

Domain customization in stdexec provides a powerful mechanism for execution context optimization while preserving composability. The design enables substantial performance improvements through context-aware algorithm implementations.

Key takeaways:

Scheduler authors can leverage domain customization to unlock the full performance potential of their execution contexts while maintaining the elegant composability that makes sender/receiver programming so powerful.


Special Thanks

My deepest thanks go to Hartmut Kaiser and Isidoros Tsaousis-Seiras for their mentorship and invaluable insights throughout this project.


References

  1. [P2300R10] std::execution -- Structured Concurrency
  2. [stdexec] NVIDIA stdexec - Reference implementation
  3. [HPX] HPX - High Performance ParalleX runtime system
  4. [System Context] Production domain customization example in stdexec