Sai Charan Arvapally
University of Alberta, Canada
This paper provides a comprehensive guide to domain customization in stdexec,
enabling scheduler authors to provide optimized implementations of sender algorithms.
Domain customization allows execution contexts to intercept and transform sender
expressions before they are connected to receivers, enabling substantial performance
optimizations while maintaining composability.
The sender/receiver model in stdexec provides a composable framework
for asynchronous computation. However, generic algorithm implementations may not
leverage the full capabilities of specialized execution contexts. Domain customization
bridges this gap by allowing execution contexts to provide optimized implementations
of sender algorithms.
Consider a parallel execution context that manages a thread pool optimized for bulk operations. When users compose sender chains involving bulk work:
```cpp
auto work = stdexec::schedule(parallel_sched)
          | stdexec::then(prepare_data)
          | stdexec::bulk(10000, [](std::size_t i, auto data) {
              process_item(i, data);
            });
```
The generic bulk algorithm, however, knows nothing about the pool's capabilities. Without domain customization, bulk operations use generic implementations:
```cpp
// Generic bulk implementation
template<typename Sender, typename Shape, typename Func>
auto bulk(Sender&& snd, Shape shape, Func func) {
  return generic_bulk_sender{std::forward<Sender>(snd), shape, func};
  // Uses default thread spawning, no work-stealing
  // No awareness of underlying execution context
}
```
Problems: default thread spawning, no work-stealing, and no awareness of the underlying execution context.
With domain customization, execution contexts can provide optimized implementations:
```cpp
// Custom domain intercepts bulk operations
struct thread_pool_domain : stdexec::default_domain {
  template<bulk_sender Sender>
  auto transform_sender(Sender&& snd) const {
    return optimized_bulk_sender{extract_scheduler(snd), extract_params(snd)...};
    // Uses work-stealing, vectorization, NUMA awareness
  }
};
```
Benefits: work-stealing, vectorization, and NUMA awareness, because the execution context substitutes a sender implementation tuned to its own capabilities.
Domain customization in stdexec operates through a sophisticated interplay of four key components:
- **Scheduler**: exposes its domain via the `get_domain()` query
- **Domain**: intercepts sender expressions via `transform_sender`
- **Custom Senders**: the transformed, context-aware sender expressions
- **Optimized Implementation**: what ultimately executes
Domain customization supports two environment patterns based on where schedulers live in the sender/receiver pipeline:
### Pattern 1: Completion Scheduler (`__completes_on`)

The `__completes_on<Sender, Scheduler>` concept detects whether a sender has a specific scheduler as its completion scheduler. This happens when you call `stdexec::schedule(sched)`: that scheduler becomes "baked into" the sender chain for all subsequent operations.
```cpp
auto work = stdexec::schedule(my_sched)  // Sets completion scheduler
          | stdexec::bulk(n, func);      // Transformed by domain
```
### Pattern 2: Environment Scheduler (`__starts_on`)

The `__starts_on<Sender, Scheduler, Env>` concept detects whether a sender will start execution on a specific scheduler when run in a given environment. This happens when you pass a scheduler to `sync_wait()` or similar functions: the scheduler becomes part of the receiver environment.
```cpp
auto work = stdexec::just() | stdexec::bulk(n, func);
stdexec::sync_wait(std::move(work), my_sched);  // Environment provides scheduler
```
Before implementing domain customization, it's crucial to understand how stdexec detects schedulers in the sender/receiver pipeline:
### The `__completes_on` Concept

```cpp
// This sender "completes on" my_scheduler
auto sender = stdexec::schedule(my_scheduler)   // <-- scheduler here
            | stdexec::then([](){ return 42; });

// Check: does sender complete on my_scheduler?
static_assert(
  stdexec::__completes_on<decltype(sender), decltype(my_scheduler)>);  // true
```
When you call stdexec::schedule(sched), that scheduler becomes the "completion scheduler" for the entire sender chain. Any subsequent operations (then, bulk_chunked, etc.) will complete on that scheduler.
### The `__starts_on` Concept

```cpp
// This sender has NO completion scheduler
auto sender = stdexec::just(42)
            | stdexec::bulk_chunked(1000, [](int i, int val) { work(i, val); });

// But when we sync_wait with a scheduler...
stdexec::sync_wait(sender, my_scheduler);  // <-- scheduler in environment

// Check: does sender start on my_scheduler in this environment?
// (some_env is the receiver environment type supplied by sync_wait)
static_assert(
  stdexec::__starts_on<decltype(sender), decltype(my_scheduler), some_env>);  // true
```
When you pass a scheduler to sync_wait() or similar functions, that scheduler becomes part of the "receiver environment" and will be used to start execution.
```cpp
struct domain_concept {
  // Transform sender without environment context
  template<typename Sender>
  auto transform_sender(Sender&& snd) const noexcept -> /*sender*/;

  // Transform sender with environment context
  template<typename Sender, typename Env>
  auto transform_sender(Sender&& snd, const Env& env) const noexcept -> /*sender*/;
};
```
Schedulers associate with domains via the get_domain query:
```cpp
class my_scheduler {
public:
  auto query(stdexec::get_domain_t) const noexcept {
    return my_domain{};
  }
  // ... other scheduler interface
};
```
The core transformation: `schedule(sched) | bulk_chunked(n, f)` initially produces a generic `bulk_sender`, which the domain rewrites into an `optimized_bulk_sender`. This is the main magic of domain customization, and it is where the four key components above connect.
```cpp
// Constrain which senders to transform.
// (Concepts cannot be defined at class scope, so define this alongside the domain.)
template<typename Sender>
concept transformable_sender =
     stdexec::sender_expr_for<Sender, stdexec::bulk_chunked_t>
  || stdexec::sender_expr_for<Sender, stdexec::bulk_unchunked_t>;

struct my_domain : stdexec::default_domain {
  template<transformable_sender Sender>
  auto transform_sender(Sender&& snd) const noexcept;

  template<transformable_sender Sender, typename Env>
  auto transform_sender(Sender&& snd, const Env& env) const noexcept;
};
```
```cpp
struct transform_bulk {
  template<typename Data, typename Previous>
  auto operator()(stdexec::bulk_chunked_t, Data&& data, Previous&& prev) const {
    auto&& [policy, shape, func] = std::forward<Data>(data);

    // Extract execution policy
    using policy_t = std::remove_cvref_t<decltype(policy.__get())>;
    constexpr bool parallel = std::same_as<policy_t, stdexec::parallel_policy>;

    // The tag is bulk_chunked_t, so build the chunked variant
    return my_bulk_sender</*chunked=*/true, Previous, decltype(shape),
                          decltype(func), parallel>{
      scheduler_, std::forward<Previous>(prev), shape, std::move(func)};
  }

  my_scheduler scheduler_;
};
```
```cpp
template <transformable_sender Sender>
auto my_domain::transform_sender(Sender&& snd) const noexcept {
  // Case 1: Scheduler in sender chain
  if constexpr (stdexec::__completes_on<Sender, my_scheduler>) {
    auto sched = stdexec::get_completion_scheduler<stdexec::set_value_t>(
      stdexec::get_env(snd));
    return stdexec::__sexpr_apply(std::forward<Sender>(snd),
                                  transform_bulk{sched});
  } else {
    static_assert(stdexec::__completes_on<Sender, my_scheduler>,
      "No my_scheduler found in sender's completion environment");
    return not_a_sender<stdexec::__name_of<Sender>>{};
  }
}
```
For unsupported configurations, provide clear diagnostics:
```cpp
template<typename Sender>
struct not_a_sender {
  using sender_concept = stdexec::sender_t;

  // Note: static_assert(false) may not work in all compiler contexts.
  // Use static_assert with a dependent condition instead.
  static_assert(std::is_void_v<Sender>,
    "Sender cannot be optimized: no compatible scheduler found");
};
```
Important Notes:
- `static_assert(false)` may not work consistently across all compilers. Using a dependent condition like `std::is_void_v<Sender>` ensures the assertion is template-dependent and properly delayed.
- The `not_a_sender` approach works well when you want hard compilation errors. For SFINAE-friendly detection, where you want overloads to be quietly removed from consideration, use concepts or `std::enable_if` instead.
- Consider `requires` clauses on the transform functions rather than error types for cleaner diagnostics.

The system context implementation demonstrates production-quality domain customization:
```cpp
struct __parallel_scheduler_domain : stdexec::default_domain {
  template<__bulk_chunked_or_unchunked Sender>
  auto transform_sender(Sender&& __sndr) const noexcept {
    if constexpr (stdexec::__completes_on<Sender, parallel_scheduler>) {
      auto __sched = stdexec::get_completion_scheduler<stdexec::set_value_t>(
        stdexec::get_env(__sndr));
      return stdexec::__sexpr_apply(
        static_cast<Sender&&>(__sndr),
        __transform_parallel_bulk_sender{__sched});
    } else {
      static_assert(stdexec::__completes_on<Sender, parallel_scheduler>,
        "No parallel_scheduler instance found");
      return __not_a_sender<stdexec::__name_of<Sender>>();
    }
  }
};
```
Key features: the domain detects a `parallel_scheduler` completion scheduler via `__completes_on`, applies the transformation through `__sexpr_apply`, and produces a clear diagnostic (`__not_a_sender`) when no compatible scheduler is found.
HPX provides another excellent example with policy-aware optimization:
```cpp
template<typename Policy>
struct thread_pool_domain : stdexec::default_domain {
  template<__bulk_chunked_or_unchunked Sender>
  auto transform_sender(Sender&& sndr) const noexcept {
    auto&& [tag, data, child] = sndr;
    auto&& [pol, shape, f] = data;
    return hpx::execution::experimental::detail::
      thread_pool_bulk_sender<Policy, /*...*/>{
        HPX_MOVE(sched), child, shape, f};
  }

  thread_pool_scheduler<Policy> sched;
};
```
Advantages: the domain is templated on the execution `Policy`, so the transformed `thread_pool_bulk_sender` can be specialized per policy at compile time.
Domain customization operates entirely at compile time: sender transformations are resolved during template instantiation, so the optimized path adds no runtime dispatch overhead.
Domain customization enables substantial performance improvements through context-aware algorithm implementations, without sacrificing the composability of the sender/receiver model.
The dual overload design handles two distinct composition patterns based on the scheduler detection concepts:
- `__completes_on`: `schedule(sched) | bulk_chunked(n, f)` — the scheduler sits in the sender chain
- `__starts_on`: `just(42) | bulk_chunked(n, f)` run via `sync_wait(work, sched)` — the scheduler sits in the environment
Your domain must check both patterns to catch all optimization opportunities.
```cpp
// Pattern 1: Scheduler in sender chain (__completes_on)
auto work1 = stdexec::schedule(my_sched) | algorithm(...);

// Pattern 2: Scheduler in execution context (__starts_on)
auto work2 = algorithm(...);
stdexec::sync_wait(work2, my_sched);
```
The dual overloads of `transform_sender` handle both cases:
```cpp
template <bulk_sender Sender>
auto transform_sender(Sender&& snd) const noexcept {
  // Case 1: Scheduler in sender chain
  if constexpr (stdexec::__completes_on<Sender, my_scheduler>) {
    auto sched = stdexec::get_completion_scheduler<stdexec::set_value_t>(
      stdexec::get_env(snd));
    return optimize_with_scheduler(std::forward<Sender>(snd), sched);
  } else {
    // Case 1 failed - no optimization possible; forward unchanged
    return std::forward<Sender>(snd);
  }
}

template <bulk_sender Sender, typename Env>
auto transform_sender(Sender&& snd, const Env& env) const noexcept {
  // Case 1: Still try completion scheduler first
  if constexpr (stdexec::__completes_on<Sender, my_scheduler>) {
    auto sched = stdexec::get_completion_scheduler<stdexec::set_value_t>(
      stdexec::get_env(snd));
    return optimize_with_scheduler(std::forward<Sender>(snd), sched);
  }
  // Case 2: Scheduler in receiver environment
  else if constexpr (stdexec::__starts_on<Sender, my_scheduler, Env>) {
    auto sched = stdexec::get_scheduler(env);
    return optimize_with_scheduler(std::forward<Sender>(snd), sched);
  } else {
    // Both cases failed - no optimization possible; forward unchanged
    return std::forward<Sender>(snd);
  }
}
```
Key Insight:
- `__completes_on`: the scheduler is "baked into" the sender chain
- `__starts_on`: the scheduler comes from the execution environment

### Why `__sexpr_apply`?

Using `__sexpr_apply` instead of direct structured binding provides uniform, value-category-preserving access to a sender expression's tag, data, and children.
Compile-time errors prevent runtime failures:
- `not_a_sender` types enable SFINAE-friendly detection

**Do:** inherit from `stdexec::default_domain`, so senders you do not transform fall back to the default behavior.

**Don't:** transform senders your domain does not recognize; forward them unchanged.
**Do:** expose your domain from the scheduler via the `get_domain()` query.

**Don't:** rely on generic algorithm implementations when your execution context can do better.
Domain customization in stdexec provides a powerful mechanism for execution
context optimization while preserving composability. The design enables substantial
performance improvements through context-aware algorithm implementations.
Key takeaways:

- Domains let execution contexts intercept and transform sender expressions entirely at compile time
- Check both the `__completes_on` and `__starts_on` patterns to catch every optimization opportunity
- Production-quality examples already exist in stdexec and HPX

Scheduler authors can leverage domain customization to unlock the full performance potential of their execution contexts while maintaining the elegant composability that makes sender/receiver programming so powerful.
My deepest thanks go to Hartmut Kaiser and Isidoros Tsaousis-Seiras for their mentorship and invaluable insights throughout this project.
std::execution -- Structured Concurrency