All Reduce Synchronizer

AllReduce Synchronizer.

class AllReduceSynchronizer(config: autodist.proto.synchronizers_pb2.AllReduceSynchronizer)[source]

Bases: autodist.kernel.synchronization.synchronizer.Synchronizer

AllReduce Synchronizer.

This AllReduce Synchronizer currently uses TensorFlow’s collective_device_ops to insert AllReduce ops into the graph.

The AllReduceSynchronizer class supports the following instantiations (a configuration sketch follows these lists):

  1. spec=`auto`: single-node multiple devices, or cross-node AllReduce based on collective ops

  2. spec=`nccl`: single-node multiple devices, or cross-node AllReduce based on NCCL

  3. spec=`ring`/`tree`: AllReduce with different reduction structures (ring, tree, etc.)

However, note that it does not support the following instantiations:

  1. shuffle reduce (reduce to CPU or GPU as in PS) + AllReduce across nodes

  2. any other types of hybrid reduction of PS and AllReduce.
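For illustration, a minimal sketch of setting the spec and constructing the synchronizer; the nested Spec enum, its value names ('AUTO', 'NCCL', 'RING'), and the module path of AllReduceSynchronizer are assumptions inferred from the options above, not confirmed API:

    # Sketch only: the enum/value names and the import path below are assumptions
    # inferred from the spec options listed above; check synchronizers.proto and
    # the package layout for the exact names.
    from autodist.proto import synchronizers_pb2
    from autodist.kernel.synchronization.all_reduce_synchronizer import AllReduceSynchronizer

    config = synchronizers_pb2.AllReduceSynchronizer()
    config.spec = synchronizers_pb2.AllReduceSynchronizer.Spec.Value('RING')  # or 'AUTO' / 'NCCL'

    synchronizer = AllReduceSynchronizer(config)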

in_graph_apply(graph_item, var_name)[source]

Perform in-graph synchronization based on AllReduce and TensorFlow Collective Ops.

Note that collective ops currently support only dense tensors.

Parameters
  • graph_item (graph_item.GraphItem) – the graph_item to be distributed

  • var_name (str) – the corresponding variable name

Returns

The new graph

Return type

graph_item.GraphItem
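A hedged sketch of invoking this method during graph transformation; the graph_item and variable name below are placeholders, not values defined by the library:

    # Sketch only: assumes `synchronizer` was constructed as in the earlier sketch
    # and that `graph_item` is an existing graph_item.GraphItem containing a dense
    # trainable variable named 'dense/kernel:0' (a placeholder name).
    new_graph_item = synchronizer.in_graph_apply(graph_item, 'dense/kernel:0')
    # The returned GraphItem carries the inserted collective AllReduce ops;
    # remember that collective ops currently support only dense tensors.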

assign_cluster_information(num_workers, num_replicas, worker_device, worker_id, canonical_replica_devices, is_chief=False)[source]

Store cluster information in the synchronizer.
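For example, a sketch of supplying cluster information for a hypothetical two-worker, four-replica setup; all device strings below are placeholders:

    # Placeholder cluster description: 2 workers, 2 GPUs each (4 replicas total).
    synchronizer.assign_cluster_information(
        num_workers=2,
        num_replicas=4,
        worker_device='/job:worker/task:0',
        worker_id=0,
        canonical_replica_devices=[
            '/job:worker/task:0/device:GPU:0',
            '/job:worker/task:0/device:GPU:1',
            '/job:worker/task:1/device:GPU:0',
            '/job:worker/task:1/device:GPU:1',
        ],
        is_chief=True,
    )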

between_graph_apply(graph_item, var_name)[source]

The AllReduce synchronizer does nothing during between-graph synchronization.

classmethod create(name, *args, **kwargs)[source]

Create a new Synchronizer instance given a subclass name.

Parameters
  • name – Name of the Synchronizer subclass (e.g. PSSynchronizer).

  • *args – Any args for the subclass constructor.

  • **kwargs – Any kwargs for the subclass constructor.

Returns

Synchronizer
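For example, a sketch of constructing a synchronizer through this factory by subclass name; the config object is assumed to be built as in the earlier sketch:

    # Sketch only: 'AllReduceSynchronizer' names the subclass to instantiate,
    # and `config` (built as in the earlier sketch) is forwarded to its constructor.
    from autodist.kernel.synchronization.synchronizer import Synchronizer

    sync = Synchronizer.create('AllReduceSynchronizer', config)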