Strategy as ProtoBuf Message

AutoDist uses Protocol Buffers to standardize how strategies and their configurations are represented.


autodist/proto/graphitem.proto

AutoDist graph item messages.

Represents a TensorFlow computational graph, together with the metadata AutoDist needs to distribute it.

GraphItem

Represents a TensorFlow graph together with the gradient-to-variable mapping and the MetaGraph information that the AutoDist backend needs to transform it.

Field Type Description
graph_def google.protobuf.Any TensorFlow graph_def
grad_target_pairs GraphItem.GradTargetPairsEntry Mapping from gradient tensor name to target variable name (a map<string, string>)
info GraphItem.Info Essential subset of the TensorFlow MetaGraph

GraphItem.GradTargetPairsEntry

Field Type Description
key string gradient tensor name
value string target variable name
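
In proto3, a map<string, string> field is carried on the wire as a repeated message with key and value fields, which is why GradTargetPairsEntry appears as its own type. The sketch below uses plain Python stand-ins rather than the generated AutoDist classes. The class, the helper functions, and the tensor and variable names are all hypothetical. It converts between the entry list and a dictionary, following protobuf's rule that the last entry wins when a key is repeated.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GradTargetPairsEntry:
    """Hypothetical stand-in for the generated map-entry message."""
    key: str    # gradient tensor name
    value: str  # target variable name


def entries_to_dict(entries: List[GradTargetPairsEntry]) -> Dict[str, str]:
    # Protobuf map semantics: when a key repeats, the last entry seen wins.
    result: Dict[str, str] = {}
    for entry in entries:
        result[entry.key] = entry.value
    return result


def dict_to_entries(pairs: Dict[str, str]) -> List[GradTargetPairsEntry]:
    # Map ordering is not guaranteed on the wire, so sort keys for determinism.
    return [GradTargetPairsEntry(k, v) for k, v in sorted(pairs.items())]


# Hypothetical gradient-tensor -> variable names, for illustration only.
pairs = {
    "gradients/MatMul_grad/MatMul_1:0": "dense/kernel:0",
    "gradients/BiasAdd_grad/BiasAddGrad:0": "dense/bias:0",
}
print(entries_to_dict(dict_to_entries(pairs)) == pairs)  # True
```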

GraphItem.Info

Represents the essential, transformed subset of the TensorFlow MetaGraph.

Currently, it holds the subset of MetaGraph collections that AutoDist requires. In the future, it will be generalized to captures.

Field Type Description
variables google.protobuf.Any
table_initializers string
savers google.protobuf.Any


autodist/proto/strategy.proto

AutoDist distributed strategy messages.

Represents how to distribute a TensorFlow computational graph.

Strategy

Represents the strategy the AutoDist backend will implement.

Field Type Description
id string unique strategy identifier
path string optional temporary path of the serialized strategy message
node_config Strategy.Node configurations of individual nodes in the computational graph
graph_config Strategy.GraphConfig configuration of the computational graph as a whole

Strategy.GraphConfig

Represents the configuration of the graph as a whole.

Based on the list of replicas, the AutoDist backend combines in-graph and between-graph distribution.

Field Type Description
replicas string list of batch-splitting/data-parallel replicas; the number of replicas is the length of this list
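
Because replicas is a list, the replica count is simply its length. The sketch below is a hypothetical illustration of batch splitting across data-parallel replicas. The device-string format and the policy of giving leftover samples to the first replicas are assumptions, not AutoDist's documented behavior.

```python
from typing import Dict, List


def split_batch(global_batch_size: int, replicas: List[str]) -> Dict[str, int]:
    """Hypothetical helper: divide a global batch evenly among replicas.

    Assumed policy: the first `remainder` replicas each take one extra sample.
    """
    num_replicas = len(replicas)  # replica count = length of the replicas list
    if num_replicas == 0:
        raise ValueError("at least one replica is required")
    base, remainder = divmod(global_batch_size, num_replicas)
    return {
        replica: base + (1 if i < remainder else 0)
        for i, replica in enumerate(replicas)
    }


# Hypothetical device strings; the real naming scheme may differ.
replicas = ["10.0.0.1:GPU:0", "10.0.0.1:GPU:1", "10.0.0.2:GPU:0"]
print(split_batch(256, replicas))
```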

Strategy.Node

Represents the configuration of an individual node in the graph.

Currently, these nodes are just the variables in the graph, so the information they contain is mainly how to synchronize each variable’s gradients and, optionally, how to partition the variable.

In the future, for node partitioning, these could be any node in the graph. In that case, they would also have more logic for partitioning the op.

Field Type Description
var_name string variable name
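PSSynchronizer PSSynchronizer One of the two synchronizers to choose (mutually exclusive with AllReduceSynchronizer)
AllReduceSynchronizer AllReduceSynchronizer One of the two synchronizers to choose (mutually exclusive with PSSynchronizer)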
partitioner string Optional partitioner configuration giving the number of partitions along each axis of the variable, e.g. 1, 2, 1
part_config Strategy.Node Optional node configs for each node partition (if partitioned)
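
A node carries at most one of PSSynchronizer and AllReduceSynchronizer, and it can optionally be partitioned. The sketch below is a hypothetical plain-Python mirror of Strategy.Node, not the generated protobuf class. A real protobuf oneof silently clears the previously set member. This mirror raises an error instead, so the constraint is easy to see. It also assumes that a partitioner string such as "1, 2, 1" lists shard counts per axis, so the total number of partitions is their product.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Optional


@dataclass
class PSSynchronizer:
    # Hypothetical mirror; defaults follow proto3 zero values.
    reduction_destination: str = ""
    local_replication: bool = False
    sync: bool = False
    staleness: int = 0


@dataclass
class AllReduceSynchronizer:
    # Hypothetical mirror; spec and compressor omitted for brevity.
    group: int = 0


@dataclass
class Node:
    """Hypothetical plain-Python mirror of Strategy.Node."""

    var_name: str
    ps_synchronizer: Optional[PSSynchronizer] = None
    all_reduce_synchronizer: Optional[AllReduceSynchronizer] = None
    partitioner: str = ""
    part_config: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The synchronizers are alternatives: reject a node that sets both.
        if self.ps_synchronizer is not None and self.all_reduce_synchronizer is not None:
            raise ValueError(f"{self.var_name}: set at most one synchronizer")

    def shards_per_axis(self) -> List[int]:
        # Assumption: "1, 2, 1" gives the number of shards along each variable axis.
        if not self.partitioner:
            return []
        return [int(part) for part in self.partitioner.split(",")]

    def num_partitions(self) -> int:
        shards = self.shards_per_axis()
        return prod(shards) if shards else 1


node = Node(
    var_name="example/var",  # hypothetical variable name
    ps_synchronizer=PSSynchronizer(reduction_destination="10.0.0.1:CPU:0", sync=True),
    partitioner="1, 2, 1",
)
print(node.shards_per_axis(), node.num_partitions())
```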


autodist/proto/synchronizers.proto

AutoDist synchronization messages.

AllReduceSynchronizer

Synchronization using AllReduce.

Field Type Description
spec AllReduceSynchronizer.Spec Specification for collective communication
compressor AllReduceSynchronizer.Compressor One of the compressors to choose
group int32 The all-reduce group this variable is merged into. The group index should be less than the number of variables.
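
The group field lets several variables share one all-reduce. The sketch below is a hypothetical helper that buckets variable names by group index and enforces the documented constraint that the index is less than the number of variables. It assumes two further things: variables with the same index are fused into one collective, and indices must be non-negative.

```python
from collections import defaultdict
from typing import Dict, List


def bucket_by_group(var_groups: Dict[str, int]) -> Dict[int, List[str]]:
    """Hypothetical helper: collect variables that share an all-reduce group."""
    num_vars = len(var_groups)
    buckets: Dict[int, List[str]] = defaultdict(list)
    for var_name, group in var_groups.items():
        # Documented: group < number of variables. Assumed: group >= 0.
        if not 0 <= group < num_vars:
            raise ValueError(f"{var_name}: group {group} not in [0, {num_vars})")
        buckets[group].append(var_name)
    return dict(buckets)


groups = {"dense/kernel": 0, "dense/bias": 0, "output/kernel": 1, "output/bias": 1}
print(bucket_by_group(groups))
```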

PSSynchronizer

Synchronization using a Parameter Server.

Field Type Description
reduction_destination string Parameter server to use as the destination for gradient reduction
local_replication bool Whether to create local proxies of each PS variable
sync bool Whether to synchronize gradients across between-graph replicas
staleness int32 Staleness bound, i.e. how many steps a replica may run ahead of the others
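
One common way to read a staleness bound is the stale-synchronous rule. Under that rule, a replica may start its next step only if it is at most `staleness` steps ahead of the slowest replica, and a bound of 0 behaves like fully synchronous training. The helper below is a hypothetical sketch of that assumed rule, not AutoDist's implementation.

```python
from typing import Dict


def may_proceed(replica: str, steps: Dict[str, int], staleness: int) -> bool:
    """Assumed bounded-staleness rule.

    `steps` maps each replica to the number of steps it has completed.
    """
    slowest = min(steps.values())
    return steps[replica] - slowest <= staleness


steps = {"worker0": 5, "worker1": 3, "worker2": 4}
print(may_proceed("worker0", steps, staleness=2))  # 5 - 3 = 2, within the bound
print(may_proceed("worker0", steps, staleness=1))  # must wait for worker1
```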

AllReduceSynchronizer.Compressor

Which gradient compression method to use

Name Number Description
NoneCompressor 0 No compression
HorovodCompressor 1 Horovod's compression
HorovodCompressorEF 2 Horovod's compression with error feedback
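
Error feedback keeps the part of each gradient that the lossy compressor dropped and adds it back before the next compression, so the rounding error does not accumulate over time. Horovod ships a half-precision compressor (Compression.fp16). This sketch assumes that kind of compression and emulates it with the standard struct "e" (IEEE 754 half) format. The class and function names are hypothetical.

```python
import struct
from typing import List


def to_fp16(values: List[float]) -> bytes:
    # Lossy step: pack as IEEE 754 half precision (raises OverflowError above ~65504).
    return struct.pack(f"<{len(values)}e", *values)


def from_fp16(data: bytes) -> List[float]:
    return list(struct.unpack(f"<{len(data) // 2}e", data))


class ErrorFeedbackFP16:
    """Hypothetical fp16 gradient compressor with error feedback."""

    def __init__(self, size: int) -> None:
        self.residual = [0.0] * size  # error dropped by previous compressions

    def compress(self, grad: List[float]) -> bytes:
        corrected = [g + r for g, r in zip(grad, self.residual)]
        packed = to_fp16(corrected)
        # Remember what the cast lost so it is re-added on the next step.
        self.residual = [c - d for c, d in zip(corrected, from_fp16(packed))]
        return packed


ef = ErrorFeedbackFP16(size=1)
total = sum(from_fp16(ef.compress([1.0001]))[0] for _ in range(100))
print(total)  # close to 100.01; plain fp16 casting would give exactly 100.0
```

Without the residual, 1.0001 rounds to 1.0 in fp16 at every step, so the small update would be lost entirely.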

AllReduceSynchronizer.Spec

Which communication method to use

Name Number Description
AUTO 0 Runtime's automatic choices
NCCL 1 Use ncclAllReduce for all-reduce, and ring algorithms for all-gather
RING 2 TensorFlow's ring algorithms for all-reduce and all-gather
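
For reference, the classic ring all-reduce runs a reduce-scatter phase and then an all-gather phase. In each phase, every worker sends one chunk per step to its ring neighbour, for n - 1 steps. The single-process simulation below illustrates that generic algorithm. It is not TensorFlow's implementation, and the function names are hypothetical.

```python
from typing import List


def chunk_bounds(length: int, n: int) -> List[range]:
    # Split indices 0..length-1 into n nearly equal contiguous chunks.
    base, extra = divmod(length, n)
    bounds, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        bounds.append(range(start, start + size))
        start += size
    return bounds


def ring_all_reduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over a ring of len(buffers) workers."""
    n = len(buffers)
    data = [list(b) for b in buffers]  # each worker's local copy
    chunks = chunk_bounds(len(data[0]), n)

    # Reduce-scatter: after n - 1 steps, worker i holds the full sum of chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [((i + 1) % n, (i - step) % n) for i in range(n)]
        payloads = [[data[i][k] for k in chunks[c]] for i, (_, c) in enumerate(sends)]
        for (dst, c), payload in zip(sends, payloads):
            for k, v in zip(chunks[c], payload):
                data[dst][k] += v

    # All-gather: pass the fully reduced chunks around the ring.
    for step in range(n - 1):
        sends = [((i + 1) % n, (i + 1 - step) % n) for i in range(n)]
        payloads = [[data[i][k] for k in chunks[c]] for i, (_, c) in enumerate(sends)]
        for (dst, c), payload in zip(sends, payloads):
            for k, v in zip(chunks[c], payload):
                data[dst][k] = v
    return data


workers = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
print(ring_all_reduce(workers))  # every worker ends with [111.0, 222.0, 333.0]
```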

Scalar Value Types

.proto Type Notes C++ Python
double double float
float float float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 int/long
uint32 Uses variable-length encoding. uint32 int/long
uint64 Uses variable-length encoding. uint64 int/long
sint32 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. int32 int
sint64 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. int64 int/long
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 2^28. uint32 int
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 2^56. uint64 int/long
sfixed32 Always four bytes. int32 int
sfixed64 Always eight bytes. int64 int/long
bool bool boolean
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string str/unicode
bytes May contain any arbitrary sequence of bytes. string str (Python 2) / bytes (Python 3)
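
The notes above on variable-length encoding come from protobuf's base-128 varints. Each byte carries 7 payload bits, and the high bit marks that more bytes follow. A negative int32 is sign-extended to 64 bits, so it always takes 10 bytes. sint32 first applies ZigZag encoding, which maps values of small magnitude to small unsigned numbers. The standalone sketch below shows both. The function names are illustrative.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    if value < 0:
        raise ValueError("encode_varint expects a non-negative integer")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    # Negative int32 values are sign-extended to 64 bits, so they take 10 bytes.
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value: int) -> int:
    # Maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_sint32(value: int) -> bytes:
    return encode_varint(zigzag32(value))


print(encode_varint(300).hex())  # ac02
print(len(encode_int32(-1)))     # 10
print(encode_sint32(-1).hex())   # 01
```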