Strategy as ProtoBuf Message¶
AutoDist uses Protocol Buffers to standardize the representation of a distribution strategy and its configuration.
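For illustration, the sketch below round-trips a Strategy message through its binary wire format. The import path `autodist.proto.strategy_pb2` is an assumption of this example (it is where protoc-generated Python bindings would typically land), not part of the specification that follows.

```python
# Minimal sketch: serialize and re-parse a Strategy message.
# The import path below is an assumption of this example; the bindings are
# generated from strategy.proto by protoc and may live elsewhere in your build.
from autodist.proto import strategy_pb2

strategy = strategy_pb2.Strategy()
strategy.id = "20200101T000000"            # hypothetical unique identifier
wire_bytes = strategy.SerializeToString()  # compact binary wire format

restored = strategy_pb2.Strategy()
restored.ParseFromString(wire_bytes)
assert restored.id == strategy.id
```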
autodist/proto/graphitem.proto¶
AutoDist graph-item messages.
Represents the TensorFlow computational graph to be distributed, along with its essential metadata.
GraphItem¶
Represents the computational graph (and associated metadata) that the AutoDist backend transforms and distributes.
Field | Type | Description |
---|---|---|
graph_def | google.protobuf.Any | TensorFlow graph_def |
grad_target_pairs | GraphItem.GradTargetPairsEntry | Mapping from grad tensor name to variable name |
info | GraphItem.Info | essential graph metadata (see GraphItem.Info below) |
GraphItem.Info¶
Represents the essential, transformed subset of a TensorFlow MetaGraph.
Right now, it captures an essential AutoDist-specific subset of the MetaGraph collections. In the future, it will generalize to captures.
Field | Type | Description |
---|---|---|
variables | google.protobuf.Any | |
table_initializers | string | |
savers | google.protobuf.Any | |
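As a rough illustration of how these fields fit together, the sketch below packs a TensorFlow GraphDef and a VariableDef into the google.protobuf.Any fields of a GraphItem. The import path `autodist.proto.graphitem_pb2`, the tensor/op names, and the treatment of variables, table_initializers, and savers as repeated fields are all assumptions of this example.

```python
# Minimal sketch of populating a GraphItem (assumed bindings and field labels).
import tensorflow as tf
from autodist.proto import graphitem_pb2

graph = tf.Graph()
with graph.as_default():
    w = tf.compat.v1.get_variable("w", shape=[2, 2])

item = graphitem_pb2.GraphItem()
item.graph_def.Pack(graph.as_graph_def())           # GraphDef wrapped in an Any
item.grad_target_pairs["gradients/w_grad:0"] = "w"  # hypothetical grad tensor -> variable name

# GraphItem.Info -- this sketch assumes the three fields are repeated.
item.info.variables.add().Pack(w.to_proto())             # VariableDef wrapped in an Any
item.info.table_initializers.append("init_all_tables")   # hypothetical initializer op name
```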
autodist/proto/strategy.proto¶
AutoDist distributed strategy messages.
Represents how to distribute a TensorFlow computational graph.
Strategy¶
Represents the strategy the AutoDist backend will implement.
Field | Type | Description |
---|---|---|
id | string | unique strategy identifier |
path | string | optional temporary path where the serialized strategy message is stored |
node_config | Strategy.Node | configurations of individual nodes of the computational graph |
graph_config | Strategy.GraphConfig | configuration of the computational graph as a whole |
Strategy.GraphConfig¶
Represents the configuration of the graph as a whole.
Based on the list of replicas, the AutoDist backend does a combination of in-graph and between-graph distribution.
Field | Type | Description |
---|---|---|
replicas | string | the number of batch-splitting/data-parallel replicas |
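A minimal sketch of filling in the top-level Strategy fields together with its GraphConfig is given below. It assumes the bindings are importable as `autodist.proto.strategy_pb2` and that replicas is a repeated string field with one entry per data-parallel replica; the identifier, path, and device strings are hypothetical.

```python
# Minimal sketch of the top-level Strategy fields plus GraphConfig
# (assumed bindings; `replicas` is assumed to be a repeated string field).
from autodist.proto import strategy_pb2

strategy = strategy_pb2.Strategy()
strategy.id = "example-strategy-001"         # hypothetical unique identifier
strategy.path = "/tmp/example-strategy-001"  # hypothetical temp path for the serialized message

# One entry per batch-splitting / data-parallel replica (hypothetical device strings).
strategy.graph_config.replicas.extend([
    "192.168.0.1:GPU:0",
    "192.168.0.1:GPU:1",
])

with open(strategy.path, "wb") as f:         # persist to the temp path
    f.write(strategy.SerializeToString())
```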
Strategy.Node¶
Represents the configuration of an individual node in the graph.
Right now, these nodes are just variables in the graph, so the only information they contain is how to synchronize the variable’s gradients.
In the future, for node partitioning, these could be any node in the graph. In that case, they would also have more logic for partitioning the op.
Field | Type | Description |
---|---|---|
var_name | string | variable name |
PSSynchronizer | PSSynchronizer | one of the synchronizers to choose (mutually exclusive with AllReduceSynchronizer) |
AllReduceSynchronizer | AllReduceSynchronizer | one of the synchronizers to choose (mutually exclusive with PSSynchronizer) |
partitioner | string | optional partitioner configuration, e.g. `1,2,1` |
part_config | Strategy.Node | Optional node configs for each node partition (if partitioned) |
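The per-node configuration could be built as in the sketch below, assuming node_config and part_config are repeated fields and that the two synchronizer fields form a oneof (so populating one of them selects that synchronizer); the variable names and parameter-server devices are hypothetical.

```python
# Minimal sketch of per-node (per-variable) configuration (assumed bindings;
# node_config/part_config are assumed repeated, and the two synchronizer
# fields are assumed to form a oneof, so setting one selects it).
from autodist.proto import strategy_pb2

strategy = strategy_pb2.Strategy()

# An unpartitioned variable synchronized through a parameter server.
node = strategy.node_config.add()
node.var_name = "dense/kernel:0"                                 # hypothetical variable name
node.PSSynchronizer.reduction_destination = "192.168.0.1:CPU:0"  # hypothetical PS device
node.PSSynchronizer.sync = True

# A partitioned variable: a partitioner string plus one config per partition.
pnode = strategy.node_config.add()
pnode.var_name = "embedding/weights:0"   # hypothetical variable name
pnode.partitioner = "1,2,1"              # partitioner configuration string (see table above)
for ps_device in ("192.168.0.1:CPU:0", "192.168.0.2:CPU:0"):     # hypothetical PS devices
    part = pnode.part_config.add()
    part.PSSynchronizer.reduction_destination = ps_device
```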
autodist/proto/synchronizers.proto¶
AutoDist synchronization messages.
AllReduceSynchronizer¶
Synchronization using AllReduce.
Field | Type | Description |
---|---|---|
spec | AllReduceSynchronizer.Spec | Specification for collective communication |
compressor | AllReduceSynchronizer.Compressor | One of the compressors to choose |
group | int32 | The allreduce group to merge with. The group index should be less than the number of variables |
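A minimal sketch of an AllReduce configuration, assuming the generated bindings are importable as `autodist.proto.synchronizers_pb2` and that the nested enums are addressable as AllReduceSynchronizer.Spec and AllReduceSynchronizer.Compressor:

```python
# Minimal sketch of an AllReduce configuration (assumed bindings).
from autodist.proto import synchronizers_pb2

allreduce = synchronizers_pb2.AllReduceSynchronizer()
allreduce.spec = synchronizers_pb2.AllReduceSynchronizer.Spec.NCCL  # use ncclAllReduce
allreduce.compressor = synchronizers_pb2.AllReduceSynchronizer.Compressor.HorovodCompressor
allreduce.group = 0  # merge this variable into all-reduce group 0
```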
PSSynchronizer¶
Synchronization using a Parameter Server.
Field | Type | Description |
---|---|---|
reduction_destination | string | Parameter Server to use |
local_replication | bool | Whether to create local proxies of each PS variable |
sync | bool | Whether to sync gradients across between-graph replications |
staleness | int32 | Staleness |
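A minimal sketch of a Parameter Server configuration under the same binding assumption; the device string is hypothetical:

```python
# Minimal sketch of a Parameter Server configuration (assumed bindings).
from autodist.proto import synchronizers_pb2

ps = synchronizers_pb2.PSSynchronizer()
ps.reduction_destination = "192.168.0.1:CPU:0"  # hypothetical PS device string
ps.local_replication = True   # create local proxies of the PS variable
ps.sync = True                # synchronize gradients across between-graph replicas
ps.staleness = 0              # staleness bound (left at 0 in this sketch)
```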
AllReduceSynchronizer.Compressor¶
Which gradient compression method to use
Name | Number | Description |
---|---|---|
NoneCompressor | 0 | No compression |
HorovodCompressor | 1 | Horovod's compression |
HorovodCompressorEF | 2 | Horovod's compression with error feedback |
AllReduceSynchronizer.Spec¶
Which communication method to use
Name | Number | Description |
---|---|---|
AUTO | 0 | Let the runtime choose automatically |
NCCL | 1 | Use ncclAllReduce for all-reduce, and ring algorithms for all-gather |
RING | 2 | TensorFlow's ring algorithms for all-reduce and all-gather |
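Since these are standard protobuf enums, their names and numbers can be converted with the generated enum wrappers, as the sketch below shows (again assuming the `autodist.proto.synchronizers_pb2` import path):

```python
# Minimal sketch: mapping enum names to numbers with the generated wrappers
# (assumed bindings; Name()/Value() are standard protobuf enum helpers).
from autodist.proto import synchronizers_pb2

Spec = synchronizers_pb2.AllReduceSynchronizer.Spec
Compressor = synchronizers_pb2.AllReduceSynchronizer.Compressor

assert Spec.Value("NCCL") == 1
assert Spec.Name(2) == "RING"
assert Compressor.Name(0) == "NoneCompressor"
```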