Strategy as ProtoBuf Message¶
AutoDist uses Protocol Buffers to standardize how strategies and their configurations are represented.
autodist/proto/graphitem.proto¶
AutoDist distributed strategy messages.
Represents how to distribute a TensorFlow computational graph.
GraphItem¶
Represents the strategy the AutoDist backend will implement.
Field | Type | Description |
---|---|---|
graph_def | google.protobuf.Any | TensorFlow graph_def |
grad_target_pairs | GraphItem.GradTargetPairsEntry | Mapping from grad tensor name to variable name |
info | GraphItem.Info | |
GraphItem.Info¶
Represents the essential transformed subset of TensorFlow MetaGraph
Right now, it represents an essential AutoDist subset of the collections of a MetaGraph. In the future, it will generalize to captures.
Field | Type | Description |
---|---|---|
variables | google.protobuf.Any | |
table_initializers | string | |
savers | google.protobuf.Any | |
autodist/proto/strategy.proto¶
AutoDist distributed strategy messages.
Represents how to distribute a TensorFlow computational graph.
Strategy¶
Represents the strategy the AutoDist backend will implement.
Field | Type | Description |
---|---|---|
id | string | unique strategy identifier |
path | string | optional serialized strategy message temp path |
node_config | Strategy.Node | configuration of some individual nodes of the computational graph |
graph_config | Strategy.GraphConfig | configuration of the computational graph as a whole |
Strategy.GraphConfig¶
Represents the configuration of the graph as a whole.
Based on the list of replicas, the AutoDist backend does a combination of in-graph and between-graph distribution.
Field | Type | Description |
---|---|---|
replicas | string | the list of batch-splitting/data-parallel replicas |
Strategy.Node¶
Represents the configuration of an individual node in the graph.
Right now, these nodes are just variables in the graph, so the only information they contain is how to synchronize the variable’s gradients.
In the future, for node partitioning, these could be any node in the graph. In that case, they would also have more logic for partitioning the op.
Field | Type | Description |
---|---|---|
var_name | string | variable name |
PSSynchronizer | PSSynchronizer | One of the synchronizers to choose from |
AllReduceSynchronizer | AllReduceSynchronizer | One of the synchronizers to choose from |
partitioner | string | Optional partitioner configuration, e.g. 1, 2, 1 |
part_config | Strategy.Node | Optional node configs for each node partition (if partitioned) |
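To make the nesting of the messages in this file easier to picture, here is a minimal sketch that mirrors them with plain Python dataclasses. It is not the generated protobuf code. The snake_case attribute names, the assumption that `node_config`, `part_config` and `replicas` are repeated fields, the assumption that the two synchronizers are mutually exclusive, the replica strings, and the helper `parse_partitioner` (which assumes the partitioner string is a comma-separated list of integers) are all illustrative choices, not taken from the AutoDist sources.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PSSynchronizer:
    """Placeholder: this message's fields are not listed in this section."""


@dataclass
class AllReduceSynchronizer:
    """Placeholder carrying only the group index."""
    group: int = 0


@dataclass
class Node:
    var_name: str
    # Hypothetical snake_case names for the two synchronizer options.
    ps_synchronizer: Optional[PSSynchronizer] = None
    all_reduce_synchronizer: Optional[AllReduceSynchronizer] = None
    partitioner: str = ""
    part_config: List["Node"] = field(default_factory=list)  # assumed repeated

    def __post_init__(self) -> None:
        # Assumption: the two synchronizers are mutually exclusive.
        # This mirror raises so that the constraint is visible.
        if self.ps_synchronizer is not None and self.all_reduce_synchronizer is not None:
            raise ValueError(f"{self.var_name}: choose only one synchronizer")


@dataclass
class GraphConfig:
    replicas: List[str] = field(default_factory=list)  # assumed repeated


@dataclass
class Strategy:
    id: str
    path: str = ""
    node_config: List[Node] = field(default_factory=list)  # assumed repeated
    graph_config: GraphConfig = field(default_factory=GraphConfig)


def parse_partitioner(text: str) -> List[int]:
    """Assumption: the string is a comma-separated list of integers."""
    return [int(part) for part in text.split(",")]


# Made-up variable names and replica strings, for illustration only.
strategy = Strategy(
    id="demo-strategy",
    graph_config=GraphConfig(replicas=["worker0:GPU:0", "worker0:GPU:1"]),
    node_config=[
        Node(var_name="dense/kernel", ps_synchronizer=PSSynchronizer()),
        Node(var_name="dense/bias",
             all_reduce_synchronizer=AllReduceSynchronizer(group=0)),
        Node(var_name="embedding", ps_synchronizer=PSSynchronizer(),
             partitioner="1, 2, 1"),
    ],
)
```

Raising on a double assignment is a deliberate simplification that keeps the "pick one synchronizer per variable" rule explicit in the sketch.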
autodist/proto/synchronizers.proto¶
AutoDist synchronization messages.
AllReduceSynchronizer¶
Synchronization using AllReduce.
Field | Type | Description |
---|---|---|
spec | AllReduceSynchronizer.Spec | Specification for collective communication |
compressor | AllReduceSynchronizer.Compressor | One of the compressors to choose |
group | int32 | The allreduce group to merge with. The group index should be less than the number of variables |
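The `group` field can be read as a bucket label. The following sketch shows one way such labels could be turned into buckets, assuming that variables sharing a group index are merged into the same all-reduce. The function name `bucket_by_group` and the variable names are hypothetical, and this is an illustration of the constraint stated above, not AutoDist's implementation.

```python
from collections import defaultdict
from typing import Dict, List


def bucket_by_group(var_groups: Dict[str, int]) -> Dict[int, List[str]]:
    """Collect variable names by group index (hypothetical helper)."""
    num_vars = len(var_groups)
    buckets: Dict[int, List[str]] = defaultdict(list)
    for var_name, group in var_groups.items():
        # The table above states that the index is less than the number of variables.
        if not 0 <= group < num_vars:
            raise ValueError(f"{var_name}: group {group} not in [0, {num_vars})")
        buckets[group].append(var_name)
    return dict(buckets)


# Made-up variable names, for illustration only.
buckets = bucket_by_group({"dense/kernel": 0, "dense/bias": 0, "logits/kernel": 1})
```

With this input, `dense/kernel` and `dense/bias` land in bucket 0 and `logits/kernel` in bucket 1.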
PSSynchronizer¶
Synchronization using a Parameter Server.
AllReduceSynchronizer.Compressor¶
Which gradient compression method to use
Name | Number | Description |
---|---|---|
NoneCompressor | 0 | No compression |
HorovodCompressor | 1 | Horovod's Compression |
HorovodCompressorEF | 2 | Horovod's Compression but with Error Feedback. |
AllReduceSynchronizer.Spec¶
Which communication method to use
Name | Number | Description |
---|---|---|
AUTO | 0 | Runtime's automatic choices |
NCCL | 1 | Use ncclAllReduce for all-reduce, and ring algorithms for all-gather |
RING | 2 | TensorFlow's ring algorithms for all-reduce and all-gather |
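The two enums above can be mirrored in plain Python for quick lookups between names and numbers. This is a sketch using the standard `enum` module with the names and numbers from the tables. The class names `Compressor` and `Spec` and the helper `describe` are illustrative and are not the generated protobuf classes.

```python
from enum import IntEnum


class Compressor(IntEnum):
    NoneCompressor = 0       # No compression
    HorovodCompressor = 1    # Horovod's compression
    HorovodCompressorEF = 2  # Horovod's compression with error feedback


class Spec(IntEnum):
    AUTO = 0  # Runtime's automatic choice
    NCCL = 1
    RING = 2


def describe(spec_number: int, compressor_number: int) -> str:
    """Turn wire numbers back into readable names (hypothetical helper)."""
    return f"{Spec(spec_number).name}/{Compressor(compressor_number).name}"


label = describe(1, 2)
```

Assuming proto3 semantics, an unset enum field reads back as the value numbered 0, which here would be `AUTO` and `NoneCompressor`.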
Scalar Value Types¶
.proto Type | Notes | C++ | Python |
---|---|---|---|
double | | double | float |
float | | float | float |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | int/long |
uint32 | Uses variable-length encoding. | uint32 | int/long |
uint64 | Uses variable-length encoding. | uint64 | int/long |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | int/long |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | int/long |
sfixed32 | Always four bytes. | int32 | int |
sfixed64 | Always eight bytes. | int64 | int/long |
bool | | bool | boolean |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | str/unicode |
bytes | May contain any arbitrary sequence of bytes. | string | str |
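The size notes in the table follow from how variable-length integers (varints) are encoded: seven payload bits per byte, with negative `int32`/`int64` values sign-extended to 64 bits, and `sint32` values ZigZag-mapped first. The sketch below reimplements that encoding in plain Python to show the resulting payload sizes. The function names are illustrative, and the byte counts exclude the field tag.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low seven bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """int32: negative values are sign-extended to 64 bits before encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_sint32(value: int) -> bytes:
    """sint32: ZigZag-map, then varint-encode."""
    return encode_varint(zigzag32(value))


sizes = {
    "int32(-1)": len(encode_int32(-1)),              # 10 bytes
    "sint32(-1)": len(encode_sint32(-1)),            # 1 byte
    "uint32(2**28 - 1)": len(encode_varint(2**28 - 1)),  # 4 bytes
    "uint32(2**28)": len(encode_varint(2**28)),      # 5 bytes, versus 4 for fixed32
}
```

This is why `sint32` is preferable for fields that are often negative, and why `fixed32` beats `uint32` once values routinely reach 2^28.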