Grass Cluster Provisioning in On-Premises Environment¶
This guide shows how to build a MARO cluster in grass/on-premises mode on a local private network and run training jobs in the resulting on-premises distributed environment.
Prerequisites¶
Linux with Python 3.6+
Install PowerShell if you are using Windows Server
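The Python requirement above can be checked before provisioning. The following is a minimal sketch rather than part of MARO: version_at_least is a hypothetical helper name, and the script assumes the interpreter is available as python3.

```shell
#!/bin/sh
# version_at_least VERSION MAJOR MINOR
# Succeeds when VERSION (for example "3.8.10") is at least MAJOR.MINOR.
# Hypothetical helper for illustration; not a MARO command.
version_at_least() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text before the next dot
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

if command -v python3 >/dev/null 2>&1; then
    # Ask the interpreter for its own version string.
    py_version=$(python3 -c 'import platform; print(platform.python_version())')
    if version_at_least "$py_version" 3 6; then
        echo "Python $py_version satisfies the 3.6+ requirement"
    else
        echo "Python $py_version is older than 3.6"
    fi
else
    echo "python3 not found on PATH"
fi
```

Run it on every machine that will act as the master or as a worker node.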
Cluster Management¶
Create a cluster with a deployment
# Create a grass cluster with a grass-create deployment
maro grass create ./grass-on-premises-create.yml
Let a node join a specified cluster
# Let a worker node join the specified cluster
maro grass node join ./node-join.yml
Let a node leave a specified cluster
# Let a worker node leave the specified cluster
maro grass node leave {cluster_name} {node_name}
Delete the cluster
# Delete a grass cluster
maro grass delete my_grass_cluster
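The four commands above can be chained into one lifecycle script. This is a sketch under stated assumptions: the run wrapper, the DRY_RUN switch, the variable names, and the file paths are hypothetical, and the node name is assumed to be the hostname from the join deployment. Only the maro grass subcommands come from this guide. By default the script prints the commands without executing them.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: echo the command, then execute it unless in dry-run mode.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Hypothetical values for illustration; replace them with your own.
CLUSTER_NAME=my_grass_cluster
NODE_NAME=maroNode1                          # assumed to match the node hostname
CREATE_FILE=./grass-on-premises-create.yml   # create deployment
JOIN_FILE=./node-join.yml                    # join deployment

cluster_lifecycle() {
    run maro grass create "$CREATE_FILE"
    run maro grass node join "$JOIN_FILE"
    run maro grass node leave "$CLUSTER_NAME" "$NODE_NAME"
    run maro grass delete "$CLUSTER_NAME"
}

cluster_lifecycle
```

Review the printed commands first, then set DRY_RUN=0 to execute them. In practice the job would run between the join and leave steps.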
Run Job¶
See Run Job in grass/azure for reference.
Sample Deployments¶
grass-on-premises-create¶
mode: grass/on-premises
name: clusterName
user:
  admin_id: admin
master:
  username: root
  hostname: maroMaster
  public_ip_address: 137.128.0.1
  private_ip_address: 10.0.0.4
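Before running maro grass create, it can help to confirm that a deployment file contains every key shown in the sample above. The sketch below is a plain text check with grep, not a YAML parser or a MARO feature. check_keys is a hypothetical helper, and the key list is taken from the sample; whether MARO treats all of these keys as mandatory is not stated in this guide.

```shell
#!/bin/sh
# check_keys FILE KEY...
# Prints "missing key: K" for every K that does not appear as "K:" at the
# start of a line (after optional indentation). Returns 1 if any is missing.
check_keys() {
    file=$1
    shift
    missing=0
    for key in "$@"; do
        if ! grep -Eq "^[[:space:]]*${key}:" "$file"; then
            echo "missing key: $key"
            missing=1
        fi
    done
    return "$missing"
}

# Demo on a temporary copy of the sample create deployment.
deployment=$(mktemp)
cat > "$deployment" <<'EOF'
mode: grass/on-premises
name: clusterName
user:
  admin_id: admin
master:
  username: root
  hostname: maroMaster
  public_ip_address: 137.128.0.1
  private_ip_address: 10.0.0.4
EOF

if check_keys "$deployment" mode name admin_id username hostname \
        public_ip_address private_ip_address; then
    echo "deployment contains all expected keys"
fi
rm -f "$deployment"
```

Because the pattern is anchored to the start of the line, name does not match username or hostname.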
grass-on-premises-join-cluster¶
mode: grass/on-premises
master:
  private_ip_address: 10.0.0.4
node:
  hostname: maroNode1
  username: root
  public_ip_address: 137.128.0.2
  private_ip_address: 10.0.0.5
  resources:
    cpu: all
    memory: 2048m
    gpu: 0
  config:
    install_node_runtime: true
    install_node_gpu_support: false
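When several workers join the same cluster, one join deployment per node is needed. The following sketch generates them from a small inventory. write_join_file, the inventory format, and the second node's addresses are hypothetical. The template assumes that resources and config nest under node and reuses the sample's resource values unchanged.

```shell
#!/bin/sh
# write_join_file MASTER_PRIVATE_IP HOSTNAME PUBLIC_IP PRIVATE_IP OUT_FILE
# Writes a join deployment modeled on the sample above.
# Hypothetical helper for illustration; not a MARO command.
write_join_file() {
    cat > "$5" <<EOF
mode: grass/on-premises
master:
  private_ip_address: $1
node:
  hostname: $2
  username: root
  public_ip_address: $3
  private_ip_address: $4
  resources:
    cpu: all
    memory: 2048m
    gpu: 0
  config:
    install_node_runtime: true
    install_node_gpu_support: false
EOF
}

out_dir=$(mktemp -d)

# Hypothetical inventory, one node per line: hostname public_ip private_ip
while read -r host public_ip private_ip; do
    write_join_file 10.0.0.4 "$host" "$public_ip" "$private_ip" \
        "$out_dir/$host-join.yml"
    echo "wrote $out_dir/$host-join.yml"
done <<EOF
maroNode1 137.128.0.2 10.0.0.5
maroNode2 137.128.0.3 10.0.0.6
EOF
```

Each generated file can then be passed to maro grass node join on the corresponding machine.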