NautilusServer

From Deep Depth 116E167 Project Documentation

Revision as of 07:08, 16 August 2017

Help

Accessing

IP: 160.75.27.83

SSH port: 1542

Access from: ITU or ITU VPN.

VPN Help

SSH help

The SSH command to connect from a Unix environment:

   ssh -X -p 1542 hossein@160.75.27.83

Switch meanings:

  • -p 1542: connect on port 1542.
  • -X: enable X11 forwarding so you can run graphical (X) applications. Omit it if you only need the command line. You can also run X applications from Windows, but you will need to install an X server on your Windows machine.

Note: to check that X forwarding is working, once you have connected, try running on the server the command:

   xeyes

Or:

   dolphin

You could, for example, run spyder like this, though there can be some latency across the network.

Setting up a deep learning environment

Install anaconda

   export ANACONDA_PATH_PARENT=$HOME/software
   export ANACONDA_PATH=$ANACONDA_PATH_PARENT/anaconda3
   export ANACONDA_INSTALLER=Anaconda3-4.3.1-Linux-x86_64.sh
   mkdir -p ~/tmp
   cd ~/tmp
   mkdir -p $ANACONDA_PATH_PARENT
   wget https://repo.continuum.io/archive/$ANACONDA_INSTALLER
   bash $ANACONDA_INSTALLER -b -p $ANACONDA_PATH
   export PATH=$ANACONDA_PATH/bin:$PATH
   echo PATH: $PATH
   echo >> ~/.bashrc
   echo export PATH=$ANACONDA_PATH/bin:\$PATH >> ~/.bashrc
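To verify the PATH change took effect, you can run a quick sanity check (this assumes the default $HOME/software/anaconda3 location used above):

```shell
# Re-run the exports in a new shell (or `source ~/.bashrc` first):
export ANACONDA_PATH=$HOME/software/anaconda3
export PATH=$ANACONDA_PATH/bin:$PATH
# The Anaconda bin directory should now be first on the PATH:
echo "$PATH" | tr ':' '\n' | head -n 1
```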

Install tensorflow and keras

These will be installed in a conda environment called deep:

   export ENVNAME=deep
   conda create --name $ENVNAME
   source activate $ENVNAME
   conda install theano keras tensorflow tensorflow-gpu opencv pillow spyder matplotlib

To check Keras is working:

   python -c "from keras.models import Sequential;Sequential()"

To check that the GPUs are working with TensorFlow, first run this (it should list two GPUs):

   nvidia-smi

Then make sure the following script runs and finds one CPU and two GPUs: https://bitbucket.org/damienjadeduff/uhem_keras_tf/src/master/sariyer_python3/test_tf_gpu.py

Run it like this:

   python test_tf_gpu.py

Warning: for specific versions of TensorFlow, Keras or Theano, you may need to use pip to install the version you need inside the environment.

Easy file access (Linux)

This is useful for getting files on and off the server: it lets you access your remote home directory as if it were on your local computer (mounted on your local file system).

On YOUR Linux computer run:

   sudo apt-get install sshfs
   targ=~/remote/nautilus
   fusermount -u $targ # only necessary to unmount if already tried
   mkdir -p $targ
   sshfs -p 1542 -o workaround=rename YOUR_SERVER_USERNAME@160.75.27.83:/home/YOUR_SERVER_USERNAME $targ

Note: if parts of your system hang because the connection to the SSH server has gone stale (a common problem), just run:

   killall sshfs

This should resolve most such problems.
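If killing sshfs does not release the mount point, a lazy unmount usually clears it. This is a sketch, using the $targ mount point chosen above:

```shell
# Lazy-unmount a stale sshfs mount; safe to run even if nothing is
# currently mounted (errors are silenced and ignored).
targ=~/remote/nautilus
fusermount -uz "$targ" 2>/dev/null || true
```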

Using the SSD

There is an SSD drive installed. This drive is automatically mounted at:

   /media/FASTDATA1

The drive belongs to user root and group fastdata1. If you cannot access it you need to get an admin (Hossein) to add you to the group with the command:

   sudo usermod -aG fastdata1 YOURUSERNAME

And to create a folder for you there with the right permissions:

   sudo mkdir /media/FASTDATA1/YOURUSERNAME
   sudo chown YOURUSERNAME:YOURUSERNAME /media/FASTDATA1/YOURUSERNAME
   sudo chmod 700 /media/FASTDATA1/YOURUSERNAME
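For reference, mode 700 gives the owner read, write and execute permission and gives the group and everyone else nothing, so other users cannot read your data. You can see the effect on any directory:

```shell
# Demonstration of mode 700 on a throwaway directory:
d=$(mktemp -d)
chmod 700 "$d"
stat -c '%a %U' "$d"    # prints the octal mode (700) and the owner
rmdir "$d"
```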

Compiling your own CUDA programs

To do this, add the following lines to your .bashrc file in your home folder:

   export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
   export PATH=/usr/local/cuda/bin:$PATH
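As a quick sanity check that the new settings are picked up, you can confirm the CUDA bin directory is on the PATH and look for nvcc (which will only be found if the CUDA toolkit is actually installed at /usr/local/cuda):

```shell
# Re-apply the exports (or `source ~/.bashrc`), then check:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
echo "$PATH" | tr ':' '\n' | grep -x /usr/local/cuda/bin
command -v nvcc && nvcc --version || echo "nvcc not found on PATH"
```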

Other gotchas

Temperature

The GPUs are set to slow down at 93°C and shut down at 96°C. Idle temperature should be about 50°C. To see full current temperature information, run:

   nvidia-smi -q -d temperature

The GPUs should never reach the shutdown temperature; if they do, something has gone seriously wrong.

GPU and Memory Allocation

Multiple kernels and users can run on one GPU at the same time, which may mean you run out of GPU memory at some point. It is possible to configure a GPU so that only one user can access it at a time; this may become necessary in the future to guarantee enough memory for big jobs.

TensorFlow claims all the memory on all the GPUs by default, so it is considerate to other users to limit TensorFlow to a fixed fraction of GPU memory, as described in the following answer: https://stackoverflow.com/a/34200194/1616231

Alternatively, you can make TensorFlow allocate memory only as it is needed, by taking the steps described in the following answer: https://stackoverflow.com/a/37454574/1616231 (though this will ultimately use more memory).
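A minimal sketch of the memory-fraction approach, assuming the TensorFlow 1.x API current at the time of writing (the fraction and the filename limit_gpu_mem.py are illustrative, not fixed conventions):

```shell
# Write a small helper that caps TensorFlow at ~33% of each GPU's
# memory; import it (or paste its body) before building your model.
cat > limit_gpu_mem.py <<'EOF'
import tensorflow as tf
from keras import backend as K

# Claim only about a third of each GPU's memory instead of all of it
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.33)
# For the allocate-as-needed alternative, use instead:
#   gpu_options = tf.GPUOptions(allow_growth=True)
K.set_session(tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)))
EOF
```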

More Information

Server Construction

Built by Uzmanlar PC

Software

OS

Kubuntu 16.04.3 LTS

Graphics Drivers

Nvidia 384.59 drivers installed using runfile NVIDIA-Linux-x86_64-384.59.run

Installed with the following command (so that the integrated graphics remain the main display adapter):

   sudo ./NVIDIA-Linux-x86_64-370.28.run --no-opengl-files --no-x-check --disable-nouveau

CUDA Drivers

Installed using:

   cuda_8.0.61.2_linux.run