Computer Science Department System Administrator Guide

Table of Contents
-----------------
 0. CS Department computing resources summary
 1. CS server systems overview (Dunn 388)
 2. CS laboratory systems overview (Dunn 358)
 3. CS laboratory directory architecture
 4. CS laboratory UID/GID layout
 5. Setting up a CS laboratory system
 6. Setting up CS laboratory user accounts
 7. Authentication
 8. Updating CS laboratory workstations and servers
 9. Updating Faculty office systems
10. HPC laboratory systems overview
11. Remote access
12. Backup procedures
13. Software lifecycle
14. Hardware lifecycle
15. Kernel Cleanup

0. CS Department computing resources summary
--------------------------------------------
CS Department computers and related resources are located on the third
floor of Dunn Hall. Rooms assigned to the CS Department are:

  Dunn 301: Faculty office (Brian Ladd)
  Dunn 303: Faculty office (Susan Haller)
  Dunn 305: Department office (Chris Lanz)
  Dunn 307: Faculty office (Timothy Fossum)
  Dunn 318: Storage
  Dunn 320: Storage -- office supplies and paper
  Dunn 322: Storage -- decommissioned laptops and other devices
  Dunn 324: Robotics lab (not currently in use)
  Dunn 326: Storage -- keyboards, mice, and monitors
  Dunn 328: Storage -- cables, power adapters, and miscellaneous
  Dunn 330: Storage -- boxes
  Dunn 332: Storage -- monitors, desktops, portable USB drives
  Dunn 334: Storage -- desktops, monitors, wireless routers, speakers
  Dunn 338: Faculty office and work area
  Dunn 340: Faculty office and storage
  Dunn 342: Meeting room
  Dunn 344: Student work area
  Dunn 346: Computer games lab
  Dunn 350: ACM Student Chapter office
  Dunn 356: HPC lab
  Dunn 358: GNU/Linux lab
  Dunn 388: Server room

These rooms have Ethernet network connectivity to the CTS switch in
Dunn 348. The HPC workstations are served by a small Ethernet switch
that is connected to the campus network. The HPC cluster machines are
connected to each other through a switch and/or a NAT router that is
connected to the campus network.

1. CS server systems overview (Dunn 388)
----------------------------------------
Lab server, rack mount
  hostname: cs.potsdam.edu
  IP address: 137.143.158.220
  HP Xeon server
  5-year on-site warranty service contract
  16GB memory
  1.5TB disk in a RAID-5 configuration
  one hot-swappable spare drive
  installed in June 2013
  OS: Ubuntu Server 12.04.5 (precise)

Development server, rack mount
  hostname: cs-devel.potsdam.edu
  IP address: 137.143.156.105
  HP server
  1.0TB disk in a RAID-5 configuration
  one hot-swappable spare drive
  installed in 2010?
  OS: Debian 6.0.10 (squeeze)

Decommissioned server, rack mount
  Gateway server
  365GB disk in a RAID-5 configuration
  one hot-swappable spare drive
  decommissioned

APC battery backup, rack mount

The Lab server is the primary server for the workstations in the Dunn
358 GNU/Linux lab. It also hosts the CS Department web pages, which are
accessible from off campus through the campus firewall on port 80.

The Development server hosts a git repository used by students doing
project work. It is maintained principally by Brian Ladd.
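Since the web pages must remain reachable on port 80 through the campus
firewall, a quick reachability check is possible from any off-campus
machine. This is only a sketch, assuming the curl client is available:

  curl -sI http://cs.potsdam.edu/ | head -1   # expect an HTTP status line, e.g. "HTTP/1.1 200 OK"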
2. CS laboratory systems overview (Dunn 358)
--------------------------------------------
The CS laboratory consists of 14 HP Z620 systems, each having 12GB of
memory and 500GB of disk space. Each system has a 23-inch monitor, a
keyboard, and a mouse. The systems run Ubuntu 12.04.

The IP addresses are statically assigned by CTS, with addresses of the
form 137.143.158.xx, where xx is given in the following table:

  xx  hostname
  --  --------
   1  marcy
   2  algonquin
   3  haystack
   4  skylight
   5  whiteface
   6  dix
   7  gray
   8  iroquois
   9  basin
  10  gothics
  11  colden
  12  giant
  13  santanoni
  14  redfield

The hostnames are locally defined and do not resolve outside of the CS
lab environment.

The lab machines use DHCP to obtain their IP addresses from the CTS
DHCP server. When the MAC address of a lab machine changes, it is
necessary to inform CTS of the change so the DHCP binding of the MAC
address to its appropriate IP address can be updated.

All lab machines are accessible through ssh from off campus through the
campus firewall. The systems can be accessed specifically by IP address
or symbolically using the hostname 'lab.cs.potsdam.edu', which resolves
in a round-robin fashion to one of the lab systems.

3. CS laboratory directory architecture
---------------------------------------
The directory structure of a CS laboratory system is similar to the
directory structure generated automatically by an Ubuntu desktop
install. The principal difference is that the /home directory of a CS
lab system is NFS mounted from the /home directory of the CS lab server
(137.143.158.220). This makes it possible for students logging in to
any lab system to see their own home directories on the server.

The /home directory is also used to make course-specific directories
visible to students. On the CS server, the web directory -- normally
/var/www -- is symbolically linked to /home/www, and so it is actually
part of the /home directory structure. This allows course-specific web
pages to be made available to students on the lab systems without going
through a web client.

The /home directory of a lab machine has the following subdirectories:

  Gone       -- staging area for soon-to-be-deleted accounts
  admin      -- user account administration files and utilities
  faculty    -- faculty/staff accounts
  git        -- repository (obsoleted by cs-devel)
  guest      -- transient guest accounts
  lib        -- class-specific libraries
  lost+found
  share      -- documentation and other public files
  special    -- special accounts (e.g., contest judges)
  staff      -- symbolic link to faculty
  student    -- all student-related accounts
  user       -- more repository material (obsoleted by cs-devel)
  var        -- unused
  www        -- web-related documents

4. CS laboratory UID/GID layout
-------------------------------
Most GIDs are the same as their corresponding UIDs, with the same
user/group names. Exceptions may occur in the range of system UIDs.
When faculty and student accounts are created, an unused UID/GID pair
(with UID=GID) is chosen for the new account, in the ranges given below
(see the sketch following the table). Mapping UIDs serve only to allow
access to top-level home directories by name. For example, ~student
refers to /home/student, etc.

UID ranges
----------
      0-999    system UIDs
  1000-1999    faculty UIDs -- homedirs normally in /home/faculty
  2000-2999    other system UIDs
  9000-9999    admin UIDs (see below)
10000-19999    student UIDs -- homedirs normally in /home/student
20000-29999    class-related UIDs (see below)
30000-30999    guest account UIDs -- homedirs normally in /home/guest
31000-31999    special account UIDs -- homedirs normally in /home/special
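For illustration, here is one hypothetical way to find a candidate for
the next unused UID in the student range on the server. This is only a
sketch; confirm the candidate is unused in both the server and image
password files before using it:

  # Highest student UID in use, plus one (prints 1 if none exist yet).
  awk -F: '$3 >= 10000 && $3 <= 19999 && $3 > max { max = $3 } END { print max + 1 }' /etc/passwd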
Mapping UIDs -- all have /bin/false as their shells. These allow ~xxx
to refer to the corresponding homedir.
---------------------------------------
  9000  admin    homedir=/home/admin
  9001  faculty  homedir=/home/faculty
  9002  guest    homedir=/home/guest
  9003  student  homedir=/home/student
  9004  special  homedir=/home/special

Administrative mapping UIDs
---------------------------
  9100  acct   homedir=/home/admin/acct
  9101  sali   homedir=/root/sali
  9102  image  homedir=/data/sali/images/lab-image

Class-specific UIDs
-------------------
  2cccs  class UIDs  homedir=class-specific dir  (ccc=class, s=section)

  Example:
  22010  CS201 UID   homedir=/home/student/Classes/201  (all sections)

The password and group files on the server (137.143.158.220) and the
lab systems (137.143.158.[1-14]) are similar, but not identical. This
is because system UIDs for the server and lab systems are not generated
in the same way. To keep the password and group files on the server and
lab systems synchronized, only the non-system UIDs are preserved
between the two, so that UIDs that are NOT in the ranges 0-999 and
2000-2999 are the same on all systems.

The server script

  /root/lab/update/fix-image-pwdgrp

takes the password and group files on the server (in /etc) and merges
their non-system entries with those on the image (in
/data/sali/images/lab-image/etc), updating the image password and group
files. The original password and group files are assumed to pass the
pwck and grpck checks.

The modified image password and group files can be propagated to the
lab systems with the server script

  /root/lab/update/pushall-image-pwdgrp

These two scripts are called in sequence by the server script

  /root/lab/update/fix-pwdgrp

This script should be called whenever new users are added to the server
(see item 6 below) or when modifications are made to non-system entries
in the password or group files on the server.
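As a rough illustration of the merge that fix-image-pwdgrp performs
(this one-liner is a sketch, not the actual script), the non-system
password entries -- UIDs outside 0-999 and 2000-2999 -- can be listed
with:

  awk -F: '$3 > 999 && ($3 < 2000 || $3 > 2999)' /etc/passwd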
5. Setting up a CS laboratory system
------------------------------------
The CS lab machines are imaged using the SALI package. See
https://oss.trac.surfsara.nl/sali for documentation. If a CS laboratory
system is replaced or needs to be re-imaged from scratch, follow these
steps:

5a. Make sure that the new system's MAC address has been registered
with CTS and that it will pick up its IP address using DHCP. This may
require that you boot the system with a Linux live CD first, bring up a
browser, and register the system. Be sure to note the system's acquired
IP address to complete the following steps. If the system is to have a
static IP address (as are those in the CS lab, for example), have CTS
modify their DHCP tables to bind the system's MAC address to its static
IP address. On the Z620s, use the top Ethernet jack when connecting to
the network, not the AMT jack.

5b. Make sure the system's IP address is listed in the server's
/etc/sali/rsync_stubs/00header.conf file in a hosts_allow entry. Then
run

  sali rsync

which will build a new /etc/rsyncd.conf file.

5c. Start the server's rsync daemon:

  service rsync start

5d. Make sure that the system can boot from the USB stick: either
modify the boot order in the system BIOS, or choose a boot-time option
to boot from the USB stick (e.g., F9).

5e. Insert the SALI USB stick into a USB port on the system and boot.
When the system boot screen appears, stay at the top boot option
(Auto-Install "lab-image"), and hit the TAB key to bring up the boot
parameter line. Edit this line to replace FIXME with the server address
(137.143.158.220), and hit return.

5f. At this point, the system will get its IP address from DHCP,
partition the disk, and load the lab image from the server onto the
hard disk. The script /data/sali/scripts/lab-image-script.sh governs
this process.

5g. Make sure that the system's IP address is listed in the server's
/etc/exports file. If not, add it (using another entry as a template)
and run

  exportfs -r

5h. Reboot the system and test it (see the sketch at the end of this
section).

5i. If you are replacing a lab PC with another one, you will need to
let CTS know the new system's MAC address and the old system's static
IP address. This will allow the new system to pick up its proper IP
address upon boot.

5j. Run 'ifconfig -a' on the system to make sure that the eth1
interface is picking up the proper IP address. If it's eth0, you may
need to swap the names 'eth0' and 'eth1' in the file
/etc/udev/rules.d/70-persistent-net.rules on the system and reboot. The
file /etc/network/interfaces on the system should have the following
lines:

  auto eth1
  iface eth1 inet dhcp
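As a quick post-install test (step 5h), the following sketch confirms
on the freshly imaged machine that /home came up as an NFS mount from
the server; the exact mount output format may differ:

  mount | grep ' /home '   # expect 137.143.158.220:/home mounted on /home, type nfs
  ls /home/faculty         # a known top-level directory should be visible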
6. Setting up CS laboratory user accounts
-----------------------------------------
First, you will need the CS Department class list to process. You can
get this from the CS Department chair, who must use BearPaws to get the
file 'excel_roster.xls' using steps 6a through 6c.

6a. Log in to BearPaws, and select the following menu choices:

  Admin & Staff Reports -> Class Roster/Excel Download

6b. Select the current (or immediately upcoming) semester to download:
for example, Fall 2023.

6c. On the Class Roster/Excel Download page, choose:

  Output format: Microsoft Excel File (radio button)
  ALL rosters for: COMPUTER SCIENCE (scrolling list)
  Begin Download (action button)

The file will be downloaded with the name excel_roster.xls.

6d. Go to the following CS server directory:

  /home/admin/acct/lib/Classes

If necessary, create a directory for the current semester: for example,
the Fall 2015 semester would be 2015C. In general the semester coding
is yyyys, where yyyy is the year, s=A means Spring, and s=C means Fall.
(Summer session would be s=B, but we have never used this.) Look at the
directories in /home/admin/acct/lib/Classes to see the proper format.

6e. Copy the excel_roster.xls class file (see 6c above) to the CS
server into the proper semester directory -- for example, 2015C --
giving it a name of the form 'mmdd'. For example, 0905 would stand for
September 5. This file should be in XML format.

6f. In the semester directory, run the 'fixit' script on the class
file, as in the following example:

  fixit 0905

This program will parse the class file and replace it with lines of the
form

  user:Pnnnnnnnn:First Last:CISccc-sss

where 'user' is the student's CCA username, 'Pnnnnnnnn' is the
student's P-number, 'First Last' is the student's first and last name,
and 'CISccc-sss' is the student's class registration for class number
ccc, section number sss. Here's an example:

  smithab295:P00123456:Andrew Smith:CIS201-001

The original file contents (in XML format) are copied to the Old
subdirectory of this directory, in case there is a need to examine the
file or re-process it by copying it back to this directory.

Only students in CIS classes numbered 201 and above will have accounts
created. The class file will contain entries for all CS classes, but
entries for classes below 201 will be commented out by prepending the
lines with a '#'.

6g. Once the fixed class file appears to be OK, run the 'classlist'
script on it, as in the following example:

  classlist 0905

This will update the server user database (see below) and will create
accounts for students who are new to the CS Department. If any new
accounts are created, the script will indicate that fix-pwdgrp (see
item 4 above) needs to be run to update the password and group files on
all of the lab systems.

6h. Once the class file is processed as indicated above, new users will
be able to access their home directories on all of the lab systems (see
the spot-check sketch at the end of this section).

The user file /home/admin/acct/lib/user contains a list of all students
added to the system since 2005, along with the CS classes they have
taken and in which semesters. An entry beginning with '#' means that
the student has dropped that particular course/section.

User accounts MUST be updated, using the above steps:

  1. Immediately before the beginning of a semester, normally on a Sunday
  2. At the end of the Add/Drop week, normally on a Sunday
  3. At the end of week two
  4. After the last day to withdraw

The 'userlist' program (/home/admin/acct/bin/userlist) retrieves the
contents of the user file in various formats: by user, by class, etc.

The 'drop-all' program (/home/admin/acct/sbin/drop-all) will display
all of the students in the current semester who have dropped all of
their CS classes.

The script /home/admin/acct/sbin/addfaculty will add a faculty member
to the system with a faculty UID. Examine the script to see the
required parameters. Make sure that the GECOS parameter is quoted.
Here's an example:

  /home/admin/acct/sbin/addfaculty lastfm "First Last"

The username ('lastfm' in the above example) should be the same as the
individual's CCA account name. Similar comments apply to the addstaff
and addguest scripts.
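After classlist and fix-pwdgrp have run, here is a hedged spot-check
that a new account resolves everywhere. It uses the hypothetical
username from the example in step 6f and assumes root ssh access from
the server to the lab machines, as the pushall scripts use:

  id smithab295            # on the CS server: UID should be in the student range
  ssh marcy id smithab295  # the same entry should resolve on a lab machine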
7. Authentication
-----------------
Student and faculty users are password authenticated on the lab systems
with their CCA passwords, using CTS LDAP authentication.

On the CS server, faculty/staff authentication is based on the user's
password in the /etc/shadow file. No LDAP authentication is used on the
CS server, so only faculty/staff can log in to the CS server, and only
using this saved password. The same saved password can be used by
faculty/staff on the lab systems as well. The saved password is not
required to be the same as the user's CCA password.

Once a faculty/staff account is created, a system administrator must
change the password for the user before the user can access the
account.

8. Updating CS laboratory workstations and servers
--------------------------------------------------
IT IS ESSENTIAL THAT PATCHES/UPDATES BE APPLIED TO THE CS LAB SYSTEMS
AND SERVERS AS SOON AS THEY BECOME AVAILABLE.

Patches to the CS lab systems are applied separately from patches to
the CS server. Marcy (currently 137.143.158.1) is called the "golden
image" because this system is used first to apply patches and then to
propagate the changes to the other lab systems.

Follow these steps to apply updates and patches to the CS lab systems:

8a. Log in to the golden image and become root (sudo -i).

8b. cd to /root/sali on the golden image and run the following
commands:

  aptitude update
  aptitude upgrade

8c. If the upgrade step modified any files, run the following command:

  ./prepareclient

Otherwise, no further steps are necessary.

8d. Log in to the CS server and become root.

8e. cd to /root/sali on the CS server and run the following commands:

  ./getimage
  ./pushall-update-client

8f. If the update requires that the lab systems be rebooted, you can do
so physically in the lab (Ctrl-Alt-F1 followed by Ctrl-Alt-Del) or en
masse as follows, as root on the CS server:

  /root/lab/pushall-reboot-all

BE AWARE THAT REBOOTING A LAB SYSTEM MAY TERMINATE THE SESSION OF
SOMEONE LOGGED IN TO THE SYSTEM REMOTELY, INCLUDING FACULTY. DO THIS
ONLY WHEN YOU HAVE INFORMED USERS THAT THE REBOOT IS UPCOMING, AND
PREFERABLY ONLY WHEN THERE ARE NO USERS LOGGED IN.

You can check the login status of all of the lab machines as root on
the CS server:

  cd /root/lab
  ./pushall uptime   (prints uptime and number of users logged in)
  ./pushall who      (prints login names of users logged in)
  ./pushall w        (similar to who)

8g. The update/upgrade steps (see 8b) can be used to apply patches and
modifications to the CS server as well.

9. Updating Faculty office systems
----------------------------------
Some faculty office systems may require periodic updating using step 8b
above. If this is the case, you will need to get root access to those
systems before upgrading them.

10. HPC laboratory systems overview
-----------------------------------
Faculty responsible for using the HPC lab systems are also responsible
for their maintenance and upgrades.

11. Remote access
-----------------
The CTS firewall allows off-campus ssh and http access to the CS server
and ssh access to the CS lab systems. In order to limit the exposure of
these systems to malicious attacks, only whitelisted incoming ssh
connections are allowed. The whitelist is maintained on the CS server
in the file

  /root/lab/iptables.rules

This file is a symbolic link to the lab image file

  /data/sali/images/lab-image/etc/iptables.rules

The whitelist can allow access to individual IP addresses or CIDR IP
address specifications. Here's an example showing how RoadRunner IP
addresses are allowed:

  # RoadRunner NY Fri Nov 14 19:45:38 EST 2014
  -A INPUT -i eth1 -p tcp --dport ssh -s 67.240.0.0/12 -j ACCEPT

The ACCEPT lines in this file MUST appear before the DROP line. Do not
modify any other lines except the top line, which should have a
time/date comment indicating the last modification.

If a student is unable to access the CS lab systems remotely, have the
student provide the source IP address, and execute the following
command:

  whois <student's IP address>

The result should display a DNS authority listing, normally including a
CIDR line that can be used to craft a new iptables.rules line in the
format described above. Be sure to provide a comment indicating the DNS
authority being allowed and the time/date of the change.

Once the iptables.rules file has been modified, run the following
script:

  /root/lab/pushall-iptables

This script will propagate the iptables.rules file to the CS server and
all of the CS lab systems and will load the rules into the running
kernels of those systems.

It may be instructive to run the script

  /root/lab/pushall-iptables-vnL

on occasion to examine the amount of incoming ssh traffic from
off-campus sources, including dropped packets. An excessive number of
dropped packets may be a symptom of penetration attempts.
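Putting the whitelisting steps together, here is a hypothetical
walk-through. The address 203.0.113.45 and the CIDR block shown are
placeholders from a documentation address range, and the whois client
is assumed to be installed:

  whois 203.0.113.45 | grep -i cidr
  # suppose the output includes:  CIDR: 203.0.113.0/24
  # then add a rule above the DROP line in /root/lab/iptables.rules:
  #   # Example ISP  <time/date of change>
  #   -A INPUT -i eth1 -p tcp --dport ssh -s 203.0.113.0/24 -j ACCEPT
  # and propagate and load the rules everywhere:
  /root/lab/pushall-iptables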
12. Backup procedures
---------------------
The /home directory on the server is backed up daily (Monday through
Saturday) to the system's USB drive using the /root/BACKUPHOME.usb
script. The root filesystem is backed up weekly (Sunday). Here are the
/etc/crontab entries that carry this out:

  30 1 * * 7           root /root/BACKUPFULL.usb
  30 1 * * 1,2,3,4,5,6 root /root/BACKUPHOME.usb

We are working on backing up the CS server to a CTS-based storage
system so we will have off-site (i.e., non-Dunn Hall) backups in case
of a building-wide failure. This has not been implemented yet. The
script to do this for the /home directory is /root/BACKUPHOME.sol. Once
the CTS backup is operational, the /home directory should be backed up
daily, and the root directory weekly.

The backup scripts keep one archive copy and a new backup, so if
something goes wrong during the backup itself, there will always be a
stable (albeit old) backup available.

13. Software lifecycle
----------------------
The Ubuntu 12.04 LTS distribution is installed on the CS lab systems
and on the CS server. This distribution will be supported through April
2017, after which the lab systems and server will need to be upgraded,
probably to 16.04 LTS. This upgrade could be done in Summer 2016 or
Summer 2017.

14. Hardware lifecycle
----------------------
The HP Z620 lab systems and the HP server have a 5-year warranty that
will expire in June 2018. These systems should be replaced then. We do
not have a budget for this, so some external funding (Foundation, etc.)
will be required to carry out this replacement.

Currently, the SALI package is used to install a new lab image on a
Z620 workstation for the lab. The SALI boot image is bare-bones and
does not appear to work on a Z640, for example -- the Ethernet
interface on the Z640 system board is not detected, probably because of
the absence of an appropriate driver for the specific Ethernet hardware
on this board. Building a new SALI bootable thumb drive could probably
fix this, but the SALI documentation is extremely sparse, and SALI
seems not to be maintained regularly. When the systems get replaced,
that would be a good time to find a more stable and scalable lab image
propagation scheme.

15. Kernel Cleanup
------------------
After several update/upgrade cycles, you may find that the lab machines
and/or server have many old kernel versions that take up space in the
/boot partition. The older versions can safely be removed, but it's
best to keep at least the two most recent versions in place.

For example, suppose you do a directory listing of /boot and find the
following kernels installed:

  vmlinuz-3.13.0-51-generic
  vmlinuz-3.13.0-53-generic
  vmlinuz-3.13.0-54-generic
  vmlinuz-3.13.0-55-generic
  vmlinuz-3.13.0-57-generic

You can remove versions 51, 53, and 54 with the following command, as
root:

  aptitude purge linux-image-3.13.0-{51,53,54}-generic

If this is done on the golden image, you will need to propagate the
changes to the other lab systems: to do this, follow steps 8c through
8f above. If this is done on the server, you will need to reboot the
server. Rebooting the server can be done without rebooting the lab
systems, but it should be done only when the lab systems are lightly
used; in that case, users of the lab systems should be warned that
access to their home directories may be disrupted temporarily until the
server reboot is complete.
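Before purging, it can help to list the installed kernel packages and
note the running kernel (never remove the running kernel). A minimal
sketch:

  dpkg -l 'linux-image-*' | awk '/^ii/ { print $2 }'   # installed kernel packages
  uname -r                                             # the currently running kernel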