HPCAG Meeting #2 – 28 January 2016
- Owkes, Mark
- Johnson, Erick
- Poulter, Benjamin
- Young, Mark
- Dlakic, Mensur
- Sheppard, John
- Wright, Michael
- Jerry Sheehan, Pol Llovet, Aurelien Mazurie, Thomas Heetderks
- Yunes, Nicolas
- Lawrence, Martin
- Rossmann, Doralyn
MINUTES: HPCAG Meeting #2
- Welcome & Introduction - Jerry
- Queue Limits & Optimization - Pol
(handout: Hyalite Queue Specification)
- (disclaimer) for our initial SLURM queue configuration– we started simple, with NO job limits.
- because of the way SLURM schedules jobs, long-running jobs can tie up cluster resources (the "tyranny of long-running jobs"), so we need to establish job limits.
- we have queue job limit recommendations (see the handout)– we need your input on these recommendations.
- these changes will go into effect after the February 4 maintenance window (ACTION ITEM).
- discussion: can we set up job monitoring (to identify jobs that run a long time without much activity)? Yes (ACTION ITEM).
- once these queue job limits are implemented– we can adjust them as necessary.
(handout: Research Storage Technical Strategy)
- we are acquiring a backup appliance for on-site backup storage–
- we will be spending about $24K to purchase this appliance within the next 2 weeks (from January 28).
- this will probably live in the Renne Data Center (on-site, but in a different building).
- this will provide about 50TB of storage to start– this will be expanded, and we'll probably use compression for more capacity.
- this will be highly durable storage with bit-rot protection.
- we are in negotiations with Indiana University to provide off-site backup storage–
- (Jerry) this agreement is nearly complete– we are finalizing the legal language for data-rights retention.
- we will get about 50TB of off-site storage (again, we will probably use compression).
- we currently back up your Hyalite HOME directories nightly (to a space on the Lustre file system)– these backups are the basis for future on-site and off-site backups.
- (Jerry) we have 3 primary storage needs–
- deep archival/backup storage– we are doing this right now (ACTION ITEM).
- project storage– we will execute an RFP to acquire this storage down the road.
- instrument backup for big data from labs– we will address this at a later point.
- we have an opportunity to acquire a campus-wide MATLAB Site License for $47K.
- this license will allow MSU to–
- install MATLAB on all faculty/student/staff computers and on Hyalite.
- access 16 of the most popular MATLAB toolboxes including all that we currently use and the parallel computing toolbox.
- we intend to request CFAC funding for this MATLAB License.
- we would like the endorsement of this group (to add to our other endorsements) for this request (ACTION ITEM).
ACTION ITEMS:
- RCi will write a MATLAB Site License letter of endorsement for your signatures
- RCi will move these job queue limits into production during the February 4 maintenance window
- RCi will check with BIOSIT about setting up job monitoring
- Within two weeks (of January 28) RCi will purchase a storage appliance for backup/archival storage
- Cluster Software