ATTENDING

  • Owkes, Mark <mark.owkes@montana.edu>
  • Johnson, Erick <erick.johnson@montana.edu>
  • Poulter, Benjamin <benjamin.poulter@montana.edu>
  • Young, Mark <myoung@montana.edu>
  • Dlakic, Mensur <mdlakic@montana.edu>
  • Sheppard, John <john.sheppard@montana.edu>
  • Wright, Michael <mwright@montana.edu>
  • Jerry Sheehan, Pol Llovet, Aurelien Mazurie, Thomas Heetderks

ABSENT

  • Yunes, Nicolas <nicolas.yunes@montana.edu>
  • Lawrence, Martin <lawrence@chemistry.montana.edu>
  • Rossmann, Doralyn <doralyn@montana.edu>

MINUTES: HPCAG Meeting #2

  1. Welcome & Introduction - Jerry
  2. Queue Limits & Optimization - Pol
    (handout: Hyalite Queue Specification)
    • (disclaimer) for our initial SLURM queue configuration– we started simple, with NO job limits.
    • because of the way SLURM functions, the tyranny of long-running jobs means we need to establish job limits.
    • we have these queue job limit recommendations (see the handout)– we need your input on these recommendations.
    • these changes will go into effect after the February 4 maintenance window (ACTION ITEM).
    • discussion: can we set up job monitoring (to identify jobs running a long time without much activity)? yes (ACTION ITEM).
    • once these queue job limits are implemented– we can adjust these limits as necessary.
  3. Storage Strategy - Pol
    (handout: Research Storage Technical Strategy)
    • we are acquiring a backup appliance for on-site backup storage–
      • we will be spending about $24K to purchase this appliance within the next 2 weeks (from January 28).
      • this will probably live in the Renne Data Center (on-site, but in a different building).
      • this will provide about 50TB of storage to start– this will be expanded and we'll probably use compression for more capacity.
      • this will be highly durable storage with bit-rot protection.
    • we are in negotiations with Indiana University to provide off-site backup storage–
      • (Jerry) this is nearly complete– we are finalizing legal language for data rights retention.
      • we will get about 50TB of off-site storage (again, we will probably use compression).
    • we currently back up your Hyalite HOME directories nightly (to a space on the Lustre file system)– these are the basis for future on-site and off-site backups.
    • (Jerry) we have 3 primary storage needs–
      1. deep archival/backup storage– we are doing this right now (ACTION ITEM).
      2. project storage– we will execute an RFP to acquire this storage down the road.
      3. instrument backup for big data from labs– we will address this at a later point.
  4. MATLAB Licensing Opportunity - Pol
    • we have an opportunity to acquire a campus-wide MATLAB Site License for $47K.
    • this license will allow MSU to–
      • install MATLAB on all faculty/student/staff computers and on Hyalite.
      • access 16 of the most popular MATLAB toolboxes including all that we currently use and the parallel computing toolbox.
    • we intend to request CFAC funding for this MATLAB License.
    • we would like the endorsement of this group (to add to our other endorsements) for this request (ACTION ITEM).

ACTIONS

  • RCi will write a MATLAB Site License letter of endorsement for your signatures
  • RCi will move these job queue limits into production during the February 4 maintenance window
  • RCi will check with BIOSIT about setting up job monitoring
  • Within two weeks (of January 28) RCi will purchase a storage appliance for backup/archival storage

FUTURE AGENDA

  • Cluster Software