Job ID: 2019-10265 Type: Full-Time # of Openings: 1 Category: Information Technology
Princeton University’s Plasma Physics Laboratory (PPPL) has an opening in the Information Technology Department for a High Performance Computing Team Lead. Reporting to the Chief Information Officer, the successful candidate will be responsible for designing, planning, implementing, and maintaining PPPL’s research-grade high performance computing environments and for leading the HPC support staff. This person will also work closely with PPPL staff to ensure that their research computing requirements are addressed and appropriately prioritized. The ideal candidate will have strong written and verbal communication skills.
Provides leadership for the planning and implementation of high performance computing systems. Cultivates a high level of collaboration by regularly meeting with other team leads and working groups.
Troubleshoots and resolves complex software, operating system, and network problems and determines whether the problem originates in the system, the hardware, the software, or with the end user. Relies upon extensive knowledge of server and desktop systems, vendor-supplied diagnostic tools, and web-based information to determine the reason for the malfunction and the appropriate solution. Must be able to make independent decisions to best resolve the problems.
Develops, tests, implements, installs, and maintains the operating system and related software for proper server system operation. Configures servers using extensive knowledge of various computers; installs drivers, hardware components, and various operating systems. Monitors server system backups and, when necessary, performs data recovery.
Assists in the troubleshooting of escalated end-user system issues to help maintain consistent lab-wide computing.
Troubleshoots and resolves cyber-security issues pertaining to internal and external firewall and system configurations and settings, both to meet government cyber-security requirements and to provide consistent, secure networking lab-wide. Serves as backup to the Laboratory’s Cyber Security Officer, participating in the incident response process, the assessment of cyber-security requirements and controls, log reviews and forensics, and vulnerability scanning and remediation.
Documents server system problems related to hardware, software, and setup in prescribed formats, resolving them independently or referring them to the immediate supervisor as needed.
Provides recommendations for non-desktop hardware based on detailed project specifications and changing environment needs. Other duties as assigned.
Consults with users, vendors and other IT staff to design and specify research computing systems and storage needed
Installs, maintains and administers research computing systems and clusters
Analyzes and troubleshoots system level problems with software, data and job submissions
Enhances communication and productivity by regularly meeting with other team leads and working groups
Creates and maintains documentation for all systems to ensure greater collaboration and understanding of the environment
Assists in troubleshooting user-related issues, such as code development and deployment
Provides training where necessary
Utilizes monitoring and diagnostic tools for preventative maintenance of enterprise systems
Researches and provides new technologies based on changing requirements
Designs, deploys, and maintains automated configuration management of Linux-based systems
Bachelor of Science in Computer Science or related field
5+ years of experience managing High Performance Computing environments
Experience managing technical staff
Knowledge of parallel filesystems (such as Lustre) and high-speed interconnects (InfiniBand, Ethernet fabrics)
Strong knowledge of job scheduling technology, such as SLURM
Strong oral and written communication skills
Strong multitasking skills
Experience with configuration management systems, such as Puppet
Ability to work with and follow guidelines set forth in security benchmarks, such as CIS
Ability to architect technical solutions for specialized software and data
General knowledge of networking equipment and techniques
Princeton University is an Equal Opportunity/Affirmative Action Employer and all qualified applicants will receive consideration for employment without regard to age, race, color, religion, sex, sexual orientation, gender identity or expression, national origin, disability status, protected veteran status, or any other characteristic protected by law. EEO IS THE LAW
Internal Number: 111656970
About Princeton University
Princeton University is a vibrant community of scholarship and learning that stands in the nation's service and in the service of all nations. Chartered in 1746, Princeton is the fourth-oldest college in the United States. Princeton is an independent, coeducational, nondenominational institution that provides undergraduate and graduate instruction in the humanities, social sciences, natural sciences and engineering. As a world-renowned research university, Princeton seeks to achieve the highest levels of distinction in the discovery and transmission of knowledge and understanding. At the same time, Princeton is distinctive among research universities in its commitment to undergraduate teaching. Today, more than 1,100 faculty members instruct approximately 5,200 undergraduate students and 2,600 graduate students. The University's generous financial aid program ensures that talented students from all economic backgrounds can afford a Princeton education.