
Description: This seminar will serve as an introduction to the capabilities of several of the HPCMP's emerging web-based portal technologies. It will include a basic overview, then focus on features that benefit AI & ML. Presentations are from ParallelWorks, ReScale, and InfiniteTactics.
| Presenter(s): ParallelWorks, ReScale, and InfiniteTactics | Location: Webcast | Date: June 24, 2025 |
Controlled by: DoD HPCMP Controlled by: PET Program CUI Category: OPSEC Limited Dissemination Control: FEDCON POC: Mr. Ronald Hedgepeth, pet@hpc.mil |
CUI
Artificial Intelligence, AI, Machine Learning, ML

Description: This advanced-level course will cover how to leverage PyTorch Fully Sharded Data Parallel (FSDP) to train large models in a distributed fashion on HPC systems. FSDP shards the model across GPUs to reduce the memory footprint of traditional data parallel training, at the cost of additional communication overhead. This course will cover the various performance tradeoffs and how settings can be right-sized for a particular model to obtain optimal performance without overflowing GPU memory. Examples will be covered with differing model sizes, with particular attention to vision models. You will learn how various FSDP settings affect memory usage and communication overhead through these examples. In addition to FSDP, other memory-efficient training strategies will be discussed, including activation checkpointing and automatic mixed precision.
| Presenter(s): Dr. Mathew Boyer, GDIT / PET | Location: Webcast | Date: June 24, 2025 |
Artificial Intelligence, AI, Machine Learning, ML
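As a rough illustration of the memory tradeoff described for FSDP above, the sketch below estimates per-GPU memory for model state (parameters, gradients, and optimizer state) when that state is replicated on every GPU versus evenly sharded. It is not taken from the course material. The 16 bytes per parameter figure is an assumption corresponding to mixed-precision Adam, the 7-billion-parameter model and 8 GPUs are hypothetical, and activations and the temporary buffers FSDP uses when gathering parameters are ignored.

```python
def per_gpu_state_gib(num_params, num_gpus, shard=True, bytes_per_param=16):
    """Estimate per-GPU memory (GiB) for parameters, gradients, and optimizer state.

    Assumption: bytes_per_param=16 models mixed-precision Adam, i.e.
    2 (fp16 params) + 2 (fp16 grads) + 12 (fp32 params, momentum, variance).
    Activations and communication buffers are NOT included.
    """
    total_bytes = num_params * bytes_per_param
    if shard:
        # Full sharding splits the model state evenly across all GPUs.
        total_bytes /= num_gpus
    return total_bytes / 2**30


# Hypothetical example: a 7-billion-parameter model on 8 GPUs.
replicated = per_gpu_state_gib(7e9, 8, shard=False)
sharded = per_gpu_state_gib(7e9, 8, shard=True)
print(f"replicated model state: {replicated:.1f} GiB per GPU")
print(f"fully sharded model state: {sharded:.1f} GiB per GPU")
```

The estimate only covers the state that sharding divides; in practice the peak is higher because each GPU temporarily holds full parameters for the layers it is currently computing.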

Description: This seminar introduces techniques for AI/ML inference on HPCMP machines, including multi-GPU/multi-node model loading, performance optimizations and inference servers. Examples mostly involve LLMs but many of the same techniques and optimizations are also relevant for non-LLM applications.
| Presenter(s): Calvin Anderson, PhD, GDIT / PET | Location: Webcast | Date: June 24, 2025 |
Artificial Intelligence, AI, Machine Learning, ML

Description: The 2025 HPCMP AI & ML Virtual Workshop was a three-day interactive event designed to advance the capabilities of the Department of Defense's High Performance Computing (HPC) AI/ML user community. This year’s workshop featured expert-led training, user-driven challenge presentations, and engaging panel discussions with leadership and developers across services. Participants gained technical insights, shared real-world challenges, and influenced the future direction of HPC AI/ML capabilities through collaborative discussion and feedback.
Sessions included hands-on training on cutting-edge tools such as PyTorch FSDP and TorchServe, overviews of web portal technologies, and opportunities to engage directly with developers and leadership through Q&A panels and innovation forums. A key feature was the HPC User Challenge, in which selected real-world AI/ML problems submitted by participants were solved and showcased by the PET team. This was a valuable opportunity to learn, network, and shape the future of AI/ML on HPC systems.
| Presenter(s): Various HPCMP and Vendors | Location: Webcast | Date: June 24-26, 2025 |
Artificial Intelligence, AI, Machine Learning, ML
Description: This advanced-level course will cover how to fine-tune a large language model with custom data on HPC resources. Users will be shown examples of the following:
- Full-parameter fine-tuning with unsupervised learning on custom data using PyTorch Fully Sharded Data Parallel (FSDP)
- Low-rank adaptation fine-tuning with unsupervised learning on custom data using Parameter-Efficient Fine-Tuning (PEFT) and PyTorch Distributed Data Parallel (DDP)
- Low-rank adaptation with supervised learning to instruction-tune a pretrained model using open-source data
The examples will use a single initial pretrained model to add understanding of DoD-related concepts and instruction-following capabilities. The course will cover how to format data for unsupervised and supervised fine-tuning, how to leverage multiple GPUs across multiple nodes to accelerate training, hardware constraints on model selection for fine-tuning, and hyperparameter selection.
| Presenter(s): Dr. Mathew Boyer, GDIT / PET | Location: Webcast | Date & Time: October 30, 2024, 2:00p - 3:30p ET |
- Presenter: Mathew Boyer
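To give a sense of why low-rank adaptation eases the hardware constraints mentioned in the fine-tuning course above, the sketch below counts the trainable parameters a low-rank adapter adds to a single weight matrix and compares that to the frozen matrix itself. It is an illustrative calculation only, not the course's code. The hidden size of 4096 and rank of 8 are hypothetical values, and the function names are made up for this example.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters added by a low-rank adapter on a d_out x d_in weight.

    The adapter factors the update as B @ A, where A is rank x d_in
    and B is d_out x rank, so it adds rank * (d_in + d_out) parameters.
    """
    return rank * (d_in + d_out)


def full_params(d_in, d_out):
    """Parameters in the original (frozen) weight matrix."""
    return d_in * d_out


# Hypothetical layer: a square 4096 x 4096 projection with rank-8 adapters.
hidden = 4096
rank = 8
added = lora_trainable_params(hidden, hidden, rank)
frozen = full_params(hidden, hidden)
print(f"trainable adapter parameters: {added}")
print(f"frozen weight parameters:     {frozen}")
print(f"trainable fraction:           {added / frozen:.4%}")
```

Because gradients and optimizer state are only kept for the adapter parameters, the training-time memory for this layer shrinks roughly in proportion to the trainable fraction, although the frozen weights themselves must still fit in memory.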
Description: This guide will use Meta’s Segment Anything Model (SAM) as an example of how to get started with a vision model that requires a visual interface on an HPC system, so users can bring their own data and explore how to use this technology for their missions.
| Presenter(s): Dr. Mathew Boyer, GDIT / PET | Location: Doolittle Institute, Niceville, FL | Date & Time: June 25, 2024 |
- Presenter: Mathew Boyer
Description: An intermediate-to-advanced course showing users how to use GPUs most effectively on HPCs. It starts with ensuring that the user is indeed using a GPU device and that the deep learning packages were installed properly. Next, a simple training example is demonstrated on a single GPU with TensorFlow and PyTorch. These examples are then extended to multiple GPUs on a single node. Finally, the problem is scaled up to multiple nodes for both frameworks.
| Presenter(s): Dr. Calvin Anderson, GDIT / PET | Location: Doolittle Institute, Niceville, FL | Date: June 24, 2024 |
- Presenter: Calvin Anderson
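The progression described in the GPU course above, from one GPU to several GPUs on a node and then to several nodes, relies on each process knowing its position in the job and working on its own slice of the data. The sketch below shows that bookkeeping in plain Python. It is a simplified illustration, not TensorFlow or PyTorch code: the function names are hypothetical, and the strided split is only one common way to partition a dataset.

```python
def global_rank(node_rank, local_rank, gpus_per_node):
    """Unique process index across the whole job (one process per GPU)."""
    return node_rank * gpus_per_node + local_rank


def shard_indices(num_samples, rank, world_size):
    """Strided split: rank r takes samples r, r + world_size, r + 2 * world_size, ..."""
    return list(range(rank, num_samples, world_size))


# Hypothetical job: 2 nodes with 4 GPUs each, training on 20 samples.
nodes, gpus_per_node, num_samples = 2, 4, 20
world_size = nodes * gpus_per_node

shards = {}
for node in range(nodes):
    for gpu in range(gpus_per_node):
        rank = global_rank(node, gpu, gpus_per_node)
        shards[rank] = shard_indices(num_samples, rank, world_size)
        print(f"node {node} gpu {gpu} -> rank {rank}: samples {shards[rank]}")
```

Every sample lands on exactly one rank, so each GPU processes a distinct portion of the data per epoch. Framework samplers typically add shuffling and padding on top of this idea.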
Description: This course begins with an introduction to basic data management, including transferring files to and from HPCMP systems, the purposes and uses of different types of HPC storage, and permanent archiving of data. It then moves to intermediate topics on how data management interacts with the HPC systems, such as batch jobs and node types. The advanced portion of the course covers how to share data with other groups on the HPCs, how to keep files synchronized across remote systems, managing code as data, and finally, AI&ML-specific data concerns.
| Presenter(s): Dr. Sean Ziegeler and Stefanie Whittaker, GDIT / PET | Location: Doolittle Institute, Niceville, FL | Date: June 24, 2024 |
- Presenter: Stefanie Whittaker
- Presenter: Sean Ziegeler
Description: Introduces AI&ML users to basic HPC concepts tailored toward their specific needs. The course assumes little experience with HPC (though we recommend first attending our New Account Orientation or viewing it on demand), but it may also fill in gaps for more experienced users. It walks the user through getting allocations on the right systems, using the HPCs for common AI&ML workflows, and accessing and setting up the common software packages, and it includes some exercises with actual ML code.
| Presenter(s): Dr. Zac Lamb, GDIT / PET | Location: Doolittle Institute, Niceville, FL | Date: June 24, 2024 |
- Presenter: Zachary Lamb
Description: This quick start guide will cover how users can build a chatbot that runs locally on an HPC system using Hugging Face Transformers and LangChain. The open-source Mistral 7B-Instruct large language model will be used to create an inference pipeline in a Jupyter notebook, using retrieval-augmented generation to couple the model with a database containing external documentation from DOE software. The tutorial will cover how to build the container, how to preprocess documents, and how to build an inference pipeline. The demonstration will include using the chat history for conversational AI.
| Presenter: Dr. Mathew Boyer, GDIT / PET | Location: Webcast | Date & Time: May 7, 2024, 2:00p - 3:00p ET |
- Presenter: Mathew Boyer
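The retrieval-augmented generation flow described in the chatbot guide above can be sketched without any model at all: find the stored document most similar to the question, then assemble a prompt from that document, the chat history, and the question. The example below is a toy stand-in using only the Python standard library. It does not use the Hugging Face Transformers or LangChain APIs, the word-count similarity replaces the embedding model a real pipeline would use, and the documents, history, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, history):
    """Combine retrieved context, prior turns, and the new question into one prompt."""
    context = "\n".join(retrieve(question, documents))
    past = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return f"Context:\n{context}\n\n{past}\nUser: {question}\nAssistant:"


# Hypothetical documentation snippets and one earlier chat turn.
docs = [
    "Batch jobs are submitted to the scheduler with a job script.",
    "Archive storage is intended for long-term retention of data.",
]
history = [("What is a login node?", "A node used for editing and submitting jobs.")]
prompt = build_prompt("How do I submit a batch job?", docs, history)
print(prompt)
```

In a real pipeline the assembled prompt would be passed to the language model, and each new question and answer would be appended to the history so that follow-up questions keep their conversational context.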
