#HPCWorkloads


govindhtech

Early Fault-Tolerant Quantum Computing Integration For HPC

HPC Centres Should Adopt Quantum: A New Scientific Computing Era.

Early Fault-tolerant quantum computing

A historic joint report by fault-tolerant quantum computing company Alice & Bob and HPC-AI industry analyst firm Hyperion Research urges HPC centres worldwide to prepare for early fault-tolerant quantum computing (eFTQC) integration. The report, “Seizing Quantum’s Edge: Why and How HPC Should Prepare for eFTQC,” argues that these machines will solve important scientific problems beyond the reach of traditional supercomputing within five years.

The paper finds that HPC and hyperscale data centres are moving quickly towards quantum computing, and advises HPC specialists to design and deploy effective hybrid HPC-quantum workflows for near-term applications in order to manage this transition.

Quantum Imperative: Why HPC Needs eFTQC Now

Quantum integration is urgent because classical system performance gains have stagnated over the past decade. CPU development has been constrained by transistor scaling limits and chip power budgets, signalling the “end of Moore’s Law” for the industry. Meanwhile, the estimated resources needed to execute advanced algorithms such as Shor’s have fallen by a factor of 1,000, accelerating the timeline for practical quantum computing applications.

Early fault-tolerant quantum computing (eFTQC) features 100-1,000 logical qubits and a logical error rate between 10⁻⁶ and 10⁻¹⁰. These systems are expected to accelerate scientific computing within five years, with benefits beginning in materials science and quickly extending to quantum chemistry and fusion-energy simulations.
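To get a feel for why logical error rates in this range matter, here is a back-of-the-envelope sketch (the operation count is hypothetical, not a figure from the report): under an independent-error assumption, the chance a circuit finishes without a logical fault decays exponentially with the number of logical operations.

```python
# Illustrative sketch: probability that a quantum circuit completes
# without a logical error, assuming independent errors per operation.
# The operation count below is a hypothetical example, not a figure
# from the Alice & Bob / Hyperion Research report.

def success_probability(logical_error_rate: float, operations: int) -> float:
    """P(no logical error) = (1 - p)^n under an independence assumption."""
    return (1.0 - logical_error_rate) ** operations

ops = 10**7  # ten million logical operations, as an illustrative circuit size
for p in (1e-6, 1e-8, 1e-10):
    print(f"p_logical={p:.0e}  P(success)={success_probability(p, ops):.3f}")
```

At this circuit size, a 10⁻⁶ logical error rate almost guarantees failure, while 10⁻¹⁰ makes success nearly certain, which is why the quoted range spans the threshold of usefulness for real workloads.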

Potential Unlocked: HPC Workload Benefits

Bob Sorensen, Hyperion Research Senior Vice President and Chief Analyst for Quantum Computing, stressed the importance of this shift. Future quantum technologies could accelerate and enable “a wide range of critical science and engineering applications, which present a pivotal opportunity for the HPC community.” Sorensen warned that new machines “won’t be plug-and-play,” forcing HPC centres to prepare early to influence system design and gain operating expertise.

Early fault-tolerant quantum computing (eFTQC) could assist with roughly 50% of the HPC workloads at top U.S. government research institutions, including NERSC, Los Alamos National Laboratory, and the DOE leadership computing facilities. By offloading computationally hard subproblems to quantum computers, hybrid HPC-quantum workflows should improve accuracy, time-to-solution, and computational cost for HPC users, benefits that Alice & Bob CEO Théau Peronnin emphasises.


Call to Action: Integration Preparation

To capture these benefits, the report recommends integrating early fault-tolerant quantum computing (eFTQC) alongside GPUs and CPUs in supercomputing centres. Co-designing hybrid workflows with users and vendors, building effective hardware and software infrastructure, and deploying eFTQC prototypes are essential for a “first-mover advantage.” Recommendations include designing HPC-friendly application codes, building trustworthy hybrid software stacks, and investing in substantial HPC user education for eFTQC adoption.

Juliette Peyronnet, Alice & Bob’s U.S. General Manager and a co-author of the paper, compared eFTQC to earlier technology shifts: from vector computers to GPUs, HPC has repeatedly adopted disruptive architectures, and quantum computing should be no exception. She urges HPC centres to treat eFTQC as the next major accelerator, working with quantum vendors to examine heterogeneous workloads and to prepare a quantum-ready workforce and infrastructure.


Quantum-Enhanced Data Centres and Hyperscalers

Despite its focus on HPC facilities, the report’s findings affect hyperscale cloud providers. Google, Microsoft, and Amazon are investing heavily in quantum research and development, suggesting that early hybrid HPC-quantum workloads may be offered as a service. This integration may give materials research, energy, and pharmaceutical companies an edge in high-value tasks.

HPC and early fault-tolerant quantum computing are now actively shaping data centres and scientific computing. Organisations and providers seeking to lead the next wave of computational science must establish a quantum-ready infrastructure through smart relationships and preemptive preparation. The quantum age has arrived, and high-performance computing will use hybrid computing.


Google Cloud Parallelstore Powering AI And HPC Workloads

Parallelstore

Businesses use artificial intelligence (AI) and high-performance computing (HPC) applications to process enormous datasets, run intricate simulations, and train generative models with billions of parameters, across use cases such as LLMs, genomic analysis, quantitative analysis, and real-time sports analytics. These workloads put storage systems under intense performance pressure: they demand high throughput and scalable I/O that keeps latencies under a millisecond even when thousands of clients are reading and writing the same shared file at the same time.

Google Cloud is thrilled to share that Parallelstore, unveiled at Google Cloud Next 2024, is now generally available to power these next-generation AI and HPC workloads. Built on the Distributed Asynchronous Object Storage (DAOS) architecture, Parallelstore combines a key-value store with fully distributed metadata to deliver high throughput and IOPS.

Continue reading to find out how Google Parallelstore meets the demands of demanding AI and HPC workloads by enabling you to provision Google Kubernetes Engine and Compute Engine resources, optimize goodput and GPU/TPU utilization, and move data in and out of Parallelstore programmatically.

Optimize throughput and GPU/TPU use

Parallelstore employs a key-value store architecture with distributed metadata management to overcome the performance constraints of conventional parallel file systems. Its high-throughput parallel data access can saturate each compute client’s network bandwidth while keeping latency and I/O contention low. Maximizing goodput to GPUs and TPUs through efficient data delivery is key to controlling the cost of AI workloads, and Parallelstore can serve modest-to-massive AI and HPC workloads by providing continuous read/write access to thousands of virtual machines, GPUs, and TPUs.
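The link between storage throughput and accelerator goodput can be sketched with a rough model (all numbers below are hypothetical, for illustration only): if each training step must wait for its batch to load, the fraction of wall-clock time the GPUs actually compute is bounded by how fast storage delivers data.

```python
# Rough model (hypothetical numbers): if each training step needs its batch
# loaded before the GPUs can compute, and loading does not overlap with
# compute, accelerator utilization is bounded by storage throughput.

def gpu_utilization(compute_s: float, batch_gb: float, storage_gbps: float) -> float:
    """Fraction of wall-clock time spent computing, assuming no prefetch overlap."""
    load_s = batch_gb / storage_gbps          # time stalled waiting on I/O
    return compute_s / (compute_s + load_s)

# 100 ms of compute per step, 1 GB batches, at three storage speeds:
for bw in (1, 10, 100):                       # storage throughput in GB/s
    print(f"{bw:>3} GB/s -> utilization {gpu_utilization(0.1, 1.0, bw):.0%}")
```

Real pipelines overlap prefetching with compute, so this is a lower bound, but it shows why faster shared storage translates directly into less idle accelerator time.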

The largest Parallelstore deployment of 100 TiB yields throughput scaling to around 115 GiB/s, with a low latency of ~0.3 ms, 3 million read IOPS, and 1 million write IOPS. This indicates that a large number of clients can benefit from random, dispersed access to small files on Parallelstore. According to Google Cloud benchmarks, Parallelstore’s performance with tiny files and metadata operations allows for up to 3.7x higher training throughput and 3.9x faster training timeframes for AI use cases when compared to native ML framework data loaders.
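As a quick sanity check on the figures quoted above, here is what that aggregate throughput means for scanning an entire maximum-size deployment (a simple arithmetic sketch, not a benchmark):

```python
# Arithmetic check on the quoted figures: a 100 TiB Parallelstore
# deployment read at ~115 GiB/s aggregate throughput.
TIB_IN_GIB = 1024                 # GiB per TiB
capacity_gib = 100 * TIB_IN_GIB   # 100 TiB deployment
throughput_gibps = 115            # quoted aggregate read throughput

full_scan_s = capacity_gib / throughput_gibps
print(f"Full sweep of 100 TiB at 115 GiB/s: {full_scan_s:.0f} s "
      f"(~{full_scan_s / 60:.1f} min)")
```

That is roughly a quarter of an hour to stream the entire file system once, which puts the per-epoch cost of even very large training datasets into perspective.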

Move data in and out of Parallelstore programmatically

Many AI and HPC applications use Cloud Storage for data preparation or preservation. With the integrated import/export API, you can automate transferring the data you want to process into Parallelstore, ingesting enormous datasets from Cloud Storage at roughly 20 GB per second for files larger than 32 MB, or about 5,000 files per second for smaller files.

gcloud alpha parallelstore instances import-data $INSTANCE_ID \
  --location=$LOCATION --source-gcs-bucket-uri=gs://$BUCKET_NAME \
  [--destination-parallelstore-path="/"] --project=$PROJECT_ID
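The quoted rates give a rough way to estimate how long an import will take for a mixed dataset (the dataset sizes below are hypothetical, and the two phases are assumed not to overlap, so this is a worst-case figure):

```python
# Rough import-time estimate from the quoted rates: ~20 GB/s for files
# over 32 MB and ~5,000 files/s for smaller files. Dataset sizes here
# are hypothetical, and the two phases are assumed to run sequentially.

def import_time_s(large_gb: float, small_files: int,
                  bulk_gbps: float = 20.0, small_rate: float = 5000.0) -> float:
    """Worst-case (sequential) estimate of a Cloud Storage -> Parallelstore import."""
    return large_gb / bulk_gbps + small_files / small_rate

# 50 TB of large files plus 10 million small files:
t = import_time_s(50_000, 10_000_000)
print(f"Estimated import time: {t:.0f} s (~{t / 3600:.1f} h)")
```

Note how the small-file count, not the total bytes, can dominate the transfer, which is why the per-file rate is quoted separately.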

You can programmatically export results from an AI training job or HPC workload to Cloud Storage for further evaluation or long-term storage. Using the API to automate these transfers streamlines data pipelines and reduces manual intervention.

gcloud alpha parallelstore instances export-data $INSTANCE_ID \
  --location=$LOCATION --destination-gcs-bucket-uri=gs://$BUCKET_NAME \
  [--source-parallelstore-path="/"]

Provision GKE resources programmatically via the CSI driver

The Parallelstore GKE CSI driver makes it simple to manage high-performance storage for containerized workloads. Using familiar Kubernetes APIs, you can access pre-existing Parallelstore instances from Kubernetes workloads, or dynamically provision and manage Parallelstore file systems as persistent volumes within your GKE clusters. This frees you to concentrate on resource optimization and TCO reduction rather than learning and maintaining a separate storage system.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: parallelstore-class
provisioner: parallelstore.csi.storage.gke.io
volumeBindingMode: Immediate
reclaimPolicy: Delete
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-a
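Workloads consume a StorageClass like this through a PersistentVolumeClaim. A minimal sketch is below; the claim name and requested capacity are illustrative, not from the original post:

```yaml
# Hypothetical claim against the parallelstore-class StorageClass above;
# the name and requested capacity are illustrative examples.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: parallelstore-pvc
spec:
  accessModes:
  - ReadWriteMany              # shared access across many pods
  storageClassName: parallelstore-class
  resources:
    requests:
      storage: 12Ti
```

Because the StorageClass uses volumeBindingMode: Immediate, the underlying instance is provisioned as soon as the claim is created, rather than waiting for the first pod to consume it.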

The fully managed GKE Volume Populator, which automates the preloading of data from Cloud Storage straight into Parallelstore during the PersistentVolumeClaim provisioning process, will be available to preload data from Cloud Storage in the upcoming months. This helps guarantee that your training data is easily accessible, allowing you to maximize GPU and TPU use and reduce idle compute-resource time.

Provision Compute Engine resources programmatically with the Cluster Toolkit

With the Cluster Toolkit, deploying Parallelstore instances for Compute Engine is simple. Cluster Toolkit, formerly known as Cloud HPC Toolkit, is open-source software for deploying AI and HPC workloads; it allocates compute, network, and storage resources for your cluster or workload according to best practices. With just four lines of code, you can integrate the Parallelstore module into your blueprint and begin using Cluster Toolkit right away, and starter blueprints are provided for your convenience. Beyond the Cluster Toolkit, Parallelstore can also be deployed using Terraform templates, which minimize manual overhead and support provisioning and operations through code.

ReSpo.Vision

Leading sports video analytics company ReSpo.Vision is using Parallelstore to accelerate its real-time system’s transition from 4K to 8K video. With Parallelstore as the transport layer, ReSpo.Vision gathers and labels granular data markers that deliver relevant insights to coaches, scouts, and fans. Thanks to Parallelstore, ReSpo.Vision maintained low computation latency while handling spikes in high-performance video processing, without costly infrastructure upgrades.

The use of AI and HPC is expanding quickly. With its novel architecture, performance, and integrations with Cloud Storage, GKE, and Compute Engine, Parallelstore is the storage solution to keep demanding GPU/TPU workloads fed.

Read more on govindhtech.com