Democratize AI Using Optimized CPUs As The Onramp To Generative AI: Interview with Neural Magic CEO Brian Stevens

Neural Magic helps developers and businesses deploy GenAI into their existing applications with ease and in a more affordable way. “In collaboration with the Institute of Science and Technology Austria, Neural Magic develops innovative LLM compression research and shares impactful findings with the open-source community, including the state-of-the-art Sparse Fine-Tuning technique.” Brian Stevens, CEO of Neural Magic, shares more details in this interview with TechBullion.

Please tell us more about yourself.

My name is Brian Stevens and I am CEO of Neural Magic. I’ve been in the tech industry for more than 30 years and have a successful track record of building and advising high-impact companies and driving industry-transforming disruptions. In my career, I’ve served in a variety of executive roles at world-renowned companies, including VP and CTO of Google Cloud and CTO and EVP of Worldwide Engineering at Red Hat. Now at Neural Magic, my aim is to democratize generative AI for enterprises.

What is Neural Magic’s mission and what inspired you to create an open-source platform for the enterprise? 

Neural Magic is on a mission to help customers innovate with machine learning, without added complexity or cost. We are working to deliver on the promise of AI: to unlock it from the hands of research scientists and big tech and open it to every developer and every IT organization. We aim to unlock organizations from deployment constraints. And finally, we’re unlocking AI from those who don’t mind overly complex systems and opening it up to developers who simply want a Python API to integrate AI into their applications.

Generative AI is a trillion-dollar industry. Could you give us an overview of this market and the trends shaping the industry?

Generative AI is a rapidly growing and evolving industry with significant economic potential. It encompasses various technologies that involve the generation of content, data, or media by AI systems, often using deep learning and neural network-based approaches. Some of the trends we are seeing today include advanced generative models, multi-modal models, unsupervised learning, autonomous generative AI, and more. 

Is it costly for businesses to deploy an AI Strategy?

While AI adoption can involve upfront investments, the potential benefits and return on investment often outweigh the costs. The actual cost ultimately depends on several factors such as the complexity and scale of the operation and what the business goals are for implementing AI.

What is a software-delivered AI solution, and how can businesses leverage Neural Magic’s technology to enhance their own generative AI capabilities?

Neural Magic’s software brings operational simplicity to GenAI deployments. Whether it’s ensuring quality or predicting outcomes on the manufacturing floor, reimagining the retail shopping experience with computer vision, leveraging contextual data for a superior customer experience in a call center, or simply ensuring that applications understand your business’s particular product, financial, or legal text-processing requirements, Neural Magic has built a complete set of tools to help you do that.

Could you give us a walkthrough, in simple terms, of how Neural Magic’s DeepSparse Inference Runtime allows deep learning models to be deployed on commodity CPUs with GPU-class performance?

Neural Magic’s DeepSparse architecture is designed to mimic, on commodity hardware, the way brains compute. It combines neural network sparsity with locality of communication, making use of the CPU’s large, fast caches and its very large memory. This software architecture allows Neural Magic to deliver GPU-class performance on CPUs. We can deliver neural network performance all the way from low-power, sparse, in-cache computation on mobile and edge devices to large-footprint models in the shared memory of multi-socket servers. The CPU’s general-purpose flexibility also opens the field to developing the neural networks of tomorrow: enormous models that mimic associative memory by storing information without having to execute the full network, layer after layer, every time.
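In simplified terms, sparse inference wins by storing and computing only the nonzero weights of a pruned network. The toy sketch below (plain Python, purely illustrative, not Neural Magic’s actual DeepSparse implementation) shows the core idea: once a layer’s weights are compressed, the multiply-adds and memory traffic for zeroed weights disappear entirely.

```python
# Illustrative sketch only -- not DeepSparse's real kernels.
# A pruned dense layer can be stored in a compressed, CSR-like
# layout and evaluated by skipping its zero weights.

def to_sparse(weights):
    """Keep only nonzero entries of each row as (column, value) pairs."""
    return [
        [(j, w) for j, w in enumerate(row) if w != 0.0]
        for row in weights
    ]

def sparse_matvec(sparse_rows, x):
    """Matrix-vector product touching only nonzero weights.
    At 90% sparsity, roughly 90% of the multiply-adds (and the
    memory traffic for zeroed weights) are skipped."""
    return [sum(w * x[j] for j, w in row) for row in sparse_rows]

# A tiny 3x4 layer with 75% of its weights pruned to zero.
dense = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.5],
    [3.0, 0.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

sparse = to_sparse(dense)
print(sparse_matvec(sparse, x))  # -> [4.0, 6.0, 3.0]
```

The compressed layout shrinks the working set, which is what lets the hot weights fit in the CPU’s caches instead of streaming from slower memory on every layer.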

Why is it important for our audience of generative AI and machine learning companies to have access to powerful and cost-effective solutions like those Neural Magic offers?

Businesses are adopting AI, GenAI, and ML for a variety of reasons including increased efficiency, cost savings, competitive advantage, decision making, data analysis & insights, improved customer experience, security, and more. Our software is making all these emerging technologies accessible to businesses.

Could you provide use case examples of specific constraints or frustrations that GPUs and existing hardware pose in the field of deep learning and artificial intelligence?

GPUs pose several constraints and frustrations. While they excel at parallel processing, they are not as effective for tasks that require strong single-threaded performance. GPUs also have limitations in memory capacity and bandwidth, which can complicate certain workloads. High-performance GPUs can be expensive, and they can also drive up power consumption and heat generation. Finally, while GPUs have found applications beyond graphics rendering, they are not universally applicable: some workloads are still better suited to traditional CPUs or other specialized hardware.

Are there any success stories or case studies showcasing how companies have utilized Neural Magic’s solutions to drive innovation in their respective industries?

Neural Magic customers have reported dramatic increases in inference speed (nearly 2x), the ability to harness CPUs more cost-effectively, reduced infrastructure costs, and better performance than GPUs.

You recently announced a partnership with Akamai. What is the importance of this news for the industry?

This partnership is important because it will advance deep learning capabilities on Akamai’s distributed computing infrastructure and give enterprises a high-performing platform to run deep-learning AI software efficiently on CPU-based servers. This will result in lower latency and improved performance for data-intensive AI applications. The partnership can also help foster innovation in edge-AI inference across a host of industries in which massive amounts of input data are generated close to the edge, placing affordable processing power and security closer to the data sources.

Tell us more about your strategic partnerships with CPU manufacturers. Do you have more opportunities for investors and partnerships at Neural Magic?

Neural Magic has strategic partnerships with CPU manufacturers like AMD and Intel, cloud providers like AWS and Google Cloud, and software vendors like Red Hat and Ultralytics. These partnerships allow Neural Magic to provide value at all levels of the development lifecycle, from the models themselves down to the silicon. We are always open to discussing new business and investor partnership opportunities. Contact us.
