Launching a local AI model server for a German client




The Context

In the heart of Germany, nestled in a bustling tech hub, there was a daring startup with a vision that could reshape the digital landscape. This innovative company, known for its cutting-edge work in artificial intelligence, had set its sights on something truly extraordinary. They wanted to create an AI language model that could rival the legendary ChatGPT, but with a twist—it had to run entirely on their own network.

This wasn’t just any ordinary project. The startup had already made waves with their groundbreaking app, a sleek and sophisticated chatbot designed to replicate human-like conversation. Imagine a digital companion so intuitive, so sharp, that it could understand your questions and respond with the same nuance and depth as a seasoned expert.

But how did they do it? The secret lay in their mastery of machine learning. They had developed complex algorithms that allowed their AI to learn language from vast amounts of data, recognizing patterns and making decisions on the fly. Unlike traditional software that must be programmed for every possible scenario, their AI could think for itself, adapting and evolving with each new interaction.

The app had already earned a reputation for delivering lightning-fast answers to even the most challenging queries. Its knowledge base was expansive, drawing on a vast store of information to provide users with precise, contextually aware responses. Whether you were asking about the mysteries of the universe or the latest trends in technology, this AI had you covered.

As the startup continued to refine their creation, it became clear that they were on the brink of something revolutionary. Their AI tool wasn’t just about answering questions; it was about transforming how people interacted with information. With its user-friendly interface and unparalleled accuracy, it quickly became an indispensable resource for anyone in need of quick, reliable answers.

The German startup’s dream was more than just a technical achievement; it was a leap forward in the way we understand and use AI. As they prepared to bring their vision to life on their own servers, they knew they were about to make history.

Problem

The stakes were high. The client, a visionary in the world of AI, had a dream that demanded nothing less than cutting-edge technology. They needed a server, a juggernaut of computing power, that could run the leading openly available models of the AI world: Meta's Llama, Google's Gemma, and Mistral. These aren't just any language models; they are among the pinnacles of modern AI, requiring immense computational muscle to process and analyse an internet's worth of data with pinpoint accuracy.

In the world of AI, power isn’t just a luxury—it’s a necessity. The client knew that to harness the full potential of these models, their server needed to be an engine, capable of handling intense workloads without so much as a flicker of instability. The slightest glitch could spell disaster, disrupting the seamless flow of information and leading to frustrating delays or, worse, inaccurate results. But with the right server, one built to endure and excel, the client could unlock the full power of their AI solution, delivering lightning-fast, reliable responses every single time.
Yet, the need for power didn’t stop at raw performance. The client also understood that the world of AI is ever-changing, with new knowledge and breakthroughs emerging at a relentless pace. To stay ahead, their AI solution needed to evolve just as quickly, incorporating the latest advancements to remain sharp, relevant, and astonishingly accurate. This meant regular updates and an ongoing commitment to perfection. Only a server with the capacity for continuous, efficient updates could ensure the AI remained at the cutting edge, always ready to answer the next big question.

For this client, the server wasn’t just hardware; it was the mind of their AI empire, the key to transforming a vision into reality. With a server that could meet these immense demands, the client would not only push the boundaries of what AI could do but also redefine the future of intelligent technology.


Solution

In the high-stakes world of AI, speed is everything. When it comes to running an AI language model, the ability to process information at lightning speed can mean the difference between brilliance and mediocrity. The challenge is immense: to sift through an internet of data, analyse it in real time, and deliver insights with precision—all in the blink of an eye.

To rise to this challenge, we knew we needed more than just a powerful system; we needed a technological masterpiece. That’s why we selected a server armed with NVIDIA Tesla V100 GPUs, machines known for their outstanding performance. This isn’t just any GPU: it’s a proven workhorse of NVIDIA’s data-centre lineup, powered by Tensor Core technology that takes AI processing to the next level.

Imagine a machine so powerful, it can tackle the most complex AI tasks with ease, effortlessly crunching through data at mind-boggling speeds. The Tesla V100 is designed to handle the demands of modern AI, making it the perfect choice for any scenario where rapid, high-volume data processing is non-negotiable. With this GPU at the core, the AI language model becomes a force to be reckoned with, capable of delivering results faster and more accurately than ever before.

In the hands of this powerhouse, the AI isn’t just responsive—it’s supercharged, ready to take on the most intricate challenges with finesse. The Tesla V100 doesn’t just meet the requirements of AI; it obliterates them, setting a new standard for what’s possible in the world of intelligent technology.


The process of choosing server configuration

The process of selecting the ideal server configuration for this customer looked as follows:

After carefully gathering the necessary information from the customer regarding their desired tasks, we provided a tailored configuration that meets specific requirements and ensures efficient processing of large amounts of data for AI model training.

The initial configuration we suggested was as follows (note that it can be customised according to the customer’s preferences):

• Processor: 2 x Intel Xeon Gold 6248R

• RAM: 512 GB DDR4

• Storage: 4 TB NVMe SSD

• Graphics cards: 4x NVIDIA Tesla V100
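As a rough sanity check on this configuration, one can estimate which model sizes fit in the combined VRAM of the four cards. The sketch below is illustrative only: it assumes the 16 GB variant of the Tesla V100 (a 32 GB variant also exists) and counts only model weights, ignoring activation and KV-cache overhead.

```python
def model_vram_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate VRAM needed just for the model weights, in GB.

    1e9 parameters at (bits/8) bytes each is roughly 1 GB per billion
    parameters per byte of precision.
    """
    return params_billions * bits_per_weight / 8

NUM_GPUS = 4
VRAM_PER_GPU_GB = 16  # assumption: 16 GB V100; a 32 GB variant also exists
total_vram = NUM_GPUS * VRAM_PER_GPU_GB  # 64 GB across the server

for params, bits in [(7, 16), (70, 4), (70, 16)]:
    need = model_vram_gb(params, bits)
    verdict = "fits" if need <= total_vram else "too large"
    print(f"{params}B model @ {bits}-bit: ~{need:.0f} GB -> {verdict}")
```

By this estimate, a 7B model in full 16-bit precision or a quantised 70B model fits comfortably, while an unquantised 70B model would exceed the server's total VRAM, which is one reason quantised formats are popular for on-premises deployments.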

Before delivering the final server solution, it was necessary to further refine the customer-oriented server configuration. To achieve this, we initiated an inquiry to gather information on the following aspects:


  • The planned amount of data to be processed and trained on the AI model.
  • Preferences regarding specific GPUs and other important components.


After receiving additional input from the client regarding their data volume and hardware preferences, we proposed a final solution built around the following steps:


  • Server Rental: Considering the client’s initial requirements, the configuration needed powerful processors, a large amount of RAM, and several graphics cards.


  • Use of Virtualization Software: To meet the client’s needs, we implemented a virtualization-based infrastructure. This setup included multiple virtual machines, each equipped with its own graphics adapters.


  • Installation of Language Model Servers: The Ollama model server and the Open WebUI interface were installed on the virtual machines, along with AnythingLLM, which provided a user-friendly and secure web UI for managing the language models. AnythingLLM also offered API access for integration with the client’s other developments.


  • Model Launch: The client successfully launched their artificial intelligence model, ensuring stable operation and high performance on the server.
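To illustrate the last step, the sketch below shows how a client application might query a model running behind Ollama's standard local HTTP API (default port 11434). The model name is illustrative, and the call assumes a running Ollama server with that model already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port

def build_generate_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(model: str, prompt: str) -> str:
    """Send a completion request to a locally running Ollama server."""
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running server and a pulled model):
# print(generate("llama3", "Summarise virtualization in one sentence."))
```

Because everything stays on localhost, prompts and responses never leave the client's own network, which was the central requirement of this project.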


Conclusion

The moment the Tesla V100 was integrated into the client’s server, it was like unleashing a dormant powerhouse. The server’s performance didn’t just improve—it skyrocketed, shattering the limitations of traditional CPUs. With the Tesla V100’s extraordinary GPU capabilities, the server gained unprecedented power and throughput, outpacing the constraints of single-CPU systems with ease.

This leap in technology wasn’t just about hardware; it was about transforming the client’s entire AI project. Our innovative approach to optimising the server configuration became the backbone of their success. As the AI language model went live on this turbocharged server, the results were nothing short of spectacular:



  • Unparalleled Service Quality

The AI’s ability to deliver rapid, precise responses took a quantum leap. Users began experiencing faster, more accurate answers, with the model deftly navigating even the most complex queries. This wasn’t just an upgrade; it was a revolution in user satisfaction and service performance.


  • Boosted Performance

Training the AI model became a streamlined, efficient process, thanks to the immense computing power at its disposal. Large datasets that once bogged down systems were now processed at breakneck speeds, slashing training times and accelerating the model’s evolution. The AI’s implementation was faster, smoother, and more effective than ever before.


  • Limitless Scalability

The server architecture we crafted wasn’t just built for today; it was engineered for the future. With scalability baked into its core, the client could effortlessly expand their project as their user base grew and demands increased. This flexibility ensured that the AI could evolve alongside the client’s ambitions, without the need for drastic technical overhauls.

In the end, this wasn’t just a project; it was a triumph. By integrating the AI language model onto their own server, the client unlocked a trifecta of benefits: vastly improved service quality, enhanced performance, and robust scalability. These achievements didn’t just meet expectations; they set a new standard for what’s possible when cutting-edge technology meets visionary execution. The future of AI had arrived, and it was brighter than ever.
