Category: Machine Learning

Transformers Key-Value Caching Explained

As the complexity and size of transformer-based models grow, so does the need to optimize their inference speed, especially in chat applications where users expect immediate replies. Key-value (KV) caching is a clever trick to do just that: at inference time, key and value matrices are calculated for each generated token. KV caching stores these…
Read More »
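
To make the trick concrete, below is a minimal NumPy sketch of one decoding step with a KV cache, assuming a single attention head; the weight matrices and the `kv_cache` dict are illustrative stand-ins, not any particular library's API.

```python
import numpy as np

def attention_step(x_t, W_q, W_k, W_v, kv_cache):
    """One decoding step: compute this token's K/V once, reuse cached ones."""
    q, k, v = x_t @ W_q, x_t @ W_k, x_t @ W_v
    kv_cache["k"].append(k)            # cache instead of recomputing next step
    kv_cache["v"].append(v)
    K = np.stack(kv_cache["k"])        # (seq_len, d) keys of all tokens so far
    V = np.stack(kv_cache["v"])
    scores = K @ q / np.sqrt(len(q))   # new token attends over the full past
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                 # context vector for the new token

d = 8
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
cache = {"k": [], "v": []}
for _ in range(4):                     # simulate four generated tokens
    out = attention_step(rng.normal(size=d), W_q, W_k, W_v, cache)
```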

Understanding LLMs Requires More Than Statistical Generalization [Paper Reflection]

In our paper, Understanding LLMs Requires More Than Statistical Generalization, we argue that current machine learning theory cannot explain the interesting emergent properties of Large Language Models, such as reasoning or in-context learning. From prior work (e.g., Liu et al., 2023) and our experiments, we’ve seen that these phenomena cannot be explained by reaching globally…
Read More »

Anatomy of a Parquet File

In recent years, Parquet has become a standard format for data storage in Big Data ecosystems. Its column-oriented format offers several advantages: faster query execution when only a subset of columns is processed, quick calculation of statistics across all data, and reduced storage volume thanks to efficient compression. When combined with storage frameworks like Delta…
Read More »
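
A small pyarrow sketch illustrates the two advantages mentioned above, column pruning and footer statistics; the file name and toy table are, of course, placeholders.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Each column of this toy table is stored contiguously in the file.
table = pa.table({"city": ["Paris", "Lyon", "Paris"],
                  "sales": [120, 80, 95]})
pq.write_table(table, "sales.parquet")

# Column pruning: only the 'sales' column is read from disk.
sales_only = pq.read_table("sales.parquet", columns=["sales"])

# Per-row-group statistics (min/max, null count) live in the footer metadata.
meta = pq.ParquetFile("sales.parquet").metadata
print(meta.row_group(0).column(1).statistics)  # column 1 is 'sales'
```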

How to Build a RAG System Using LangChain, Ragas, and Neptune

LangChain provides composable building blocks to create LLM-powered applications, making it an ideal framework for building RAG systems. Developers can integrate components and APIs of different vendors into coherent applications. Evaluating a RAG system’s performance is crucial to ensure high-quality responses and robustness. The Ragas framework offers a large number of RAG-specific metrics as well…
Read More »
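
For readers who want the shape of the pipeline before diving into LangChain, here is a plain-Python sketch of the retrieve-then-generate flow; `embed` and `call_llm` are hypothetical stubs standing in for the embedding model and chat model that LangChain components would provide.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical embedding function: a hash-seeded random vector.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.normal(size=16)

def call_llm(prompt: str) -> str:
    # Hypothetical chat-model stub; a real system would call an LLM here.
    return f"(answer grounded in: {prompt[:60]}...)"

documents = ["Neptune tracks ML experiments.",
             "Ragas scores RAG answers for faithfulness."]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    # Cosine similarity between the query and every stored document.
    q = embed(query)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1)
                              * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

def answer(query: str) -> str:
    # Augment the prompt with retrieved context, then generate.
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer("What does Ragas measure?"))
```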

Essential Review Papers on Physics-Informed Neural Networks: A Curated Guide for Practitioners

Staying on top of a fast-growing research field is never easy. I face this challenge firsthand as a practitioner in Physics-Informed Neural Networks (PINNs). New papers, be they algorithmic advancements or cutting-edge applications, are published at an accelerating pace by both academia and industry. While it is exciting to see this rapid development, it inevitably…
Read More »

Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) process data from different modalities like text, audio, image, and video. Compared to text-only models, MLLMs achieve richer contextual understanding and can integrate information across modalities, unlocking new areas of application. Prime use cases of MLLMs include content creation, personalized recommendations, and human-machine interaction. Examples of MLLMs that process image…
Read More »
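
One common way MLLMs integrate modalities is early fusion: image patches are projected into the text embedding space and concatenated with text tokens into a single sequence. The NumPy sketch below illustrates just that shape bookkeeping; all dimensions and weights are toy assumptions.

```python
import numpy as np

d_model = 32
rng = np.random.default_rng(1)

# Toy inputs: four already-embedded text tokens and a 3x3 grid of raw
# image-patch features (all sizes are illustrative assumptions).
text_tokens = rng.normal(size=(4, d_model))
image_patches = rng.normal(size=(9, 48))

# A learned projection maps patch features into the text embedding space.
W_proj = rng.normal(size=(48, d_model))
image_tokens = image_patches @ W_proj

# Early fusion: one sequence the transformer can attend over jointly.
sequence = np.concatenate([image_tokens, text_tokens], axis=0)
print(sequence.shape)   # (13, 32) -> 9 image tokens + 4 text tokens
```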

One Turn After Another

While some games, like rock-paper-scissors, only work if all players decide on their actions simultaneously, other games, like chess or Monopoly, expect the players to take turns one after another. In game theory, the first kind of game is called a static game, while turn-taking is a property of so-called dynamic games. In this article…
Read More »
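
A tiny worked example shows how dynamic games are solved by backward induction: the second mover best-responds in every subgame, and the first mover chooses among those outcomes. The payoffs below are hypothetical.

```python
# Backward induction on a toy two-move dynamic game.
# Payoffs are (player 1, player 2); player 1 moves first, player 2 replies.
game_tree = {
    "L": {"l": (3, 1), "r": (0, 0)},   # hypothetical payoffs
    "R": {"l": (1, 2), "r": (2, 3)},
}

def solve(tree):
    # Player 2 best-responds in each subgame reached by player 1's move...
    replies = {a: max(sub.items(), key=lambda kv: kv[1][1])
               for a, sub in tree.items()}
    # ...then player 1 picks the branch whose induced outcome pays them most.
    a1, (a2, payoffs) = max(replies.items(), key=lambda kv: kv[1][1][0])
    return a1, a2, payoffs

print(solve(game_tree))   # -> ('L', 'l', (3, 1))
```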

Hyperparameter Optimization For LLMs: Advanced Strategies

Finding an optimal set of hyperparameters is essential for efficient and effective training of Large Language Models (LLMs). The key LLM hyperparameters influence the model size, learning rate, learning behavior, and token generation process. Due to their computational demands, traditional methods for optimizing hyperparameters, such as grid search, are impractical for LLMs. Advanced hyperparameter optimization…
Read More »
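
As a baseline for the advanced strategies, here is a random-search sketch; the `validation_loss` function is a cheap toy stand-in for an actual fine-tuning run, which is exactly the expense that makes grid search impractical.

```python
import math
import random

random.seed(0)

def validation_loss(lr: float, warmup_frac: float) -> float:
    # Toy surrogate; a real run would fine-tune the LLM and evaluate it here.
    return (math.log10(lr) + 4) ** 2 + (warmup_frac - 0.1) ** 2

# Random search: every trial draws a fresh value of *each* hyperparameter,
# so important dimensions are explored more densely than on a fixed grid.
best = min(
    ({"lr": 10 ** random.uniform(-6, -2),
      "warmup_frac": random.uniform(0.0, 0.3)}
     for _ in range(50)),
    key=lambda cfg: validation_loss(**cfg),
)
print(best)
```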

Effortless Spreadsheet Normalisation With LLM

This article is part of a series on automating data cleaning for any tabular dataset. You can test the feature described in this article on your own dataset using the CleanMyExcel.io service, which is free and requires no registration. Start with the why. Let’s consider this Excel spreadsheet, which contains information on awards…
Read More »
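
For a sense of what “normalisation” means here, the pandas sketch below reshapes a hypothetical wide sheet (one column per year) into a tidy long table with one row per observation; an LLM-driven service would infer this target shape automatically.

```python
import pandas as pd

# Hypothetical messy sheet: one column per year, values spread across columns.
wide = pd.DataFrame({
    "film": ["Film A", "Film B"],
    "awards_2019": [4, 0],
    "awards_2020": [0, 3],
})

# Tidy long format: one row per (film, year) observation.
tidy = wide.melt(id_vars="film", var_name="year", value_name="awards")
tidy["year"] = tidy["year"].str.replace("awards_", "", regex=False).astype(int)
print(tidy)
```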