Tannic
A C++ Tensor Library
While exploring the most recent models, I began noticing some odd patterns: CUDA kernels hardcoded as strings, raw pointers, and constexpr hacks embedded in Python sublanguages. I'm not saying this approach is inherently bad, but I couldn't shake the feeling that it would be far saner to write everything directly in C++, where those features are native.
On the other hand, many existing C++ frameworks, while fully native, are low-level and hard to use or extend. They often force developers to manage complex memory layouts or backend-specific details using macros, which makes adding new operations or integrating new hardware backends cumbersome.
This insight led me to create Tannic, a lightweight, fully C++ tensor library designed from the ground up for clarity, composability, and extensibility. It maintains a Python-like API feel, so developers can enjoy familiar, intuitive syntax while working entirely in C++. The library is designed to be easy to adopt, easy to extend, and consistent in its behavior—even as new operations, data types, or backends are added.
Tannic is an extensible C++ tensor library built around a host–device execution model. Unlike monolithic frameworks, it provides only a minimal set of built‑in operators, focusing on a flexible architecture where new operations, data types, and backends can be added easily. This approach keeps the library lightweight while enabling adaptation to a wide range of computational needs.
This library is designed to serve as the foundational core for a neural network inference framework, but is equally suited to other domains such as classical ML or physics simulations—all without requiring Python.
Below is a minimal example demonstrating tensor creation, initialization, basic indexing, and arithmetic operations with Tannic:
It will output:
Equivalent PyTorch code for comparison:
Giving:
Note: Tannic is currently in an early development stage. It is functional but not fully optimized, and some features may still have bugs. The C backend API—used to extend the library—is under active development and may change significantly. The public API described in the documentation is mostly stable, with only minor breaking changes expected as the library evolves.
While the library is currently written in C++23, the arrival of C++26 is shaping up to be monumental, too significant to ignore. At some point, it may become a requirement for Tannic.
This guide is currently in a “works on my machine” state. If you encounter any issues while building Tannic, your feedback is greatly appreciated; please open an issue or submit a pull request. Contributions to improve this guide are also very welcome!
Other optional requirements may be added in the future.
Use this for development — includes extra checks, assertions, and debug symbols for easier troubleshooting.
Use this for deployment or benchmarking — builds with full compiler optimizations and without debug checks.
You can run the example provided in main.cpp from the build folder:
CUDA support is enabled by default if a compatible CUDA toolkit (12+) is detected during configuration. If no CUDA installation is found, Tannic will automatically fall back to a CPU‑only build. You can explicitly disable CUDA with:
These defaults provide a fast setup for development with the current state of the library. As Tannic evolves, CUDA configuration options and behavior may change.
To use Tannic, simply include it in your project and interact with it similarly to a Python framework:
Functions in Tannic do not immediately compute results. Instead, they build Expression objects, described in detail in the concepts documentation. An Expression is any class that follows the pattern:
Any class that follows this pattern is a valid Tannic expression and can interact with other components of the library. All available expressions are detailed in the class list section. You can scroll to the members of each expression to find information about how dtypes are promoted or how shapes are computed.
The library is built around the host–device computational model, so in order to use CUDA you just have to initialize tensors on the device you want to use, for example:
The library currently lacks some easily implementable CUDA features, such as copying a tensor from host to device and vice versa, or printing CUDA tensors; I will add them soon.
Data types are dynamic to make it easier to serialize and deserialize tensors at runtime and to deliver machine learning models that work with arbitrary data types. They are represented with a C enum to stay compatible with the C API runtime on which the library relies.
This design also paves the way for future features such as tensor quantization.
Contributions to Tannic are welcome! Whether you’re reporting bugs, proposing features, improving documentation, or optimizing kernels, your help is greatly appreciated.
can be refactored into:
This is especially important in CUDA tests, where synchronously copying device memory back to the host hurts test performance.
Fork the repository and create a new branch for your feature or bug fix. Example:
Open a pull request describing:
Target branch: for now, PRs should simply target main until the library matures.
The project is organized into the following main components:
Some components (such as the `Tensor` class) may become constexpr in the future. Vendor-specific constructs (e.g., streams, events, handles) must not be exposed in the C API; instead, they should be abstracted using IDs or type erasure.
For example, `cudaStream_t` represents a computation queue, which is not specific to CUDA. In the C API, it is stored in a type-erased form:

```c
struct stream_t {
    uintptr_t address;
};
```
Then, if the stream was created as a CUDA stream, it can be recovered as:

```cpp
cudaStream_t cudaStream = reinterpret_cast<cudaStream_t>(stream.address);
```
Currently the src root folder is a mess: lots of code repetition and nasty helper functions with a bunch of templates. I promise I will take the time to refactor this, but only after I find a way to dynamically add and remove nodes from a graph based on reference counting. This refactor won't change anything in the public API.
The C++ frontend is based on templated expressions. This means that when you write an expression, for example:
The result is not computed immediately; instead, a templated expression is created with type:
This expression provides the following methods:
This allows you to:
Create new symbols: any expression that follows the concept (don't worry, this is just like a Python protocol):
will work with the current `Tensor` class and other templated expressions in this library.
You can create your own `Scalar`, `Parameter`, or `Sequence` classes and plug them into operations as if they were tensors; the resulting expression will be something like this:

Finally, the computation is performed when calling `forward` or when the expression is assigned to a non-const tensor:
This allows Python-like behavior, since you can chain operations on the same variable like this:
Tannic is licensed under the Apache License 2.0, a permissive open-source license that allows you to use, modify, and distribute the code freely—even in commercial projects.
By contributing, you agree that your contributions will also be licensed under Apache 2.0 and that proper attribution is appreciated.
The only thing I ask in return is proper credit to the project and its contributors. Recognition helps the project grow and motivates continued development.