CEX.C Language Documentation
Getting started with CEX.C
What is CEX
Cex.C (pronounced "cexy") is a Comprehensively EXtended C language. CEX was born as an alternative answer to the plethora of brand new LLVM-based languages which strive to replace old C. CEX remains the C language itself, with small but important tweaks that make it a completely different development experience.
I tried to bring in the best ideas from modern languages while maintaining a smooth developer experience for writing C code. The main goal of CEX is to provide tools that help developers write high-quality C code in general.
Core features
- Single header, cross-platform, drop-in C language extension
- No dependencies except C compiler
- Self contained build system: CMake/Make/Ninja no more
- Modern memory management model
- New error handling model
- New strings
- Namespaces
- Code quality oriented tools
- New dynamic arrays and hashmaps with seamless C compatibility
Solving old C problems
CEX is another attempt to make old C a little bit better. Unlike other new systems languages such as Rust, Zig and C3, which tend to start from scratch, CEX focuses on evolution and leverages existing tools provided by modern compilers to make code safer and easier to write and debug.
C Problem | CEX Solution |
---|---|
Bug prone memory management | CEX provides allocator-centric, scoped memory allocation. It uses ArenaAllocators and a temporary allocator in mem$scope(), which decrease the probability of memory bugs. |
Unsafe arrays | Address sanitizers are enabled by default, so an out-of-bounds access crashes right away, as it would in other languages. |
3rd party build system | The integrated build system eliminates flame wars about which one is better. You can use Cex to run your build scripts, like in Zig. |
Rudimentary error handling | CEX introduces the Exception type, and the compiler forces you to check it. The new error handling approach makes error checking easy and opens up cool possibilities like stack traces in C. |
C is unsafe | Yeah, and it’s a cool feature! On the other hand, CEX provides a unit testing engine and fuzz tester support out of the box. |
Bad string support | String operations in CEX are safe and resilient to NULL and buffer overflows. CEX has a dynamic string builder, slices and C-compatible strings. |
No data structures | CEX has type-safe generic dynamic array and hashmap types, which cover 80% of all use cases. |
No namespaces | It’s more about LSP, developer experience and readability. It is a much better experience to type and read str.slice.starts_with than str_slice_starts_with. |
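To make the first row of the table more concrete, here is a minimal plain C sketch of the idea behind scoped arena allocation. It is not the CEX allocator: Arena, arena_alloc, arena_mark and arena_release are hypothetical names, and the fixed 1 KiB buffer is an assumption chosen for brevity. The point is only that everything allocated inside a scope is released with one cheap operation on scope exit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size bump arena, NOT the CEX allocator. */
typedef struct {
    _Alignas(max_align_t) uint8_t buf[1024];
    size_t used;
} Arena;

/* Returns zeroed, suitably aligned memory, or NULL when the arena is full. */
static void* arena_alloc(Arena* a, size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align; /* round up */
    if (start > sizeof(a->buf) || size > sizeof(a->buf) - start) {
        return NULL;
    }
    a->used = start + size;
    return memset(a->buf + start, 0, size);
}

/* A "scope" is just a remembered position that we roll back to. */
static size_t arena_mark(const Arena* a) { return a->used; }
static void arena_release(Arena* a, size_t mark) { a->used = mark; }

/* Allocates scratch data, reports the peak usage, and frees it all at once. */
static size_t demo_scope(Arena* a)
{
    size_t mark = arena_mark(a);
    char* name = arena_alloc(a, 32);
    int* numbers = arena_alloc(a, 16 * sizeof(int));
    size_t peak = a->used;
    (void)name;
    (void)numbers;
    arena_release(a, mark); /* no per-object free() calls needed */
    return peak;
}
```

Individual objects are never freed one by one in this sketch, which removes a whole class of double-free and leak bugs at the price of holding memory until the scope ends.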
Making new CEX project
You can initialize a working boilerplate project using just a C compiler and the cex.h file.
Make sure that you have a C compiler installed. We use the cc command as the default compiler; you may replace it with gcc or clang.
- Make a project directory
mkdir project_dir
cd project_dir
- Download cex.h
- Make a seed program
At this step we are compiling a special pre-seed program that will create a template project at the first run
cc -D CEX_NEW -x c ./cex.h -o ./cex
- Run cex program for project initialization
The cex program automatically creates a project structure with a sample app and unit tests. It also recompiles itself to become the universal build system for the project. You may change its logic inside the cex.c file; this is your build script now.
./cex
- Now your project is ready to go
Now you can launch a sample program or run its unit tests.
./cex test run all
./cex app run myapp
- This is how to check your environment and build variables
> ./cex config
cexy$* variables used in build system, see `cex help 'cexy$cc'` for more info
* CEX_LOG_LVL 4
* cexy$build_dir ./build
* cexy$src_dir ./examples
* cexy$cc cc
* cexy$cc_include "-I."
* cexy$cc_args_sanitizer "-fsanitize-address-use-after-scope", "-fsanitize=address", "-fsanitize=undefined", "-fsanitize=leak", "-fstack-protector-strong"
* cexy$cc_args "-Wall", "-Wextra", "-Werror", "-g3", "-fsanitize-address-use-after-scope", "-fsanitize=address", "-fsanitize=undefined", "-fsanitize=leak", "-fstack-protector-strong"
* cexy$cc_args_test "-Wall", "-Wextra", "-Werror", "-g3", "-fsanitize-address-use-after-scope", "-fsanitize=address", "-fsanitize=undefined", "-fsanitize=leak", "-fstack-protector-strong", "-Wno-unused-function", "-Itests/"
* cexy$ld_args
* cexy$fuzzer "clang", "-O0", "-Wall", "-Wextra", "-Werror", "-g", "-Wno-unused-function", "-fsanitize=address,fuzzer,undefined", "-fsanitize-undefined-trap-on-error"
* cexy$debug_cmd "gdb", "-q", "--args"
* cexy$pkgconf_cmd "pkgconf"
* cexy$pkgconf_libs
* cexy$process_ignore_kw ""
* cexy$cex_self_args
* cexy$cex_self_cc cc
Tools installed (optional):
* git OK
* cexy$pkgconf_cmd OK ("pkgconf")
* cexy$vcpkg_root Not set
* cexy$vcpkg_triplet Not set
Global environment:
* Cex Version 0.14.0 (2025-06-05)
* Git Hash 07aa036d9094bc15eac8637786df0776ca010a33
* os.platform.current() linux
* ./cex -D<ARGS> config ""
Meet Cexy build system
cexy$ is a build system integrated with Cex, which helps you manage your project, run tests, find symbols and get help.
> ./cex --help
Usage:
cex [-D] [-D<ARG1>] [-D<ARG2>] command [options] [args]
CEX language (cexy$) build and project management system
help Search cex.h and project symbols and extract help
process Create CEX namespaces from project source code
new Create new CEX project
stats Calculate project lines of code and quality stats
config Check project and system environment and config
libfetch Get 3rd party source code via git or install CEX libs
test Test running
fuzz Generic fuzz tester
app Generic app build/run/debug
You may try to get help for commands as well, try `cex process --help`
Use `cex -DFOO -DBAR config` to set project config flags
Use `cex -D config` to reset all project config flags to defaults
Code example
Hello world in CEX
#define CEX_IMPLEMENTATION
#include "cex.h"

int
main(int argc, char** argv)
{
    io.printf("MOCCA - Make Old C Cexy Again!\n");
    return 0;
}
Holistic function
// CEX has special exception return type that forces the caller to check return type of calling
// function, also it provides support of call stack printing on errors in vanilla C
Exception
cmd_custom_test(u32 argc, char** argv, void* user_ctx)
{
// Let's open temporary memory allocator scope (var name is `_`)
// it will free all allocated memory after any exit from scope (including return or goto)
mem$scope(tmem$, _)
{
e$ret(os.fs.mkpath("tests/build/")); // make directory or return error with traceback
e$assert(os.path.exists("tests/build/")); // evergreen assertion or error with traceback
// auto type variables
auto search_pattern = "tests/os_test/*.c";
// Trace with file:<line> + formatting
log$trace("Finding/building simple os apps in %s\n", search_pattern);
// Search all files in the directory by wildcard pattern
// allocate the results (strings) on temp allocator arena `_`
// return dynamic array items type of `char*`
arr$(char*) test_app_src = os.fs.find(search_pattern, false, _);
// for$each works on dynamic, static arrays, and pointer+length
for$each(src, test_app_src)
{
char* tgt_ext = NULL;
char* test_launcher[] = { cexy$debug_cmd }; // CEX macros contain $ in their names
// arr$len() - universal array length getter
// it supports dynamic CEX arrays and static C arrays (i.e. sizeof(arr)/sizeof(arr[0]))
if (arr$len(test_launcher) > 0 && str.eq(test_launcher[0], "wine")) {
// str.fmt() - using allocator to sprintf() format and return new char*
tgt_ext = str.fmt(_, ".%s", "win");
} else {
tgt_ext = str.fmt(_, ".%s", os.platform.to_str(os.platform.current()));
}
// NOTE: cexy is a build system for CEX, it contains utilities for building code
// cexy.target_make() - makes target executable name based on source
char* target = cexy.target_make(src, cexy$build_dir, tgt_ext, _);
// cexy.src_include_changed - parses `src` .c/.h file, finds #include "some.h",
// and checks also if "some.h" is modified
if (!cexy.src_include_changed(target, src, NULL)) {
continue; // target is actual, source is not modified
}
// Launch OS command and get interactive shell
// os.cmd. provides more capabilities for launching subprocesses and grabbing stdout
e$ret(os$cmd(cexy$cc, "-g", "-Wall", "-Wextra", "-o", target, src));
}
}
// CEX provides capabilities for generating namespaces (for user's code too!)
// For example, cexy namespace contains
// cexy.src_changed() - 1st level function
// cexy.app.run() - sub-level function
// cexy.cmd.help() - sub-level function
// cexy.test.create() - sub-level function
return cexy.cmd.simple_test(argc, argv, user_ctx);
}
Supported compilers/platforms
Tested compilers / Libc support
- GCC - 10, 11, 12, 13, 14, 15
- Clang - 13, 14, 15, 16, 17, 18, 19, 20
- MSVC - unsupported, and probably never will be
- LibC tested - glibc (Linux), musl (Linux), ucrt/mingw (Windows), macOS
Tested platforms / architectures
- Linux - x32 / x64 (glibc, gcc + clang)
- Alpine linux - (libc musl, gcc) on architectures x86_64, x86, aarch64, armhf, armv7, loongarch64, ppc64le, riscv64, and s390x (big-endian)
- Windows (via MSYS2 build) - x64 (mingw64 + clang), libc mscrt/ucrt
- Macos - x64 / arm64 (clang)
Resources
CEX philosophy
Main purpose of CEX
Cex was designed as a thin base layer above the core C language, with the following goals in mind:
- Enhancing developer experience. The most common things should be as seamless as possible, without changing core language mechanics: less boilerplate code, better readability and debuggability.
- Eliminating dependencies. Cex is a single-header, all-in-one language with core tools for building, testing and debugging your project. You only need a C compiler (clang or gcc), that’s it. The build system is included, and you can write your build logic in C.
- Cross-platform. All Cex capabilities are tested across platforms, so you don’t need to figure out the nuances of system API behavior on each of them.
- Self-sufficient build system. CMake/Make/shell scripts are dependencies; they are essentially separate programming languages, and that is a burden. Cex itself is a build system with a simple CLI that supports cross-platform builds, persistent configuration, and build logic written in C (like in Zig).
- Scripting flavor. CEX is designed to make common code patterns easy to work with. It fights extra complexity and reduces the mental overhead of writing code. New memory management and error handling make daily life much easier.
- Less is more and enough is enough. Cex tries to add just enough new entities (types, namespaces, functions) to original C to make life easier, and extends C functionality only when needed. Cex embraces the conservatism of C, and its goal is to be a stable base layer for projects for years ahead.
- Code quality tools. Cex leverages existing compiler capabilities for making C code better. It includes sanitizers, lib fuzzers, and unit tests out of the box, letting you focus on development.
- Long-term lifetime. When a project is built with CEX, it carries everything it needs inside its repo. After the 1.0 release, the CEX header itself will be maintained for ultimate backward compatibility. Ideally, its API has to be as stable as that of the SQLite project.
Simplicity as a virtue
C has been the least common denominator for legacy and modern system software for decades. It’s a simple language, but very hard to master. Cex tries to add a thin layer of things that make life a little bit easier, without bringing too much complexity to the code.
It’s challenging to make something simple by adding more stuff, which by definition adds complexity. However, by adding stuff Cex reduces the decision-making burden: it provides common code patterns and utility functions, and rethinks the C experience.
For example:
- Cex errors make vanilla C error handling obsolete. It’s just one type with two states (error / no error), with unlimited options for error variants. No more special enums, no more -1 and errno. Cex errors are easy to throw, easy to log, easy to handle.
- Memory handling via allocators makes every allocating function explicit. It’s easier to reason about the code, easier to track lifetimes, easier to clean up when using arenas and scopes. Having standard allocators also makes it possible to write reusable code or to use allocators for debugging memory leaks.
- Namespaces ease the burden of remembering dozens of functions with the same prefix; it’s easier to type and follow LSP suggestions. Sub-namespaces reduce the mental overhead of picking the right function. For example, str.convert. and str.slice. expand to specific sub-namespaces in LSP suggestions in CEX. Using namespaces feels like a decision tree: you branch step by step from the upper namespace to a sub-namespace to the end function. It feels much easier to type str.slice.remove_prefix than to remember the full function name str_slice_remove_prefix.
- Commonality of data collections. Cex has dynamic arrays and hashmaps, which can be handled like any other C array. Inspired by the Python approach of applying len and for to anything iterable, Cex also has arr$len and for$each, which work for any C/Cex static or dynamic array, hashmap or pointer+length data.
- The Cex build system might look like overkill (just use CMake), however with it you are still practicing C/Cex, and no extra dependency is needed for building your project. Cex, on the other hand, does its best to provide utility tools for building code and working with files, strings, and the OS.
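The namespace bullet above can be sketched in plain C as a constant struct of function pointers with a nested struct as the sub-namespace. This is only an illustration of the technique under my own assumptions: slice_s here is a locally defined type, and text, slice_from, slice_starts_with and slice_remove_prefix are hypothetical names, not the CEX str API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type and helpers; illustrative names only. */
typedef struct {
    const char* buf;
    size_t len;
} slice_s;

static slice_s slice_from(const char* s)
{
    return (slice_s){ .buf = s, .len = strlen(s) };
}

static bool slice_starts_with(slice_s s, const char* prefix)
{
    size_t n = strlen(prefix);
    return n <= s.len && memcmp(s.buf, prefix, n) == 0;
}

/* Returns the slice unchanged when the prefix does not match. */
static slice_s slice_remove_prefix(slice_s s, const char* prefix)
{
    if (!slice_starts_with(s, prefix)) {
        return s;
    }
    size_t n = strlen(prefix);
    return (slice_s){ .buf = s.buf + n, .len = s.len - n };
}

/* The "namespace": text.from(), text.slice.starts_with(), ... */
static const struct {
    slice_s (*from)(const char* s);
    struct {
        bool (*starts_with)(slice_s s, const char* prefix);
        slice_s (*remove_prefix)(slice_s s, const char* prefix);
    } slice;
} text = {
    .from = slice_from,
    .slice = {
        .starts_with = slice_starts_with,
        .remove_prefix = slice_remove_prefix,
    },
};
```

With this layout, typing text. in an editor lists from and slice, and typing text.slice. narrows the suggestions to the slice functions, which is the decision-tree feeling described above.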
Making C cexy again
Good old C99 might look too outdated for modern times, but nowadays we have C23 compilers with brilliant tooling like sanitizers and fuzzers included. This opens a new era of C: safer C.
C looks like a perfect fit for unsafe low-level applications like OS kernels, drivers and math libraries. Doing higher-level stuff in C has always been miserable for various reasons. Cex is an attempt to bring C to a slightly higher level.
Joy of C
In my opinion, the unsafety of C is really fun to work with: everything is under your control, everything is your responsibility. You can do wild stuff without any complaints from the compiler. It is ultimate freedom of code with ultimate responsibility.
This freedom of code is nowhere near achievable in any modern language; they tend to set up endless guardrails and protect us from every possible issue. This comes hand in hand with language complexity, an endless struggle with compiler warnings, and new abstraction levels on top of everything that may hurt you. We end up with a sterile world of safe computer science, without any chance to touch and understand how the machine works at a low level.
The mission of CEX is to bring the joy of programming in C to a new level, and to help make C code safer and easier to write and understand.
Shooting in the foot
What if shooting yourself in the foot is not such a bad idea? Before you start imagining a pistol or a shotgun, hold off, let’s start small… What if we could pick a toy gun with plastic bullets? What if we could stress test our code under different conditions and see what happens? What happens when we pass NULL to that argument? What about a buffer overflow?
CEX Principle: making by shaking.
For making safe C code we must have tools for that, fortunately modern compilers already have them:
- Address sanitizers - in clang/gcc they catch a variety of bugs (buffer overflows, use after free, memory leaks, etc.). We only need to help them trigger, by shaking the code via unit tests or fuzzers.
- Unit Tests - C is one of those languages that demand far more testing effort (think 3x) than most other programming languages.
- Fuzzers - in some cases the variety of inputs is too large to cover with unit tests. Fuzzers help with this issue, and can also be used for Deterministic Simulation Testing or randomized testing. LibFuzzer is included in clang, or you can use AFL++ if you want.
- Assertions - placing asserts everywhere in your code is a big deal for code quality, and a long-term early warning about possible system inconsistencies. They can be used not only for checking the input of a function, but also for validating results, or even at intermediate stages.
All of the tools above are available with cex.h out of the box, so adding a new test via the CLI has never been easier:
# New test boilerplate
./cex test create tests/test_my_stuff.c
# Running a test
./cex test run tests/test_my_stuff.c
# Running all tests in a project
./cex test run all
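As an illustration of the fuzzer and assertion bullets above, here is a minimal sketch of a libFuzzer target written in plain C, without cex.h. LLVMFuzzerTestOneInput is the standard libFuzzer entry point; parse_u32_prefix is a hypothetical function under test that I made up for this example. With clang it could be built with something like clang -g -fsanitize=address,fuzzer,undefined fuzz_target.c, which mirrors the flags listed in cexy$fuzzer in the config output earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function under test: parses up to 9 leading decimal digits.
 * Returns the number of consumed bytes, or -1 if there is nothing to parse. */
static int parse_u32_prefix(const uint8_t* data, size_t size, uint32_t* out)
{
    if (data == NULL || out == NULL) {
        return -1;
    }
    uint32_t value = 0;
    size_t i = 0;
    while (i < size && i < 9 && data[i] >= '0' && data[i] <= '9') {
        value = value * 10 + (uint32_t)(data[i] - '0');
        i++;
    }
    if (i == 0) {
        return -1;
    }
    *out = value;
    return (int)i;
}

/* libFuzzer calls this with many generated inputs; the asserts state the
 * invariants, and the sanitizers catch memory errors along the way. */
int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    uint32_t value = 0;
    int consumed = parse_u32_prefix(data, size, &value);
    if (consumed > 0) {
        assert((size_t)consumed <= size); /* never reads past the input */
        assert(value <= 999999999u);      /* 9 digits cannot exceed this */
    }
    return 0;
}
```

The harness itself contains no test data: the fuzzer does the shaking, and the asserts plus sanitizers decide what counts as a failure.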
Problems and solutions of C
Every programming language has its own quirks, sharp edges, and workarounds. C is not an exception here. However, in my opinion, sanitizers made a revolution in C development. With modern compilers C is much safer than it used to be even 10 years ago. Fuzzers take this a step further, especially if your program works a lot with user input.
Problem | Solution |
---|---|
Memory safety | We can use sanitizers to catch most of the cases, more unit tests! |
Memory leaks | Memory scopes in CEX, sanitizers for checking |
Manual memory management | Temp allocator in CEX, memory scopes, arenas |
Name conflicts | Hard to solve, CEX mitigates it by introducing namespace generation |
Error handling is inconsistent | CEX introduces new unified error handling |
Type overflows | Unit testing, fuzzing |
Unsafe type casting | It hurts especially when refactoring, but unit tests catch most of it. |
Undefined behavior | Use the UB sanitizer, fix all compiler warnings |
Macros | People hate macros, I don’t know why; just use them in moderation |
No generic types | Solvable with macros and _Generic; CEX dynamic arrays and hashmaps are fully generic |
No tracebacks | Use sanitizers; Cex uassert also prints tracebacks, and Cex error handling can generate tracebacks |
Poor core types (strings, dynamic arrays, hashmaps) | CEX introduces general purpose core types |
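The "No generic types" row mentions macros and _Generic. A minimal sketch of that technique in standard C11 could look like the following; any_len, len_cstr, len_slice and the local slice_s type are hypothetical names for illustration and say nothing about how CEX implements arr$len internally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type for this sketch only. */
typedef struct {
    const char* buf;
    size_t len;
} slice_s;

static size_t len_cstr(const char* s) { return s ? strlen(s) : 0; }
static size_t len_slice(slice_s s) { return s.len; }

/* _Generic selects a function at compile time from the argument type. */
#define any_len(x)            \
    _Generic((x),             \
        char*: len_cstr,      \
        const char*: len_cstr, \
        slice_s: len_slice)(x)
```

One macro name now works for both C strings and slices, and passing an unsupported type is a compile-time error rather than a runtime surprise.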
When C shines
What | Why |
---|---|
Simple semantics | It’s a good thing to have fewer cryptic combinations of special characters and keywords in the language. C is simple, but that doesn’t mean it’s easy. |
Full control | We have full control over everything that happens in our program: how memory is aligned, how control flow is laid out in assembly, how memory is allocated. |
Tooling | C has an enormous amount of development tools: testers, fuzzers, debuggers, coverage, performance, etc. |
Language stability | It’s cool to have a project that compiles and works after 5-10 years with minimal changes. I would call C an anti-language to the modern NodeJS world. |
Knowledge base | It’s probably the most diverse and stable knowledge base of all languages. |
Works everywhere | Anybody tried to run doom on a toaster? |
Performance | It feels good when you beat blazingly fast javascript by a factor of 100x with moderate C code |
Compatibility | C is a lowest common denominator of all languages, anything can wrap and call C code |
How to improve C
Cex was initially inspired by my Python experience, especially by how a very limited set of built-in types (str, list, dict, tuple, set + primitives) and simple semantics were able to produce the huge Python ecosystem we have today. In my opinion, we don’t need to add every hyped programming paradigm to the C language to make it better.
However, we need some things to be productive in C that CEX tries to implement:
- New error handling system - which makes error handling seamless and easy to work with.
- New memory management model - for making memory management more transparent and object lifetimes easier to track.
- Better strings - because modern computing has become string-centric, strings are everywhere, and we need a better tool set in C.
- Better arrays - there are no built-in dynamic arrays in C, but it’s the most used data structure of all time.
- Hashmaps / sets - the second most used data structure, with no coverage in C.
- Build system - the current build system situation is an endless source of dependency conflicts, cross-platform headaches, and other issues that steal our mental energy.
- Code quality tools - we should lower the barriers for running unit tests, fuzzers, coverage, benchmarks, etc.
Cex also includes some things for IO, OS/file system operations, and a JSON lib to fuel the cross-platform build system and configuration. However, the goal of the CEX core is to remain a thin layer above original C, adding just enough.
Why just not use R**t, Z*g or C@$ ?
There is something appealing in the simplicity of C; it shines when we need full control over the code and assembly. Maybe it’s not for everyone, and maybe it’s a bad idea to use C for web backends. But modern languages are often affected by the rush to add new things and new paradigms, piling up complexity of semantics and dependencies.
C brings stability to the table: if something is written in C, there is a good chance that the project will still compile 5 years from now. Very few modern languages have this goal in mind. Most keep rushing to make changes and add new features.
Basics
Code Style Guidelines
- dollar$means_macros. CEX style uses the $ delimiter as a macro marker: if you see it anywhere in the code, you are dealing with some sort of macro. The first$ part of the name is usually linked to the namespace of the macro, so you may expect other macros, type names or functions with that prefix.
- functions_are_snake_case(). Lower case for functions.
- MyStruct_c or my_struct_s. Struct types are typically expected to be PascalCase with a suffix. The _c suffix indicates there is a code namespace with the same name (i.e. _c hints it’s a container or kind of object); the _s suffix means a simple data container without special logic.
- MyObj.method() or namespace.func(). Namespace names are typically lower case, and object-specific namespace names reflect the type name, e.g. MyObj_c has MyObj.method().
- Enums__double_underscore. Enum types are defined as MyEnum_e and each element looks like MyEnum__foo, MyEnum__bar.
- CONSTANTS_ARE_UPPER. Two notations of constants: UPPER_CASE_CONST or namespace$CONST_NAME.
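The conventions above could be combined as in the following sketch. Every name in it (counter$MAX_VALUE, counter$clamp, CounterMode_e, Counter_c, Counter, counter_step) is hypothetical and exists only to show the naming style. Note that $ in identifiers is a compiler extension accepted by GCC and Clang on common targets, not standard C.

```c
#include <stddef.h>

/* namespace$CONST_NAME and a namespace$macro (the $ marks macros) */
#define counter$MAX_VALUE 100
#define counter$clamp(v) ((v) > counter$MAX_VALUE ? counter$MAX_VALUE : (v))

/* MyEnum_e type with MyEnum__element values */
typedef enum {
    CounterMode__up,
    CounterMode__down,
} CounterMode_e;

/* _c suffix: a type that has a code namespace with the same base name */
typedef struct {
    int value;
    CounterMode_e mode;
} Counter_c;

/* functions_are_snake_case() */
static void counter_step(Counter_c* self)
{
    self->value += (self->mode == CounterMode__up) ? 1 : -1;
    self->value = counter$clamp(self->value);
}

/* Object namespace: Counter_c has Counter.step() */
static const struct {
    void (*step)(Counter_c* self);
} Counter = { .step = counter_step };
```

Reading a call site such as Counter.step(&c) or counter$clamp(v) then tells you at a glance whether you are calling a function through a namespace or expanding a macro.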
Types
CEX provides several short aliases for primitive types and some extra types for covering blank spots in C.
Type | Description |
---|---|
auto | automatically inferred variable type |
bool | boolean type |
u8/i8 | 8-bit integer |
u16/i16 | 16-bit integer |
u32/i32 | 32-bit integer |
u64/i64 | 64-bit integer |
f32 | 32-bit floating point number (float) |
f64 | 64-bit floating point number (double) |
usize | maximum array size (size_t) |
isize | signed array size (ptrdiff_t) |
char* | core type for null-term strings |
sbuf_c | dynamic string builder type |
str_s | string slice (buf + len) |
Exc / Exception | error type in CEX |
Error.<some> | generic error collection |
IAllocator | memory allocator interface type |
arr$(T) | generic type dynamic array |
hm$(T) | generic type hashmap |
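For readers who want to see what such aliases amount to, here is a plain C sketch of the primitive rows of the table. The usize, isize, f32 and f64 mappings follow the table; mapping u8..u64 and i8..i64 to the stdint.h fixed-width types is my assumption, and checksum is a hypothetical helper. The __auto_type keyword is a GCC/Clang extension used here to mimic an inferred variable type; it is not necessarily how CEX defines auto.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mapping of the short aliases onto standard types */
typedef uint8_t u8;
typedef int8_t i8;
typedef uint16_t u16;
typedef int16_t i16;
typedef uint32_t u32;
typedef int32_t i32;
typedef uint64_t u64;
typedef int64_t i64;
typedef float f32;
typedef double f64;
typedef size_t usize;    /* per the table: size_t */
typedef ptrdiff_t isize; /* per the table: ptrdiff_t */

/* Compile-time checks that the aliases have the expected widths */
_Static_assert(sizeof(u8) == 1 && sizeof(u16) == 2, "unexpected integer width");
_Static_assert(sizeof(u32) == 4 && sizeof(u64) == 8, "unexpected integer width");

/* Hypothetical helper: sums all bytes of a buffer */
static u64 checksum(const u8* data, usize len)
{
    u64 total = 0;
    for (usize i = 0; i < len; i++) {
        __auto_type byte = data[i]; /* inferred as u8 (GCC/Clang extension) */
        total += byte;
    }
    return total;
}
```

Short aliases keep signatures compact (u64 checksum(const u8*, usize)) while still being ordinary C types that interoperate with any existing library.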
CEX Core Namespaces
You can get a cheat-sheet with the ./cex help <namespace>$ command (the trailing $ is important!). For example, ./cex help io$ or ./cex help e$.
Namespace | Description |
---|---|
e$ | CEX Exception handling |
for$ | CEX array looping (for$each, etc.) |
log$ | Logging system |
mem$* | Memory management and allocators |
mem$ | Global variable for general purpose heap allocator |
tmem$ | Global variable for temporary arena allocator |
str | General purpose string / slice namespace |
sbuf | String builder class |
arr$ | Type-safe, generic, dynamic array |
hm$ | Type-safe, generic hashmap |
io | Cross-platform IO namespace |
test$ | Unit-test namespace (see tassert_*) |
fuzz$ | Fuzz-test interface |
argparse | Command line argument parsing class |
cg$ | Code generation namespace |
cexy$ | CEX Build system config vars and interface |
Utility macros
Name | Description |
---|---|
uassert() | General purpose assert with tracebacks |
uassertf() | General purpose assert with formatting |
unlikely() | Branch predictor management for unexpected conditions |
likely() | Branch predictor management for expected conditions |
breakpoint() | Cross-platform debugger breakpoint |
fallthrough() | Explicit fallthrough to the next switch case |
unreachable() | Panics in debug mode; expands to __builtin_unreachable() when NDEBUG is defined |
tassert_* | Unit-test assertions see ./cex help tassert_ |
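To show what macros of this kind typically wrap, here is a sketch built on the GCC/Clang builtins __builtin_expect and __builtin_unreachable. The my_ prefixed names, Shape_e, corner_count and sum_non_negative are hypothetical, and the definitions are my assumption about a reasonable implementation, not a copy of cex.h.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Branch hints: tell the compiler which outcome is expected */
#define my_likely(x) __builtin_expect(!!(x), 1)
#define my_unlikely(x) __builtin_expect(!!(x), 0)

/* Debug builds abort loudly; NDEBUG builds give the optimizer a hint */
#ifdef NDEBUG
#define my_unreachable() __builtin_unreachable()
#else
#define my_unreachable() \
    (fprintf(stderr, "unreachable code at %s:%d\n", __FILE__, __LINE__), abort())
#endif

typedef enum {
    Shape__circle,
    Shape__square,
} Shape_e;

static int corner_count(Shape_e s)
{
    switch (s) {
        case Shape__circle:
            return 0;
        case Shape__square:
            return 4;
    }
    my_unreachable(); /* all enum values are handled above */
}

/* Negative values are rare in the expected input, so hint that branch */
static long sum_non_negative(const int* v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (my_unlikely(v[i] < 0)) {
            continue;
        }
        sum += v[i];
    }
    return sum;
}
```

The hints never change program results; they only influence code layout, so they are safe to add to hot paths once a profile shows which branch dominates.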
Error handling
The problem of error handling in C
C errors have always been a mess, for historical reasons and because of ABI specifics. The main curse of C errors is mixing values with errors: for example, system calls return -1 and set the errno variable. Some functions return 0 on error, some return NULL, sometimes it is an enum, or MAP_FAILED (which is (void*)-1).
This convention drains a lot of developer energy, forcing them to keep searching the docs to figure out which return values of a function are considered errors.
C error handling clutters the code with the endless if (ret_code == -1) pattern.
The code below is a typical error handling pattern in C, and it illustrates several specific issues:
isize read_file(char* filename, char* buf, usize buf_size) {
    if (buf == NULL || filename == NULL) {
        errno = EINVAL;  // (1)
        return -1;
    }
    int fd = open(filename, O_RDONLY);
    if (fd == -1) {
        fprintf(stderr, "Cannot open '%s': %s\n", filename, strerror(errno));  // (2)
        return -1;
    }
    isize bytes_read = read(fd, buf, buf_size);
    if (bytes_read == -1) {
        perror("Error reading");  // (3)
        return -1;
    }
    return bytes_read;  // (4)
}
- (1) errno is set, but it is hard to tell which API call or function argument failed.
- (2) The error message is printed in a different place from where the error is reported, so the developer must go through the code to check.
- (3) errno is too broad and ambiguous to describe the exact reason for the failure.
- (4) The read_file return value mixes the error value -1 with legitimate values of bytes_read. The situation gets worse if we need a non-integer return type for the function.
CEX Error handling goals
CEX made an attempt to re-think general purpose error handling in applications, with the following goals:
- Errors should be unambiguous - detached from the valid result of a function, so there are only 2 states: OK or an error.
- Error handling should be general purpose - providing generic code patterns for error handling
- Errors should be easy to report - avoiding error code to string mapping
- Errors should bubble up - code can pass the same error to the upper caller
- Errors should be extendable - allowing unique error identification
- Errors should be passed as values - low-overhead error handling
- Error handling should be natural - no special constructs required to handle errors in C code
- Error checks should be enforced - no accidentally skipped error checks
How error handling is implemented
CEX has a special Exception type, which is essentially an alias for char*, and yes, all error handling in CEX is based on char*. Before you start laughing and rolling on the floor, let me explain the most important part of the Exception type, this little * part. An Exception in CEX is a pointer (an address, a number) to some arbitrary char array in memory.
What if the returned pointer could always point to some constant area indicating an error? With that rule, we don’t have to match the error (string) content; we only compare the address of the error.
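The idea can be tried without cex.h. The sketch below is a plain C illustration under my own assumptions: Err, MyErr and find_index are hypothetical names, and I use const char* where CEX uses char*. Errors are compared by address only, and the GCC/Clang warn_unused_result attribute makes the compiler complain when a caller ignores the result.

```c
#include <stddef.h>

/* Hypothetical error type: NULL means success */
typedef const char* Err;

/* Each error is a distinct constant string; only its address matters.
 * The texts must differ, since compilers may merge identical literals. */
static const struct {
    Err ok;
    Err argument;
    Err not_found;
} MyErr = {
    .ok = NULL,
    .argument = "ArgumentError",
    .not_found = "NotFoundError",
};

/* The result goes through out_index; the return value is only the error */
__attribute__((warn_unused_result))
static Err find_index(const int* items, size_t len, int wanted, size_t* out_index)
{
    if (items == NULL || out_index == NULL) {
        return MyErr.argument;
    }
    for (size_t i = 0; i < len; i++) {
        if (items[i] == wanted) {
            *out_index = i;
            return MyErr.ok;
        }
    }
    return MyErr.not_found;
}
```

A caller can then write if (err == MyErr.not_found) { ... } without any strcmp(), and the same pointer can be printed with %s when it needs to be logged.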
CEX Error in a nutshell
// NOTE: excerpt from cex.h
/// Generic CEX error is a char*, where NULL means success(no error)
typedef char* Exc;
/// Equivalent of Error.ok, execution success
#define EOK (Exc) NULL
/// Use `Exception` in function signatures, to force developer to check return value
/// of the function.
#define Exception Exc __attribute__((warn_unused_result))
/**
* @brief Generic errors list, used as constant pointers, errors must be checked as
* pointer comparison, not as strcmp() !!!
*/
extern const struct _CEX_Error_struct
{
Exc ok; // Success no error
Exc argument;
// ... cut ....
Exc os;
Exc integrity;
} Error;
// NOTE: user code
Exception
remove_file(char* path)
{
if (path == NULL || path[0] == '\0') {
return Error.argument; // Empty or null path
}
if (!os.path.exists(path)) {
return "Not exists"; // literal errors are allowed, but must be compared with strcmp()
}
if (str.eq(path, "magic.file")) {
// Returns an Error.integrity and logs error at current line to stdout
return e$raise(Error.integrity, "Removing magic file is not allowed!");
}
if (remove(path) < 0) {
return strerror(errno); // using system error text (arbitrary!)
}
return EOK;
}
Exception
main(char* path)
{
// Method 1: low level handling (no re-throw)
if (remove_file(path)) { return Error.os; }
if (remove_file(path) != EOK) { return "bad stuff"; }
if (remove_file(path) != Error.ok) { return EOK; }
// Method 2: handling specific errors
Exc err = remove_file(path);
if (err == Error.argument) { // <<< NOTE: comparing address not a string contents!
io.printf("Some weird things happened with path: %s, error: %s\n", path, err);
return err;
}
// Method 3: helper macros + handling with traceback
e$except(err, remove_file(path)) { // NOTE: this call automatically prints a traceback
if (err == Error.integrity) { /* TODO: do special case handling */ }
}
// Method 4: helper macros + handling unhandled
e$ret(remove_file(path)); // NOTE: on error, prints traceback and returns error to the caller
remove_file(path); // <<< OOPS compiler error, return value of this function unchecked
return 0;
}
Error tracebacks and custom errors in CEX
CEX error system was designed to help in debugging, this is a simple example of deep call stack printing in CEX.
#define CEX_IMPLEMENTATION
#include "cex.h"

const struct _MyCustomError
{
    Exc why_arg_is_one;
} MyError = { .why_arg_is_one = "WhyArgIsOneError" };

Exception
baz(int argc)
{
    if (argc == 1) { return e$raise(MyError.why_arg_is_one, "Why argc is 1, argc = %d?", argc); }
    return EOK;
}

Exception
bar(int argc)
{
    e$ret(baz(argc));
    return EOK;
}

Exception
foo2(int argc)
{
    io.printf("MOCCA - Make Old C Cexy Again!\n");
    e$ret(bar(argc));
    return EOK;
}

int
main(int argc, char** argv)
{
    (void)argv;
    e$except (err, foo2(argc)) {
        if (err == MyError.why_arg_is_one) {
            io.printf("We need moar args!\n");
        }
        return 1;
    }
    return 0;
}
MOCCA - Make Old C Cexy Again!
[ERROR] ( main.c:12 baz() ) [WhyArgIsOneError] Why argc is 1, argc = 1?
[^STCK] ( main.c:19 bar() ) ^^^^^ [WhyArgIsOneError] in function call `baz(argc)`
[^STCK] ( main.c:27 foo2() ) ^^^^^ [WhyArgIsOneError] in function call `bar(argc)`
[^STCK] ( main.c:35 main() ) ^^^^^ [WhyArgIsOneError] in function call `foo2(argc)`
We need moar args!
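The traceback above relies on every call site knowing its own file, line and function. A minimal plain C sketch of that mechanism, assuming nothing from cex.h, can be built from __FILE__, __LINE__ and __func__. TRY, Err and the level1..level3 functions are hypothetical names; the real e$ret macro may work differently.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error type: NULL means success */
typedef const char* Err;

/* On error: log the call site, then pass the same error up to the caller */
#define TRY(call)                                                         \
    do {                                                                  \
        Err try_err_ = (call);                                            \
        if (try_err_ != NULL) {                                           \
            fprintf(stderr, "[^STCK] ( %s:%d %s() ) [%s] in call `%s`\n", \
                    __FILE__, __LINE__, __func__, try_err_, #call);       \
            return try_err_;                                              \
        }                                                                 \
    } while (0)

static Err level3(int n)
{
    if (n == 1) {
        /* The origin of the error logs its own location once */
        fprintf(stderr, "[ERROR] ( %s:%d %s() ) n must not be 1\n",
                __FILE__, __LINE__, __func__);
        return "NIsOneError";
    }
    return NULL;
}

static Err level2(int n)
{
    TRY(level3(n));
    return NULL;
}

static Err level1(int n)
{
    TRY(level2(n));
    return NULL;
}
```

Each TRY on the way up adds one line to stderr, so a failing level1(1) produces an [ERROR] line followed by one [^STCK] line per caller, similar in spirit to the output shown above.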
Rewriting initial C example to CEX
Main benefits of using CEX error handling system:
- Error messages come with source_file.c:line and function() for easier debugging
- Easier to do quick checks with e$assert
- Easier to re-throw generic unhandled errors inside code
- Unambiguous return values: OK or error.
- Unlimited variants of returning different types of errors (Error.argument, "literals", strerror(errno), MyCustom.error)
- Easy to log - Exceptions are just char* strings
- Traceback support when chained via multiple functions
Exception read_file(char* filename, char* buf, isize* out_buf_size) {
    e$assert(buf != NULL);  // (1)
    e$assert(filename != NULL && "invalid filename");
    int fd = 0;
    e$except_errno(fd = open(filename, O_RDONLY)) { return Error.os; }  // (2)
    e$except_errno(*out_buf_size = read(fd, buf, *out_buf_size)) { return Error.io; }  // (3)
    return EOK;  // (4)
}
- (1) Returns an error and prints the failed expression: [ASSERT] ( main.c:26 read_file() ) buf != NULL. e$assert is an Exception-returning assert; it doesn’t abort your program, and these asserts are not stripped in release builds.
- (2) Handles the typical -1 + errno check and prints: [ERROR] ( main.c:27 read_file() ) fd = open("foo.txt", O_RDONLY) failed errno: 2, msg: No such file or directory
- (3) The result of the function is returned by reference through the out parameter.
- (4) Unambiguous return code for success.
For comparison, the original C version:
isize read_file(char* filename, char* buf, usize buf_size) {
    if (buf == NULL || filename == NULL) {
        errno = EINVAL;
        return -1;
    }
    int fd = open(filename, O_RDONLY);
    if (fd == -1) {
        fprintf(stderr, "Cannot open '%s': %s\n", filename, strerror(errno));
        return -1;
    }
    isize bytes_read = read(fd, buf, buf_size);
    if (bytes_read == -1) {
        perror("Error reading");
        return -1;
    }
    return bytes_read;
}
Helper macros e$...
CEX has a toolbox of macros with the e$ prefix, dedicated to Exception-specific tasks. However, it’s not mandatory to use them, and you can stick to regular control flow constructs from C.
In general, e$ macros provide location logging (source file, line, function), which is a building block for the error traceback mechanism in CEX.
e$ macros are mostly designed to work with functions that return the Exception type.
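To make the idea of location logging concrete, here is a minimal plain-C sketch of a re-return macro. It is not the CEX implementation: `RET_ON_ERROR`, `level1..3` and `trace_lines` are hypothetical names, and the log format only imitates the traceback lines shown earlier.

```c
#include <stddef.h>
#include <stdio.h>

typedef const char* Exc; /* assumption: an error is a plain string pointer */

static int g_trace_lines = 0; /* counts emitted log lines, for checking only */

/* Hypothetical re-return macro: on error, log where the error was seen
 * (file, line, function, failed call) and propagate the same value. */
#define RET_ON_ERROR(call)                                                 \
    do {                                                                   \
        Exc err_ = (call);                                                 \
        if (err_ != NULL) {                                                \
            fprintf(stderr, "[^STCK] ( %s:%d %s() ) [%s] in `%s`\n",       \
                    __FILE__, __LINE__, __func__, err_, #call);            \
            g_trace_lines++;                                               \
            return err_;                                                   \
        }                                                                  \
    } while (0)

/* Three nested calls; only the innermost one produces the error. */
static Exc level3(int x) { return (x == 1) ? "WhyArgIsOneError" : NULL; }
static Exc level2(int x) { RET_ON_ERROR(level3(x)); return NULL; }
Exc level1(int x) { RET_ON_ERROR(level2(x)); return NULL; }

int trace_lines(void) { return g_trace_lines; }
```

Calling `level1(1)` emits one log line per stack level that passed the error along, which is the essence of a traceback built from per-call location logging.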
Returning the Exc[eption]
Errors in CEX are just plain string pointers. If an Exception function returns NULL, EOK, or Error.ok, this indicates successful execution; any other value is an error.
You may also return with the e$raise(error_to_return, format, ...) macro, which prints the location of the error in the code together with a formatted message.
Exception error_sample1(int a) {
if (a == 0) return Error.argument; // standard set of errors in CEX
if (a == -1) return "Negative one"; // error literal also works, but harder to handle
if (a == -2) return UserError.neg_two; // user error
if (a == 7) return e$raise(Error.argument, "Bad a=%d", a); // error with logging
return EOK; // success
// return Error.ok; // success
// return NULL; // success
}
Handling errors
CEX supports two ways of handling errors:
- Silent handling - suppresses error location logging, which might be useful for performance critical code or tight loops. This is also the general way of returning errors in the CEX standard lib.
- Loud handling with logging - useful for one-shot complex functions which may return multiple types of errors for different reasons. This is the way to go if you want tracebacks for your errors.
Silent handling example
Avoid using e$raise() in called functions if you need silent error handling; use plain return Error.* instead.
Exception foo_silent(void) {
// Method 1: quick and dirty checks
if (error_sample1(0)) { return "Error"; /* Silent handling without logic */ }
if (error_sample1(0)) { /* Discarding error of a call */ }
// Method 2: silent error condition
Exc err = error_sample1(0);
if (err) {
if (err == Error.argument) {
/* Handling specific error here */
}
return err;
}
// Method 3: silent macro, with temp error value
e$except_silent (err, error_sample1(0)) {
// NOTE: nesting is allowed!
e$except_silent (err, error_sample1(-2)) {
return err; // err = UserError.neg_two
}
// err = Error.argument now
if (err == Error.argument) {
/* Handling specific error here */
}
// break; // BAD! See caveats section below
}
return EOK;
}
e$except_silent will still print an error log when code runs under a unit test or inside the CEX build system, which helps a lot with debugging.
Loud handling with logging
If you write general purpose code with debuggability in mind, logged error handling can be a breeze. It enables traceback error logging, so errors deep in the call stack become easier to track and reason about.
There are special error handling macros for this purpose:
- e$except(err, func_call()) { ... } - error handling scope which initializes a temporary variable err and logs if an error was returned by func_call(). func_call() must return the Exception type for this macro.
- e$except_errno(sys_func()) { ... } - error handling for system functions returning -1 and setting errno.
- e$except_null(ptr_func()) { ... } - error handling for functions returning NULL on error.
- e$except_true(func()) { ... } - error handling for functions returning a non-zero code on error.
- e$ret(func_call()); - runs the Exception-returning function func_call(), and on error logs the traceback and re-returns the same value. This is the main code shortcut and the driver for all CEX tracebacks. Use it if you don’t care about precise error handling and are fine with returning immediately on error.
- e$goto(func_call(), goto_err_label); - runs the Exception-returning function and does goto goto_err_label; on error. This macro is useful for resource deallocation logic and is intended for the typical C goto fail error handling pattern.
- e$assert(condition) or e$assert(condition && "What's wrong") or e$assertf(condition, format, ...) - quick condition checking inside Exception functions; logs the error location and returns Error.assert. These asserts remain in release builds and are not affected by the NDEBUG flag.
Exception foo_loud(int a) {
    e$assert(a != 0);
    e$assert(a != 11 && "a is suspicious");
    e$assertf(a != 22, "a=%d is something bad", a);

    char* m = malloc(20);
    e$assert(m != NULL && "memory error"); // ever green assert
    e$ret(error_sample1(9)); // Re-return on error
    e$goto(error_sample1(0), fail); // goto fail and free the resource
    e$except (err, error_sample1(0)) {
        // NOTE: nesting is allowed!
        e$except (err, error_sample1(-2)) {
            return err; // err = UserError.neg_two
        }
        // err = Error.argument now
        if (err == Error.argument) {
            /* Handling specific error here */
        }
        // continue; // BAD! See caveats section below
    }
    // For these e$except_*() macros you can use assignment expression
    // e$except_errno(fd = open(..))
    // e$except_null(f = malloc(..))
    // e$except_true (sqlite3_open(db_path, &db))
    FILE* f;
    e$except_null (f = fopen("foo.txt", "r")) {
        return Error.io;
    }
    return EOK;
fail:
    free(m);
    return Error.runtime;
}
Caveats
Most of the e$except_* macros are backed by a for() loop, so you have to be careful when you nest them inside outer loops and try to break/continue the outer loop on error.
In my opinion, using e$except_* inside loops is generally a bad idea, and you should consider:
- Factoring error emitting code into a separate function
- Using if(error_sample(i)) instead of e$except
Bad example!
Exception foo_err_loop(int a) {
for (int i = 0; i < 10; i++) {
e$except (err, error_sample1(i)) {
break; // OOPS: `break` stops `e$except`, not outer for loop
}
}
return EOK;
}
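The effect is easy to reproduce in plain C. The sketch below does not use the real `e$except`; `ON_ERROR` is a hypothetical stand-in that only shares the property described above, namely that it expands to a one-shot `for` loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for an e$except-like macro: a one-shot for loop
 * whose body runs only when `expr` yields a non-NULL error string. */
#define ON_ERROR(err, expr) \
    for (const char* err = (expr); err != NULL; err = NULL)

static const char* fail_on_three(int i) { return (i == 3) ? "ThreeError" : NULL; }

/* Counts how many iterations of the outer loop actually ran. */
int count_iterations_with_break(void)
{
    int iterations = 0;
    for (int i = 0; i < 10; i++) {
        iterations++;
        ON_ERROR(err, fail_on_three(i)) {
            break; /* leaves only the hidden for loop inside ON_ERROR */
        }
    }
    return iterations; /* the outer loop was never interrupted */
}

/* The plain `if` form: break applies to the outer loop as intended. */
int count_iterations_with_if(void)
{
    int iterations = 0;
    for (int i = 0; i < 10; i++) {
        iterations++;
        if (fail_on_three(i) != NULL) { break; }
    }
    return iterations; /* stops after i == 3 */
}
```

The first function runs all 10 iterations, while the second stops after 4, which is why a plain `if` (or a separate function) is the safer choice inside loops.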
Standard Error
CEX implements a standard Error namespace, which covers the most common error situations you might need to handle.
const struct _CEX_Error_struct Error = {
.ok = EOK, // Success
.memory = "MemoryError", // memory allocation error
.io = "IOError", // IO error
.overflow = "OverflowError", // buffer overflow
.argument = "ArgumentError", // function argument error
.integrity = "IntegrityError", // data integrity error
.exists = "ExistsError", // entity or key already exists
.not_found = "NotFoundError", // entity or key not found
.skip = "ShouldBeSkipped", // NOT an error, function result must be skipped
.empty = "EmptyError", // resource is empty
.eof = "EOF", // end of file reached
.argsparse = "ProgramArgsError", // program arguments empty or incorrect
.runtime = "RuntimeError", // generic runtime error
.assert = "AssertError", // generic runtime check
.os = "OSError", // generic OS check
.timeout = "TimeoutError", // await interval timeout
.permission = "PermissionError", // Permission denied
.try_again = "TryAgainError", // EAGAIN / EWOULDBLOCK errno analog for async operations
};
Exception foo(int a) {
e$except (err, error_sample1(0)) {
if (err == Error.argument) {
return Error.runtime; // Return another error
}
}
return Error.ok; // success
}
Making custom user exceptions
Extending with existing functionality
You probably only need custom errors when you have specific handling needs, which is a rare case. In the common case you just need to report details of the error and forget about it. Before we dive into customized error structs, let’s consider what simple instruments we have for error customization without adding another entity to the code:
- You may return string literals as a custom error. These are a convenient option when you don’t need to handle them (e.g. for rare/weird edge cases)
Exception foo_literal(int a) {
if (a == 777999) return "a is a duplicate of magic number";
return EOK;
}
- You may return a standard error and log something with e$raise(), which supports location logging and custom formatting.
Exception foo_ret(int a) {
if (a == 777999) return e$raise(Error.argument, "a=%d looks weird", a);
return EOK;
}
Custom error structs
If you need custom handling, you might need to create a new dedicated structure for errors.
Here are some requirements for a custom error structure:
- It has to be a constant global variable
- All fields must be initialized: uninitialized fields are NULL, and NULL is the success code.
// myerr.h
extern const struct _MyError_struct
{
Exc foo;
Exc bar;
Exc baz;
} MyError;
// myerr.c
const struct _MyError_struct MyError = {
.foo = "FooError",
.bar = "BarError",
// WARNING: missing .baz - which will be set to NULL => EOK
};
// other.c
#include "cex.h"
#include "myerr.h"
Exception foo(int a) {
e$except (err, error_sample1(0)) {
if (err == Error.argument) {
return MyError.foo;
}
}
return Error.ok; // success
}
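Because a forgotten field silently becomes a success code, it can be worth checking a custom error struct once at startup or in a unit test. The following is a plain-C sketch under that assumption; `DemoError` and `count_unset_errors` are hypothetical names and not part of CEX.

```c
#include <stddef.h>

typedef const char* Exc; /* assumption: an error is a plain string pointer */

struct DemoError_struct {
    Exc foo;
    Exc bar;
    Exc baz;
};

const struct DemoError_struct DemoError = {
    .foo = "FooError",
    .bar = "BarError",
    /* .baz deliberately omitted: C zero-initializes it, so it is NULL */
};

/* Counts fields that were left NULL and would therefore read as success. */
size_t count_unset_errors(const struct DemoError_struct* e)
{
    const Exc fields[] = { e->foo, e->bar, e->baz };
    size_t unset = 0;
    for (size_t i = 0; i < sizeof(fields) / sizeof(fields[0]); i++) {
        if (fields[i] == NULL) { unset++; }
    }
    return unset;
}
```

A test that asserts `count_unset_errors(&DemoError) == 0` would catch the missing `.baz` initializer before it is ever returned as a bogus success.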
Advanced topics
Performance
Errors are pointers
Using strings as an error value carrier may look controversial at first glance. However, let’s remember that strings in C are char*, and the * part essentially means that the value is a pointer-sized integer holding a memory address. Therefore the CEX approach is to have a set of pre-defined, constant memory addresses that hold the standard error values (see the Standard Error section above).
So for error handling we first compare the return value with EOK|NULL|Error.ok to check whether an error was returned. Then we compare the address of the returned error with the address of a standard error.
With this being said, typical error handling in CEX costs one assembly instruction that compares a register with NULL, plus one instruction that compares the error address with a constant address when handling a specific error type.
CEX uses direct pointer comparison if (err == Error.argument), instead of string content comparison if(strcmp(err, "ArgumentError") == 0) /* << BAD */
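A small standalone sketch of this identity-based comparison, written in plain C. `MiniError`, `parse_positive` and the helper functions are hypothetical and only mimic the shape of the standard Error namespace.

```c
#include <stddef.h>

typedef const char* Exc; /* assumption: an error is a plain string pointer */

/* Hypothetical miniature of a standard error namespace. */
static const struct {
    Exc ok;
    Exc argument;
    Exc io;
} MiniError = {
    .ok = NULL,
    .argument = "ArgumentError",
    .io = "IOError",
};

static Exc parse_positive(int value)
{
    if (value <= 0) { return MiniError.argument; }
    return MiniError.ok;
}

/* Returns 1 when the call failed specifically with the argument error. */
int is_argument_error(int value)
{
    Exc err = parse_positive(value);
    if (err == NULL) { return 0; }      /* success: a single NULL comparison */
    return err == MiniError.argument;   /* identity check, no strcmp() */
}

/* A copy with the same text lives at another address, so it does not match. */
int copy_matches_by_identity(void)
{
    char copy[] = "ArgumentError";
    return (Exc)copy == MiniError.argument;
}
```

The second helper shows the flip side of the design: two errors are equal only if they are the same object, not merely the same text.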
Branch predictor control
All CEX e$ macros use unlikely a.k.a. __builtin_expect to shape the assembly code in favor of the happy path. For example, this is an e$assert source snippet:
# define e$assert(A) \
({ \
if (unlikely(!((A)))) { \
__cex__fprintf(stdout, "[ASSERT] ", __FILE_NAME__, __LINE__, __func__, "%s\n", #A); \
return Error.assert; \
} \
})
The unlikely(!(A)) hints the compiler to lay out assembly instructions in favor of the happy path of the e$assert, which is a performance gain when you have multiple error handling checks and/or big error handling blocks.
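The same hint can be used in ordinary C code. This is a minimal sketch assuming GCC or Clang (`__builtin_expect` is their builtin); `my_unlikely` and `checked_sum` are hypothetical names, and other compilers simply get a plain condition.

```c
#include <stddef.h>

/* Common definition on GCC/Clang; other compilers fall back to the bare
 * condition, so the code stays portable. */
#if defined(__GNUC__) || defined(__clang__)
#    define my_unlikely(x) __builtin_expect(!!(x), 0)
#else
#    define my_unlikely(x) (x)
#endif

/* Sums a buffer, treating a NULL buffer as the rare error path. */
long checked_sum(const int* data, size_t len, int* out_failed)
{
    if (my_unlikely(data == NULL)) {
        *out_failed = 1; /* cold path: the compiler may move it out of line */
        return 0;
    }
    long total = 0;
    for (size_t i = 0; i < len; i++) { total += data[i]; }
    *out_failed = 0;
    return total;
}
```

The hint never changes results, only the code layout the compiler is encouraged to produce.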
Compatibility
Be careful if you need to expose CEX exception returning functions to an API. Sometimes, if you are working with different shared libraries, the addresses of the same errors might be different. If user code is intended to check and handle API errors, maybe it’s better to stick to C-compatible approach instead of CEX errors.
CEX Exceptions work best when you use them in single address space of an app or a library. If you need to cross this boundary, do your best assessment for pros and cons.
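One possible way to cross such a boundary is to translate error pointers into integer status codes in the exported function. This is only a sketch of that idea in plain C; `mylib_open`, the `MYLIB_*` codes and the internal error constants are all hypothetical.

```c
#include <stddef.h>

typedef const char* Exc; /* assumption: an error is a plain string pointer */

/* Internal error constants, compared by address inside the library only. */
static const Exc ERR_ARGUMENT = "ArgumentError";
static const Exc ERR_IO = "IOError";

/* Public, C-compatible status codes: stable across address spaces. */
enum mylib_status {
    MYLIB_OK = 0,
    MYLIB_EARGUMENT = 1,
    MYLIB_EIO = 2,
    MYLIB_EUNKNOWN = 255,
};

static Exc internal_open(const char* path)
{
    if (path == NULL) { return ERR_ARGUMENT; }
    if (path[0] == '\0') { return ERR_IO; }
    return NULL;
}

/* Exported wrapper: pointer comparison happens here, callers see integers. */
int mylib_open(const char* path)
{
    Exc err = internal_open(path);
    if (err == NULL) { return MYLIB_OK; }
    if (err == ERR_ARGUMENT) { return MYLIB_EARGUMENT; }
    if (err == ERR_IO) { return MYLIB_EIO; }
    return MYLIB_EUNKNOWN;
}
```

The trade-off is an extra translation layer, in exchange for codes that callers in another library or language can compare safely.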
Useful code patterns
Escape main() when possible
The CEX approach is to keep the main() function separated and as short as possible. This opens capabilities for full code unit testing, unity builds, and tracebacks. This is a typical example app:
// app_main.c file
#include "cex.h"
Exception
app_main(int argc, char** argv)
{
    bool my_flag = false;
    argparse_c args = {
        .description = "New CEX App",
        argparse$opt_list(
            argparse$opt_help(),
            argparse$opt(&my_flag, 'c', "ctf", .help = "Capture the flag"),
        ),
    };
    if (argparse.parse(&args, argc, argv)) { return Error.argsparse; }
    io.printf("MOCCA - Make Old C Cexy Again!\n");
    io.printf("%s\n", (my_flag) ? "Flag is captured" : "Pass --ctf to capture the flag");
    return EOK;
}
// main.c file
#define CEX_IMPLEMENTATION // this only appears in main file, before #include "cex.h"
#include "cex.h"
#include "app_main.c" // NOTE: include .c, using unity build approach
int
main(int argc, char** argv)
{
if(app_main(argc, argv)) { return 1; }
return 0;
}
Inversion of error checking
Instead of nesting if statements, try the opposite approach: check for an error and exit early. In CEX you can also use e$assert() for quick and dirty one-line checks.
Exception
app_main(int argc, char** argv)
{
    e$assert(argc == 2); // assert shortcut
    if (!str.eq(argv[1], "MOCCA")) { return Error.integrity; }
    io.printf("MOCCA - Make Old C Cexy Again!\n");
    return EOK;
}
Exception
app_main(int argc, char** argv)
{
    if (argc > 1) {
        if (str.eq(argv[1], "MOCCA")) {
            io.printf("MOCCA - Make Old C Cexy Again!\n");
        } else {
            return Error.integrity;
        }
    } else {
        return Error.argument;
    }
    return EOK;
}
Resource cleanup
Sometimes you need to open resources, manage memory, and carry an error code. Or you may have to use a legacy API inside a function, with incompatible error code conventions. Here is a CEX-flavored implementation of the common goto fail C code pattern.
Exception
print_zip(char* zip_path, char* extract_dir)
{
    Exc result = Error.runtime; // NOTE: default error code, setting to error by default
    // Open the ZIP archive
    int err;
    struct zip* archive = NULL;
    e$except_null (archive = zip_open(zip_path, 0, &err)) { goto end; }

    i32 num_files = zip_get_num_entries(archive, 0);

    for (i32 i = 0; i < num_files; i++) {
        struct zip_stat stat;
        if (zip_stat_index(archive, i, 0, &stat) != 0) {
            result = Error.integrity; // NOTE: we can substitute error code if needed
            goto end;
        }
        // NOTE: next may return error on buffer overflow -> goto end then
        char output_path[64];
        e$goto(str.sprintf(output_path, sizeof(output_path), "%s/%s", extract_dir, stat.name), end);
        io.printf("Element: %s\n", output_path);
    }
    // NOTE: success when no `goto end` happened, only one happy outcome
    result = EOK;

end:
    // Cleanup and result
    zip_close(archive);
    return result;
}
MyObj
MyObj_create(char* path, usize buf_size)
{
    MyObj self = {0};

    e$except_null (self.file = fopen(path, "r")) { goto fail; }

    self.buf = malloc(buf_size);
    if (self.buf == NULL) { goto fail; }
    e$goto(fetch_data(&self.data), fail);
    // MyObj was initialized and in consistent state
    return self;
fail:
    // On error - do a cleanup of initialized stuff
    if (self.file) { fclose(self.file); }
    if (self.buf) { free(self.buf); }
    memset(&self, 0, sizeof(MyObj));
    return self;
}
Memory management
The problem of memory management in C
C has a long-lasting history of memory management issues. Many modern languages proposed multiple solutions for these issues: RAII, borrow checkers, garbage collection, allocators, etc. All of them work and solve the memory problem to some extent, but sometimes adding new sets of problems in different places.
From my perspective, the root cause of the C memory problem is hidden memory allocation. When a developer works with a function which allocates memory, it’s hard to remember its behavior without looking into source code or documentation. The absence of an explicit indication of memory allocation leads to flaws in memory handling, for example: memory leaks, use after free, or performance issues.
While C remains a system and low-level language, it’s important to have precise control over code behavior and memory allocations. So in my opinion, RAII and garbage collection are alien to the C philosophy, but on the other hand modern languages like Zig or C3 have an allocator-centric approach, which is more explicit and suitable for C.
Modern way of memory management in CEX
Allocator-centric approach
CEX tries to adopt an allocator-centric approach to memory management, which helps to follow these principles:
- Explicit memory allocation. Each object (class) or function that may allocate memory has to have an allocator parameter. This requirement adds explicit API signature hints and communicates the memory implications of a function without a deep dive into documentation or source code.
- Transparent memory management. All memory operations are provided by the IAllocator interface, which can be backed by interchangeable allocator objects of different types.
- Memory scoping. When possible, memory usage should be limited by scope, which naturally regulates lifetimes of allocated memory and automatically frees it after exiting the scope.
- UnitTest Friendly. Allocators allow implementing additional levels of memory safety when run in a unit test environment. For example, CEX allocators add special poisoned areas around allocated blocks, which trigger the address sanitizer when user code accesses these regions. Allocators also open the door for memory leak checks, or extra memory error simulations for better out-of-memory error handling.
- Standard and Temporary allocators. Sometimes it’s useful to have an initialized allocator under your belt for short-lived temporary operations. CEX provides two global allocators by default: mem$ - a standard heap allocator using malloc/realloc/free, and tmem$ - a dynamic arena allocator of small size (about 256k per page).
Example
This is a small example of key memory management concepts in CEX:
mem$scope(tmem$, _) // (1)
{
    arr$(char*) incl_path = arr$new(incl_path, _); // (2)
    for$each (p, alt_include_path) {
        arr$push(incl_path, p); // (3)
        if (!os.path.exists(p)) { log$warn("alt_include_path not exists: %s\n", p); }
    }
} // (4)
- (1) Initializes a temporary allocator (tmem$) scope in mem$scope(tmem$, _) {...} and assigns it as a variable _ (you can use any name).
- (2) Initializes a dynamic array with the scoped allocator variable _, allocates new memory.
- (3) May allocate memory
- (4) All memory will be freed at exit from this scope
Lifetimes and scopes
Using memory scopes naturally regulates the lifetime of initialized memory. In the example above you can’t use the incl_path variable outside of mem$scope. Moreover, that memory is automatically freed after exiting the scope. This design approach significantly reduces the surface for use-after-free errors in general.
Temporary memory allocator
Dealing with lots of small memory allocations has always been a pain in C, because we need to deallocate them at the end, and because of the potential overhead each individual allocation might have. The temporary allocator in CEX works as a small-page (around 256kb) memory arena, which can be dynamically resized when needed. The most important feature of the temporary arena allocator is that it does a full cleanup at the mem$scope exit automatically.
The temporary allocator is always available via the tmem$ global variable and can be used at any time during the program lifetime. It may be used only inside mem$scope, with support for up to 32 levels of mem$scope nesting. At the end of the program, CEX will automatically finalize and free all allocated memory.
You can find more technical details about implementation below in this article.
Memory management in CEX
Allocators
Allocators add many benefits to codebase design and development experience:
- All memory allocating functions or objects become explicit, because they require an IAllocator argument
- The logic of the code becomes detached from the memory model: the same dynamic array can be backed by heap, arena, or a stack-based static char buffer through the same allocator interface. The same piece of code may work on Linux or on an embedded device without changes to the memory allocation model.
- Allocators may add testing capabilities, i.e. simulating out-of-mem errors in unit tests, or adding memory checks or extra integrity checks of memory allocations
- There are multiple memory allocation models (heap, arenas, temp allocation), so you can find the best type of allocator for your needs and use case.
- It’s easier to trace allocations and do memory benchmarks with allocators.
- Automatic garbage collection with mem$scope and arena allocators - you’ll get everything freed on scope exit
Allocator interface
The allocator interface is represented by the IAllocator type, which is an interface structure of function pointers for generic operations. Allocators in CEX support malloc/realloc/calloc/free functions similar to their analogs in C; the only additional parameter, which is optional, is the alignment of the requested memory region.
#define IAllocator const struct Allocator_i*
typedef struct Allocator_i
{
// >>> cacheline
alignas(64) void* (*const malloc)(IAllocator self, usize size, usize alignment);
void* (*const calloc)(IAllocator self, usize nmemb, usize size, usize alignment);
void* (*const realloc)(IAllocator self, void* ptr, usize new_size, usize alignment);
void* (*const free)(IAllocator self, void* ptr);
const struct Allocator_i* (*const scope_enter)(IAllocator self); /* Only for arenas/temp alloc! */
void (*const scope_exit)(IAllocator self); /* Only for arenas/temp alloc! */
u32 (*const scope_depth)(IAllocator self); /* Current mem$scope depth */
struct {
u32 magic_id;
bool is_arena;
bool is_temp;
} meta;
//<<< 64 byte cacheline
} Allocator_i;
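To illustrate how a function-pointer interface detaches code from the memory model, here is a deliberately reduced plain-C sketch. It is not the CEX `IAllocator`: `MiniAllocator`, `BufferAllocator` and `dup_string` are hypothetical, and alignment handling is omitted because the sketch only allocates char buffers.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced allocator interface: two function pointers only. */
typedef struct MiniAllocator {
    void* (*alloc)(struct MiniAllocator* self, size_t size);
    void (*release)(struct MiniAllocator* self, void* ptr);
} MiniAllocator;

/* Heap-backed implementation. */
static void* heap_alloc(MiniAllocator* self, size_t size) { (void)self; return malloc(size); }
static void heap_release(MiniAllocator* self, void* ptr) { (void)self; free(ptr); }
MiniAllocator heap_allocator = { heap_alloc, heap_release };

/* Fixed-buffer implementation: the interface is embedded as first member. */
typedef struct {
    MiniAllocator base;
    unsigned char buf[256];
    size_t used;
} BufferAllocator;

static void* buffer_alloc(MiniAllocator* self, size_t size)
{
    BufferAllocator* a = (BufferAllocator*)self;
    if (size > sizeof(a->buf) - a->used) { return NULL; } /* out of space */
    void* p = a->buf + a->used;
    a->used += size;
    return p;
}
static void buffer_release(MiniAllocator* self, void* ptr) { (void)self; (void)ptr; }

void BufferAllocator_init(BufferAllocator* a)
{
    a->base.alloc = buffer_alloc;
    a->base.release = buffer_release;
    a->used = 0;
}

/* Allocating function: the allocator parameter makes the allocation explicit. */
char* dup_string(MiniAllocator* allc, const char* s)
{
    size_t n = strlen(s) + 1;
    char* copy = allc->alloc(allc, n);
    if (copy != NULL) { memcpy(copy, s, n); }
    return copy;
}
```

`dup_string` works unchanged with either allocator, which is the property the interface is meant to provide.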
mem$ API
You shouldn’t use the allocator interface directly (it’s less convenient), so it’s better to use the memory specific macros:
- mem$malloc(allocator, size, [alignment]) - allocates uninitialized memory with allocator, size in bytes; the alignment parameter is optional, by default it’s the system specific alignment (up to 64 byte alignment is supported)
- mem$calloc(allocator, nmemb, size, [alignment]) - allocates zero-initialized memory with allocator, nmemb elements of size each; the alignment parameter is optional, by default it’s the system specific alignment (up to 64 byte alignment is supported)
- mem$realloc(allocator, old_ptr, size, [alignment]) - reallocates previously allocated old_ptr with allocator; the alignment parameter is optional and must match the initial alignment of old_ptr
- mem$free(allocator, old_ptr) - frees old_ptr and implicitly sets it to NULL to avoid use-after-free issues.
- mem$new(allocator, T) - generic allocation of a new instance of T (type), respecting its size and alignment.
Allocator scoping:
- mem$arena(page_size) { ... } - enters a new instance of an arena allocator with the given page_size.
- mem$scope(arena_or_tmem$, scope_var) { ... } - opens new memory scope (works only with arena allocators or temp allocator)
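The idea of a free operation that also resets the pointer can be shown with the standard library alone. This is a sketch, not the `mem$free` implementation; `free_and_null` is a hypothetical macro.

```c
#include <stdlib.h>

/* Hypothetical helper: free the block and reset the caller's variable,
 * so that a stale pointer cannot be dereferenced or freed twice. */
#define free_and_null(ptr) \
    do {                   \
        free(ptr);         \
        (ptr) = NULL;      \
    } while (0)

/* Returns 1 if the pointer variable was reset, -1 if malloc failed. */
int pointer_is_reset_after_free(void)
{
    char* p = malloc(32);
    if (p == NULL) { return -1; }
    free_and_null(p);
    free_and_null(p); /* harmless second call: free(NULL) is a no-op */
    return p == NULL;
}
```

It has to be a macro (or take a pointer to the pointer), because a plain function receives only a copy of the caller's variable.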
Dynamic arenas
Dynamic arenas use an array of dynamically allocated pages; each page has a fixed size and is allocated on the heap. When you allocate memory on an arena and there is enough room on the current page, the arena places this chunk inside the page (simply moving a pointer, without a real allocation). If your memory request doesn’t fit, the arena creates a new page, keeps all old pages untouched, and serves the allocation from the new page.
Arenas are designed to work with mem$scope(), which allows you to create temporary memory allocations without worrying about cleanup. Once the scope is left, the arena deallocates all memory allocated inside it and returns to its previous state. Up to 32 levels of mem$scope() nesting are supported. Essentially, this is the exact mechanism that fuels tmem$ - the temporary allocator in CEX.
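The page and scope mechanics described above can be sketched in plain C. This is an illustration under simplifying assumptions, not the CEX arena: `MiniArena`, `ArenaMark` and the `arena_*` functions are hypothetical, pages form a simple linked list, and sizes are just rounded up to multiples of 8 instead of real alignment handling.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct Page {
    struct Page* prev;    /* previously active page */
    size_t capacity;
    size_t used;          /* bump offset inside data[] */
    unsigned char data[]; /* flexible array member holding the page bytes */
} Page;

typedef struct {
    Page* top;         /* page currently used for allocations */
    size_t page_size;  /* default capacity of a new page */
    size_t page_count;
} MiniArena;

/* A scope mark remembers which page was active and how full it was. */
typedef struct { Page* page; size_t used; } ArenaMark;

void* arena_alloc(MiniArena* a, size_t size)
{
    size = (size + 7u) & ~(size_t)7u; /* keep offsets multiples of 8 */
    if (a->top == NULL || size > a->top->capacity - a->top->used) {
        /* Not enough room: add a page, old pages stay untouched. */
        size_t cap = (size > a->page_size) ? size : a->page_size;
        Page* p = malloc(sizeof(Page) + cap);
        if (p == NULL) { return NULL; }
        p->prev = a->top;
        p->capacity = cap;
        p->used = 0;
        a->top = p;
        a->page_count++;
    }
    void* out = a->top->data + a->top->used; /* just move the pointer */
    a->top->used += size;
    return out;
}

ArenaMark arena_scope_enter(MiniArena* a)
{
    ArenaMark m = { a->top, (a->top != NULL) ? a->top->used : 0 };
    return m;
}

/* Frees pages created inside the scope and rewinds the bump offset. */
void arena_scope_exit(MiniArena* a, ArenaMark m)
{
    while (a->top != m.page) {
        Page* dead = a->top;
        a->top = dead->prev;
        free(dead);
        a->page_count--;
    }
    if (a->top != NULL) { a->top->used = m.used; }
}

void arena_destroy(MiniArena* a)
{
    ArenaMark empty = { NULL, 0 };
    arena_scope_exit(a, empty);
}
```

Leaving a scope is a pointer rewind plus freeing of excess pages, so pointers allocated before the scope stay valid while everything allocated inside it disappears at once.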
Working with arenas:
IAllocator arena = AllocatorArena.create(4096); // (1)
u8* p = mem$malloc(arena, 100); // (2)
mem$scope(arena, tal) // (3)
{
    u8* p2 = mem$malloc(tal, 100000); // (4)
    mem$scope(arena, tal)
    {
        u8* p3 = mem$malloc(tal, 100);
    } // (5)
} // (6)
AllocatorArena.destroy(arena); // (7)
- (1) New arena with 4096 byte page
- (2) Allocating some memory from arena
- (3) Entering new memory scope
- (4) Allocation size exceeds page size, so a new page will be allocated. The address of p remains the same!
- (5) At scope exit p3 will be freed, p2 and p remain
- (6) At scope exit p2 will be freed, excess pages will be freed, p remains
- (7) Arena destruction, all pages are freed, p is invalid now.
mem$arena(4096, arena)
{
// This needs extra page
u8* p2 = mem$malloc(arena, 10040);
mem$scope(arena, tal)
{
u8* p3 = mem$malloc(tal, 100);
}
}
mem$scope(tmem$, _) // (1)
{
    u8* p2 = mem$malloc(_, 1 * 1024 * 1024); // (2)
    mem$scope(tmem$, _) // (3)
    {
        u8* p3 = mem$malloc(_, 100);
    } // (4)
} // (5)
- (1) Initializes a temporary allocator (tmem$) scope in mem$scope(tmem$, _) {...} and assigns it as a variable _ (you can use any name). _ is a pattern for temp allocator in CEX.
- (2) New page for temp allocator created, because size exceeds existing page size
- (3) Nested scope is allowed
- (4) Scope exit: p3 automatically cleaned up
- (5) Scope exit: p2 cleaned up + extra page freed.
Standard allocators
There are two general purpose allocators globally available out of the box in CEX:
- mem$ - a heap allocator, the same old malloc/free type of allocation, with extra alignment support. In unit tests this allocator provides simple memory leak checks even without address sanitizer enabled.
- tmem$ - a dynamic arena with 256kb page size, used for short-lived temporary operations; it cleans up pages automatically at program exit. It allocates a page only at the first allocation, otherwise it remains a global static struct instance (about 128 bytes in size). Thread safe, uses thread_local.
Caveats
Do cross-scope memory access carefully
Never reallocate memory that belongs to an outer scope from inside a nested scope: this automatically leads to a use-after-free issue. This is a bad example:
// BAD!
mem$scope(tmem$, _)
{
    u8* p2 = mem$malloc(_, 100); // (1)
    mem$scope(tmem$, _)
    {
        p2 = mem$realloc(_, p2, 1 * 1024 * 1024); // (2)
    } // (3)
    if(p2[128] == '0') { /* OOPS */} // (4)
}
- (1) Initial allocation in the first scope
- (2) realloc uses a different scope depth, this might trigger an assertion in CEX unit tests
- (3) p2 is automatically freed, because now it belongs to a different scope
- (4) You’ll face use-after-free, which is typically reported as use-after-poison in the temp allocator.
CEX does its best to catch these cases in unit test mode: it raises an assertion at the mem$realloc line with a meaningful error message. Standard CEX collections like dynamic arrays arr$ and hashmap hm$ also trigger this assertion when they need to resize at a different mem$scope level.
Be careful with reallocations on arenas
CEX arenas are designed to be always growing. If your code pattern is based on heavy memory reallocation, the arena growth may lead to performance issues, because each reallocation may trigger a memory copy and a new page creation. Consider pre-allocating some reasonable capacity for your data when working with arenas (including the temp allocator). However, if you’re reallocating exactly the last allocated pointer, the arena might do it in place on the same page.
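The difference between growing the last allocation and growing an older one can be sketched with a tiny fixed-buffer bump allocator. This is an assumption-laden illustration, not the CEX arena code: `Bump`, `bump_alloc` and `bump_realloc` are hypothetical, alignment is ignored, and the caller passes the old size explicitly.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    unsigned char buf[1024];
    size_t used;       /* bump offset */
    size_t last_start; /* offset of the most recent allocation */
    size_t last_size;  /* size of the most recent allocation */
} Bump;

void* bump_alloc(Bump* b, size_t size)
{
    if (size > sizeof(b->buf) - b->used) { return NULL; }
    b->last_start = b->used;
    b->last_size = size;
    b->used += size;
    return b->buf + b->last_start;
}

void* bump_realloc(Bump* b, void* ptr, size_t old_size, size_t new_size)
{
    unsigned char* p = ptr;
    int is_last = (p == b->buf + b->last_start) && (old_size == b->last_size);
    if (is_last && new_size <= sizeof(b->buf) - b->last_start) {
        /* Last block: resize in place by moving the bump offset. */
        b->used = b->last_start + new_size;
        b->last_size = new_size;
        return ptr;
    }
    /* Older block: take fresh space and copy; the old bytes are not reused. */
    void* fresh = bump_alloc(b, new_size);
    if (fresh != NULL) { memcpy(fresh, ptr, (old_size < new_size) ? old_size : new_size); }
    return fresh;
}
```

Growing the last block costs nothing extra, while growing an older block consumes new space and leaves a hole behind, which is the growth pattern the paragraph above warns about.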
Unit Test specific behavior
When run in test mode (or specifically when #ifdef CEX_TEST is true) the memory allocation model in CEX includes some extra safety capabilities:
- Heap based allocator (mem$) starts tracking memory leaks, comparing the number of allocations and frees.
- mem$malloc() - returns uninitialized memory filled with the 0xf7 byte pattern
- If Address Sanitizer is available, all allocations for arenas and heap will be surrounded by poisoned areas. If you see use-after-poison errors, it’s likely a sign of use-after-free or out of bounds access in tmem$. Try to switch your code to the mem$ allocator if possible to triage the exact reason of the error.
- Allocators do sanity checks at the end of each unit test case
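The first two checks in the list can be imitated with a thin wrapper around `malloc`/`free`. This is a simplified sketch of the idea, not the CEX test allocator; `LeakStats`, `tracked_malloc`, `tracked_free` and `leaked_blocks` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Counters compared at the end of a test to detect leaks. */
typedef struct {
    size_t n_allocs;
    size_t n_frees;
} LeakStats;

void* tracked_malloc(LeakStats* st, size_t size)
{
    void* p = malloc(size);
    if (p != NULL) {
        /* Fill with a recognizable pattern so that reads of
         * uninitialized memory stand out in a debugger or a dump. */
        memset(p, 0xf7, size);
        st->n_allocs++;
    }
    return p;
}

void tracked_free(LeakStats* st, void* p)
{
    if (p != NULL) {
        st->n_frees++;
        free(p);
    }
}

/* Number of blocks that were allocated but not freed yet. */
size_t leaked_blocks(const LeakStats* st) { return st->n_allocs - st->n_frees; }
```

A test harness built on this would check `leaked_blocks()` after each test case and report a leak when it is not zero.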
Be careful with break/continue
mem$scope/mem$arena are macros backed by a for loop, so be careful when you use them inside loops and try to break/continue the outer loop.
// BAD!
for(u32 i = 0; i < 10; i++){
mem$scope(tmem$, _)
{
u8* p2 = mem$malloc(_, 100);
if(p2[1] == '0') {
break; // OOPS, this will break mem$scope not a outer for loop
}
}
}
Never return pointers from scope
Exiting the function also exits the memory scope and triggers its cleanup, so the p2 address is invalid after return. You might get a use-after-poison or use-after-free ASAN crash, or see the 0xf7 data pattern when running in a test environment without ASAN.
// BAD!
mem$scope(tmem$, _)
{
u8* p2 = mem$malloc(_, 100);
return p2; // BAD! This address will be freed at scope exit
}
Advanced topics
Performance tips
TempAllocator makes CPU cache hot
If we use mem$scope(tmem$) a lot, the ArenaAllocator re-uses the same memory pages, so these memory areas are likely to stay in the CPU cache, which is beneficial for performance. The ArenaAllocator in general works like a stack, with automatic memory cleanup at the scope exit.
Arena allocation is cheap
ArenaAllocator implements memory allocation by moving a memory pointer back and forth, it doesn’t take much for allocating small chunks if there is no need for requesting memory from the OS for the new arena page.
Be careful with ArenaAllocator when you need to reallocate a lot
AllocatorArena and the temporary allocator do not reuse blank chunks of freed memory in pages, they simply allocate new memory. This might be a problem when you dynamically resize some container (e.g. dynamic array arr$), which could lead to uncontrollable growth of arena pages and therefore performance degradation.
On the other hand, it’s totally fine to pre-allocate some capacity for your needs upfront. Just try to be mindful about your memory allocation and usage patterns.
When to use arena or heap allocator
ArenaAllocator and tmem$ use cases
ArenaAllocator works great when you need disposable memory for temporary needs, or when a program operation has limited boundaries in time and space. For example: read a file, process it, calculate stuff, close it, done.
Another great benefit of arenas is the stability of memory pointers: once memory is allocated, it stays at the same place.
Arenas simplify managing memory for small objects, so you don’t need to write extra memory management logic for each small allocation; everything will be cleared at scope exit.
HeapAllocator (mem$) use cases
HeapAllocator is simply the system allocator, backed by malloc/free. You can use it for long-living or frequently resized objects (e.g. dynamic arrays or hashmaps). It works best for bigger allocations with longer lives.
UnitTesting and memory leaks
When you run CEX allocators in unit test code, they apply extra sanity check logic to help you debug memory related issues:
- New mem$malloc allocations for mem$/tmem$ are filled with the 0xf7 byte pattern, which indicates uninitialized memory.
- The mem$ allocator tracks the number of allocations and deallocations and emits a unit test [LEAK] warning (in case ASAN is disabled)
- There are small poisoned areas around allocations made by mem$/tmem$, which trigger an ASAN use-after-poison crash (read/write); when ASAN is disabled, the validity of the poison pattern inside these areas is checked instead.
- After each test CEX automatically performs tmem$ sanity checks in order to find memory corruption
If you need to debug memory leaks in your code, consider using mem$ (heap based) allocation, which utilizes ASAN memory leak tracking mechanisms.
Out-of-bounds access and poisoning
CEX encourages using ASAN everywhere for debugging. ASAN works great for catching out-of-bounds access to heap allocated memory. It’s a bit more difficult for arenas, because they use big pages of memory (which we own), so ASAN has nothing to complain about. To fix this, tmem$ and AllocatorArena add poison areas around each allocation, which trigger a use-after-poison crash. If you face it, make sure that your program doesn’t read/write out of bounds, and try to temporarily substitute tmem$ with mem$ to get more precise error information.
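When ASAN is not available, the same idea can be approximated by guard bands whose byte pattern is verified later. The sketch below is a generic illustration of that technique in plain C, not the CEX implementation; `guarded_malloc`, `guarded_check`, `guarded_free` and the `0xA5` pattern are hypothetical choices.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8
#define GUARD_BYTE 0xA5 /* arbitrary pattern chosen for this sketch */

/* Block layout: [GuardHeader][user bytes][rear guard band] */
typedef struct {
    size_t size;                      /* user-visible size */
    unsigned char front[GUARD_SIZE];  /* front guard band */
} GuardHeader;

void* guarded_malloc(size_t size)
{
    GuardHeader* h = malloc(sizeof(GuardHeader) + size + GUARD_SIZE);
    if (h == NULL) { return NULL; }
    h->size = size;
    memset(h->front, GUARD_BYTE, GUARD_SIZE);
    unsigned char* user = (unsigned char*)(h + 1);
    memset(user + size, GUARD_BYTE, GUARD_SIZE);
    return user;
}

/* Returns 1 if both guard bands still hold the pattern, 0 if corrupted. */
int guarded_check(const void* ptr)
{
    const unsigned char* user = ptr;
    const GuardHeader* h = (const GuardHeader*)ptr - 1;
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (h->front[i] != GUARD_BYTE) { return 0; }
        if (user[h->size + i] != GUARD_BYTE) { return 0; }
    }
    return 1;
}

void guarded_free(void* ptr)
{
    if (ptr != NULL) { free((GuardHeader*)ptr - 1); }
}
```

Unlike ASAN poisoning, this check detects only writes, and only after the fact, when `guarded_check` is called.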
Code patterns
Using temporary memory scope
mem$scope(tmem$, _)
{
u8* p2 = mem$malloc(_, 100);
}
The CEX convention is to use the _ variable as the temp allocator.
Using heap allocator
u8* p2 = mem$malloc(mem$, 100); // mem$ is a global variable for HeapAllocator
mem$free(mem$, p2); // we must manually free it
uassert(p2 == NULL); // p2 set to NULL by mem$free()
Opening new ArenaAllocator scope
mem$arena(4096, arena)
{
u8* p2 = mem$malloc(arena, 10040);
}
Mixing ArenaAllocator and temp allocator
mem$arena(4096, arena)
{
    // We will store result in the arena
    u8* result = mem$malloc(arena, 10040);
    mem$scope(tmem$, _)
    {
        // Do a temporary calculations with tmem$
        u8* p2 = mem$malloc(_, 100);
        // Copy persistent results here
        result[0] = p2[0];
    } // NOTE: p2 and all temp data freed
    // result remains
}
// result freed
Strings
Problems with strings in C
Strings in C have historically been an endless source of problems, bugs and vulnerabilities. String manipulation in the C standard library is very low level and sometimes confusing. But in my opinion, most of the problems with strings in C are a result of poor coding practices rather than language issues.
With modern tooling like Address Sanitizer it’s much easier to catch these bugs, so we are starting to face developer experience issues rather than security complications.
Problems with C char* strings:
- No length information included, which leads to performance issues with overuse of strlen
- Null terminator is critical for security, but not all libc functions handle strings securely
- String slicing is impossible without copy and setting null-terminator at the end of slice
- libc string functions behavior sometimes is implementation specific and insecure
Strings in CEX
There are 3 key kinds of string manipulation routines:
- General purpose string manipulation - uses the vanilla char* type, with null-terminator, with a dedicated str namespace. The main purpose is to make strings easy to work with while keeping them C compatible. The str namespace uses allocators for all memory allocating operations, which allows us to use temporary allocations with tmem$.
- String slicing - sometimes we need to obtain and work with a part of an existing string, so CEX uses the str_s type for defining slices. There is a dedicated sub-namespace str.slice which is specially designed for working with slices. Slices may or may not be null-terminated; they carry a pointer and a length. Typically this is a quick and non-allocating way of working with a string view representation.
- String builder - if we need to build a string dynamically we may use the sbuf_c type and the sbuf namespace. This type is dedicated to dynamically growing strings backed by an allocator, which are always null-terminated and compatible with char* without casting.
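To make the slice idea concrete, here is a minimal plain C sketch of a (pointer, length) string view. It is not the CEX str_s implementation: demo_slice, demo_slice_from_cstr and demo_slice_sub are hypothetical names, and the indexing convention (negative indexes count from the end, end == 0 means "up to the end") is an assumption modeled on the str.sub() calls shown in this document.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a (pointer, length) string view. */
typedef struct {
    const char* buf;
    size_t      len;
} demo_slice;

/* View over a whole null-terminated string; NULL gives the empty/error view. */
static demo_slice demo_slice_from_cstr(const char* s)
{
    demo_slice out = { s, s ? strlen(s) : 0 };
    return out;
}

/* Sub-slice [start, end). Negative indexes count from the end,
 * end == 0 means "up to the end" (assumed convention). No bytes are copied. */
static demo_slice demo_slice_sub(demo_slice s, ptrdiff_t start, ptrdiff_t end)
{
    demo_slice empty = { NULL, 0 };
    if (s.buf == NULL) return empty;
    ptrdiff_t len = (ptrdiff_t)s.len;
    if (start < 0) start += len;
    if (end <= 0) end += len;
    if (start < 0 || end > len || start > end) return empty;
    demo_slice out = { s.buf + start, (size_t)(end - start) };
    return out;
}
```

Taking a sub-slice only adjusts the pointer and the length, which is why slicing needs neither an allocation nor a null terminator written into the source string.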
Cex strings follow these principles:
- Security first - all strings are null-terminated, and all buffer related operations always check bounds.
- NULL-tolerant - all string functions accept NULL pointers and return a NULL result on error. This significantly reduces the number of if(s == NULL) error checks after each function, which allows chaining string operations and checking for NULL at the last step.
- Memory allocations are explicit - if a string function accepts IAllocator, this is an indication of allocating behavior.
- Developer convenience - sometimes it’s easier to allocate and make a new formatted string on tmem$, for example str.fmt(_, "Hello: %s", "CEX"), or use the builtin pattern matching engine str.match(arg[1], "command_*_(insert|delete|update)"), or work with a read-only slice representation of constant strings.
To get a brief cheat sheet of the function list via the Cex CLI, type ./cex help str$ or ./cex help sbuf$
General purpose strings
Use str for general purpose string manipulation. This namespace typically returns char* or NULL on error; all functions are tolerant of NULL arguments of char* type and return NULL in this case. Each allocating function must have an IAllocator argument, and also returns NULL on memory errors.
char* str.clone(char* s, IAllocator allc);
Exception str.copy(char* dest, char* src, usize destlen);
bool str.ends_with(char* s, char* suffix);
bool str.eq(char* a, char* b);
bool str.eqi(char* a, char* b);
char* str.find(char* haystack, char* needle);
char* str.findr(char* haystack, char* needle);
char* str.fmt(IAllocator allc, char* format,...);
char* str.join(char** str_arr, usize str_arr_len, char* join_by, IAllocator allc);
usize str.len(char* s);
char* str.lower(char* s, IAllocator allc);
bool str.match(char* s, char* pattern);
int str.qscmp(const void* a, const void* b);
int str.qscmpi(const void* a, const void* b);
char* str.replace(char* s, char* old_sub, char* new_sub, IAllocator allc);
str_s str.sbuf(char* s, usize length);
arr$(char*) str.split(char* s, char* split_by, IAllocator allc);
arr$(char*) str.split_lines(char* s, IAllocator allc);
Exc str.sprintf(char* dest, usize dest_len, char* format,...);
str_s str.sstr(char* ccharptr);
bool str.starts_with(char* s, char* prefix);
str_s str.sub(char* s, isize start, isize end);
char* str.upper(char* s, IAllocator allc);
Exception str.vsprintf(char* dest, usize dest_len, char* format, va_list va);
String slices
CEX has a special type and namespace for slices. A slice is a dedicated struct of (len, char*) fields, intended for working with parts of other strings, and it can also represent a full-length null-terminated string.
Creating string slices
char* my_cstring = "Hello CEX";
// Getting a sub-string of a C string
str_s my_cstring_sub = str.sub(my_cstring, -3, 0); // Value: CEX, -3 means from end of my_cstring
// Created from any other null-terminated C string
str_s my_slice = str.sstr(my_cstring);
// Statically initialized slice with compile time known length
str_s compile_time_slice = str$s("Length of this slice created compile time");
// Making slice from a buffer (may not be null-terminated)
char buf[100] = {"foo bar"};
str_s my_slice_buf = str.sbuf(buf, arr$len(buf));
str_s values are always passed by value; it’s a 16-byte struct, which fits in 2 CPU registers on x64.
Using slices
Once a slice is created and you see the str_s type, it’s only safe to use the special functions which work with slices, because null-termination is not guaranteed anymore.
There are plenty of operations which can be performed on a string view alone, without touching the underlying string data.
char* str.slice.clone(str_s s, IAllocator allc);
Exception str.slice.copy(char* dest, str_s src, usize destlen);
bool str.slice.ends_with(str_s s, str_s suffix);
bool str.slice.eq(str_s a, str_s b);
bool str.slice.eqi(str_s a, str_s b);
isize str.slice.index_of(str_s s, str_s needle);
str_s str.slice.iter_split(str_s s, char* split_by, cex_iterator_s* iterator);
str_s str.slice.lstrip(str_s s);
bool str.slice.match(str_s s, char* pattern);
int str.slice.qscmp(const void* a, const void* b);
int str.slice.qscmpi(const void* a, const void* b);
str_s str.slice.remove_prefix(str_s s, str_s prefix);
str_s str.slice.remove_suffix(str_s s, str_s suffix);
str_s str.slice.rstrip(str_s s);
bool str.slice.starts_with(str_s s, str_s prefix);
str_s str.slice.strip(str_s s);
str_s str.slice.sub(str_s s, isize start, isize end);
All Cex formatting functions (e.g. io.printf(), str.fmt()) support the special format %S dedicated to string slices, which makes working with slices natural.
char* my_cstring = "Hello CEX";
str_s my_slice = str.sstr(my_cstring);
str_s my_sub = str.slice.sub(my_slice, -3, 0);
io.printf("%S - Making Old C Cexy Again\n", my_sub);
io.printf("buf: %c %c %c len: %zu", my_sub.buf[0], my_sub.buf[1], my_sub.buf[2], my_sub.len);
Error handling
On error all slice related routines return an empty (str_s){.buf = NULL, .len = 0}, and all routines check whether .buf == NULL, so it’s safe to pass an empty/error slice through multiple calls without checking for errors after each one. This allows chaining operations like this:
str_s my_sub = str.slice.sub(my_slice, -3, 0);
my_sub = str.slice.remove_prefix(my_sub, str$s("pref"));
my_sub = str.slice.strip(my_sub);
if (!my_sub.buf) {/* OOPS error */}
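The chaining pattern works because every routine passes an error value straight through. Below is a minimal plain C sketch of that propagation, under the assumption that an error is represented by a NULL buffer as described above. The names view_t, view_of, view_skip and view_strip are hypothetical and are not part of the CEX API.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char* buf; size_t len; } view_t; /* hypothetical slice type */

static const view_t VIEW_ERR = { NULL, 0 }; /* the empty/error value */

static view_t view_of(const char* s)
{
    if (s == NULL) return VIEW_ERR;
    view_t v = { s, strlen(s) };
    return v;
}

/* Drops the first n bytes; fails when the view is shorter than n. */
static view_t view_skip(view_t s, size_t n)
{
    if (s.buf == NULL || n > s.len) return VIEW_ERR; /* error in, error out */
    view_t v = { s.buf + n, s.len - n };
    return v;
}

/* Trims whitespace on both sides; an error view is passed through untouched. */
static view_t view_strip(view_t s)
{
    if (s.buf == NULL) return VIEW_ERR;
    while (s.len > 0 && isspace((unsigned char)s.buf[0])) { s.buf++; s.len--; }
    while (s.len > 0 && isspace((unsigned char)s.buf[s.len - 1])) { s.len--; }
    return s;
}
```

A caller can write view_strip(view_skip(view_of(input), 4)) and test .buf once at the end: if any step failed, the NULL buffer survives to the last call.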
String conversions
When working with strings, conversion from strings into numerical types becomes very useful. Libc conversion functions are messy and error prone, so CEX uses its own implementation, with support for both char* and str_s slices.
You may use one of the functions below or pick the type-safe/generic macro str$convert(str_or_slice, out_var_pointer)
Exception str.convert.to_f32(char* s, f32* num);
Exception str.convert.to_f32s(str_s s, f32* num);
Exception str.convert.to_f64(char* s, f64* num);
Exception str.convert.to_f64s(str_s s, f64* num);
Exception str.convert.to_i16(char* s, i16* num);
Exception str.convert.to_i16s(str_s s, i16* num);
Exception str.convert.to_i32(char* s, i32* num);
Exception str.convert.to_i32s(str_s s, i32* num);
Exception str.convert.to_i64(char* s, i64* num);
Exception str.convert.to_i64s(str_s s, i64* num);
Exception str.convert.to_i8(char* s, i8* num);
Exception str.convert.to_i8s(str_s s, i8* num);
Exception str.convert.to_u16(char* s, u16* num);
Exception str.convert.to_u16s(str_s s, u16* num);
Exception str.convert.to_u32(char* s, u32* num);
Exception str.convert.to_u32s(str_s s, u32* num);
Exception str.convert.to_u64(char* s, u64* num);
Exception str.convert.to_u64s(str_s s, u64* num);
Exception str.convert.to_u8(char* s, u8* num);
Exception str.convert.to_u8s(str_s s, u8* num);
For example:
i32 num = 0;
char* s = "-2147483648";
// Both are equivalent
e$ret(str.convert.to_i32(s, &num));
e$ret(str$convert(s, &num));
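For comparison, this is roughly what a strict conversion takes with plain libc. The sketch below is not how CEX implements str.convert; demo_parse_i32 is a hypothetical helper that shows the checks (empty input, trailing garbage, range overflow) a careful caller of strtoll has to perform by hand.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true only when the whole string is a valid 32-bit integer.
 * NOTE: strtoll skips leading whitespace, so " 12" is still accepted here. */
static bool demo_parse_i32(const char* s, int32_t* out)
{
    if (s == NULL || out == NULL || *s == '\0') return false;
    char* end = NULL;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (errno == ERANGE) return false;                 /* does not fit long long */
    if (end == s || *end != '\0') return false;        /* no digits or trailing junk */
    if (v < INT32_MIN || v > INT32_MAX) return false;  /* does not fit 32 bits */
    *out = (int32_t)v;
    return true;
}
```

The output variable is written only on success, so a failed conversion never leaves a half-parsed value behind.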
Dynamic strings / string builder
If you need to build a string dynamically you can use the sbuf_c type, which is a simple alias for char*, but with special logic attached. This type implements dynamic growing / shrinking and formatting of strings, always keeping the null-terminator.
Example
sbuf_c s = sbuf.create(5, mem$); // (1)
char* cex = "CEX";
e$ret(sbuf.appendf(&s, "Hello %s", cex)); // (2)
e$assert(str.ends_with(s, "CEX")); // (3)
sbuf.destroy(&s);
- (1) Creates new dynamic string on heap, with 5 bytes initial capacity
- (2) Appends text to string with automatic resize (memory reallocation)
- (3) s variable of type sbuf_c is compatible with any char* routines, because it’s an alias of char*
If you need a one-shot format for a string, try using str.fmt(allocator, format, ...) inside a temporary allocator scope mem$scope(tmem$, _)
sbuf namespace
/// Append string to the builder
Exc sbuf.append(sbuf_c* self, char* s);
/// Append format (using CEX formatting engine)
Exc sbuf.appendf(sbuf_c* self, char* format,...);
/// Append format va (using CEX formatting engine), always null-terminating
Exc sbuf.appendfva(sbuf_c* self, char* format, va_list va);
/// Returns string capacity from its metadata
u32 sbuf.capacity(sbuf_c* self);
/// Clears string
void sbuf.clear(sbuf_c* self);
/// Creates new dynamic string builder backed by allocator
sbuf_c sbuf.create(usize capacity, IAllocator allocator);
/// Creates dynamic string backed by static array
sbuf_c sbuf.create_static(char* buf, usize buf_size);
/// Destroys the string, deallocates the memory, or nullify static buffer.
sbuf_c sbuf.destroy(sbuf_c* self);
/// Returns false if string invalid
bool sbuf.isvalid(sbuf_c* self);
/// Returns string length from its metadata
u32 sbuf.len(sbuf_c* self);
/// Shrinks string length to new_length
Exc sbuf.shrink(sbuf_c* self, usize new_length);
/// Validate dynamic string state, with detailed Exception
Exception sbuf.validate(sbuf_c* self);
String formatting in CEX
All CEX routines with format strings (e.g. io.printf() / log$error() / str.fmt()) use the CEX formatting engine with extended features:
- %S format specifier is used for printing string slices of str_s type
- %S format has sanity checks: if a plain string is passed in its place, it will print (%S-bad/overflow) in the text. However, this behavior is not guaranteed and depends on the platform.
- %lu / %ld - formats are dedicated to printing 64-bit integers, they are not platform specific
- %u / %d - formats are dedicated to printing 32-bit integers, they are not platform specific
- Other formats should be compatible with vanilla libc.
Data structures and arrays
Data structures in CEX
There is a lack of support for data structures in C; typically it’s up to the developer to decide what to do. However, I noticed that many C projects tend to reimplement, over and over again, two core data structures which are used in 90% of cases: dynamic arrays and hashmaps.
Key requirements of the CEX data structures:
- Allocator based memory management - allowing you to decide memory model and tweak it anytime.
- Type safety and LSP support - each DS must have a specific type and support LSP suggestions.
- Generic types - DS must be generic.
- Seamless C compatibility - allowing accessing CEX DS as plain C arrays and pass them as pointers.
- Support any item type including overaligned.
Dynamic arrays
Dynamic arrays (a.k.a. vectors or lists) are designed specifically for developer convenience and are based on the ideas of Sean Barrett’s STB DS.
What is dynamic array in CEX
Technically speaking it’s a simple C pointer T*, where T is any generic type. The memory for that pointer is allocated by an allocator, and its length is stored at some byte offset before the address of the dynamic array head.
With this type representation we get some useful benefits:
- Array access with simple indexing, i.e. arr[i] instead of dynamic_arr_get_at(arr, i)
- Passing by pointer into vanilla C code. For example, a function signature my_func(int* arr, usize arr_len) is compatible with arr$(int), so we can call it as my_func(arr, arr$len(arr))
- Length information is integrated into the single pointer, arr$len(arr) extracts the length from the dynamic array pointer
- Type safety out of the box and full LSP support without dealing with void*
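The "length stored before the array head" layout can be sketched in plain C as follows. This is only an illustration in the spirit of STB DS, not the CEX arr$ implementation: vec_hdr, vec_len, vec_push and vec_free are hypothetical names, the sketch is fixed to int items, and it uses the libc heap instead of a CEX allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header stored right before the element data. */
typedef struct {
    size_t len;
    size_t cap;
} vec_hdr;

#define vec_hdr_of(a) ((vec_hdr*)(a) - 1)

/* NULL-tolerant length lookup: steps back from the data pointer to the header. */
static size_t vec_len(const int* a)
{
    return a ? ((const vec_hdr*)a - 1)->len : 0;
}

/* Appends v. Growing may move the block, so the caller passes &array. */
static int vec_push(int** a, int v)
{
    vec_hdr* h = *a ? vec_hdr_of(*a) : NULL;
    size_t len = h ? h->len : 0;
    size_t cap = h ? h->cap : 0;
    if (len == cap) {
        size_t new_cap = cap ? cap * 2 : 4;
        vec_hdr* nh = realloc(h, sizeof(vec_hdr) + new_cap * sizeof(int));
        if (nh == NULL) return -1;
        nh->len = len;
        nh->cap = new_cap;
        h = nh;
        *a = (int*)(h + 1); /* user pointer starts right after the header */
    }
    (*a)[h->len++] = v;
    return 0;
}

static void vec_free(int** a)
{
    if (*a) { free(vec_hdr_of(*a)); *a = NULL; }
}

/* A plain C function that knows nothing about the hidden header. */
static int sum(const int* arr, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) total += arr[i];
    return total;
}
```

Because the user only ever holds an int*, a call like sum(v, vec_len(v)) works without any conversion, which is the compatibility benefit listed above.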
arr$ namespace
arr$ is a completely macro-driven namespace, with generic type support and safety checks.
arr$ API:
Macro | Description |
---|---|
arr$(T) | Macro type definition, just for indication that it’s a dynamic array |
arr$new(arr, allocator, kwargs…) | Initialization of the new instance of dynamic array |
arr$free(arr) | Dynamic array cleanup (if HeapAllocator was used) |
arr$clear(arr) | Clearing dynamic array contents |
arr$push(arr, item) | Adding new item to the end of array |
arr$pushm(arr, item, item1, itemN) | Adding many new items to the end of array |
arr$pusha(arr, other_arr, [other_arr_len]) | Adding many new items from another array to the end of array |
arr$pop(arr) | Returns last element and removes it |
arr$at(arr, i) | Returns element at index with boundary checks for i |
arr$last(arr) | Returns last element |
arr$del(arr, i) | Removes element at index (following data is moved at the i-th position) |
arr$delswap(arr, i) | Removes element at index, the removed element is replaced by last one |
arr$ins(arr, i, value) | Inserts element at index |
arr$grow_check(arr, add_len) | Grows array by add_len if needed |
arr$sort(arr, qsort_cmp) | Sorting array with qsort function |
Examples
arr$(int) arr = arr$new(arr, mem$); // (1)
arr$push(arr, 1); // (2)
arr$pushm(arr, 2, 3, 4); // (3)
int static_arr[] = { 5, 6 };
arr$pusha(arr, static_arr /*, array_len (optional) */); // (4)
io.printf("arr[0]=%d\n", arr[0]); // prints arr[0]=1
// Iterate over array: prints lines 1 ... 6
for$each (v, arr) { // (5)
    io.printf("%d\n", v);
}
arr$free(arr); // (6)
- (1) Initialization and allocator
- (2) Adding single element
- (3) Adding multiple elements via vargs.
- (4) Adding arbitrary array, supports static arrays, dynamic CEX arrays or int* + arr_len
- (5) Array iteration via for$each is common and compatible with all arrays in Cex (dynamic, static, pointer+len)
- (6) Deallocating memory (only needed when HeapAllocator is used)
test$case(test_overaligned_struct)
{
    struct test32_s
    {
        alignas(32) usize s;
    };
    arr$(struct test32_s) arr = arr$new(arr, mem$);
    struct test32_s f = { .s = 100 };
    tassert(mem$aligned_pointer(arr, 32) == arr);
    for (u32 i = 0; i < 1000; i++) {
        f.s = i;
        arr$push(arr, f);
        tassert_eq(arr$len(arr), i + 1);
    }
    tassert_eq(arr$len(arr), 1000);
    for (u32 i = 0; i < 1000; i++) {
        tassert_eq(arr[i].s, i);
        tassert(mem$aligned_pointer(&arr[i], 32) == &arr[i]);
    }
    arr$free(arr);
    return EOK;
}
test$case(test_array_char_ptr)
{
    arr$(char*) array = arr$new(array, mem$);
    arr$push(array, "foo");
    arr$push(array, "bar");
    arr$pushm(array, "baz", "CEX", "is", "cool");
    for (usize i = 0; i < arr$len(array); ++i) { io.printf("%s \n", array[i]); }
    arr$free(array);
    return EOK;
}
mem$scope(tmem$, _) // (1)
{
    arr$(char*) incl_path = arr$new(incl_path, _, .capacity = 128); // (2)
    for$each (p, alt_include_path) {
        arr$push(incl_path, p); // (3)
        if (!os.path.exists(p)) { log$warn("alt_include_path not exists: %s\n", p); }
    }
} // (4)
- (1) Initializes a temporary allocator (tmem$) scope in mem$scope(tmem$, _) {...} and assigns it as a variable _ (you can use any name).
- (2) Initializes dynamic array with the scoped allocator variable _, allocates with specific capacity argument.
- (3) May allocate memory
- (4) All memory will be freed at exit from this scope
Hashmaps
Hashmaps (hm$) in CEX are backed by structs with key and value fields. Essentially they are plain dynamic arrays of structs (iterable values) with a hash table part that implements key hashing.
Hashmaps in CEX are also generic, so you may use any type for keys or values. However, there is special handling for string keys (char*, or str_s CEX slices). By default, string keys are not copied by the hashmap but stored by reference, so you’ll have to keep their allocation stable.
Hashmap initialization is similar to that of dynamic arrays: define the type and call hm$new.
Array compatibility
Hashmaps in CEX are backed by dynamic arrays, which leads to the following developer experience enhancements:
- arr$len can be applied to hashmaps for checking the number of available elements
- for$each/for$eachp can be used for iteration over hashmap key/value pairs
- Hashmap items can be accessed as arrays with index
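The "dense array of items plus a hash table part" layout can be illustrated with a small plain C sketch. It is an assumption-heavy simplification, not the CEX hm$ implementation: demo_map, map_item, map_find and map_set are hypothetical names, keys and values are fixed to int, and capacity is fixed so that resizing and deletion can be left out.

```c
#include <stddef.h>
#include <stdint.h>

enum { MAP_CAP = 16, MAP_SLOTS = 32 }; /* fixed sizes keep the sketch short */

typedef struct { int key; int value; } map_item;

typedef struct {
    map_item items[MAP_CAP];   /* dense, insertion-ordered, iterable as an array */
    size_t   len;
    uint32_t slots[MAP_SLOTS]; /* hash table part: 0 = empty, else item index + 1 */
} demo_map;

static uint32_t map_hash(int key)
{
    return (uint32_t)key * 2654435761u; /* simple multiplicative hash */
}

/* Returns the item for key, or NULL when the key is absent. */
static map_item* map_find(demo_map* m, int key)
{
    uint32_t i = map_hash(key) % MAP_SLOTS;
    while (m->slots[i] != 0) {
        map_item* it = &m->items[m->slots[i] - 1];
        if (it->key == key) return it;
        i = (i + 1) % MAP_SLOTS; /* linear probing */
    }
    return NULL;
}

/* Inserts or updates; returns -1 when the fixed capacity is exhausted. */
static int map_set(demo_map* m, int key, int value)
{
    map_item* found = map_find(m, key);
    if (found) { found->value = value; return 0; }
    if (m->len == MAP_CAP) return -1; /* no resizing in this sketch */
    uint32_t i = map_hash(key) % MAP_SLOTS;
    while (m->slots[i] != 0) i = (i + 1) % MAP_SLOTS;
    m->items[m->len] = (map_item){ key, value };
    m->slots[i] = (uint32_t)(++m->len);
    return 0;
}
```

Since all items live in one contiguous array, the length is just m.len and m.items[i] can be indexed or iterated like any other array, which mirrors the array compatibility listed above.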
Initialization
There are several ways for declaring hashmap types:
- Local function hashmap variables
hm$(char*, int) intmap = hm$new(intmap, mem$);
hm$(const char*, int) map = hm$new(map, mem$);
hm$(struct my_struct, int) map = hm$new(map, mem$);
- Global hashmaps with special types
// NOTE: struct must have .key and .value fields
typedef struct
{
int key;
float my_val;
char* my_string;
int value;
} my_hm_struct;
void foo(void) {
    // NOTE: this is equivalent of my_hm_struct* map = ...
    hm$s(my_hm_struct) map = hm$new(map, mem$);
}
void my_func(hm$s(my_hm_struct)* map) {
    // NOTE: passing hashmap type, parameter
    int v = hm$get(*map, 1);
    // NOTE: hm$set() may resize map, because of this we use `* map` argument, for keeping pointer valid!
    hm$set(*map, 3, 4);
    // Setting entire structure
    hm$sets(*map, (my_hm_struct){ .key = 5, .my_val = 3.14, .my_string = "cexy", .value = 98 });
}
- Declaring hashmap as type
typedef hm$(char*, int) MyHashMap;
struct my_hm_struct {
    MyHashMap hm;
};
void foo(void) {
    // Initializing new variable
    MyHashMap map = hm$new(map, mem$);
    // Initializing hashmap as a member of struct
    struct my_hm_struct hs = {0};
    hm$new(hs.hm, mem$);
}
Hashmap API
Macro | Description |
---|---|
hm$new(hm, allocator, kwargs…) | Initialization of hashmap |
hm$set(hm, key, value) | Set element |
hm$setp(hm, key, value) | Set element and return pointer to the newly added item inside hashmap |
hm$sets(hm, struct_value…) | Set entire element as backing struct |
hm$get(hm, key) | Get a value by key (as a copy) |
hm$getp(hm, key) | Get a value by key as a pointer to hashmap value |
hm$gets(hm, key) | Get a value by key as a pointer to a backing struct |
hm$clear(hm) | Clears contents of hashmap |
hm$del(hm, key) | Delete element by key |
hm$len(hm) | Number of elements in hashmap / arr$len() also works |
Initialization params
hm$new accepts optional params which may help you adjust hashmap key behavior:
- .capacity=16 - initial capacity of the hashmap, will be rounded to the closest power of 2
- .seed= - initial seed for the hashing algorithm
- .copy_keys=false - enables copying of char* keys and storing them inside the hashmap
- .copy_keys_arena_pgsize=0 - enables using an arena for copy_keys mode
Example:
test$case(test_hashmap_string_copy_arena)
{
    hm$(char*, int) smap = hm$new(smap, mem$, .copy_keys = true, .copy_keys_arena_pgsize = 1024);
    char key2[10] = "foo";
    hm$set(smap, key2, 3);
    tassert_eq(hm$len(smap), 1);
    tassert_eq(hm$get(smap, "foo"), 3);
    tassert_eq(hm$get(smap, key2), 3);
    tassert_eq(smap[0].key, "foo");
    // Initial buffer gets destroyed, but hashmap keys remain the same
    memset(key2, 0, sizeof(key2));
    tassert_eq(smap[0].key, "foo");
    tassert_eq(hm$get(smap, "foo"), 3);
    hm$free(smap);
    return EOK;
}
Examples
hm$(char*, int) smap = hm$new(smap, mem$);
hm$set(smap, "foo", 3);
hm$get(smap, "foo");
hm$len(smap);
hm$del(smap, "foo");
hm$free(smap);
test$case(test_hashmap_string)
{
    char key_buf[10] = "foobar";
    hm$(char*, int) smap = hm$new(smap, mem$);
    char* k = "foo";
    char* k2 = "baz";
    char key_buf2[10] = "foo";
    char* k3 = key_buf2;
    hm$set(smap, "foo", 3);
    tassert_eq(hm$len(smap), 1);
    tassert_eq(hm$get(smap, "foo"), 3);
    tassert_eq(hm$get(smap, k), 3);
    tassert_eq(hm$get(smap, key_buf2), 3);
    tassert_eq(hm$get(smap, k3), 3);
    tassert_eq(hm$get(smap, "bar"), 0);
    tassert_eq(hm$get(smap, k2), 0);
    tassert_eq(hm$get(smap, key_buf), 0);
    tassert_eq(hm$del(smap, key_buf2), 1);
    tassert_eq(hm$len(smap), 0);
    hm$free(smap);
    return EOK;
}
test$case(test_hashmap_basic_iteration)
{
    hm$(int, int) intmap = hm$new(intmap, mem$);
    hm$set(intmap, 1, 10);
    hm$set(intmap, 2, 20);
    hm$set(intmap, 3, 30);
    tassert_eq(hm$len(intmap), 3); // special len
    tassert_eq(arr$len(intmap), 3); // NOTE: arr$len is compatible
    // Iterating by value (data is copied)
    u32 nit = 1;
    for$each (it, intmap) {
        tassert_eq(it.key, nit);
        tassert_eq(it.value, nit * 10);
        nit++;
    }
    // Iterating by pointers (data by reference)
    for$eachp(it, intmap)
    {
        isize _nit = (it - intmap) + 1; // deriving 1-based position from pointers
        tassert_eq(it->key, _nit);
        tassert_eq(it->value, _nit * 10);
    }
    hm$free(intmap);
    return EOK;
}
Working with arrays
Arrays are probably the most used concept in any language, and in C arrays may take many different forms. Unfortunately, the main problem of working with arrays in C is the specialization of methods and operations: each type of array may require a special iteration macro, or a special function for getting the array length or an element.
Collection types in C:
- Static arrays i32 arr[10]
- Dynamic arrays as pointers (i32* arr, usize arr_len)
- Custom dynamic arrays dynamic_array_push_back(&int_array, &i);
- Char buffers char buf[1024]
- Null-terminated strings and slices
- Hashmaps
Cex tries to solve this by unifying all array operations around standard design principles, without getting too far away from standard C.
arr$len unified length
The arr$len(array) macro is the ultimate tool for getting lengths of arrays in CEX. It supports: static arrays, char buffers, string literals, dynamic arrays of CEX arr$ and hashmaps of CEX hm$. It is also a NULL-resilient macro, which returns 0 if the array argument is NULL.
Not all array pointers are supported by arr$len (only dynamic arrays or hashmaps are valid); however, in debug mode arr$len will raise an assertion/ASAN crash if you pass a wrong pointer type to it.
Example:
test$case(test_array_len)
{
    arr$(int) array = arr$new(array, mem$);
    arr$pushm(array, 1, 2, 3);
    // Works with CEX dynamic arrays
    tassert_eq(arr$len(array), 3);
    // NULL is supported, and emits 0 length
    arr$free(array);
    tassert(array == NULL);
    tassert_eq(arr$len(array), 0); // NOTE: NULL array - len = 0
    // Works with static arrays
    char buf[] = {"hello"};
    tassert_eq(arr$len(buf), 6); // NOTE: includes null term
    // Works with arrays of given capacity
    char buf2[10] = {0};
    tassert_eq(arr$len(buf2), 10);
    // Type doesn't matter
    i32 a[7] = {0};
    tassert_eq(arr$len(a), 7);
    // Works with string literals
    tassert_eq(arr$len("CEX"), 4); // NOTE: includes null term
    // Works with CEX hashmap
    hm$(int, int) intmap = hm$new(intmap, mem$);
    hm$set(intmap, 1, 3);
    tassert_eq(arr$len(intmap), 1);
    hm$free(intmap);
    return EOK;
}
Accessing elements of array is unified
test$case(test_array_access)
{
    arr$(int) array = arr$new(array, mem$);
    arr$pushm(array, 1, 2, 3);
    // Dynamic array access is natural C index
    tassert_eq(array[2], 3);
    // tassert_eq(arr$at(array, 3), 3); // NOTE: this is bounds checking access, with assertion
    arr$free(array);
    // Works with static arrays
    char buf[] = {"hello"};
    tassert_eq(buf[1], 'e');
    // Works with CEX hashmap
    hm$(int, int) intmap = hm$new(intmap, mem$);
    hm$set(intmap, 1, 3);
    hm$set(intmap, 2, 5);
    tassert_eq(arr$len(intmap), 2);
    // Accessing hashmap as array
    // NOTE: hashmap elements are ordered until first deletion
    tassert_eq(intmap[0].key, 1);
    tassert_eq(intmap[0].value, 3);
    tassert_eq(intmap[1].key, 2);
    tassert_eq(intmap[1].value, 5);
    hm$free(intmap);
    return EOK;
}
CEX way of iteration over arrays
CEX introduces unified for$* macros which help with looping. These are the typical patterns for iteration:
- for$each(it, array, [array_len]) - iterates over array, it represents the value of an array item. array_len is optional and uses arr$len(array) by default, or you might explicitly set it for iterating over arbitrary C pointer+len arrays.
- for$eachp(it, array, [array_len]) - iterates over array, it represents a pointer to an array item. array_len is inferred by default.
- for$iter(it_val_type, it, iter_funct) - a special iterator for non-indexable collections or function based iteration, tailored for customized iteration of unknown length.
- for(usize i = 0; i < arr$len(array); i++) - classic also works :)
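To show what a pointer-style iteration macro boils down to, here is a minimal standard C sketch. It is not the CEX for$eachp implementation: EACH_PTR, sum_ints and index_of_max are hypothetical names, and unlike the CEX macros this version needs the item type and the length passed explicitly.

```c
#include <stddef.h>

/* Hypothetical macro: `it` walks `len` items of type T starting at `arr`. */
#define EACH_PTR(T, it, arr, len) \
    for (T *it = (arr), *it##_end = it + (len); it != it##_end; ++it)

/* Works for static arrays, dynamic arrays and arbitrary pointer+len pairs. */
static int sum_ints(const int* arr, size_t len)
{
    int total = 0;
    EACH_PTR(const int, it, arr, len) { total += *it; }
    return total;
}

/* Index of the largest item, derived by pointer arithmetic (it - arr). */
static size_t index_of_max(const int* arr, size_t len)
{
    size_t best = 0;
    EACH_PTR(const int, it, arr, len) {
        if (*it > arr[best]) best = (size_t)(it - arr);
    }
    return best;
}
```

Iterating by pointer avoids copying each item, and the index is still available as it - arr when it is needed.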
test$case(test_array_iteration)
{
    arr$(int) array = arr$new(array, mem$);
    arr$pushm(array, 1, 2, 3);
    i32 nit = 0; // it's only for testing
    for$each(it, array) {
        tassert_eq(it, ++nit);
        io.printf("el=%d\n", it);
    }
    // Prints:
    // el=1
    // el=2
    // el=3
    nit = 0;
    // NOTE: prefer this when you work with bigger structs to avoid extra memory copying
    for$eachp(it, array) {
        // TIP: making array index out of `it`
        usize i = it - array;
        tassert_eq(i, nit);
        // NOTE: it now is a pointer
        tassert_eq(*it, ++nit);
        io.printf("el[%zu]=%d\n", i, *it);
    }
    // Prints:
    // el[0]=1
    // el[1]=2
    // el[2]=3
    // Static arrays work as well (arr$len inferred)
    i32 arr_int[] = {1, 2, 3, 4, 5};
    for$each(it, arr_int) {
        io.printf("static=%d\n", it);
    }
    // Prints:
    // static=1
    // static=2
    // static=3
    // static=4
    // static=5
    // Simple pointer+length also works (let's do a slice)
    i32* slice = &arr_int[2];
    for$each(it, slice, 2) {
        io.printf("slice=%d\n", it);
    }
    // Prints:
    // slice=3
    // slice=4
    arr$free(array);
    return EOK;
}
Making custom collection iterators
It’s possible to make a custom iterator, specifically for unbounded collections or sparse data structures. This kind of iteration has higher overhead than a simple for$each loop, but sometimes it’s necessary.
By convention, consider using the iter_ prefix in the function name; it’s a good indicator that the function is meant for for$iter()
Example of how str.slice.iter_split() was implemented:
typedef struct
{
    struct
    {
        union
        {
            usize i;
            char* skey;
            void* pkey;
        };
    } idx;
    char _ctx[47]; // <<< use this buffer to store iterator state, it's usize aligned
    u8 stopped;
    u8 initialized;
} cex_iterator_s;
static_assert(sizeof(cex_iterator_s) <= 64, "cex size");
static str_s
cex_str__slice__iter_split(str_s s, char* split_by, cex_iterator_s* iterator)
{
    uassert(iterator != NULL && "null iterator");
    uassert(split_by != NULL && "null split_by");
    // temporary struct based on _ctx buffer
    struct iter_ctx
    {
        usize cursor;
        usize split_by_len;
        usize str_len;
    }* ctx = (struct iter_ctx*)iterator->_ctx;
    static_assert(sizeof(*ctx) <= sizeof(iterator->_ctx), "ctx size overflow");
    static_assert(alignof(struct iter_ctx) <= alignof(usize), "cex_iterator_s _ctx misalign");
    if (unlikely(!iterator->initialized)) {
        // First run handling
        iterator->initialized = 1;
        if (unlikely(!_cex_str__isvalid(&s) || s.len == 0)) {
            iterator->stopped = 1;
            return (str_s){ 0 };
        }
        ctx->split_by_len = strlen(split_by);
        uassert(ctx->split_by_len < UINT8_MAX && "split_by is suspiciously long!");
        if (ctx->split_by_len == 0) {
            iterator->stopped = 1;
            return (str_s){ 0 };
        }
        isize idx = _cex_str__index(&s, split_by, ctx->split_by_len);
        if (idx < 0) { idx = s.len; }
        ctx->cursor = idx;
        ctx->str_len = s.len; // this prevents s being changed in a loop
        iterator->idx.i = 0;
        if (idx == 0) {
            // first line is \n
            return (str_s){ .buf = "", .len = 0 };
        } else {
            return str.slice.sub(s, 0, idx);
        }
    } else {
        if (unlikely(ctx->cursor >= ctx->str_len)) {
            iterator->stopped = 1;
            return (str_s){ 0 };
        }
        ctx->cursor++;
        if (unlikely(ctx->cursor == ctx->str_len)) {
            // edge case, we have separator at last col
            // it's not an error, return empty split token
            iterator->idx.i++;
            return (str_s){ .buf = "", .len = 0 };
        }
        // Get remaining string after prev split_by char
        str_s tok = str.slice.sub(s, ctx->cursor, 0);
        isize idx = _cex_str__index(&tok, split_by, ctx->split_by_len);
        iterator->idx.i++;
        if (idx < 0) {
            // No more splits, return remaining part
            ctx->cursor = s.len;
            // iterator->stopped = 1;
            return tok;
        } else if (idx == 0) {
            return (str_s){ .buf = "", .len = 0 };
        } else {
            // Sub from prev cursor to idx (excluding split char)
            ctx->cursor += idx;
            return str.slice.sub(tok, 0, idx);
        }
    }
}
Namespaces
Naming collisions will always remain a problem in the C language. However, we can do our best to reduce the surface of conflict by aggregating prefixed functions into nice-looking namespace symbols. But the primary role of the CEX namespacing approach is to keep the project structure organized, easier to navigate and understand. Another beneficial effect of namespaces is reduced cognitive load when we try to recall a function name while typing with LSP; we’ll see this effect below. Finally, with namespacing we can add an OOP-ish flavor to our structures, which can then behave like classes.
Key features of namespaces
- They can be automatically generated from a .c file, with no need to maintain changes in the .h file for every function signature change.
- They help maintain naming conventions: the name of the .c file must be the same as the namespace
- Reduced surface for name collisions, only the global namespace name is exposed.
- Support for sub-namespaces, easier to remember and type with LSP
- Fewer symbols in LSP suggestions
- Better readability with the . separator, and color highlighting of different parts of the function call
- Namespace structure combines function signatures closely in one place, so it’s easier to figure out what functions are available.
CEX Namespaces in a nutshell
A CEX namespace is a global const struct with function pointers in it.
#define CEX_NAMESPACE __attribute__((visibility("hidden"))) extern const
typedef struct {...} KeyMap_c;
struct __cex_namespace__KeyMap {
// Autogenerated by CEX
// clang-format off
/// NOTE: Cex may generate brief doc string here, if was added prior function implementation in .c file
Exception (*create)(KeyMap_c* self, char* input_dev_or_name);
// Destroys KeyMap instance
void (*destroy)(KeyMap_c* self);
Exception (*find_mapped_keyboard)(KeyMap_c* self, char* keyboard_name);
Exception (*handle_events)(KeyMap_c* self);
Exception (*handle_key)(KeyMap_c* self, struct input_event* ev);
Exception (*handle_mouse_move)(KeyMap_c* self);
// clang-format on
};
CEX_NAMESPACE struct __cex_namespace__KeyMap KeyMap;
So the KeyMap namespace allows the following usage:
KeyMap_c keymap = { 0 }; // (1)
e$goto(KeyMap.create(&keymap, file), end); // (2)
e$goto(KeyMap.handle_events(&keymap), end); // (3)
- (1) The _c suffix of KeyMap_c is an indication of a namespace; conceptually it can be interpreted as class or has code.
- (2) Functions of the KeyMap namespace are separated by dots; this is easier to read and to type with LSP, because it filters only relevant information. See pictures below.
- (3) Dotted notation may get distinct color highlighting, which helps to distinguish the namespace and its function
LSP Suggestions are much better
- If you start typing a conventional KeyMap_create() function name, the LSP suggestions get cluttered, and fuzzy typing may not return what you want
- With a CEX namespace you get only the list of KeyMap functions, and fuzzy typing works way better because you have limited options
Sub-namespaces
Sometimes libraries or namespaces can have dozens of functions, so it’s more convenient to add an extra level of namespacing. For example, the CEX str namespace has many functions which are grouped by functionality: str.slice. works with str_s types, str.convert. deals with conversions, and some functions live in the root namespace, for example str.find().
Sub-namespaces allow building a mental model of the code and help to write function names as a decision tree. For example: I need str, then ., then I need to deal with a slice so slice, then ., then I have to find the exact thing I need.
Check the full str namespace options with ./cex help str$
How to make a namespace
Making code
For example: you need to add new foo
namespace
- Create a pair of files with name prefix
src/foo.c
andsrc/foo.h
- You can create
static
functionsfoo_fun1()
,foo_fun2()
,foo__bar__fun3()
,foo__bar__fun4()
. These functions will be processed and wrapped into afoo
namespace so you can access them viafoo.fun1()
,foo.fun2()
,foo.bar.fun3()
,foo.bar.fun4()
- Run
./cex process src/foo.c
Requirements / caveats:
- You must have `foo.c` and `foo.h` in the same folder
- The filename must start with `foo`, the namespace prefix
- Each function in `foo.c` that you'd like to add to the namespace must start with `foo_`
- For adding a sub-namespace use the `foo__subname__` prefix
- Only one level of sub-namespace is allowed
- You don't need to declare function signatures in the header; `static` functions in the .c file are enough
- Functions with `static inline` are not included in the namespace
- Functions with the prefix `foo__some` are considered internal and not included
- A new namespace is created only when you pass the exact `src/foo.c` argument; `all` only updates existing namespaces
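To make the idea concrete, here is a minimal plain-C sketch of the pattern described above: prefixed static functions exposed through a constant struct of function pointers, with a nested struct acting as the sub-namespace. The struct name `foo_namespace` and the function bodies are assumptions made for illustration; the code actually generated by `./cex process` may look different.

```c
#include <assert.h>
#include <stddef.h>

/* Plain static functions named with the namespace prefix (hypothetical bodies). */
static int foo_fun1(int x) { return x + 1; }
static int foo__bar__fun3(int x) { return x * 2; }

/* The namespace is modelled as a struct of function pointers;
   the sub-namespace `bar` is a nested struct. */
struct foo_namespace {
    int (*fun1)(int x);
    struct {
        int (*fun3)(int x);
    } bar;
};

/* One constant instance of the struct plays the role of the `foo` namespace. */
static const struct foo_namespace foo = {
    .fun1 = foo_fun1,
    .bar = { .fun3 = foo__bar__fun3 },
};

/* Call sites use dotted notation instead of the prefixed names. */
static int foo_demo(void)
{
    assert(foo.fun1 != NULL && foo.bar.fun3 != NULL);
    return foo.fun1(1) + foo.bar.fun3(10);
}
```

Because the struct instance is constant and visible to the compiler, an optimizing build can usually resolve `foo.fun1(1)` to a direct call.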
Style conventions
In my experience, it's helpful to distinguish between the types of code (namespace) we are dealing with. Nothing strict, just guidelines:
- Sometimes we need OOP-ish / object / class behavior, which wraps a `typedef struct MyClass_c`, with the constructor `MyClass.create()` and the destructor `MyClass.destroy(MyClass_c* self)`. This type of code should be placed into `MyClass.c/MyClass.h` files.
- Sometimes we need just a bunch of functions logically combined together and dealing with a different set of types; then we should use a lower case name, `foo` or `my_namespace`. For example, the CEX `str` namespace. You may also want to add the `_c` suffix to the `typedef struct my_type_c` to indicate that it has namespace code attached: `my_type.some_func()`.
CLI Commands
# More help
➜ ./cex process --help
# Creates new `foo` namespace or update existing one
➜ ./cex process src/foo.c
# Update all existing namespaces in the project
# use it after you change signatures of your functions
➜ ./cex process all
Special notes
Performance questions
Because namespaces are static structures of function pointers, calling through them may cost some performance when compiler optimizations are disabled (similar to the overhead of C++ virtual functions). However, modern compilers are smart enough to replace the function pointer dereference with a direct function call when `-O1` optimization is enabled.
Getting LSP help / goto definition
CEX namespaces work with the `clangd` LSP server pretty well. However, LSP help functionality is limited: we can get only the list of parameters for completion.
Go to definition works, but with some caveats. If you place the cursor (`|`) at `KeyMap.cre|ate` and do go to definition, it will jump to the `KeyMap` structure type. If you need to go to the implementation, place the cursor like this `KeyM|ap.create` and do go to definition: it will jump to the `KeyMap` struct implementation inside the `KeyMap.c` file. Then find the `KeyMap_create` record and jump to it once again.
Build system
CEX has an integrated build system, `cexy$`, inspired by Zig-build and Tsoding's `nob.h`. It allows you to build your project without dealing with CMake/Make/Ninja/Meson dependencies. For small projects cexy has a simplified, config-driven mode. For complex or cross-platform projects `cexy` provides low-level tools for running the compiler and implementing project-specific build logic.
How it works
- You need to create a `cex.c` file, which is the entry point for the whole build process and cexy tools. For new projects, if `cex.c` is not there, run the bootstrapping routine:
cc -D CEX_NEW -x c ./cex.h -o ./cex
- Then compile the `cex` CLI with the following command:
cc ./cex.c -o ./cex
- Afterwards you should have the `./cex` executable in the project directory. It is your main entry point for CEX project management, and your project is ready to go.
Now you can launch a sample program or run its unit tests.
./cex test run all
./cex app run myapp
Key-features of cexy$ CLI tool
- Main project management CLI: building, running unit tests, fuzzer, stats, etc
- Allows to generate new apps or projects
- Generates CEX namespaces for user code
- Fuzzy search for help in user code base
- Supports custom command runner
- Supports build-mode configuration
- Allows OS related operations with files, paths, command launching, etc
- Adds support for external dependencies via pkg-config and vcpkg
- UnitTest and Fuzzer runner
- Fetches 3rd party code, updates `cex.h` itself or `cex lib` via git
Simple-mode
`cexy$` has a built-in build routine for building/running/debugging apps and for running unit tests and fuzzers. It can be configured using `#define cexy$<config-constant-here>` in your `cex.c` file.
# full list of cexy API namespace and cexy$ variables
./cex help cexy$
# list of actual values for cexy$ vars in current project
./cex config
When you run `./cex app run|test|fuzz myapp`, it uses `cexy$` config vars internally and runs a build routine which may cover 80% of generic project needs.
Simple mode adds several project structure constraints:
- Source code should be in the `src/` directory
- If you have a `myapp` application, its `main()` function should be located at `src/myapp.c` or `src/myapp/main.c`
- Simple-mode uses the unity build approach, so all your sources have to be included as `#include "src/foo.c"` in `src/myapp.c`.
- Simple-mode does not produce object files and does not do an extra linking stage. It's intentional, and in my opinion it is better for small and medium (<100k LOC) projects.
Project configuration
`cexy$` is configured by setting constants in header files, which can be compiled directly as C code in your project as well. Use `./cex config` to check the current project configuration. Configuration can optionally be included as `cex_config.h` (or any other name), or set directly in the `cex.c` file.
You can change the pre-defined cexy config with `./cex -D<YOUR_VAR> config`; it recompiles the cex CLI with the new settings, and all subsequent `./cex` calls will use them. You may reset to defaults with `./cex -D config`.
// file: cex.c
#if __has_include("cex_config.h")
// Custom config file
# include "cex_config.h"
#else
// Overriding config values
# if defined(CEX_DEBUG)
# define CEX_LOG_LVL 4 /* 0 (mute all) - 5 (log$trace) */
# else
# define cexy$cc_args "-Wall", "-Wextra", "-Werror", "-g", "-O3", "-fwhole-program"
# endif
#endif
# Check current config (CEX_DEBUG not set, using -O3 gcc/clang argument)
./cex config
>>>
* cexy$cc_args "-Wall", "-Wextra", "-Werror", "-g", "-O3", "-fwhole-program"
* ./cex -D<ARGS> config ""
<<<
# Using CEX_DEBUG from `cex.c` (step 1 tab), you may use any name
./cex -DCEX_DEBUG config
# Check what's changed
./cex config
>>>
* cexy$cc_args "-Wall", "-Wextra", "-Werror", "-g3", "-fsanitize-address-use-after-scope", "-fsanitize=address", "-fsanitize=undefined", "-fsanitize=leak", "-fstack-protector-strong"
* ./cex -D<ARGS> config "-DCEX_DEBUG "
<<<
# Revert previous config back
./cex -D config
Minimalist cexy build system
If you wish to build using your own logic, let's make a simple custom build command without utilizing the cexy machinery.
// file: cex.c
#define CEX_IMPLEMENTATION
#define CEX_BUILD
#include "cex.h"

Exception cmd_mybuild(int argc, char** argv, void* user_ctx);

int
main(int argc, char** argv)
{
    cexy$initialize(); // cex self rebuild and init
    argparse_c args = {
        .description = cexy$description,
        .epilog = cexy$epilog,
        .usage = cexy$usage,
        argparse$cmd_list(
            cexy$cmd_all,
            // cexy$cmd_fuzz, /* disable built-in commands */
            // cexy$cmd_test, /* disable built-in commands */
            // cexy$cmd_app, /* disable built-in commands */
            { .name = "my-build", .func = cmd_mybuild, .help = "My Custom build" },
        ),
    };
    if (argparse.parse(&args, argc, argv)) { return 1; }
    void* my_user_ctx = NULL; // passed as `user_ctx` to command
    if (argparse.run_command(&args, my_user_ctx)) { return 1; }
    return 0;
}

Exception cmd_mybuild(int argc, char** argv, void* user_ctx) {
    log$info("Launching my-build command\n");
    e$ret(os$cmd("gcc", "-Wall", "-Wextra", "hello.c", "-o", "hello"));
    return EOK;
}
// file: hello.c
#define CEX_IMPLEMENTATION
#include "cex.h"

int
main(int argc, char** argv)
{
    (void)argc;
    (void)argv;
    io.printf("Hello from CEX\n");
    return 0;
}
~ ➜ ./cex my-build
[INFO] ( cex.c:50 cmd_mybuild() ) Launching my-build command
[DEBUG] ( cex.c:51 cmd_mybuild() ) CMD: gcc -Wall -Wextra hello.c -o hello
~ ➜ ./hello
Hello from CEX
You can use the cexy build source directly and adjust it if needed; extract the source code with `./cex help --source cexy.cmd.simple_app`
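For readers who want to see the idea behind a command list without CEX in the picture, the following plain-C sketch shows one way a name-to-handler table with a `user_ctx` pointer can be dispatched. Everything here (`cmd_entry`, `run_command`, `dispatch_count`) is a hypothetical stand-in and not the actual `argparse` implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command record: a name, a handler and a help string. */
typedef int (*cmd_func)(int argc, char** argv, void* user_ctx);

typedef struct {
    const char* name;
    cmd_func func;
    const char* help;
} cmd_entry;

/* Handler: counts its own invocations through the caller-owned user_ctx. */
static int cmd_mybuild(int argc, char** argv, void* user_ctx)
{
    (void)argc;
    (void)argv;
    int* calls = user_ctx;
    *calls += 1;
    return 0;
}

/* Looks up argv[1] in the table and runs the matching handler; -1 if unknown. */
static int run_command(const cmd_entry* cmds, size_t n, int argc, char** argv, void* user_ctx)
{
    assert(cmds != NULL);
    if (argc < 2) { return -1; }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(cmds[i].name, argv[1]) == 0) {
            return cmds[i].func(argc - 1, argv + 1, user_ctx);
        }
    }
    return -1;
}

/* Demo helper: returns how many times a handler ran for the given command name. */
static int dispatch_count(const char* name)
{
    static const cmd_entry cmds[] = {
        { .name = "my-build", .func = cmd_mybuild, .help = "My Custom build" },
    };
    char arg0[] = "cex";
    char arg1[32];
    strncpy(arg1, name, sizeof(arg1) - 1);
    arg1[sizeof(arg1) - 1] = '\0';
    char* argv[] = { arg0, arg1, NULL };
    int calls = 0;
    (void)run_command(cmds, 1, 2, argv, &calls);
    return calls;
}
```

The `user_ctx` pointer is what lets a command share state with `main()` without global variables.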
Dependency management
Dependencies are always a pain point; they go against the CEX philosophy, but sometimes they are a necessary evil. CEX can use `pkgconf`-compatible utilities and the `vcpkg` framework. Check the `examples/` folder in the `cex` GIT repo, it contains a couple of sample projects with dependencies. Dependencies on Windows are hell; try to use MSYS2 or vcpkg.
Currently `pkgconf/vcpkg` dependencies are supported in simple mode; for a custom build you have to integrate the `cexy$pkgconf()` macro yourself.
Here is an excerpt of a `libcurl+libzip` build for Linux+MacOS+Windows:
// file: cex.c
# define cexy$pkgconf_libs "libcurl", "libzip"
# define CEX_LOG_LVL 4 /* 0 (mute all) - 5 (log$trace) */
# if _WIN32
// using mingw libs .a
# define cexy$build_ext_lib_stat ".a"
// NOTE: windows is a special case, the best way to manage dependencies to have vcpkg
// you have to manually install vcpkg and configure paths. Currently it uses static
// environment and mingw because it was tested under MSYS2
//
// Also install the following in `classic` mode:
// > vcpkg install --triplet=x64-mingw-static curl
// > vcpkg install --triplet=x64-mingw-static libzip
# define cexy$vcpkg_triplet "x64-mingw-static"
# define cexy$vcpkg_root "c:/vcpkg/"
# else
// NOTE: linux / macos will use system wide libs
// make sure you installed libcurl-dev libzip-dev via package manager
// names of packages will depend on linux distro and macos home brew.
# endif
#endif
Cross-platform builds
At compile time you may use platform-specific constants, for example `#ifdef _WIN32`, or you can set an arbitrary config define that switches to platform-specific logic. CEX also has the `os.platform.` sub-namespace for runtime platform checks:
# if _WIN32
// using mingw libs .a
# define cexy$build_ext_lib_stat ".a"
# define cexy$vcpkg_triplet "x64-mingw-static"
#elif defined(__APPLE__) || defined(__MACH__)
# define cexy$vcpkg_triplet "arm64-osx"
# else
# define cexy$vcpkg_triplet "x64-linux"
# endif
#endif
// NOTE: activate with the following command
// ./cex -DCEX_WIN config
// file: cex.c
#ifdef CEX_WIN
# define cexy$cc "x86_64-w64-mingw32-gcc"
# define cexy$cc_args_sanitizer "-g3"
# define cexy$debug_cmd "wine"
# define cexy$build_ext_exe ".exe"
#endif
// platform-dependent compilation flags (runtime)
// file: cex.c (as a part of custom build command)
arr$(char*) args = arr$new(args, _);
arr$pushm(args, cexy$cc, "shell.c", "../sqlite3.o", "-o", "../sqlite3");
if (os.platform.current() == OSPlatform__win) {
    arr$pushm(args, "-lpthread", "-lm");
} else {
    arr$pushm(args, "-lpthread", "-ldl", "-lm");
}
arr$pushm(args, NULL);
e$ret(os$cmda(args));
You can get example source code with highlighting for any function used in the project with the shell command: `./cex help --example os.platform.current`
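The same runtime selection can be sketched in plain C, without any CEX macros, to show what the branch above boils down to. The names `platform_e`, `platform_current` and `linker_flags` are hypothetical; only the flag lists are taken from the snippet above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform tag used only in this sketch. */
typedef enum { PLATFORM_WIN, PLATFORM_POSIX } platform_e;

/* Reports the platform the build tool itself was compiled for. */
static platform_e platform_current(void)
{
#ifdef _WIN32
    return PLATFORM_WIN;
#else
    return PLATFORM_POSIX;
#endif
}

/* Fills `out` with a NULL-terminated linker flag list; returns the flag count. */
static size_t linker_flags(platform_e p, const char** out, size_t cap)
{
    static const char* win[] = { "-lpthread", "-lm" };
    static const char* posix[] = { "-lpthread", "-ldl", "-lm" };
    const char** src = (p == PLATFORM_WIN) ? win : posix;
    size_t n = (p == PLATFORM_WIN) ? 2 : 3;
    assert(cap > n); /* room for the flags plus the NULL terminator */
    memcpy(out, src, n * sizeof(src[0]));
    out[n] = NULL;
    return n;
}
```

Passing the platform as a parameter keeps the flag selection testable on any host, while `platform_current()` supplies the real value in a build script.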
Developer Tools
The CEX language is designed to improve the developer experience with C. The `./cex` CLI contains key tools for managing the project, running apps, debugging, unit testing and fuzzing.
Sanitizers
CEX enables sanitizers by default if they are supported by your OS and compiler. ASAN/UBSAN are extremely useful for catching bugs. CEX also uses sanitizers for call stack printouts in `uassert()`. `clang` has the best sanitizer support across many platforms; `gcc` sanitizers are supported on Linux.
Default sanitizer arguments:
// file cex.c
#define cexy$cc_args_sanitizer "-fsanitize-address-use-after-scope", "-fsanitize=address", "-fsanitize=undefined", "-fsanitize=leak", "-fstack-protector-strong"
Asserts
I’m a big fan of the “asserts everywhere” code style, also known as design by contract or TigerBeetle style; it has many names. C asserts kind of work, but they are a huge pain to debug without a live debugger session.
So `cex.h` has 2 types of asserts:
- The `uassert*()` family works like vanilla assertions and aborts on failure (but prints a traceback with call stack and line numbers). These asserts are stripped when `NDEBUG` is defined.
- `e$assert()` returns `Error.assert` and is only intended for use in functions with the `Exception` return type. These asserts remain in place even when `NDEBUG` is defined.
// Raises abort
uassert(a == 4); // vanilla
uassert(b == a && "Oops it's a message"); // with static message
uassertf(b == 2, "b[%d] != 2", b); // with formatting

// Disabling uassert() - only for unit test mode
uassert_disable();
run_bad_stuff(NULL);
uassert_enable();

// Returns Error.assert on failure + prints [ASSERT] file:line in the stdout
Exception read_file(char* filename, char* buf, isize* out_buf_size) {
    e$assert(buf != NULL); // vanilla
    e$assert(filename != NULL && "invalid filename"); // with static message
    e$assertf(filename == NULL, "filename: %s", filename); // with formatting
    return EOK;
}
uassert() tracebacks are only available if the program was compiled with ASAN flags.
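The difference between the two assert families can be illustrated with a plain-C sketch. It uses the standard `assert()` as a stand-in for the abort-style family and a hypothetical `MY_EASSERT` macro for the error-returning style; neither `MY_EASSERT` nor `my_error` is part of CEX, and the sketch prints to stderr for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error type for this sketch: NULL means success. */
typedef const char* my_error;

/* Error-style assert: always compiled in, reports the location and returns an error. */
#define MY_EASSERT(cond)                                                       \
    do {                                                                       \
        if (!(cond)) {                                                         \
            fprintf(stderr, "[ASSERT] %s:%d %s\n", __FILE__, __LINE__, #cond); \
            return "AssertError";                                              \
        }                                                                      \
    } while (0)

/* Abort-style check: terminates the program, compiled out when NDEBUG is defined. */
static int checked_index(int i, int len)
{
    assert(i >= 0 && i < len);
    return i;
}

/* Error-style check: the failure is handed back to the caller instead of aborting. */
static my_error safe_div(int a, int b, int* out)
{
    MY_EASSERT(out != NULL);
    MY_EASSERT(b != 0);
    *out = a / b;
    return NULL;
}
```

The abort style suits invariants that must never break during development, while the error-returning style keeps validating inputs in release builds.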
Unit Testing Tool
Each CEX test file is compiled as a stand-alone executable. This allows making specialized tests with mocks, experimenting with parts of a bigger project without fixing a plethora of compiler errors, and doing test-driven development and debugging.
Create a new test with `./cex test create tests/test_file.c`, then run it with `./cex test run tests/test_file.c` or `./cex test run all`.
# Getting built-in help
➜ ./cex test
Usage:
cex test [options] {run,build,create,clean,debug} all|tests/test_file.c [--test-options]
CEX built-in simple test runner
Each cexy test is self-sufficient and unity build, which allows you to test
static funcions, apply mocks to some selected modules and functions, have more
control over your code. See `cex config --help` for customization/config info.
CEX test runner keep checking include modified time to track changes in the
source files. It expects that each #include "myfile.c" has "myfile.h" in
the same folder. Test runner uses cexy$cc_include for searching.
CEX is a test-centric language, it enables additional sanity checks then in
test suite, all warnings are enabled -Wall -Wextra. Sanitizers are enabled by
default.
Code requirements:
1. You should include all your source files directly using #include "path/src.c"
2. If needed provide linker options via cexy$ld_libs / cexy$ld_args
3. If needed provide compiler options via cexy$cc_args_test
4. All tests have to be in tests/ folder, and start with `test_` prefix
5. Only #include with "" checked for modification
Test suite setup/teardown:
// setup before every case
test$setup_case() {return EOK;}
// teardown after every case
test$setup_case() {return EOK;}
// setup before suite (only once)
test$setup_suite() {return EOK;}
// teardown after suite (only once)
test$setup_suite() {return EOK;}
Test case:
test$case(my_test_case_name) {
// run `cex help tassert_` / `cex help tassert_eq` to get more info
tassert(0 == 1);
tassertf(0 == 1, "this is a failure msg: %d", 3);
tassert_eq(buf, "foo");
tassert_eq(1, true);
tassert_eq(str.sstr("bar"), str$s("bar"));
tassert_ne(1, 0);
tassert_le(0, 1);
tassert_lt(0, 1);
return EOK;
}
If you need more control you can build your own test runner. Just use cex help
and get source code `./cex help --source cexy.cmd.simple_test`
-h, --help show this help message and exit
Test running examples:
cex test create tests/test_file.c - creates new test file from template
cex test build all - build all tests
cex test run all - build and run all tests
cex test run tests/test_file.c - run test by path
cex test debug tests/test_file.c - run test via `cexy$debug_cmd` program
cex test clean all - delete all test executables in `cexy$build_dir`
cex test clean test/test_file.c - delete specific test executable
cex test run tests/test_file.c [--help] - run test with passing arguments to the test runner program
Fuzzers
CEX has a fuzzer back-end. Currently `libfuzzer`, built into `clang`, is preferable, but `AFL++` also works. CEX fuzzers are designed to hit directly at the heart of the code, therefore it is easier to use `clang`; however, the CEX fuzzer API remains compatible with AFL as well.
Try to split functionality across many small fuzz files for different aspects of your program. This will help to hit specific pain points more easily. Look into the fuzz examples in the `fuzz/` folder of the CEX GIT repo.
Making new fuzzer test
# Placing into fuzz/ directory is mandatory
./cex fuzz create fuzz/myapp/fuzz_bar.c
./cex fuzz create fuzz/mymodule/fuzz_foo.c
Sample fuzz file
// file: fuzz/myapp/fuzz_bar.c
#define CEX_IMPLEMENTATION
#include "cex.h"
/*
// setup is not mandatory, but useful for establishing corpus
fuzz$setup(void){
// This function allows programmatically seed new corpus for fuzzer
io.printf("CORPUS: %s\n", fuzz$corpus_dir);
mem$scope(tmem$, _){
char* fn = str.fmt(_, "%s/my_case", fuzz$corpus_dir);
(void)fn;
// io.file.save(fn, "my seed data");
}
}
*/
int
fuzz$case(const u8* data, usize size){
    // TODO: do your stuff based on input data and size
    if (size > 2 && data[0] == 'C' && data[1] == 'E' && data[2] == 'X') {
        __builtin_trap();
    }
    return 0;
}
fuzz$main();
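For comparison, the same check can be written against the raw libFuzzer entry point `LLVMFuzzerTestOneInput`, which is the standard libFuzzer interface. How `fuzz$case` maps onto it internally is not shown in this document, so treat this only as a sketch of the equivalent logic; the helper `starts_with_cex` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when the input begins with the bytes 'C', 'E', 'X'. */
static bool starts_with_cex(const uint8_t* data, size_t size)
{
    return size > 2 && data[0] == 'C' && data[1] == 'E' && data[2] == 'X';
}

/* libFuzzer calls this entry point repeatedly with mutated inputs. */
int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    assert(data != NULL || size == 0);
    if (starts_with_cex(data, size)) {
        __builtin_trap(); /* simulated bug that the fuzzer should discover */
    }
    return 0;
}
```

Keeping the predicate in its own function makes it easy to unit test without triggering the trap.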
Running fuzzer case
# Run specific test (infinite timeout)
./cex fuzz run fuzz/myapp/fuzz_bar.c
# Run all with time limit per test
./cex fuzz run all
>> (output of fuzzer)
SUMMARY: libFuzzer: deadly signal
MS: 4 PersAutoDict-ChangeBit-ShuffleBytes-CMP- DE: "E\000"-"X\000"-; base unit: a04ab19fbcf9e6dd3b7f1b71cb156335556f3507
0x43,0x45,0x58,0x0,0x3e,
CEX\000>
artifact_prefix='fuzz_file.'; Test unit written to fuzz_file.crash-88777
Base64: Q0VYAD4=
>> cat fuzz_file.crash-88777
CEX>
Running / debugging crash file
# Run single artifact file caused crash
# NOTE: fuzz_file.crash-88777 must be located at fuzz/myapp/
./cex fuzz run fuzz/myapp/fuzz_bar.c fuzz_file.crash-88777
# run in gdb (see cexy$debug_cmd )
./cex fuzz debug fuzz/myapp/fuzz_bar.c fuzz_file.crash-88777
Lines Of Code stats
`./cex stats` calculates `.c/.h` lines of code and estimates assertion percentage as a code quality metric.
~ ➜ ./cex stats 'src/*.c' 'tests/*.c'
Project stats (parsed in 0.020sec)
--------------------------------------------------------
Metric | Code | Tests |
--------------------------------------------------------
Files | 27 | 30 |
Asserts | 361 | 4230 |
Lines of code | 11494 | 12261 |
Lines of comments | 725 | 606 |
Asserts per LOC | 3.14% | 34.50% |
Total asserts per LOC | 39.94% | <<< |
--------------------------------------------------------
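The percentages in the sample output are consistent with simple ratios: asserts divided by lines of code per column, and for the total row all asserts (code plus tests) divided by the lines of application code. This reading is inferred from the numbers in the table, not from the tool's source; the helper name `asserts_per_loc` is hypothetical.

```c
#include <assert.h>

/* Percentage of asserts relative to lines of code. */
static double asserts_per_loc(unsigned asserts, unsigned loc)
{
    assert(loc > 0);
    return 100.0 * (double)asserts / (double)loc;
}

/* Total row: asserts from code and tests, relative to lines of application code. */
static double total_asserts_per_loc(unsigned code_asserts, unsigned test_asserts, unsigned code_loc)
{
    return asserts_per_loc(code_asserts + test_asserts, code_loc);
}
```

With the sample numbers, `asserts_per_loc(361, 11494)` is about 3.14, `asserts_per_loc(4230, 12261)` is about 34.50, and `total_asserts_per_loc(361, 4230, 11494)` is about 39.94.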
Fetching libraries and CEX updates
The `./cex libfetch` command is a simple `git` wrapper for retrieving/updating cex `lib/` files or updating `cex.h` itself. This command can be used with any git repo for getting single-header files.
~ ➜ ./cex libfetch --help
Usage:
cex libfetch [options]
Fetching 3rd party libraries via git (by default it uses cex git repo as source)
-h, --help show this help message and exit
-u, --git-url Git URL of the repository (default: 'https://github.com/alexveden/cex.git')
-l, --git-label Git label (default: 'HEAD')
-o, --out-dir Output directory relative to project root (default: './')
-U, --update Force replacing existing code with repository files (default: N)
-p, --preserve-dirs Preserve directory structure as in repo (default: Y)
Command examples:
cex libfetch lib/test/fff.h - fetch signle header lib from CEX repo
cex libfetch -U cex.h - update cex.h to most recent version
cex libfetch lib/random/ - fetch whole directory recursively from CEX lib
cex libfetch --git-label=v2.0 file.h - fetch using specific label or commit
cex libfetch -u https://github.com/m/lib.git file.h - fetch from arbitrary repo
cex help --example cexy.utils.git_lib_fetch - you can call it from your cex.c (see example)
Getting help for project
`./cex help` is the CLI command for getting help for your project; it works for CEX itself and for your project as well. You can use it as a symbol search: types, functions, files, examples and source code. It also supports CEX namespaces as struct interfaces, as well as macro$namespaces.
Help command
~ ➜ ./cex help --help
Usage:
cex help [options] [query]
Symbol / documentation search tool for C projects
Options
-h, --help show this help message and exit
-f, --filter file pattern for searching (default: './*.[hc]')
-s, --source show full source on match (default: N)
-e, --example finds random example in source base (default: N)
-o, --out write output of command to file (default: '')
Query examples:
cex help                     - list all namespaces in project directory
cex help foo                 - find any symbol containing 'foo' (case sensitive)
cex help foo.                - find namespace prefix: foo$, Foo_func(), FOO_CONST, etc
cex help os$                 - find CEX namespace help (docs, macros, functions, types)
cex help 'foo_*_bar'         - find using pattern search for symbols (see 'cex help str.match')
cex help '*_(bar|foo)'       - find any symbol ending with '_bar' or '_foo'
cex help str.find            - display function documentation if exactly matched
cex help 'os$PATH_SEP'       - display macro constant value if exactly matched
cex help str_s               - display type info and documentation if exactly matched
cex help --source str.find   - display function source if exactly matched
cex help --example str.find  - display random function use in codebase if exactly matched
Getting language cheat-sheets
All core namespaces in CEX have cheat-sheets with full members reference and examples.
Just type:
# Cheat-sheet for os namespace
./cex help os$
# Help for CEX errors
./cex help e$
# Help for some parts of the language for$ / arr$ / hm$
./cex help arr$
# Make your own cheat-sheet!
./cex help myproj_namespace$
The core namespace reference was fully generated from CEX built-in cheat-sheets and docstrings!
To make your own cheat-sheet, place a doxygen comment (`/** multiline help */`) right before the struct definition in `mynamespace.h` (assuming a `mynamespace` namespace). Or add one before `#define __mynamespace$` if you use a macro-only namespace.
Real project search
// NOTE: you may get help for any available code in your project, not limited by cex.h
// >> ./cex help arr
macro_func  _arr$slice_get            ./fuzz/CexParser/fuzz_cex_parser_corpus.out/cex_base.h:455
macro_func  arr$                      ./src/ds.h:140
macro_func  arr$at                    ./src/ds.h:201
macro_func  arr$cap                   ./src/ds.h:172
macro_func  arr$clear                 ./src/ds.h:169
macro_func  arr$del                   ./src/ds.h:175
macro_func  arr$delswap               ./src/ds.h:184
macro_func  arr$free                  ./src/ds.h:163
macro_func  arr$grow                  ./src/ds.h:283
macro_func  arr$grow_check            ./src/ds.h:276
macro_func  arr$ins                   ./src/ds.h:263
macro_func  arr$last                  ./src/ds.h:193
macro_func  arr$len                   ./src/ds.h:301
macro_func  arr$new                   ./src/ds.h:147
macro_func  arr$pop                   ./src/ds.h:209
macro_func  arr$push                  ./src/ds.h:217
macro_func  arr$pusha                 ./src/ds.h:237
macro_func  arr$pushm                 ./src/ds.h:227
macro_func  arr$setcap                ./src/ds.h:166
macro_func  arr$slice                 ./fuzz/CexParser/fuzz_cex_parser_corpus.out/cex_base.h:491
macro_func  arr$sort                  ./src/ds.h:255
macro_func  curlcheck_arr             ./examples/libs_vcpkg/build/vcpkg/packages/curl_x64-linux/include/curl/typecheck-gcc.h:472
macro_func  json$arr                  ./lib/json/json.h:96
macro_func  json$karr                 ./lib/json/json.h:82
macro_func  luaC_barrier              ./examples/lua_module/build/lua/src/lgc.h:118
macro_func  luaC_barrierback          ./examples/lua_module/build/lua/src/lgc.h:122
macro_func  luaC_objbarrier           ./examples/lua_module/build/lua/src/lgc.h:126
macro_func  luaC_upvalbarrier         ./examples/lua_module/build/lua/src/lgc.h:130
macro_func  luaM_freearray            ./examples/lua_module/build/lua/src/lmem.h:43
macro_func  tassert_eq_arr            ./src/test.h:245
macro_const __arr$                    ./src/ds.h:73
macro_const unixShmBarrier            ./examples/libs_vcpkg/build/sqlite-amalgamation-3490200/sqlite3.c:43856
func_def    apndShmBarrier            ./examples/libs_vcpkg/build/sqlite-amalgamation-3490200/shell.c:9917
func_def    arrayindex                ./examples/lua_module/build/lua/src/ltable.c:144
func_def    numusearray               ./examples/lua_module/build/lua/src/ltable.c:259
func_def    recoverVfsShmBarrier      ./examples/libs_vcpkg/build/sqlite-amalgamation-3490200/shell.c:21195
func_def    setarrayvector            ./examples/lua_module/build/lua/src/ltable.c:301
func_def    sqlite3MemoryBarrier      ./examples/libs_vcpkg/build/sqlite-amalgamation-3490200/sqlite3.c:30252
func_def    sqlite3OsShmBarrier       ./examples/libs_vcpkg/build/sqlite-amalgamation-3490200/sqlite3.c:26551
func_def    str_in_array              ./examples/libs_vcpkg/build/sqlite-amalgamation-3490200/shell.c:23732
func_def    vfstraceShmBarrier        ./examples/libs_vcpkg/build/sqlite-amalgamation-3490200/shell.c:17030
func_decl   _cexds__arr_integrity     ./src/ds.h:99
func_decl   _cexds__arr_len           ./src/ds.h:100
func_decl   _cexds__arrfreef          ./src/ds.h:98
func_decl   _cexds__arrgrowf          ./src/ds.h:97
func_decl   luaC_barrier_             ./examples/lua_module/build/lua/src/lgc.h:140
func_decl   luaC_barrierback_         ./examples/lua_module/build/lua/src/lgc.h:141
func_decl   luaC_upvalbarrier_        ./examples/lua_module/build/lua/src/lgc.h:142
func_decl   luaH_resizearray          ./examples/lua_module/build/lua/src/ltable.h:54
typedef     _cex_arr_slice            ./fuzz/CexParser/fuzz_cex_parser_corpus.out/cex_base.h:448
typedef     _cexds__arr_new_kwargs_s  ./src/ds.h:142
typedef     _cexds__array_header      ./src/ds.h:130
Example roulette
If you need some use-case example for some function/symbol in your project you can test your luck and find random use-case of that function with the following:
// NOTE: in example mode you must provide full symbol name without wildcards
~ ➜ ./cex help --example str.find
Found at ./cex.h:13704
Exception cexy__test__create(char* target, bool include_sample)
{
    if (os.path.exists(target)) {
        return e$raise(Error.exists, "Test file already exists: %s", target);
    }
    if (str.eq(target, "all") || str.find(target, "*")) {
        return e$raise(
            Error.argument,
            "You must pass exact file path, not pattern, got: %s",
            target);
    }
    ...
}
Project management
Using the `./cex` CLI you can seed a new project or add/run/debug/clean apps/fuzz/tests in an existing project.
Try `./cex app --help`, `./cex test --help`, `./cex fuzz --help` for more info. Note that this functionality may not work properly if you use custom build routines.
Creating
# Create new project from scratch + bootstraps ./cex cli + sample hello world project structure
./cex new new_dir_path
# Create new app for existing project as src/myapp/main.c
./cex app create myapp
# Create new unit test for existing project
./cex test create tests/test_file.c
# Create new fuzz test
./cex fuzz create fuzz/myapp/fuzz_bar.c
Running
./cex app run myapp --app-opt=1 app_arg1 app_arg2
./cex test run tests/test_file.c
./cex fuzz run fuzz/myapp/fuzz_bar.c
Debugging
./cex app debug myapp
./cex test debug tests/test_file.c
./cex fuzz debug fuzz/myapp/fuzz_bar.c
Core Namespace Reference
Getting help
All core namespaces in CEX have cheat-sheets with full members reference and examples.
Just type:
# Cheat-sheet for os namespace
./cex help os$
# Help for CEX errors
./cex help e$
# Help for some parts of the language for$ / arr$ / hm$
./cex help arr$
# Make your own cheat-sheet!
./cex help myproj_namespace$
argparse
- Command line args parsing
// NOTE: Command example
Exception cmd_build_docs(int argc, char** argv, void* user_ctx);

int
main(int argc, char** argv)
{
    // clang-format off
    argparse_c args = {
        .description = "My description",
        .usage = "Usage help",
        .epilog = "Epilog text",
        argparse$cmd_list(
            { .name = "build-docs", .func = cmd_build_docs, .help = "Build CEX documentation" },
        ),
    };
    if (argparse.parse(&args, argc, argv)) { return 1; }
    if (argparse.run_command(&args, NULL)) { return 1; }
    return 0;
}

Exception
cmd_build_docs(int argc, char** argv, void* user_ctx)
{
    // Command handling func
    return EOK;
}
- Parsing custom arguments
// Simple options example
int
main(int argc, char** argv)
{
    bool force = 0;
    bool test = 0;
    int int_num = 0;
    float flt_num = 0.f;
    char* path = NULL;
    char* usage = "basic [options] [[--] args]\n"
                  "basic [options]\n";
    argparse_c args = {
        argparse$opt_list(
            argparse$opt_help(),
            argparse$opt_group("Basic options"),
            argparse$opt(&force, 'f', "force", "force to do"),
            argparse$opt(&test, 't', "test", .help = "test only"),
            argparse$opt(&path, 'p', "path", "path to read", .required = true),
            argparse$opt_group("Another group"),
            argparse$opt(&int_num, 'i', "int", "selected integer"),
            argparse$opt(&flt_num, 's', "float", "selected float"),
        ),
        // NOTE: usage/description are optional
        .usage = usage,
        .description = "\nA brief description of what the program does and how it works.",
        .epilog = "\nAdditional description of the program after the description of the arguments.",
    };
    if (argparse.parse(&args, argc, argv)) { return 1; }
    // NOTE: all args are filled and parsed after this line
    return 0;
}
/// holder for list of
#define argparse$cmd_list(...)
/// command line option record (generic type of arguments)
#define argparse$opt(value, ...)
/// options group separator
#define argparse$opt_group(h)
/// built-in option for -h,--help
#define argparse$opt_help()
/// holder for list of argparse$opt()
#define argparse$opt_list(...)
/// main argparse struct (used as options config)
typedef argparse_c
/// command settings type (prefer macros)
typedef argparse_cmd_s
/// command line options type (prefer macros)
typedef argparse_opt_s
argparse {
    // Autogenerated by CEX
// clang-format off
char* (*next)(argparse_c* self);
Exception (*parse)(argparse_c* self, int argc, char** argv);
Exception (*run_command)(argparse_c* self, void* user_ctx);
void (*usage)(argparse_c* self);
// clang-format on
};
arr$
- Creating array
// Using heap allocator (need to free later!)
arr$(i32) array = arr$new(array, mem$);

// adding elements
arr$pushm(array, 1, 2, 3); // multiple at once
arr$push(array, 4); // single element

// length of array
arr$len(array);

// getting i-th elements
array[1];

// iterating array (by value)
for$each(it, array) {
    io.printf("el=%d\n", it);
}

// iterating array (by pointer - prefer for bigger structs to avoid copying)
for$eachp(it, array) {
    // TIP: making array index out of `it`
    usize i = it - array;

    // NOTE: 'it' now is a pointer
    io.printf("el[%zu]=%d\n", i, *it);
}

// free resources
arr$free(array);
- Array of structs
typedef struct
{
    int key;
    float my_val;
    char* my_string;
    int value;
} my_struct;

Exception somefunc(void)
{
    arr$(my_struct) array = arr$new(array, mem$, .capacity = 128);
    uassert(arr$cap(array) == 128);

    my_struct s;
    s = (my_struct){ 20, 5.0, "hello ", 0 };
    arr$push(array, s);
    s = (my_struct){ 40, 2.5, "failure", 0 };
    arr$push(array, s);
    s = (my_struct){ 40, 1.1, "world!", 0 };
    arr$push(array, s);

    for (usize i = 0; i < arr$len(array); ++i) {
        io.printf("key: %d str: %s\n", array[i].key, array[i].my_string);
    }
    arr$free(array);

    return EOK;
}
/// Generic array type definition. Use arr$(int) myarr - defines new myarr variable, as int array
#define arr$(T)
/// Get element at index (bounds checking with uassert())
#define arr$at(a, i)
/// Returns current array capacity
#define arr$cap(a)
/// Clear array contents
#define arr$clear(a)
/// Delete array elements by index (memory will be shifted, order preserved)
#define arr$del(a, i)
/// Delete element by swapping with last one (no memory overhead, element order changes)
#define arr$delswap(a, i)
/// Free resources for dynamic array (only needed if mem$ allocator was used)
#define arr$free(a)
/// Grows array capacity
#define arr$grow(a, add_len, min_cap)
/// Check array capacity and return false on memory error
#define arr$grow_check(a, add_extra)
/// Inserts element into array at index `i`
#define arr$ins(a, i, value...)
/// Return last element of array
#define arr$last(a)
/// Versatile array length, works with dynamic (arr$) and static compile time arrays
#define arr$len(arr)
/// Array initialization: use arr$(int) arr = arr$new(arr, mem$, .capacity = , ...)
#define arr$new(a, allocator, kwargs...)
/// Pop element from the end
#define arr$pop(a)
/// Push element to the end
#define arr$push(a, value...)
/// Push another array into a. array can be dynamic or static or pointer+len
#define arr$pusha(a, array, array_len...)
/// Push many elements to the end
#define arr$pushm(a, items...)
/// Set array capacity and resize if needed
#define arr$setcap(a, n)
/// Sort array with qsort() libc function
#define arr$sort(a, qsort_cmp)
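The difference between the two deletion strategies listed above (`arr$del` shifts memory and keeps order, `arr$delswap` swaps with the last element) can be shown on a plain C array. The helpers `del_shift` and `del_swap` are hypothetical illustrations of the two techniques, not the CEX implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Delete by shifting the tail left: element order is preserved, O(n) moves. */
static size_t del_shift(int* a, size_t len, size_t i)
{
    assert(i < len);
    memmove(&a[i], &a[i + 1], (len - i - 1) * sizeof(a[0]));
    return len - 1;
}

/* Delete by overwriting with the last element: O(1), element order changes. */
static size_t del_swap(int* a, size_t len, size_t i)
{
    assert(i < len);
    a[i] = a[len - 1];
    return len - 1;
}
```

Starting from `{1, 2, 3, 4}` and deleting index 1, the shifting variant leaves `{1, 3, 4}`, while the swapping variant leaves `{1, 4, 3}`.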
cexy
/// Build dir for project executables and tests (may be overridden by user)
#define cexy$build_dir
/// Extension for executables (e.g. '.exe' for win32)
#define cexy$build_ext_exe
/// Extension for dynamic linked libs (".dll" win, ".so" linux)
#define cexy$build_ext_lib_dyn
/// Extension for static libs (".lib" win, ".a" linux)
#define cexy$build_ext_lib_stat
/// Default compiler for building tests/apps (by default inferred from ./cex tool compiler)
#define cexy$cc
/// Common compiler flags (may be overridden by user)
#define cexy$cc_args
/// Debug mode and tests sanitizer flags (may be overridden by user)
#define cexy$cc_args_sanitizer
/// Test runner compiler flags (may be overridden by user)
#define cexy$cc_args_test
/// Include path for the #include "some.h" (may be overridden by user)
#define cexy$cc_include
/// Compiler flags used for building ./cex.c -> ./cex (may be overridden by user)
#define cexy$cex_self_args
/// Macro constant derived from the compiler type used to initially build ./cex app
#define cexy$cex_self_cc
/// All built-in commands for ./cex tool
#define cexy$cmd_all
/// Simple app build command (unity build, simple linking, runner, debugger launch, etc)
#define cexy$cmd_app
/// Simple fuzz tests runner command
#define cexy$cmd_fuzz
/// Simple test runner command (test runner, debugger launch, etc)
#define cexy$cmd_test
/// Command for launching debugger for cex test/app debug (may be overridden)
#define cexy$debug_cmd
/// ./cex --help description
#define cexy$description
/// ./cex --help epilog
#define cexy$epilog
/// Fuzzer compilation command (supports clang libfuzzer and afl++)
#define cexy$fuzzer
/// Initialize CEX build system (build itself)
#define cexy$initialize()
/// Linker flags (e.g. -L./lib/path/ -lmylib -lm) (may be overridden)
#define cexy$ld_args
/// Helper macro for running cexy.utils.pkgconf() a dependency resolver for libs
#define cexy$pkgconf(allocator, out_cc_args, pkgconf_args...)
/// Dependency resolver command: pkg-config, pkgconf, etc. May be used for cross-platform
/// compilation; multiple command arguments are allowed here
#define cexy$pkgconf_cmd
/// list of standard system project libs (for example: "lua5.3", "libz")
#define cexy$pkgconf_libs
/// Pattern for ignoring extra macro keywords in function signatures (for cex process).
#define cexy$process_ignore_kw
/// Directory for applications and code (may be overridden by user)
#define cexy$src_dir
/// ./cex --help usage
#define cexy$usage
/// Current vcpkg root path (where ./vcpkg tool is located)
#define cexy$vcpkg_root
/// Current build triplet (empty, NULL, or a string like "x64-linux"),
/// used with `vcpkg install mydep`; ignored if blank or NULL.
/// The list of all supported triplets is available via `vcpkg help triplet`.
#define cexy$vcpkg_triplet
cexy {
// Autogenerated by CEX
// clang-format off
void (*build_self)(int argc, char** argv, char* cex_source);
bool (*src_changed)(char* target_path, char** src_array, usize src_array_len);
bool (*src_include_changed)(char* target_path, char* src_path, arr$(char*) alt_include_path);
char* (*target_make)(char* src_path, char* build_dir, char* name_or_extension, IAllocator allocator);
struct {
Exception (*clean)(char* target);
Exception (*create)(char* target);
Exception (*find_app_target_src)(IAllocator allc, char* target, char** out_result);
Exception (*run)(char* target, bool is_debug, int argc, char** argv);
} app;
struct {
Exception (*config)(int argc, char** argv, void* user_ctx);
Exception (*help)(int argc, char** argv, void* user_ctx);
Exception (*libfetch)(int argc, char** argv, void* user_ctx);
Exception (*new)(int argc, char** argv, void* user_ctx);
Exception (*process)(int argc, char** argv, void* user_ctx);
Exception (*simple_app)(int argc, char** argv, void* user_ctx);
Exception (*simple_fuzz)(int argc, char** argv, void* user_ctx);
Exception (*simple_test)(int argc, char** argv, void* user_ctx);
Exception (*stats)(int argc, char** argv, void* user_ctx);
} cmd;
struct {
Exception (*create)(char* target);
} fuzz;
struct {
Exception (*clean)(char* target);
Exception (*create)(char* target, bool include_sample);
Exception (*make_target_pattern)(char** target);
Exception (*run)(char* target, bool is_debug, int argc, char** argv);
} test;
struct {
char* (*git_hash)(IAllocator allc);
Exception (*git_lib_fetch)(char* git_url, char* git_label, char* out_dir, bool update_existing, bool preserve_dirs, char** repo_paths, usize repo_paths_len);
Exception (*make_compile_flags)(char* flags_file, bool include_cexy_flags, arr$(char*) cc_flags_or_null);
Exception (*make_new_project)(char* proj_dir);
Exception (*pkgconf)(IAllocator allc, arr$(char*)* out_cc_args, char** pkgconf_args, usize pkgconf_args_len);
} utils;
// clang-format on
};
cg$
- Code generation module
test$case(test_codegen_test)
{
    sbuf_c b = sbuf.create(1024, mem$);
    // NOTE: cg$ macros should be working within cg$init() scope or make sure cg$var is available
    cg$init(&b);
    tassert(cg$var->buf == &b);
    tassert(cg$var->indent == 0);

    cg$pn("printf(\"hello world\");");
    cg$pn("#define GOO");
    cg$pn("// this is empty scope");
    cg$scope("", "")
    {
        cg$pf("printf(\"hello world: %d\");", 2);
    }

    cg$func("void my_func(int arg_%d)", 2)
    {
        cg$scope("var my_var = (mytype)", "")
        {
            cg$pf(".arg1 = %d,", 1);
            cg$pf(".arg2 = %d,", 2);
        }
        cg$pa(";\n", "");

        cg$if("foo == %d", 312)
        {
            cg$pn("printf(\"Hello: %d\", foo);");
        }
        cg$elseif("bar == foo + %d", 7)
        {
            cg$pn("// else if scope");
        }
        cg$else()
        {
            cg$pn("// else scope");
        }

        cg$while("foo == %d", 312)
        {
            cg$pn("printf(\"Hello: %d\", foo);");
        }

        cg$for("u32 i = 0; i < %d; i++", 312)
        {
            cg$pn("printf(\"Hello: %d\", foo);");
            cg$foreach("it, my_var", "")
            {
                cg$pn("printf(\"Hello: %d\", foo);");
            }
        }

        cg$scope("do ", "")
        {
            cg$pf("// do while", 1);
        }
        cg$pa(" while(0);\n", "");
    }

    cg$switch("foo", "")
    {
        cg$case("'%c'", 'a')
        {
            cg$pn("// case scope");
        }
        cg$scope("case '%c': ", 'b')
        {
            cg$pn("fallthrough();");
        }
        cg$default()
        {
            cg$pn("// default scope");
        }
    }

    tassert(cg$is_valid());
    printf("result: \n%s\n", b);

    sbuf.destroy(&b);
    return EOK;
}
/// add case in switch() statement
#define cg$case(format, ...)
/// decrease code indent by 4
#define cg$dedent()
/// add default in switch() statement
#define cg$default()
/// add else
#define cg$else()
/// add else if
#define cg$elseif(format, ...)
/// add for loop
#define cg$for(format, ...)
/// add CEX for$each loop
#define cg$foreach(format, ...)
/// add new function cg$func("void my_func(int arg_%d)", 2)
#define cg$func(format, ...)
/// add if statement
#define cg$if(format, ...)
/// increase code indent by 4
#define cg$indent()
/// Initializes new code generator (uses sbuf instance as backing buffer)
#define cg$init(out_sbuf)
/// false if any cg$ operation failed, use cg$var->error to get Exception type of error
#define cg$is_valid()
/// append code at the current line without "\n"
#define cg$pa(format, ...)
/// add new line of code with formatting
#define cg$pf(format, ...)
/// add new line of code
#define cg$pn(text)
#define cg$printva(cg)
/// add new code scope with indent (use for low-level stuff)
#define cg$scope(format, ...)
/// add switch() statement
#define cg$switch(format, ...)
/// Common code gen buffer variable (all cg$ macros use it under the hood)
#define cg$var
/// add while loop
#define cg$while(format, ...)
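The core idea behind the cg$ macros (a shared buffer, a current indent that changes by 4 per scope, and a sticky error flag checked once at the end) can be sketched in plain C. This is an illustration only: `demo_gen`, `demo_line`, `demo_open` and `demo_close` are hypothetical names, the buffer is a fixed array rather than an sbuf, and the real macros may work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    char out[512]; /* generated code */
    size_t len;    /* bytes used in out, excluding the terminator */
    int indent;    /* current indent in spaces */
    int failed;    /* sticky error flag, similar in spirit to cg$is_valid() */
} demo_gen;

/* Appends one indented line of code followed by a newline. */
static void demo_line(demo_gen* g, const char* text)
{
    assert(g != NULL && text != NULL);
    if (g->failed) { return; }
    size_t room = sizeof(g->out) - g->len;
    int n = snprintf(g->out + g->len, room, "%*s%s\n", g->indent, "", text);
    if (n < 0 || (size_t)n >= room) {
        g->out[g->len] = '\0'; /* drop the partial line */
        g->failed = 1;
        return;
    }
    g->len += (size_t)n;
}

/* Opens a scope: emits "<header> {" and increases the indent by 4. */
static void demo_open(demo_gen* g, const char* header)
{
    assert(g != NULL && header != NULL);
    char line[128];
    int n = snprintf(line, sizeof(line), "%s {", header);
    if (n < 0 || (size_t)n >= sizeof(line)) { g->failed = 1; }
    demo_line(g, line); /* no-op once failed is set */
    g->indent += 4;
}

/* Closes a scope: decreases the indent by 4 and emits "}". */
static void demo_close(demo_gen* g)
{
    assert(g != NULL && g->indent >= 4);
    g->indent -= 4;
    demo_line(g, "}");
}
```

Because the error flag is sticky, callers can emit a whole file without checking each call and test `failed` once at the end, which mirrors the single `tassert(cg$is_valid())` in the test case above.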
e$
CEX Error handling cheat sheet:
- Errors can be any char*, or string literals.
- EOK / Error.ok is NULL, which means no error.
- Exception return type is forced to be checked by the compiler.
- Error is a built-in generic error type.
- Errors should be checked by pointer comparison, not string contents.
- e$ are helper macros for error handling.
- DO NOT USE break/continue inside e$except/e$except_* scopes (these macros are implemented as for loops too)!
Generic errors:
Error.ok = EOK;                       // Success
Error.memory = "MemoryError";         // memory allocation error
Error.io = "IOError";                 // IO error
Error.overflow = "OverflowError";     // buffer overflow
Error.argument = "ArgumentError";     // function argument error
Error.integrity = "IntegrityError";   // data integrity error
Error.exists = "ExistsError";         // entity or key already exists
Error.not_found = "NotFoundError";    // entity or key not found
Error.skip = "ShouldBeSkipped";       // NOT an error, function result must be skipped
Error.empty = "EmptyError";           // resource is empty
Error.eof = "EOF";                    // end of file reached
Error.argsparse = "ProgramArgsError"; // program arguments empty or incorrect
Error.runtime = "RuntimeError";       // generic runtime error
Error.assert = "AssertError";         // generic runtime check
Error.os = "OSError";                 // generic OS check
Error.timeout = "TimeoutError";       // await interval timeout
Error.permission = "PermissionError"; // Permission denied
Error.try_again = "TryAgainError";    // EAGAIN / EWOULDBLOCK errno analog for async operations
Exception
remove_file(char* path)
{
    if (path == NULL || path[0] == '\0') {
        return Error.argument; // Empty or null file
    }
    if (!os.path.exists(path)) {
        return "Not exists"; // literal errors are allowed, but must be compared with strcmp()
    }
    if (str.eq(path, "magic.file")) {
        // Returns an Error.integrity and logs error at current line to stdout
        return e$raise(Error.integrity, "Removing magic file is not allowed!");
    }
    if (remove(path) < 0) {
        return strerror(errno); // using system error text (arbitrary!)
    }
    return EOK;
}

Exception read_file(char* filename) {
    e$assert(filename != NULL);
    int fd = 0;
    e$except_errno (fd = open(filename, O_RDONLY)) { return Error.os; }
    return EOK;
}

Exception do_stuff(char* filename) {
    // return immediately with error + prints traceback
    e$ret(read_file("foo.txt"));

    // jumps to label if read_file() fails + prints traceback
    e$goto(read_file(NULL), fail);

    // silent error handling without tracebacks
    e$except_silent (err, foo(0)) {
        // Nesting of error handlers is allowed
        e$except_silent (err, foo(2)) {
            return err;
        }
        // NOTE: `err` is address of char* compared with address Error.os (not by string contents!)
        if (err == Error.os) {
            // Special handling
            io.printf("Ooops OS problem\n");
        } else {
            // propagate
            return err;
        }
    }
    return EOK;

fail:
    // TODO: cleanup here
    return Error.io;
}
/// Non disposable assert, returns Error.assert CEX exception when failed
#define e$assert(A)
/// Non disposable assert, returns Error.assert CEX exception when failed (supports formatting)
#define e$assertf(A, format, ...)
/// catches the error of function inside scope + prints traceback
#define e$except(_var_name, _func)
/// catches the error of system function (if negative value + errno), prints errno error
#define e$except_errno(_expression)
/// catches the error if expression returned NULL
#define e$except_null(_expression)
/// catches the error of function inside scope (without traceback)
#define e$except_silent(_var_name, _func)
/// catches the error if expression returned true
#define e$except_true(_expression)
/// `goto _label` when _func returned error + prints traceback
#define e$goto(_func, _label)
/// raises an error, code: `return e$raise(Error.integrity, "ooops: %d", i);`
#define e$raise(return_uerr, error_msg, ...)
/// immediately returns from function with _func error + prints traceback
#define e$ret(_func)
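The rule that errors are compared by pointer, not by string contents, can be shown without CEX at all. The following is a minimal plain-C sketch under stated assumptions: `demo_exc`, `DEMO_ERR_ARGUMENT`, `DEMO_ERR_OVERFLOW`, `demo_parse_digit` and `demo_digit_or` are hypothetical names, and the error constants are ordinary static arrays standing in for the Error.* values.

```c
#include <assert.h>
#include <stddef.h>

typedef const char* demo_exc; /* NULL means success, like EOK */

/* Each error is a distinct object, so its address identifies it. */
static const char DEMO_ERR_ARGUMENT[] = "ArgumentError";
static const char DEMO_ERR_OVERFLOW[] = "OverflowError";

/* Parses a single decimal digit; returns NULL on success. */
static demo_exc demo_parse_digit(const char* s, int* out)
{
    if (s == NULL || out == NULL) { return DEMO_ERR_ARGUMENT; }
    if (s[0] < '0' || s[0] > '9') { return DEMO_ERR_ARGUMENT; }
    if (s[1] != '\0') { return DEMO_ERR_OVERFLOW; }
    *out = s[0] - '0';
    return NULL;
}

/* Caller side: dispatch on pointer identity, never on strcmp(). */
static int demo_digit_or(const char* s, int fallback)
{
    int value = 0;
    demo_exc err = demo_parse_digit(s, &value);
    if (err == NULL) { return value; }
    if (err == DEMO_ERR_ARGUMENT) { return fallback; }
    assert(err == DEMO_ERR_OVERFLOW); /* the only other error this sketch returns */
    return -1;
}
```

Pointer comparison is cheap and unambiguous, but it only works for errors that have a stable address. A string literal or a `strerror()` result returned as an error, as in `remove_file()` above, has to be compared by contents instead.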
for$
- using for$ as unified array iterator
test$case(test_array_iteration)
{
    arr$(int) array = arr$new(array, mem$);
    arr$pushm(array, 1, 2, 3);

    for$each(it, array) {
        io.printf("el=%d\n", it);
    }
    // Prints:
    // el=1
    // el=2
    // el=3

    // NOTE: prefer this when you work with bigger structs to avoid extra memory copying
    for$eachp(it, array) {
        // TIP: making array index out of `it`
        usize i = it - array;

        // NOTE: it now is a pointer
        io.printf("el[%zu]=%d\n", i, *it);
    }
    // Prints:
    // el[0]=1
    // el[1]=2
    // el[2]=3

    // Static arrays work as well (arr$len inferred)
    i32 arr_int[] = {1, 2, 3, 4, 5};
    for$each(it, arr_int) {
        io.printf("static=%d\n", it);
    }
    // Prints:
    // static=1
    // static=2
    // static=3
    // static=4
    // static=5

    // Simple pointer+length also works (let's do a slice)
    i32* slice = &arr_int[2];
    for$each(it, slice, 2) {
        io.printf("slice=%d\n", it);
    }
    // Prints:
    // slice=3
    // slice=4

    // it is type of cex_iterator_s
    // NOTE: run in shell: ➜ ./cex help cex_iterator_s
    str_s s = str.sstr("123,456");
    for$iter (str_s, it, str.slice.iter_split(s, ",", &it.iterator)) {
        io.printf("it.value = %S\n", it.val);
    }
    // Prints:
    // it.value = 123
    // it.value = 456

    arr$free(array);
    return EOK;
}
/// Iterates over arrays `it` is iterated **value**, array may be arr$/or static / or pointer,
/// array_len is only required for pointer+len use case
#define for$each(it, array, array_len...)
/// Iterates over arrays `it` is iterated by **pointer**, array may be arr$/or static / or pointer,
/// array_len is only required for pointer+len use case
#define for$eachp(it, array, array_len...)
/// Iterates via iterator function (see usage below)
#define for$iter(it_val_type, it, iter_func)
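For readers curious how a by-pointer iteration macro over pointer+length can be built, here is a minimal sketch in plain C. It assumes a GCC/Clang-compatible compiler, because `__typeof__` is an extension (C23 spells it `typeof`). `DEMO_EACHP` and `demo_weighted_sum` are hypothetical names, and the real for$eachp may be implemented differently.

```c
#include <assert.h>
#include <stddef.h>

/* Iterates by pointer over `len` elements starting at `arr`.
   The element type is inferred from the array expression. */
#define DEMO_EACHP(it, arr, len)                                              \
    for (__typeof__(&(arr)[0]) it = &(arr)[0], it##_end = &(arr)[0] + (len);  \
         it < it##_end;                                                       \
         it++)

/* Sums values[i] * (i + 1), recovering the index from the iterator pointer. */
static long demo_weighted_sum(const int* values, size_t len)
{
    assert(values != NULL);
    long total = 0;
    DEMO_EACHP(it, values, len) {
        size_t i = (size_t)(it - values); /* same trick as `it - array` above */
        total += (long)(i + 1) * *it;
    }
    return total;
}
```

Passing `&xs[1]` with a shorter length iterates a slice of the same buffer, which corresponds to the pointer+length use case in the test above.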
fuzz$
- Fuzz runner commands
./cex fuzz create fuzz/myapp/fuzz_bar.c
./cex fuzz run fuzz/myapp/fuzz_bar.c
- Fuzz testing tools using fuzz namespace
int
fuzz$case(const u8* data, usize size)
{
    cex_fuzz_s fz = fuzz.create(data, size);
    u16 random_val = 0;
    some_struct random_struct = {0};

    while(fuzz.dget(&fz, &random_val, sizeof(random_val))) {
        // testing function with random data
        my_func(random_val);

        // checking probability based on fuzz data
        if (fuzz.dprob(&fz, 0.2)) {
            my_func(random_val * 10);
        }
        if (fuzz.dget(&fz, &random_struct, sizeof(random_struct))){
            my_func_struct(&random_struct);
        }
    }
    return 0;
}
- Fuzz testing tools using fuzz$ macros (shortcuts)
int
fuzz$case(const u8* data, usize size)
{
    fuzz$dnew(data, size);

    u16 random_val = 0;
    some_struct random_struct = {0};

    while(fuzz$dget(&random_val)) {
        // testing function with random data
        my_func(random_val);

        // checking probability based on fuzz data
        if (fuzz$dprob(0.2)) {
            my_func(random_val * 10);
        }
        // it's possible to fill whole structs with data
        if (fuzz$dget(&random_struct)){
            my_func_struct(&random_struct);
        }
    }
    return 0;
}
- Fuzz corpus priming (an optional but useful step)
typedef struct fuzz_match_s
{
    char pattern[100];
    char null_term;
    char text[300];
    char null_term2;
} fuzz_match_s;

Exception
match_make(char* out_file, char* text, char* pattern)
{
    fuzz_match_s f = { 0 };
    e$ret(str.copy(f.text, text, sizeof(f.text)));
    e$ret(str.copy(f.pattern, pattern, sizeof(f.pattern)));
    FILE* fh;
    e$ret(io.fopen(&fh, out_file, "wb"));
    e$ret(io.fwrite(fh, &f, sizeof(f)));
    io.fclose(&fh);

    return EOK;
}

fuzz$setup()
{
    if (os.fs.mkdir(fuzz$corpus_dir)) {}
    struct
    {
        char* text;
        char* pattern;
    } match_tuple[] = {
        { "test", "*" },
        { "", "*" },
        { ".txt", "*.txt" },
        { "test.txt", "" },
        { "test.txt", "*txt" },
    };
    mem$scope(tmem$, _)
    {
        for (u32 i = 0; i < arr$len(match_tuple); i++) {
            char* fn = str.fmt(_, "%s/%05d", fuzz$corpus_dir, i);
            e$except (err, match_make(fn, match_tuple[i].text, match_tuple[i].pattern)) {
                uassertf(false, "Error writing file: %s", fn);
            }
        }
    }
}
/// Fuzz case: `int fuzz$case(const u8* data, usize size) { return 0; }`
#define fuzz$case
/// Current fuzz_ file corpus directory relative to calling source file
#define fuzz$corpus_dir
/// Load random data into variable by pointer from random fuzz data
#define fuzz$dget(out_result_ptr)
/// Initialize fuzz$ helper macros
#define fuzz$dnew(data, size)
/// Get deterministic probability based on fuzz data
#define fuzz$dprob(prob_threshold)
/// Special fuzz variable used by all fuzz$ macros
#define fuzz$dvar
/// Fuzz main function
#define fuzz$main()
/// Fuzz test constructor (for building corpus seeds programmatically)
#define fuzz$setup
fuzz {
// Autogenerated by CEX
// clang-format off
/// Get current corpus dir relative to the `this_file_name`
char* (*corpus_dir)(char* this_file_name);
/// Creates new fuzz data generator, for fuzz-driven randomization
cex_fuzz_s (*create)(const u8* data, usize size);
/// Get result from random data into buffer (returns false if not enough data)
bool (*dget)(cex_fuzz_s* fz, void* out_result, usize result_size);
/// Get deterministic probability using fuzz data, based on threshold
bool (*dprob)(cex_fuzz_s* fz, double threshold);
// clang-format on
};
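The contract described for dget (copy bytes out of the fuzz input, return false when not enough data remains) and dprob (a deterministic yes/no derived from the input) can be sketched in plain C. This is an assumption-laden illustration: `demo_fuzz`, `demo_fuzz_create`, `demo_fuzz_get` and `demo_fuzz_prob` are hypothetical, and mapping a single byte to a probability is one possible choice, not necessarily what CEX does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t* data; /* fuzzer-provided input */
    size_t size;         /* total input size */
    size_t pos;          /* bytes consumed so far */
} demo_fuzz;

static demo_fuzz demo_fuzz_create(const uint8_t* data, size_t size)
{
    assert(data != NULL || size == 0);
    return (demo_fuzz){ .data = data, .size = size, .pos = 0 };
}

/* Copies the next out_size bytes into out; false when not enough bytes remain. */
static bool demo_fuzz_get(demo_fuzz* fz, void* out, size_t out_size)
{
    assert(fz != NULL && out != NULL);
    if (out_size == 0) { return true; }
    if (out_size > fz->size - fz->pos) { return false; }
    memcpy(out, fz->data + fz->pos, out_size);
    fz->pos += out_size;
    return true;
}

/* Consumes one byte, maps it to [0, 1), and reports whether it is below threshold. */
static bool demo_fuzz_prob(demo_fuzz* fz, double threshold)
{
    uint8_t b = 0;
    if (!demo_fuzz_get(fz, &b, sizeof(b))) { return false; }
    return (b / 256.0) < threshold;
}
```

Because every decision is derived from the input bytes, the same corpus file always drives the same branches, which keeps crashes reproducible.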
hm$
Generic type-safe hashmap
Principles:
- Data is backed by an engine similar to arr$
- arr$len() works with hashmap too
- Array indexing works with hashmap
- for$each/for$eachp is applicable
- hm$ generic type is essentially a struct with key and value fields
- hm$ supports the following keys: numeric (by default it’s just binary representation), char*, char[N], str_s (CEX string slice).
- hm$ string keys are stored without copy; use hm$new(hm, mem$, .copy_keys = true) for copy-mode.
- hm$ can store string keys inside an Arena allocator when created with hm$new(hm, mem$, .copy_keys = true, .copy_keys_arena_pgsize = NNN)
test$case(test_simple_hashmap)
{
    hm$(int, int) intmap = hm$new(intmap, mem$);

    // Setting items
    hm$set(intmap, 15, 7);
    hm$set(intmap, 11, 3);
    hm$set(intmap, 9, 5);

    // Length
    tassert_eq(hm$len(intmap), 3);
    tassert_eq(arr$len(intmap), 3);

    // Getting items **by value**
    tassert(hm$get(intmap, 9) == 5);
    tassert(hm$get(intmap, 11) == 3);
    tassert(hm$get(intmap, 15) == 7);

    // Getting items **pointer** - NULL on missing
    tassert(hm$getp(intmap, 1) == NULL);

    // Getting with default if not found
    tassert_eq(hm$get(intmap, -1, 999), 999);

    // Accessing hashmap as array by i-th index
    // NOTE: hashmap elements are ordered until first deletion
    tassert_eq(intmap[0].key, 15);
    tassert_eq(intmap[0].value, 7);

    // removing items
    hm$del(intmap, 100);

    // cleanup
    hm$clear(intmap);

    // basic iteration **by value**
    for$each (it, intmap) {
        io.printf("key=%d, value=%d\n", it.key, it.value);
    }
    // basic iteration **by pointer**
    for$eachp (it, intmap) {
        io.printf("key=%d, value=%d\n", it->key, it->value);
    }

    hm$free(intmap);
    return EOK;
}
- Using hashmap as field of other struct
typedef hm$(char* , int) MyHashmap;

struct my_hm_struct {
    MyHashmap hm;
};

test$case(test_hashmap_string_copy_clear_cleanup)
{
    struct my_hm_struct hs = {0};
    // NOTE: .copy_keys - makes sure that key string was copied
    hm$new(hs.hm, mem$, .copy_keys = true);
    hm$set(hs.hm, "foo", 3);

    hm$free(hs.hm);
    return EOK;
}
- Storing string keys in the arena
test$case(test_hashmap_string_copy_arena)
{
    hm$(char*, int) smap = hm$new(smap, mem$, .copy_keys = true, .copy_keys_arena_pgsize = 1024);

    char key2[10] = "foo";

    hm$set(smap, key2, 3);
    tassert_eq(hm$len(smap), 1);
    tassert_eq(hm$get(smap, "foo"), 3);
    tassert_eq(hm$get(smap, key2), 3);
    tassert_eq(smap[0].key, "foo");

    // The key was copied, so clearing the original buffer does not affect the map
    memset(key2, 0, sizeof(key2));
    tassert_eq(smap[0].key, "foo");
    tassert_eq(hm$get(smap, "foo"), 3);

    hm$free(smap);
    return EOK;
}
- Checking errors + custom struct backing
test$case(test_hashmap_basic)
{
    hm$(int, int) intmap;
    if (hm$new(intmap, mem$) == NULL) {
        // initialization error
    }

    // struct as a value
    struct test64_s
    {
        usize foo;
        usize bar;
    };
    hm$(int, struct test64_s) structmap = hm$new(structmap, mem$);

    // custom struct as hashmap backend
    // (renamed here so both struct definitions can live in one function)
    struct test64_key_s
    {
        usize fooa;
        usize key; // this field `key` is mandatory
    };

    hm$s(struct test64_key_s) smap = hm$new(smap, mem$);
    tassert(smap != NULL);

    // Setting hashmap as a whole struct key/value record
    tassert(hm$sets(smap, (struct test64_key_s){ .key = 1, .fooa = 10 }));
    tassert_eq(hm$len(smap), 1);
    tassert_eq(smap[0].key, 1);
    tassert_eq(smap[0].fooa, 10);

    // Getting full struct by .key value
    struct test64_key_s* r = hm$gets(smap, 1);
    tassert(r != NULL);
    tassert(r == &smap[0]);
    tassert_eq(r->key, 1);
    tassert_eq(r->fooa, 10);

    hm$free(smap);
    hm$free(structmap);
    hm$free(intmap);
    return EOK;
}
/// Defines hashmap generic type
#define hm$(_KeyType, _ValType)
/// Clears hashmap contents
#define hm$clear(t)
/// Deletes items, IMPORTANT hashmap array may be reordered after this call
#define hm$del(t, k)
/// Frees hashmap resources
#define hm$free(t)
/// Get item by value, def - default value (zeroed by default), can be any type
#define hm$get(t, k, def...)
/// Get item by pointer (no copy, direct pointer inside hashmap)
#define hm$getp(t, k)
/// Get a pointer to full hashmap record, NULL if not found
#define hm$gets(t, k)
/// Returns hashmap length, also you can use arr$len()
#define hm$len(t)
/// Creates new hashmap of hm$(KType, VType) using allocator, kwargs: .capacity, .seed,
/// .copy_keys_arena_pgsize, .copy_keys
#define hm$new(t, allocator, kwargs...)
/// Defines hashmap type based on _StructType, must have `key` field
#define hm$s(_StructType)
/// Set hashmap key/value, replaces if exists
#define hm$set(t, k, v...)
/// Add new item and returns pointer of hashmap record for `k`, for further editing
#define hm$setp(t, k)
/// Set full record, must be initialized by user
#define hm$sets(t, v...)
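Why .copy_keys matters for string keys can be shown with a small plain-C sketch. It is not a hashmap and does not use CEX: `demo_rec` and `demo_rec_set` are hypothetical, and the copy is made with `malloc` purely for illustration, whereas CEX may use its allocator or an arena.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char* key; /* borrowed or owned, depending on how it was set */
    int value;
} demo_rec;

/* Stores key/value in rec. With copy_key != 0 the string is duplicated on the
   heap and the caller must free rec->key. Returns 0 on allocation failure. */
static int demo_rec_set(demo_rec* rec, char* key, int value, int copy_key)
{
    assert(rec != NULL && key != NULL);
    if (copy_key) {
        size_t n = strlen(key) + 1;
        char* dup = malloc(n);
        if (dup == NULL) { return 0; }
        memcpy(dup, key, n);
        rec->key = dup;
    } else {
        rec->key = key; /* no copy: the record aliases the caller's buffer */
    }
    rec->value = value;
    return 1;
}
```

After the caller overwrites or frees its buffer, a borrowed key no longer reads as the original string while the copied one still does. This is the situation the `memset(key2, ...)` step in the arena test above exercises.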
io
Cross-platform IO namespace
- Read all file content (low level api)
test$case(test_readall)
{
    // Open new file
    FILE* file;
    e$ret(io.fopen(&file, "tests/data/text_file_50b.txt", "r"));

    // get file size
    tassert_eq(50, io.file.size(file));

    // Read all content
    str_s content;
    e$ret(io.fread_all(file, &content, mem$));
    mem$free(mem$, content.buf); // content.buf is allocated by mem$ !

    // Cleanup
    io.fclose(&file); // file will be set to NULL
    tassert(file == NULL);

    return EOK;
}
- File load/save (easy api)
test$case(test_fload_save)
{
    tassert_eq(Error.ok, io.file.save("tests/data/text_file_write.txt", "Hello from CEX!\n"));
    char* content = io.file.load("tests/data/text_file_write.txt", mem$);
    tassert(content);
    tassert_eq(content, "Hello from CEX!\n");
    mem$free(mem$, content);
    return EOK;
}
- File read/write lines
test$case(test_write_line)
{
    FILE* file;
    tassert_eq(Error.ok, io.fopen(&file, "tests/data/text_file_write.txt", "w+"));

    str_s content;
    mem$scope(tmem$, _)
    {
        // Writing line by line
        tassert_eq(EOK, io.file.writeln(file, "hello"));
        tassert_eq(EOK, io.file.writeln(file, "world"));

        // Reading line by line
        io.rewind(file);

        // easy api - backed by temp allocator
        tassert_eq("hello", io.file.readln(file, _));

        // low-level api (using heap allocator, needs free!)
        tassert_er(EOK, io.fread_line(file, &content, mem$));
        tassert(str.slice.eq(content, str$s("world")));
        mem$free(mem$, content.buf);
    }
    io.fclose(&file);
    return EOK;
}
- File low-level write/read
test$case(test_read_loop)
{
    FILE* file;
    tassert_eq(Error.ok, io.fopen(&file, "tests/data/text_file_50b.txt", "r+"));

    char buf[128] = {0};

    // Read bytes
    isize nread = 0;
    while((nread = io.fread(file, buf, 10))) {
        if (nread < 0) {
            // TODO: io.fread() error occurred, you should handle it here
            // NOTE: you can use os.get_last_error() for Exception representation of io.fread() err
            break;
        }

        tassert_eq(nread, 10);
        buf[10] = '\0';
        io.printf("%s", buf);
    }

    // Write bytes
    char buf2[] = "foobar";
    tassert_ne(EOK, io.fwrite(file, buf2, arr$len(buf2)));

    io.fclose(&file);
    return EOK;
}
/// Makes string literal with ANSI colored text
#define io$ansi(text, ansi_col)
io {
// Autogenerated by CEX
// clang-format off
/// Closes file and set it to NULL.
void (*fclose)(FILE** file);
/// Flush changes to file
Exception (*fflush)(FILE* file);
/// Obtain file descriptor from FILE*
int (*fileno)(FILE* file);
/// Opens new file: io.fopen(&file, "file.txt", "r+")
Exception (*fopen)(FILE** file, char* filename, char* mode);
/// Prints formatted string to the file. Uses CEX printf() engine with special formatting.
Exc (*fprintf)(FILE* stream, char* format,...);
/// Read file contents into the buf, return nbytes read (can be < buff_len), 0 on EOF, negative on
/// error (you may use os.get_last_error() for getting Exception for error, cross-platform )
isize (*fread)(FILE* file, void* buff, usize buff_len);
/// Read all contents of the file, using allocator. You should free `s.buf` after.
Exception (*fread_all)(FILE* file, str_s* s, IAllocator allc);
/// Reads line from a file into str_s buffer, allocates memory. You should free `s.buf` after.
Exception (*fread_line)(FILE* file, str_s* s, IAllocator allc);
/// Seek file position
Exception (*fseek)(FILE* file, long offset, int whence);
/// Returns current cursor position into `size` pointer
Exception (*ftell)(FILE* file, usize* size);
/// Writes bytes to the file
Exception (*fwrite)(FILE* file, void* buff, usize buff_len);
/// Check if current file supports ANSI colors and in interactive terminal mode
bool (*isatty)(FILE* file);
/// Prints formatted string to stdout. Uses CEX printf() engine with special formatting.
int (*printf)(char* format,...);
/// Rewind file cursor at the beginning
void (*rewind)(FILE* file);
struct {
/// Load full contents of the file at `path`, using text mode. Returns NULL on error.
char* (*load)(char* path, IAllocator allc);
/// Reads line from file, allocates result. Returns NULL on error.
char* (*readln)(FILE* file, IAllocator allc);
/// Saves full `contents` in the file at `path`, using text mode.
Exception (*save)(char* path, char* contents);
/// Return full file size, always 0 for NULL file or atty
usize (*size)(FILE* file);
/// Writes new line to the file
Exception (*writeln)(FILE* file, char* line);
} file;
// clang-format on
};
log$
Simple console logging engine:
- Prints file:line + log type: [INFO] ( file.c:14 cexy_fun() ) Message format: ./cex
- Supports CEX formatting engine
- Can be regulated using a compile-time level, e.g. #define CEX_LOG_LVL 4
Log levels (CEX_LOG_LVL value):
- 0 - mute all including assert messages, tracebacks, errors
- 1 - allow log$error + assert messages, tracebacks
- 2 - allow log$warn
- 3 - allow log$info
- 4 - allow log$debug (default level if CEX_LOG_LVL is not set)
- 5 - allow log$trace
/// Log debug (when CEX_LOG_LVL > 3)
#define log$debug(format, ...)
/// Log error (when CEX_LOG_LVL > 0)
#define log$error(format, ...)
/// Log info (when CEX_LOG_LVL > 2)
#define log$info(format, ...)
/// Log trace (when CEX_LOG_LVL > 4)
#define log$trace(format, ...)
/// Log warning (when CEX_LOG_LVL > 1)
#define log$warn(format, ...)
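Compile-time level gating of this kind can be sketched in plain C. The example below is a minimal illustration under stated assumptions: `DEMO_LOG_LVL`, `DEMO_LOG` and the `demo_log_*` macros are hypothetical stand-ins that follow the level numbering listed above, and they use plain `fprintf` rather than the CEX formatting engine.

```c
#include <assert.h>
#include <stdio.h>

#ifndef DEMO_LOG_LVL
#define DEMO_LOG_LVL 4 /* debug by default, mirroring the default described above */
#endif

/* Constant condition, so the compiler can drop disabled log calls entirely. */
#define DEMO_LOG_ENABLED(lvl) ((lvl) <= DEMO_LOG_LVL)

#define DEMO_LOG(lvl, tag, ...)                                                \
    do {                                                                       \
        if (DEMO_LOG_ENABLED(lvl)) {                                           \
            fprintf(stderr, "[%s] ( %s:%d %s() ) ", (tag), __FILE__, __LINE__, \
                    __func__);                                                 \
            fprintf(stderr, __VA_ARGS__);                                      \
        }                                                                      \
    } while (0)

#define demo_log_error(...) DEMO_LOG(1, "ERROR", __VA_ARGS__)
#define demo_log_warn(...)  DEMO_LOG(2, "WARN", __VA_ARGS__)
#define demo_log_info(...)  DEMO_LOG(3, "INFO", __VA_ARGS__)
#define demo_log_debug(...) DEMO_LOG(4, "DEBUG", __VA_ARGS__)
#define demo_log_trace(...) DEMO_LOG(5, "TRACE", __VA_ARGS__)

/* Returns how many of the five levels are enabled at the compiled-in level. */
static int demo_log_selftest(void)
{
    int enabled = 0;
    for (int lvl = 1; lvl <= 5; lvl++) { enabled += DEMO_LOG_ENABLED(lvl) ? 1 : 0; }
    demo_log_info("enabled levels: %d\n", enabled);
    demo_log_trace("not printed unless DEMO_LOG_LVL is 5 or higher\n");
    assert(enabled >= 0 && enabled <= 5);
    return enabled;
}
```

Building with `-DDEMO_LOG_LVL=2` would leave only the error and warn macros active, in the same way the document describes for CEX_LOG_LVL.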
mem$
Mem cheat-sheet
Global allocators:
- mem$ - heap-based allocator, typically used for long-living data, requires explicit mem$free
- tmem$ - temporary allocator, backed by ArenaAllocator with a 256kb page, requires mem$scope
Memory management hints:
- If a function accepts IAllocator as an argument, it allocates memory
- If a class/object accepts IAllocator in its constructor, it should track the allocator’s instance
- mem$scope() - automatically frees memory at scope exit for any reason (return, goto out, break)
- Consider using mem$malloc/mem$calloc/mem$realloc/mem$free/mem$new
- You can init an arena scope with mem$arena(page_size, arena_var_name)
- AllocatorArena grows dynamically if there is no room in the existing page, but be careful when you use many realloc() calls: they can grow arenas unexpectedly large.
- Use the temp allocator as mem$scope(tmem$, _) {}; it’s a common CEX pattern, where _ is a short alias for tmem$
- Nested mem$scope are allowed, but memory is freed at nested scope exit. NOTE: don’t share pointers across scopes.
- Use address sanitizers as often as possible
Examples:
- Vanilla heap allocator
test$case(test_allocator_api)
{
    u8* p = mem$malloc(mem$, 100);
    tassert(p != NULL);

    // mem$free always nullifies pointer
    mem$free(mem$, p);
    tassert(p == NULL);

    p = mem$calloc(mem$, 100, 100, 32); // calloc with 32-byte alignment
    tassert(p != NULL);
    mem$free(mem$, p);

    // Allocates new ZII struct based on given type
    auto my_item = mem$new(mem$, struct my_type_s);
    mem$free(mem$, my_item);

    return EOK;
}
- Temporary memory scope
mem$scope(tmem$, _)
{
    arr$(char*) incl_path = arr$new(incl_path, _);
    for$each (p, alt_include_path) {
        arr$push(incl_path, p);
        if (!os.path.exists(p)) { log$warn("alt_include_path not exists: %s\n", p); }
    }
}
- Arena Scope
mem$arena(4096, arena)
{
    // This needs extra page
    u8* p2 = mem$malloc(arena, 10040);
    mem$scope(arena, tal)
    {
        u8* p3 = mem$malloc(tal, 100);
    }
}
- Arena Instance
IAllocator arena = AllocatorArena.create(4096);

u8* p = mem$malloc(arena, 100); // direct use allowed

mem$scope(arena, tal)
{
    // NOTE: this scope will be freed after exit
    u8* p2 = mem$malloc(tal, 100000);
    mem$scope(arena, tal)
    {
        u8* p3 = mem$malloc(tal, 100);
    }
}

AllocatorArena.destroy(arena);
/// General purpose heap allocator
#define mem$
/// Gets address of struct member
#define mem$addressof(typevar, value)
/// Checks if pointer address of `p` is aligned to `alignment`
#define mem$aligned_pointer(p, alignment)
/// Rounds `size` to the closest alignment
#define mem$aligned_round(size, alignment)
/// Creates new ArenaAllocator instance in scope, frees it at scope exit
#define mem$arena(page_size, allc_var)
/// true - if program was compiled with address sanitizer support
#define mem$asan_enabled()
/// Poisons memory region with ASAN, or fill it with 0xf7 byte pattern (no ASAN)
#define mem$asan_poison(addr, size)
/// Check if previously poisoned address is consistent, and 0x7f pattern not overwritten (no ASAN)
#define mem$asan_poison_check(addr, size)
/// Unpoisons memory region with ASAN, or fill it with 0x00 byte pattern (no ASAN)
#define mem$asan_unpoison(addr, size)
/// Allocate zero initialized chunk of memory using `allocator`
#define mem$calloc(allocator, nmemb, size, alignment...)
/// Free previously allocated chunk of memory, `ptr` implicitly set to NULL
#define mem$free(allocator, ptr)
/// Checks if `s` value is power of 2
#define mem$is_power_of2(s)
/// Allocate uninitialized chunk of memory using `allocator`
#define mem$malloc(allocator, size, alignment...)
/// Allocates generic type instance using `allocator`, result is zero filled, size and alignment
/// derived from type T
#define mem$new(allocator, T)
/// Gets offset in bytes of struct member
#define mem$offsetof(var, field)
/// Returns 32 for 32-bit platform, or 64 for 64-bit platform
#define mem$platform()
/// Reallocate chunk of memory using `allocator`
#define mem$realloc(allocator, old_ptr, size, alignment...)
/// Opens new memory scope using Arena-like allocator, frees all memory after scope exit
#define mem$scope(allocator, allc_var)
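The idea behind arena scopes (remember a mark on entry, roll back to it on exit) and the alignment helpers can be sketched in plain C. This is not CEX's implementation: `demo_arena` is a hypothetical fixed 4096-byte buffer that never grows, the 16-byte alignment is an arbitrary choice for the sketch, and all `demo_` names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ALIGN 16 /* alignment of every allocation in this sketch */

typedef struct {
    _Alignas(DEMO_ALIGN) unsigned char buf[4096];
    size_t used; /* bytes handed out so far */
} demo_arena;

/* True when s is a non-zero power of two. */
static int demo_is_power_of2(size_t s)
{
    return s != 0 && (s & (s - 1)) == 0;
}

/* Rounds size up to the next multiple of alignment (a power of two). */
static size_t demo_aligned_round(size_t size, size_t alignment)
{
    assert(demo_is_power_of2(alignment));
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Bump allocation: returns NULL when the fixed buffer is exhausted. */
static void* demo_arena_alloc(demo_arena* a, size_t size)
{
    assert(a != NULL);
    size_t start = demo_aligned_round(a->used, DEMO_ALIGN);
    if (size > sizeof(a->buf) || start > sizeof(a->buf) - size) { return NULL; }
    a->used = start + size;
    return a->buf + start;
}

/* Scope entry: remember how much was used. */
static size_t demo_arena_mark(const demo_arena* a)
{
    assert(a != NULL);
    return a->used;
}

/* Scope exit: everything allocated after the mark is released at once. */
static void demo_arena_reset(demo_arena* a, size_t mark)
{
    assert(a != NULL && mark <= a->used);
    a->used = mark;
}
```

A pointer obtained after the mark becomes dangling once the arena is reset, which is the reason for the "don’t share pointers across scopes" hint above.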
os
Cross-platform OS related operations:
- os.cmd. - for running commands and interacting with them
- os.fs. - file-system related tasks
- os.env. - getting and setting environment variables
- os.path. - file path operations
- os.platform. - information about current platform
Examples:
- Running simple commands
// NOTE: there are many operations with os-related stuff in the cexy build system
// try to play in example roulette: ./cex help --example os.cmd.run
// Easy macro, run fixed number of arguments
e$ret(os$cmd(
    cexy$cc,
    "src/main.c",
    cexy$build_dir "/sqlite3.o",
    cexy$cc_include,
    "-lpthread",
    "-lm",
    "-o",
    cexy$build_dir "/hello_sqlite"
));
- Running dynamic arguments
mem$scope(tmem$, _)
{
    arr$(char*) args = arr$new(args, _);
    arr$pushm(
        args,
        cexy$cc,
        "-Wall",
        "-Werror",
    );
    if (os.platform.current() == OSPlatform__win) {
        arr$pushm(args, "-lbcrypt");
    }
    arr$pushm(args, NULL); // NOTE: last element must be NULL
    e$ret(os$cmda(args));
}
- Getting command output (low level api)
test$case(os_cmd_create)
{
    os_cmd_c c = { 0 };
    mem$scope(tmem$, _)
    {
        char* args[] = { "./cex", NULL };
        tassert_er(EOK, os.cmd.create(&c, args, arr$len(args), NULL));

        char* output = os.cmd.read_all(&c, _);
        tassert(output != NULL);
        io.printf("%s\n", output);

        int err_code = 0;
        tassert_er(Error.runtime, os.cmd.join(&c, 0, &err_code));
        tassert_eq(err_code, 1);
    }
    return EOK;
}
- Working with files
test$case(test_os_find_all_c_files)
{
    mem$scope(tmem$, _)
    {
        // Check if exists and remove
        if (os.path.exists("./cex")) { e$ret(os.fs.remove("./cex")); }

        // illustration of path combining
        char* pattern = os$path_join(_, "./", "*.c");

        // find all matching *.c files
        for$each (it, os.fs.find(pattern, false, _)) {
            log$debug("found file: %s\n", it);
        }
    }
    return EOK;
}
/// OS path separator, generally '\' for Windows, '/' otherwise
#define os$PATH_SEP
/// Run command by arbitrary set of arguments (returns Exc, but error check is not mandatory). Pipes
/// all IO to stdout/err/in into current terminal, feels totally interactive.
/// Example: e$ret(os$cmd("cat", "./cex.c"))
#define os$cmd(args...)
/// Run command by dynamic or static array (returns Exc, but error check is not mandatory). Pipes
/// all IO to stdout/err/in into current terminal, feels totally interactive.
#define os$cmda(args, args_len...)
/// Path parts join by variable set of args: os$path_join(mem$, "foo", "bar", "cex.c")
#define os$path_join(allocator, path_parts...)
/// Command container (current state of subprocess)
typedef os_cmd_c
/// Additional flags for os.cmd.create()
typedef os_cmd_flags_s
/// File stats metadata (cross-platform), returned by os.fs.stats
typedef os_fs_stat_s
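What a variadic path join has to do (insert one separator between parts and guard against overflow) can be sketched in plain C. The example below is only an illustration: `demo_path_join` is hypothetical, writes into a caller-provided buffer instead of using an allocator, always uses '/', and does not normalize duplicate separators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Joins n parts with '/' into out. Returns false and leaves out empty when
   the result (including the terminator) does not fit into out_size bytes. */
static bool demo_path_join(char* out, size_t out_size, const char* const* parts, size_t n)
{
    assert(out != NULL && out_size > 0 && (parts != NULL || n == 0));
    size_t len = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t part_len = strlen(parts[i]);
        size_t sep = (i > 0) ? 1 : 0;
        if (part_len + sep >= out_size - len) {
            out[0] = '\0';
            return false;
        }
        if (sep) { out[len++] = '/'; }
        memcpy(out + len, parts[i], part_len);
        len += part_len;
        out[len] = '\0';
    }
    return true;
}
```

A portable version would pick the separator per platform, in the same spirit as os$PATH_SEP above.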
os {
// Autogenerated by CEX
// clang-format off
/// Get last system API error as string representation (Exception compatible). Result content may be
/// affected by OS locale settings.
Exc (*get_last_error)(void);
/// Sleep for `period_millisec` duration
void (*sleep)(u32 period_millisec);
/// Get high performance monotonic timer value in seconds
f64 (*timer)(void);
struct {
/// Creates new os command (use os$cmd() and os$cmda() for easy cases)
Exception (*create)(os_cmd_c* self, char** args, usize args_len, os_cmd_flags_s* flags);
/// Check if `cmd_exe` program name exists in PATH. cmd_exe can be absolute, or simple command name,
/// e.g. `cat`
bool (*exists)(char* cmd_exe);
/// Get running command stderr stream
FILE* (*fstderr)(os_cmd_c* self);
/// Get running command stdin stream
FILE* (*fstdin)(os_cmd_c* self);
/// Get running command stdout stream
FILE* (*fstdout)(os_cmd_c* self);
/// Checks if process is running
bool (*is_alive)(os_cmd_c* self);
/// Waits process to end, and get `out_ret_code`, if timeout_sec=0 - infinite wait, raises
/// Error.runtime if out_ret_code != 0
Exception (*join)(os_cmd_c* self, u32 timeout_sec, i32* out_ret_code);
/// Terminates the running process
Exception (*kill)(os_cmd_c* self);
/// Read all output from process stdout, NULL if stdout is not available
char* (*read_all)(os_cmd_c* self, IAllocator allc);
/// Read line from process stdout, NULL if stdout is not available
char* (*read_line)(os_cmd_c* self, IAllocator allc);
/// Run command using arguments array and resulting os_cmd_c
Exception (*run)(char** args, usize args_len, os_cmd_c* out_cmd);
/// Writes line to the process stdin
Exception (*write_line)(os_cmd_c* self, char* line);
} cmd;
struct {
/// Get environment variable, with `deflt` if not found
char* (*get)(char* name, char* deflt);
/// Set environment variable
Exception (*set)(char* name, char* value);
} env;
struct {
/// Change current working directory
Exception (*chdir)(char* path);
/// Copy file
Exception (*copy)(char* src_path, char* dst_path);
/// Copy directory recursively
Exception (*copy_tree)(char* src_dir, char* dst_dir);
/// Iterates over directory (can be recursive) using callback function
Exception (*dir_walk)(char* path, bool is_recursive, os_fs_dir_walk_f callback_fn, void* user_ctx);
/// Finds files in `dir/pattern`, for example "./mydir/*.c" (all c files), if is_recursive=true, all
/// *.c files found in sub-directories.
arr$(char*) (*find)(char* path_pattern, bool is_recursive, IAllocator allc);
/// Get current working directory
char* (*getcwd)(IAllocator allc);
/// Makes directory (no error if exists)
Exception (*mkdir)(char* path);
/// Makes all directories in a path
Exception (*mkpath)(char* path);
/// Removes file or empty directory (also see os.fs.remove_tree)
Exception (*remove)(char* path);
/// Removes directory and all its contents recursively
Exception (*remove_tree)(char* path);
/// Renames file or directory
Exception (*rename)(char* old_path, char* new_path);
/// Returns cross-platform path stats information (see os_fs_stat_s)
os_fs_stat_s (*stat)(char* path);
} fs;
struct {
/// Returns absolute path from relative
char* (*abs)(char* path, IAllocator allc);
/// Get file name of a path
char* (*basename)(char* path, IAllocator allc);
/// Get directory name of a path
char* (*dirname)(char* path, IAllocator allc);
/// Check if file/directory path exists
bool (*exists)(char* file_path);
/// Join path with OS specific path separator
char* (*join)(char** parts, u32 parts_len, IAllocator allc);
/// Splits path by `dir` and `file` parts, when return_dir=true - returns `dir` part, otherwise
/// `file` part
str_s (*split)(char* path, bool return_dir);
} path;
struct {
/// Returns OSArch from string
OSArch_e (*arch_from_str)(char* name);
/// Converts arch to string
char* (*arch_to_str)(OSArch_e platform);
/// Returns current OS platform, returns enum of OSPlatform__*, e.g. OSPlatform__win,
/// OSPlatform__linux, OSPlatform__macos, etc..
OSPlatform_e (*current)(void);
/// Returns string name of current platform
char* (*current_str)(void);
/// Converts platform name to enum
OSPlatform_e (*from_str)(char* name);
/// Converts platform enum to name
char* (*to_str)(OSPlatform_e platform);
} platform;
// clang-format on
};
sbuf
Dynamic string builder class
Key features:
- Dynamically grown strings
- Supports CEX specific formats
- Can be backed by allocator or static buffer
- Error resilient - allows self as NULL
- sbuf_c - is an alias of char*, always null terminated, compatible with any C strings
- Allocator driven dynamic string
sbuf_c s = sbuf.create(20, mem$);

// These may fail (you may use them with e$* checks or add final check)
sbuf.appendf(&s, "%s, CEX slice: %S\n", "456", str$s("slice"));
sbuf.append(&s, "some string");

e$except(err, sbuf.validate(&s)) {
    // Error handling
}
if (!sbuf.isvalid(&s)) {
    // Error, just a boolean flag
}

// Some other stuff
s[i] // getting i-th character of string
strlen(s); // C strings work, because sbuf_c is vanilla char*
sbuf.len(&s); // faster way of getting length (uses metadata)
sbuf.grow(&s, new_capacity); // increase capacity
sbuf.capacity(&s); // current capacity, 0 if error occurred
sbuf.clear(&s); // reset dynamic string + null term

// Frees the memory and sets s to NULL
sbuf.destroy(&s);
- Static buffer backed string
// NOTE: `s` address is different, because `buf` will contain header and metadata, use only `s`
char buf[64];
sbuf_c s = sbuf.create_static(buf, arr$len(buf));

// You may check every operation if needed, but this is more verbose
e$ret(sbuf.appendf(&s, "%s, CEX slice: %S\n", "456", str$s("slice")));
e$ret(sbuf.append(&s, "some string"));

// It's not mandatory, but will clean up buffer data at the end
sbuf.destroy(&s);
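The note about `s` having a different address than `buf` is easier to follow with a small model of a header-prefixed string. The sketch below is plain C and only illustrates the general technique: `demo_head_s` and its fields are assumptions made for this example and are not the real sbuf_head_s layout, and the `demo_*` functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata header stored in front of the characters */
typedef struct {
    uint32_t capacity; /* usable characters, excluding the null terminator */
    uint32_t length;   /* current string length */
} demo_head_s;

/* memcpy is used for header access because `buf` may not be aligned for uint32_t */
static demo_head_s demo_head_get(const char* s)
{
    demo_head_s h;
    memcpy(&h, s - sizeof(h), sizeof(h));
    return h;
}

static void demo_head_set(char* s, demo_head_s h)
{
    memcpy(s - sizeof(h), &h, sizeof(h));
}

/* Returns a pointer just past the header: this is why `s != buf` */
char* demo_create_static(char* buf, size_t buf_size)
{
    if (buf == NULL || buf_size < sizeof(demo_head_s) + 1) { return NULL; }
    char* s = buf + sizeof(demo_head_s);
    demo_head_s h = { .capacity = (uint32_t)(buf_size - sizeof(demo_head_s) - 1), .length = 0 };
    demo_head_set(s, h);
    s[0] = '\0';
    return s;
}

/* Appends `text`, returns -1 instead of overflowing the buffer */
int demo_append(char* s, const char* text)
{
    if (s == NULL || text == NULL) { return -1; }
    demo_head_s h = demo_head_get(s);
    size_t n = strlen(text);
    if (n > (size_t)(h.capacity - h.length)) { return -1; }
    memcpy(s + h.length, text, n + 1); /* copies the terminator too */
    h.length += (uint32_t)n;
    demo_head_set(s, h);
    return 0;
}

/* Length comes from the header, no scan of the characters is needed */
uint32_t demo_len(const char* s)
{
    return (s != NULL) ? demo_head_get(s).length : 0;
}
```

Because the returned pointer addresses ordinary null-terminated characters, it can be handed to any libc function, while the length lookup reads the header instead of scanning the string.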
typedef char* sbuf_c
typedef struct sbuf_head_s
sbuf {
// Autogenerated by CEX
// clang-format off
/// Append string to the builder
Exc (*append)(sbuf_c* self, char* s);
/// Append format (using CEX formatting engine)
Exc (*appendf)(sbuf_c* self, char* format,...);
/// Append format va (using CEX formatting engine), always null-terminating
Exc (*appendfva)(sbuf_c* self, char* format, va_list va);
/// Returns string capacity from its metadata
u32 (*capacity)(sbuf_c* self);
/// Clears string
void (*clear)(sbuf_c* self);
/// Creates new dynamic string builder backed by allocator
sbuf_c (*create)(usize capacity, IAllocator allocator);
/// Creates dynamic string backed by static array
sbuf_c (*create_static)(char* buf, usize buf_size);
/// Destroys the string, deallocates the memory, or nullify static buffer.
sbuf_c (*destroy)(sbuf_c* self);
/// Returns false if string invalid
bool (*isvalid)(sbuf_c* self);
/// Returns string length from its metadata
u32 (*len)(sbuf_c* self);
/// Shrinks string length to new_length
Exc (*shrink)(sbuf_c* self, usize new_length);
/// Validate dynamic string state, with detailed Exception
Exception (*validate)(sbuf_c* self);
// clang-format on
};
str
CEX string principles:
- str namespace is built for compatibility with C strings
- all string functions are NULL resilient
- all string functions can return NULL on error
- you don't have to check every operation for NULL every time, just at the end
- all string format operations support CEX specific format specifiers (see below)
String slices:
- Slices are backed by the (str_s){.buf = s, .len = NNN} struct
- Slices are passed by value and allocated on the stack
- Slices can be made from null-terminated strings, buffers, or literals
- str$s("hello") - use this for compile time defined slices/constants
- Slices are not guaranteed to be null-terminated
- Slices support the operations allowed by a read-only string view representation
- CEX formatting uses %S for slices: io.print("Hello %S\n", str$s("world"))
Working with slices:
test$case(test_cstr)
{
    char* cstr = "hello";
    str_s s = str.sstr(cstr);
    tassert_eq(s.buf, cstr);
    tassert_eq(s.len, 5);
    tassert(s.buf == cstr);
    tassert_eq(str.len(s.buf), 5);
}
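For readers new to string views, the following plain C sketch shows why a slice is not guaranteed to be null-terminated and what that means for comparing and copying. The `demo_slice_s` type and the `demo_*` functions are hypothetical stand-ins that only mirror the buf + len idea; they are not the CEX implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string view: pointer plus length, no ownership */
typedef struct {
    const char* buf;
    size_t len;
} demo_slice_s;

/* View over `len` bytes starting at `s`: nothing is copied, no terminator is added */
demo_slice_s demo_slice(const char* s, size_t len)
{
    return (demo_slice_s){ .buf = s, .len = (s != NULL) ? len : 0 };
}

/* Content comparison: views are equal when their lengths and bytes match */
bool demo_slice_eq(demo_slice_s a, demo_slice_s b)
{
    if (a.buf == NULL || b.buf == NULL) { return a.buf == b.buf; }
    return a.len == b.len && memcmp(a.buf, b.buf, a.len) == 0;
}

/* Copies the view into `dest` and null-terminates it, -1 if it does not fit */
int demo_slice_copy(char* dest, size_t dest_len, demo_slice_s s)
{
    if (dest == NULL || dest_len == 0 || s.buf == NULL || s.len >= dest_len) { return -1; }
    memcpy(dest, s.buf, s.len);
    dest[s.len] = '\0';
    return 0;
}
```

A view over the first five bytes of "hello world" ends at a space rather than at a terminator, so passing its `buf` to strlen() or "%s" would read past the slice. In standard C such a view is printed with `printf("%.*s", (int)s.len, s.buf)`.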
- Getting substring as slices
str.sub("123456", 0, 0); // slice: 123456
str.sub("123456", 1, 0); // slice: 23456
str.sub("123456", 1, -1); // slice: 2345
str.sub("123456", -3, -1); // slice: 345
str.sub("123456", -30, 2000); // slice: (str_s){.buf = NULL, .len = 0} (error, but no crash)

// works with slices too
str_s s = str.sstr("123456");
str_s sub = str.slice.sub(s, 1, 2);
- Splitting / iterating via tokens
// Working without mem allocation
s = str.sstr("123,456");
for$iter (str_s, it, str.slice.iter_split(s, ",", &it.iterator)) {
    io.printf("%S\n", it.val); // NOTE: it.val is non null-terminated slice
}
// Mem allocating split
mem$scope(tmem$, _)
{
    // NOTE: each `res` item will be allocated C-string, use tmem$ or deallocate independently
    arr$(char*) res = str.split("123,456,789", ",", _);
    tassert(res != NULL); // NULL on error

    for$each (v, res) {
        io.printf("%s\n", v); // NOTE: strings now cloned and null-terminated
    }
}
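The difference between the two approaches above can be sketched in plain C. The tokenizer below allocates nothing: it hands out pointer + length pairs that refer to the original string, which is why such tokens are not null-terminated. `demo_tok_s` and `demo_next_token` are hypothetical names for this illustration and do not reflect how str.slice.iter_split is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token: a view into the original string, not a copy */
typedef struct {
    const char* buf;
    size_t len;
} demo_tok_s;

/* Fills `tok` with the next token and advances `cursor`.
 * Returns 1 while tokens remain, 0 when the input is exhausted.
 * `*cursor` must initially point at the start of the string. */
int demo_next_token(const char** cursor, const char* delims, demo_tok_s* tok)
{
    if (cursor == NULL || *cursor == NULL || delims == NULL || tok == NULL) { return 0; }
    const char* start = *cursor;
    size_t n = strcspn(start, delims); /* length up to the next delimiter or the end */
    tok->buf = start;
    tok->len = n;
    /* Skip the delimiter, or mark the end of input with NULL */
    *cursor = (start[n] == '\0') ? NULL : start + n + 1;
    return 1;
}
```

An allocating split, by contrast, would clone each token into its own null-terminated buffer, which costs memory but lets every item be used as a regular C string.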
- Chaining string operations
mem$scope(tmem$, _)
{
    char* s = str.fmt(_, "hi there"); // NULL on error
    s = str.replace(s, "hi", "hello", _); // NULL tolerant, NULL on error
    s = str.fmt(_, "result is: %s", s); // NULL tolerant, NULL on error
    if (s == NULL) {
        // TODO: oops error occurred, in one of three operations, but we don't need to check each one
    }
    tassert_eq(s, "result is: hello there");
}
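The "check once at the end" style works because every step passes a NULL input straight through as a NULL result. Here is a minimal plain C sketch of that pattern, assuming malloc-based helpers instead of CEX allocators; `demo_concat` and `demo_greeting` are hypothetical names introduced only for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical NULL tolerant helper: returns a newly allocated "a + b",
 * or NULL if either input is NULL or the allocation fails. */
char* demo_concat(const char* a, const char* b)
{
    if (a == NULL || b == NULL) { return NULL; }
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char* out = malloc(la + lb + 1);
    if (out == NULL) { return NULL; }
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1); /* copies the terminator too */
    return out;
}

/* Three chained steps, no intermediate checks: a failure anywhere yields NULL */
char* demo_greeting(const char* name)
{
    char* s1 = demo_concat("hello ", name);
    char* s2 = demo_concat(s1, "!");
    char* s3 = demo_concat("result is: ", s2);
    free(s1); /* free(NULL) is a no-op, so failed steps need no special casing */
    free(s2);
    return s3; /* the caller checks this single value */
}
```

With a scoped arena such as the tmem$ allocator used above, the intermediate free() calls would presumably not be needed, since the scope releases everything at once.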
- Pattern matching
// Pattern matching 101
// * - one or more characters
// ? - one character
// [abc] - one character a or b or c
// [!abc] - one character, but not a or b or c
// [abc+] - one or more characters a or b or c
// [a-cA-C0-9] - one character in a range of characters
// \\* - escaping literal '*'
// (abc|def|xyz) - matching combination of words abc or def or xyz
tassert(str.match("test.txt", "*?txt"));
tassert(str.match("image.png", "image.[jp][pn]g"));
tassert(str.match("backup.txt", "[!a]*.txt"));
tassert(!str.match("D", "[a-cA-C0-9]"));
tassert(str.match("1234567890abcdefABCDEF", "[0-9a-fA-F+]"));
tassert(str.match("create", "(run|build|create|clean)"));

// Works with slices
str_s src = str$s("my_test __String.txt");
tassert(str.slice.match(src, "*"));
tassert(str.slice.match(src, "*.txt*"));
tassert(str.slice.match(src, "my_test*.txt"));
/// Parses string contents as value type based on generic numeric type of out_var_ptr
#define str$convert(str_or_slice, out_var_ptr)
/// Joins parts of strings using a separator str$join(allc, ",", "a", "b", "c") -> "a,b,c"
#define str$join(allocator, str_join_by, str_parts...)
/// creates str_s, instance from string literals/constants: str$s("my string")
#define str$s(string)
/// Represents char* slice (string view) + may not be null-term at len!
typedef struct str_s
str {
// Autogenerated by CEX
// clang-format off
/// Clones string using allocator, null tolerant, returns NULL on error.
char* (*clone)(char* s, IAllocator allc);
/// Makes a copy of initial `src`, into `dest` buffer constrained by `destlen`. NULL tolerant,
/// always null-terminated, overflow checked.
Exception (*copy)(char* dest, char* src, usize destlen);
/// Checks if string ends with suffix, returns false on error, NULL tolerant
bool (*ends_with)(char* s, char* suffix);
/// Compares two null-terminated strings (null tolerant)
bool (*eq)(char* a, char* b);
/// Compares two strings, case insensitive, null tolerant
bool (*eqi)(char* a, char* b);
/// Find a substring in a string, returns pointer to first element. NULL tolerant, and NULL on err.
char* (*find)(char* haystack, char* needle);
/// Find substring from the end , NULL tolerant, returns NULL on error.
char* (*findr)(char* haystack, char* needle);
/// Formats string and allocates it dynamically using allocator, supports CEX format engine
char* (*fmt)(IAllocator allc, char* format,...);
/// Joins string using a separator (join_by), NULL tolerant, returns NULL on error.
char* (*join)(char** str_arr, usize str_arr_len, char* join_by, IAllocator allc);
/// Calculates string length, NULL tolerant.
usize (*len)(char* s);
/// Returns new lower case string, returns NULL on error, null tolerant
char* (*lower)(char* s, IAllocator allc);
/// String pattern matching check (see ./cex help str$ for examples)
bool (*match)(char* s, char* pattern);
/// libc `qsort()` comparator functions, for arrays of `char*`, sorting alphabetical
int (*qscmp)(const void* a, const void* b);
/// libc `qsort()` comparator functions, for arrays of `char*`, sorting alphabetical case insensitive
int (*qscmpi)(const void* a, const void* b);
/// Replaces substring occurrence in a string
char* (*replace)(char* s, char* old_sub, char* new_sub, IAllocator allc);
/// Creates string slice from a buf+len
str_s (*sbuf)(char* s, usize length);
/// Splits string using split_by (allows many) chars, returns new dynamic array of split char*
/// tokens, allocates memory with allc, returns NULL on error. NULL tolerant. Items of array are
/// cloned, so you need free them independently or better use arena or tmem$.
arr$(char*) (*split)(char* s, char* split_by, IAllocator allc);
/// Splits string by lines, result allocated by allc, as dynamic array of cloned lines, Returns NULL
/// on error, NULL tolerant. Items of array are cloned, so you need free them independently or
/// better use arena or tmem$. Supports \n or \r\n.
arr$(char*) (*split_lines)(char* s, IAllocator allc);
/// Analog of sprintf() uses CEX sprintf engine. NULL tolerant, overflow safe.
Exc (*sprintf)(char* dest, usize dest_len, char* format,...);
/// Creates string slice of input C string (NULL tolerant, (str_s){0} on error)
str_s (*sstr)(char* ccharptr);
/// Checks if string starts with prefix, returns false on error, NULL tolerant
bool (*starts_with)(char* s, char* prefix);
/// Makes slices of `s` char* string, start/end are indexes, can be negative from the end, if end=0
/// mean full length of the string. `s` may be not null-terminated. function is NULL tolerant,
/// return (str_s){0} on error
str_s (*sub)(char* s, isize start, isize end);
/// Returns new upper case string, returns NULL on error, null tolerant
char* (*upper)(char* s, IAllocator allc);
/// Analog of vsprintf() uses CEX sprintf engine. NULL tolerant, overflow safe.
Exception (*vsprintf)(char* dest, usize dest_len, char* format, va_list va);
struct {
Exception (*to_f32)(char* s, f32* num);
Exception (*to_f32s)(str_s s, f32* num);
Exception (*to_f64)(char* s, f64* num);
Exception (*to_f64s)(str_s s, f64* num);
Exception (*to_i16)(char* s, i16* num);
Exception (*to_i16s)(str_s s, i16* num);
Exception (*to_i32)(char* s, i32* num);
Exception (*to_i32s)(str_s s, i32* num);
Exception (*to_i64)(char* s, i64* num);
Exception (*to_i64s)(str_s s, i64* num);
Exception (*to_i8)(char* s, i8* num);
Exception (*to_i8s)(str_s s, i8* num);
Exception (*to_u16)(char* s, u16* num);
Exception (*to_u16s)(str_s s, u16* num);
Exception (*to_u32)(char* s, u32* num);
Exception (*to_u32s)(str_s s, u32* num);
Exception (*to_u64)(char* s, u64* num);
Exception (*to_u64s)(str_s s, u64* num);
Exception (*to_u8)(char* s, u8* num);
Exception (*to_u8s)(str_s s, u8* num);
} convert;
struct {
/// Clone slice into new char* allocated by `allc`, null tolerant, returns NULL on error.
char* (*clone)(str_s s, IAllocator allc);
/// Makes a copy of initial `src` slice, into `dest` buffer constrained by `destlen`. NULL tolerant,
/// always null-terminated, overflow checked.
Exception (*copy)(char* dest, str_s src, usize destlen);
/// Checks if slice ends with suffix, returns false on error, NULL tolerant
bool (*ends_with)(str_s s, str_s suffix);
/// Compares two string slices, null tolerant
bool (*eq)(str_s a, str_s b);
/// Compares two string slices, null tolerant, case insensitive
bool (*eqi)(str_s a, str_s b);
/// Get index of first occurrence of `needle`, returns -1 on error.
isize (*index_of)(str_s s, str_s needle);
/// iterator over slice splits: for$iter (str_s, it, str.slice.iter_split(s, ",", &it.iterator)) {}
str_s (*iter_split)(str_s s, char* split_by, cex_iterator_s* iterator);
/// Removes white spaces from the beginning of slice
str_s (*lstrip)(str_s s);
/// Slice pattern matching check (see ./cex help str$ for examples)
bool (*match)(str_s s, char* pattern);
/// libc `qsort()` comparator function for alphabetical sorting of str_s arrays
int (*qscmp)(const void* a, const void* b);
/// libc `qsort()` comparator function for alphabetical case insensitive sorting of str_s arrays
int (*qscmpi)(const void* a, const void* b);
/// Removes slice prefix (start part), or returns the same slice if it's not found
str_s (*remove_prefix)(str_s s, str_s prefix);
/// Removes slice suffix (end part), or returns the same slice if it's not found
str_s (*remove_suffix)(str_s s, str_s suffix);
/// Removes white spaces from the end of slice
str_s (*rstrip)(str_s s);
/// Checks if slice starts with prefix, returns false on error, NULL tolerant
bool (*starts_with)(str_s s, str_s prefix);
/// Removes white spaces from both ends of slice
str_s (*strip)(str_s s);
/// Makes slices of `s` slice, start/end are indexes, can be negative from the end, if end=0 mean
/// full length of the string. `s` may be not null-terminated. function is NULL tolerant, return
/// (str_s){0} on error
str_s (*sub)(str_s s, isize start, isize end);
} slice;
// clang-format on
};
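The qscmp entries above follow the libc `qsort()` comparator contract, which is easy to get wrong for arrays of `char*`. Below is a minimal plain C sketch of such a comparator and its use. `demo_qscmp` and `demo_sort_names` are hypothetical names, and unlike the CEX functions this sketch makes no attempt to handle NULL items; str.qscmp is presumably passed to qsort() in the same way.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for arrays of `char*`, alphabetical order */
int demo_qscmp(const void* a, const void* b)
{
    /* qsort passes pointers to the array elements, so each argument is
     * really a pointer to a `char*`, not the string itself */
    const char* sa = *(const char* const*)a;
    const char* sb = *(const char* const*)b;
    return strcmp(sa, sb);
}

/* Sorts an array of C strings in place */
void demo_sort_names(const char** names, size_t names_len)
{
    qsort(names, names_len, sizeof(names[0]), demo_qscmp);
}
```

The extra dereference is the important detail: casting the arguments directly to `const char*` compiles, but compares the bytes of the pointers rather than the strings they point to.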
test$
Unit Testing engine:
- Running/building tests
./cex test create tests/test_mytest.c
./cex test run tests/test_mytest.c
./cex test run all
./cex test debug tests/test_mytest.c
./cex test clean all
./cex test --help
- Unit Test structure
test$setup_case() {
    // Optional: runs before each test case
    return EOK;
}
test$teardown_case() {
    // Optional: runs after each test case
    return EOK;
}
test$setup_suite() {
    // Optional: runs once before full test suite initialized
    return EOK;
}
test$teardown_suite() {
    // Optional: runs once after full test suite ended
    return EOK;
}

test$case(my_test_case){
    e$ret(foo("raise")); // this test will fail if `foo()` raises Exception
    return EOK; // Must return EOK for passed
}

test$case(my_test_another_case){
    tassert_eq(1, 0); // tassert_ fails test, but not abort the program
    return EOK; // Must return EOK for passed
}

test$main(); // mandatory at the end of each test
- Test checks
test$case(my_test_case){
    // Generic type assertions, fails and print values of both arguments
    tassert_eq(1, 1);
    tassert_eq(str, "foo");
    tassert_eq(num, 3.14);
    tassert_eq(str_slice, str$s("expected") );

    tassert(condition && "oops");
    tassertf(condition, "oops: %s", s);

    tassert_er(EOK, raising_exc_foo(0));
    tassert_er(Error.argument, raising_exc_foo(-1));

    tassert_eq_almost(PI, 3.14, 0.01); // 0.01 is float tolerance
    tassert_eq(3.4 * NAN, NAN); // NAN equality also works

    tassert_eq_ptr(a, b); // raw pointer comparison
    tassert_eq_mem(a, b); // raw buffer content comparison (a and b expected to be same size)

    tassert_eq_arr(a, b); // compare two arrays (static or dynamic)

    tassert_ne(1, 0); // not equal
    tassert_le(a, b); // a <= b
    tassert_lt(a, b); // a < b
    tassert_gt(a, b); // a > b
    tassert_ge(a, b); // a >= b

return EOK;
}
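Two of the checks above deserve a closer look: tolerance-based float comparison and NAN equality, since plain `==` fails for both rounding noise and NaN. The plain C sketch below shows one way such a comparison can be written. It is an assumption about the general technique, not the code behind tassert_eq_almost or tassert_eq, and `demo_eq_almost` is a hypothetical name.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true when `a` and `b` differ by at most `tol`.
 * Two NaN values are treated as equal, which is convenient in tests
 * even though IEEE 754 `==` says otherwise. */
bool demo_eq_almost(double a, double b, double tol)
{
    if (isnan(a) || isnan(b)) { return isnan(a) && isnan(b); }
    if (a == b) { return true; } /* also covers equal infinities */
    double diff = a - b;
    if (diff < 0) { diff = -diff; }
    return diff <= tol;
}
```

An absolute tolerance like this suits values of known magnitude. For values spanning many orders of magnitude a relative tolerance is usually the better choice.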
/// Unit-test test case
#define test$case(NAME)
/// main() function for test suite, you must place it into test file at the end
#define test$main()
/// Attribute for function which disables optimization for test cases or other functions
#define test$noopt
/// Optional: called before each test$case() starts
#define test$setup_case()
/// Optional: initializes at test suite once at start
#define test$setup_suite()
/// Optional: called after each test$case() ends
#define test$teardown_case()
/// Optional: shut down test suite once at the end
#define test$teardown_suite()
CEX lib
Role of CEX lib
CEX lib (see the lib/ folder in the repo) is designed to be a collection of assorted tools and libraries that are not used very frequently. It is currently in an early alpha stage: neither API stability nor backward compatibility is guaranteed. Feel free to contribute your ideas if you think they could be useful.
Installing libraries
Installing and updating libs from the main CEX repo is pretty straightforward, and you can use:
cex libfetch lib/test/fff.h - fetch single header lib from CEX repo
cex libfetch -U cex.h - update cex.h to most recent version
cex libfetch lib/random/ - fetch whole directory recursively from CEX lib
cex libfetch lib/ - fetch everything available in CEX lib
cex libfetch --git-label=v2.0 file.h - fetch using specific label or commit
cex libfetch -u https://github.com/m/lib.git file.h - fetch from arbitrary repo
cex help --example cexy.utils.git_lib_fetch - you can call it from your cex.c
cex libfetch --help - more help
Credits
CEX contains some code and ideas from the following projects, all of them licensed under MIT license (or Public Domain):
- nob.h - by Tsoding / Alexey Kutepov, MIT/Public domain, great idea of making self-contained build system, great youtube channel btw
- stb_ds.h - MIT/Public domain, by Sean Barrett, CEX arr/hm are refactored versions of STB data structures, great idea
- stb_sprintf.h - MIT/Public domain, by Sean Barrett, I refactored it, fixed all UB warnings from UBSAN, added CEX specific formatting
- minirent.h - Alexey Kutepov, MIT license, WIN32 compatibility lib
- subprocess.h - by Neil Henning, public domain, used in CEX as a daily driver for os$cmd and process communication
- utest.h - by Neil Henning, public domain, CEX test$ runner borrowed some ideas of macro magic for making declarative test cases
- c3-lang - I borrowed some ideas about language features from C3, especially mem$scope, mem$/tmem$ global allocators, scoped macros too.
License
MIT License
Copyright (c) 2024-2025 Aleksandr Vedeneev
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.