HeroParser CLI is an AI-native, high-performance command-line utility for working with tabular datasets, powered by the HeroParser library. It supports validation, schema inference, converting between formats (CSV, Excel, Fixed-Width, JSONL, HTB), profiling, querying, and AI translation.
HeroParser is a zero-allocation, SIMD-accelerated tabular data parser and writer for .NET 8, 9, and 10. Designed for extreme speed, memory efficiency, and Native AOT compatibility, it also offers first-class integrations for AI agents, vector embeddings, and LLM pipelines.
Why Choose HeroParser?
Extreme Performance: Engineered with AVX-512, AVX2, and ARM NEON SIMD optimizations to deliver ultra-high-throughput reading and writing.
AI-Native integrations: Built-in support for token-budgeted chunking, LLM output structured repair, vector embedding pipelines, and agent tool mapping.
Zero Dependencies & Low Footprint: Operates with zero external packages. Employs a fixed 112-byte heap memory footprint on the reading hot-path regardless of file size.
Unified Attributes: Annotate your C# classes once, and use them across CSV, Excel, Fixed-Width, and HTB APIs.
Performance & Memory
Tested under .NET 10.0 on an AMD Ryzen AI 9 HX PRO 370 CPU:
Read Throughput: SIMD-accelerated UTF-8 (byte[]) read paths on both quoted and unquoted data.
curl -fsSL https://raw.githubusercontent.com/KoalaFacts/HeroParser/main/install.sh | sh
On Windows, you can install via WinGet (pending registry merge):
winget install KoalaFacts.HeroParser
Simple Quick Starts
Define your record type. Decorate it with [GenerateBinder] to enable source-generated, reflection-free, and Native AOT-safe binding:
using HeroParser;
[GenerateBinder]
public class Product
{
public int Id { get; set; }
public string Name { get; set; } = "";
[Validate(RangeMin = 0)]
public decimal Price { get; set; }
[Parse(Format = "yyyy-MM-dd")]
public DateTime ReleaseDate { get; set; }
}
CSV
// Read a file in one line (zero-allocation binding)
List products = Csv.Read().FromFile("products.csv").ToList();
// Async stream millions of rows without buffering
await foreach (Product p in Csv.Read().FromFileAsync("products.csv"))
{
Console.WriteLine($"{p.Name}: {p.Price:C}");
}
// Write collection to a file
Csv.Write().ToFile("out.csv", products);
Excel (.xlsx)
Reads and writes Excel workbooks with zero external dependencies (utilizes only standard .NET compression and XML packages).
using HeroParser.Excels.Core;
// Read Excel files
List products = Excel.Read().FromFile("products.xlsx");
// Write Excel workbook with custom header styles, column styles, and auto-merged duplicate values
var headerStyle = ExcelStyle.Create()
.WithFont(f => f.WithName("Arial").WithSize(12).WithBold().WithColor("FFFFFF"))
.WithFill(fill => fill.WithSolidColor("007ACC")); // Blue background
Excel.Write()
.WithHeaderStyle(headerStyle)
.WithMergeDuplicates(p => p.Category) // Vertically merge contiguous duplicate Categories
.ToFile("out.xlsx", products);
Fixed-Width
Map properties to specific character boundaries using the [PositionalMap] attribute:
[GenerateBinder]
public class Employee
{
[PositionalMap(Start = 0, Length = 10)]
public string Id { get; set; } = "";
[PositionalMap(Start = 10, Length = 30)]
public string Name { get; set; } = "";
[PositionalMap(Start = 40, Length = 10, Alignment = FieldAlignment.Right)]
public decimal Salary { get; set; }
}
// Read positional records
var employees = FixedWidth.Read().FromFile("employees.dat").Records;
// Write positional records
FixedWidth.Write().ToFile("out.dat", employees);
JSON Lines (JSONL)
Perfect for AI fine-tuning datasets, streamed LLM responses, and database bulk loading.
// Read JSONL files (AOT-safe)
var records = Jsonl.Read().FromFile("products.jsonl").ToList();
// Write JSONL files
Jsonl.Write().ToFile("out.jsonl", records);
High-Throughput Tabular Binary (HTB)
A custom, high-speed binary serialization format optimized for zero allocations, platform independence, and vector embedding storage (supporting float[] arrays).
Why HTB?
Zero Heap Allocations: Utilizes Roslyn source generators ([GenerateBinder]) to map properties directly with zero-boxing and zero-reflection overhead.
Platform-Independent Endianness: Automatically handles big-endian byte-order reversal for floats, doubles, and ints for cross-architecture safety.
Allocation-Free CSV ↔ HTB Conversion: Stream-convert CSV directly to HTB (and vice-versa) with zero heap allocation overhead.
AOT & Trim Ready: Fully compatible with Native AOT compilation out-of-the-box.
// Read HTB binary files (AOT-safe)
List products = Htb.Read().FromFile("products.htb").ToList();
// Async stream HTB records
await foreach (Product p in Htb.Read().FromFileAsync("products.htb"))
{
Console.WriteLine($"{p.Name}: {p.Price:C}");
}
// Write records to an HTB file
Htb.Write().ToFile("out.htb", products);
// Direct, allocation-free CSV ↔ HTB conversions
Htb.ConvertFromCsv("products.csv", "products.htb", HtbSchema.FromType());
Console (High-Performance Terminal Widget Engine)
A zero-allocation, reflection-free, and 100% Native AOT-compatible library designed to replace or drop-in substitute Spectre.Console in high-performance terminal applications.
using HeroParser.Console;
// Render styled ANSI markup
AnsiConsole.MarkupLine("[bold green]Success:[/] Row validation completed in [yellow]4.2ms[/].");
// Render highly styled tables and panels with zero allocations
var table = new Table().Border(TableBorder.Rounded);
table.AddColumn("[blue]Filename[/]");
table.AddColumn("[blue]Records[/]");
table.AddRow("data.csv", "10,240");
table.AddRow("data.jsonl", "102,400");
AnsiConsole.Write(table);
Unified Attribute System
Annotate a single record class once, and read or write it across multiple formats:
HeroParser includes first-class support for LLM, vector search, and RAG pipelines:
1. LLM Structured Output Repair (LlmRepair)
Repairs truncated final rows (unclosed quotes/escapes) and strips markdown code blocks from raw LLM text streams.
using HeroParser.AI;
// Repaired on-the-fly and parsed directly into strongly-typed records
await foreach (var dev in LlmRepair.ReadFromTextAsync(rawLlmResponse))
{
Console.WriteLine($"{dev.Name} is a {dev.Role}");
}
Maps flat dictionaries of case-insensitive arguments returned by tool calling models into typed record models, executing validation constraints and raising rich validation feedback.
using HeroParser.AI;
Developer dev = SchemaMetadata.MapFromToolCall(arguments);
Chunks datasets into valid, token-bounded JSON array blocks, optimized for ingestion into RAG context windows.
await foreach (var jsonChunk in developers.ToJsonLlmChunksAsync(options)) { ... }
Key Features
SIMD-accelerated CSV parsing — AVX-512, AVX2, and ARM NEON instruction sets; PCLMULQDQ-based branchless quote tracking
Zero allocations — fixed 4 KB stack footprint regardless of column count or file size; ArrayPool for buffers
AOT/trimming ready — source generators emit reflection-free binders; annotated with [RequiresUnreferencedCode] where reflection is unavoidable
Async streaming — IAsyncEnumerable for CSV, Fixed-Width, Excel, JSONL, and HTB; true non-blocking I/O with sync fast paths
Excel without extra dependencies — reads and writes .xlsx using only System.IO.Compression and System.Xml
JSONL for AI/ML pipelines — Jsonl.Read() / Jsonl.Write() mirror the CSV builder pattern; AOT-safe via JsonTypeInfo; CsvToJsonlConverter projects tabular data into OpenAI/Anthropic fine-tuning shapes
HTB binary format — custom, high-speed, zero-allocation binary format featuring float-array embedding support, big-endian byte-order reversal, and direct CSV ↔ HTB conversion
Embedding-API batching — IAsyncEnumerable.BatchAsync(size) groups streamed records into fixed-size batches for OpenAI/Voyage/Cohere/Anthropic embedding calls
DataReader support — Csv.CreateDataReader(), FixedWidth.CreateDataReader(), Excel.CreateDataReader(), Jsonl.CreateDataReader() for database bulk loading via SqlBulkCopy
PipeReader integration — Csv.ReadFromPipeReaderAsync(pipe) for network streaming without buffering the entire payload
Multi-schema CSV — discriminator-based row routing to different record types; source-generated dispatch for ~2.85x faster throughput
Delimiter detection — auto-detect comma, semicolon, pipe, or tab from sample rows with a confidence score
Field validation — [Validate] constraints (NotNull, NotEmpty, Range, Pattern) collected lazily; result.ThrowIfAnyError() for fail-fast
CSV injection protection — configurable sanitization modes for user-data exports
Progress reporting — row/byte callbacks for large-file UX
Custom type converters — register converters for domain types on any reader or writer
Write capacity pre-allocation — backing buffer capacity is pre-allocated via estimated record counts when writing collections to strings, yielding up to 64% speedup and completely eliminating buffer resize copy overhead
Multi-framework — .NET 8, 9, 10; CI validates all three on Windows, Linux, and macOS
Detailed Documentation
For advanced features and full API guides, see the files under the docs folder:
CSV Guide — Fluent readers/writers, validation, PipeReader, and multi-schema dispatching.
Excel Guide — Multi-sheet workbooks, custom formatting, and progress tracking.
Fixed-Width Guide — Positional mapping, alignment, padding, and custom type converters.
HTB Guide — High-Throughput Tabular Binary format, fluent APIs, CSV ↔ HTB conversion, and Native AOT support.
Console Guide — Standard rendering, markup, widgets (Table, Panel, Rule), interactive prompts, and Spectre.Console compatibility.
CLI Guide — Global installation, interactive wizard mode, validation, conversion, and AI-native query/translate commands.
Benchmarks Guide — Execution environments, detailed CPU metrics, and comparisons.
Building & Testing
# Build all projects
dotnet build
# Run unit tests
dotnet test --filter Category=Unit
# Run integration tests
dotnet test --filter Category=Integration
# Run all tests
dotnet test
# Check code formatting
dotnet format --verify-no-changes
# Run benchmarks
dotnet run -c Release --project benchmarks/HeroParser.Benchmarks