xan médialab Sciences Po

csv data datascience diagram graph graphics plot statistics stats

Use this command to install xan:

winget install --id=medialab.xan -e

xan: Efficient CSV Processing at Your Command Line

Primary Purpose:
xan is a high-performance command-line tool designed for efficient processing of CSV files. Built with Rust, it excels in speed and memory efficiency, making it ideal for handling large datasets seamlessly.

Key Features:

Rust-Powered Efficiency: Leverages Rust's performance capabilities, a novel SIMD parser, and multithreading to ensure rapid processing even with gigabytes of data.
Versatile Command Set: Offers a wide range of commands for previewing, filtering, aggregating, sorting, and joining CSV files, allowing complex operations through composable chains.
Advanced Expression Language: Enables complex tasks with a minimal language tailored to CSV data, which is faster to evaluate than dynamically-typed languages like Python or JavaScript.
Format Flexibility: Supports various CSV-adjacent formats (e.g., .cdx, bioinformatics files) and converts to/from formats such as JSON, Excel, and numpy arrays.
Terminal Visualization: Displays CSV files directly in the terminal for easy exploration and creates basic visualizations like histograms and heatmaps.
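A terminal histogram boils down to counting the values of a column and drawing one bar per value with plain characters. The Python sketch below shows that idea with the standard library. It is an illustration only: the helper `bar_chart` is hypothetical, it is not how xan renders its charts, and xan's own output looks different.

```python
import csv
import io
from collections import Counter


def bar_chart(rows, column, width=20):
    """Return text lines drawing one bar per distinct value of `column`.

    Illustrative sketch, not xan's implementation. `rows` is any iterable
    of dicts (e.g. a csv.DictReader); bars are scaled so that the most
    frequent value spans `width` characters.
    """
    counts = Counter(row[column] for row in rows)
    if not counts:
        return []
    top = max(counts.values())
    label_width = max(len(value) for value in counts)
    lines = []
    # most_common() yields (value, count) pairs sorted by decreasing count
    for value, count in counts.most_common():
        bar = "#" * round(count / top * width)
        lines.append(f"{value.ljust(label_width)} | {bar} {count}")
    return lines


data = io.StringIO("name,kind\nx1,fruit\nx2,fruit\nx3,vegetable\n")
for line in bar_chart(csv.DictReader(data), "kind"):
    print(line)
```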

Audience & Benefit:
Tailored for data analysts, researchers, and developers who work with large or complex CSV datasets, xan combines speed, low memory use, and a broad set of composable commands, so that intricate processing tasks can be carried out quickly and directly from the shell.

README

`xan`, the CSV magician

xan is a command line tool that can be used to process CSV files directly from the shell.

It is written in Rust to be as fast as possible and to use as little memory as possible, so it can easily handle large CSV files (gigabytes of data). It relies on a novel SIMD CSV parser and can also parallelize some computations (through multithreading), so that some tasks complete as fast as your computer allows.
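Low memory use on large files typically comes from streaming: records are read and processed one at a time instead of loading the whole file. The Python sketch below shows that pattern with the standard `csv` module. It is only an illustration of streaming. xan's Rust SIMD parser and multithreading are not reproduced here, and the function `count_and_sum` is hypothetical.

```python
import csv
import io


def count_and_sum(lines, column):
    """Stream a CSV one record at a time, returning (row_count, column_sum).

    Illustrative sketch, not xan's implementation: only the current record
    is held in memory, so memory use does not grow with the input size.
    """
    reader = csv.reader(lines)
    header = next(reader)
    index = header.index(column)
    rows, total = 0, 0.0
    for record in reader:
        rows += 1
        total += float(record[index])
    return rows, total


# With a real file you would pass open("data.csv", newline="") instead
# ("data.csv" is a placeholder name).
sample = io.StringIO("city,population\nA,2100\nB,520\n")
print(count_and_sum(sample, "population"))
```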

It can easily preview, filter, slice, aggregate, sort and join CSV files, and it exposes a large collection of composable commands that can be chained together to perform a wide variety of typical tasks.
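Chaining works because each step consumes rows and produces rows, so the output of one step becomes the input of the next, much as a shell pipe connects commands. The Python sketch below mimics that composition with generators. The functions `search`, `select` and `sort_by` are hypothetical stand-ins that only loosely echo the quick tour's steps. They are not xan's API and do not reproduce its exact semantics.

```python
import csv
import io


def search(rows, column, needle):
    """Keep only rows whose `column` contains `needle` as a substring."""
    return (row for row in rows if needle in row[column])


def select(rows, columns):
    """Keep only the given columns, in the given order."""
    return ({name: row[name] for name in columns} for row in rows)


def sort_by(rows, column):
    """Sort rows on the text of `column`.

    Unlike the two lazy steps above, sorting has to buffer every row.
    """
    return sorted(rows, key=lambda row: row[column])


data = io.StringIO("id,category,value\n1,a,30\n2,b,10\n3,a,20\n")
# Each step consumes the previous step's rows, like piping one
# command's CSV output into the next command (illustrative only).
rows = csv.DictReader(data)
matching = search(rows, "category", "a")
projected = select(matching, ["id", "value"])
result = sort_by(projected, "value")
print(result)
```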

xan also offers its own expression language, so you can perform complex tasks that the simpler commands cannot handle on their own. This minimalistic language is tailored for CSV data and is much faster to evaluate than typical dynamically-typed languages such as Python, Lua or JavaScript.
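The evaluation model behind "create a new column from an expression" is simple: one expression is evaluated on every row and its result is stored in a new column. The Python sketch below shows only that model, with a Python `lambda` standing in for the expression. It does not show xan's expression syntax. The helper `add_column` is hypothetical. A per-row Python callback like this is exactly the kind of dynamically-typed evaluation the paragraph above says xan's language outperforms.

```python
import csv
import io


def add_column(rows, name, expression):
    """Yield each row with an extra `name` column computed by `expression`.

    Illustrative sketch, not xan's expression language: `expression` is a
    Python callable receiving the row as a dict, evaluated once per row.
    """
    for row in rows:
        row[name] = expression(row)
        yield row


data = io.StringIO("name,width,height\nA,2,3\nB,4,5\n")
rows = add_column(
    csv.DictReader(data),
    "area",
    # CSV cells are strings, so they are converted before arithmetic.
    lambda row: int(row["width"]) * int(row["height"]),
)
for row in rows:
    print(row["name"], row["area"])
```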

Note that this tool started as a fork of BurntSushi's xsv, but it has since been almost entirely rewritten to fit the use cases of the Sciences Po médialab, which are rooted in web data collection and analysis for the social sciences (you might think CSV is outdated by now, but read our love letter to the format before judging too quickly).

xan therefore goes beyond typical data manipulation and exposes utilities related to lexicometry, graph theory and even scraping.

Beyond CSV data, xan is able to process a large variety of CSV-adjacent data formats from many different disciplines, such as web archival (.cdx) or bioinformatics (.vcf, .gtf, .sam, .bed etc.). xan can also convert to and from many data formats, such as JSON, Excel files and numpy arrays, using `xan to` and `xan from`. See section for more detail.
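At its core, converting between formats means mapping CSV rows to another format's records. As an illustration only, the Python sketch below turns CSV text into a JSON array with one object per row, using the standard `csv` and `json` modules. The helper `csv_to_json` is hypothetical, and the exact JSON that `xan to` produces may differ.

```python
import csv
import io
import json


def csv_to_json(text):
    """Convert CSV text into a JSON array with one object per row.

    Illustrative sketch, not xan's converter: every value stays a string,
    since CSV itself carries no types; non-ASCII characters are kept as-is.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows, ensure_ascii=False)


print(csv_to_json("id,label\n1,alpha\n2,beta\n"))
```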

Screenshots: view command, flatten command, categorical histogram, scatterplot, categorical scatterplot, histograms, parallel processing, time series, small multiples (facet grid), grouped view, correlation matrix heatmap, heatmap.


Summary

How to install
  Cargo
  Scoop (Windows)
  Homebrew (macOS)
  Arch Linux
  NetBSD
  Nix
  Pixi (Linux, macOS, Windows)
  Pre-built binaries
  Installing completions
Quick tour
  Downloading the corpus
  Displaying the file's headers
  Counting the number of rows
  Previewing the file in the terminal
  Reading a flattened representation of the first row
  Searching for rows
  Selecting some columns
  Sorting the file
  Deduplicating the file on some column
  Computing frequency tables
  Printing a histogram
  Computing descriptive statistics
  Evaluating an expression to filter a file
  Evaluating an expression to create a new column based on other ones
  Transform a column by evaluating an expression
  Performing custom aggregation
  Grouping rows and performing per-group aggregation
Available commands
General flags and IO model
  Getting help
  Regarding input & output formats
  Working with headless CSV file
  Regarding stdin
  Regarding stdout
  Supported file formats
  Compressed files
  Regarding color
Expression language reference
Cookbook
News
How to cite?
Frequently Asked Questions
  How to display a vertical bar chart?

`xan`, the CSV magician