# Boost JSON Lines Processing with a High-Performance Rust Library

As a developer working with large datasets, I've often found myself wrestling with JSON Lines (JSONL) files. Whether it's processing log files, handling data exports, or working with streaming APIs, JSONL has become ubiquitous in the data world. However, I noticed a gap in the Rust ecosystem for efficient, async-first JSONL processing. That's why I decided to create **async-jsonl** - and I'm excited to share what I've built!

## What is JSON Lines (JSONL)?

Before diving into my library, let's quickly cover what JSONL actually is. JSON Lines is a text format where each line is a valid JSON object, separated by newline characters. It looks like this:

```json
{"id": 1, "name": "Alice", "age": 30}
{"id": 2, "name": "Bob", "age": 25}
{"id": 3, "name": "Charlie", "age": 35}
```

### Why JSONL is Awesome

1. **Streamable**: You can process one line at a time without loading the entire file into memory
    
2. **Appendable**: Easy to add new records to the end of a file
    
3. **Fault-tolerant**: One corrupted line doesn't break the entire dataset
    
4. **Tool-friendly**: Works great with unix tools like `cat`, `head`, `tail`, and `grep`
    
5. **Language-agnostic**: Supported across virtually every programming language
    

These advantages make JSONL perfect for log files, data pipelines, and any scenario where you're dealing with large volumes of structured data.

## Enter async-jsonl: Built for Performance and Simplicity

While working on various Rust projects that needed to process large JSONL files, I kept running into the same issues:

* Existing solutions weren't async-first
    
* Memory usage would explode with large files
    
* Error handling was either too strict (one bad line kills everything) or too loose
    
* Type safety was often sacrificed for convenience
    

So I built **async-jsonl** to solve these problems. Here's what makes it special:

### 🚀 **Async/Await from the Ground Up**

Built on Tokio, async-jsonl is designed for high-performance async I/O. **No blocking** the event loop here!

```rust
use async_jsonl::Jsonl;
use futures::StreamExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("data.jsonl").await?;
    
    let lines: Vec<_> = jsonl.collect().await;
    for line_result in lines {
        let line = line_result?;
        println!("Raw JSON: {}", line);
    }
    
    Ok(())
}
```

### 💾 **Memory Efficient Streaming**

Process files of any size without loading everything into memory. The streaming approach means you can handle gigabyte files with minimal RAM usage.

```rust
use async_jsonl::{Jsonl, JsonlDeserialize};
use futures::StreamExt;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct LogEntry {
    timestamp: String,
    level: String,
    message: String,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("huge_log_file.jsonl").await?;
    let mut stream = jsonl.deserialize::<LogEntry>();
    
    // Process one record at a time - O(1) memory usage!
    while let Some(entry_result) = stream.next().await {
        let entry = entry_result?;
        if entry.level == "ERROR" {
            println!("🚨 Error at {}: {}", entry.timestamp, entry.message);
        }
    }
    
    Ok(())
}
```

### 🔒 **Type-Safe with Serde Integration**

Full integration with serde means you get compile-time type checking and zero-cost deserialization.

```rust
use async_jsonl::{Jsonl, JsonlDeserialize};
use futures::StreamExt;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct User {
    id: u64,
    name: String,
    email: String,
    #[serde(default)]
    is_verified: bool,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("users.jsonl").await?;
    let users = jsonl.deserialize::<User>();
    
    let results: Vec<_> = users.collect().await;
    for user_result in results {
        let user = user_result?;
        println!("{:?}", user);
    }
    
    Ok(())
}
```

### 🛡️ **Resilient Error Handling**

One of my favorite features: the library continues processing even when individual lines fail to parse. Perfect for real-world data that's not always perfect.

```rust
use async_jsonl::{Jsonl, JsonlValueDeserialize};
use futures::StreamExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("messy_data.jsonl").await?;
    let mut stream = jsonl.deserialize_values();
    
    let mut valid_records = 0;
    let mut error_count = 0;
    
    while let Some(result) = stream.next().await {
        match result {
            Ok(value) => {
                valid_records += 1;
                // Process valid JSON
                println!("Valid record: {}", value);
            }
            Err(e) => {
                error_count += 1;
                eprintln!("Skipping invalid line: {}", e);
                // Keep going!
            }
        }
    }
    
    println!("Processed {} valid records, {} errors", valid_records, error_count);
    Ok(())
}
```

### 📖 **Flexible Input Sources**

Whether you're reading from files, memory, or any `AsyncRead` source, async-jsonl has you covered.

```rust
use async_jsonl::Jsonl;
use std::io::Cursor;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let data = r#"{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Charlie"}
"#;
    
    // Read from memory
    let reader = Cursor::new(data.as_bytes());
    let jsonl = Jsonl::new(reader);
    
    // Or read from a file
    let file_jsonl = Jsonl::from_path("data.jsonl").await?;
    
    Ok(())
}
```

### 🎯 **Advanced Features for Power Users**

The library also includes some advanced features I found myself needing repeatedly:

**Line Counting Without Full Processing:**
```rust
use async_jsonl::{Jsonl, JsonlReader};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let jsonl = Jsonl::from_path("massive_file.jsonl").await?;
    let count = jsonl.count().await?;
    
    println!("File contains {} records", count);
    Ok(())
}
```

## The Technical Journey

Building this library taught me a lot about async Rust and streaming APIs. Some key decisions I made:

1. **Tokio as the Foundation**: Built on `tokio::io` for maximum compatibility with the async ecosystem
    
2. **Stream-based Architecture**: Using futures's `Stream` trait for composable, lazy processing
    
3. **Zero-copy Where Possible**: Minimizing allocations while maintaining safety
    
4. **Comprehensive Error Types**: Using `anyhow` for ergonomic error handling without sacrificing information
    

## Real-World Performance

In my testing with large datasets (&gt;1GB JSONL files), async-jsonl consistently shows:

* **Memory usage**: Constant regardless of file size
    
* **Throughput**: Competitive with the fastest JSONL parsers in any language
    
* **CPU efficiency**: Low overhead thanks to Rust's zero-cost abstractions
    

## Try It Yourself!

The library is available on [crates.io](https://crates.io/crates/async-jsonl) and the source is on [GitHub](https://github.com/gpmcp/async-jsonl). Here's how to get started:

```ini
[dependencies]
async-jsonl = "0.3.1"
tokio = { version = "1.0", features = ["full"] }
futures = "0.3"
serde = { version = "1.0", features = ["derive"] }
anyhow = "1.0"
```

I'd love to hear how you use it in your projects! Whether you're processing logs, handling data pipelines, or working with APIs, async-jsonl is designed to make your life easier.

---

*What do you think? Have you worked with JSONL files before? What features would you find most useful in a JSONL processing library? Let me know in the comments below!*
