
One Billion Row Challenge in Golang - From 95s to 1.96s

🌈 Abstract

The article describes the author's experience optimizing a Go program for the One Billion Row Challenge, which involves reading a one-billion-line file, aggregating the data, and printing a report with the results, taking the runtime from 95 seconds down to 1.96 seconds.

🙋 Q&A

[01] Optimizing the Go Program

1. What were the initial steps taken to optimize the Go program?

  • The author started with a naive approach and found that the initial implementation took around 95 seconds to run.
  • The author then explored various techniques to improve the performance, such as using a scanner buffer, bufio.Reader, and direct file reading.
  • The author found that using a large buffer size (around 4MB) with file.Read gave the best performance, bringing the time to read through the file down to around 1 second.

2. How did the author leverage goroutines and communication to further optimize the program?

  • The author created multiple goroutines to process the data in parallel and used channels to communicate between the main thread and the worker goroutines.
  • The author also implemented a custom "leftover logic" to handle incomplete lines at the end of each buffer read.
  • The author experimented with different numbers of workers and channel buffer sizes to find the optimal configuration, which resulted in a further reduction in the execution time.

3. What other optimizations did the author implement?

  • The author replaced the built-in bytes.Split function with a custom implementation to avoid memory allocations.
  • The author used a custom hash function (DJB2) instead of the standard library's FNV-1a (hash/fnv) to speed up map lookups.
  • The author replaced strconv.ParseFloat with a custom function that parses each temperature as a fixed-point integer (tenths of a degree), sidestepping floating-point parsing entirely.
  • The author removed the need for separate name and temperature buffers by using slices of the read buffer directly.

4. How did the author address the issue of incomplete lines at the end of each buffer read?

  • The author introduced a "Trash Bin" goroutine that received the incomplete lines from the worker goroutines and merged them to form complete lines.
  • The Trash Bin goroutine used a mutex to ensure concurrency safety when accessing the file and merging the incomplete lines.

5. What was the final optimized configuration and performance of the program?

  • The final optimized configuration used 75 worker goroutines and a read buffer size of 4MB.
  • With this configuration, the author was able to achieve a consistent execution time of around 1.97 seconds, with a minimum of 1.956 seconds and a maximum of 1.986 seconds across 55 runs.

[02] Lessons Learned

1. What were the key insights the author gained from this optimization process?

  • The author was surprised by how slow the initial naive implementation was and how much optimization was required to reach reasonable performance.
  • The author acknowledged their lack of expertise in low-level optimizations and the importance of profiling and understanding the underlying system behavior.
  • The author found that techniques like custom data structures, hash functions, and byte-level processing can have a significant impact on performance, even in a high-level language like Go.

2. What were the author's thoughts on the challenge and the overall experience?

  • The author found the One Billion Row Challenge to be an interesting "toy exercise" to experiment with Go and test the limits of their programming skills.
  • The author was satisfied with the final optimized solution, which achieved a performance close to the best Java-based solutions, despite their self-proclaimed lack of expertise in low-level optimizations.
  • The author acknowledged that the optimizations were specific to the given dataset and may not generalize to other scenarios, highlighting the importance of understanding the problem domain and the data characteristics.


Shared by Daniel Chen
© 2024 NewMotor Inc.