Have you heard about FIX protocol? It is a financial exchange protocol. It is used extensively as a de-facto format to process in many areas and the format itself it is kind of many dictionary key-value pairs.
So, can you make a quick parser to process FIX files? I did write a mini FIX parser in Java and it uses FlatCollections for tokenizing and the final numbers are really great. But let's clear the ground: most of the ideas are in fact not mine, and they are based on talks about "Mechanical Sympathy" (I recommend presentations of Martin Thomson) meaning that if you understand the hardware (or at least the compilers and the internal costs of it) you can achieve really of high numbers.
So I looked around to QuickFix library, a standard and opensource (complete) implementation of FIX protocol, but it also has some problems of how the code is running so I took all example of FIX protocol sample files. Files: around 450 files combined at 475KB of ASCII files and I setup my internal benchmark as following: considering that I will have them in memory, how quick can I parse them, give full tag to user and it is good enough info to recreate the data. As the code for one file should be really quick (if there is no allocation in file row splitting, which I already did), I made the following "benchmark": how many times in a second I can iterate these files (if they are already saved in memory), split them into rows and tokenize them. The short answer: between 700 to 895 iterations (using one core of Intel Core i7-4710HQ CPU @ 2.50GHz). The variation I think is related with CPU's Turbo. I am not aware of code having hidden allocations (so is allocation free). If there are few allocations (which were done before usage Flat Collections) you will get in 500-700 iterations range (or 2.5 Gbps processing speed)
So, if you have (on average) 800 iterations per second, you can parse around 380 MB/s FIX messages (or around 3 Gbps) using just one core of one laptop using Java (Java 8u61/Windows). If you want another statistic, most messages are few tens of bytes, so, it is safe assume that this parsing code scans 20 million messages/second.
I don't endorse switching your QuickFix to this minimal Fix implementation, but who knows, if you need a good starting point (and who knows, support ;) ) to write a very fast Quick parser, this is a good point to start.
So, if you want to look inside the implementation: