------------------------------------------------------------------------ ByteString : Fast, packed strings of bytes ------------------------------------------------------------------------ This library provides the Data.ByteString library -- strict and lazy byte arrays manipulable as strings -- providing very time and space efficient string and IO operations. For very large data requirements, or constraints on heap size, Data.ByteString.Lazy is provided, a lazy list of bytestring chunks. Efficient processing of multi-gigabyte data can be achieved this way. Requirements: > Cabal > GHC 6.4 or greater, or hugs Building: > runhaskell Setup.lhs configure --prefix=/f/g > runhaskell Setup.lhs build > runhaskell Setup.lhs install After installation, you can run the testsuite as follows: > cd tests ; make or > cd tests ; make hugs For the full test and benchmark suite, you need GHC and Hugs: > cd tests ; make everything Authors: ByteString was derived from the GHC PackedString library, originally written by Bryan O'Sullivan, and then by Simon Marlow. It was adapted, and greatly extended for darcs by David Roundy, and others. Don Stewart cleaned up and further extended the implementation. Duncan Coutts wrote much of the .Lazy code. Don, Duncan and Roman Leshchinskiy wrote the fusion system. ------------------------------------------------------------------------ Performance, some random numbers (with GHC): This table compares the performance of common operations ByteString, from various string libraries. Size of test data: 21256k, Linux 3.2Ghz P4 FPS7 SPS PS [a] ++ 0.028 ! ! 1.288 length 0.000 0.000 0.000 0.131 pack 0.303 0.502 0.337 - unpack 3.319* 1.630 7.445 - compare 0.000 0.000 0.000 0.000 index 0.000 0.000 0.000 0.000 map 2.762* 2.917 4.813 7.286 filter 0.304 2.805 0.954 0.305 take 0.000 0.000 0.024 0.005 drop 0.000 0.000 11.768 0.130 takeWhile 0.000 1.498 0.000 0.000 dropWhile 0.000 1.985 8.447 0.130 span 0.000 9.289 11.144 0.131 break 0.000 9.383 11.268 0.133 lines 0.052 1.114 1.367 2.790 unlines 0.048 ! ! 10.950 words 1.344 2.128 5.644 4.184 unwords 0.016 ! ! 1.305 reverse 0.024 12.997 13.018 1.622 concat 0.000 12.701 11.459 1.163 cons 0.016 2.064 8.358 0.131 empty 0.000 0.000 0.000 0.000 head 0.000 0.000 0.000 0.000 tail 0.000 0.000 14.490 0.130 elem 0.000 1.490 0.001 0.000 last 0.000 - - 0.143 init 0.000 - - 1.147 inits 0.414 - - ! tails 0.460 - - 1.136 intersperse 0.040 - - 10.517 any 0.000 - - 0.000 all 0.000 - - 0.000 sort 0.168 - - ! maximum 0.024 - - 0.183 minimum 0.025 - - 0.185 replicate 0.000 - - 0.053 findIndex 0.096 find 0.120 - - 0.000 elemIndex 0.000 - - 0.000 elemIndicies 0.008 - - 0.314 foldl 0.148 spanEnd 0.000 snoc 0.016 filterChar 0.031 filterNotChar 0.124 join 0.016 split 0.032 findIndices 0.408 splitAt 0.000 lineIndices 0.029 breakOn 0.000 breakSpace 0.000 splitWith 0.329 dropSpace 0.000 dropSpaceEnd 0.000 joinWithChar 0.017 join / 0.016 zip 0.960 zipWith 0.892 isSubstringOf 0.039 isPrefixOf 0.000 isSuffixOf 0.000 count 0.021 Key: FPS6 = FPS 0.6 SPS = Simon Marlow's packedstring prototype PS = Data.PackedString [a] = [Char] - = no function exists ! = stack or memory exhaustion ------------------------------------------------------------------------ == Stress testing really big strings Doing some stress testing of FPS, here are some results for 0.5G strings. 3.2Ghz box, 2G physical mem. Size of test data: 524288k Size of test data: 524288k Char8 Word8 Effectively O(1) or O(m) where m < n all 0.000 0.000 any 0.000 0.004 break 0.000 0.000 breakChar 0.000 0.000 breakSpace 0.000 compare 0.000 concat 0.000 drop 0.000 dropSpace 0.000 dropSpaceEnd 0.000 dropWhile 0.000 0.000 elem 0.000 0.000 elemIndex 0.000 0.000 elemIndexLast 0.000 0.000 empty 0.000 head 0.000 0.000 index 0.000 0.000 init 0.000 last 0.000 0.000 length 0.000 notElem 0.000 0.000 span 0.000 0.000 spanChar 0.000 0.000 spanEnd 0.000 0.000 splitAt 0.000 tail 0.000 take 0.000 takeWhile 0.000 0.000 isPrefixOf 0.000 isSuffixOf 0.000 addr1 0.000 addr2 0.000 O(n) ++ 0.676 map 6.080 5.868 cons 0.396 0.396 snoc 0.400 0.400 find 3.240 split 1.204 1.200 lines 2.000 foldl 3.804 unwords 0.552 reverse 0.884 findIndex 3.128 filterChar 0.756 0.732 filter/='f' 8.265 7.012 filterNotChar 4.456 3.388 join 0.400 sort 4.344 maximum 0.776 0.764 minimum 0.772 0.776 replicate 0.008 0.000 elemIndices 0.240 0.240 lineIndices 1.092 joinWithChar 0.400 0.400 isSubstringOf 0.052 count 0.748 slow O(n) words 38.722 group 77.261 groupBy 96.226 inits 32.430 tails 23.225 findIndices 13.841 15.825 splitWith 18.445 19.225 zip 33.926 zipWith 33.562