Generating a large test file with Scala like the Linux dd command

I was writing functionality to merge download chunks yesterday. This functionality takes a file that has been downloaded in chunks and merges these chunks back into a single file. While writing the test class, I did not want my test to depend on actually downloading a file in chunks, so I needed a way to quickly generate a number of chunks that could be merged. Under Linux, the dd command can generate a file of a given size filled with random contents (for example by reading from /dev/urandom). I wrote a small helper class that does exactly this, with the extra twist that it splits the file into a number of chunks.

This is the code:

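import java.io.{File, FileOutputStream}
import java.net.URL
import org.apache.commons.io.FileUtils
import scala.collection.mutable.LinkedHashSet
import scala.util.Random

class GenerateTestChunks {

  // Splitter (with CHUNK_FILE_EXT) and Chunk come from the rest of the project, which is not shown here
  import Splitter._

  def generate(f: File, size: Long, numChunks: Int, chunkSize: Int): LinkedHashSet[Chunk] = {
    val chunks = LinkedHashSet[Chunk]()
    val seed = System.currentTimeMillis()
    val random = new Random(seed)
    val data = new Array[Byte](chunkSize)
    for (i <- 1 to numChunks) {
      val chunkFile = new File(f.getParentFile, f.getName + f"-$i%06d$CHUNK_FILE_EXT")
      val startChunk = (i - 1) * chunkSize
      random.nextBytes(data) // fill the buffer with fresh random bytes for this chunk
      writeFile(chunkFile, data)
      chunks += new Chunk(i, new URL("http://localhost/"), chunkFile, startChunk, chunkSize) // placeholder URL (java.net.URL rejects an empty string)
    }
    chunks
  }

  private def writeFile(f: File, data: Array[Byte]) = {
    var out: FileOutputStream = null

    try {
      out = FileUtils.openOutputStream(f)
      out.write(data)
    } finally {
      if (out != null) out.close() // always release the file handle
    }
  }
}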

Note: The contents of the chunks are generated with the nextBytes method of the Random class, which fills the given buffer with randomly generated bytes. Very simple, but effective.
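To show nextBytes in isolation, here is a minimal sketch. The names NextBytesDemo and randomBytes are made up for this illustration and are not part of the project. It assumes scala.util.Random and uses a fixed seed instead of the current time, so the generated bytes are reproducible between runs.

```scala
import scala.util.Random

object NextBytesDemo {

  // Returns `size` pseudo-random bytes produced from the given seed.
  def randomBytes(seed: Long, size: Int): Array[Byte] = {
    val data = new Array[Byte](size)
    new Random(seed).nextBytes(data) // overwrites every element of `data`
    data
  }
}
```

Calling NextBytesDemo.randomBytes twice with the same seed and size yields the same bytes, which can be handy when a failing test needs to be repeated with identical input.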

This code could very easily be adapted to generate a single very large file, like the Linux dd command. Of course you can easily change the code to Java if that is your preference.
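One way such an adaptation might look is sketched below. This is not code from the project: the object name GenerateLargeFile and the bufferSize parameter are assumptions made for the example. It writes the file buffer by buffer, so memory use stays constant however large the file is.

```scala
import java.io.{File, FileOutputStream}
import scala.util.Random

object GenerateLargeFile {

  // Writes `size` random bytes to `f`, at most `bufferSize` bytes at a time.
  def generate(f: File, size: Long, bufferSize: Int = 1024 * 1024): Unit = {
    require(size >= 0, "size must not be negative")
    require(bufferSize > 0, "bufferSize must be positive")
    val random = new Random(System.currentTimeMillis())
    val buffer = new Array[Byte](bufferSize)
    val out = new FileOutputStream(f)
    try {
      var remaining = size
      while (remaining > 0) {
        random.nextBytes(buffer) // refill the buffer with new random bytes
        val n = math.min(remaining, bufferSize.toLong).toInt
        out.write(buffer, 0, n) // the last write may be shorter than the buffer
        remaining -= n
      }
    } finally {
      out.close()
    }
  }
}
```

For example, GenerateLargeFile.generate(new File("/tmp/test.bin"), 100L * 1024 * 1024) should produce a 100 MiB file, roughly what dd if=/dev/urandom of=/tmp/test.bin bs=1M count=100 does on Linux.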

To keep things simple and comprehensible, only the code for the chunk generator is shown. If you are interested in the rest of the code, please feel free to check out my GitHub.

Hope this is of help to someone. Happy coding.
