Read each line of a file using MappedByteBuffer in Java
Question
I am working on a task where I have to read a huge file (about 1.5 GB, roughly 16,000,000 records). I have two options in front of me:
- Use BufferedReader, which gives me each row as a String.
- Use MappedByteBuffer with FileChannel and RandomAccessFile.
In a raw test that only counts the records, option 1 takes around 2900 ms and option 2 takes around 1450 ms.
The test programs are as follows:
Option 1:
public static void reviewBufferedReader () {
    long lineNumber = 0;
    String line = null;
    try (BufferedReader b = Files.newBufferedReader(Paths.get("D:\\Temp Data Files\\Data1.txt"), StandardCharsets.UTF_8)) {
        executor = RecordsDistributionExecutor.getInstance();
        while ((line = b.readLine()) != null) {
            lineNumber++;
        }
    } catch (Exception e) {
        System.err.println("Error in reviewBufferedReader : "+e.getMessage());
    } finally {
    }
    System.out.println("Total no. of lines: "+lineNumber);
}
Option 2:
public static void reviewFileChannelWithMappedByteBuffer () {
    long lineNumber = 0;
    try (RandomAccessFile raFile = new RandomAccessFile("D:\\Temp Data Files\\Data2.txt", "r");
         FileChannel inChannel = raFile.getChannel();) {
        MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
        buffer.load();
        char c;
        for (int i = 0; i < buffer.limit(); i++) {
            c = (char) buffer.get();
            if ('\n' == c) {
                lineNumber++;
            }
        }
        buffer.clear(); // do something with the data and clear/compact it.
    } catch (Exception e) {
        System.err.println("Error in reviewFileChannelWithMappedByteBuffer : "+e.getMessage());
    } finally {
    }
    System.out.println("Total no. of lines: "+lineNumber);
}
As I said, option 2 takes less time in basic testing.
My question: is it possible to read the file data line by line with option 2, the way option 1 does?
Thanks,
Atul
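On the line-by-line part of the question: MappedByteBuffer has no readLine(), but one possible approach is to scan the mapped bytes for '\n' and decode each slice into a String. The following is only a minimal sketch under some assumptions: the file is UTF-8 (as in option 1), it fits in a single mapping (under 2 GB, which the 1.5 GB file does), and the names MappedLineReader and forEachLine are made up for illustration. Splitting on the byte 0x0A is safe for UTF-8 because that byte never appears inside a multi-byte sequence.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

// Hypothetical helper, not part of the original post.
final class MappedLineReader {

    /** Passes each line (without its terminator) to handler; returns the line count. */
    static long forEachLine(Path file, Consumer<String> handler) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                // A single mapping cannot exceed Integer.MAX_VALUE bytes.
                throw new IOException("File too large for one mapping; map it in chunks");
            }
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            int limit = buffer.limit();
            int lineStart = 0;
            long lines = 0;
            for (int i = 0; i < limit; i++) {
                if (buffer.get(i) == '\n') {          // absolute get, position untouched
                    handler.accept(decode(buffer, lineStart, i));
                    lines++;
                    lineStart = i + 1;
                }
            }
            if (lineStart < limit) {                  // last line has no trailing newline
                handler.accept(decode(buffer, lineStart, limit));
                lines++;
            }
            return lines;
        }
    }

    /** Decodes bytes [start, end) as UTF-8, dropping a trailing '\r' if present. */
    private static String decode(ByteBuffer buffer, int start, int end) {
        if (end > start && buffer.get(end - 1) == '\r') {
            end--;
        }
        byte[] bytes = new byte[end - start];
        ByteBuffer view = buffer.duplicate();         // independent position, shared content
        view.position(start);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

It could be called as MappedLineReader.forEachLine(Paths.get("D:\\Temp Data Files\\Data2.txt"), line -> { /* process line */ }). Keep in mind that building a String per line is extra work that the byte-counting loop in option 2 does not do, so the gap between the two options may narrow once real lines are produced; it is worth re-measuring.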