
A Deep Dive into IO Models

Type: Java
Study date: Jul 4, 2020
Status: In progress
References: Website

File System IO

Memory IO

Java plain IO vs. buffered IO

Plain IO
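In the test directory, run the script ./mysh 0 (0 selects the most basic file-write logic). In a second shell window, watch the size of the generated out.txt with ll -h. The file grows visibly slowly, at the KB level.

Then look at the trace files generated by strace. The largest one belongs to the main thread.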
 
-rw-r--r-- 1 root root 4.1K Jun 27 12:12 OSFileIO.class
-rw-r--r-- 1 root root 4.4K Jun 27 11:37 OSFileIO.java
-rwxr-xr-x 1 root root  123 Jun 27 11:11 mysh*
-rw-r--r-- 1 root root  14K Jun 27 12:12 out.7754
-rw-r--r-- 1 root root 4.4M Jun 27 12:15 out.7755
-rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7756
-rw-r--r-- 1 root root 1.3K Jun 27 12:12 out.7757
-rw-r--r-- 1 root root 1.1K Jun 27 12:12 out.7758
-rw-r--r-- 1 root root 1.4K Jun 27 12:12 out.7759
-rw-r--r-- 1 root root 506K Jun 27 12:15 out.7760
-rw-r--r-- 1 root root  41K Jun 27 12:15 out.7761
-rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7762
-rw-r--r-- 1 root root 1.4K Jun 27 12:12 out.7763
-rw-r--r-- 1 root root 1.3K Jun 27 12:12 out.7764
-rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7765
-rw-r--r-- 1 root root  41K Jun 27 12:15 out.7766
-rw-r--r-- 1 root root  12K Jun 27 12:15 out.7767
-rw-r--r-- 1 root root  13K Jun 27 12:15 out.7768
-rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7769
-rw-r--r-- 1 root root 1.2K Jun 27 12:12 out.7770
-rw-r--r-- 1 root root 794K Jun 27 12:15 out.7771
-rw-r--r-- 1 root root 1.9K Jun 27 12:15 out.7772
-rw-r--r-- 1 root root 183K Jun 27 12:15 out.txt
# The main thread's trace file is the largest; here it is out.7755
Open out.7755 in vim and run :set nu to show line numbers. Each write system call writes only 10 bytes of data.
1307 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
1308 write(4, "123456789\n", 10) = 10
1309 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=691940400}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
1310 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
1311 write(4, "123456789\n", 10) = 10
1312 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=702383900}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
1313 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
1314 write(4, "123456789\n", 10) = 10
1315 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=712889000}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
1316 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
1317 write(4, "123456789\n", 10) = 10
1318 futex(0x7f0980023978, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12089, tv_nsec=723286200}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
1319 futex(0x7f0980023928, FUTEX_WAKE_PRIVATE, 1) = 0
1320 write(4, "123456789\n", 10) = 10
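Buffered IO
In the test directory, run the script ./mysh 1 (1 selects the buffered IO logic). In a second shell window, watch out.txt with ll -h again: the file now grows much faster, at the MB level. The trace shows that each write system call now writes 8190 bytes.
strace output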
7420 futex(0x7f4d80023928, FUTEX_WAKE_PRIVATE, 1) = 0
7421 write(4, "123456789\n123456789\n123456789\n12"..., 8190) = 8190
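Summary
Buffered IO collects the written content in a byte array and, once the array is nearly full, hands the whole batch to the kernel with a single write system call. Plain IO issues one system call for every write. Each system call requires a switch from user mode to kernel mode and back, which is expensive, so the two approaches differ in throughput by orders of magnitude.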
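The batching can be observed without strace. The sketch below is not the OSFileIO program from these notes (its source is not shown here); it assumes the buffered path used a BufferedOutputStream with the common 8192-byte buffer. RecordingStream is a hypothetical stand-in for the file that records how many bytes each underlying write receives. With 10-byte lines, 819 lines (8190 bytes) fit into the buffer and the 820th forces a flush, which would explain the 8190-byte writes in the trace.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class WriteCallCounter {

    /** Hypothetical stand-in for a file: records the size of every write it receives. */
    static final class RecordingStream extends OutputStream {
        final List<Integer> writeSizes = new ArrayList<>();

        @Override
        public void write(int b) {
            writeSizes.add(1);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeSizes.add(len);
        }
    }

    private static final byte[] LINE = "123456789\n".getBytes(StandardCharsets.US_ASCII);

    /** Plain IO: every line goes straight to the underlying stream. */
    static List<Integer> plainWrites(int lines) throws IOException {
        RecordingStream sink = new RecordingStream();
        for (int i = 0; i < lines; i++) {
            sink.write(LINE);
        }
        return sink.writeSizes;
    }

    /** Buffered IO: lines are collected in an 8192-byte array and flushed in batches. */
    static List<Integer> bufferedWrites(int lines) throws IOException {
        RecordingStream sink = new RecordingStream();
        try (OutputStream out = new BufferedOutputStream(sink, 8192)) {
            for (int i = 0; i < lines; i++) {
                out.write(LINE);
            }
        } // close() flushes whatever is still in the buffer
        return sink.writeSizes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("plain writes:    " + plainWrites(1000).size());
        System.out.println("buffered writes: " + bufferedWrites(1000));
    }
}
```

For 1000 lines the plain path reaches the sink 1000 times, while the buffered path reaches it twice: one full batch and the remainder flushed on close.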

ByteBuffer in the java.nio package

API usage

Main fields
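// marked position (-1 means no mark is set)
private int mark = -1;
// current position of the cursor
private int position = 0;
// read/write limit (flip() sets it to the old position)
private int limit;
// maximum capacity
private int capacity;
// memory address, used when the buffer is off-heap
long address;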
Main methods
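// Returns the maximum capacity of this buffer
public final int capacity() { return capacity; }

// Returns the current position
public final int position() { return position; }

// Returns the current read/write limit
public final int limit() { return limit; }

// Marks the current position
public final Buffer mark() {
    mark = position;
    return this;
}

// Moves the position back to the previously marked position
public final Buffer reset() {
    int m = mark;
    if (m < 0)
        throw new InvalidMarkException();
    position = m;
    return this;
}

// Clears the buffer. The data is NOT erased; only the indices are
// reinitialized, so later writes simply overwrite the old content
public final Buffer clear() {
    position = 0;
    limit = capacity;
    mark = -1;
    return this;
}

// Switches from writing to reading
public final Buffer flip() {
    limit = position;
    position = 0;
    mark = -1;
    return this;
}

// Starts reading or writing from the beginning again; resets position and mark
public final Buffer rewind() {
    position = 0;
    mark = -1;
    return this;
}

// Number of elements that can still be read or written
public final int remaining() { return limit - position; }

// Whether anything can still be read or written
public final boolean hasRemaining() { return position < limit; }

// Whether the buffer is read-only
public abstract boolean isReadOnly();

// Whether the buffer is backed by an accessible array
public abstract boolean hasArray();

// Returns the backing array (only when hasArray() returns true)
public abstract Object array();

// Whether this is an off-heap (direct) buffer
public abstract boolean isDirect();

// Discards the mark
final void discardMark() { mark = -1; }

// Compacts the buffer: moves the unread bytes to the start and sets
// position to the index right after the last moved byte
public abstract ByteBuffer compact();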
Test case
@Test
public void whatByteBuffer() {
    // ByteBuffer buffer = ByteBuffer.allocate(1024); // on-heap memory
    ByteBuffer buffer = ByteBuffer.allocateDirect(1024); // off-heap memory, allocated via the Unsafe and VM classes through JNI

    System.out.println("position: " + buffer.position());
    System.out.println("limit: " + buffer.limit());
    System.out.println("capacity: " + buffer.capacity());
    System.out.println("mark: " + buffer);

    buffer.put("123".getBytes()); // what is stored are the ASCII values of '1', '2', '3'
    System.out.println("-------------put:123......");
    System.out.println("mark: " + buffer);

    buffer.flip(); // switch from writing to reading
    System.out.println("-------------flip......");
    System.out.println("mark: " + buffer);

    buffer.get();
    System.out.println("-------------get......");
    System.out.println("mark: " + buffer);

    buffer.compact();
    System.out.println("-------------compact......");
    System.out.println("mark: " + buffer);

    buffer.clear();
    System.out.println("-------------clear......");
    System.out.println("mark: " + buffer);
}

// Output:
// position: 0
// limit: 1024
// capacity: 1024
// mark: java.nio.DirectByteBuffer[pos=0 lim=1024 cap=1024]
// -------------put:123......
// mark: java.nio.DirectByteBuffer[pos=3 lim=1024 cap=1024]
// -------------flip......
// mark: java.nio.DirectByteBuffer[pos=0 lim=3 cap=1024]
// -------------get......
// mark: java.nio.DirectByteBuffer[pos=1 lim=3 cap=1024]
// -------------compact......
// mark: java.nio.DirectByteBuffer[pos=2 lim=1024 cap=1024]
// -------------clear......
// mark: java.nio.DirectByteBuffer[pos=0 lim=1024 cap=1024]
P.S. put("123".getBytes()) actually stores the corresponding ASCII codes (49, 50, 51).
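Walk-through of the example
Allocate a 1024-byte buffer (pos=0 lim=1024 cap=1024)
put 3 bytes into it (pos=3 lim=1024)
flip: switch from writing to reading (pos=0 lim=3)
Call get (pos=1 lim=3)
compact: squeeze out the already-read bytes and go back to write mode (pos=2 lim=1024)
clear: reset the indices (pos=0 lim=1024)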
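The test case above does not exercise mark, reset, or rewind. The following is a minimal sketch of how those three move the position, using a small heap buffer; the class name BufferCursorDemo and the sample data are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class BufferCursorDemo {

    /** Reads from a flipped buffer while jumping around with mark/reset/rewind. */
    static String demo() {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put("abcdef".getBytes(StandardCharsets.US_ASCII)); // pos=6 lim=16
        buf.flip();                                            // pos=0 lim=6

        StringBuilder seen = new StringBuilder();
        seen.append((char) buf.get()); // reads 'a', pos=1
        buf.mark();                    // remember pos=1
        seen.append((char) buf.get()); // reads 'b', pos=2
        seen.append((char) buf.get()); // reads 'c', pos=3
        buf.reset();                   // back to the mark, pos=1
        seen.append((char) buf.get()); // reads 'b' again, pos=2
        buf.rewind();                  // pos=0, mark discarded, limit unchanged
        seen.append((char) buf.get()); // reads 'a' again
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Calling reset after rewind would throw InvalidMarkException, because rewind sets the mark back to -1.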
 

DirectByteBuffer

ByteBuffer buffer = ByteBuffer.allocateDirect(1024);

// Implementation in ByteBuffer:
public static ByteBuffer allocateDirect(int capacity) {
    return new DirectByteBuffer(capacity);
}
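The off-heap memory is allocated mainly through the Unsafe class.
Off-heap memory lives in a memory region outside the control of the JVM heap. Java operates on it through the native methods that Unsafe provides for off-heap memory.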
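From the API side, the difference between the two kinds of buffer can be checked with isDirect() and hasArray(). A small sketch (the class name HeapVsDirect is made up for illustration):

```java
import java.nio.ByteBuffer;

final class HeapVsDirect {

    /** Summarizes where a buffer's storage lives. */
    static String describe(ByteBuffer buffer) {
        return "direct=" + buffer.isDirect() + " hasArray=" + buffer.hasArray();
    }

    public static void main(String[] args) {
        // Heap buffer: backed by a byte[] inside the Java heap.
        System.out.println(describe(ByteBuffer.allocate(1024)));
        // Direct buffer: backed by off-heap memory, so there is no accessible array.
        System.out.println(describe(ByteBuffer.allocateDirect(1024)));
    }
}
```

Calling array() on the direct buffer would throw UnsupportedOperationException, which is why code that handles both kinds should check hasArray() first.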

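Why use off-heap memory

  • Shorter garbage collection pauses. Off-heap memory is managed directly by the operating system rather than by the JVM, so using it lets the application keep the heap small, which reduces the impact of GC pauses on the application.
  • Better I/O performance. I/O usually involves copying data from heap memory to off-heap memory. Short-lived staging data that has to be copied between memory regions frequently is therefore better kept off-heap in the first place.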
// Primary constructor
//
DirectByteBuffer(int cap) {                   // package-private
    super(-1, 0, cap, cap);
    boolean pa = VM.isDirectMemoryPageAligned();
    int ps = Bits.pageSize();
    long size = Math.max(1L, (long)cap + (pa ? ps : 0));
    Bits.reserveMemory(size, cap);

    long base = 0;
    try {
        base = unsafe.allocateMemory(size);
    } catch (OutOfMemoryError x) {
        Bits.unreserveMemory(size, cap);
        throw x;
    }
    unsafe.setMemory(base, size, (byte) 0);
    if (pa && (base % ps != 0)) {
        // Round up to page boundary
        address = base + ps - (base & (ps - 1));
    } else {
        address = base;
    }
    cleaner = Cleaner.create(this, new Deallocator(base, size, cap));
    att = null;
}
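Cleaner extends PhantomReference, one of Java's four reference types. The referent cannot be retrieved through a phantom reference, and an object that is reachable only through phantom references can be reclaimed at any GC. PhantomReference is usually combined with a ReferenceQueue, which lets the program be notified and release resources when the referent is garbage collected. When an object referenced by a Cleaner is about to be reclaimed, the garbage collector puts the reference on the pending list of the Reference class, where it waits for the Reference-Handler thread. Reference-Handler is a daemon thread with the highest priority that loops over the pending list and calls each Cleaner's clean method to do the cleanup.
So once a DirectByteBuffer is referenced only by its Cleaner (that is, only through a phantom reference), it can be reclaimed at any GC. When the DirectByteBuffer instance is reclaimed, the Reference-Handler thread calls the Cleaner's clean method, which releases the off-heap memory through the Deallocator that was passed in when the Cleaner was created.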
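The same notification pattern can be tried with a plain PhantomReference and ReferenceQueue. This is only a sketch of the mechanism, not the JDK's Cleaner code: the class PhantomCleanupSketch and its methods are made up for illustration, and because System.gc() is only a hint, the second method may well report that nothing was enqueued within the timeout.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

final class PhantomCleanupSketch {

    /** A phantom reference never hands out its referent, even while it is alive. */
    static boolean referentIsHidden() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object owner = new Object();
        PhantomReference<Object> ref = new PhantomReference<>(owner, queue);
        boolean hidden = ref.get() == null;
        // Touch owner afterwards so it is clearly still reachable during the check.
        return hidden && owner != null;
    }

    /**
     * Drops the only strong reference and waits for the reference to show up
     * on the queue, which is the point where cleanup code would run.
     * timeoutMillis must be positive (0 would wait forever).
     */
    static boolean enqueuedAfterGc(long timeoutMillis) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> ref = new PhantomReference<>(new Object(), queue);
        System.gc(); // a request, not a guarantee
        Reference<?> polled = queue.remove(timeoutMillis);
        return polled == ref;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("get() returns null: " + referentIsHidden());
        System.out.println("enqueued after GC:  " + enqueuedAfterGc(1000));
    }
}
```

In the DirectByteBuffer case, the work done after the reference is dequeued is the Deallocator run that frees the off-heap block.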
 

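RandomAccessFile: random-access reads and writes

RandomAccessFile can both read a file's content and write data to the file. It also supports "random access": the program can jump directly to any position in the file and read or write there.
RandomAccessFile lets the program position the file pointer freely, so output does not have to start at the beginning of the file. This makes it possible to append to an existing file, and RandomAccessFile is a good fit when a program needs to do that.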
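A minimal sketch of both uses, appending and overwriting in the middle, on a temporary file. The class name AppendWithRaf and the file name prefix are made up; the strings mirror the ones used in the example later in this section.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class AppendWithRaf {

    /** Appends a line to an existing file, then overwrites 4 bytes at offset 4. */
    static String appendAndPatch() throws IOException {
        Path file = Files.createTempFile("raf-demo", ".txt");
        try {
            Files.write(file, "hello world\n".getBytes(StandardCharsets.US_ASCII));

            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
                // Append: move the file pointer to the end before writing.
                raf.seek(raf.length());
                raf.write("hello java\n".getBytes(StandardCharsets.US_ASCII));

                // Random access: jump back to offset 4 and overwrite in place.
                raf.seek(4);
                raf.write("ooxx".getBytes(StandardCharsets.US_ASCII));
            }
            return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(appendAndPatch());
    }
}
```

Writing at offset 4 replaces the existing bytes rather than inserting, so the file length stays the same.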
Common methods
/**
 * Returns the unique {@link java.nio.channels.FileChannel FileChannel}
 * object associated with this file.
 *
 * <p> The {@link java.nio.channels.FileChannel#position()
 * position} of the returned channel will always be equal to
 * this object's file-pointer offset as returned by the {@link
 * #getFilePointer getFilePointer} method.  Changing this object's
 * file-pointer offset, whether explicitly or by reading or writing bytes,
 * will change the position of the channel, and vice versa.  Changing the
 * file's length via this object will change the length seen via the file
 * channel, and vice versa.
 *
 * @return  the file channel associated with this file
 *
 * @since 1.4
 * @spec JSR-51
 */
public final FileChannel getChannel() {
    synchronized (this) {
        if (channel == null) {
            channel = FileChannelImpl.open(fd, path, true, rw, this);
        }
        return channel;
    }
}

/**
 * Sets the file-pointer offset, measured from the beginning of this
 * file, at which the next read or write occurs.  The offset may be
 * set beyond the end of the file. Setting the offset beyond the end
 * of the file does not change the file length.  The file length will
 * change only by writing after the offset has been set beyond the end
 * of the file.
 *
 * @param      pos   the offset position, measured in bytes from the
 *                   beginning of the file, at which to set the file
 *                   pointer.
 * @exception  IOException  if {@code pos} is less than
 *                          {@code 0} or if an I/O error occurs.
 */
public void seek(long pos) throws IOException {
    if (pos < 0) {
        throw new IOException("Negative seek offset");
    } else {
        seek0(pos);
    }
}
Example
// Test file NIO
public static void testRandomAccessFileWrite() throws Exception {
    RandomAccessFile raf = new RandomAccessFile(path, "rw");

    raf.write("hello world\n".getBytes());
    raf.write("hello java\n".getBytes());
    System.out.println("write------------");
    System.in.read();

    // write at an offset of 4 bytes from the start of the file
    raf.seek(4);
    raf.write("ooxx".getBytes());
    System.out.println("seek---------");
    System.in.read();

    FileChannel rafchannel = raf.getChannel();
    // mmap: off-heap memory mapped to the file; it holds bytes, not objects
    MappedByteBuffer map = rafchannel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);

    map.put("@@@".getBytes()); // not a system call, but the data still reaches the kernel's page cache
    // Previously a system call such as out.write() was needed to get the program's data into the page cache,
    // which always meant a switch between user mode and kernel mode.
    // The mmap mapping is still governed by the kernel's page cache!
    // In other words, data can still be lost.
    // There are JNI extension libraries on GitHub, written by C programmers, that use the Linux kernel's Direct IO.
    // Direct IO bypasses the Linux page cache:
    // the program allocates its own byte array to act as the page cache and has to maintain
    // consistency, dirty pages and a series of other complex issues in its own code.
    System.out.println("map--put--------");
    System.in.read();

    // map.force(); // flush

    raf.seek(0);

    ByteBuffer buffer = ByteBuffer.allocate(8192);
    // ByteBuffer buffer = ByteBuffer.allocateDirect(1024);

    int read = rafchannel.read(buffer); // fills the ByteBuffer, equivalent to buffer.put()
    System.out.println(buffer);
    buffer.flip();
    System.out.println(buffer);

    for (int i = 0; i < buffer.limit(); i++) {
        Thread.sleep(200);
        System.out.print(((char) buffer.get(i)));
    }
}
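Running the script
The program blocks at the first System.in.read(). At this point the content has already been written to the page cache.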
root@Code:~/develop/test# ./mysh* 2
write------------

root@Code:~/develop/test# cat out.txt && pcstat out.txt
hello world
hello java
+---------+----------------+------------+-----------+---------+
| Name    | Size (bytes)   | Pages      | Cached    | Percent |
|---------+----------------+------------+-----------+---------|
| out.txt | 31             | 1          | 1         | 100.000 |
+---------+----------------+------------+-----------+---------+
Type any line to release the blocked read.
root@Code:~/develop/test# ./mysh* 2
write------------
啊
seek---------
map--put--------
java.nio.HeapByteBuffer[pos=4096 lim=8192 cap=8192]
java.nio.HeapByteBuffer[pos=0 lim=4096 cap=8192]
@@@looxxrld
hello java

root@Code:~/develop/test# cat out.txt && pcstat out.txt
@@@looxxshibing
hello java
+---------+----------------+------------+-----------+---------+
| Name    | Size (bytes)   | Pages      | Cached    | Percent |
|---------+----------------+------------+-----------+---------|
| out.txt | 4096           | 1          | 1         | 100.000 |
+---------+----------------+------------+-----------+---------+

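mmap memory mapping

The example above uses FileChannel.map to create a direct memory mapping. The mmap system call maps the file into the process's address space (the mapping shows up as a mem entry in the FD column of lsof). After that, the file can be modified through the mapped buffer without going through read/write system calls: the writes go straight to the corresponding page cache pages via the mapping.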
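A stripped-down version of the mapping step, as a sketch on a temporary file. The class name MmapSketch is made up, and the force() call is added here only to push the dirty page to the storage device; the write is visible to other readers of the file through the page cache either way on Linux.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapSketch {

    /** Modifies a file through a mapping, then reads it back the ordinary way. */
    static byte[] writeThroughMapping() throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "hello world\n".getBytes(StandardCharsets.US_ASCII));

        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping 4096 bytes over a shorter file extends the file to 4096 bytes.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            // A plain memory store: no write() system call is issued for this.
            map.put("@@@".getBytes(StandardCharsets.US_ASCII));
            // Ask the OS to write the dirty page back to the storage device.
            map.force();
        }
        return Files.readAllBytes(file);
    }

    public static void main(String[] args) throws IOException {
        byte[] content = writeThroughMapping();
        System.out.println("file size: " + content.length);
        System.out.print(new String(content, 0, 12, StandardCharsets.US_ASCII));
    }
}
```

The size change matches the pcstat output above, where out.txt grows to 4096 bytes after the mapping is created.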

Network IO