背景:我们常用 HashMap 作为我们Java开发时的 K-V 数据存储结构(如 id-person,这个 ID 对应这个人)。我们知道他们的数据结构么,它的 Hash 值是什么意义。Hash 冲突是怎么解决的。我们带着这 2 个问题将 HashMap 做个整体剖析。(其实还有一个问题是,它怎么进行动态扩容的)
一、HashMap 的数据结构是什么。
下面是 HashMap 中的源码。其实 HashMap 的本质是 Node 数组;K-V 结构对应的基础数据结构就是以下源码
transient HashMap.Node<K, V>[] table; static class Node<K,V> implements Map.Entry<K,V> { //Hash 值 final int hash; //Key 值 final K key; //Value 值 V value; //当前 Node 对应的下个 Node(用于解决 Hash 冲突;稍后讲解) Node<K,V> next; Node(int hash, K key, V value, Node<K,V> next) { this.hash = hash; this.key = key; this.value = value; this.next = next; } public final int hashCode() { return Objects.hashCode(key) ^ Objects.hashCode(value); } }
根据上面代码我们知悉,HashMap 中基本的数据结构是 Node。
Map persons = new HashMap<String,String>(); //put 方法会新建一个 Node //hash=k.hashCode() ^ k >>> 16 ; hash 是数组所在位置 //K="1",V="jack";next=null;(无 Hash 冲突情况) persons.put("1","jack"); persons.put("2","john");
上述的 Node 数据结构中的 hash 值的意义是将 Node 均匀的放置在数组[]中。获得 hash 值后使用在 table[(n – 1) & hash] 位置置值。
/** * Computes key.hashCode() and spreads (XORs) higher bits of hash * to lower. Because the table uses power-of-two masking, sets of * hashes that vary only in bits above the current mask will * always collide. (Among known examples are sets of Float keys * holding consecutive whole numbers in small tables.) So we * apply a transform that spreads the impact of higher bits * downward. There is a tradeoff between speed, utility, and * quality of bit-spreading. Because many common sets of hashes * are already reasonably distributed (so don't benefit from * spreading), and because we use trees to handle large sets of * collisions in bins, we just XOR some shifted bits in the * cheapest possible way to reduce systematic lossage, as well as * to incorporate impact of the highest bits that would otherwise * never be used in index calculations because of table bounds. */ static final int hash(Object var0) { int var1; return var0 == null?0:(var1 = var0.hashCode()) ^ var1 >>> 16; }
二、HashMap 怎么处理 hash 冲突
上述说道,Node 中的 hash 值是用 hash 算法得到的,目的是均匀放置,那如果 put 的过于频繁,会造成不同的 key-value 计算出了同一个 hash,这个时候 hash 冲突了,那么这 2 个 Node 都放置到了数组的同一个位置。HashMap 是怎么处理这个问题的呢?
解决方案:HashMap 的数据结构是:数组 Node[]与链表 Node 中有 next Node.
(1)如果上述的 persons.put(“1”,”jack”);persons.put(“2”,”john”); 同时计算到的 hash 值都为 123,那么 jack 先放在第一列的第一个位置 Node-jack,persons.put(“2”,”john”);执行时会将 Node-jack 的 next(Node) = Node(john),Jack 的下个节点将指向 Node(john)。
(2)那么取的时候呢,persons.get(“2”),这个时候取得的 hash 值是 123,即 table[123],这时 table[123]其实是 Node-jack,Key 值不相等,取 Node-jack 的 next 下个 Node,即 Node-John,这时 Key 值相等了,然后返回对应的 person.
三、HashMap 怎么进行动态扩容
JDK1.7 的逻辑相对简单,JDK1.8 使用了红黑树 TreeMap 相对复杂,现在用 1.7 的进行讲解。本质上其实一样。
void resize(int newCapacity) { //传入新的容量 Entry[] oldTable = table; //引用扩容前的 Entry 数组 int oldCapacity = oldTable.length; if (oldCapacity == MAXIMUM_CAPACITY) { //扩容前的数组大小如果已经达到最大(2^30)了 threshold = Integer.MAX_VALUE; //修改阈值为 int 的最大值(2^31-1),这样以后就不会扩容了 return; } Entry[] newTable = new Entry[newCapacity]; //初始化一个新的 Entry 数组 transfer(newTable); //!!将数据转移到新的 Entry 数组里 table = newTable; //HashMap 的 table 属性引用新的 Entry 数组 threshold = (int) (newCapacity * loadFactor);//修改阈值 } void transfer(Entry[] newTable) { Entry[] src = table; //src 引用了旧的 Entry 数组 int newCapacity = newTable.length; for (int j = 0; j < src.length; j++) { //遍历旧的 Entry 数组 Entry<K, V> e = src[j]; //取得旧 Entry 数组的每个元素 if (e != null) { src[j] = null;//释放旧 Entry 数组的对象引用(for 循环后,旧的 Entry 数组不再引用任何对象) do { Entry<K, V> next = e.next; int i = indexFor(e.hash, newCapacity); //!!重新计算每个元素在数组中的位置 e.next = newTable[i]; //标记[1] newTable[i] = e; //将元素放在数组上 e = next; //访问下一个 Entry 链上的元素 } while (e != null); } } } static int indexFor(int h, int length) { return h & (length - 1); }
扩容的方式是新建一个 newTab,是 oldTab 的 2 倍。遍历 oldTab,将 oldTab 赋值进对应位置的 newTab。与 ArrayList 中的扩容逻辑基本一致,只不过 ArrayList 是当前容量+(当前容量>>1)。
JDK1.8 实现 resize()方式
/** * Initializes or doubles table size. If null, allocates in * accord with initial capacity target held in field threshold. * Otherwise, because we are using power-of-two expansion, the * elements from each bin must either stay at same index, or move * with a power of two offset in the new table. * * @return the table */ final Node<K,V>[] resize() { Node<K,V>[] oldTab = table; int oldCap = (oldTab == null) ? 0 : oldTab.length; int oldThr = threshold; int newCap, newThr = 0; if (oldCap > 0) { if (oldCap >= MAXIMUM_CAPACITY) { threshold = Integer.MAX_VALUE; return oldTab; } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) newThr = oldThr << 1; // double threshold } else if (oldThr > 0) // initial capacity was placed in threshold newCap = oldThr; else { // zero initial threshold signifies using defaults newCap = DEFAULT_INITIAL_CAPACITY; newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); } if (newThr == 0) { float ft = (float)newCap * loadFactor; newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE); } threshold = newThr; @SuppressWarnings({"rawtypes","unchecked"}) Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table = newTab; if (oldTab != null) { for (int j = 0; j < oldCap; ++j) { Node<K,V> e; if ((e = oldTab[j]) != null) { oldTab[j] = null; if (e.next == null) newTab[e.hash & (newCap - 1)] = e; else if (e instanceof TreeNode) ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); else { // preserve order Node<K,V> loHead = null, loTail = null; Node<K,V> hiHead = null, hiTail = null; Node<K,V> next; do { next = e.next; if ((e.hash & oldCap) == 0) { if (loTail == null) loHead = e; else loTail.next = e; loTail = e; } else { if (hiTail == null) hiHead = e; else hiTail.next = e; hiTail = e; } } while ((e = next) != null); if (loTail != null) { loTail.next = null; newTab[j] = loHead; } if (hiTail != null) { hiTail.next = null; newTab[j + oldCap] = hiHead; } } } } } return newTab; }