【数据结构】(11) Map 和 Set

一、Map 和 Set 的简介

1、Set 和 Map

Map 和 Set 是集合类框架学习的最后一部分。Map 和 Set 都是接口，需要通过 TreeSet、HashSet 和 TreeMap、HashMap 实例化。注意，Set 实现了 Collection，Map 并没有。

Set 存放的是键（Key），必须唯一；而 Map 存放的是键值对（Key-Value），Value可以不唯一。

2、Set 和 Map 的应用场景

应用于需要搜索的场景。我们常用的搜索方法有：

遍历。（大数据量时，效率低，O(N)）。
二分查找。（效率较高，O(logN)，但要求数据有序）。

以上两种适用于静态数据的搜索，比如，用二分查找搜索动态数据，每次更改数据值都会使数据重新排序，导致效率低下。动态数据的搜索场景适合用 Set 和 Map，一是效率高，最高达 O(logN) 或 O(1)；二是动态扩展性能良好，能自动调整内部结构，保持原有性质。

3、Map 的使用

3.1、Map.Entry<K, V>

内部类，相当于二叉树中的 Node。

属性：

提供的方法：获取 key、value，设置 value，重写的 equals、toString、hashCode。

3.2、常用方法

补充：HashMap 不是线程安全的，currentHashMap 是线程安全的。

4、Set 的使用

4.1、常用方法

4.2、其它

Set 的底层是 Map，TreeSet 中默认 value 是 Object 对象插入TreeMap。

LinkedHashSet 是在 HashSet 的基础上，维护一个双向链表记录元素的插入顺序或访问顺序，使 HashSet 可以按照这个顺序迭代。

二、二叉搜索树

1、什么是二叉搜索树

左子树所有结点小于根结点。
右子树所有结点大于根结点。
所有子树都满足以上条件。

2、二叉搜索树的性能

如果搜索 key，从 root 开始判断，比 root.val 小直接往左走，比 root.val 小直接往右走。对于完全二叉搜索树来说，最差时间复杂度是树高 O(logN)；对于极度不平衡的二叉搜索树来说，最差时间复杂度是 O(N)，相当于遍历了所有结点。

因此，我们希望二叉搜索树尽量平衡。在源码中，TreeMap 和 TreeSet 的底层是用红黑树实现的，红黑树能通过各种操作使二叉搜索树平衡（高度差≤1）。

3、实现简单的二叉搜索树（不考虑平衡）

3.1、属性

public class BinarySearchTree {
    static class TreeNode {
        int val;
        TreeNode left;
        TreeNode right;
        
        public TreeNode(int val) {
            this.val = val;
        }
    }
    private TreeNode root;
    ......
}

3.2、查找

子树为空，结束查找，未找到，返回 null。
key 与子树根值相同，返回根。
key 与子树根值不相同：若 key 比子树根值小，继续查找子树的左子树；若 key 比子树根值大，继续查找子树的右子树。

    public TreeNode search(int key) {
        TreeNode curRoot = root; // 当前子树根节点
        while (curRoot!= null) {
            if (key == curRoot.val) { // 找到
                return curRoot;
            }
            else if (key < curRoot.val) { // 继续查找左子树
                curRoot = curRoot.left;
            }
            else {
                curRoot = curRoot.right; // 继续查找右子树
            }
        }
        return null; // 没有找到
    }

3.3、插入

树为空，新结点作为根。
比子树根小，插入到左子树中。
比子树根大，插入到右子树中。
子树为空，插入为其父结点的孩子节点。若空树是父节点的左子树（key 比父节点小），插入为左孩子；若空树是父节点的右子树（key 比父节点大），插入为右孩子。

    public void insert(int val) {
        TreeNode newNode = new TreeNode(val);
        if (root == null) { // 树为空，直接插入为根节点
            root = newNode;
            return;
        }
        TreeNode preRoot = null; // 记录父节点
        TreeNode curRoot = root;
        while (curRoot!= null) {
            if (val == curRoot.val) { // 已经存在，不再插入
                return;
            } else if (val < curRoot.val) { // 插入左子树
                    preRoot = curRoot;
                curRoot = curRoot.left;
            } else { // 插入右子树
                    preRoot = curRoot;
                curRoot = curRoot.right;
            }
        }
        if (val < preRoot.val) { // 插入为父节点的左子树
            preRoot.left = newNode;
        } else { // 插入为父节点的右子树
            preRoot.right = newNode;
        }
    }

3.4、删除

空树，不需要删除。
search，找到需要删除的节点 node。
如果 node 没有左孩子，右孩子替换 node 。node 为树根，右孩子作为树根；node 不为树根，node 的父节点接 node 的右孩子。
如果 node 没有右孩子，左孩子替换 node 。node 为树根，左孩子作为树根；node 不为树根，node 的父节点接 node 的左孩子。
如果 node 有左右孩子，找到 node 的左子树中的最大节点（最深的一个右孩子）或者右子树中的最小节点（最深的一个左孩子），替换 node。

情况1，delete 不是 parent：parent 的 left 接 target 的 right。

情况2，delete 是 parent：parent 的 right 接 target 的 right。

    public void delete(int val) {
//        if (root == null) { // 树为空，不再删除
//            return;
//        }
        // 找到待删除节点
        TreeNode preRoot = null; // 记录待删除节点的父节点
        TreeNode curRoot = root;
        while (curRoot!= null) {
            if (val == curRoot.val) { // 找到待删除节点
                break;
            } else if (val < curRoot.val) { // 继续查找左子树
                preRoot = curRoot;
                curRoot = curRoot.left;
            } else { // 继续查找右子树
                preRoot = curRoot;
                curRoot = curRoot.right;
            }
        }
        if (curRoot == null) { // 待删除节点不存在，包含树空的情况
            return;
        }
        deleteNode(curRoot, preRoot);
    }
    
    private void deleteNode(TreeNode node, TreeNode preNode) {
        // 待删除节点左子树为空
        if (node.left == null) {
            if(root == node) { // 待删除节点是根节点
                root = node.right;
            } else if (preNode.left == node) { // 待删除节点是父节点的左子树
                preNode.left = node.right;
            } else { // 待删除节点是父节点的右子树
                preNode.right = node.right;
            }
            return;
        }
        // 待删除节点右子树为空
        if (node.right == null) {
            if(root == node) { // 待删除节点是根节点
                root = node.left;
            } else if (preNode.left == node) { // 待删除节点是父节点的左子树
                preNode.left = node.left;
            } else { // 待删除节点是父节点的右子树
                preNode.right = node.left;
            }
            return;
        }
        // 待删除节点左右子树均不为空
        // 找到右子树的最小值，用最小值替换待删除节点，并删除最小值
        TreeNode parent = node;
        TreeNode target = node.right;
        // 找到右子树中最深的左孩子 target，即最小值
        while (target.left != null) {
            parent = target;
            target = target.left;
        }
        // 最小值替换待删除节点
        node.val = target.val;
        // 情况1：target 是右子树的根
        if (parent == node) {
            parent.right = target.right;
        } else { // 情况2：target 是右子树的孩子
            parent.left = target.right;
        }
    }

3.5、性能分析

插入、删除操作都要经历搜索操作，因此查找效率 O(logN) 代表了二叉搜索树各个操作的性能。

三、哈希表

3.1、什么是哈希表

通过哈希函数，输入 key 计算 value 的存放位置，通过这种哈希方法构造的结构就叫哈希表（散列表）。因为只需要计算一次哈希函数，所以删除、插入、搜索操作都是 O(1)。

3.2、哈希冲突

3.2.1、什么是哈希冲突

不同的 key，通过哈希函数，得到相同的映射。哈希冲突是必然发生的，我们需要尽可能降低哈希冲突发生的概率。

3.2.2、如何避免哈希冲突

设计合理的哈希函数：地址值域应包含映射值域；映射值分布均匀；比较简单。

常见的哈希函数：

① 直接定址法：hash(key) = a*key+b，根据 key 的分布确定线性函数。

② 除留余数法：hash(key) = key % p，p ≤ m 且尽量接近 m 的质数，m 是地址范围大小。

调节负载因子：负载因子 = 填入表中的元素个数 / 散列表长度。冲突率与负载因子的增减趋势一致，想降低冲突率，就要降低负载因子。数据是必须要添加的，因此只能增加散列表长度。在源码中，负载因子超过一定值，就会自动扩容哈希表。

3.2.3、如何解决哈希冲突

当冲突发生，我们要解决冲突，让每个元素都能填入。

闭散列地址法（开放）：冲突发生，表未满，找下一个空位置。

① 线性探测法：从冲突位置开始，依次往后找第一个空位置，插入元素。

缺点：容易造成冲突元素堆积；不能随意删除，需要设置伪删除标记。

② 二次探测法：Hash_i(key) = (Hash_0 ± i^2) % m，m 为散列表大小，i 为冲突次数。

缺点：虽然冲突堆积更分散，但还是会造成堆积。

散列表优点：不使用额外数据结构（链表）。

开散列（哈希桶 / 链地址法）：同映射的 key 为一个集合（桶），用链表串起来。源码用的哈希桶。
冲突严重时，一个桶的搜索效率不佳（太多元素堆积）。当桶超过一定长度，可以将这个桶转为搜索树或者哈希表。

3.3、实现简单的哈希桶

3.3.1、属性

public class HashBucket {
    static class Node {
        int key;
        int value;
        Node next;

        public Node(int key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    public Node[] arr = new Node[10];
    public int usedSize; // 已有元素数量
    public static final float LOAD_FACTOR = 0.75f; // 装载因子阈值
    ......
}

3.3.2、添加元素

    // 哈希函数
    public int hash(int key) {
        return key % arr.length;
    }

    // 计算负载因子
    public float loadFactor() {
        return (float) usedSize / arr.length;
    }

    // 扩容
    public void resize() {
        Node[] newArr = new Node[arr.length * 2]; // 扩容为原数组的两倍
        // 遍历原数组，将元素添加到新数组中
        for (Node node : arr) {
            Node curr = node;
            while (curr != null) {
                int newIndex = hash(curr.key); // 计算新索引位置
                // 头插法，将元素添加到新数组中
                Node next = curr.next;
                curr.next = newArr[newIndex];
                newArr[newIndex] = curr;
                curr = next;
            }
        }
        arr = newArr;
    }

    // 添加元素
    public void add(int key, int value) {
        int index = hash(key); // 计算索引位置
        Node newNode = new Node(key, value);
        // 如果该位置为空，则直接添加
        if (arr[index] == null) {
            arr[index] = newNode;
        } else { // 如果该位置不为空，则遍历链表，找到对应的key，更新value
            Node curr = arr[index];
            while (curr.next!= null) {
                if (curr.key == key) {
                    curr.value = value;
                    return;
                }
                curr = curr.next;
            }
            // 如果遍历完链表，没有找到对应的key，则添加到链表尾部
            curr.next = newNode;
        }
        usedSize++;
        // 如果已有元素数量超过负载因子阈值，则扩容，目的是降低冲突概率
        if (loadFactor() >= LOAD_FACTOR) {
            resize();
        }
    }

3.3.3、根据 key 查找 value

    public int get(int key) {
        int index = hash(key); // 计算索引位置
        Node curr = arr[index];
        // 遍历链表，找到对应的key，返回value
        while (curr!= null) {
            if (curr.key == key) {
                return curr.value;
            }
            curr = curr.next;
        }
        return -1; // 没有找到对应的key
    }

3.3.4、删除元素

    public void delete(int key) {
        int index = hash(key); // 计算索引位置
        Node curr = arr[index];
        Node prev = null;
        // 遍历链表，找到对应的key，删除节点
        while (curr!= null) {
            if (curr.key == key) {
                if (prev == null) { // 如果是头节点，则直接删除
                    arr[index] = curr.next;
                } else { // 如果不是头节点，则将前节点的next指针指向当前节点的next指针
                    prev.next = curr.next;
                }
                usedSize--;
                return;
            }
            prev = curr;
            curr = curr.next;
        }
    }

3.3.5、泛型版

public class HashBucket2<K, V> {
    static class Node<K, V> {
        K key;
        V value;
        Node<K, V> next;

        public Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    public Node<K, V>[] arr = (Node<K, V>[])new Node[10];

    public int usedSize; // 已有元素数量
    public static final float LOAD_FACTOR = 0.75f; // 装载因子阈值

    // 哈希函数
    public int hash(K key) {
        return key.hashCode() % arr.length;
    }

    // 计算负载因子
    public float loadFactor() {
        return (float) usedSize / arr.length;
    }

    // 扩容
    public void resize() {
        Node<K, V>[] newArr = (Node<K, V>[])new Node[arr.length * 2]; // 扩容为原数组的两倍
        // 遍历原数组，将元素添加到新数组中
        for (Node<K, V> node : arr) {
            Node<K, V> curr = node;
            while (curr != null) {
                int newIndex = hash(curr.key); // 计算新索引位置
                // 头插法，将元素添加到新数组中
                Node<K, V> next = curr.next;
                curr.next = newArr[newIndex];
                newArr[newIndex] = curr;
                curr = next;
            }
        }
        arr = newArr;
    }

    // 添加元素
    public void add(K key, V value) {
        int index = hash(key); // 计算索引位置
        Node<K, V> newNode = new Node<>(key, value);
        // 如果该位置为空，则直接添加
        if (arr[index] == null) {
            arr[index] = newNode;
        } else { // 如果该位置不为空，则遍历链表，找到对应的key，更新value
            Node<K, V> curr = arr[index];
            while (curr.next!= null) {
                if (curr.key.equals(key)) {
                    curr.value = value;
                    return;
                }
                curr = curr.next;
            }
            // 如果遍历完链表，没有找到对应的key，则添加到链表尾部
            curr.next = newNode;
        }
        usedSize++;
        // 如果已有元素数量超过负载因子阈值，则扩容，目的是降低冲突概率
        if (loadFactor() >= LOAD_FACTOR) {
            resize();
        }
    }

    // 根据key查找value
    public V get(K key) {
        int index = hash(key); // 计算索引位置
        Node<K, V> curr = arr[index];
        // 遍历链表，找到对应的key，返回value
        while (curr!= null) {
            if (curr.key.equals(key)) {
                return curr.value;
            }
            curr = curr.next;
        }
        return null; // 没有找到对应的key
    }

    // 删除元素
    public void delete(K key) {
        int index = hash(key); // 计算索引位置
        Node<K, V> curr = arr[index];
        Node<K, V> prev = null;
        // 遍历链表，找到对应的key，删除节点
        while (curr!= null) {
            if (curr.key.equals(key)) {
                if (prev == null) { // 如果是头节点，则直接删除
                    arr[index] = curr.next;
                } else { // 如果不是头节点，则将前节点的next指针指向当前节点的next指针
                    prev.next = curr.next;
                }
                usedSize--;
                return;
            }
            prev = curr;
            curr = curr.next;
        }
    }
}

重点：

类型改成泛型。
key 的哈希编码使用 hashCode 计算。
key 的比较使用 equals。

四、二叉搜索树和哈希表的对比

	TreeSet / TreeMap	HashSet / HashMap
key 大小是否有序	有序	无序
底层实现	红黑树	哈希桶
比较与重写	key 必须能够比较（自定义类重写 compareTo 或者传入自定义比较器，重写了 compare）。	自定义类必须重写 equals 和 hashCode

五、源码阅读

5.1、属性

5.2、哈希编码

5.3、构造函数

5.4、添加一个键值对

更错：(n-1) & hash 等价于 hash % arr.length。

5.5、扩容

5.6、树化

5.7、关键点总结

通过扩容代码得知，调用无参构造方法，第一次 put 添加元素，会分配 16 大小的内存。
通过扩容代码得知，正常扩容的情况，每次 2 倍扩容。
通过添加键值对的代码得知，链表转化为红黑树的第一个条件是，链表长度达到 8。
通过树化的代码得知，链表转化为红黑树的第二个条件是，数组容量达到 64。

六、OJ 题练习

6.1、只出现一次的数

136. 只出现一次的数字 - 力扣（LeetCode）

思路：遍历所有元素，不存在就添加，存在就删除，最后剩下的就是单身狗。使用 HashSet，它的搜索、删除、添加操作都是 O(1) 的复杂度，遍历一边就是 O(N)。

class Solution {
    public int singleNumber(int[] nums) {
        HashSet<Integer> set = new HashSet<>();
        // 遍历数组
        for(int num : nums) {
            // 如果不存在于 set，添加
            if(!set.contains(num)) {
                set.add(num);
            } else { // 存在则删除
                set.remove(num);
            }
        }
        // set 中剩下的那个就是单身狗
        return set.toArray(new Integer[0])[0];
    }
}

6.2、随机链表的复制

138. 随机链表的复制 - 力扣（LeetCode）

思路：

1、遍历一遍链表，记录当前节点的上一个节点，创建好当前节点后，用记录好的上一个节点与当前节点连接起来。但是 random 是随机的，无法在第一遍中复制。如果再遍历一次，依然无法直接复制 random，除非先找到旧 random，再计算当前节点与 random 节点的距离，最后根据距离找到新节点的 random 节点。非常麻烦。

2、将旧、新节点绑定为键值对（遍历一次），再通过搜索旧节点（next 和 random）找到绑定的新节点（第二次遍历）。使用HashMap实现。注：节点的地址是唯一的，可作key。

/*
// Definition for a Node.
class Node {
    int val;
    Node next;
    Node random;

    public Node(int val) {
        this.val = val;
        this.next = null;
        this.random = null;
    }
}
*/

class Solution {
    public Node copyRandomList(Node head) {
        Map<Node, Node> map = new HashMap<>();
        // 遍历旧链表，创建新节点，并与旧节点绑定
        Node tmp = head;
        while(tmp != null) {
            Node newNode = new Node(tmp.val);
            map.put(tmp, newNode);
            tmp = tmp.next;
        }
        // 遍历旧链表，搜索 map 中的旧 next 和 random 节点，找到对应的新节点，并连接
        tmp = head;
        Node newHead = map.get(tmp);
        Node newTmp = newHead;
        while(tmp != null) {
            newTmp.next = map.get(tmp.next);
            newTmp.random = map.get(tmp.random);
            tmp = tmp.next;
            newTmp = newTmp.next;
        }
        return newHead;
    }
}

6.3、宝石和石头

771. 宝石与石头 - 力扣（LeetCode）

思路：遍历宝石，放入 set（宝石唯一，不需要重复，且便于后面进行搜索）。遍历石头，存在于 set 就计数。

class Solution {
    public int numJewelsInStones(String jewels, String stones) {
        Set<Character> set = new HashSet<>();
        int cnt = 0;
        // 宝石放入 set
        for(int i = 0; i < jewels.length(); i++) {
            set.add(jewels.charAt(i));
        }
        // 遍历石头，是宝石的就计数
        for(int i = 0; i < stones.length(); i++) {
            char ch = stones.charAt(i);
            if(set.contains(ch)) {
                cnt++;
            }
        }
        return cnt;
    }
}

6.4、坏键盘

旧键盘 (20)__牛客网

<br/> 是换行，不管。第一行是应该输入，第二行是实际输入。

思路：英文字母坏键只输出大写，将输入转为大写。用 set 存放实际输入，让字符无重复，便于搜索（类似于宝石/实际输入和石头/应该输入）。遍历应该输入，不存在于 set 的就是坏键。不同于宝石和石头的是，坏键不能重复，用 set 存（需要搜索坏键是否已存）。

import java.util.Scanner;
import java.util.Set;
import java.util.HashSet;

public class Main {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        while (in.hasNextLine()) { // 注意 while 处理多个 case
            String str1 = in.nextLine(); // 应该输入
            String str2 = in.nextLine(); // 实际输入
            // 实际输入转大写放入 set
            Set<Character> set = new HashSet<>();
            for(char ch : str2.toUpperCase().toCharArray()) {
                set.add(ch);
            }
            // 遍历应该输入（转大写），在 set 中搜索是否存在，不存在的是坏键盘，放入无重复的 brokenSet
            Set<Character> brokenSet = new HashSet<>();
            for(char ch : str1.toUpperCase().toCharArray()) {
                if(!set.contains(ch) && !brokenSet.contains(ch)) {
                    brokenSet.add(ch);
                    System.out.print(ch);
                }
            }
        }
    }
}