Pg批量更新数据产生的死锁问题

两个服务同时批量更新相同Pg表的相同记录时,产生如下错误:

[2024-01-18 09:17:52,172  host: 173.16.34.10  ERROR  alarm  [org.springframework.kafka.KafkaListenerEndpointContainer#3-0-C-1]  com.zhongbao.alarm.service.impl.AlarmServiceImpl  - <ZG-COMMON-JS-ALARM>批量更新失败:]
org.springframework.dao.DeadlockLoserDataAccessException:
### Error updating database.  Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 2345 waits for ShareLock on transaction 177149283; blocked by process 30377.
Process 30377 waits for ShareLock on transaction 177149284; blocked by process 2345.
  Hint: See server log for query details.
  Where: while updating tuple (2672,3) in relation "alarm_data_20240117"
### The error may exist in class path resource [mapper/AlarmDataMapper.xml]
### The error may involve com.zhongbao.alarm.mapper.AlarmDataMapper.updateAlarmBatch-Inline
### The error occurred while setting parameters

死锁的产生原因

两个更新事务(如批量update)在已获取锁(行锁)的情况下,都尝试去获取对方的锁(行锁),会互相等待,形成死锁 !死锁的详细信息,出现在服务器日志中。

推测:
1、基于update …的批量更新,在提交给Pg后,Pg自动生成批量更新的事务;
2、更新事务欲修改行数据时,会获取锁来锁定该行,直到事务执行完毕后释放所有的行锁(Pg会自动为更新添加行锁);
3、多个更新事务因修改记录相互等待对方的释放锁时,即发生死锁;
4、死锁后,直到其中一个事务超期结束,其锁被释放,死锁才打开;

查看Pg所有锁: SELECT * FROM pg_locks;
查看Pg的行锁: CREATE EXTENSION pgrowlocks; SELECT * FROM pgrowlocks('表名');

死锁的解决办法

The correct way to perform such operations is to lock resources in the same order.
For example: in this case, accounts(表) can be locked in ascending order of their numbers.即批量更改表数据,需按照相同字段的相同排序去批量修改,保证修改顺序即可。

   /// 顺序更新数据,以防止死锁
   oldAlarmDataList.sort(new Comparator<AlarmData>() {
       @Override
       public int compare(AlarmData o1, AlarmData o2) {
           return o1.getSid().compareTo(o2.getSid());
       }
   });
   startTime = System.currentTimeMillis();
   this.alarmAllMapper.updateAlarmAllBatch(oldAlarmDataList);
   log.info("更新消息数量:{}, 耗时:{}",
           oldAlarmDataList.size(), System.currentTimeMillis() - startTime);

死锁模拟(作者没模拟出来,MD…)

=> create table accounts (
	acc_no  integer,
	amount  integer
);
=> insert into accounts values(0,0),(1,1),(2,2),(3,3),(4,4);
=> create index on accounts(amount desc);
/// 自增函数,延迟1秒后加100
=> CREATE FUNCTION inc_slow(n numeric) RETURNS numeric AS $$
  SELECT pg_sleep(1);
  SELECT n + 100.00;
$$ LANGUAGE SQL;
/// 安装扩展函数
=> CREATE EXTENSION pgrowlocks;
/ 第一会话 顺序修改表记录
/// Since tuples on the table page are located in ascending order of the amount (exactly how we added them),
/// they will also be updated in the same order. 
/// explain命令输出结果中的"Seq Scan"表示顺序扫描,即全表扫描。
=> EXPLAIN (costs off) UPDATE accounts SET amount = inc_slow(amount);

|           QUERY PLAN         
|  ----------------------------
|   Update on accounts
|     ->  Seq Scan on accounts
|  (2 rows)
/ 第二会话 逆序修改表记录
/// 禁止顺序扫描
=> SET enable_seqscan = off;
/// 注意:这里的100如果写成100.00会因数据类型不一致导致索引失效 !!!
/// explain命令输出结果中的"Index Scan"表示索引扫描。
=> EXPLAIN (costs off)
UPDATE accounts SET amount = inc_slow(amount) WHERE amount > 100; 

|                         QUERY PLAN                       
|  --------------------------------------------------------
|   Update on accounts
|     ->  Index Scan using accounts_amount_idx on accounts
|           Index Cond: (amount > 100.00)
|  (3 rows)
/// 查看sql线程(pid)对记录的加锁情况
=> SELECT * FROM pgrowlocks('accounts');

当两个sql线程同时运行,其更新的记录"碰头"后,会产生死锁 !

其他

特别注意: 在执行"SET enable_seqscan = off;“之前,虽然已添加索引,但执行"UPDATE accounts SET amount = inc_slow(amount) WHERE amount > 100;”
时候,Pg依然进行的全表扫描!因此推测:对应较简单的查询,即使建立并使用索引过滤,Pg依然会进行全表扫描。

参考文档
https://blog.csdn.net/pg_hgdb/article/details/100653715