两个服务同时批量更新相同Pg表的相同记录时,产生如下错误:
[2024-01-18 09:17:52,172 host: 173.16.34.10 ERROR alarm [org.springframework.kafka.KafkaListenerEndpointContainer#3-0-C-1] com.zhongbao.alarm.service.impl.AlarmServiceImpl - <ZG-COMMON-JS-ALARM>批量更新失败:] org.springframework.dao.DeadlockLoserDataAccessException: ### Error updating database. Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected Detail: Process 2345 waits for ShareLock on transaction 177149283; blocked by process 30377. Process 30377 waits for ShareLock on transaction 177149284; blocked by process 2345. Hint: See server log for query details. Where: while updating tuple (2672,3) in relation "alarm_data_20240117" ### The error may exist in class path resource [mapper/AlarmDataMapper.xml] ### The error may involve com.zhongbao.alarm.mapper.AlarmDataMapper.updateAlarmBatch-Inline ### The error occurred while setting parameters
死锁的产生原因
两个更新事务(如批量update)在已获取锁(行锁)的情况下,都尝试去获取对方的锁(行锁),会互相等待,形成死锁 !死锁的详细信息,出现在服务器日志中。
推测:
1、基于update …的批量更新,在提交给Pg后,Pg自动生成批量更新的事务;
2、更新事务欲修改行数据时,会获取锁来锁定该行,直到事务执行完毕后释放所有的行锁(Pg会自动为更新添加行锁);
3、多个更新事务因修改记录相互等待对方的释放锁时,即发生死锁;
4、死锁后,直到其中一个事务超期结束,其锁被释放,死锁才打开;
查看Pg所有锁:
查看Pg的行锁:
死锁的解决办法
The correct way to perform such operations is to lock resources in the same order.
For example: in this case, accounts(表) can be locked in ascending order of their numbers.即批量更改表数据,需按照相同字段的相同排序去批量修改,保证修改顺序即可。
/// 顺序更新数据,以防止死锁 oldAlarmDataList.sort(new Comparator<AlarmData>() { @Override public int compare(AlarmData o1, AlarmData o2) { return o1.getSid().compareTo(o2.getSid()); } }); startTime = System.currentTimeMillis(); this.alarmAllMapper.updateAlarmAllBatch(oldAlarmDataList); log.info("更新消息数量:{}, 耗时:{}", oldAlarmDataList.size(), System.currentTimeMillis() - startTime);
死锁模拟(作者没模拟出来,MD…)
=> create table accounts ( acc_no integer, amount integer );
=> insert into accounts values(0,0),(1,1),(2,2),(3,3),(4,4);
=> create index on accounts(amount desc);
/// 自增函数,延迟1秒后加100 => CREATE FUNCTION inc_slow(n numeric) RETURNS numeric AS $$ SELECT pg_sleep(1); SELECT n + 100.00; $$ LANGUAGE SQL;
/// 安装扩展函数 => CREATE EXTENSION pgrowlocks;
/ 第一会话 顺序修改表记录 /// Since tuples on the table page are located in ascending order of the amount (exactly how we added them), /// they will also be updated in the same order. /// explain命令输出结果中的"Seq Scan"表示顺序扫描,即全表扫描。 => EXPLAIN (costs off) UPDATE accounts SET amount = inc_slow(amount); | QUERY PLAN | ---------------------------- | Update on accounts | -> Seq Scan on accounts | (2 rows)
/ 第二会话 逆序修改表记录 /// 禁止顺序扫描 => SET enable_seqscan = off;
/// 注意:这里的100如果写成100.00会因数据类型不一致导致索引失效 !!! /// explain命令输出结果中的"Index Scan"表示索引扫描。 => EXPLAIN (costs off) UPDATE accounts SET amount = inc_slow(amount) WHERE amount > 100; | QUERY PLAN | -------------------------------------------------------- | Update on accounts | -> Index Scan using accounts_amount_idx on accounts | Index Cond: (amount > 100.00) | (3 rows)
/// 查看sql线程(pid)对记录的加锁情况 => SELECT * FROM pgrowlocks('accounts');
当两个sql线程同时运行,其更新的记录"碰头"后,会产生死锁 !
其他
特别注意: 在执行"SET enable_seqscan = off;“之前,虽然已添加索引,但执行"UPDATE accounts SET amount = inc_slow(amount) WHERE amount > 100;”
时候,Pg依然进行的全表扫描!因此推测:对应较简单的查询,即使建立并使用索引过滤,Pg依然会进行全表扫描。
参考文档
https://blog.csdn.net/pg_hgdb/article/details/100653715