PostgreSQL: Deadlock While Tracing the checkpointer

2024-04-02 19:04:59 · 843 views · 安东尼

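While tracing the checkpointer process with gdb, I ran into a deadlock. Noting it down here.

Attach to the checkpointer process and inspect the information in shared memory (CheckpointerShmem->requests):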

(gdb) p CheckpointerShmem->requests[150]
...
$16 = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 26185}, forknum = MAIN_FORKNUM, segno = 0}
(gdb) p CheckpointerShmem->requests[200]
Cannot access memory at address 0xf9fb18
...

Then the session that requested the checkpoint failed:

testdb=# update t_wal_ckpt set c2 = 'C2#'||substr(c2,4,40);
UPDATE 8192
testdb=# checkpoint;
2019-01-07 12:30:32.114 CST [1418] PANIC:  stuck spinlock detected at RequestCheckpoint, checkpointer.c:1050
2019-01-07 12:30:32.114 CST [1418] STATEMENT:  checkpoint;
2019-01-07 12:30:37.081 CST [1390] PANIC:  stuck spinlock detected at FirstCallSinceLastCheckpoint, checkpointer.c:1376
2019-01-07 12:30:38.610 CST [1370] LOG:  background writer process (PID 1390) was terminated by signal 6: Aborted
2019-01-07 12:30:38.610 CST [1370] LOG:  terminating any other active server processes
2019-01-07 12:30:38.611 CST [1392] WARNING:  terminating connection because of crash of another server process
2019-01-07 12:30:38.611 CST [1392] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-01-07 12:30:38.611 CST [1392] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2019-01-07 12:30:38.613 CST [1558] WARNING:  terminating connection because of crash of another server process
2019-01-07 12:30:38.613 CST [1558] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-01-07 12:30:38.613 CST [1558] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
PANIC:  stuck spinlock detected at RequestCheckpoint, checkpointer.c:1050
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: 2019-01-07 12:30:54.041 CST [1560] FATAL:  the database system is in recovery mode
Failed.
!>

Trying to reconnect showed that the database had crashed and dumped core.

[xdb@localhost ~]$
[xdb@localhost ~]$ psql -d testdb
2019-01-07 14:10:16.114 CST [1629] FATAL:  the database system is in recovery mode
psql: FATAL:  the database system is in recovery mode

Recover the instance:

[xdb@localhost ~]$ pg_ctl start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2019-01-07 14:11:50.821 CST [1632] FATAL:  lock file "postmaster.pid" already exists
2019-01-07 14:11:50.821 CST [1632] HINT:  Is another postmaster (PID 1370) running in data directory "/data/xdb/pg111db"?
 stopped waiting
pg_ctl: could not start server
Examine the log output.
[xdb@localhost ~]$ find /data/xdb -name postmaster.pid
/data/xdb/pg111db/postmaster.pid
[xdb@localhost ~]$ rm -rf /data/xdb/pg111db/postmaster.pid
[xdb@localhost ~]$ pg_ctl start
waiting for server to start....2019-01-07 14:12:44.578 CST [1639] LOG:  could not bind IPv6 address "::1": Address already in use
[xdb@localhost ~]$ ps -ef|grep postgres
xdb       1370     1  0 12:01 pts/0    00:00:02 /appdb/atlasdb/pg11.1/bin/postgres
xdb       1389  1370  0 12:01 ?        00:00:00 [postgres] <defunct>
xdb       1641  1332  0 14:12 pts/0    00:00:00 grep --color=auto postgres
[xdb@localhost ~]$ kill -9 1370
[xdb@localhost ~]$ pg_ctl start
waiting for server to start....2019-01-07 14:13:33.125 CST [1648] LOG:  listening on IPv6 address "::1", port 5432
2019-01-07 14:13:33.125 CST [1648] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2019-01-07 14:13:33.142 CST [1648] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
.2019-01-07 14:13:34.361 CST [1649] LOG:  database system was interrupted; last known up at 2019-01-07 12:26:22 CST
2019-01-07 14:13:34.818 CST [1649] LOG:  database system was not properly shut down; automatic recovery in progress
2019-01-07 14:13:34.863 CST [1649] LOG:  redo starts at 1/48F9ED08
.2019-01-07 14:13:35.467 CST [1649] LOG:  invalid record length at 1/4914FF58: wanted 24, got 0
2019-01-07 14:13:35.467 CST [1649] LOG:  redo done at 1/4914FF30
2019-01-07 14:13:35.467 CST [1649] LOG:  last completed transaction was at log time 2019-01-07 12:28:37.521542+08
2019-01-07 14:13:35.977 CST [1648] LOG:  database system is ready to accept connections
 done
server started
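
In the session above, postmaster.pid was removed while the old postmaster (PID 1370) was still running, which is why the next start failed with "Address already in use". Before deleting the lock file it may be worth checking whether the PID on its first line still exists. The following is a minimal sketch of such a check, assuming a POSIX system; the helper name pid_exists is hypothetical and is not part of PostgreSQL or pg_ctl.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), used in the example call below */

/*
 * Hypothetical helper: returns true if a process with this PID exists.
 * Signal 0 only performs the existence and permission checks; no signal
 * is actually delivered to the target process.
 */
static bool pid_exists(pid_t pid)
{
    if (pid <= 0)
        return false;       /* 0 and negative values address process groups */
    if (kill(pid, 0) == 0)
        return true;        /* process exists and we may signal it */
    return errno == EPERM;  /* exists, but is owned by another user */
}
```

For example, pid_exists(getpid()) is true for the calling process. If the PID read from postmaster.pid still exists, the file is not stale, and the old postmaster should be stopped first. Note that a defunct (zombie) process still counts as existing for this check.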

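Analysis showed that the cause is the spinlock CheckpointerShmem->ckpt_lck in the shared memory structure. Both call sites named in the PANIC messages, RequestCheckpoint and FirstCallSinceLastCheckpoint, acquire this lock. If gdb stops the checkpointer while it holds ckpt_lck, the other processes keep spinning on it until they give up with "stuck spinlock detected".
While tracing the checkpointer process, execute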

SpinLockRelease(&CheckpointerShmem->ckpt_lck);

After the lock is released, the problem above no longer occurs.
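
To illustrate why a lock holder paused in a debugger makes the waiters fail, here is a toy spinlock with a bounded number of acquisition attempts. It is only a sketch under stated assumptions: toy_spinlock, toy_acquire and toy_release are hypothetical names built on C11 atomics, not PostgreSQL's slock_t or its actual spin and delay logic, and where PostgreSQL raises a PANIC this sketch simply returns false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock, for illustration only. */
typedef struct
{
    atomic_flag held;
} toy_spinlock;

/*
 * Try to take the lock, giving up after max_spins failed attempts.
 * Returns true if the lock was acquired, false if it looks "stuck",
 * i.e. the current holder never released it while we were spinning.
 */
static bool toy_acquire(toy_spinlock *lock, long max_spins)
{
    assert(max_spins > 0);
    for (long i = 0; i < max_spins; i++)
    {
        /* test_and_set returns the previous value: false means we got it */
        if (!atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire))
            return true;
    }
    return false;
}

/* Release the lock so that the next toy_acquire() can succeed. */
static void toy_release(toy_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

With a lock initialized as `toy_spinlock lock = { ATOMIC_FLAG_INIT };`, a second toy_acquire() fails for as long as the first holder has not called toy_release(). That is the position the other processes are in while the checkpointer sits at a breakpoint between acquiring and releasing ckpt_lck.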

Article link: https://lsjlt.com/news/49510.html (please cite the source link when reposting)
