磁盤(pán)寫(xiě)滿導(dǎo)致MySQL復(fù)制失敗的解決方案
案例場(chǎng)景
今天在線上發(fā)現(xiàn)一個(gè)問(wèn)題,由于監(jiān)控沒(méi)有覆蓋到,某臺(tái)機(jī)器的磁盤(pán)被寫(xiě)滿了,導(dǎo)致線上MySQL主從復(fù)制出現(xiàn)問(wèn)題。問(wèn)題如下:
localhost.(none)>showslavestatus\G ***************************1.row*************************** Slave_IO_State: Master_Host:10.xx.xx.xx Master_User:replica Master_Port:5511 Connect_Retry:60 Master_Log_File: Read_Master_Log_Pos:4 Relay_Log_File:relay-bin.001605 Relay_Log_Pos:9489761 Relay_Master_Log_File: Slave_IO_Running:No Slave_SQL_Running:No Last_Errno:13121 Last_Error:Relaylogreadfailure:Couldnotparserelaylogevententry. Thepossiblereasonsare:themaster'sbinarylogiscorrupted(youcancheckthisbyrunning 'mysqlbinlog'onthebinarylog),theslave'srelaylogiscorrupted(youcancheckthisby running'mysqlbinlog'ontherelaylog),anetworkproblem,theserverwasunabletofetcha keyringkeyrequiredtoopenanencryptedrelaylogfile,orabuginthemaster'sor slave'sMySQLcode.Ifyouwanttocheckthemaster'sbinarylogorslave'srelaylog, youwillbeabletoknowtheirnamesbyissuing'SHOWSLAVESTATUS'onthisslave.
于是查看error log,發(fā)現(xiàn)error log中的內(nèi)容如下:
2021-03-31T11:34:39.367173+08:0011[Warning][MY-010897][Repl]StoringMySQLusernameor passwordinformationinthemasterinforepositoryisnotsecureandisthereforenot recommended.PleaseconsiderusingtheUSERandPASSWORDconnectionoptionsforSTARTSLAVE; seethe'STARTSLAVESyntax'intheMySQLManualformoreinformation. 2021-03-31T11:34:39.368161+08:0012[ERROR][MY-010596][Repl]Errorreadingrelaylog eventforchannel'':binlogtruncatedinthemiddleofevent;consideroutofdiskspace 2021-03-31T11:34:39.368191+08:0012[ERROR][MY-013121][Repl]SlaveSQLforchannel'':Relay logreadfailure:Couldnotparserelaylogevententry.Thepossiblereasonsare:themaster's binarylogiscorrupted(youcancheckthisbyrunning'mysqlbinlog'onthebinarylog),the slave'srelaylogiscorrupted(youcancheckthisbyrunning'mysqlbinlog'ontherelaylog), anetworkproblem,theserverwasunabletofetchakeyringkeyrequiredtoopenanencrypted relaylogfile,orabuginthemaster'sorslave'sMySQLcode.Ifyouwanttocheckthe master'sbinarylogorslave'srelaylog,youwillbeabletoknowtheirnamesbyissuing'SHOW SLAVESTATUS'onthisslave.Error_code:MY-013121 2021-03-31T11:34:39.368205+08:0012[ERROR][MY-010586][Repl]Errorrunningquery,slaveSQL threadaborted.Fixtheproblem,andrestarttheslaveSQLthreadwith"SLAVESTART".We stoppedatlog'mysql-bin.000446'position9489626
從描述中可以看到,error log是比較智能的,發(fā)現(xiàn)了磁盤(pán)問(wèn)題,并提示我們需要"consideroutofdiskspace"
解決問(wèn)題
登錄服務(wù)器,很快就發(fā)現(xiàn)是MySQL所在的服務(wù)器磁盤(pán)使用率達(dá)到100%了,問(wèn)題原因跟error log中的內(nèi)容一致。
現(xiàn)在就解決這個(gè)問(wèn)題?;镜乃悸肪褪乔謇泶疟P(pán)文件,然后重新搭建復(fù)制關(guān)系,這個(gè)過(guò)程似乎比較簡(jiǎn)單,但是實(shí)際操作中,在搭建復(fù)制關(guān)系的時(shí)候出現(xiàn)了下面的報(bào)錯(cuò):
###基于gtid的復(fù)制,想重新搭建復(fù)制關(guān)系 localhost.(none)>resetslave; ERROR1371(HY000):Failedpurgingoldrelaylogs:Failedduringlogreset localhost.(none)>resetslaveall; ERROR1371(HY000):Failedpurgingoldrelaylogs:Failedduringlogreset
第一步:因?yàn)閺?fù)制是基于gtid進(jìn)行的,所以直接記錄show slave status的狀態(tài)后,就可以重新reset slave,并利用change master語(yǔ)句來(lái)重建復(fù)制關(guān)系了。
但是卻出現(xiàn)上面的報(bào)錯(cuò),從報(bào)錯(cuò)信息看是mysql無(wú)法完成purge relay log的操作,這看起來(lái)不科學(xué)。好吧,既然你自己不能完成purge relay logs的操作,那就讓我來(lái)幫你吧。
第二步:手工rm -f 刪除所有的relay log,發(fā)現(xiàn)報(bào)錯(cuò)變成了:
localhost.(none)>resetslaveall; ERROR1374(HY000):I/Oerrorreadinglogindexfile
嗯,好吧,問(wèn)題沒(méi)有得到解決。
然后思考了下,既然不能通過(guò)手工reset slave 來(lái)清理relay log,直接stop
slave 然后change master行不行呢?
第三步:直接stop slave,然后change master,不執(zhí)行reset slave all的語(yǔ)句,結(jié)果如下:
localhost.(none)>changemastertomaster_host='10.13.224.31', ->master_user='replica', ->master_password='eHnNCaQE3ND', ->master_port=5510, ->master_auto_position=1; ERROR1371(HY000):Failedpurgingoldrelaylogs:Failedduringlogreset
得,問(wèn)題依舊。
第四步:反正復(fù)制已經(jīng)報(bào)錯(cuò)斷開(kāi)了,執(zhí)行個(gè)start slave看看,結(jié)果戲劇性的一幕出現(xiàn)了:
localhost.(none)>startslave; ERROR2006(HY000):MySQLserverhasgoneaway Noconnection.Tryingtoreconnect... Connectionid:262 Currentdatabase:***NONE*** QueryOK,0rowsaffected(0.01sec) localhost.(none)> [root@~]#
執(zhí)行start slave之后,實(shí)例直接掛了。
到這里,復(fù)制徹底斷開(kāi)了,從庫(kù)實(shí)例已經(jīng)掛了。
第五步:看看實(shí)例還能不能重啟,嘗試重啟實(shí)例,發(fā)現(xiàn)實(shí)例還能起來(lái)。實(shí)例重新起來(lái)后,查看復(fù)制關(guān)系,結(jié)果如下:
localhost.(none)>showslavestatus\G ***************************1.row*************************** Slave_IO_State:Queueingmastereventtotherelaylog Master_Host:10.xx.xx.xx Master_User:replica Master_Port:5511 Connect_Retry:60 Master_Log_File: Read_Master_Log_Pos:4 Relay_Log_File:relay-bin.001605 Relay_Log_Pos:9489761 Relay_Master_Log_File: Slave_IO_Running:Yes Slave_SQL_Running:No Last_Errno:13121 Last_Error:Relaylogreadfailure:Couldnotparserelaylogevententry. Thepossiblereasonsare:themaster'sbinarylogiscorrupted(youcancheckthisbyrunning 'mysqlbinlog'onthebinarylog),theslave'srelaylogiscorrupted(youcancheckthisby running'mysqlbinlog'ontherelaylog),anetworkproblem,theserverwasunabletofetcha keyringkeyrequiredtoopenanencryptedrelaylogfile,orabuginthemaster'sorslave's MySQLcode.Ifyouwanttocheckthemaster'sbinarylogorslave'srelaylog,youwillbeable toknowtheirnamesbyissuing'SHOWSLAVESTATUS'onthisslave. Skip_Counter:0
復(fù)制關(guān)系依舊報(bào)錯(cuò)。
第六步:重新reset slave all看看,結(jié)果成功了。
localhost.(none)>stopslave; QueryOK,0rowsaffected(0.00sec) localhost.(none)>resetslaveall; QueryOK,0rowsaffected(0.03sec)
第七步:重新搭建復(fù)制關(guān)系并啟動(dòng)復(fù)制
localhost.(none)>changemastertomaster_host='10.xx.xx.xx', ->master_user='replica', ->master_password='xxxxx', ->master_port=5511, ->master_auto_position=1; QueryOK,0rowsaffected,2warnings(0.01sec) localhost.(none)>startslave; QueryOK,0rowsaffected(0.00sec) localhost.(none)>showslavestatus\G ***************************1.row*************************** Slave_IO_State:Waitingformastertosendevent Master_Host:10.xx.xx.xx Master_User:replica Master_Port:5511 Connect_Retry:60 ... Slave_IO_Running:Yes Slave_SQL_Running:Yes
發(fā)現(xiàn)實(shí)例的復(fù)制關(guān)系可以建立起來(lái)了。
一點(diǎn)總結(jié)
當(dāng)磁盤(pán)寫(xiě)滿的情況發(fā)生之后,mysql服務(wù)無(wú)法向元信息表中寫(xiě)數(shù)據(jù),relay log也可能已經(jīng)不完整了,如果直接清理了服務(wù)器上的磁盤(pán)數(shù)據(jù),再去重新change master修改主從復(fù)制關(guān)系,可能會(huì)出現(xiàn)報(bào)錯(cuò),不能直接修復(fù),因?yàn)檫@不是一個(gè)正常的主從復(fù)制關(guān)系斷裂場(chǎng)景。
所以,正確的做法應(yīng)該是:
1、清理服務(wù)器的磁盤(pán)
2、重啟復(fù)制關(guān)系斷開(kāi)的那個(gè)從庫(kù)
3、重新reset slave all、change master來(lái)搭建主從復(fù)制關(guān)系即可
如果有更好的方法,還請(qǐng)不吝賜教。
以上就是磁盤(pán)寫(xiě)滿導(dǎo)致MySQL復(fù)制失敗的解決方案的詳細(xì)內(nèi)容,更多關(guān)于MySQL復(fù)制失敗的解決方案的資料請(qǐng)關(guān)注本站其它相關(guān)文章!
版權(quán)聲明:本站文章來(lái)源標(biāo)注為YINGSOO的內(nèi)容版權(quán)均為本站所有,歡迎引用、轉(zhuǎn)載,請(qǐng)保持原文完整并注明來(lái)源及原文鏈接。禁止復(fù)制或仿造本網(wǎng)站,禁止在非www.sddonglingsh.com所屬的服務(wù)器上建立鏡像,否則將依法追究法律責(zé)任。本站部分內(nèi)容來(lái)源于網(wǎng)友推薦、互聯(lián)網(wǎng)收集整理而來(lái),僅供學(xué)習(xí)參考,不代表本站立場(chǎng),如有內(nèi)容涉嫌侵權(quán),請(qǐng)聯(lián)系alex-e#qq.com處理。