Redis master-slave mode encounters errors after reaching a certain scale. #83

Open
Salvare330 opened this issue Oct 31, 2024 · 4 comments

Comments

@Salvare330

Redis master-slave mode encounters errors after reaching a certain scale (RDB file 60 MB+).
Slave log:
1765:S 25 Sep 2024 18:06:10.296 * Full resync from master: d87a61aa0f144c8837c48ee045de84d72485ab1f:2824627352
1765:S 25 Sep 2024 18:06:10.412 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1765:S 25 Sep 2024 18:06:10.565 # I/O error trying to sync with MASTER: Connection reset by peer
1765:S 25 Sep 2024 18:06:10.577 * Reconnecting to MASTER 10.113.7.15:6479 after failure
1765:S 25 Sep 2024 18:06:10.582 * MASTER <-> REPLICA sync started
1765:S 25 Sep 2024 18:06:10.586 * Non blocking connect for SYNC fired the event.
1765:S 25 Sep 2024 18:06:10.589 * Master replied to PING, replication can continue...
1765:S 25 Sep 2024 18:06:10.594 * Partial resynchronization not possible (no cached master)
1765:S 25 Sep 2024 18:06:15.556 * Full resync from master: d87a61aa0f144c8837c48ee045de84d72485ab1f:2824845204
1765:S 25 Sep 2024 18:06:15.672 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1765:S 25 Sep 2024 18:06:15.815 # I/O error trying to sync with MASTER: Connection reset by peer
1765:S 25 Sep 2024 18:06:15.828 * Reconnecting to MASTER 10.113.7.15:6479 after failure
1765:S 25 Sep 2024 18:06:15.833 * MASTER <-> REPLICA sync started
1765:S 25 Sep 2024 18:06:15.836 * Non blocking connect for SYNC fired the event.
1765:S 25 Sep 2024 18:06:15.841 * Master replied to PING, replication can continue...
1765:S 25 Sep 2024 18:06:15.845 * Partial resynchronization not possible (no cached master)
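
The slave log above repeats the same cycle: a full resync starts, the streamed RDB transfer begins, and the connection is reset a fraction of a second later. As a rough diagnostic aid, the sketch below measures how long each attempt lasts before the I/O error. The function name `failed_sync_attempts` and the regular expression are illustrative assumptions, not part of Redis. The pattern only covers the `pid:role DD Mon YYYY HH:MM:SS.mmm` line format shown in this log and assumes English month abbreviations.

```python
import re
from datetime import datetime

# Sample lines copied from the slave log in this issue.
SAMPLE_LOG = """\
1765:S 25 Sep 2024 18:06:10.296 * Full resync from master: d87a61aa0f144c8837c48ee045de84d72485ab1f:2824627352
1765:S 25 Sep 2024 18:06:10.412 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1765:S 25 Sep 2024 18:06:10.565 # I/O error trying to sync with MASTER: Connection reset by peer
1765:S 25 Sep 2024 18:06:15.556 * Full resync from master: d87a61aa0f144c8837c48ee045de84d72485ab1f:2824845204
1765:S 25 Sep 2024 18:06:15.672 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1765:S 25 Sep 2024 18:06:15.815 # I/O error trying to sync with MASTER: Connection reset by peer
"""

# Assumed line layout: "<pid>:<role> <day> <Mon> <year> <time.ms> <level> <message>".
LINE_RE = re.compile(
    r"^\d+:[A-Z] (?P<ts>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[*#.\-]) (?P<msg>.*)$"
)


def failed_sync_attempts(log_text):
    """Return (start_time, seconds_until_error, error_text) for each failed full resync."""
    attempts = []
    started = None  # timestamp of the most recent "Full resync" line
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines in any other format
        ts = datetime.strptime(match["ts"], "%d %b %Y %H:%M:%S.%f")
        msg = match["msg"]
        if msg.startswith("Full resync from master"):
            started = ts
        elif msg.startswith("I/O error trying to sync with MASTER") and started is not None:
            error_text = msg.split(": ", 1)[1]
            attempts.append((started, (ts - started).total_seconds(), error_text))
            started = None
    return attempts


if __name__ == "__main__":
    for start, seconds, error in failed_sync_attempts(SAMPLE_LOG):
        print(f"{start.time()}  failed after {seconds:.3f}s  ({error})")
```

If every attempt fails after roughly the same short interval, as it does in this excerpt, that may suggest the master side is dropping the transfer almost immediately rather than the replica timing out, so the master log for the same period is worth attaching.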

@odumetz

odumetz commented Dec 18, 2024

+1, having the same issue. I have tried many configuration changes (increasing the replication log size and replication buffers, trying diskless replication, etc.), but replication no longer works at all. The connection to the replica keeps getting lost. The DB is not that large (a full sync is 30 MB).
Master Log:
[014308] 18 Dec 22:44:21.580 * Replica 10.10.28.20:6379 asks for synchronization
[014308] 18 Dec 22:44:21.581 * Full resync requested by replica 10.10.28.20:6379
[014308] 18 Dec 22:44:21.582 * Delay next BGSAVE for diskless SYNC
[014308] 18 Dec 22:44:26.717 * Starting BGSAVE for SYNC with target: replicas sockets
[014308] 18 Dec 22:44:26.748 * Background RDB transfer started by pid 2084
[014308] 18 Dec 22:44:26.964 # fork operation complete
[014308] 18 Dec 22:44:26.991 # Background transfer error
[014308] 18 Dec 22:44:26.991 # SYNC failed. BGSAVE child returned an error
[014308] 18 Dec 22:44:26.991 * Connection with replica 10.10.28.20:6379 lost.

Replica log:
[008748] 18 Dec 22:44:16.071 * MASTER <-> REPLICA sync started
[008748] 18 Dec 22:44:16.071 * Non blocking connect for SYNC fired the event.
[008748] 18 Dec 22:44:16.071 * Master replied to PING, replication can continue...
[008748] 18 Dec 22:44:16.071 * Partial resynchronization not possible (no cached master)
[008748] 18 Dec 22:44:21.259 * Full resync from master: 24eb1a270e365e170f2ac2a7c1cb458c618cb0f2:4953856
[008748] 18 Dec 22:44:21.466 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
[008748] 18 Dec 22:44:21.547 # I/O error trying to sync with MASTER: Unknown error
[008748] 18 Dec 22:44:21.568 * Reconnecting to MASTER 10.10.28.21:6379 after failure

@odumetz

odumetz commented Dec 18, 2024

Is this a known bug that was fixed between 7.2.5 and 7.4.0? If so, is it safe to swap in the new executables (same AOF/RDB format, etc.)?
I got replication to work partly by not using diskless replication and removing buffer limits, but the .exe crashes when doing a full sync, after fetching the .rdb.
Thanks!
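
For anyone trying to reproduce the partial workaround described above, here is a minimal sketch that prints the corresponding `redis-cli CONFIG SET` commands. It assumes that "not using diskless replication" maps to the `repl-diskless-sync` directive and that "removing buffer limits" refers to `client-output-buffer-limit` for the replica class. The comment does not name the exact settings, so both mappings are guesses, and the helper `config_set_commands` is hypothetical.

```python
# Assumed mapping of the workaround to redis.conf directives (not confirmed in this thread).
DISK_BASED_SYNC_SETTINGS = {
    # Transfer the RDB via a file on disk instead of streaming it over the socket.
    "repl-diskless-sync": "no",
    # "0 0 0" means no hard limit, no soft limit, no soft-limit window.
    "client-output-buffer-limit": "replica 0 0 0",
}


def config_set_commands(settings, host="127.0.0.1", port=6379):
    """Build one redis-cli CONFIG SET command line per setting (hypothetical helper)."""
    return [
        f'redis-cli -h {host} -p {port} CONFIG SET {key} "{value}"'
        for key, value in settings.items()
    ]


if __name__ == "__main__":
    # The master address is taken from the replica log earlier in this thread.
    for command in config_set_commands(DISK_BASED_SYNC_SETTINGS, host="10.10.28.21"):
        print(command)
```

Settings applied with `CONFIG SET` are lost on restart unless they are also written to the conf file, so the same values would need to go into `redis.conf` on the master for a lasting test.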

@zkteco-home
Owner

Please give me the details needed to reproduce the issue, including your conf file. How do you run it to trigger the error?

@odumetz

odumetz commented Jan 6, 2025

Thanks for replying - Conf is identical on master and replica (replica has additional replicaof statement).
The problem happens with the master running, when a replica starts fresh (no RDB or AOF files) or when adding a new replica. I even tried copying the RDB and AOF files from the master to start up the replica, with the same behavior. The RDB and AOF files sat at around 15 MB.
The .exe crashes, so I copied over event log info, WER dump and corresponding redis logs.
redis-conf.txt
redis-server-crash.txt
