Hello,
I’ve been trying to debug to get more information about constantly failing archive_command. But I am unable to go further.
The errors and warnings started unexpectedly as, no changes in database were made by us. Latest maintenance was 2 days before the start of the problem.
Errors in logs are (repeatedly, maybe hundreds of times per day)
INFO 2025-03-05T18:15:50.174684Z 2025-03-05 18:15:49.681 UTC [34]: [1-1] db=,user= LOG: archive command failed with exit code 1
(....)
INFO The failed archive command was:
/utils/replication_log_processor
-disable_log_to_disk
-action=archive
-file_name=00000001000000E900000045
-local_file_path=pg_wal/00000001000000E900000045
-async_archive
-remote_storage
(....)
WARNING archiving write-ahead log file "00000001000000E900000045" failed too many times, will try again later
Metrics during the error period, are unreliable as well: (PINK marked period)
Queries were slow or failing during that period.
Instance configs:
availableMaintenanceVersions:
- POSTGRES_16_8.R20250112.01_14
backendType: SECOND_GEN
createTime: '2024-08-11T00:20:25.545Z'
databaseInstalledVersion: POSTGRES_16_6
databaseVersion: POSTGRES_16
gceZone: southamerica-east1-b
geminiConfig:
activeQueryEnabled: false
entitled: false
googleVacuumMgmtEnabled: false
indexAdvisorEnabled: false
oomSessionCancelEnabled: false
includeReplicasForMajorVersionUpgrade: false
instanceType: CLOUD_SQL_INSTANCE
kind: sql#instance
maintenanceVersion: POSTGRES_16_6.R20250112.01_03
region: southamerica-east1
satisfiesPzi: false
settings:
activationPolicy: ALWAYS
availabilityType: ZONAL
backupConfiguration:
backupRetentionSettings:
retainedBackups: 7
retentionUnit: COUNT
enabled: true
kind: sql#backupConfiguration
location: us
pointInTimeRecoveryEnabled: true
replicationLogArchivingEnabled: true
startTime: 02:00
transactionLogRetentionDays: 7
transactionalLogStorageState: CLOUD_STORAGE
connectorEnforcement: NOT_REQUIRED
dataDiskSizeGb: '10'
dataDiskType: PD_SSD
databaseFlags:
- name: cloudsql.logical_decoding
value: on
deletionProtectionEnabled: true
edition: ENTERPRISE
insightsConfig:
queryInsightsEnabled: true
queryPlansPerMinute: 5
queryStringLength: 1024
kind: sql#settings
locationPreference:
kind: sql#locationPreference
zone: southamerica-east1-b
maintenanceWindow:
day: 7
hour: 3
kind: sql#maintenanceWindow
updateTrack: stable
pricingPlan: PER_USE
replicationLagMaxSeconds: 31536000
replicationType: SYNCHRONOUS
settingsVersion: '61'
storageAutoResize: true
storageAutoResizeLimit: '0'
tier: db-f1-micro
userLabels:
env: prod
sqlNetworkArchitecture: NEW_NETWORK_ARCHITECTURE
state: RUNNABLE
upgradableDatabaseVersions:
- displayName: PostgreSQL 17
majorVersion: POSTGRES_17
name: POSTGRES_17
And the output of these commands:
SHOW archive_mode;
> archive_mode: always
SHOW archive_command;
> /utils/replication_log_processor -disable_log_to_disk -action=archive -file_name=%f -local_file_path=%p -async_archive -remote_storage
Fixed by disabling Point-in-time recovery and restarting the instance. I will enable PITR today again, but I would like to understand if there is any issue I must account for, before enabling it again.
