Database dumps are created automatically every Sunday evening and contain all data from the prior week. Each dump produces two files: players.parquet and matches.parquet. The files are stored in a directory structure that looks like this: /media/db_dumps/date_range=2023-04-23_2023-04-29/matches.parquet
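As a rough sketch, the path for a given week can be built from its start date. This assumes every directory follows the single example above (a Sunday-to-Saturday range, with players.parquet sitting next to matches.parquet); the function name dump_paths is made up for illustration.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath

# Root directory taken from the example path above.
DUMP_ROOT = PurePosixPath("/media/db_dumps")


def dump_paths(week_start: date) -> dict:
    """Return the assumed paths of both dump files for the week starting on week_start."""
    # Assumption: each range covers 7 days, e.g. 2023-04-23 through 2023-04-29.
    week_end = week_start + timedelta(days=6)
    folder = DUMP_ROOT / f"date_range={week_start.isoformat()}_{week_end.isoformat()}"
    return {name: folder / f"{name}.parquet" for name in ("matches", "players")}


paths = dump_paths(date(2023, 4, 23))
print(paths["matches"])
# /media/db_dumps/date_range=2023-04-23_2023-04-29/matches.parquet
```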
* Please note that this data is not perfectly clean. Some dirty data gets through my scrubbing. Notably, some matches have an odd number of players (1, 3, 5, 7), which should never be the case. I recommend doing your own data cleaning before use.
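One possible first cleaning step is to drop matches with an odd player count. The sketch below uses pandas and a small made-up DataFrame in place of the real matches file; the helper name drop_odd_player_matches is hypothetical, and you may well want stricter rules for your own analysis.

```python
import pandas as pd


def drop_odd_player_matches(matches: pd.DataFrame) -> pd.DataFrame:
    """Keep only matches whose num_players is a positive even number."""
    n = matches["num_players"]
    return matches[(n > 0) & (n % 2 == 0)].reset_index(drop=True)


# Tiny in-memory stand-in for matches.parquet (values are made up).
sample = pd.DataFrame(
    {"game_id": ["a", "b", "c", "d"], "num_players": [2, 3, 8, 1]}
)
clean = drop_odd_player_matches(sample)
print(clean["game_id"].tolist())  # ['a', 'c']
```

With a real dump, the input frame would come from pd.read_parquet on the matches file, which needs a parquet engine such as pyarrow installed.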
Data Models
Database dumps are split into two files: matches and players. Each is its own data model. Below is a quick description of each model and its fields:
Match
map - (string) Map the match was played on
started_timestamp - (datetime) When the match started, UTC
duration - (timestamp) How long the game took (in-game duration)
game_id - (string) ID of the game; this should map to the game_id/match_id values on other websites
avg_elo - (float) Average elo of all players in the match
num_players - (int) Number of players involved in the game
team_0_elo - (float) Average rating of the players on team 0
team_1_elo - (float) Average rating of the players on team 1
replay_enhanced - (boolean) Whether a replay was downloaded for this match and used to enhance player information
leaderboard - (string) Leaderboard the match was played on
mirror - (boolean) Whether all players are playing the same civ
patch - (int) The patch the match was played on (inferred from started_timestamp)
raw_match_type - (int) Passthrough of match_type from the community API; indicates the type of match (random map ladders: 6 = 1v1, 7 = 2v2, 8 = 3v3, 9 = 4v4; controller-only random map: 66 = 1v1, 67 = 2v2, 68 = 3v3, 69 = 4v4)
game_type - (string) Type of game (random_map, empire_wars, deathmatch, etc.)
game_speed - (string) Game speed (slow = 1x, casual = 1.5x, normal = 1.7x, fast = 2x)
starting_age - (string) Starting age of the match (dark, feudal, castle, imperial, post-imperial)
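The raw_match_type codes listed above can be turned into readable labels with a small lookup table. This is only a sketch: the label strings and the function name describe_match_type are my own choices, and any code not listed in the field description is treated as unknown.

```python
from typing import Optional

# raw_match_type -> (ladder label, players per team), copied from the
# raw_match_type field description. The label strings are illustrative.
RAW_MATCH_TYPES = {
    6: ("random map", 1),
    7: ("random map", 2),
    8: ("random map", 3),
    9: ("random map", 4),
    66: ("controller-only random map", 1),
    67: ("controller-only random map", 2),
    68: ("controller-only random map", 3),
    69: ("controller-only random map", 4),
}


def describe_match_type(code: int) -> Optional[str]:
    """Return a label such as 'random map 2v2', or None for an unlisted code."""
    entry = RAW_MATCH_TYPES.get(code)
    if entry is None:
        return None
    ladder, size = entry
    return f"{ladder} {size}v{size}"


print(describe_match_type(7))   # random map 2v2
print(describe_match_type(69))  # controller-only random map 4v4
```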
Player
civ - (string) Which civ the player used
winner - (boolean) If the player won
game_id - (string) ID of the game; corresponds to a game_id in the matches file and should map to game_id/match_id values on other websites
team - (int) Which team the player is on
feudal_age_uptime - (float) When the player advanced to feudal age, only available for replay enhanced matches
castle_age_uptime - (float) When the player advanced to castle age, only available for replay enhanced matches
imperial_age_uptime - (float) When the player advanced to imperial age, only available for replay enhanced matches
opening - (string) Opening used by the player for the match, inferred from replay_summary_raw, only available for replay enhanced matches
old_rating - (int) The player's rating at the start of the match
new_rating - (int) The player's rating at the end of the match
profile_id - (int) Player's profile id, unique per player, can be used to cross-reference the player on other sites
match_rating_diff - (float) Difference in rating between the two teams in the match, from this player's perspective
replay_summary_raw - (string/json) JSON field providing a summary of units, buildings, and tech research in each age, only available for replay enhanced matches
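Since both models share game_id, players can be joined to their match and the JSON field decoded afterwards. The sketch below uses pandas with made-up rows; the key inside the JSON strings (example_key) is a placeholder, because the real structure of replay_summary_raw is not described here, and parse_summary is a hypothetical helper.

```python
import json

import pandas as pd

# Made-up stand-ins for matches.parquet and players.parquet.
matches = pd.DataFrame(
    {"game_id": ["g1", "g2"], "replay_enhanced": [True, False]}
)
players = pd.DataFrame(
    {
        "game_id": ["g1", "g1", "g2", "g2"],
        "civ": ["Franks", "Britons", "Mayans", "Franks"],
        "winner": [True, False, True, False],
        # Placeholder JSON; the real keys are not documented in this file.
        "replay_summary_raw": ['{"example_key": 1}', '{"example_key": 2}', None, None],
    }
)

# Each player row should match exactly one match row.
joined = players.merge(matches, on="game_id", how="inner", validate="many_to_one")


def parse_summary(raw):
    """Decode replay_summary_raw; return None when the field is missing or empty."""
    if not isinstance(raw, str) or raw == "":
        return None
    return json.loads(raw)


# Replay-only fields are only meaningful for replay enhanced matches.
enhanced = joined[joined["replay_enhanced"]].copy()
enhanced["replay_summary"] = enhanced["replay_summary_raw"].map(parse_summary)
print(enhanced[["game_id", "civ", "replay_summary"]])
```

An inner join drops player rows whose game_id is missing from the matches file, which may or may not be what you want when checking for dirty data.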