Travelers ranked based on arena mode data

ldj
Posts: 2
Joined: 02 April 2020, 01:49

Post by ldj »

I recently wrote code to scrape and parse BGA Tokaido replays, and I wrote an article describing the process and results. I collected every season 5 arena game that included one of that season's top 10 players, which gave 1508 total results for those players. I got the following ranking, where the number indicates the average number of players beaten:
  1. Kinko (2.17)
  2. Chuubei (2.13)
  3. Hirotada (1.99)
  4. Yoshiyasu (1.90)
  5. Hiroshige (1.84)
  6. Zen-emon (1.78)
  7. Umegai (1.75)
  8. Satsuki (1.74)
  9. Mitsukuni (1.66)
  10. Sasayakko (1.05)
Note that these are the results for just those top 10 players. You can find additional statistics and the data in the article. I found it surprising that Hiroshige performed well and that Sasayakko performed so terribly. Chuubei and Satsuki also performed better than their popularity would suggest.
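
For anyone curious how a number like "average number of players beaten" can be computed, here is a minimal sketch. It is not the code from the article: the function name, the input layout (each game as a list of (traveler, final rank) pairs) and the sample games are all made up for illustration, and it assumes that tied players do not count as beating each other.

```python
from collections import defaultdict


def average_players_beaten(games):
    """Return {traveler: average number of opponents beaten per game}.

    `games` is a hypothetical layout: a list of games, each game a list of
    (traveler, rank) tuples where rank 1 is the winner. Ties (equal ranks)
    are assumed not to count as a win for either player.
    """
    total_beaten = defaultdict(int)
    games_played = defaultdict(int)
    for game in games:
        for traveler, rank in game:
            # An opponent is beaten if they finished with a worse (higher) rank.
            beaten = sum(1 for _, other_rank in game if other_rank > rank)
            total_beaten[traveler] += beaten
            games_played[traveler] += 1
    return {t: total_beaten[t] / games_played[t] for t in total_beaten}


# Made-up sample data, not taken from the article's dataset.
sample_games = [
    [("Kinko", 1), ("Chuubei", 2), ("Satsuki", 3), ("Umegai", 4)],
    [("Chuubei", 1), ("Kinko", 2), ("Satsuki", 3)],
]

if __name__ == "__main__":
    averages = average_players_beaten(sample_games)
    for traveler, avg in sorted(averages.items(), key=lambda kv: -kv[1]):
        print(f"{traveler}: {avg:.2f}")
```

To match a ranking restricted to particular players, you would only add rows for those players' results rather than for everyone in each game.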

Now that I have a way to collect bga replays, I hope to publish similar analyses in the future, such as for crossroads arena.
Chauff
Posts: 97
Joined: 27 March 2020, 11:30

Re: Travelers ranked based on arena mode data

Post by Chauff »

That's great work, I really enjoyed reading it and I hope you'll be able to do some more. The last player disadvantage in this setup was interesting too.

I don't know the site's policy about scraping, but I think you can dm an admin to talk about it :)
Een
Posts: 3854
Joined: 16 June 2010, 19:52

Re: Travelers ranked based on arena mode data

Post by Een »

Disclaimer: grumpy admin here.

Liam, this is a well-written, quality article, BUT you neglected two important aspects in your methodology.

1) Studying the terms of service of the website before launching into this project. Scraping is forbidden by the BGA terms https://boardgamearena.com/legal?section=tosv (section VII. A. second bullet is explicit about this)

2) Ethics.

You write "Unfortunately, bga caps the number of replays each account may access. The account replay limit seems to be around 30 and refreshes after a few days. This restriction is quite disappointing, since it makes many potential large data science projects using the site difficult."

What's disappointing to me is that you didn't take the time to consider the ecology of your project with regard to the ecosystem you were tapping for data. You assumed that it was just free and open, unlimited and boundless. Well, it's not. Retrieving archives from storage and serving them has a cost. This cost may be acceptable for limited personal use, reviewing games on BGA as intended by the service, while being completely unacceptable for scraping. Over the last few months I have had to spend a lot of time, which I could have used to improve other aspects of the service, battling shameless (or just thoughtless) massive scraping that negatively impacted the service. Also, did you consider that publishing a scraping tutorial like you did could increase unwanted scraping and abuse? As a result, we'll probably need to put even more restrictive limitations in place. Nor did you mention taking the time to contact the publisher to make sure that the game's rights owners approved of this project analysing their game.

In a nutshell, you neglected negative externalities and passed them on to BGA.

Please update your article to mention:
- that scraping is not sanctioned by BGA's terms of service and may get users who engage in it banned from the service.
- that to be ethical, any data science project should be authorized by people potentially impacted by it.

Your article gives me the feeling of a bright, well-meaning person, so I trust that you'll recognize that you focused too much on the technical angle of your project and didn't take the time to properly consider other aspects.
ldj
Posts: 2
Joined: 02 April 2020, 01:49

Re: Travelers ranked based on arena mode data

Post by ldj »

After reading your response, I see that I made the mistake of undertaking the project while ignoring its implications and the reasoning behind restricting replay access. I apologize if I made your work more difficult. I definitely should have discussed this with a bga admin before completing the project. I have updated the article to mention that scraping is against the ToS and to encourage those interested in similar projects to first contact a bga admin. If it is deemed necessary, I have no issue taking down the article or code. Thanks for the well-crafted explanation.
Een
Posts: 3854
Joined: 16 June 2010, 19:52

Re: Travelers ranked based on arena mode data

Post by Een »

Thanks for your answer and for taking our feedback into account.

Your content is educational, and a clear warning that scraping will be considered abuse unless specifically authorized is also educational. As long as that warning is there, I don't feel it's necessary to take the article down (but someone else involved might feel differently, in which case I'll let you know). I would appreciate it if you could make the warning a highlighted block in your article: in its current state, inline with the text, it can easily be missed by someone skimming.
Kallsup
Posts: 13
Joined: 05 February 2017, 22:52

Re: Travelers ranked based on arena mode data

Post by Kallsup »

Awesome work! I have had plans to do this too, but never got to it.

Nice to see you mention the survey that was done as well. The biggest difference between your stats and the survey results is that Hiroshige lands in fifth place, which is surprising, but I'd also say that he's had a resurgence in the last year.

Another nice dataset to analyze would be a season of the tournament "tokaido championship" that ran for 5 seasons, which would curate the players behind the data even more.