Saturday, September 26, 2015

Changelog for 2015-09-26 (Hotfix)

Fixes

  • Fixed updated library causing internal server errors in certain cases.

Changelog for 2015-09-26

Changes

  • Added section to WoTManager translation guide about Import functionality.
  • Removed ability for clans to manually reload members. Clan member changes are automatically loaded each hour, and whenever a clan member signs into the site, their clan data is automatically updated. A manual reload is therefore unnecessary, and it can lead to race conditions (for join/leave events).
  • Added a minimum combination check to Code Types, to prevent pairing random choices with a random length in a way that yields a very low number of combinations (which can cause code creation to fail because there are not enough combinations).
  • Added a note to the Code Settings tab about when automatic code creation occurs.
  • Added Delete link on Codes lists.
  • Added slight fade to Code Action links on Attendance and Valid Public Codes pages.
  • When creating a new Match Type, One Battle per Match now defaults to true, as this setting produces more logical results, especially when the Result for a Match is set (or auto-determined via a replay).
  • Added better error page for timeouts.
  • Added better error page for Security Violations typically caused by having more than one tab open editing an item.
  • Added better error page for File Not Found (for replay downloading).
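
The minimum combination check added to Code Types above can be sketched roughly as follows. This is only an illustration: the function names, the threshold of 10,000, and the assumption that each of a code's positions is drawn independently from a fixed set of characters are all hypothetical, not the actual Clan Tools implementation.

```python
def combination_count(choices: int, length: int) -> int:
    """Number of distinct codes when each of `length` positions is
    drawn independently from `choices` possible characters."""
    if choices < 1 or length < 1:
        return 0
    return choices ** length


def has_enough_combinations(choices: int, length: int,
                            minimum: int = 10_000) -> bool:
    """True when the code space is at least `minimum` (hypothetical threshold)."""
    return combination_count(choices, length) >= minimum
```

Under these assumptions, 4 choices at length 3 gives only 64 possible codes and would be rejected, while 26 choices at length 4 gives 456,976 and would pass.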

Fixes

  • Added check to hopefully prevent issues with duplicate join events and a single (invalid) leave event being created, seemingly due to short-term mismatches in data returned by different WG API methods.
  • Fixed Twitter widget not loading due to CSP conflicts with changes to the way Twitter loads the widget.
  • View code link on the codes listing now checks the show (details) permission instead of the edit permission. Note this was purely an error in the client side display code and had no impact on security.
  • Fixed attempting to use a lowercase Prefix that has already been taken causing an internal server error.
  • Moved Features/Help images to Clan Tools (instead of using Imgur).

Backend Changes

  • Removed unused libraries.
  • Updated various libraries.
  • Increased maximum concurrent request handlers.

Also, Friday morning I finished setting up the second server. However, there was a small but noticeable increase in latency (the actual delay appears to vary based on the size of the data returned). This is the expected result, due to the database connection now being remote which adds network latency, plus the (temporal) cost of encryption for that connection.

As such, given that the current server--once I fixed the backend scheduler clobbering it--appears to be more than capable of supporting the current load, I see no reason to degrade performance just because.

Thursday, September 24, 2015

Performance Issues [Update 2]

Basically, I believe I've located the source of the performance issues and have addressed it, given that nothing has ground to a halt (or even come close, according to my monitoring) over the past two nights, despite traffic being higher than Monday's and about equal to Sunday's.

If you're interested in the details, read on.

There's a backend scheduler that runs tasks (load battles, refresh clan membership, run queued tasks). The scheduler automatically creates threads to do this, which is good because it means these things can run concurrently (not actually, but that doesn't really matter for this).

The way this all ends up interacting, at most the following could be running in theory:

  • 2 Battle Loaders
  • Hourly refresh for all clans.
  • Daily general maintenance tasks (for the ASIA server around primetime for NA).

All of which, I suspect, isn't the issue on its own. However, there's another factor: those queued tasks.

There are basically three possible ways a task might be added to the queue: Manually triggered request to reload clan members, Payout Calculations, and User Stats loading.

That final one I think was the crux of the issue. It's triggered whenever a user signs in if their data hasn't been loaded recently. For players in a participating clan, this isn't triggered unless they joined the clan then signed in between two automatic updates. For players who aren't in a participating clan, this is basically always going to be triggered due to the likely distance between sign-in events.

Prior to Tuesday, there was no (sensible) upper limit for concurrent queue tasks, so if a number of people signed in at the same time and triggered player data refreshes for each of them, then every 5 seconds a new thread would be created to run one of those updates. This often will work fine because the refresh is quite quick, but it still does take some time due to needing to send two requests to the WG API. Add in some server load and they can start backing up.

Worse still, they don't go away when the (web) server is restarted, so they basically start clobbering the (entire) server as soon as the web server is started, and the more they back up, the more they clobber. This eventually leads to a massive amount of thrashing, which causes very high I/O wait times and I/O usage and grinds everything to a halt.

After adding a sensible limit to keep this in check, along with some other limits as detailed over in this blog post, the massive performance issues have completely stopped from what I can tell. Though, I suspect response times at peak load are a bit slower, as I reduced the number of concurrent requests on Sunday as an ineffective stopgap measure.
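
The idea of capping concurrent queue tasks, rather than starting a new thread for every queued item, can be sketched as below. This is a minimal illustration and not the site's actual scheduler: the limit of 2, the function names, and the fake refresh task are all assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_QUEUE_WORKERS = 2  # hypothetical limit; the real value is not stated


def run_queued_tasks(tasks, max_workers=MAX_QUEUE_WORKERS):
    """Run callables with at most `max_workers` executing at once.

    Returns the highest number of tasks observed running together.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(task):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            task()
        finally:
            with lock:
                running -= 1

    # The pool queues extra work instead of spawning a thread per task.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(tracked, task) for task in tasks]
        for future in futures:
            future.result()  # re-raise any error from the task
    return peak


def fake_stats_refresh():
    """Stand-in for a player stats refresh (two WG API requests)."""
    time.sleep(0.01)
```

With this shape, ten simultaneous sign-ins still produce ten refreshes, but the extra ones wait in the pool's queue instead of competing for the server all at once.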

Barring any unexpected complications, I'll switch to the new server for serving web requests tonight, and along with it will increase the concurrent request limit.

Tuesday, September 22, 2015

Performance Issues [Update 1]

Update 2

I've changed the way the backend scheduler works to prevent an excessive number of tasks from running at the same time. For the moment it's extremely limited with battle loading having one channel of work (mutex) and everything else sharing another channel of work. Queued operations, specifically Payout Calculation, may take a while to run due to this. However, I'm hopeful I'll be able to relax the restrictions soon as I'm in the process of setting up a second server, but for the time being this will hopefully prevent the server from falling over as has happened in the past two days.
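
The two "channels of work" described above can be pictured as two mutexes: one reserved for battle loading and one shared by everything else. The sketch below is an assumption-laden illustration (the task kind strings and function names are hypothetical), not the real scheduler code.

```python
import threading

# One lock per channel of work: battle loading gets its own,
# every other task type shares the second. Names are hypothetical.
BATTLE_CHANNEL = threading.Lock()
GENERAL_CHANNEL = threading.Lock()


def channel_for(task_kind: str) -> threading.Lock:
    """Pick the lock that serializes the given kind of task."""
    if task_kind == "battle_load":
        return BATTLE_CHANNEL
    return GENERAL_CHANNEL


def run_task(task_kind: str, work):
    """Run `work` while holding its channel's lock, so at most one
    task per channel executes at any moment."""
    with channel_for(task_kind):
        return work()
```

Because payout calculations, clan refreshes, and stats loading would all share the general channel in this sketch, a long-running task there delays the others, which matches the warning that queued operations may take a while.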

Presently, I'm working on finishing the setup process for the new server. After that's set up and handling requests, I'll reassess the performance situation and go from there.

Images

Due to reports of Imgur being compromised (which I used to host the images on the site, for the features/help sections), I've temporarily disabled images. I think I will host them on the Clan Tools server to avoid future issues such as this.

Monday, September 21, 2015

Performance Issues

Update 2

Update 1

First, I want to apologize for the issues the site was having last night. I know how frustrating it is when a service you rely on is slow or completely unusable, and I'm sorry that's something you had to deal with.

I am looking for what is causing the problem and, more importantly, for solution(s). However, there doesn't seem to be a clear answer to the former question, which makes the latter a shot in the dark.



The issues, to me, don't make sense given my understanding of performance. There's clearly a bottleneck somewhere, but I can't figure out where.


Last night, the 1 minute load peaked at 6.28 at one specific point; beyond that, load was between 4.0 and 5.5. The server has 6 cores, and to my understanding a load of 1.0 represents 100% CPU usage for a single core, thus 6.0 is 100% usage for 6 cores.

Thus, outside of one minute, CPU load was below 100% across all cores.
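
The arithmetic behind that reading can be written out as a small sketch (the function names are hypothetical). One caveat worth keeping in mind: on Linux the load average also counts tasks in uninterruptible wait, typically disk I/O, so a load near the core count does not by itself show that the CPUs were busy.

```python
def load_per_core(load_average: float, cores: int) -> float:
    """Load average divided by core count; 1.0 means every core has
    one runnable (or, on Linux, I/O-blocked) task on average."""
    return load_average / cores


def exceeds_core_count(load_average: float, cores: int) -> bool:
    """True when more tasks are waiting than there are cores."""
    return load_average > cores
```

For the figures quoted above, 6.28 on 6 cores is about 1.05 per core, while 5.5 and 4.0 work out to roughly 0.92 and 0.67.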


Memory usage never went above 2GB (out of 3), and thus was well within reason as well.


Inspecting the queue, which shouldn't be causing timeouts anymore regardless, didn't show any backed up requests.


The only oddity I found was that sendmail was apparently stuck in a recursion loop, but I wouldn't think the amount of I/O it was causing would have been significant enough to introduce slowdowns. However, I don't have any log information regarding I/O, which I have since addressed.



Of course, none of the above is a solution, so at present I'm working to make a few key changes to reduce load on the server. I hope to have these changes implemented by tonight. Though, again, whether this will have any impact really is just an educated guess at this point.

Wednesday, September 16, 2015

Changelog for 2015-09-16

Changes

  • Re-enabled WoTcs.com clan history views since they appear to be working again.
  • Added exception for replay uploads to (hopefully) work around an issue related to Wine.

Fixes

  • Fixed WoTLabs signature on Player Lookup using incorrect server name, resulting in the signatures displayed being for NA players of equal names.
  • On Player Lookup, fixed no Tier Header being visible in Clan Wars Tanks if a tier section only had close to tanks.

Sunday, September 13, 2015

Changelog for 2015-09-12

Changes

  • Added Import feature for WoTManager exported attendance data (Clan Home > Import).

Fixes

  • Fixed Internal Server Error when trying to view the forum of a clan which doesn't exist in the system.
  • Fixed incorrect warning when creating new Subforums.
  • Fixed Events display. Now, only events directly regarding the clan should be displayed, be those join or leave events for clan members (to or from your clan, not any other clans). Additionally, current member name changes will also be displayed.

Friday, September 11, 2015

Changelog for 2015-09-11

New Code Type Options

There are several new code type options. Two of the most interesting ones are designed to address the same request from clans in two different ways.

Make Valid Codes Public [public as in visible to all clan members]

This, as the name suggests, makes all valid codes of that Code Type visible to all clan members on a specific page on Clan Tools. You can get to this page from the Attendance page (Clan Home > Attendance > Valid Public Codes) or Code page (Clan Home > Codes > Valid Public Codes).

Available on the Code Settings tab.

Auto Create Forward

This setting, on the other hand, allows codes to be auto-created beyond the current day, up to 7 days ahead. This makes it easy to copy a week's worth of codes for listing someplace else.

Available on the Code Settings tab.

The Other New Options

  • Auto-Create Start Time Day Offset and Auto-Create End Time Day Offset - These settings allow the auto-created codes to have start and end dates greater than one day.
  • Auto-Create Loose Duplicate Prevention - This setting loosens the check that prevents a duplicate code from being auto-created: instead of requiring the exact same code type, for date, and from and to datetimes (down to the nanosecond), a match on just the code type and for date is enough. This setting, in general, deals with edge cases, so you probably don't need to worry about it. This is enabled by default for new code types; existing code types have it disabled to match prior behavior.

The above three settings are available on the Advanced Settings tab.
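
The difference between strict and loose duplicate prevention can be illustrated by which fields go into the comparison key. The sketch below is hypothetical (the `CodeSpec` type and function names are invented for illustration), and Python datetimes only carry microsecond precision rather than the nanoseconds mentioned above.

```python
from datetime import date, datetime
from typing import NamedTuple, Sequence, Tuple


class CodeSpec(NamedTuple):
    """Hypothetical description of a code to be auto-created."""
    code_type: str
    for_date: date
    valid_from: datetime
    valid_to: datetime


def duplicate_key(spec: CodeSpec, loose: bool) -> Tuple:
    """Fields that must match for two codes to count as duplicates."""
    if loose:
        return (spec.code_type, spec.for_date)
    return (spec.code_type, spec.for_date, spec.valid_from, spec.valid_to)


def is_duplicate(candidate: CodeSpec, existing: Sequence[CodeSpec],
                 loose: bool) -> bool:
    """True when an existing code already covers the candidate."""
    key = duplicate_key(candidate, loose)
    return any(duplicate_key(other, loose) == key for other in existing)
```

In this sketch, a code whose start time differs by a single second from an existing one is a duplicate under the loose rule but not under the strict one.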

Other Changes

  • Related to the above, the Valid Codes list on the Attendance page now only displays codes that are valid now or will become valid within 12 hours, to prevent 7 days of future codes from showing up.
  • Added notice to Player and Clan Lookups regarding WoTcs.com no longer working, which Clan Tools used for member change history. I am exploring other options; no promises for the moment.
  • For Player and Clan lookup, the localization is now forced to English for all servers.
  • Added Total footer for Clan Tanks Lists and clan Player Stronghold Stats.
  • Improved warning message for Clan Lookup when members have accounts which the WG API doesn't treat as existing.
  • Take advantage of the clan members list on the WG API knowing about players to display all players, even those who other parts of the API don't treat as existing.
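
The 12 hour rule for the Valid Codes list mentioned in the first item above amounts to a simple time window filter. The following sketch uses hypothetical names, and it assumes that a code whose end time has passed is never listed and that the 12 hour boundary is inclusive.

```python
from datetime import datetime, timedelta

UPCOMING_WINDOW = timedelta(hours=12)  # the window named in the changelog


def is_listed(valid_from: datetime, valid_to: datetime, now: datetime,
              window: timedelta = UPCOMING_WINDOW) -> bool:
    """True if the code is valid now or becomes valid within `window`."""
    if valid_to < now:
        return False  # already expired
    return valid_from <= now + window
```

A code starting 13 hours from now would be hidden until it comes within the window, which keeps a week of auto-created future codes off the list.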

Fixes

  • Fixed an issue where users would be redirected to an invalid page in some cases.
  • Fixed checkboxes on clan members list being displayed pointlessly in some cases.
  • Fixed "X members accounts are locked" warning being duplicated when changing filters.

Wednesday, September 9, 2015

Influx of Clans

As you may already know, WoTManager is sadly shutting down. Further, WoTManager has graciously decided to direct their users to Clan Tools as a replacement.

This has already led to an influx of new and interested clans, which has the potential to cause performance issues.

As such, I just want to be clear that I am monitoring performance, and if there is a degradation in performance, I will address it by increasing the available resources.

Also, if you or your clan are noticing such performance drops, please contact me (https://clantools.us/contact) as the monitoring tools I have only show so much. Thanks.

Tuesday, September 8, 2015

Changelog for 2015-09-09

Changes

  • Removed restriction on Away Entry: From field requiring it be set to the current date or later for clan administrators.
  • Removed restriction on deleting Away Entries for clan administrators.

Fixes

  • Fixed internal server error when attempting to save an away entry just providing an end date.
  • Fixed invalid back handling in requests from external sources.

Monday, September 7, 2015

Changelog for 2015-09-07

Changes

  • On a User's profile page (e.g. mine), the user's clan will always be displayed, regardless of whether they are a member of a participating clan. If they aren't, the WG API is used to load clan information.
  • Changed the format of the User's clan details to match what is shown on the User Lookup.
  • Added Features page to give an overview of key Clan Tools features.
  • Added support for clans from ASIA server.
  • Added UTC Offset to Time, Date, and DateTime fields (e.g. Berlin (UTC+02:00) ).
  • Added new guide to help documentation: Transitioning from WoTManager
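
The UTC offset label shown in the example above (Berlin (UTC+02:00)) can be produced with a small formatting helper. This is a sketch with hypothetical function names; how Clan Tools actually looks up each zone's offset is not described here, so the offset is simply passed in.

```python
from datetime import timedelta


def format_utc_offset(offset: timedelta) -> str:
    """Format an offset from UTC as UTC+HH:MM or UTC-HH:MM."""
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"UTC{sign}{hours:02d}:{minutes:02d}"


def label_with_offset(name: str, offset: timedelta) -> str:
    """Combine a zone's display name with its formatted offset."""
    return f"{name} ({format_utc_offset(offset)})"
```

For instance, `label_with_offset("Berlin", timedelta(hours=2))` yields `Berlin (UTC+02:00)`.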

Fixes

  • Fixed display issues on the Activity Report Members List.

Friday, September 4, 2015

Changelog 2015-09-04

Negative Industrial Resource Values (is fixed)

Because WG doesn't remember a user's IR data when they leave their clan, and Clan Tools does, a negative value would be reported whenever a player left a clan then rejoined it. Clan Tools now checks the player's join date to detect when this has occurred and records the correct delta.
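
One way to picture the join date check is sketched below. It is an assumption, not the actual Clan Tools logic: it supposes that the total reported by WG restarts from zero on rejoining, and that comparing the join date against the time of the previous stored snapshot is enough to detect a rejoin. All names are hypothetical.

```python
from datetime import datetime


def resource_delta(current_total: int, previous_total: int,
                   previous_seen_at: datetime, joined_at: datetime) -> int:
    """Industrial Resource gained since the previous snapshot.

    If the player (re)joined after the previous snapshot was taken,
    the reported total is assumed to have restarted from zero, so the
    whole current total counts as the gain.
    """
    if joined_at > previous_seen_at:
        return current_total
    return current_total - previous_total
```

Without the check, a stored total of 1000 followed by a post-rejoin total of 150 would be recorded as -850; with it, the delta is 150.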

Existing negative IR has also been corrected. If you notice any issues, please contact me: https://clantools.us/contact

Another issue was also fixed: for any new member, Clan Tools would ignore any IR gained during their first day with the clan.

In Garage (is gone)

To be clear, this has no real impact. The interface is a bit cleaner in a few places (known available, which was often zero, is gone and any display of in garage is gone). Tank Locking data will still work as per usual.

As to why? It was unused. In large part this is because Wargaming requires an API token from the individual player to access in garage data, so every player in a clan would need to provide an API token to Clan Tools. Possible, but it hadn't happened yet.

I've also optimized the clan refreshing method to require one less API request per clan and reduce the amount of data transmitted.

Timeout Errors

Intermittent timeout errors were an issue that would spring up every once in a while and didn't make much sense to me when I had looked into it previously. However, I recently had a realization and I believe I've fixed the issue, which appeared to be caused by requests that got stuck waiting due to high server load.

Server Load

On the topic of server load, I've also reduced the maximum number of simultaneous requests which can be served. This may seem to be a negative; however, I suspect that the grinding to a halt of the entire site encountered a few Sundays prior was caused by very high server load, which pushed memory usage beyond the amount of available RAM and into the swap file (which slowed everything down). My monitoring indicates that most of the instances were rarely used anyway, so I don't expect this to be noticeable the vast majority of the time either way.

Feedback on this is welcome though; do you notice a change during times of high load?