Aug 19, 2018 - Notes from the SOBotics war room - Building Boson

Boson, as described by the project, is both a framework to create any general purpose tracking bot for Stack Exchange, and a tracking bot built on top of the framework to report content that matches a given set of filters. It can be configured to report to various rooms in several formats. It sounds simple enough, but designing an all-purpose generic bot was not that easy.

Initial talks, where did we get the idea from?

While discussing possible solutions for moderators to detect heated arguments in posts, the SOBotics team added an answer about how we can use the Stack Exchange API to create a very small bot to scan all comments, and the problem would be solved in a matter of seconds. Catija left a comment on our post:

take it in your own hands and develop some nice moderation bot. - Alas… as my SO rep might indicate - I am not a developer. Something like Heat Detector would probably work… though I’m not sure what it’s capable of. It’d be nice to have something that would alert me (daily? every six hours?) of new comments on posts that have been inactive for longer than a week, say? Or maybe something that I could configure to specifically add the posts I’m concerned about following

This was the spark that ignited the flame in our bot designing back rooms. There is an entire world out there in the other Stack Exchange sites where there are users who are not programmers. We had been restricted very much to just Stack Overflow (even though we had a partial side expansion called SEBotics), and had not interacted much with the non-developers from other sites.

While that immediate situation was solved by creating another bot (not by the SOBotics team), our plans extended beyond that single point and we wanted to explore the opportunities of having non-programmers who could spin up bots tailored to their needs, without requiring them to have any knowledge of what is running behind the scenes.

The idea was simple: Create an interface where users could get to pick and choose what they wanted to track, where they wanted to track it, and how they wanted to track it; and the bot would do all the work for them. The programming challenges, however, were many.

Laying data out, what to do and what not to do?

Tracking anything, anywhere is a fancy idea, but in order to pin down what “anything” and “anywhere” meant, we had to set some boundaries. Questions, Comments, Answers and Posts were the types most commonly queried by users. They all had similar content (a body, an owner, a creation date, etc.). We decided to start working on those first and then expand. The design plan was then extended to cater to edit suggestions and revisions, as there were a lot of issues related to vandalism, and Belisarius was picking up a lot of that. Soon, thanks to a meta post, we learned that users were interested in watching for the creation of new tags. That had to be accommodated too. The design plan grew larger in order to satisfy the initial idea of tracking anything, and it excited us even more, as we realized how much potential the bot had if we managed to add in everything that users wanted to track.

Filtering the content was the next aspect which needed to be looked at. Just dumping all the data on the user would be quite useless, as they could just use the RSS feeds provided by Stack Exchange directly - why use the bot, then? So we decided to create multiple filters that could be applied to the content, such as user reputation, post length, ID of the post, creation date, etc. It was at this moment that it was suggested that we split the project into two parts: a framework and a bot. This would give programmers the freedom to spin off their own bots using the framework, if they didn’t like the options provided by the bot.
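
To make the idea of stackable filters a bit more concrete, here is a minimal sketch in Java. It is not Boson's actual code: the TrackedItem class, its fields and the FilterSketch helper are made up for illustration, and the only assumption is that a filter is something that accepts or rejects an item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical content item; the field names are illustrative, not Boson's model.
final class TrackedItem {
    final long id;
    final int ownerReputation;
    final String body;

    TrackedItem(long id, int ownerReputation, String body) {
        this.id = id;
        this.ownerReputation = ownerReputation;
        this.body = body;
    }
}

final class FilterSketch {
    // Accepts items whose owner has less than the given reputation.
    static Predicate<TrackedItem> reputationBelow(int limit) {
        return item -> item.ownerReputation < limit;
    }

    // Accepts items whose body is shorter than the given length.
    static Predicate<TrackedItem> bodyShorterThan(int length) {
        return item -> item.body.length() < length;
    }

    // An item is reported only if every configured filter accepts it.
    static List<TrackedItem> apply(List<TrackedItem> items,
                                   List<Predicate<TrackedItem>> filters) {
        List<TrackedItem> matched = new ArrayList<>();
        for (TrackedItem item : items) {
            boolean accepted = true;
            for (Predicate<TrackedItem> filter : filters) {
                if (!filter.test(item)) {
                    accepted = false;
                    break;
                }
            }
            if (accepted) {
                matched.add(item);
            }
        }
        return matched;
    }
}
```

With this shape, a tracker is little more than a list of filters picked by the user, and a new kind of filter can be added without touching the existing ones.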

The API quota was another issue that we were quite stressed about. This ended up reminding us about apicache, one of the (now abandoned) SOBotics projects to cache all the API requests in such a way that other bots could use data from that instead of calling the API directly. This resulted in Boson having a configurable option to collect data from any API which would return data in the same format as returned by the Stack Exchange API.

Relaying the data to the rooms was another contentious issue. Many questions were raised. Should we be one-boxing the posts? Should we be displaying the name of the author? Should we list all of them in a single message? This was solved in the same way as the filters: we created multiple printers to display the output. Two examples are the one-box printer and the multiple posts printer.
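
As a rough illustration of what “multiple printers” could look like, here is a small Java sketch. The ReportPrinter interface and both class names are hypothetical and not taken from Boson. The one-box variant assumes that chat one-boxes a message which consists only of a post link.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical printer abstraction: turns reported post URLs into chat messages.
interface ReportPrinter {
    List<String> print(List<String> postUrls);
}

// One message per post, containing only the URL (assumed to let chat one-box it).
final class OneBoxPrinter implements ReportPrinter {
    @Override
    public List<String> print(List<String> postUrls) {
        return new ArrayList<>(postUrls);
    }
}

// A single message listing all posts, or no message if there is nothing to report.
final class MultiplePostsPrinter implements ReportPrinter {
    @Override
    public List<String> print(List<String> postUrls) {
        if (postUrls.isEmpty()) {
            return Collections.emptyList();
        }
        return Collections.singletonList(String.join(" | ", postUrls));
    }
}
```

The tracker only needs to know that it has some ReportPrinter, so the choice between the two stays a configuration detail.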

Similar to these, there were many other issues. For example, talking to trackers, multiple trackers in the same room, multiple trackers consuming the same data, and so on. We decided to follow the same path for most of the other issues, which was to generalize the different options available and let the user decide what their tracker needs to use.

Having a dashboard for the bot was another suggestion, and we all agreed it was a good idea at the mere mention of it. The search provided by Stack Exchange is not that great, and using it to track our reports would certainly be very hard. At the same time, while we were outlining these ideas for a universal bot, there were also increasing discussions about creating dashboards for our other projects (such as Hydrant for HeatDetector and Hippo for Belisarius). Announcing this in public helped us draw the attention of more developers.

Sensing the need for multiple dashboards (3 at that moment), we decided to create a generic dashboard - a dashboard that could be utilized by any bot to display their reports. That was the birth of Higgs.

The work on Higgs was started earlier, as we had to clear the backlog of getting the dashboards ready for the other bots. Hydrant, named after the multi-headed snake Hydra, was the first dashboard to be tested and it was well received by the regulars. Using this opportunity, we collected feedback about how Higgs could be made as user-friendly as possible.

Back at the drawing board, there were multiple discussions about which dashboards should be created for Boson, and when. Do we create a separate dashboard every time a new bot is spun up by a user? Or do we create a generic dashboard where we dump all the reports generated by the bot? Or do we add a configurable parameter that decides whether a new dashboard is created?
Lots of arguments and counter-arguments were thrown around. Finally, it was decided to create a single dashboard and provide a unique identifier for each of the numerous trackers which could be created using the bot.

The next challenge was: what exactly do we display on the dashboard? We have a shiny new dash, but what do we put in there? This question pulled us back into the room. Boson can be used to track potentially anything, including tags, badges, etc., which would not be that useful to display on the dash. Another issue is that the dashboard has a large content area and a title. A few of the types we track might not have one of those fields, so what do we display in those cases? Numerous similar questions were raised, and we decided to use the least obtrusive option possible in each case.

User friendliness, can a non-programmer use it as they need?

One concept which we always kept in mind was user friendliness for non-programmers. One way to help users achieve what they need is a very intuitive user interface. Given that chat can only be used to relay messages, it was decided that we should use a fully developed command line parser (complete with man pages) to receive commands from the users. The plan was to take the kind of argument parser which is generally used for creating command line applications, cut it open, and replace the standard output (which is usually the terminal) with chat. In this way, most users who are familiar with any command line tool would find the bot relatively easy to use.

Another concept which is familiar to users is that of installation wizards. Many users who do not use the command line on a day-to-day basis would likely know more about wizards and would certainly find them more user friendly than the command line. Therefore, we decided that a wizard setup for the trackers would be an option as well. However, this would be more time consuming than the command line, as the user has to go through and reply to the list of all the features present in the bot. Whether to use the wizard or the command line would depend entirely on the user who is operating the bot and what they feel comfortable with.

Finally, the idea of a web-based front end which could generate the command to run a tracker on Boson was also discussed. However, that would require another complete standalone service which would not be connected with Boson. Therefore, the idea was left out of the minimum viable product.

Another aspect of user friendliness is that once the user has entered the required data, they need a visual confirmation that the system has registered the data accurately. This was achieved by making the bot add in a message with the list of parameters which it has parsed from the input data. Additionally, adding a name to the tracker would help the users to identify their trackers easily. Therefore an option to add a name to the tracker was also provided.
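
The command line idea and the confirmation message can be sketched together. The following Java snippet is a hand-rolled toy parser, not the argument parser library the plan refers to, and the command name, the option names and the usage text are invented for this example. The point is only that every result, including the help text, is returned as a string which the bot would post to chat instead of printing to a terminal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ChatCommandSketch {
    // Hypothetical usage text; the real bot's commands and options may differ.
    static final String USAGE = "usage: track --site <site> --type <type>";

    // Parses "--key value" pairs from a chat message into an ordered map.
    // Throws IllegalArgumentException if the options are malformed.
    static Map<String, String> parseOptions(String message) {
        Map<String, String> options = new LinkedHashMap<>();
        String[] tokens = message.trim().split("\\s+");
        // tokens[0] is the command name, the options start at index 1.
        for (int i = 1; i < tokens.length; i += 2) {
            if (!tokens[i].startsWith("--") || i + 1 >= tokens.length) {
                throw new IllegalArgumentException(USAGE);
            }
            options.put(tokens[i].substring(2), tokens[i + 1]);
        }
        return options;
    }

    // Builds the reply that would be posted to chat: either a confirmation
    // listing the parsed parameters, or the usage text on invalid input.
    static String reply(String message) {
        try {
            return "Parsed parameters: " + parseOptions(message);
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }
}
```

For example, `ChatCommandSketch.reply("track --site stackoverflow --type comments")` yields a confirmation listing `site` and `type`, while `ChatCommandSketch.reply("track --site")` yields the usage text.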

Present state of affairs, is it as nice as it sounds?

Boson is still in a very conceptual state, and most of the work is still on the drawing table. There are a lot more challenges (including the API quota) which we will face once it’s released for general use. We have listed some of these problems and are still looking out for other potential problems.

As of now, Higgs is ready to be used and Boson is partially ready. The APIs are yet to be formulated, and will soon be ready.

All in all, work has begun on the entire project and is progressing at an active pace. We still need help and advice on it, as well as ideas about any other data filters which would be useful to the project. Please drop into chat and have a talk with us, and together, let’s make Stack Exchange a nicer place for all!

Jul 31, 2018 - Malfunctioning bots and how we deal with them

Bots, no matter how robust they are, are sometimes prone to errors and malfunctions. There’s a chance that any bot can go rogue by sending multiple chat messages at once, or adding multiple automatic flags on the same post. It doesn’t matter if you’re a room owner, bot developer, or a regular user of the room - we all need to be geared up and prepared for these emergency situations.

Does the situation warrant any action?

First, you must decide whether action really needs to be taken:

  • Are the reports all (or nearly all) unhelpful or redundant, such as being false-positives or duplicates? If the reports are useful, they’re probably OK.
  • Does the bot appear to have a developer working on it at the moment? If someone’s developing, the report spam is probably just a bug or a test, and the developer likely knows about it and is working to fix it.
  • Is the bot posting a truly large volume of reports, such that chatting and interacting with other bots becomes difficult?
    • Posting the same message 3 times in a row (like Smokey’s “conflicting feedback” reports) does not require drastic action; it’s just a minor glitch that may be worth talking to the bot developer about or filing a GH issue over.
    • Posting the same message 30 times in a row or reporting nearly every post on the site makes chatting and interacting with other bots difficult, and requires intervention.

How do I act, if it warrants action?

If action is required, the first priority is to stop the reports in the short-term. If the quickest way to do that is to disable the bot entirely, so be it — the end result is the bot being offline until a developer can fix the problem and re-enable it.

Don’t perform any of the below steps unless you’re confident you know what you’re doing and you’re sure no one will get angry at you for doing it. If you’re not 100% confident in what you’re about to do, delegate the problem to someone else or try a different step.

If you’re not a RO, bot developer, or moderator, ping someone who is (if anyone’s around). If no one is available, proceed with caution.

If a developer of the bot is around, ping them as they likely have more knowledge of the bot’s internal workings.

If the bot stops malfunctioning, stop trying to fix it (even if you don’t think you did anything to fix it). Sometimes the problem will eventually resolve itself, such as the time someone posted 27 identical answers and Guttenberg reported every pair it found. If you know the problem will go away shortly, it’s probably best to just wait it out.

With the above disclaimers in mind, proceed through the following checklist:

  • First, find the bot’s documentation.
    • Different bots function differently and accept different commands, so knowing your available options is critical.
  • If there’s an obvious reason for the malfunction and an easy way to fix it, do so immediately.
    • For example, if a recent blacklist change is over-zealous, try to revert that blacklist change.
  • Try a reboot. More often than not, rebooting fixes the problem entirely.
  • If the bot is not responding to a reboot, it may be overloaded processing commands or sending chat messages. Some bots such as SwiftChatSE-based bots have a kill command that will immediately crash the bot, bypassing any work that’s queued up or cleanup tasks that are usually performed during a shutdown.
    • If the bot auto-restarts after crashes, it may come back up after a kill command.
  • If you have access to the bot on Redunda, try a failover to another instance.
  • If the bot responds to a reboot or kill, but comes back online and starts malfunctioning again, use the shutdown command.
    • If the bot is behaving, a shutdown should cause that instance to cease operating.
    • If multiple instances are available, Redunda should launch another instance, which may or may not experience the same problem. If the new instance works, problem solved; otherwise, shut down this instance too.
  • If all else fails, use Stack Exchange’s moderation tools to silence the bot. The least drastic way to do this is to put the room in gallery mode and remove the bot’s write access. An alternative is to kick the bot, which will temporarily ban it from the chat room; however, if the bot is kicked enough times, automatic moderator flags will be raised and restrictions will be applied to the bot’s account. If you’re not a RO or mod, ping a RO immediately. If there is no one available to ping, raise a moderator flag by clicking the message dropdown and choosing “flag for moderator.”

What to do after the bot has been silenced?

At this point, the bot has been either fixed or disabled. If you’re a RO or mod, clean up the chat transcript by moving the erroneous reports to a trash room. If you’re not a RO, ping someone who is.

The developer of the bot needs to be informed of the malfunction, as well as what you did and why. If you had to disable the bot, a developer is needed to fix and re-enable the bot. If you were able to fix it yourself or the problem went away on its own, explaining what happened is courteous and helpful so that the developer is aware and can prevent it from happening again in the future.

Jul 13, 2018 - Stack Exchange is removing OpenID. How does this affect us, and what do we have to do about it?

As you might have heard, Stack Exchange will be removing OpenID login on July 25, 2018. Because our bots depend on OpenID to log in to chat, we had to reverse engineer the new login.

Let’s have a look at our Java library, ChatExchange, to show you how the new login works:

Step 0: Prepare to store cookies

We’ll need to store some of the cookies we receive when logging in. In our case, we’re using a HashMap<String, String>. We pass it on to our own HttpClient. If you implement this on your own, make sure that you store all the cookies and send them along with your requests!
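
If your HTTP client only hands you the raw Set-Cookie header values, the bookkeeping could look roughly like the sketch below. This is not how ChatExchange does it: the class and method names are made up, the cookie values used in any example are placeholders, and cookie attributes such as Path or Expires are simply ignored.

```java
import java.util.List;
import java.util.Map;

final class CookieStoreSketch {
    // Copies the name=value part of each Set-Cookie header value into the
    // cookie map, ignoring attributes such as Path, Expires or HttpOnly.
    static void storeCookies(List<String> setCookieHeaders, Map<String, String> cookies) {
        for (String header : setCookieHeaders) {
            String pair = header.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed entries without a cookie name
            }
            cookies.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
    }

    // Formats the stored cookies as the value of a Cookie request header.
    static String cookieHeader(Map<String, String> cookies) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(entry.getKey()).append('=').append(entry.getValue());
        }
        return sb.toString();
    }
}
```

Call storeCookies after every response and send cookieHeader with every request, so that cookies set in a later step replace older values with the same name.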

Step 1: Get the login form

Every site in the Stack Exchange network (except for stackexchange.com itself - we’ll discuss that later) should now have this new login form located at /users/login:

[Image: login screen]

Along with the fields for email and password, it has a hidden field called fkey, which is filled with a server generated value. We need to post this key along with the credentials. In order to be able to get this key, we first need to send a GET request to /users/login and read the fkey:

Response response = httpClient.get("https://" + host + "/users/login", cookies);
String fkey = response.parse().select("input[name='fkey']").val();

Step 2: Submit the form

Now we just need to post the credentials and the fkey to /users/login:

response = httpClient.post("https://" + host + "/users/login", cookies, "email", email, "password", password, "fkey", fkey);

Step 3: Check if you’re now logged in

To check if the login worked, we’re sending a GET request to /users/current, which redirects to your profile when you’re logged in. If we can find an HTML element with the class js-inbox-button in the response, we’re logged in. Make sure that you send the cookies you’ve previously saved.

Response checkResponse = httpClient.get("https://" + host + "/users/current", cookies);
if (checkResponse.parse().getElementsByClass("js-inbox-button").first() == null) {
	throw new IllegalStateException("Unable to login to Stack Exchange.");
}

And now the edge cases…

Up until now, the implementation was quite easy and worked well for the Stack Overflow chat. And then there was chat.stackexchange.com…

Login to chat.stackexchange.com

As mentioned earlier, stackexchange.com is a special case. It does not have the login form that other sites use. To solve this problem, thesecretmaster ♦ had an idea. (actually two, but I prefer explaining the easy way ;-) )

meta.stackexchange.com has the same login as stackoverflow.com and the other sites. thesecretmaster ♦ figured out that we can simply use the cookies from meta.stackexchange.com and send them to chat.stackexchange.com. To implement this, we just needed to send steps 1 and 2 to meta.SE and step 3 to stackexchange.com.
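
The detour boils down to picking a different host for the login steps than for the check. A minimal sketch of that mapping follows; the class and method names are ours for this example and are not ChatExchange's API, and only the two chat hosts mentioned in this post are considered.

```java
final class LoginHostSketch {
    // Host used for step 3 (the /users/current check): the chat host
    // without its "chat." prefix.
    static String checkHostFor(String chatHost) {
        return chatHost.startsWith("chat.") ? chatHost.substring("chat.".length()) : chatHost;
    }

    // Host used for steps 1 and 2 (the /users/login form). stackexchange.com
    // has no such form, so we take the detour via meta.stackexchange.com.
    static String loginHostFor(String chatHost) {
        String siteHost = checkHostFor(chatHost);
        return siteHost.equals("stackexchange.com") ? "meta.stackexchange.com" : siteHost;
    }
}
```

For chat.stackoverflow.com both methods return stackoverflow.com, so the detour only kicks in for chat.stackexchange.com.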

What happens if the user does not have an account on the site they try to use?

With some accounts, our code just didn’t work on chat.stackexchange.com. The problem was that since we now take a little detour, the user account for the bot has to have an account on meta.SE. My bot didn’t have one. Creating an account is quite easy. The POST-request in step 2 will return a message and a button, if the user does not have an account yet. If we don’t click that button, the user won’t be logged in.

Identifying that we received that message is quite easy, although the ID of the <form>-element is not intuitive: logout-user

The bigger issue is in actually sending that form. Since we can’t just click the element in Java and didn’t know which of the hidden fields in that form are actually being used, we had to read them all and post the contents to the URL in the action attribute of the element:

Element formElement = response.parse().getElementById("logout-user");
if (formElement != null) {
  Elements formInputs = formElement.getElementsByTag("input");
  List<String> formData = new ArrayList<>();

  for (Element input : formInputs) {
    String key = input.attr("name");
    String value = input.val();

    if (key == null || key.isEmpty())
      continue;

    formData.add(key);
    formData.add(value);
  } // for formInputs

  String[] formDataArray = formData.toArray(new String[formData.size()]);

  String formUrl = "https://" + host + formElement.attr("action");

  Response formResponse = httpClient.post(formUrl, cookies, formDataArray);
  
  if (formResponse.parse().getElementsByClass("js-inbox-button").first() == null) {
    throw new IllegalStateException("Unable to create account on " + host + "! Please create the account manually.");
  } // if
} // if

If you have further questions about implementing this, feel free to join us in our chatroom.