
Fully automated job-matching site tilr relies on several algorithms to deliver the best possible candidate to an employer. It works because the platform matches people to jobs based on their skills and not their titles, CEO Carisa Miklusak explained. “Titles can screen people out. But when you look at people based on an amalgamation of skills, the distance they are willing to travel for a job, and compensation, among other factors, we can find a good match.” Clients trust tilr’s algorithm so much that the employee typically shows up for work without much vetting from the company (tilr does conduct an interview), she said.

Miklusak, of course, is confident in the company’s system as well, but it is a confidence born of that old maxim "trust but verify." The company has been auditing its algorithms since its launch to make sure they are achieving the intended outcome of selecting the best candidate. 

Tilr is one of a growing number of companies that routinely vet the algorithms that form the backbone of their systems. Sometimes these audits are formal reviews conducted by a panel of experts; in other cases they are conducted more informally, even manually. The goal is to make sure algorithms remain fair and don’t develop blind spots as more and more data is fed into the system.

For example, tilr is a heavy user of zip codes and relies on its algorithm audits to make sure job applicants from certain zip codes aren’t overlooked. In areas where cars aren’t traditionally used for transportation, candidates could be inadvertently weeded out. “If there was a zip code we were concerned with, as part of the audit we would look at how many jobs are being served there,” Miklusak said. “Perhaps we would decide to weigh that particular zip code differently as a result.”
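
The kind of zip code check Miklusak describes could look something like the sketch below. It is only an illustration, not tilr’s actual audit: the record layout (a "zip" field on each applicant and each match), the function names and the 0.8 cutoff are all assumptions made for the example.

```python
from collections import Counter


def match_rates_by_zip(applicants, matches):
    """Return {zip_code: matches / applicants} for every zip code with applicants.

    Both arguments are lists of dicts with a "zip" key (a hypothetical layout).
    """
    applied = Counter(a["zip"] for a in applicants)
    matched = Counter(m["zip"] for m in matches)
    # Counter returns 0 for zip codes that never received a match.
    return {z: matched[z] / n for z, n in applied.items()}


def flag_underserved(rates, ratio=0.8):
    """List zip codes whose match rate is below `ratio` times the best rate.

    The 0.8 default is an arbitrary illustrative threshold, not a standard
    taken from the article.
    """
    if not rates:
        return []
    best = max(rates.values())
    return sorted(z for z, r in rates.items() if r < ratio * best)


if __name__ == "__main__":
    applicants = [{"zip": "10001"}] * 10 + [{"zip": "60601"}] * 10
    matches = [{"zip": "10001"}] * 5 + [{"zip": "60601"}] * 2
    rates = match_rates_by_zip(applicants, matches)
    print(rates)
    print(flag_underserved(rates))
```

A flagged zip code is a prompt for a human to look more closely, not proof of bias; the auditors would still need to decide whether to weight that zip code differently.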

Diversity from the Beginning

Some people are so fierce in their belief that algorithms must be monitored that they prefer to start at the very beginning of the process, and not wait for an algorithm to be created. Eli Finkelshteyn, CEO of Constructor.io, said that the real solution lies in getting better at diversity in engineering hiring. Keeping bias out of algorithms should be part of the corporate culture, he said. “If you get to the point where you need a separate outside auditor you know there is something wrong.”

Constructor.io does conduct algorithm audits, but as part of engineers’ code reviews and quality assurance (QA) activities, Finkelshteyn said. “It is baked into the corporate process.”

Some companies don’t want to slow down production with code reviews or QA. “Our philosophy is to move slowly and not break things and not have implicit bias,” he added.

Communication is at the crux of this approach, Finkelshteyn said. Company engineers need to have frank and sometimes difficult conversations about such issues as gender or race before an algorithm is written. They need to be explicit about what sorts of things machine learning should or should not be based on, and the importance of geographic location. “This is a communication challenge — not necessarily an engineering one,” he said. “We train people coming in on how we feel about these issues as a company.”

Auditing for Accuracy

Besides bias, algorithms should also be audited for accuracy, said Albert Brown, SVP of engineering at Veritone.

Algorithms are based on assumptions, known or otherwise, he explained. These assumptions, or patterns, are baked into the data used to train or build the algorithm, in addition to the assumptions in the algorithm itself. “As the data changes and if the algorithm was not thought through carefully at the start, the accuracy can go from acceptable to 'worse than a coin flip,'” Brown said. “Companies should audit the algorithms to catch missing datasets and to recognize when the algorithm is not working effectively.”
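
One simple way to picture the accuracy audit Brown describes is to score the algorithm on successive batches of labeled data and flag any batch where it does no better than a coin flip. The sketch below assumes a binary classifier and a hypothetical list of (name, predictions, labels) batches; it is not Veritone’s method.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that equal the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty batch")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def audit_accuracy(batches, floor=0.5):
    """Return the names of batches whose accuracy is at or below `floor`.

    `batches` is a list of (name, predictions, labels) tuples. The default
    floor of 0.5 is the coin-flip baseline for a balanced binary task; a
    real audit would likely set a higher, task-specific bar.
    """
    return [name for name, preds, labels in batches
            if accuracy(preds, labels) <= floor]


if __name__ == "__main__":
    batches = [
        ("january", [1, 0, 1, 1], [1, 0, 1, 0]),
        ("june", [1, 1, 0, 0], [0, 0, 0, 1]),
    ]
    print(audit_accuracy(batches))
```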

Consider, for example, a user experience that is created based on machine learning results, said Pavel Dmitriev, vice president of data science at Outreach. A machine learning algorithm can result in a company highlighting certain words on a web page on which the user is offered a product. An algorithm audit would check the relevancy of the product to the term, and to the context of the page, he said. 

An audit would also ensure the data used to train the algorithms respects user privacy, Dmitriev continued. “With laws such as GDPR and CCPA, respecting user privacy is no longer an optional aspect of developing machine learning algorithms. The novelty and complexity of these laws may lead to developers of machine learning algorithms inadvertently violating them.”

Underestimating the Impact

Companies such as Constructor.io or tilr are in the minority with their approach to algorithms. Many, especially startups, are in a race to get a product to market and do not take the time to carefully vet and then periodically review an algorithm. 

Only companies in already highly regulated or scrutinized industries are putting standardized processes, policies and responsible staff to work on this issue, said Tom Debus, managing partner of Integration Alpha GmbH. “Many of the others treat their data science and artificial intelligence (AI) too much like their traditional tech projects and still underestimate the social and commercial impact that a runaway algorithm might have.”

Indeed, an algorithm audit is perceived as unimportant in so many companies that Cindi Howson, chief data strategy officer at ThoughtSpot, advises advocates not to call it an audit at all. “Call it design review or ethics proofing, but not auditing or else nobody will buy into this,” she said.

Howson is a firm believer in the need for algorithm audits and would like to see an AI code of ethics for data scientists become mandatory, similar to a doctor’s Hippocratic Oath. “Part of this code of ethics would require a design review for biases in the training data set, with reviews of the model by a diverse team. This design review should include the contrarians and ethical hackers that may pose the questions, ‘Here is what I intended my algorithm to do. How might it be abused?’ ‘What harm might it cause on a portion of society?’”

Debus offered a more regimented approach to the issue. Peer code reviews are widely recognized as a best practice in a standardized algorithm development lifecycle, he said. “More sophisticated and more expensive is the practice to include manual false positive and false negative testing as well as unintended bias testing into maintenance and model calibration processes,” he said. “That way an algorithm that started out fair and unbiased is audited at regular intervals to verify that no unintended or undiscovered bias has crept into the model reinforcement loops.”
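
As a rough illustration of the false positive and false negative testing Debus mentions, the sketch below computes both error rates separately for each group of people, so that a gap between groups stands out. The (group, predicted, actual) record format and the function name are assumptions made for this example; which groups to compare, and how large a gap is acceptable, are decisions for the auditors.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    `records` is an iterable of (group, predicted, actual) tuples, where
    `predicted` and `actual` are booleans. A rate is None when a group has
    no actual negatives (for "fpr") or no actual positives (for "fnr").
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "pos": 0, "neg": 0})
    for group, predicted, actual in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # missed a true positive
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # raised a false alarm
    return {
        g: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for g, c in counts.items()
    }


if __name__ == "__main__":
    records = [
        ("A", True, True), ("A", False, True),
        ("A", True, False), ("A", False, False),
        ("B", False, True), ("B", False, True), ("B", False, False),
    ]
    for group, rates in error_rates_by_group(records).items():
        print(group, rates)
```

Run at regular intervals on fresh data, a check like this would show whether the error rates for one group are drifting away from those of another.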