Natural Language Processing (NLP) using Blackstone


Tristan Koh, a member of alt+Law, demonstrates the uses of the Python package Blackstone on Singaporean case law.

To access the full article with accompanying code, click here: Tristan Koh: Blackstone

This summary and the corresponding full article demonstrate the uses of the Python package Blackstone on Singaporean case law. Specifically, I apply Blackstone to the seminal tort case Spandeck Engineering v DSTA and use it to classify every sentence of the judgment into one of five categories (more below).

By doing so, I hope to encourage others, especially law students who are interested in legal technology, to get their hands dirty with basic programming and data science. Even for students whose interest lies in the law of technology rather than the technology of law, I personally believe that one cannot simply discuss "technology" in the abstract when formulating legal rules that govern such technology.

At the same time, I empathise with those who may be apprehensive about programming, as I was one and a half years ago. Hence, through the accompanying notebook, I aim to explain each step in the code as simply as possible, to demonstrate that one does not need to be particularly talented to teach oneself programming.

Blackstone’s legal text categorisation feature classifies legal text into five categories:

  1. AXIOM - The text appears to postulate a well-established principle
  2. CONCLUSION - The text appears to make a finding, holding, determination or conclusion
  3. ISSUE - The text appears to discuss an issue or question
  4. LEGAL_TEST - The text appears to discuss a legal test
  5. UNCAT - The text does not fall into one of the four categories above

As mentioned, I apply Blackstone to the Spandeck case. I then evaluate the results of Blackstone’s predictions.
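
To make the classification step concrete, here is a minimal sketch of how a single label might be picked for a sentence once the classifier has scored it against the five categories. Blackstone is built on spaCy, whose text categoriser stores per-label scores in a dictionary (`doc.cats`). The helper name `top_category` and the scores below are hypothetical illustrations, not real Blackstone output.

```python
from typing import Dict, Tuple

# The five categories listed above.
CATEGORIES = ("AXIOM", "CONCLUSION", "ISSUE", "LEGAL_TEST", "UNCAT")


def top_category(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-scoring label together with its score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Hypothetical scores for one sentence, shaped like spaCy's `doc.cats`.
# These numbers are made up for illustration only.
example_scores = {
    "AXIOM": 0.05,
    "CONCLUSION": 0.10,
    "ISSUE": 0.02,
    "LEGAL_TEST": 0.80,
    "UNCAT": 0.03,
}

label, score = top_category(example_scores)
print(label, score)
```

In practice the scores would come from running the model on each sentence of the judgment, and the same helper would be applied sentence by sentence.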

What is Blackstone?

Blackstone is a Python NLP package. This means that:

  1. Blackstone is code written in Python, the programming language.
  2. It enables a computer to analyse legal texts in a manner that corresponds to humans’ understanding of language.

The second characteristic is what sets Blackstone apart from traditional data science techniques. Usually, raw data has to be manually cleaned so that it becomes quantitative and structured (think of an Excel spreadsheet) before data analytics can be conducted on it. With Blackstone, however, the data scientist only needs to feed the text into the application, and the computer will automatically “understand” the text by converting it into quantitative variables. The data scientist can then perform data analytics directly without having to pre-process the text themselves.

How does Blackstone work?

How Blackstone processes raw text into something that a computer can understand is rather complicated, involving neural networks and linear algebra (and I am not fully familiar with the mathematical details). However, a simple illustration can be used. Humans achieve a semantic understanding of a word (i.e. an understanding that is appropriate to its context) by looking at the words surrounding it in the sentence. Similarly, Blackstone captures semantic meaning by counting how often a word appears in a sentence alongside other words. This process is repeated for every instance of each word in the text, building up a model that predicts the probability of a particular word occurring given the words around it. In essence, Blackstone uses the number of co-occurrences of words as a proxy for the semantic meaning of the text.
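
The counting intuition above can be sketched as a toy example. This is not Blackstone's actual implementation (which relies on neural word embeddings); it is a simplified illustration, assuming whitespace tokenisation and a context window of one word on each side. The function names and sample sentences are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=1):
    """Count how often each word appears within `window` words of a context word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    # tokens[j] is the context word; `word` is its neighbour.
                    counts[tokens[j]][word] += 1
    return counts


def prob_word_given_context(counts, word, context):
    """Estimate P(word | context) from raw co-occurrence counts."""
    total = sum(counts[context].values())
    return counts[context][word] / total if total else 0.0


# Made-up sentences for illustration only.
sentences = [
    "the defendant owed a duty of care",
    "the duty of care was breached",
    "a duty of care arises from proximity",
]

counts = cooccurrence_counts(sentences, window=1)
print(prob_word_given_context(counts, "care", "of"))
```

In these sentences "of" is always flanked by "duty" and "care", so the two words share the probability mass equally. A real model learns from far more text and compresses such statistics into dense vectors.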

For a more technical explanation, click here: What Are Word Embeddings for Text?

What are Blackstone’s capabilities?

Apart from classifying legal text into categories (which is the focus of this article), Blackstone’s other main capabilities include recognising case names and citations, legislation and references to specific sections of legislation, and the names of courts and judges.
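
As a rough sketch of how recognised entities might be organised once extracted, the snippet below groups (text, label) pairs by label. In spaCy, on which Blackstone is built, each recognised entity exposes its text and label via `ent.text` and `ent.label_`. The label names `CASENAME` and `COURT`, the helper `group_entities`, and the sample pairs are assumptions for illustration; the actual label set should be checked against Blackstone's documentation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_entities(entities: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group entity texts by label, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Hypothetical (text, label) pairs, shaped like
# [(ent.text, ent.label_) for ent in doc.ents] in spaCy.
sample_entities = [
    ("Spandeck Engineering v DSTA", "CASENAME"),
    ("Court of Appeal", "COURT"),
    ("Spandeck Engineering v DSTA", "CASENAME"),
    ("Example v Sample", "CASENAME"),  # placeholder case name
]

grouped = group_entities(sample_entities)
print(grouped)
```

Grouping by label gives a quick overview of which cases, courts or provisions a judgment refers to, without reading it line by line.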

The full technical description of Blackstone can be accessed here:

Why is Blackstone potentially useful to legal practitioners?

One immediate application of Blackstone is legal research: it gives legal practitioners tools to automate the extraction of key features of a case, such as the cases cited, the legal tests used and the conclusions the court reached. Such data could reveal useful trends in case law.

Another use case is semi-automated legal case summaries, which I discuss further at the end of the full article.

With this introduction, I invite you, the reader, to view the code and my analysis of the predictions of Blackstone on Spandeck!

Note: The information contained in this site is provided for informational purposes only and should not be construed as legal advice on any subject matter. You should not act or refrain from acting on the basis of any content included in this site without seeking legal or other professional advice.
