SEC Filings Classification


The Problem

Our client was an academic researcher from a well-known business school.

Every publicly traded corporation is required to file documents describing its activities. One particular form contains a comprehensive summary of the company’s financial performance, and an essential part of it is a description of the company’s business. Companies must also declare their industry class(es). Our client wanted to use NLP to test a hypothesis about underreported industry classes.


Filing Solution and Tech Stack

Craftinity developed a classifier loosely based on the Paragraph Vector model to help our client test the hypothesis.


The classifier took an SEC document as input and used it to identify the proper industry class(es) of each company. The predicted industry class list was then compared with the list the company had reported.
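The comparison step can be illustrated with a minimal sketch. It is not the project's actual code: the function name, the company names, and the class labels below are hypothetical, and it assumes both lists are available as sets of class labels keyed by company.

```python
from typing import Dict, Set


def find_unreported_classes(
    predicted: Dict[str, Set[str]],
    reported: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per company, the classes that were predicted but not reported."""
    gaps: Dict[str, Set[str]] = {}
    for company, predicted_classes in predicted.items():
        # A company missing from `reported` is treated as having reported nothing.
        missing = predicted_classes - reported.get(company, set())
        if missing:
            gaps[company] = missing
    return gaps


# Hypothetical classifier output and self-reported classes.
predicted = {"ACME": {"hardware", "software"}, "GLOBEX": {"pharma"}}
reported = {"ACME": {"hardware"}, "GLOBEX": {"pharma"}}

gaps = find_unreported_classes(predicted, reported)
print(gaps)
```

Companies whose predicted classes all appear in their reported list are left out of the result, so the output contains only the candidates for underreporting.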