Shedding light on dark data: What does it mean for IP insights?

By Phil Arvanitis, IP Solutions Consultant EMEA, CPA Global

With companies processing more and more data to manage their business better, comply with regulation, and provide services to customers, it’s not unusual for quite a bit of this ‘big data’ to remain unstructured and underutilised. Better understanding of the ‘dark data’ contained in a company’s data sets can provide all sorts of benefits for a business, not the least of which is in understanding and managing intellectual property more effectively.

‘Dark data’ is not as ominous or mysterious as it sounds! Companies collect all kinds of business data over time using different systems, people and methodologies, which can make it difficult to compare and analyse this information as one dataset. Considering the size of many organisations’ universe of information assets, it is not unusual to find that up to 90% of a company’s big data is ‘dark’ – unreviewed, unknown, unused. If dark data is to provide useful insights, these different types of data need to be identified, linked, compared and analysed.

Dark data that companies already hold could be valuable to review and connect across many areas, including intellectual property management. IP-relevant dark data may already exist, in paper or electronic form, in a company’s emails, research records (new ideas), meeting notes, and previous patent filings and notes. In relation to IP, dark data may include information that is collected only passively, is difficult to access, or is not yet being used, along with data that a company knows exists but has not yet made accessible.

Trade secrets and early-stage ideas, conversion rates of ideas to invention disclosures, and ratios of patent filings to grants are examples of the types of IP-relevant dark data that companies may well already have but cannot currently use, because the data is unstructured, non-normalised, or unreviewed.
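To make the conversion rates and ratios mentioned above concrete, here is a minimal Python sketch. The `IpFunnel` class, the metric names and all of the counts are hypothetical, chosen purely for illustration; a real IP department would pull these figures from its own systems.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IpFunnel:
    """Yearly counts for one organisation (all figures here are hypothetical)."""
    ideas: int
    invention_disclosures: int
    patent_filings: int
    patent_grants: int


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def funnel_metrics(funnel: IpFunnel) -> Dict[str, Optional[float]]:
    """Compute two simple lifecycle ratios from the raw counts."""
    return {
        # Share of recorded ideas that became invention disclosures.
        "idea_to_disclosure": ratio(funnel.invention_disclosures, funnel.ideas),
        # Grants obtained per patent filing made.
        "grants_per_filing": ratio(funnel.patent_grants, funnel.patent_filings),
    }


# Hypothetical example year.
example = IpFunnel(ideas=200, invention_disclosures=50,
                   patent_filings=40, patent_grants=30)
metrics = funnel_metrics(example)
print(metrics)
```

The arithmetic is trivial once the counts exist in one place; the hard part, as described above, is getting unstructured records into a form where those counts can be trusted.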

Common types of dark data not being used by IP departments to full effect include:

- IP metrics:  What are the operational metrics of the IP lifecycle within your organisation? How many ideas are created every year? How many of these ideas are patented?

- Trade secrets: How many trade secrets does your organisation hold? What products do they relate to?

- Competitor activity: What IP applications are competitors filing? How much are they spending? When are they changing their business strategies?

- From product to IP: Which intellectual assets are not currently mapped to existing or future product lines?

- Public and private data: Are these datasets linked? Does your private data match patent office records for the same case?
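The last question in the list, whether private data matches patent office records for the same case, can be sketched as a simple reconciliation step. The Python below is an illustration only: the field names (`application_number`, `owner`, `status`), the number formats and the sample records are all assumptions, not a description of any real system or patent office feed.

```python
import re


def normalise_number(raw):
    """Drop punctuation and spaces and upper-case, so differently
    formatted versions of the same application number compare equal."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()


def find_mismatches(private_records, office_records, fields=("owner", "status")):
    """Compare private records with office records for the same case.

    Returns a list of (number, field, private_value, office_value) tuples.
    A case absent from the office data is reported with the pseudo-field
    'missing_in_office_data'.
    """
    office_by_number = {
        normalise_number(rec["application_number"]): rec for rec in office_records
    }
    issues = []
    for rec in private_records:
        key = normalise_number(rec["application_number"])
        office = office_by_number.get(key)
        if office is None:
            issues.append((key, "missing_in_office_data", None, None))
            continue
        for field in fields:
            # Case- and whitespace-insensitive comparison of each field.
            mine = rec.get(field, "").strip().lower()
            theirs = office.get(field, "").strip().lower()
            if mine != theirs:
                issues.append((key, field, rec.get(field), office.get(field)))
    return issues


# Hypothetical sample data.
private = [
    {"application_number": "EP 1234567.8", "owner": "Acme Ltd", "status": "granted"},
    {"application_number": "US 15/000,001", "owner": "Acme Ltd", "status": "pending"},
    {"application_number": "GB 0000001", "owner": "Acme Ltd", "status": "pending"},
]
office = [
    {"application_number": "EP12345678", "owner": "ACME LTD", "status": "Granted"},
    {"application_number": "US15000001", "owner": "Acme Holdings Inc", "status": "pending"},
]
issues = find_mismatches(private, office)
for issue in issues:
    print(issue)
```

In this made-up example the first case matches once formatting differences are ignored, the second shows an ownership discrepancy, and the third cannot be found in the office data at all. Real reconciliation would need far more careful normalisation of names and numbers than this sketch attempts.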

Companies are increasingly looking to develop and improve their own internal intellectual property management systems to help keep more accurate, normalised, easily referenced records of their own IP assets, taxonomies, and third party IP, and are looking for ways to ‘mine’ their dark data – both internal data and external information such as patent offices’ records – in a more systematic way.

Commercial IP management firms such as our own are likewise trying to improve their analyses of external patent data, where even the publicly reported patent ownership information can be inaccurate up to 20% of the time. Our Innography software, for example, uses 200 different data sources to piece together patent, company, location, business, litigation, financial and a whole range of other public information to try to understand patent-relevant data in more detail. We mine information from patent offices, such as terminal disclaimers, undisclosed citations, and reasons for rejection, as well as examination practices and histories. This includes over 40 different patent data sets, plus multiple data sets for companies, litigation and other information to enrich that data. There might be a few hundred thousand patent records worldwide that get changed or updated in any given week, but using enriched dark data of the sort described here, we update our information on somewhere between 2m and 4m patents every week.

‘Dark data’ that a company already has may be incredibly useful to process and evaluate in managing its IP. We are finding this to be true on a global level within our own company. It certainly seems that shedding light on previously dark data will enable IP departments and law firms to make more informed decisions and run leaner, more data-driven operations in future.


For more information, see Phil’s other recent blog and Q&A on this and related topics. Graphic © Elnur | Dreamstime. Used under licence.