
Knowledge graphs are starting to gain a bit of buzz. In 2020, Gartner placed knowledge graphs at the peak of their hype cycle. And while jumping on every hyped-up tech should be avoided, knowledge-as-a-service industries have learnt a thing or two from DaaS and IaaS. Namely, rather than being built as a monolithic data structure, knowledge graphs should be built into knowledge workflows, helping to provide data and context for particular roles in particular industries. The automation of tedium and knowledge gathering for specific individuals is a good enough goal, and could collectively change our lives at work.

Full…


Why checking your content’s “machine parseability” matters

“Art” written on a small creation made to look like a machine next to a small painting.
Source

“Featured snippets” are coveted by SEO specialists and marketers due to their high visibility and click-through rate. An estimated 8% of searchers click on featured snippets. And they’re typically easier to grab than the “natural” #1 result spot for competitive keywords.

Even if someone doesn’t click on your featured snippet result, they’ll need to scroll past your site and branding to get to the rest of the results.

One issue many marketers have found, however, is that Google hasn’t provided much guidance on how you can get into the featured snippet. Some informal research shows that featured snippets tend to…


What Makes A Human Important…To A Web Crawler?


What makes a human important? Their humanity, sure. But what makes you REALLY important? What would a balanced jury of your peers pull out about your life?

Maybe you try really hard as a parent. Or were part of the making of a product. Maybe you shook up the world in public or maybe just had some particularly happy moments with a few.

Emily Dickinson hardly left her house. And spent the last two decades of her life refusing visitors. But in the end left an indelible mark on literary history.

Inherent importance of life aside, there’s a potential underlying…


Speaking of hard to scrape… Photo by James Lewis on Unsplash

I get to work with a variety of web scraping products and techniques at my job at Diffbot. Diffbot’s mission to “structure the world’s knowledge” starts with gathering the underlying data to be structured. Diffbot is one of three Western entities that truly crawl the whole public web. This involves a pretty stellar stack of web crawling, extraction, and parsing tools.

Even with great tools, one of the challenges with crawling and extracting data from pages at a large scale is you don’t really know what structure a page is going to have before…


And why it matters for scaling your public web data sources


As with most forms of tech these days, web scrapers have recently seen a surge of claims that they’re somehow based on AI or machine learning tech. While this suggests that an AI will detect exactly what you want extracted from a page, most scrapers are still rule-based (there are some exceptions, such as Diffbot’s Automatic Extraction APIs). Why does this matter? Historically, rule-based extraction has been the norm. In rule-based extraction, you specify a set of rules for what you want pulled from a page. This is often…
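
To make “a set of rules” concrete, here is a minimal sketch of rule-based extraction using only Python’s standard library. Everything in it is an assumption for illustration: the rule format, the field names, the class names, and the sample HTML are all hypothetical and are not taken from any particular scraping product.

```python
from html.parser import HTMLParser

# Hypothetical rules: field name -> (tag, CSS class) whose text we want.
RULES = {
    "title": ("h1", "post-title"),
    "price": ("span", "price"),
}


class RuleBasedExtractor(HTMLParser):
    """Collects text from elements that match a hand-written rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.results = {}
        self._active = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field, (rule_tag, rule_class) in self.rules.items():
            if tag == rule_tag and rule_class in classes:
                self._active = field
                self.results.setdefault(field, "")

    def handle_endtag(self, tag):
        # Simplification: stop at the first closing tag of the same name,
        # so nested elements with the same tag are not handled.
        if self._active and tag == self.rules[self._active][0]:
            self._active = None

    def handle_data(self, data):
        if self._active:
            self.results[self._active] += data.strip()


def extract(html, rules=RULES):
    parser = RuleBasedExtractor(rules)
    parser.feed(html)
    return parser.results


# Hypothetical page, and the same page after a redesign.
SAMPLE = '<h1 class="post-title">Widget</h1><span class="price">$9.99</span>'
CHANGED = '<h2 class="headline">Widget</h2><span class="price">$9.99</span>'

print(extract(SAMPLE))
print(extract(CHANGED))
```

In this sketch the second call returns no title at all, because the redesigned page no longer matches the title rule. That is the general trade-off with hand-written rules: they work only for the page structures they were written against.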


And a few ways you can get started with no programming knowledge

A picture of stock market numbers.
Photo by Markus Spiske on Unsplash

Unless you truly have a unique take or years of experience with what you’re presenting, the quickest and one of the best ways to provide value in content is to provide a unique take on data. And luckily that’s something the web is chock full of.

But have you ever been gathering facts on a site chock full of data and had to look elsewhere because there was no way to copy it all into a spreadsheet? Don’t try to tell me this hasn’t happened to all you content creators out there.

Or you could, but the formatting was all…


Rules of thumb and my three favorite low-code web scrapers for content

Person doing data science on laptop
Photo by Campaign Creators on Unsplash

I started my career right out of college as a data journalist. This meant that I was primarily researching and making infographics. For me, it laid a new foundation for what content was. At least for many types of content, you can let the data drive the idea. And then flesh out observations of data with interesting facts, thought leadership, and concepts. Remember, good artists borrow, and great artists steal. And there’s never been more public web data to wrangle into your content than now.

Table of contents:


Where does the internet fit into the history of knowledge? And what does this mean for AI that learns from the web?

“Gnothi Seauton” (Know Thyself), an allusion to the famous quote above the entrance to the ancient Oracle of Delphi, on the Via Appia in Rome

Gnosis (γνῶσις) is the ancient Greek word for knowledge and the root of the name for the learned members of Gnosticism.

While nearly every culture and time period has had some conception of what knowledge is, the implications of the term have changed drastically.

In ancient Greece, gnosis held a spiritual component. Within Gnostic sects the word meant something like insight into the divine nature of humans. …

Merrill Cook

Content Marketing Manager @ Diffbot
