Archive for February, 2011

We’re really excited about the improvements we just released to CloudPreservation.com that will help you better archive and preserve your Facebook data. With these changes, you’ll not only be able to start crawling Facebook profiles — one of our most requested features — you’ll be better able to search and manage your Facebook Fan Page data as well.

The fundamental change in this release is that we’ve shifted our data collection strategy. We’re now pulling data straight from the Facebook Graph API. This means we’re fetching and storing the raw data that Facebook itself uses to build its web pages, instead of crawling the Facebook web application. The end result is more thorough, cleaner Facebook data that is easier to search and less prone to accidental duplication, which saves on storage.
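
To illustrate why structured data is less prone to duplication than crawled web pages, here is a minimal sketch in Python. It assumes each object returned by the API carries a stable `id` field; the `store_objects` helper, the dictionary used as an archive, and the sample records are all hypothetical and are not CloudPreservation’s actual implementation.

```python
import json

def store_objects(archive, objects):
    """Add objects to the archive, keyed by their "id" field.

    Returns the number of objects that were new or changed.
    (Hypothetical helper for illustration only.)
    """
    written = 0
    for obj in objects:
        # Serialize with sorted keys so equal objects always produce equal text.
        raw = json.dumps(obj, sort_keys=True)
        if archive.get(obj["id"]) != raw:
            archive[obj["id"]] = raw
            written += 1
    return written

# Two crawls that overlap on one wall post (made-up sample data).
first_crawl = [
    {"id": "100_1", "message": "Hello"},
    {"id": "100_2", "message": "Lunch?"},
]
second_crawl = [
    {"id": "100_2", "message": "Lunch?"},
    {"id": "100_3", "message": "Photos are up"},
]

archive = {}
first_written = store_objects(archive, first_crawl)
second_written = store_objects(archive, second_crawl)
```

In this sketch the post that appears in both crawls is stored only once, because it is recognized by its `id` rather than by the markup of the page it appeared on.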

Cloud Preservation | Facebook Permission to Crawl Screen

Using the Facebook Graph API also means we now have the ability to crawl private Facebook profiles in addition to public fan pages. As of today’s release we’re able to archive and preserve all of a user’s Facebook interactions, including all of the people on their friend lists, messages in their inbox, wall posts they’ve made, places they’ve checked in, and just about everything else a user can do on the site. It’s easy to add a user’s profile to crawl: all you need is their email address. Enter it on CloudPreservation, and we’ll send them a link to a Facebook page on which they can activate the crawl. The authorization system is so straightforward that if your users can add Farmville to their Facebook account, they’ll have no trouble with this step. What’s great about the type of authentication the Facebook API provides is that CloudPreservation never needs to see your users’ Facebook login information (username or password).

One thing to note about using the Facebook Graph API is that the data we initially receive from Facebook is just that: unstyled, plain-text data (here’s an example). For completeness, we store these original data files in the archive, but we wanted to go one step further to make that data easy to read and search. So we used our own SmartCrawl® technology to apply a simple, well-organized interface, including loading and storing any photos and movies your users may have posted. This gives you the best of both worlds: you have access to the raw data originally provided by Facebook, and a terrific interface for viewing and searching.

We’re really excited about this release. These improvements to our Facebook crawls not only give you access to important profile data that was previously unreachable, but also give you vastly higher-quality data for the pages you were already archiving. And as always, we welcome any feedback.

Read Full Post »

Advanced Search has come to CloudPreservation.com.  The tool will guide you through the process of building more powerful queries while simultaneously providing you with the know-how to build the queries yourself in the “standard” search box.

Query strings are built for you as you make selections and enter text, helping you take advantage of advanced features you may not have been aware of.
These tools are available for you immediately by selecting “Advanced Search” near your normal search box.
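
As a rough picture of how form selections can turn into a query string, here is a small Python sketch. The actual CloudPreservation query syntax is not shown in this post, so the `field:"value"` form, the field names, and the `build_query` function are assumptions made purely for illustration.

```python
def build_query(keywords="", **fields):
    """Combine free-text keywords and field filters into one query string.

    The field:"value" syntax is hypothetical, not CloudPreservation's own.
    """
    parts = [keywords.strip()] if keywords.strip() else []
    # Sort the fields so the same selections always give the same string.
    for name, value in sorted(fields.items()):
        if value:  # skip filters the user left empty
            parts.append(f'{name}:"{value}"')
    return " ".join(parts)

query = build_query("quarterly report", filetype="pdf", title="earnings")
print(query)
```

Seeing the generated string next to the form is what lets you learn the syntax and later type the same query directly into the standard search box.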

Read Full Post »

Today we launched a feature of CloudPreservation.com that will allow you to find pages or documents that didn’t exist in the previous crawl. This will allow you to do things like:

  • Find all pages that were added to the archive during a specific crawl.
  • Use keywords to search the text or meta-data of pages that were added to the archive during a specific crawl.

To use this handy new feature, select a crawl from the “Feeds and Crawls” drop-down under the search text field. A checkbox will appear below it; check it, and your search will be limited to pages and documents that were found for the first time in that crawl.

Limit Search To New Pages and Documents
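
Conceptually, the filter above behaves like a set difference between one crawl and everything archived before it. The Python sketch below assumes crawls are kept oldest first and that “new” means “not seen in any earlier crawl”; the `new_pages` function and the sample data are hypothetical and only illustrate the idea.

```python
def new_pages(crawls, crawl_id, keyword=None):
    """Return URLs first seen in `crawl_id`, optionally filtered by keyword.

    `crawls` is a list of (crawl_id, {url: page_text}) pairs, oldest first.
    (Hypothetical data layout for illustration only.)
    """
    seen_before = set()
    for cid, pages in crawls:
        if cid == crawl_id:
            # Keep only pages that no earlier crawl contained.
            fresh = {u: t for u, t in pages.items() if u not in seen_before}
            if keyword is not None:
                fresh = {u: t for u, t in fresh.items()
                         if keyword.lower() in t.lower()}
            return sorted(fresh)
        seen_before.update(pages)  # adds this crawl's URLs
    raise KeyError(crawl_id)

crawls = [
    ("crawl-1", {"/home": "Welcome", "/about": "About us"}),
    ("crawl-2", {"/home": "Welcome", "/pricing": "Pricing plans",
                 "/blog": "Launch news"}),
]

all_new = new_pages(crawls, "crawl-2")
news_only = new_pages(crawls, "crawl-2", keyword="news")
```

Here `all_new` covers the first use case in the list above (everything added during a crawl) and `news_only` covers the second (a keyword search restricted to those additions).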

Read Full Post »