
Posts Tagged ‘Tumblr Archive’

A very powerful feature of Cloud Preservation is its ability to collect external links. External links are links to web pages or documents that are outside of the website or social media feed being collected.

For website feeds, Cloud Preservation determines whether a link is external by comparing the link's address to the addresses defined in your feed. For Cloud Preservation social media feeds (such as Twitter or Facebook), any link found in a post from the feed is treated as an external link.
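To make the website case more concrete, here is a minimal Python sketch of one way such a check could work. It assumes that the addresses defined in a feed act as URL prefixes and that scheme and host comparisons are case-insensitive. The names `is_external` and `_normalize` are hypothetical and do not describe Cloud Preservation's actual implementation.

```python
from urllib.parse import urlsplit


def _normalize(url: str) -> str:
    """Hypothetical normalization: lower-case scheme and host, drop query and fragment."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def is_external(link: str, feed_addresses: list[str], social_feed: bool = False) -> bool:
    """Return True if `link` should be treated as an external link (a sketch, not the real crawler)."""
    if social_feed:
        # Social media feeds: every link found in a post counts as external.
        return True
    target = _normalize(link)
    # Website feeds (assumption): a link is internal if it falls under any feed address prefix.
    return not any(target.startswith(_normalize(addr)) for addr in feed_addresses)


if __name__ == "__main__":
    feed = ["https://example.com"]
    print(is_external("https://example.com/about", feed))   # internal page
    print(is_external("https://news.example.org/a", feed))  # external page
```

Normalizing both sides first means a bare host such as `https://example.com` becomes `https://example.com/`, so a look-alike host like `example.com.evil.net` is not mistaken for an internal address.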

Cloud Preservation provides four configurable options for managing external links. These options let you tailor your feeds to your collection needs and give you a level of control over your feed's storage use.

Option 1: Never collect external links

This option skips offsite pages entirely. When the Cloud Preservation crawler encounters a link that it determines to be external, it records the link but does not collect the web page at that address. Because external pages are left out of your repository completely, external links have no impact on your feed's storage use.

When to use this option: External pages do not need to be collected, or the selected plan does not have enough storage capacity in the Cloud Preservation repository for them.

Option 2: Never collect modified versions of external links

With this option selected, Cloud Preservation checks whether it has ever collected the external link before by comparing its address to the addresses of all pages it has collected in the past. If it finds a page in the repository with the same address, Cloud Preservation simply links the existing page to the currently running crawl; if not, it collects the page. Of the options that collect external links, this one has the lowest impact on repository storage.

When to use this option: External pages must be collected, but their latest version isn't important. For social media feeds like Twitter, changes to external pages are often irrelevant. For example, the external link could point to an article or blog post whose constantly changing advertisements and user comments aren't relevant to your collection.

Option 3: Collect modified versions of external links for new or modified pages

If Cloud Preservation crawls an internal page that has not changed since the last collection, it does not attempt to fetch the latest version of any external links on that page. However, if the page has changed since the last collection, or has never been collected before, Cloud Preservation checks for new versions of all external links on that page. This option uses slightly more repository storage than Option 2, but less than Option 4.

Note: This is the default setting for new Cloud Preservation feeds, as we’ve found it to be the best choice for enhancing your collection with external links while keeping storage use in check.

When to use this option: There is a requirement to collect a “point in time” snapshot of both the internal pages and the external pages.

Option 4: Always collect the latest external link

Finally, this option always attempts to fetch the latest version of an external link. Whether the link is found on a new, modified, or unmodified internal page, Cloud Preservation crawls the external link to see if there is a new version. This option has the largest impact on storage, because external pages change frequently due to rotating advertisements or images and updated content.

When to use this option: The latest version of offsite pages must always be collected, and the selected Cloud Preservation plan has surplus storage capacity. This option is also necessary for some advanced crawling techniques, such as using a single internal web page that serves as an index of several external links. Because such an index page may rarely change, Option 3 would seldom re-check its links.
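The four options boil down to a single decision the crawler makes for each external link it encounters. The Python sketch below models that decision under stated assumptions: the `ExternalLinkPolicy` enum, the `should_fetch_external` function, and its parameters are hypothetical names chosen for illustration, and `collected_addresses` simply stands in for however the repository actually tracks previously collected pages.

```python
from enum import Enum


class ExternalLinkPolicy(Enum):
    """Hypothetical names for the four external link options."""
    NEVER = 1           # Option 1: never collect external links
    COLLECT_ONCE = 2    # Option 2: never collect modified versions
    ON_PAGE_CHANGE = 3  # Option 3: re-check only on new or modified pages (default)
    ALWAYS_LATEST = 4   # Option 4: always collect the latest version


def should_fetch_external(
    policy: ExternalLinkPolicy,
    link: str,
    collected_addresses: set[str],
    parent_changed: bool,
) -> bool:
    """Decide whether to fetch an external page (a sketch, not Cloud Preservation's code).

    `parent_changed` is True when the internal page containing the link is new
    or has changed since the last collection.
    """
    if policy is ExternalLinkPolicy.NEVER:
        # The link itself is still recorded; only the page is skipped.
        return False
    if policy is ExternalLinkPolicy.COLLECT_ONCE:
        # Reuse any page already collected at this address.
        return link not in collected_addresses
    if policy is ExternalLinkPolicy.ON_PAGE_CHANGE:
        # Unchanged internal pages do not trigger a re-check of their links.
        return parent_changed
    # ALWAYS_LATEST: check for a new version every time the link is seen.
    return True
```

For example, under the default option an external link on an unchanged internal page is not fetched, while the same link on a modified page is.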

The crawling process of Cloud Preservation can get complicated, just like the web, and we hope this sheds a bit of light on the subject of external links.


Good news everyone! We’re happy to announce the addition of Tumblr feed archival functionality at Cloudpreservation.com. Cloud Preservation users now have the ability to automatically archive Tumblr blogs.

Tumblr lets you effortlessly share anything. Post text, photos, quotes, links, music, and videos from your browser, phone, desktop, email, or wherever you happen to be. You can customize everything, from colors to your theme's HTML.

Cloud Preservation archives all of Tumblr’s different post types while maintaining each blog’s customization.

Sample Tumblr Post from life.tumblr.com


Not only are Tumblr posts stored as they appear to website viewers, but Cloud Preservation also stores the multimedia files used within posts. Photos from photo posts, videos from video posts, and audio from audio posts are all archived automatically. Just as video files from sources like YouTube and Vimeo are viewable within the Cloud Preservation viewer, audio files shared on Tumblr can be played without leaving Cloudpreservation.com.


Cloud Preservation offers two Tumblr feed archival options: Public and Authenticated. With the Authenticated option, users archive all posts from every blog they have access to, as well as a list of followers for each blog. Authenticated feeds also archive basic user profile information. With the Public option, users can archive all posts from any public Tumblr blog.

As of November 14th, 2011, Tumblr hosted 33,318,876 blogs, and Tumblr users were posting at a rate of 38,000 posts per minute. With so much data being shared on Tumblr, we're glad to offer Cloud Preservation users a way to fulfill their legal and compliance obligations.
