February 20th, 2013
Since the launch of Open Images in 2009, the reuse and reach of Open Images have increased every year. To demonstrate this, in this blog post we compare the quantitative results of Open Images for 2011 and 2012. To measure is to know!
Visitors of Open Images
In 2011 there were almost 1,600 media files available on Open Images; this has now increased to more than 1,800. The number of visitors also grew, from 66,000 in 2011 to more than 105,000 in 2012. Of these visitors, more than 53,000 were unique visitors in 2011, increasing to 89,000 in 2012. The number of visited pages rose as well: almost 207,000 pages were visited in 2011 and nearly 280,000 in 2012. In 2011 nearly 11,000 videos were played; in 2012 this was close to 16,000. We also know that almost 2,400 media files have been downloaded since July 2012.
Reuse of the Sound and Vision Open Images dataset
Not only is the impact on the Open Images platform itself increasing; so is the external reuse of material available through Open Images. The Sound and Vision videos from Open Images are, for instance, also available on Wikimedia Commons and in Europeana. Since these videos became available in Europeana in May 2012, they were visited 3,900 times by 3,200 unique visitors by the end of 2012. Besides these numbers, we have particularly good insight into the external reuse in Wikimedia projects, such as Wikipedia. In both 2011 and 2012, nearly 1,600 media files from the Sound and Vision collection were made available for reuse on Wikimedia Commons through Open Images. In December 2011 these files were reused in almost 1,000 articles on Wikipedia; by December 2012 this number had increased to nearly 1,600. In the whole of 2011 these articles generated almost 19,000,000 page views. In 2012 this more than doubled to nearly 40,000,000 (!). In other words, in 2012 the Wikipedia articles containing reused media from Sound and Vision were viewed nearly 40,000,000 times in total.
Besides Wikimedia projects, the data and videos from Open Images are also increasingly used for innovative applications. The Open Images API makes it possible for computers to process the data from the openly available collections. In 2012, the API received 169,000 requests. Since the Open Culture Data initiative started in 2011, creative developers have become even more aware of Open Images as a great basis for new apps. For the open data competitions Apps voor Nederland (Apps for the Netherlands) and the Open Culture Data competition 2012, seven apps were submitted that used the Sound and Vision dataset on Open Images. Two of these apps won an award: Vistory (winner of Apps voor Nederland 2011) and Tijdbalk.nl (winner of the Dutch National Archives award during the Open Culture Data competition 2012). In recent years, a number of other applications have also been developed using the Sound and Vision subset of Open Images, such as Erfgoed in Beeld, Led it Up and Docs on the spot.
Putting the figures in perspective
The current size of the entire audiovisual collection of Sound and Vision is estimated at 750,000 hours. The Polygoon newsreel collection is one of the few subcollections for which Sound and Vision owns the intellectual property rights required to make the material available under an open content license. This subcollection, estimated at 500 hours, forms the basis of the content that Sound and Vision selects for inclusion on Open Images. Currently 110 hours of this collection are available via Open Images. Based on these estimates, this means that 22% of the newsreel collection is currently available as open content via Open Images, which translates to only 0.015% of the institute's entire audiovisual collection. The impact of Open Images summarized in this blog post shows that even a relatively modest open content set can have a substantial impact. In the case of Open Images, starting small has already led to great results. Imagine what would happen if we were able to release even just one percent of the entire audiovisual collection as open content. Based on our experience, we suggest that institutions that haven't yet opened (parts of) their collection at least experiment with a small content set that can easily be made available without restrictions. By measuring the impact and actively promoting reuse, GLAMs can learn a lot about the potential of opening the digital doors of our institutions.
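The two percentages follow directly from the estimated hours. As a quick check, here is a minimal Python sketch that uses only the figures quoted above:

```python
# Estimated figures quoted in the post, in hours of footage.
TOTAL_COLLECTION_HOURS = 750_000  # entire Sound and Vision audiovisual collection
POLYGOON_HOURS = 500              # Polygoon newsreel subcollection
OPEN_IMAGES_HOURS = 110           # newsreel footage currently on Open Images


def share_percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole (multiply first to keep the result exact)."""
    return part * 100 / whole


newsreel_share = share_percent(OPEN_IMAGES_HOURS, POLYGOON_HOURS)
collection_share = share_percent(OPEN_IMAGES_HOURS, TOTAL_COLLECTION_HOURS)

print(f"{newsreel_share:.0f}% of the newsreel collection")   # 22% of the newsreel collection
print(f"{collection_share:.3f}% of the entire collection")   # 0.015% of the entire collection
```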
Metrics for measuring the impact of cultural datasets
The numbers show that the (re)use of the material on Open Images has increased substantially. The impact of Open Images has proved to be considerable, and the external reuse of the open content is also increasing. In response to the growing need within the cultural heritage field for statistics on the impact of opening up cultural data sets, Sound and Vision will carry out impact analysis research for Open Culture Data together with Kennisland. To this end, the data providers from the Open Culture Network, as well as international initiatives, are asked to provide data on the impact and reuse of their data sets by filling out a survey. The results of this impact analysis will be made public in the course of 2013.
November 9th, 2012
EUscreen, the platform for European television history, now has its own portal on Open Images. The portal contains a selection of almost 60 videos from European television heritage. They are all available for reuse under a Creative Commons license. The videos are from the archives of Sound and Vision (Netherlands), VRT (Belgium), NAVA (Hungary), Cinecittà Luce (Italy) and Televisió de Catalunya (Spain). By creating this portal on Open Images, EUscreen wants to give people the chance to reuse footage from European television history in creative ways.
The main goal of EUscreen is to make European television heritage available online to the public. Since its start in October 2009, more than 40,000 videos, photos and articles on European television history have been published on the freely accessible website. The materials come from 28 European partners in 19 countries. One of the four focal points of EUscreen is ‘reuse and creativity’. The launch of the EUscreen portal on Open Images makes it possible for the public to engage creatively with a small selection of the material.
The collection of videos in the portal contains a varied selection. For instance, a programme on women in the army from 1977 from the archives of the VRT:
From the archives of Sound and Vision there is a report on the awarding of the TV Prize to Carel Enkelaar in 1961. In the audience is also Dutch television pioneer Erik de Vries:
Besides historical footage, the portal also contains more recent videos, like the beautiful images of different places in Catalunya from the archive of Televisió de Catalunya. One example is a video on the marine life of the Medes Islands marine reserve:
For examples of reuse also check the EUscreen blog. There you can find reports on the different remix workshops and the demo page containing data visualizations and ideas to get started yourself. The complete collection, including virtual exhibitions, can be watched online (for free) on www.euscreen.eu.
July 25th, 2012
In view of the upcoming Olympic Games, the digital heritage portal Europeana is also focusing on sports. A virtual exhibition uses photos, videos and objects to show different aspects of (the history of) sports in Europe. The exhibition is divided into four themes: The Olympic and Paralympic Games, Famous European Sports, Football and Ancient Games.
The Sound and Vision set on Open Images has recently also become part of Europeana. Two videos from this Sound and Vision set were used in the virtual exhibition, in the sections on tennis and cricket:
These are only two of the many sports-related videos that can be found on Open Images (and thus Europeana). You can find the virtual exhibition here: European Sport Heritage
July 10th, 2012
Fifty videos about the Dutch railway network have now been added to Open Images. During the 65 years of Polygoon newsreels, the railways were a recurring news topic. The coverage is not only about problems such as delays and breakdowns, but especially about the development and growth of the railways in the Netherlands. Gradually, an increasing portion of the rail network is electrified and steam trains are replaced by electric trains, until these steam-breathing monsters are nowhere to be found in the Dutch landscape. In addition, cities and connections are added to the rail network, so that rail passengers can reach an increasing number of places in the Netherlands.
In 1938 the electrification of the Dutch railway network is progressing nicely. Electric trains run on various lines, and diesel trains are used on certain routes. Over the years the steam train slowly disappears from the Dutch tracks. This Polygoon newsreel shows how the electrification of the railways is progressing in 1938:
Of course, even in earlier times things sometimes went wrong on the railways. In 1926 and 1929, for example, passenger trains derailed. Accidents regularly occur at railway crossings, mostly unguarded ones but guarded ones too. The following item shows how road users are made aware of the dangers of an unguarded crossing.
Trains and railways are not the only topics in Polygoon newsreels. There is also attention for passengers and the objects they leave behind on trains. From guitars to shoes, the strangest objects are found on trains. Various people rummage through the lost property looking for something to their liking:
In the railway museum in Utrecht, people can take a look at the history of the railways. Among other things, you can see the oldest Dutch steam locomotive, called the Eagle. You can admire an old timetable and watch a model electric train running back and forth on a scale layout. Visitors also get a look behind the scenes of the Dutch Railways, and the automation of the railways is explained.
May 31st, 2012
With a 1927 video on the opening of the Zuid-Beveland railway, there are now 2000 videos on Open Images. Since the last milestone of 1500 videos, some beautiful videos have been added. From the collection of Sound and Vision, several videos from the Polygoon archives were added, with themes like women, shipping, typical Dutch, pets, health and care, Hilversum and railways. A good example is a video on Ms. Versluys, who in 1930 became the first woman to receive a Dutch pilot license:
Besides videos from the Polygoon archives, videos from Sound and Vision's collection of educational films were also added. These films were produced in the mid-twentieth century by the Stichting Nederlandse Onderwijsfilms (Foundation of Dutch Educational Films). One of the films in the collection is Giethoorn, suitable for geography lessons:
Among the new videos there were also some from outside the Sound and Vision archives. For example, Eye Film Institute Netherlands added a number of films from their Bits and Pieces collection. These videos were also the basis for the remix contest Celluloid Remix 2: Found Footage. The contest challenged creative people to make new videos using, among other things, the material on Open Images. In the winning video, Untitled by Dániel Szöllösi, a number of smartphones are used:
Besides Eye Film Institute Netherlands, the Netherlands Media Art Institute also contributed to Open Images. Various videos on media art can be found in its own portal, for example The Unified Field by Peter Bogers:
Because of the open licenses of the videos and data on Open Images, they could be used for several projects. During the International Documentary Festival Amsterdam the videos were used for Docs on the Spot. With the Docs on the Spot app visitors could experience the documentary Omzwervingen in de nacht (Marjoleine Boonstra, 2004) on location in a new way. Via Open Images this experience was enriched with images from the past. In January Glimworm won the Apps for the Netherlands competition using the data and videos from Open Images in their Vistory app.
The visibility of the videos from Open Images on the internet has also increased. For example, the Sound and Vision set from Open Images was added to the digital library Europeana. On Wikipedia the videos are used to illustrate more than 1,100 entries, not only on the Dutch Wikipedia but also on more than 60 other language versions of Wikipedia. These entries are viewed more than 2.5 million times a month.
May 16th, 2012
On 1 May, a set of more than 1500 videos from the Netherlands Institute for Sound and Vision was made available in Europeana via Open Images. Europeana brings together online the digitized collections of European libraries, galleries, museums, archives and audiovisual collections. The digital library gives access to 20 million books, films, paintings, museum objects and archival documents from around 2200 different providers.
The Sound and Vision set that has now been added contains a collection of Polygoon newsreels and several other films on the Netherlands in the twentieth century. Because the Sound and Vision set is now available in Europeana, it is part of a collection of millions of cultural objects, so interesting connections with other objects can be made. Through the Europeana API the collections are also made available for reuse. At several hackathons the API will be used to develop interesting applications with the collections in Europeana.
February 22nd, 2012
In September 2011 the Wiki Loves Monuments contest took place. Thousands of pictures of European monuments were uploaded to Wikimedia Commons. To stimulate the use of videos on Wikimedia Commons, Open Images also made a prize available for the best video of a monument.
We are proud to announce that the winner of this prize is this French video about an old wallpaper printing machine (built in 1877). The video shows “the 26 colors machine”, famous for being the first one to use 26 colors for printing wallpaper. Such a machine was a moving piece, with gears, paint, paper and men around it: a video is the only way to bring it to life again. The video, one of a whole set of videos, also unfolds various viewpoints, from the tiny details of a golden cylinder to a view of the surrounding building.
It provides something that cannot be shown in a static picture: a sense of both the beautiful details and the surrounding building, which is typical of the industrial age. Video is incredibly useful for the encyclopaedic purposes of Wikipedia, whether it is to visualise a chemical reaction, to watch lifeforms evolve in their environment, or to see how a machine actually works. If a picture tells a thousand words, imagine what you can tell with 24 images a second.
January 13th, 2012
For Sound and Vision, in the context of the Dutch Open Data initiative “Nederland opent Data” (The Netherlands Opens Data), I created the basis for the demo described in this post. The demo shows how you can play a video in an enriched context by linking open data sources to terms found in speech transcripts generated from the videos. For the Code Camping event, organized by Open Cultuur Data (Open Cultural Data), I extended the demo with newly linked data sets.
The starting point for this demo application was the reuse and linking of data sets to the Open Images collection, which contains more than 1,500 freely (re)usable videos, mostly old news items from the 1920s through the 1980s. All of these videos are published under Creative Commons licences.
The basis for the application lies in the use of the speech transcripts, which were generated by using automatic speech recognition (ASR) software (from X-MI) on these videos.
The main idea of the demo is to contextualise videos while they’re being watched, in order to provide the user with fun, interesting and unexpected background information about the things that are mentioned in the video.
For example: when Philip Bloemendal (the presenter of the news items) – in a video titled: ‘Large parts of Holland completely snowed in’ – talks about: ‘(…) but on several places in Drenthe there (…)’, next to the video, several blocks of information about Drenthe (a province in The Netherlands) are shown. Each of these information blocks gets its data from a specific open data source. For the first prototype the data sources used were (amongst others): Google Maps and Wikipedia. To illustrate this some more: in the example where ‘Drenthe’ was recognized as a concept, the Wikipedia block shows an article about Drenthe; in the Google Maps block the map is zoomed in on the province of Drenthe in The Netherlands.
For the Code Camping event, organized by ‘Hack de Overheid’ (Hack the government), I added two new data sets to the demo: the collections from the Rijksmuseum and the Amsterdam Museum.
How it all works
As mentioned, the main building blocks for this demo are the Open Images videos and the corresponding speech transcripts that are used to link the words that are spoken (in the video) to an exact time code. (Note: Automatic speech recognition software is not perfect, which means that not every word in a speech transcript will exactly match the actual words that were spoken).
Because not every word in a sentence is particularly interesting, the first step is to filter out stop words from the speech transcript, such as: articles, prepositions and verb modifiers.
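The stop-word step can be sketched in a few lines of Python. This is a minimal illustration rather than the demo's actual script: the `TranscriptWord` structure, the tiny Dutch stop-word list and the time-coded Dutch rendering of the spoken fragment are all assumptions made for the example.

```python
from typing import NamedTuple


class TranscriptWord(NamedTuple):
    """One recognised word and the time code (in seconds) at which it is spoken."""
    text: str
    time: float


# A tiny illustrative Dutch stop-word list (common function words); a real one is much larger.
DUTCH_STOP_WORDS = {"de", "het", "een", "en", "in", "op", "van", "maar", "er", "is"}


def remove_stop_words(words: list[TranscriptWord], stop_words: set[str]) -> list[TranscriptWord]:
    """Keep only words that are not stop words, preserving their order and time codes."""
    return [w for w in words if w.text.lower() not in stop_words]


# Hypothetical time-coded Dutch fragment, roughly "but on several places in Drenthe there".
transcript = [
    TranscriptWord("maar", 12.0),
    TranscriptWord("op", 12.3),
    TranscriptWord("verschillende", 12.5),
    TranscriptWord("plaatsen", 13.1),
    TranscriptWord("in", 13.6),
    TranscriptWord("Drenthe", 13.8),
    TranscriptWord("er", 14.4),
]
print([w.text for w in remove_stop_words(transcript, DUTCH_STOP_WORDS)])
# ['verschillende', 'plaatsen', 'Drenthe']
```

Keeping the time code on every word means the later steps can still point back to the exact moment a term was spoken.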
In the second step, a script sorts the remaining words by ‘importance’. Importance here is calculated by combining a preset word score (from a special lexicon) with the frequency with which the word is spoken. In this way, words with a high score and a high frequency end up high in the list.
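A minimal sketch of the importance ranking. The post does not say how the two values are combined, so this assumes the lexicon score is simply multiplied by the word's frequency, with a hypothetical default score for words missing from the lexicon. The lexicon values are made up for illustration.

```python
from collections import Counter


def rank_by_importance(words: list[str], lexicon: dict[str, float],
                       default_score: float = 1.0) -> list[tuple[str, float]]:
    """Rank distinct words by (lexicon score x spoken frequency), highest first.

    Assumption: multiplication is used to combine score and frequency, and words are
    compared case-insensitively.
    """
    counts = Counter(w.lower() for w in words)
    scored = [(word, lexicon.get(word, default_score) * freq) for word, freq in counts.items()]
    # Descending importance; ties are broken alphabetically so the output is deterministic.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


lexicon = {"drenthe": 5.0, "sneeuw": 3.0, "plaatsen": 0.5}  # hypothetical preset scores
spoken = ["sneeuw", "Drenthe", "plaatsen", "sneeuw", "sneeuw", "plaatsen"]
print(rank_by_importance(spoken, lexicon))
# [('sneeuw', 9.0), ('drenthe', 5.0), ('plaatsen', 1.0)]
```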
After sorting, the words are used, in order of importance, as query input for the GTAA thesaurus (used by Sound and Vision) and for Freebase. The latter is a Google service that offers a large collection of interrelated concepts, with descriptions from a wide variety of domains. Freebase can be seen as an extensive thesaurus containing information from a large number of areas of expertise.
When the GTAA or Freebase web service yields a concept, it is put in a list of candidates. After all the words have been processed, this list is filtered using a very simple disambiguation algorithm: whenever a yielded concept consists of more than one word, it is taken out of the list.
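The candidate collection and the disambiguation filter could look roughly like this. The GTAA and Freebase web services are replaced by hypothetical dictionary lookups, because their real request formats are not described here; only the filtering rule (drop any concept of more than one word) is taken from the post.

```python
from typing import Callable, Optional

# A lookup takes a word and returns a concept label, or None when nothing is found.
Lookup = Callable[[str], Optional[str]]


def collect_candidates(ranked_words: list[str], lookups: list[Lookup]) -> list[tuple[str, str]]:
    """Query every lookup for every word, in order of importance, and keep each concept found."""
    candidates = []
    for word in ranked_words:
        for lookup in lookups:
            concept = lookup(word)
            if concept is not None:
                candidates.append((word, concept))
    return candidates


def drop_multiword_concepts(candidates: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """The simple disambiguation rule from the post: discard concepts of more than one word."""
    return [(word, concept) for word, concept in candidates if len(concept.split()) == 1]


# Hypothetical stand-ins for the GTAA and Freebase web services.
fake_gtaa: Lookup = {"drenthe": "Drenthe", "sneeuw": "Sneeuw"}.get
fake_freebase: Lookup = {"amsterdam": "Amsterdam", "museum": "Amsterdam Museum"}.get

ranked = ["sneeuw", "drenthe", "amsterdam", "museum"]
candidates = collect_candidates(ranked, [fake_gtaa, fake_freebase])
print(drop_multiword_concepts(candidates))
# [('sneeuw', 'Sneeuw'), ('drenthe', 'Drenthe'), ('amsterdam', 'Amsterdam')]
```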
In step 4, each of the GTAA and Freebase concepts from the list of candidates is used for querying the open data webservices, which are:
- Google Maps (only queried for location type concepts)
- Amsterdam Museum
Each result returned is linked to the time code of the (spoken) word from the speech transcript that was used to find it.
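Step 4 and the linking to time codes can be sketched as follows. The `Concept` type, its `kind` labels and both query functions are hypothetical stand-ins for the real web services; only the rule that Google Maps is queried for location concepts alone comes from the list above.

```python
from typing import Callable, NamedTuple


class Concept(NamedTuple):
    label: str
    kind: str    # hypothetical type label, e.g. "location" or "subject"
    time: float  # time code (seconds) of the spoken word that led to this concept


Source = Callable[[Concept], list[str]]


def fake_google_maps(concept: Concept) -> list[str]:
    # Mirrors the rule in the post: Google Maps is only queried for location concepts.
    return [f"map centred on {concept.label}"] if concept.kind == "location" else []


def fake_amsterdam_museum(concept: Concept) -> list[str]:
    # Hypothetical stand-in for a collection search.
    return [f"collection objects about {concept.label}"]


def link_results(concepts: list[Concept], sources: dict[str, Source]) -> list[dict]:
    """Query every source for every concept and tag each result with the word's time code."""
    linked = []
    for concept in concepts:
        for name, query in sources.items():
            for result in query(concept):
                linked.append({"time": concept.time, "source": name,
                               "concept": concept.label, "result": result})
    # Sort by time code (stable) so the results can be replayed alongside the video.
    return sorted(linked, key=lambda record: record["time"])


concepts = [Concept("Drenthe", "location", 13.8), Concept("Sneeuw", "subject", 4.2)]
sources = {"google_maps": fake_google_maps, "amsterdam_museum": fake_amsterdam_museum}
for record in link_results(concepts, sources):
    print(record["time"], record["source"], record["result"])
```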
(For those interested: the collection of the Amsterdam Museum has three different end-points: Adlib, OAI-PMH and SPARQL. For this demo I used the latter because, unlike OAI-PMH, it does not require the collection to be harvested and indexed before it can be queried. In any case, I thought it was a good idea to play around with the Semantic Web again and refresh my SPARQL skills. For the Rijksmuseum, I first harvested the collection via OAI-PMH and then indexed it with Solr, so that the collection can be searched using Lucene queries.)
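For readers who want to try the SPARQL route: under the SPARQL 1.1 Protocol, a query can be sent as an HTTP GET request with the query text in the `query` URL parameter. The sketch below only builds such a request and does not send it. The endpoint URL is a placeholder, and the `rdfs:label` search pattern is an assumption about how the collection is modelled, not the query the demo used.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Placeholder endpoint: the real Amsterdam Museum SPARQL address is not given in the post.
SPARQL_ENDPOINT = "https://example.org/amsterdam-museum/sparql"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes so the term is safe inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_label_search(term: str, limit: int = 10) -> str:
    """Case-insensitive label search; assumes collection objects carry an rdfs:label."""
    literal = escape_literal(term)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?object ?label WHERE {\n"
        "  ?object rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{literal}")))\n'
        "}\n"
        f"LIMIT {limit}"
    )


def sparql_get_request(endpoint: str, query: str) -> Request:
    """SPARQL 1.1 Protocol query via GET: the query text goes in the 'query' URL parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = sparql_get_request(SPARQL_ENDPOINT, build_label_search("Gorinchem"))
# Decoding the URL again shows that the query survives the round trip intact.
print(parse_qs(urlparse(request.full_url).query)["query"][0])
# Sending it would be urllib.request.urlopen(request), followed by json.load() on the response.
```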
The last step is to send the time-coded context data back to the browser. I do this with a JSON object, which I in turn use as input for Popcorn.js to generate events. These events are linked to an HTML5 video player and make sure the right (context) information is shown in the different blocks/panels of the user interface.
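The exact JSON layout that the demo sends to the browser is not described, so the following is an assumed shape. It turns time-coded records into a list of timed events with `start`, `end`, `target` and `text` fields, the kind of options that Popcorn.js plugins such as `footnote` accept. The fixed display duration and the panel ids are hypothetical.

```python
import json

DISPLAY_SECONDS = 8.0  # hypothetical: how long each context item stays visible


def to_popcorn_events(linked: list[dict], display_seconds: float = DISPLAY_SECONDS) -> str:
    """Turn time-coded context records into a JSON array of timed events for the browser."""
    events = [
        {
            "start": record["time"],                  # when the word is spoken
            "end": record["time"] + display_seconds,  # when the item disappears again
            "target": f"panel-{record['source']}",    # hypothetical id of the panel per data source
            "text": record["result"],
        }
        for record in linked
    ]
    return json.dumps(events, ensure_ascii=False, indent=2)


linked = [{"time": 12.5, "source": "wikipedia", "result": "Drenthe is a province of the Netherlands"}]
print(to_popcorn_events(linked))
```

On the browser side, each entry could then be handed to a Popcorn.js plugin call (for instance `popcorn.footnote(event)`), which shows the text in the target element between `start` and `end`.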
Because the processing of these five steps takes around 15-20 seconds per video, I store all of the results in .json files. When opening the demo these files are loaded instead of fetching the data live from the web.
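Storing the results in `.json` files amounts to a simple file-based cache. A minimal sketch, assuming one file per video named after a hypothetical video id; `slow_pipeline` stands in for the five processing steps.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_compute(video_id: str, cache_dir: Path, compute: Callable[[str], list]) -> list:
    """Return cached context data for a video; run the slow pipeline only on a cache miss."""
    cache_file = cache_dir / f"{video_id}.json"  # hypothetical naming: one file per video
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    data = compute(video_id)  # e.g. the 15-20 seconds of processing per video
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    return data


calls: list[str] = []


def slow_pipeline(video_id: str) -> list:
    """Hypothetical stand-in for the five processing steps; records each call."""
    calls.append(video_id)
    return [{"time": 12.5, "text": f"context for {video_id}"}]


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_compute("video-123", Path(tmp), slow_pipeline)
    second = load_or_compute("video-123", Path(tmp), slow_pipeline)  # served from the .json file

print(first == second, len(calls))  # True 1
```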
There is still a lot to do
The demo shows what can be done by using concept detection (a.k.a. Named Entity Recognition) in combination with open data sources. However, (significant) improvements can be made in several areas:
Better concept detection
The concept detection described in this demo leaves a lot of room for improvement. For instance, concepts that consist of more than one word are not recognized: ‘Amsterdam Museum’ now yields two concepts, ‘Amsterdam’ and ‘Museum’, but the actual concept ‘Amsterdam Museum’ is not found.
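One common way to find multi-word concepts, which the demo does not currently do, is greedy longest-match lookup against a list of known concept labels: at each position in the transcript, try the longest word sequence first and fall back to shorter ones. A minimal sketch, with a hypothetical set of known concepts and a hypothetical maximum phrase length of three words:

```python
def match_concepts(tokens: list[str], known_concepts: set[str], max_len: int = 3) -> list[str]:
    """Greedy longest-match: prefer 'Amsterdam Museum' over 'Amsterdam' + 'Museum'."""
    lowered = {c.lower(): c for c in known_concepts}
    found, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in lowered:
                found.append(lowered[candidate])
                i += n
                break
        else:
            i += 1  # no known concept starts at this token
    return found


concepts = {"Amsterdam", "Museum", "Amsterdam Museum", "Rijksmuseum"}
print(match_concepts("het Amsterdam Museum en het Rijksmuseum".split(), concepts))
# ['Amsterdam Museum', 'Rijksmuseum']
```

Using this in the pipeline would also mean relaxing the rule that drops every multi-word concept during disambiguation.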
Moreover, dedicated Named Entity Recognition (NER) services like DBpedia Spotlight, which give good results for English, should be investigated in order to improve the results. For Dutch, however, finding a decent (open source) solution seems to be an ongoing search.
Selection of relevant sources for the user
Concerning the relevance of the ‘context information’ currently shown to the user, there is still a lot to consider about how to select the best data sources. For instance, why would somebody watching a video about ‘Holland’s oldest steam-powered pumping station’ be interested in the ‘Hens chalice from the Company of Nine’ (found on the basis of the word ‘Gorinchem’, a town in The Netherlands)? That is something to think about.
Optimizing Popcorn.js usage
The demo was made with an older version of Popcorn.js (v0.7) and therefore doesn’t make full use of all the latest features and plugins Popcorn.js has to offer. Future releases of the demo will incorporate the newest version (currently v1.1.1).
In any case, the demo does show how speech transcripts of videos can be combined with open data sources and how this can enable (mutual) contextualisation of these sources. For the ‘Nederland opent Data’ project this demo will be further enhanced. Any progress will be reported here!
Jaap Blom | Software engineer | R&D department, Netherlands Institute for Sound and Vision
August 25th, 2011
Access to the audiovisual content on Open Images is provided under Creative Commons licenses. These licenses facilitate the reuse of content in different ways. One of the ways media from Open Images can be reused is on Wikipedia. For this purpose the videos on Open Images are transferred to Wikimedia Commons, the online repository where freely licensed media files used in Wikimedia projects like Wikipedia are stored. In the beginning this was done manually, but the process has since been automated through the Open Images API. Currently, there are more than 1500 media items from Open Images available on Wikimedia Commons. This means that Open Images accounts for about 15% of all videos on Wikimedia Commons, which makes Open Images its largest supplier of videos.
The Wikipedia community uses the videos from Open Images to enrich Wikipedia entries. For instance, the English article on the ‘Elfstedentocht’ has a video of the Elfstedentocht of 1954:
Besides the reuse of complete videos, derivative works (such as screenshots) are also used. These are then for example employed in articles on famous people, for instance in this article on Dutch politician Pieter Oud:
3 million views
The reach of Open Images content on Wikipedia turns out to be substantial. In May 2011 the Wikipedia articles with media items from Open Images were viewed more than 3 million times. This is almost three times as much as the number of views in December 2010. Noteworthy is that the majority of the views are not on the Dutch Wikipedia, even though most of the videos on Open Images have Dutch subjects and are in Dutch. Of the 3 million views a mere 880,000 were on the Dutch language Wikipedia. The remaining 2.2 million views were on Wikipedias in different languages. The five Wikipedias where articles with Open Images content got the most views in May 2011 were:
- the English Wikipedia
- the Dutch Wikipedia
- the French Wikipedia
- the Portuguese Wikipedia
- the Japanese Wikipedia
More than 850 articles on the different Wikipedias make use of content from Open Images.
The article with the most views in May 2011 was Mother’s Day on the English Wikipedia, with almost 1.5 million views. The video used in this article appears on several Wikipedia sites: besides the English and the Dutch Wikipedia, it is used on, for example, the Tibetan and Persian Wikipedias. The Wikipedia articles containing Open Images media with the most views in May 2011 were:
- Mother’s Day (EN) 1,445,756 views
- AFC Ajax (EN) 121,322 views
- AFC Ajax (NL) 111,190 views
- Billy Graham (EN) 94,485 views
- Giro d’Italia (EN) 73,055 views
These statistics demonstrate that offering their material under a free license certainly has added value for cultural heritage institutions. For the cultural heritage field it is a sound strategy for opening up their collections to a large audience. It also gives the (internet) community a chance to enrich their projects with historic images. This reuse is of course not restricted to Wikipedia: by offering collections under a free license, institutions turn them into a rich source for (re)use for a large number of cultural, educational and creative purposes.
August 15th, 2011
At the moment Wikipedia articles don’t contain many videos (less than 0.1% of all files on Wikimedia Commons are video files). Open Images would like to change this, which is why most videos from Open Images are already automatically mirrored to Wikimedia Commons. To encourage users to use more video on Wikipedia, Open Images will be handing out a special video prize. The maker of the best video uploaded as part of Wiki Loves Monuments will be awarded a 2-year Premium subscription to Spotify, or alternatively an Amazon gift voucher.
Wiki Loves Monuments is a contest organised by Wikimedia, the movement behind Wikipedia. To be eligible for the video prize participants have to upload a video of one or more monuments to Wikipedia in September. The rules are:
- Self made and self uploaded
- Uploaded in September 2011
- Freely licensed
- Feature one or more monuments
So be creative and enter the contest! The people of Video on Wikipedia have a how-to explaining how to post a video to Wikipedia. More information on Wiki Loves Monuments can be found on their website.