Sound and Vision on the future of video on Wikipedia


Source: Beeld en Geluid Wiki

Over the course of the summer the Netherlands Institute for Sound and Vision uploaded another 2,000 videos to Wikimedia Commons, the media repository of the online encyclopedia Wikipedia. The batch upload was done using the GLAMwiki toolset developed by Europeana and consisted of videos from the Open Images platform (a joint initiative by Sound and Vision and Kennisland). With this upload, Sound and Vision reinforced its position as the biggest contributor of video on Commons, providing over 3,800 open videos. This is almost 8% of the total number of videos available in this repository. While this is a great accomplishment (and we are very proud), it also shows that video content and Wikimedia are still in a somewhat troubled relationship.

Video on Wikipedia: still scratching the surface

In his 2010 whitepaper “Video for Wikipedia and the Open Web”, Peter B. Kaufman already noted that of the millions of files hosted on Wikimedia Commons, only a couple thousand were moving imagery. Four years later the situation is still pretty much the same: of the more than 22 million files on Commons, only 0.22% are video! More disturbing numbers: the English Wikipedia has only 5,800 articles that feature a video, 0.12% of all English articles.

Based on these numbers we can conclude that with video on Wikimedia we are still just scratching the surface. Why are so few videos being used in Wikipedia articles, as illustration or historical reference for instance? How can it be that in a world ridden with moving imagery, the number one online encyclopedia is lagging behind? As mentioned above, this has to do, in part, with the availability of reusable video content, or rather the lack thereof, on Wikimedia Commons.

Current challenges for providing video on Wikimedia Commons

The main potential providers of video content to Wikimedia that one can imagine come down to two categories: (1) (amateur) producers of new video content and (2) GLAMs with audio-visual material in their collections. Each of these groups has its own specific challenges.

Where previously the high cost and expertise required for video production would have prevented (amateur) producers from creating video for Wikimedia, developments in the last decade have made it a lot easier to create high quality video on a shoestring budget and with very little experience. The prevailing problem is that, so far, there is hardly an active community for the production of video content for Wikimedia Commons (exceptions are small projects like Wiki Makes Video). This is in stark contrast to the large number of active photographers contributing to Wikimedia Commons, resulting in very successful projects like Wiki Loves Monuments, which has so far generated more than one million photos. At Wikimania London this summer a group of people met to envisage how an active video-producing Wikimedia community could be formed and facilitated. This could include services like a dedicated platform and a server for uncompressed and uncut video footage. Such a community could push the discussions about video on Wikipedia, leading to clearer policies and guidelines on why and how video can be used in articles.

For the second type of contributor, GLAMs holding audio-visual collections, the most obvious challenge (as one might guess) is copyright: very few archives own all the rights to their audiovisual collections. And because multiple creators are almost intrinsic to the production of audio-visual material, it proves very challenging to find out the legal status of a work, let alone obtain the permission required from all rights holders involved. However, with the experience Sound and Vision has acquired through making available what little material we do own rights to, we would highly recommend that other GLAMs put some time and effort into exploring their options. The advantages are numerous: we reach audiences that we would never reach through our own platforms (our content is used in over 60 language versions of Wikipedia). Millions of people around the world read the articles in which content of our institute is featured (over 50 million page views in 2013 alone). And not just that: contextual knowledge is added to our collections, such as biological, historical and scientific information. This is knowledge that an audiovisual archive would never be able to collect or generate by itself.

Video as part of the sum of all knowledge

Another reason for the absence of video on Wikipedia is that it doesn’t seem to have the sort of notability for the average Wikipedian to consider adding video content to articles. Again, the availability of video might be the main problem, but it might also be a lack of awareness and knowledge about how to use video and what the value is of video on Wikipedia. What can video do that descriptions and static imagery cannot do? The Open Video Alliance states: “Motion! Videos can explain, clarify, and engage like nothing else.” Video captures behaviours and action, it conveys a lot of information in a short time-span and it makes the encyclopedia more appealing for generations growing up in a visually stimulating society. But more importantly: Video increasingly reflects the way in which we communicate today. It is both how we capture and create history, it is how we learn and teach, it is where politics takes place. It is therefore an essential part of ‘the sum of all knowledge’.

At Sound and Vision we are committed to making video openly available for others to use and reuse. We hope that others will join us in that effort.


  • Slides from presentation “Four years of contributing and connecting to Wikipedia” at Wikimania London 2014

  • Blogpost about our first upload using the GLAMwiki Toolset

  • Collections of Sound and Vision on Wikimedia Commons

  • This blogpost is crossposted from Sound and Vision’s R&D blog

Join the Europeana Video Remix competition!

The Europeana Video Remix competition has been launched! Students aged 13 to 19 and their schools are invited to participate in this remix competition for the most captivating video compilation of Europeana’s content. The deadline for submission is 25 May.

How to compete
Participants pick one of the four themes of the competition:

  • 100th Anniversary of World War I
  • 25th anniversary of transformation in Central and Eastern Europe
  • History of fashion and style
  • History of technology and media

Then they need to match this theme with relevant archive material from Europeana and related websites (images, pictures, sounds, videos, as well as other digital objects) and compile a video remix out of them. Participants may download the historical content available in the public domain or under Creative Commons for creative re-use. Submitted videos may be entirely or only partly based on the sources found in Europeana and related websites, for example Open Images. All kinds of artistic forms – animation, graphics, samples, fragments of own videos and private images – are very welcome.

Open Images
Open Images is one of the sources participants can use to search for relevant archive material, either on the Open Images website or in the Open Images collection on Europeana. The Open Images collection contains lots of historic newsreel footage. Just two examples of the sort of videos available on Open Images, one on the history of fashion and style and one on the history of technology and media:

The jury consists of representatives from the Europeana Foundation, the National Audiovisual Institute and Johan Oomen from the Netherlands Institute for Sound and Vision. They will select and award the top three remixes, submitted individually or as group work. Each of the authors of the winning remixes will be rewarded with a Fuji Instax mini camera with additional film packs. The most active school, whose students submit the most videos, will be rewarded with a Panasonic HC-V110 camera.

The competition is organised by the Polish National Audiovisual Institute (NInA) within the framework of the Europeana Awareness project, co-financed by the European Commission.

Impact metrics: Increase in reach and reuse of Open Images

Since the launch of Open Images in 2009 there has been an increase in the reuse and reach of Open Images each year. To demonstrate this we will compare the quantitative results of 2011 and 2012 from Open Images in this blog. To measure is to know!

Visitors of Open Images
In 2011 there were almost 1,600 media files available on Open Images; this has now increased to more than 1,800. We can also see that the number of visits has increased from 66,000 in 2011 to more than 105,000 in 2012. More than 53,000 of these came from unique visitors in 2011, which increased to 89,000 in 2012. There was also an increase in the number of visited pages: in 2011 almost 207,000 pages were visited and in 2012 nearly 280,000. In 2011 nearly 11,000 videos were played; in 2012 this was close to 16,000. We also know that from July 2012 onwards almost 2,400 media files were downloaded.

Reuse of the Sound and Vision Open Images dataset
It is not only the impact generated on the Open Images platform itself that is increasing; the external reuse of material available through Open Images is growing as well. The Sound and Vision videos from Open Images are, for instance, also available on Wikimedia Commons and in Europeana. Since these videos became available in Europeana in May 2012, they were visited 3,900 times by 3,200 unique visitors throughout 2012. Besides these numbers, we have particularly good insight into the external reuse in Wikimedia projects, such as Wikipedia. In 2011 as well as in 2012 nearly 1,600 media files from the Sound and Vision collection were made available for reuse on Wikimedia Commons through Open Images. In December 2011 these files were reused in almost 1,000 articles on Wikipedia; in December 2012 this number had increased to nearly 1,600. In the whole of 2011 these articles generated almost 19,000,000 page views. In 2012 this more than doubled to nearly 40,000,000 (!). In other words, in 2012 Wikipedia articles containing reused media from Sound and Vision were viewed nearly 40,000,000 times in total.

Besides Wikimedia projects, the data and videos from Open Images are also used more and more for innovative applications. The API from Open Images makes it possible for computers to process the data from the openly available collections. In 2012, the API received 169,000 requests. Creative developers have become even more aware of the existence of Open Images as a great basis for new apps since the Open Culture Data initiative started in 2011. For the open data competitions Apps Voor Nederland (Apps for the Netherlands) and the Open Culture Data competition 2012, seven apps were submitted that used the Sound and Vision dataset on Open Images. Two of these apps won an award: Vistory (winner of Apps voor Nederland 2011) and (winner of the Dutch National Archives award during the Open Culture Data competition 2012). In recent years, a number of other applications have also been developed using the Sound and Vision subset of Open Images, such as Erfgoed in Beeld, Led it Up and Docs on the spot.

Putting the figures in perspective
The current size of the entire audiovisual collection of Sound and Vision is estimated at 750,000 hours. The Polygoon newsreel collection is one of the few subcollections of which Sound and Vision owns the required intellectual property rights to make the material available under an open content license. This subcollection forms the basis of the content that Sound and Vision selects for inclusion on Open Images and is estimated at 500 hours. Currently 110 hours of this collection are available via Open Images. This means that, based on the abovementioned estimated figures, at this point in time 22% of the newsreel collection is available as open content via Open Images, which translates to only 0.015% of the entire audiovisual collection of the institute. The impact of Open Images summarized in this blog post shows that even with a relatively modest open content set, substantial impact can be obtained. Starting small in the case of Open Images has already led to great results. Imagine what would happen if we were able to release even just one percent of the entire audiovisual collection as open content. Based on our experience we suggest that institutions that haven’t yet opened (parts of) their collection at least experiment with a small content set that can easily be made available without restrictions. By measuring the impact and actively promoting reuse, GLAMs can learn a lot about the potential of opening the digital doors of our institutions.
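The percentages above follow directly from the three estimates given in this section. A quick check, using only those figures (the variable and function names are ours, chosen for illustration):

```python
# Estimated figures quoted in this post (all in hours).
total_collection_hours = 750_000   # entire audiovisual collection
newsreel_hours = 500               # Polygoon newsreel subcollection
open_images_hours = 110            # currently available via Open Images


def share(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


newsreel_share = share(open_images_hours, newsreel_hours)
collection_share = share(open_images_hours, total_collection_hours)

# One percent of the entire collection, in hours (integer arithmetic).
one_percent_hours = total_collection_hours // 100

print(f"Share of the newsreel collection: {newsreel_share:.0f}%")
print(f"Share of the entire collection:   {collection_share:.3f}%")
print(f"One percent of the collection:    {one_percent_hours} hours")
```

On these estimates, one percent of the collection would be 7,500 hours, roughly 68 times what is available on Open Images today.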

Metrics for measuring the impact of cultural datasets
The numbers show that the (re)use of the material on Open Images has increased substantially. The impact of Open Images has proved to be considerable and the external reuse of the open content also sees an increase. In response to the growing need within the cultural heritage field to receive statistics on the impact of the opening up of cultural data sets, Sound and Vision will perform impact analysis research together with Kennisland for Open Culture Data. In order to do so, the data providers from the Open Culture Network, but also international initiatives, are requested to provide data on the impact and reuse of their data sets by filling out a survey. The results of this impact analysis will be made public in the course of 2013.

EUscreen portal on Open Images

EUscreen, the platform for European television history, now has its own portal on Open Images. The portal contains a selection of almost 60 videos from the European television heritage. They are all available for reuse under a Creative Commons license. The videos are from the archives of Sound and Vision (Netherlands), VRT (Belgium), NAVA (Hungary), Cinecittá Luce (Italy) and Televisió de Catalunya (Spain). By creating this portal on Open Images EUscreen wants to give people the chance to reuse footage from the European television history in a creative way.

The main goal of EUscreen is to make European television heritage available online to the public. Since the start in October 2009 more than 40,000 videos, photos and articles on European television history were published on the freely accessible website. The materials are from 28 European partners from 19 countries. One of the four focal points of EUscreen is ‘reuse and creativity’. The launch of the EUscreen portal on Open Images makes it possible for the public to engage with a small selection of the material in a creative way.

The collection of videos in the portal contains a varied selection. For instance, a programme on women in the army from 1977 from the archives of the VRT:

From the archives of Sound and Vision there is a report on the awarding of the TV Prize to Carel Enkelaar in 1961. In the audience is also Dutch television pioneer Erik de Vries:

Besides historical footage the portal also contains more recent videos, like the beautiful images of different places in Catalunya from the archive of Televisió de Catalunya. For example a video on the marine life of the marine reserve of the Medes Islands:

For examples of reuse also check the EUscreen blog. There you can find reports on the different remix workshops and the demo page containing data visualizations and ideas to get started yourself. The complete collection, including virtual exhibitions, can be watched online (for free) on


Open Images videos in virtual sports exhibition Europeana

In view of the upcoming Olympic Games the digital heritage portal Europeana also focuses on sports. With photos, videos and objects different aspects of (the history of) sports in Europe are shown in a virtual exhibit. The exhibition is divided into four themes: The Olympic and Paralympic Games, Famous European Sports, Football and Ancient Games.

The Sound and Vision set on Open Images has recently also been made a part of Europeana. Two videos from this set were used in the virtual exhibit, in the parts on tennis and cricket:

These are only two of the many sports-related videos that can be found on Open Images (and thus Europeana). You can find the virtual exhibition here: European Sport Heritage

From steam train to electric train

Fifty videos about the Dutch railway network have now been added to Open Images. In 65 years of Polygoon newsreels the railways were a recurring news topic. The items are not only about problems caused by delays and failures, but especially about the development and growth of the railways in the Netherlands. Gradually, an increasing portion of the rail network is electrified and steam trains are replaced by electric trains, until these steam-breathing monsters are nowhere to be found in the Dutch landscape. In addition, cities and connections are added to the rail network, so that rail passengers can visit an increasing number of places in the Netherlands.

The electrification of the Dutch railway network is progressing nicely in 1938. Electric trains run on various connections, and on certain routes diesel trains are used. Over the years the steam train slowly disappears from the Dutch tracks. This Polygoon newsreel shows how far the electrification has come:

Of course, even in earlier times things sometimes go wrong on the railways. In 1926 and 1929, for example, a passenger train derails. Accidents regularly occur on guarded, but mostly on unguarded, railway crossings. The following item shows how road users are made aware of the dangers of an unguarded crossing.

Trains and railways are not the only topics in the Polygoon newsreels. There is also attention for passengers and the objects they leave behind on trains. From guitars to shoes, the strangest objects are found. Various people rummage through the found objects, looking for something to their liking:

In the railway museum in Utrecht, people can take a look at the past of the railways. Among other things, visitors can see the oldest Dutch steam locomotive, called the Eagle, admire an old timetable and watch a model of an electric train driving back and forth on a scale model. Visitors also get a look behind the scenes of the Dutch Railways, and the automation of the railways is illustrated.

2000th video on Open Images

With a video on the opening of the Zuid-Beveland railway from 1927 there are now 2000 videos on Open Images. Since the last milestone of 1500 videos some beautiful videos have been added. From the collection of Sound and Vision, several videos from the Polygoon archives were added, with themes like women, shipping, typically Dutch, pets, health and care, Hilversum and railways. A good example is a video on Ms. Versluys, who in 1930 became the first woman to receive a Dutch pilot license:

Besides videos from the Polygoon archives, videos were also added from Sound and Vision's collection of educational films. These films were produced in the mid-twentieth century by the Stichting Nederlandse Onderwijsfilms (Foundation of Dutch Educational Films). One of the films from the collection is Giethoorn, suitable for geography lessons:

Among the new videos there were also some that are not from the Sound and Vision archives. For example, Eye Film Institute Netherlands added a number of films from their Bits and Pieces collection. These videos were also the basis for the remix contest Celluloid Remix 2: Found Footage. The contest challenged creative people to make new videos using, among other things, the material on Open Images. In the winning video, Untitled by Dániel Szöllösi, a number of smartphones are used:

Besides Eye Film Institute Netherlands, the Netherlands Media Art Institute also contributed to Open Images. In their own portal various videos on media art can be found, for example The Unified Field by Peter Bogers:

Because of the open licenses of the videos and data on Open Images, they could be used for several projects. During the International Documentary Festival Amsterdam the videos were used for Docs on the Spot. With the Docs on the Spot app visitors could experience the documentary Omzwervingen in de nacht (Marjoleine Boonstra, 2004) on location in a new way. Via Open Images this experience was enriched with images from the past. In January Glimworm won the Apps for the Netherlands competition using the data and videos from Open Images in their Vistory app.

The visibility of the videos from Open Images on the internet has also increased. For example, the Sound and Vision set from Open Images was added to the digital library Europeana. On Wikipedia the videos are used to illustrate more than 1,100 entries, not only on the Dutch Wikipedia, but also on more than 60 other language versions of Wikipedia. These entries are viewed more than 2.5 million times a month.

Open Images in Europeana

On 1 May a set of more than 1500 videos from the Netherlands Institute for Sound and Vision was made available in Europeana via Open Images. Europeana brings the digitized collections of European libraries, galleries, museums, archives and audiovisual collections together online. The digital library gives access to 20 million books, films, paintings, museum objects and archival documents from around 2200 different providers.

The Sound and Vision set that has now been added contains a collection of Polygoon newsreels and several other films on the Netherlands in the twentieth century. By making the Sound and Vision set available in Europeana it is now part of a collection of millions of cultural objects, so interesting connections with other objects can now be made. With the API of Europeana the collections are also made available for reuse. During several hackathons the API will be used to develop interesting applications with the collections in Europeana.

Winner of the Wiki Loves Monuments video prize

In September 2011 the Wiki Loves Monuments contest took place. Thousands of pictures of European monuments were uploaded to Wikimedia Commons. To stimulate the use of videos on Wikimedia Commons, Open Images also made a prize available for the best video of a monument.

We are proud to announce that the winner of this prize is this French video about an old wallpaper printing machine (built in 1877). The video shows “the 26 colors machine”, famous for being the first one to use 26 colors for printing wallpapers. Such a machine was a moving piece, with gears, paint, paper and men around it: a video is the only way to make it live again. The video, one of a whole set of videos, also unfolds various viewpoints, from the tiny details of a golden cylinder to a view of the surrounding building.

It provides something that cannot be shown in a static picture: a sense of both beautiful details and the surrounding building, which is typical of the industrial age. Video is incredibly useful for the encyclopaedic purposes of Wikipedia, whether it is to visualise a chemical reaction, watch lifeforms evolve in their environment, or see how a machine actually works. If an image is worth a thousand words, then imagine what you can tell in 24 images a second.

Open Images videos enriched with Open Data

For Sound and Vision, in the context of the Dutch Open Data initiative “Nederland opent Data” (The Netherlands Opens Data), I created the basis for the demo that is described in this post. The demo shows how you can play a video in an enriched context, by linking open data sources to terms that are found in speech transcripts rendered from videos. For the Code Camping event, organized by Open Cultuur Data (Open Cultural Data) I extended the demo with newly linked data sets.

The starting point for this demo application was the reuse and linking of data sets to the Open Images collection, which contains more than 1,500 freely (re)usable videos, mostly old news items from the 1920s through the 1980s. All of these videos are published under Creative Commons licences.

The basis for the application lies in the use of the speech transcripts, which were generated by using automatic speech recognition (ASR) software (from X-MI) on these videos.

The main idea for the demonstration is to contextualise videos while they’re being watched, in order to provide the user with fun, interesting and unexpected background information about the things that are mentioned in the video.


For example: when Philip Bloemendal (the presenter of the news items) – in a video titled: ‘Large parts of Holland completely snowed in’ – talks about: ‘(…) but on several places in Drenthe there (…)’, next to the video, several blocks of information about Drenthe (a province in The Netherlands) are shown. Each of these information blocks gets its data from a specific open data source. For the first prototype the data sources used were (amongst others): Google Maps and Wikipedia. To illustrate this some more: in the example where ‘Drenthe’ was recognized as a concept, the Wikipedia block shows an article about Drenthe; in the Google Maps block the map is zoomed in on the province of Drenthe in The Netherlands.

For the Code Camping event, organized by ‘Hack de Overheid’ (Hack the government), I added two new data sets to the demo: the collections from the Rijksmuseum and the Amsterdam Museum.

How it all works
As mentioned, the main building blocks for this demo are the Open Images videos and the corresponding speech transcripts that are used to link the words that are spoken (in the video) to an exact time code. (Note: Automatic speech recognition software is not perfect, which means that not every word in a speech transcript will exactly match the actual words that were spoken).

Step 1
Because not every word in a sentence is particularly interesting, the first step is to filter out stop words from the speech transcript, such as: articles, prepositions and verb modifiers.
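A minimal sketch of this filtering step, assuming the transcript is available as plain text. The stop word list below is a tiny, hypothetical English one for illustration; the list actually used for the (Dutch) transcripts is not given in this post.

```python
import re

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "in", "on", "of", "but", "there",
              "several", "and", "to"}


def tokenize(transcript):
    """Lowercase the transcript and split it into word tokens (letters only)."""
    return re.findall(r"[^\W\d_]+", transcript.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word list."""
    return [t for t in tokens if t not in stop_words]


tokens = tokenize("But on several places in Drenthe there ...")
content_words = remove_stop_words(tokens)
print(content_words)
```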

Step 2
In the second step, a script is run on the remaining words to sort them by ‘importance’. Importance here is calculated by combining a preset word score (coming from a special lexicon) with the frequency with which the word is spoken. In this way, words with a high score and a high frequency will end up high in the list.
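How the score and the frequency are combined is not spelled out here. The sketch below assumes the simplest option, multiplying the two; the lexicon contents and the default score for unknown words are likewise made up for illustration.

```python
from collections import Counter

# Hypothetical lexicon scores; the real lexicon is not described in this post.
LEXICON_SCORES = {"drenthe": 5.0, "snow": 2.0, "places": 0.5}
DEFAULT_SCORE = 1.0  # assumed score for words missing from the lexicon


def rank_by_importance(words, lexicon=LEXICON_SCORES, default=DEFAULT_SCORE):
    """Sort distinct words by (lexicon score x spoken frequency), highest first."""
    freq = Counter(words)
    scored = {w: lexicon.get(w, default) * n for w, n in freq.items()}
    # Ties are broken alphabetically so the output is stable.
    return sorted(scored, key=lambda w: (-scored[w], w))


ranked = rank_by_importance(
    ["snow", "drenthe", "places", "snow", "drenthe", "snow", "road"]
)
print(ranked)
```

In this example ‘drenthe’ (score 5.0, spoken twice) outranks ‘snow’ (score 2.0, spoken three times), which is the behaviour described above: both a high score and a high frequency push a word up the list.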

Step 3
After sorting, the words are used, in order of importance, as query input for the GTAA thesaurus (used by Sound and Vision) and also for Freebase. The latter is a Google service and offers a big collection of interrelated concepts, containing descriptions from a large variety of domains. Freebase can be seen as an extensive thesaurus containing information from a large number of areas of expertise.

When, after querying, the GTAA or Freebase webservice yields a concept, it is put in a list of candidates. After processing all the words, this list is filtered using a very simple disambiguation algorithm (i.e. whenever the yielded concept consists of more than one word, it is taken out of the list).
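The candidate list and the filter can be sketched as follows. The GTAA and Freebase services are replaced here by a plain dictionary, a hypothetical stand-in, so that the example runs without network access; it does not reflect either service's actual interface.

```python
# Hypothetical stand-in for the GTAA and Freebase web services: a mapping
# from a query word to the concept label a service might return.
FAKE_THESAURUS = {
    "drenthe": "Drenthe",
    "amsterdam": "Amsterdam",
    "museum": "Amsterdam Museum",  # multi-word label, filtered out below
}


def lookup_concept(word, thesaurus=FAKE_THESAURUS):
    """Return the concept label for a word, or None if nothing is found."""
    return thesaurus.get(word)


def collect_candidates(ranked_words):
    """Query each word in order of importance and keep every hit."""
    candidates = []
    for word in ranked_words:
        concept = lookup_concept(word)
        if concept is not None:
            candidates.append((word, concept))
    return candidates


def disambiguate(candidates):
    """Drop every candidate whose concept label has more than one word."""
    return [(w, c) for w, c in candidates if len(c.split()) == 1]


candidates = collect_candidates(["drenthe", "museum", "snow", "amsterdam"])
filtered = disambiguate(candidates)
print(filtered)
```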

Step 4
In step 4, each of the GTAA and Freebase concepts from the list of candidates is used for querying the open data webservices, which are:

  1. Google Maps (only queried for location type concepts)
  2. Wikipedia
  3. Amsterdam Museum
  4. Rijksmuseum

Each returned result is linked to the time code of the (spoken) word from the speech transcript that was used to find the information.
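A rough sketch of this step, with the four web services replaced by hypothetical placeholder functions. The `is_location` flag, the use of seconds for time codes and the result field names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    label: str         # concept label from GTAA or Freebase
    is_location: bool  # assumed flag; how the type is determined is not described
    time_code: float   # assumed: seconds into the video where the word was spoken


# Hypothetical placeholders for the four web services.
def query_maps(label):
    return f"map centred on {label}"


def query_wikipedia(label):
    return f"article about {label}"


def query_amsterdam_museum(label):
    return f"Amsterdam Museum objects for {label}"


def query_rijksmuseum(label):
    return f"Rijksmuseum objects for {label}"


# (source name, query function, only for location concepts?)
SOURCES = [
    ("google_maps", query_maps, True),
    ("wikipedia", query_wikipedia, False),
    ("amsterdam_museum", query_amsterdam_museum, False),
    ("rijksmuseum", query_rijksmuseum, False),
]


def gather_context(concepts):
    """Query every applicable source and attach the concept's time code."""
    results = []
    for concept in concepts:
        for name, query, locations_only in SOURCES:
            if locations_only and not concept.is_location:
                continue  # the map is only queried for location concepts
            results.append({
                "time": concept.time_code,
                "source": name,
                "concept": concept.label,
                "content": query(concept.label),
            })
    return results


results = gather_context([
    Concept("Drenthe", True, 12.5),
    Concept("Snow", False, 3.0),
])
for r in results:
    print(r["time"], r["source"], r["content"])
```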

(For those interested: the collection from the Amsterdam Museum has three different end-points: Adlib, OAI-PMH and SPARQL. For this demo, I used the last of these because, unlike OAI-PMH, it does not need to be harvested and indexed before it can be queried. In any case I thought it was a good idea to play around again with the Semantic Web and refresh my SPARQL skills. For the Rijksmuseum, I first harvested the collection from OAI-PMH and then indexed it with Solr. This way the collection can be searched using Lucene queries.)

Step 5
The last step was to send the time-coded context data back to the browser. I do this by using a JSON object, which in turn I use as input for Popcorn.js to generate events. These events are linked to an HTML5 video player and make sure the right (context) information is shown in the different blocks/panels in the user interface.
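The exact shape of the JSON object is not shown in this post. As an illustration only, the server side could serialise the time-coded results along these lines; the field names and the video identifier are hypothetical, and the browser code that turns the entries into Popcorn.js events is left out.

```python
import json


def build_payload(video_id, results):
    """Serialise time-coded context results, ordered by time code."""
    events = sorted(results, key=lambda r: r["time"])
    return json.dumps({"video": video_id, "events": events}, ensure_ascii=False)


# Sample results with assumed field names.
sample_results = [
    {"time": 12.5, "source": "wikipedia", "concept": "Drenthe",
     "content": "article about Drenthe"},
    {"time": 3.0, "source": "rijksmuseum", "concept": "Snow",
     "content": "Rijksmuseum objects for Snow"},
]

payload = build_payload("example-video", sample_results)
decoded = json.loads(payload)
print(decoded["events"][0]["concept"])
```

Sorting by time code is a convenience for the example; a player that schedules events by their own start time would not depend on the order.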

Because the processing of these five steps takes around 15-20 seconds per video, I store all of the results in .json files. When opening the demo these files are loaded instead of fetching the data live from the web.
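The caching described here amounts to "compute once, then read from disk". A small sketch under that assumption; the file naming, the cache directory and the `slow_pipeline` placeholder are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_or_compute(video_id, compute, cache_dir):
    """Return cached results for a video, running the pipeline only if needed."""
    cache_file = Path(cache_dir) / f"{video_id}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    data = compute(video_id)
    cache_file.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    return data


calls = []


def slow_pipeline(video_id):
    """Placeholder for the five processing steps (15-20 seconds per video)."""
    calls.append(video_id)
    return {"video": video_id, "events": []}


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_compute("example-video", slow_pipeline, tmp)
    second = load_or_compute("example-video", slow_pipeline, tmp)

print(first == second, len(calls))
```

The second call reads the stored file, so the pipeline runs only once per video.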

There is still a lot to do
The demo shows what can be done by using concept detection (a.k.a. Named Entity Recognition) in combination with open data sources. For several aspects however (significant) improvements can be made:

Better concept detection
The concept detection as described in this demo could be improved considerably. For instance, concepts that consist of more than one word are not recognized, e.g.: ‘Amsterdam Museum’ now yields two concepts, ‘Amsterdam’ and ‘Museum’, but the actual concept ‘Amsterdam Museum’ is not found.
Moreover, specific Named Entity Recognition (NER) services like DBpedia Spotlight (which gives good results for English) should be investigated in order to improve results. For Dutch, however, the search for a decent (open source) solution seems to be ongoing.
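One possible direction for multi-word concepts, which is not what the demo currently does, is to try the longest phrase first when matching transcript words against known concept labels. A sketch with a hypothetical set of labels:

```python
# Hypothetical set of known concept labels, lowercased.
KNOWN_CONCEPTS = {"amsterdam", "museum", "amsterdam museum", "drenthe"}


def find_concepts(tokens, known=KNOWN_CONCEPTS, max_words=3):
    """Scan left to right, always preferring the longest matching phrase."""
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in known:
                found.append(phrase)
                i += n  # skip the words consumed by this phrase
                break
        else:
            i += 1  # no concept starts at this token
    return found


matches = find_concepts(["the", "amsterdam", "museum", "in", "amsterdam"])
print(matches)
```

With this approach ‘amsterdam museum’ is matched as one concept, while the later, standalone ‘amsterdam’ is still found on its own. It would also require relaxing the filter from step 3, which currently discards multi-word concepts.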

Selection of relevant sources for the user
Concerning the relevance of the ‘context information’ that is currently shown to the user, there is still much to think about how to make the best selection of data sources. For instance: why somebody who is watching a video about ‘Holland’s oldest steam-powered pumping station’ would be interested in ‘Hens chalice from the Company of Nine’ (found on the basis of the word ‘Gorinchem’, which is a town in The Netherlands) is something to think about.

Optimizing Popcorn.js usage
The demo was made with an older version of Popcorn.js (v0.7) and therefore doesn’t make full use of the latest features and plugins Popcorn.js has to offer. Future releases of the demo will incorporate the newest version (currently v1.1.1).

In any case the demo does show how speech transcripts of videos can be combined with open data sources and how this can enable (mutual) contextualisation of these sources. For the ‘Nederland opent Data’ project this demo will be further enhanced. Any progress will be reported here!

Jaap Blom | Software engineer | R&D department, Netherlands Institute for Sound and Vision