TAPe Tackles Every Computer Vision Task More Effectively Than Any Other Tech Out There — Here’s Real-World Evidence

We discovered and developed the Theory of Active Perception, or TAPe for short, which models the operation of innate human perception mechanisms and mathematically describes the Language of Thought (to learn more about the concept, save this Stanford explainer for later). Technologies based on TAPe operate on meaningful patterns rather than on arrays of structurally unrelated binary digits — zeros and ones — as standard IT systems do. Such a pattern, which we call a T-bit, is, unlike a binary element, a subset of the most informative interconnected elements of information. This allows a single T-bit to carry far more meaningful information, and for any class of tasks it reduces the number of computational operations by orders of magnitude.

A T-bit is a subset of the most informative interconnected elements of information, which reduces computational resources by orders of magnitude

We have already applied TAPe principles in computer vision technologies, both in real products and in pilot projects for various clients. With TAPe, we have solved tasks that would demand so many resources — financial, technical, and human — from today's standard technologies that solving them would be unprofitable and inefficient from a business perspective. Here are a few examples of how TAPe makes the impossible possible.
Searching video by video on a video streaming platform
Task
In response to a keyword query input by a user (e.g. a film title, a director, an actor/actress, a country of production, a genre, a year, etc.), an online cinema wants to return a compilation of the most popular movie scenes matching that query.
How we solved the task using TAPe
The popularity of the scenes can be determined using YouTube, the largest video hosting site in the world. Popularity in this case means how often particular scenes from a movie are used in YouTube videos (video compilations, UGC videos, reviews, etc.).

To solve the task manually, one would have to go through all the links related to the user's request, count how many times each video used particular scenes from the original movie/director/etc., and then assemble compilations of the most popular ones.

For each movie we formed a semantic core: dozens of keywords for YouTube search that together return an average of 500−1,000 links, or about 30 thousand minutes of video per movie. All the links were indexed by our system. The system then compared all the linked videos with the reference movie and selected the scenes from the reference movie that were used most often in the linked videos. Finally, it assembled a compilation of the 20−30 most popular scenes for each movie.
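The ranking step described above can be sketched in a few lines. This is a minimal illustration, not the actual TAPe implementation: it assumes the indexing stage has already produced, for each YouTube link, the list of reference-scene IDs detected in it, and simply counts how many links use each scene.

```python
from collections import Counter

def rank_scenes(matches_per_link, top_n=30):
    """Rank reference-movie scenes by how often they appear
    across the indexed YouTube links.

    matches_per_link: one list per YouTube link, each containing the
    IDs of reference scenes detected in that link (hypothetical format).
    """
    usage = Counter()
    for scene_ids in matches_per_link:
        # Count each scene at most once per link, so a single long
        # compilation cannot dominate the popularity ranking.
        usage.update(set(scene_ids))
    return [scene for scene, _ in usage.most_common(top_n)]

# Toy example: three links; scene "s2" appears in all of them.
links = [["s1", "s2"], ["s2", "s3"], ["s2"]]
print(rank_scenes(links, top_n=1))  # → ['s2']
```

The real system works on TAPe indexes rather than precomputed scene-ID lists, but the counting logic of "most used scene wins" is the same.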

It takes TAPe-based search a few seconds to look through all YouTube links related to a user's search request, then compile and return a video rating

Indexing the video takes a certain amount of time, but the system needs only a few seconds to compare the template with the links and create the rating. The video index takes up 1 MB per hour of video, and the system can run on a regular server with ordinary specifications.
Digital Asset Management for adult content hosting
Task
To check whether a video uploaded by a user is original (unique) by matching it against the other videos in the database, in order to prevent uploading duplicates and paying authors for unauthorized content.
The video database, according to the client, consists of 5 million videos. On average, a video is 15−20 minutes long. Videos are uploaded at a rate of 10−12 films per minute.

Obviously, each video needs to be checked for uniqueness during the upload itself, which leaves 5−6 seconds per video to make it happen. In those 5−6 seconds we need to compare the uploaded video with the entire database of 5 million other videos and detect a duplication of any duration, if one exists.

Besides straightforward repetitions, within those 5−6 seconds we also needed to detect fraud attempts that mask duplicates: edits, re-edits, mirrored frames, changed resolutions and aspect ratios, added noise, etc.
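Quick arithmetic from the figures above shows the scale of the requirement. The numbers (5M videos, 10−12 uploads per minute, a 5−6 second budget) come from the task description; the calculation itself is just a back-of-envelope feasibility check.

```python
# Back-of-envelope load for the uniqueness check (figures from the task).
DB_VIDEOS = 5_000_000      # videos already in the database
UPLOADS_PER_MIN = 12       # worst case: 10-12 uploads per minute
CHECK_BUDGET_S = 5         # 5-6 seconds allowed per uploaded video

# Uploads arrive continuously, so at any moment roughly this many
# checks are in flight simultaneously:
concurrent_checks = UPLOADS_PER_MIN * CHECK_BUDGET_S / 60

# Each check must cover the whole database within its budget:
comparisons_per_second = DB_VIDEOS / CHECK_BUDGET_S * concurrent_checks
print(f"{concurrent_checks:.0f} concurrent check(s), "
      f"{comparisons_per_second:,.0f} video-to-video comparisons/s")
# → 1 concurrent check(s), 1,000,000 video-to-video comparisons/s
```

A million full video-to-video comparisons per second is what makes the task hopeless for frame-by-frame approaches and motivates comparing compact indexes instead.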
How we solved the task using TAPe
First, we indexed the entire client video database. Because the client didn’t want to give a third party access to its video content, we agreed to develop a converter that transforms the content into a TAPe-based index on the client's side. The index cannot be converted back into video, so the client could safely pass it to us, and we were able to work with it.

From a technical perspective, we came up with an architecture requiring only 8 servers — enough to process each uploaded video in real time and check it for duplication against the other videos in the database. As soon as a new video was added, it was immediately indexed: first on the client side, then the video index was sent to us, and we checked it for a complete or partial match. All of that took 5−6 seconds, during which the new video was compared to the entire ~5M-title video archive and duplicates were (or were not) found.

Within the 5−6 seconds of a user uploading a video, the TAPe-based system compares it with ~5M videos for all kinds of plagiarism

Eight servers were enough to keep the entire 5M-video index available for quick access and to run the necessary computations in parallel. There were no additional requirements for connectivity, hosting, etc.
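The eight-server figure becomes plausible once you estimate the index size. Assuming the same ~1 MB per hour of video index density quoted in the first case study (an assumption carried over, not stated for this client), the whole 5M-video index fits comfortably in commodity storage:

```python
# Rough index-size estimate for the 5M-video database, assuming the
# ~1 MB per hour index density quoted in the first case study.
VIDEOS = 5_000_000
AVG_MINUTES = 17.5        # videos average 15-20 minutes
MB_PER_HOUR = 1
SERVERS = 8

total_mb = VIDEOS * (AVG_MINUTES / 60) * MB_PER_HOUR
print(f"total index ≈ {total_mb / 1_000_000:.2f} TB, "
      f"≈ {total_mb / SERVERS / 1000:.0f} GB per server")
# → total index ≈ 1.46 TB, ≈ 182 GB per server
```

Under 200 GB of index per server is small enough to keep hot for fast parallel lookups, which is why no exotic hardware was needed.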
Real-time TV ad monitoring in multiple TV channels
Task
To monitor the broadcast of TV ads on TV channels airing in different cities, regions, and countries so that advertisers can control the media plan execution.

A specific feature of the Russian market, where we launched the tool, is that up to 70% of regional TV ads were not monitored at all: a huge number of cities were not included in the television viewership tracking panel. In those cities, an advertiser could not verify that ad broadcasting complied with the media plan — as a result, according to our statistics, at least 20% of all ads on regional TV were broadcast with violations: either late or not at all, and hence with lower target-audience reach (or none at all). In money terms, total spending on those non-monitored ads was about $240 million. Advertisers had no reliable answers to the questions of whether an ad was broadcast, how many times, and exactly when.

Of course, you could put 3−4 (or better, 10) employees in each city to watch TV and check the ads' compliance with the media plan 24/7. However, this solution would be extremely expensive, slow, and inefficient.

There are technical solutions on the market that can monitor TV air in one way or another, but they have serious limitations. First, most of them do not analyze the video itself. In China, for example, watermarking technology is used: since the solution is industry-wide, all broadcasters can agree to use the same watermark. Others monitor and identify content by sound, by fingerprinting, etc. All these methods have their own disadvantages and limitations. Second, such solutions are enterprise-scale, heavy systems (sometimes called "refrigerators" for their visual resemblance): racks of expensive servers that require special operating conditions and are not available in all data centers — a whole software-hardware complex built for specific tasks. All of this is very expensive and complex, yet not always capable of efficiently monitoring even one television channel in a small town where you can’t put a "refrigerator."
How we solved the task using TAPe
To solve this problem, we deployed infrastructure in 150 cities across five countries, which allowed us to monitor TV air in uncovered regions, and organized monitoring of 1,000 TV channels. The system we built was easily scalable: at any time, both the number of cities and the number of TV channels could be increased.

In each city, we installed servers that recorded from 1 to 10 channels. Each server created an index of the TV signal and formed a video archive. Only the index was sent to the central server, where it was almost instantly compared with the reference database — in this case, the client's video ads. In total, the database held up to 50 thousand video ads, which posed no problem for the system's performance. For the client's needs, the indexed database was stored on the server for one year and the video archive for no more than 3 months. This made it possible to check in real time whether the client's advertisement was aired and, if so, exactly when.
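The central-server matching step can be illustrated with a toy model. This is a hypothetical fingerprint-matching stand-in for the TAPe index comparison: each signal minute and each reference ad is reduced to a sequence of segment fingerprints, and airings are found by locating the ad's sequence inside the channel's sequence.

```python
def find_airings(channel_index, ad_indexes):
    """Scan one channel's signal index for occurrences of reference ads.

    channel_index: list of segment fingerprints for the recorded air.
    ad_indexes: dict mapping ad_id -> list of fingerprints for that ad.
    Returns {ad_id: [positions where the ad starts]}.
    Illustrative stand-in for the real TAPe index comparison.
    """
    airings = {}
    for ad_id, ad in ad_indexes.items():
        n = len(ad)
        hits = [i for i in range(len(channel_index) - n + 1)
                if channel_index[i:i + n] == ad]
        if hits:
            airings[ad_id] = hits
    return airings

# Toy signal: the ad "cola" airs twice in a 10-segment recording.
signal = [0, 7, 7, 1, 2, 9, 7, 7, 3, 4]
print(find_airings(signal, {"cola": [7, 7]}))  # → {'cola': [1, 6]}
```

In production the comparison has to tolerate re-encoding and signal noise, which is exactly where robust pattern indexes matter; exact subsequence matching is used here only to show the control flow.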

The TAPe-based TV monitoring system processed 1,000 minutes of video per minute — 40% of YouTube's upload volume — with a tiny fraction of YouTube's resources

In this form, the system processed 1,000 minutes of video per minute. For reference, at that time YouTube's upload volume was 2,500 minutes of video per minute, meaning we processed a volume of video comparable to 40% of the YouTube load.

For Russia, this was a unique product. There are solutions on the global market that also offer ad monitoring; however, the efficiency of TAPe allowed us to create a solution hundreds of times cheaper than the competition.
National TV channel local (200 cities) rebroadcast monitoring
Task
A more specific task that arose from the previous case: monitoring and analyzing regional rebroadcasts of a national TV channel to check a specific TV company's compliance with the media plan. Here, the client is no longer an advertiser but a TV broadcaster.

It was necessary to monitor the TV air in 200 cities (as in the previous case, there could be more). The main — reference — TV signal is retransmitted from the center to regional cities, where local content — news, ads, etc. — can also be inserted into the broadcast. For our client it was particularly important to make sure that specific ads were broadcast at specific times, since this affected the company's revenue, or meant fines in case of violations tracked by advertisers.

An additional requirement was to report technical failures in the broadcast. Due to the time zones, there were four main (reference) TV signals.
How we solved the task using TAPe
We developed a system that connects to the TV frequency and instantly indexes TV signals, allowing real-time monitoring of broadcasts 24/7.

To operate the system, we deployed quite a complex infrastructure: satellite dishes to receive the four standard TV signals, and servers in all 200 cities to record the broadcasts. Digital, cable, and even analog TV signals were recorded. On each of the 200 servers, the TV signal was indexed into the TAPe format in real time and sent to the central server, where it was checked for compliance with the media plan. The regional servers were also managed remotely from a central location.

Every minute we received 4 minutes of standard signal (one minute from each of the four reference sources) and 200 minutes of regional signal (one minute from each city), which needed to be compared against the references. Only the index was sent to the central server, while the video recordings of the broadcasts were stored on the regional servers.
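The compliance check itself reduces to aligning each regional minute against its reference minute and flagging where they diverge. The sketch below is an illustrative stand-in for the index comparison: it models each feed as a sequence of already-identified segments and reports every position where the regional feed departs from the reference (a local insertion, a substituted ad, or a technical failure).

```python
def deviations(reference, regional):
    """Minute-by-minute comparison of a regional feed against its
    reference signal. Returns (position, ref_segment, regional_segment)
    tuples wherever the two feeds differ. Illustrative stand-in for
    the real TAPe index comparison.
    """
    return [(i, r, g)
            for i, (r, g) in enumerate(zip(reference, regional))
            if r != g]

ref = ["news", "adA", "film", "film", "adB"]
reg = ["news", "adA", "film", "local", "adB"]  # one local insertion
print(deviations(ref, reg))  # → [(3, 'film', 'local')]
```

With 204 one-minute index streams arriving per minute (4 reference + 200 regional), the central server only ever diffs compact indexes, never raw video, which is what keeps the comparison real-time.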
***
We have described four use cases that hopefully give some idea of the possibilities of TAPe-based technologies. And this is just a tiny part, even if we only talk about what TAPe can do with video and computer vision problems — including the development of a new video and image format and codec.

We are confident that TAPe is applicable far beyond computer vision: it enables new principles for building and architecting neural systems, the development of new IT devices, including video cards and computer processors, new data storage systems, and more.