AI agent to Solve Raven’s 2×2 Progressive Matrix

AI agent to Solve Raven’s 2×2 Progressive Matrix

The solving of Raven’s matrix is a problem faced by many computer science and psychology majors. Raven’s matrix is one of the popular test of the human IQ, what we may equate to human intelligence. Raven’s tests consist of a matrix of visual objects that are manipulated between pairs with the last image missing in the last pair, this is the one that needs to be determined from a set of multiple choice options. The key in most raven problems is to determine the transformation of the objects or a group of object to determine what the last object should be. As shown in “2×1 Basic Problem 02”, image A represents a small circle, which image B represents as a large circle. This is your first clue – what changed between image A to B, the size of the same shape. With that clue you would then infer the same on C to ?. The small square in C should likewise change to a large square, so now the obvious answer becomes option 6. In a 2×1 problem there is no need for correlations and grouping of boxes because the problem is in a lateral form, meaning A to B and C to ?. But when considering a 2×2 problem your reasoning needs to be altered. 2×2 problems require that you perform correlation and grouping of the problem space, meaning you need to determine which figure correlate with which other figure before determining the transformations. If you look at “2×2 Basic Problem 02”, one can say the the Fill changed from A to B row-wise, but one can also say the Shape changed from A to C column-wise and both would be correct but that may affect the outcome. This is the basis of this project, to build an artificial intelligence agent that can smartly apply reasoning and logic to solve a set of Raven’s matrix tests, in particular 2×2 matrices.2x2 Raven matrixWhen building an AI agent to solve these tests, It is important to first determine how the input for the test would be passed to the agent. Fortunately for this project the inputs will be done via textual representation, as shown in the illustration “2×1 Basic Problem 02”. All the visual tests have already been decomposed to a textual representation, this is parsed by the calling application and sent to the agent via objects that represent the problem set which is the entire set of figures and object contained in the problem set, including the answer options. The agent implements it’s solving methodology in five stages. Stage one, groups figures together by measuring correlative correctness. Stage two employs a smart generator to generate frames – referred to as “Comparison Sheets” throughout the paper. While stage three uses a tester to compare these comparison sheets for correlative correctness. Stage four works in concert with stage three by comparing extra non-intuitive observable traits, which is used mostly used as tie-breakers and step five compares the scores of the tester and picks the highest score as the answer. propositional representationWe begin our journey by first establishing some basic axioms – all comparison and most of the operations are done on pairs of objects, figures represent as a box (as in the illustrations) A,B, C, 1, 2 etc. Objects or Shapes represent the actual items inside the figures (boxes). The AI agent will first attempt to correlate figures and score them to determine the grouping of A to B row-wise vs C to ?, or  A to C column-wise vs B to ? or both. This correlation is done by looking at the the attributes on each object in the figure A then comparing these attributes one by one to all the attributes of the objects in the other two figures B and C, while scoring for correlative correctness and shape consistency. So a square in A, a square in B and a triangle in C means that A and B is more correlated than A and C because A->B maintains the shape: square. Once this is determined the AI agent can then conclude that the it needs to determine transformations from  A to B then infer these on C to ?. Once the AI agent determines the objects that needs to be grouped, it renames them in working memory using their ordinal for ease of processing by naming the first figure – 1 and second figure – 2. This removes the static naming of A & B and makes the agent more dynamic. This solves the problem of correlation and grouping.

The agent then continues by first observing the transformations that occur between the objects from 1 to 2 (A to B in this case), It uses these observable transformations and builds a “comparison sheet” in working memory. It then looks at the remaining figure in the question (C in this case) and renames it to 1 in working memory for ease of comparison processing. The agent then builds comparison sheets for C (now known as 1) to every answer option (C to 1, C to 2, C to 3 etc), which it also stores in working memory. Armed with all this knowledge in working memory it compares the comparison sheets from 1 to 2 (A to B) with the sets from 1 (C) to N[1,2,3,4,5,6], one at a time and scores those transformations and attributes that match exactly with 1 to 2 (A to B). Apart from that it also looks at a few other non-intuitive observable traits eg. did all the objects change to another type of object, did the location change, are all the object consistent between the question and the proposed answer, compares and scores those also. Finally the scores are ranked and the pair (C to ?) with the highest correlation score gets elected as the most likely answer option.

Comparison sheets are generated by the smart generator from observables and non-intuitive traits. If you look at “Comparison sheet from A to B”, you will notice the renaming of the figures and underlying objects to their ordinal 1, 2, 3 etc. This example in particular shows that there’s one object in both figure 1 and figure 2 denoted by 1.1 and 2.1 It also shows that three transformations were detected and added to the sheet prefixed with “tf-”. So between figure 1 and 2, the angle changed by -270 and the shape changed. Also notice the the type of shape for the shape change was not noted on the transformation as this offers no bearing on the answer, what is important here is because of the shape change the tester can then infer that the shape in the answer must be different from the shape in question.comparison sheetFor this demonstration I will use a longer sheet below and will also refer to the figures and objects with their original name for ease of explanation. Stage three, this is where the smart tester takes the “Comparison Sheets” and compares them against each other, while scoring them for correctness. It does this by comparing sheets from A to B with C to N[1,2,3,4,5,6]. So A.Z.fill: no on sheet (A to B) should match A.Z.fill on sheet (C to 6) and so forth. In stage four the smart tester uses logic and deeper reasoning to infer the answer. For example if “tf-count_changed = no” then the amount of objects in C should be the same amount of objects in the answer. Furthermore if “tf-count_changed = yes” then tf-objects_added and tf-object_deleted is consulted to infer the quantity of objects expected in the answer. If there’s a change in the angle between figures then the smart tester, does not compare explicit angles between objects on the sheets but instead compares the angle difference between the comparing objects and also tries to infer what the new angle should be. For example if the angle between related objects change from 45 to 135, the tester infers that the answer should also reflect a 90 degree angle change. These intuitive checks also augment the score by adding 1 for every positive test result.

comparison sheet 2

In the final stage five the tester ranks all scores from the previous stage and takes the highest one. In this question the #6 had the highest score of 9.

scoreIn summary the agent uses the generate and test method to solve these problems and employs the use of production rules to create a smart tester and a smart generator. With this methodology the agent was able to solve 65% of the 20 Basic problems tests in 2 seconds. The only thing that would increase the processing time is the size of the problem, if the problem contains many objects the agent would take a little longer (nanoseconds) but this is not noticeable to the user, this is because the agent solves the problems in a procedural fashion. If we were to compare the agent’s reasoning to human cognition, like human cognition the agent uses observations and forms base conclusions from these observations. It then augments these conclusions based on other tests, just like we do when we look at these problems. With humans we tend to first try to figure out what forms our base comparison group, A & B or A & C, the agent does that also. Once we determine that the group is lets say A to B, we then try to figure out what’s different between them, what are the transformations, the agent models this using the analogy of “Comparison Sheets”. We then tend to look at the answers and compare them with the remaining figure ( C ) and determine which one of the answers closely resemble the transformations from A to B. The option that has the closest correlation is normally the one we choose, the agent does the same, it even goes as far as to have a second possible answer, but unfortunately there is no option for a second guess in this project.
However, there are weaknesses to my design. One of these weakness is the agent does not use long term memory of past questions, it relies only on its production rules and working memory between the smart generator and tester to determine the answer, this I hope to change in the figure by storing the chosen answer to each problem and the problem itself so the agent can look up. Furthermore there are two problems where the agent scored all the answer options the same and there were no perceivable tie-breaker so by default it chooses the last option. This I believe is caused because of the ordinal naming of the objects in the figures. Conversely the strength of this design is the modularity in which it is implemented using definitive stages. There’s is a clear distinction between the stages and what should be passed between stages. The functionality of each stage can be improved independently and not adversely affect the other stages, because of the modular design and the use of the “Comparison Sheet” that is passed between the modules. I believe given more time the agent can be improved with long term memory, better object correlations and deeper knowledge and analysis of shapes and changes in angles. Most of all this was an excellent project that kept me on the edge of my seat, my fingers glued to the keyboard punching out code to make my agent smarter and more efficient and some moments of pulling out my hair and wanting to throw the computer out the window, but moreover it made me think deeper on how we think, use knowledge and reasoning to solve problems as humans.

Bus Tracking Estimation

Bus Tracking Estimation

Authors: Kimanii Daniel, Kaushal Mehta, Gregory Bell


All major cities, our schools, and even theme parks like Disney World use forms of mass transit to move people from one location to another. A bus system usually has some form of schedule printed on arrival times at certain stops. These schedules are best guess on times and creating accurate schedules with live feeds is an expense few transportation departments can afford. Our project uses crowd sourced GPS location to estimate the time it will take for a bus to arrive at a given bus stop. A bus route was followed to create GPS points, breadcrumbs, for the route in the project. The bus starts with a person using software, the first passenger being the bus driver. Additional GPS data from passengers adds to the weight of the location based on the breadcrumb. The information returns estimates to future passengers based on the buses location and direction from the breadcrumb points. The method provided improved the accuracy of arrival times over the scheduled time by an average of 2 minutes.


Real time tracking of movement and estimations is becoming more and more popular as devices utilizing the Global Positioning System (GPS) becoming more readily available. There are a multitude of proposed uses for this new technology from tracking pets, to tracking loved ones to even as advanced as guiding planes to their destinations. We decided to leverage this technology to solve a very difficult problem of tracking the location of a bus on a bus route and estimating the time of arrival to a bus stop location. This issue has plagued passengers for years and many organizations have purported various options to attain a solution. None of these solutions are perfect or accurate. Furthermore, they require a high initial investment by the transportation board or private transportation agency to install GPS devices on vehicles to increase accuracy of estimations.

We propose an alternative more cost effective approach using crowd sourced GPS devices in concert with official bus schedules to better estimate the location of the bus and the time of arrival to a bus stop. The important elements in this solution is relying on the altruistic nature of passengers to share information for the better of all. This will all work using the GPS enabled phones of the passengers and a centralized server. The idea is to build a mobile app that allows the passengers to look up information on the ETA of the bus and also let the server know when they have boarded the bus. The mobile app will then begin transmitting the location of the device every 10 seconds to the server. Using known GPS locations on the route and the GPS readings from the devices, the server will build a probability distribution be measuring each GPS reading against the known locations and modifying the distribution. The known locations on the bus route we refer to as breadcrumbs.

Design Methodology

We defined our solution to solve two problems:
1. Localization of the bus which includes position and direction
2. Estimation of arrival time to a specified bus stop

Localizing the bus was pivotal to our estimations of arrival time, after-all if you don’t know where the bus is it’s impossible to estimate how long it will take to arrive. To help with localization we employed a modified particle filter concept and Bayes theorem. A known trail of GPS points on the route which we coined the breadcrumbs would be used to represent weighted particles in the filter. The bus will be tracked by these

breadcrumbs, meaning we will estimate which breadcrumb the bus is closest to and use that breadcrumbs exact known location as the position of the bus. The breadcrumbs on the route were created by driving the route and using a Google location tracker app for android called “My Tracks”. This app allowed us to take GPS points every second, which was then used to form a breadcrumb of GPS locations on the route. The appropriate breadcrumb points closest to real bus stops were then tagged as bus stop breadcrumbs. This is used to estimate the distance from localized bus location to the bus stop in question. We also calculated the distance between breadcrumb points using the GPS Longitude and Latitude points, these calculations were stored to speed up overall estimations processing. This distance forms part of the equation to calculate the Estimated Time of Arrival.

Breadcrumb data pointsThis solution is heavily dependent on crowd sourced GPS readings from devices traveling on the bus. The idea here is that users will share their location with our system once on the bus. We will store poll and store these locations every 10 seconds. Localizing the bus was solved by first creating a uniform probability distribution for all the breadcrumb locations and setting the weight of each breadcrumb point to 1/count(breadcrumbs). So in our sample dataset we have 31 breadcrumb points on one route, this would weight each breadcrumb point to 0.0322581. We then employed a weighting formula to augment the uniform distribution by multiplying the prior weight of a breadcrumb particle to the distance between the device GPS reading and that particle, the result of which is then subtracted from that prior weight. We do this iteratively for each device and each breadcrumb point, slowly changing the probability distribution. This is done cumulatively using this formula:

Posterior weight = Prior weight – (Prior weight * breadcrumb distance)

This formula ensures that the GPS readings that are closer to a breadcrumb are weighted higher. The GPS data is re-sampled for the new position and checked with the breadcrumbs to weight the location as the bus travels the route. As the system iterate through all the readings applying this formula cumulatively our belief of where the bus is will be closer to the breadcrumb that they’re all clustered closer to. Over time and with more independent device GPS readings the better the estimation of the location of the bus and the ETA.

Breadcrum probability distribution

If you look at the sample data above, you will see the breadcrumb data points and the probability distribution. The distribution starts with a Uniform weight of 0.0322581, the distance is calculated for each passenger GPS point and the breadcrumb GPS point. This distance is then used to determine the posterior weight of that breadcrumb point (using the formula above). For passenger 2, the distance of their device from all the points are calculated and the weight of that point is factored into the formula using the prior weight from passenger 1. This is repeated until there are no more GPS device readings for the period. the breadcrumb point with the highest weight is essentially closest point to the bus.

The direction the bus is traveling is very important for ETA calculations. Since the breadcrumb points are ordered 1 to 31 on the route we can look at the order of the last five localizations by descending weight, this will let us know if the bus is going to or away from the bus stop in question.

Estimation of arrival time now becomes a process of transposing the physics formula for Average Speed:

Time (ETA) = Total Distance / Average Speed
Distance – Summation of breadcrumb intra-distances
Average Speed – The average speed on all the GPS device readings

Here, distance was calculated by summing the distances of each breadcrumb point from the localized bus location to the bus stop taking into consideration the direction of the bus and composition of the route. The composition of the route proved to be an important factor in our estimation, because if the route is cyclic the calculation is different compared to if its not. The route can be cyclic as in the bus ends at the same location it start but using non-overlapping roads where as non-cyclic or linear bus routes use the same roads to go and come back to the main station. Therefore if the bus is going away from the bus stop in question then the breadcrumbs leading away from the bus stop needs to be calculated before calculating the breadcrumbs towards the bus stop. Remember these distances are stored so a simple query can yield this results

Now there’s the problem of average speed, was solved this by tracking not only the GPS location of the devices but also the time it was taken. Using a transposition of the same formula before the speed was determined

Average Speed = Total Distance / Total Time

So for example if the bus took 10mins (600 seconds) to travel 1000 meters then the average speed would be 1.67 meters / second. The ETA will update and become more accurate over time and as more devices are sharing their locations on the bus.

Proposed Usage

The main interface for this system will be via a mobile app. Users will choose the bus stop they want to go to and the bus stop they’re currently on. This is transferred to a server that then uses this information to estimate where the bus is and estimated time of arrival. It will then display the time of arrival to the user. If the server has no GPS device to collect readings from (no one is sharing their location or no one is on the bus) then it will failover to the established bus route schedule. Once the bus arrives and the user board the bus, the app will detect that the bus is in close proximity to the user and ask the user if they’re on the bus. If the user answers in the affirmative, then the app will start sending the location of the device to the server which will be used in further localizing the bus for other passengers waiting for the bus.

Considerations and Limitations

There were a few issues we had to take into consideration when designing this system. The system was designed to track one bus, we have not tested the viability of tracking multiple buses using this system.


The tracking project provided a very practical approach to a real world problem that passengers experience when using public transportation. We learnt that by using crowd sourced GPS points and known GPS locations on the bus route we where able to transform a uniform probability distribution to solve the problem of estimating how long the bus will take to arrive. This solves a huge problem in the transportation industry. We have noticed pedestrians at bus stops checking their watch or running after a bus that was missed be seconds. Our approach is on a small scale but If transportation agencies used a larger approach to this project, then it would be beneficial to their customers and public opinion of the agency. This project determined the solution is viable and proposed a more accurate, low cost estimation system primarily for passengers.