AI agent to Solve Raven’s 2×2 Progressive Matrix

The solving of Raven’s matrix is a problem faced by many computer science and psychology majors. Raven’s matrix is one of the popular test of the human IQ, what we may equate to human intelligence. Raven’s tests consist of a matrix of visual objects that are manipulated between pairs with the last image missing in the last pair, this is the one that needs to be determined from a set of multiple choice options. The key in most raven problems is to determine the transformation of the objects or a group of object to determine what the last object should be. As shown in “2×1 Basic Problem 02”, image A represents a small circle, which image B represents as a large circle. This is your first clue – what changed between image A to B, the size of the same shape. With that clue you would then infer the same on C to ?. The small square in C should likewise change to a large square, so now the obvious answer becomes option 6. In a 2×1 problem there is no need for correlations and grouping of boxes because the problem is in a lateral form, meaning A to B and C to ?. But when considering a 2×2 problem your reasoning needs to be altered. 2×2 problems require that you perform correlation and grouping of the problem space, meaning you need to determine which figure correlate with which other figure before determining the transformations. If you look at “2×2 Basic Problem 02”, one can say the the Fill changed from A to B row-wise, but one can also say the Shape changed from A to C column-wise and both would be correct but that may affect the outcome. This is the basis of this project, to build an artificial intelligence agent that can smartly apply reasoning and logic to solve a set of Raven’s matrix tests, in particular 2×2 matrices.2x2 Raven matrixWhen building an AI agent to solve these tests, It is important to first determine how the input for the test would be passed to the agent. Fortunately for this project the inputs will be done via textual representation, as shown in the illustration “2×1 Basic Problem 02”. All the visual tests have already been decomposed to a textual representation, this is parsed by the calling application and sent to the agent via objects that represent the problem set which is the entire set of figures and object contained in the problem set, including the answer options. The agent implements it’s solving methodology in five stages. Stage one, groups figures together by measuring correlative correctness. Stage two employs a smart generator to generate frames – referred to as “Comparison Sheets” throughout the paper. While stage three uses a tester to compare these comparison sheets for correlative correctness. Stage four works in concert with stage three by comparing extra non-intuitive observable traits, which is used mostly used as tie-breakers and step five compares the scores of the tester and picks the highest score as the answer. propositional representationWe begin our journey by first establishing some basic axioms – all comparison and most of the operations are done on pairs of objects, figures represent as a box (as in the illustrations) A,B, C, 1, 2 etc. Objects or Shapes represent the actual items inside the figures (boxes). The AI agent will first attempt to correlate figures and score them to determine the grouping of A to B row-wise vs C to ?, or  A to C column-wise vs B to ? or both. This correlation is done by looking at the the attributes on each object in the figure A then comparing these attributes one by one to all the attributes of the objects in the other two figures B and C, while scoring for correlative correctness and shape consistency. So a square in A, a square in B and a triangle in C means that A and B is more correlated than A and C because A->B maintains the shape: square. Once this is determined the AI agent can then conclude that the it needs to determine transformations from  A to B then infer these on C to ?. Once the AI agent determines the objects that needs to be grouped, it renames them in working memory using their ordinal for ease of processing by naming the first figure – 1 and second figure – 2. This removes the static naming of A & B and makes the agent more dynamic. This solves the problem of correlation and grouping.

The agent then continues by first observing the transformations that occur between the objects from 1 to 2 (A to B in this case), It uses these observable transformations and builds a “comparison sheet” in working memory. It then looks at the remaining figure in the question (C in this case) and renames it to 1 in working memory for ease of comparison processing. The agent then builds comparison sheets for C (now known as 1) to every answer option (C to 1, C to 2, C to 3 etc), which it also stores in working memory. Armed with all this knowledge in working memory it compares the comparison sheets from 1 to 2 (A to B) with the sets from 1 (C) to N[1,2,3,4,5,6], one at a time and scores those transformations and attributes that match exactly with 1 to 2 (A to B). Apart from that it also looks at a few other non-intuitive observable traits eg. did all the objects change to another type of object, did the location change, are all the object consistent between the question and the proposed answer, compares and scores those also. Finally the scores are ranked and the pair (C to ?) with the highest correlation score gets elected as the most likely answer option.

Comparison sheets are generated by the smart generator from observables and non-intuitive traits. If you look at “Comparison sheet from A to B”, you will notice the renaming of the figures and underlying objects to their ordinal 1, 2, 3 etc. This example in particular shows that there’s one object in both figure 1 and figure 2 denoted by 1.1 and 2.1 It also shows that three transformations were detected and added to the sheet prefixed with “tf-”. So between figure 1 and 2, the angle changed by -270 and the shape changed. Also notice the the type of shape for the shape change was not noted on the transformation as this offers no bearing on the answer, what is important here is because of the shape change the tester can then infer that the shape in the answer must be different from the shape in question.comparison sheetFor this demonstration I will use a longer sheet below and will also refer to the figures and objects with their original name for ease of explanation. Stage three, this is where the smart tester takes the “Comparison Sheets” and compares them against each other, while scoring them for correctness. It does this by comparing sheets from A to B with C to N[1,2,3,4,5,6]. So A.Z.fill: no on sheet (A to B) should match A.Z.fill on sheet (C to 6) and so forth. In stage four the smart tester uses logic and deeper reasoning to infer the answer. For example if “tf-count_changed = no” then the amount of objects in C should be the same amount of objects in the answer. Furthermore if “tf-count_changed = yes” then tf-objects_added and tf-object_deleted is consulted to infer the quantity of objects expected in the answer. If there’s a change in the angle between figures then the smart tester, does not compare explicit angles between objects on the sheets but instead compares the angle difference between the comparing objects and also tries to infer what the new angle should be. For example if the angle between related objects change from 45 to 135, the tester infers that the answer should also reflect a 90 degree angle change. These intuitive checks also augment the score by adding 1 for every positive test result.

comparison sheet 2

In the final stage five the tester ranks all scores from the previous stage and takes the highest one. In this question the #6 had the highest score of 9.

scoreIn summary the agent uses the generate and test method to solve these problems and employs the use of production rules to create a smart tester and a smart generator. With this methodology the agent was able to solve 65% of the 20 Basic problems tests in 2 seconds. The only thing that would increase the processing time is the size of the problem, if the problem contains many objects the agent would take a little longer (nanoseconds) but this is not noticeable to the user, this is because the agent solves the problems in a procedural fashion. If we were to compare the agent’s reasoning to human cognition, like human cognition the agent uses observations and forms base conclusions from these observations. It then augments these conclusions based on other tests, just like we do when we look at these problems. With humans we tend to first try to figure out what forms our base comparison group, A & B or A & C, the agent does that also. Once we determine that the group is lets say A to B, we then try to figure out what’s different between them, what are the transformations, the agent models this using the analogy of “Comparison Sheets”. We then tend to look at the answers and compare them with the remaining figure ( C ) and determine which one of the answers closely resemble the transformations from A to B. The option that has the closest correlation is normally the one we choose, the agent does the same, it even goes as far as to have a second possible answer, but unfortunately there is no option for a second guess in this project.
However, there are weaknesses to my design. One of these weakness is the agent does not use long term memory of past questions, it relies only on its production rules and working memory between the smart generator and tester to determine the answer, this I hope to change in the figure by storing the chosen answer to each problem and the problem itself so the agent can look up. Furthermore there are two problems where the agent scored all the answer options the same and there were no perceivable tie-breaker so by default it chooses the last option. This I believe is caused because of the ordinal naming of the objects in the figures. Conversely the strength of this design is the modularity in which it is implemented using definitive stages. There’s is a clear distinction between the stages and what should be passed between stages. The functionality of each stage can be improved independently and not adversely affect the other stages, because of the modular design and the use of the “Comparison Sheet” that is passed between the modules. I believe given more time the agent can be improved with long term memory, better object correlations and deeper knowledge and analysis of shapes and changes in angles. Most of all this was an excellent project that kept me on the edge of my seat, my fingers glued to the keyboard punching out code to make my agent smarter and more efficient and some moments of pulling out my hair and wanting to throw the computer out the window, but moreover it made me think deeper on how we think, use knowledge and reasoning to solve problems as humans.

2 thoughts on “AI agent to Solve Raven’s 2×2 Progressive Matrix

  • September 10, 2015 at 11:17 am

    Could you help me out with this problem please? I’m stuck and can’t get to begin how to implement a solution to RPM. It’s just that there are so many knowledge representations and possible problem solving methods and none seem too suited to this to me.
    So i’d appreciate if you could help me out with pseudo code or something. Please let me know.

  • June 30, 2016 at 4:23 pm

    i am trying to get this solve but unable to do.. can you please help and let me know


Leave a Reply

Your email address will not be published. Required fields are marked *

This blog is kept spam free by WP-SpamFree.