Property-based testing of Plutus contracts

Plutus comes with a library for testing contracts using QuickCheck. Tests generated by this library perform a sequence of calls to contract endpoints, checking that tokens end up in the correct wallets at the end of each test. These sequences can be generated at random, or in a more directed way to check that desirable states always remain reachable. This tutorial introduces the testing library by walking through a simple example: a contract that implements a guessing game.

An overview of the guessing game

The source code of the guessing game contract is provided as an example here, and the final test code is here.

The game is played as follows:

  • The first player locks a sum of Ada in the contract, which is donated as a prize. The prize is protected by a secret password.

  • Any player can now try to guess the password, by submitting a ‘guess’ transaction that attempts to withdraw some or all of the prize, along with a guess at the password, and a new password to replace the old one. If the guess is correct, and the contract contains enough Ada, then the guesser receives the withdrawal and the remainder of the prize is now protected by the new password. If the guess is wrong, the transaction is not accepted, and nothing changes.

As an extra wrinkle, when the first player locks the prize, a new token is also minted. Only the player currently holding the token is allowed to make a guess–which gives us an opportunity to illustrate minting and passing around tokens.

The generated tests will exercise the contract by locking the prize, then moving the game token and making guesses at random, checking that the game token and Ada move as they should.

Emulated wallets

To test contracts, we need emulated wallets. These and many other useful definitions for testing can be imported via

import           Plutus.Contract.Test

Now we can create a number of wallets: in this tutorial, we’ll settle for three:

wallets :: [Wallet]
wallets = [w1, w2, w3]

Values and tokens

Wallets contain ‘values’, which are mixtures of different quantities of one or more types of token. The most common token is, of course, the Ada; we can import functions manipulating Ada, and the Value type itself, as follows:

import qualified Ledger.Ada                         as Ada
import           Ledger.Value

With these imports, we can construct values in the Ada currency:

> Ada.lovelaceValueOf 1
Value (Map [(,Map [(,1)])])

We will also need a game token. After importing the Scripts module

import qualified Ledger.Typed.Scripts               as Scripts

we can define it as follows, applying a minting policy defined in the code under test (imported as module G):

import           Plutus.Contracts.GameStateMachine  as G
gameTokenVal :: Value
gameTokenVal =
    let sym = Scripts.forwardingMintingPolicyHash G.typedValidator
    in G.token sym "guess"

The value of the token is (with long hash values abbreviated):

> gameTokenVal
Value (Map [(f687...,Map [(guess,1)])])

We can even construct a Value containing an Ada and a game token:

> Ada.lovelaceValueOf 1 <> gameTokenVal
Value (Map [(,Map [(,1)]),(f687...,Map [(guess,1)])])

If you inspect the output closely, you will see that a Value contains maps nested within another Map. The outer Map is indexed by hashes of minting policy scripts, so each inner Map contains a bag of tokens managed by the same policy. Token names can be chosen freely, and each policy can manage any number of its own token types. In this case the game token is called a “guess”, and the script managing game tokens has the hash f687… A little confusingly, the Ada token name is displayed as an empty string, as is the hash of the corresponding minting policy.

Introducing contract models

We test contracts using a model of the system, which includes the state(s) of all the agents involved–the on-chain state, the off-chain state on your computer, the off-chain state on anyone else’s computer–everything relevant to the contract(s) under test. The first job to be done is thus defining that model. To do so, we import the contract modelling library

import           Plutus.Contract.Test.ContractModel

and define the model type:

data GameModel = GameModel

This definition is incomplete: we shall fill in further details as we proceed.

The GameModel type must be an instance of the ContractModel class, which has an associated datatype defining the kinds of actions that will be performed in generated tests.

instance ContractModel GameModel where

    data Action GameModel = Lock      Wallet String Integer
                          | Guess     Wallet String String Integer
                          | GiveToken Wallet
        deriving (Eq, Show)

In this case we define three actions:

  • a Lock action to be performed by the first player when starting the game, containing the player’s wallet (from which the Ada will be taken), the secret password, and the prize amount.

  • a Guess action to be performed by the other players, containing the player’s wallet (to receive the prize), the player’s guess, a new password, and the amount to be claimed if the guess is right.

  • a GiveToken action, to give the game token to a player so they can make a guess.

A generated test is called Actions, and is, as the name suggests, essentially a sequence of Action values. We can run tests by using propRunActions_:

prop_Game :: Actions GameModel -> Property
prop_Game actions = propRunActions_ instanceSpec actions

When we test this property, quickCheck will generate random action sequences to be tested, checking at the end of each test that tokens are transferred correctly, and contracts didn’t crash.

Note

There is also a more general function propRunActions that allows the check at the end of each test to be customized.

But what is the instanceSpec in the code above? propRunActions_ creates the contract instances that are needed by a test, and the instanceSpec tells it which contract instances to create. A handle is created for each contract instance, which is used to invoke their endpoints from the test. Different contracts have different endpoints, of different types–and thus different schemas. When we invoke an endpoint, we need to know the schema of the contract we are invoking, and the type of errors it can return, so that the type-checker can ensure that the call is valid. We thus need to know the type of contract that each handle refers to.

To achieve this, every contract instance in a test is named by a ContractInstanceKey, another associated datatype of the ContractModel class; we talk to a contract instance by referring to its ContractInstanceKey. The ContractInstanceKey type is parameterised both on the type of the contract model, and on the observable state, schema, and error type of the contract it refers to. Since the same test may refer to contracts of several different types, ContractInstanceKey is defined as a GADT.

In this particular case, there is only one type of contract under test, and so it suffices to define a ContractInstanceKey type with a single constructor. There is one contract instance running in each emulated wallet, so we simply distinguish contract instance keys by the wallet they are running in:

    data ContractInstanceKey GameModel w schema err where
        WalletKey :: Wallet -> ContractInstanceKey GameModel () G.GameStateMachineSchema G.GameError

Once this type is defined, we can construct our ContractInstanceSpec:

instanceSpec :: [ContractInstanceSpec GameModel]
instanceSpec = [ ContractInstanceSpec (WalletKey w) w G.contract | w <- wallets ]

This specifies (reading from right to left) that we should create one contract instance per wallet, running G.contract, the contract under test, in emulated wallet w, and distinguished by a ContractInstanceKey of the form WalletKey w.

Now we can run tests, although of course they will not yet succeed:

> quickCheck prop_Game
*** Failed! (after 1 test and 1 shrink):
Exception:
  GSM.hs:65:10-32: No instance nor default method for class operation arbitraryAction

The contract modelling library cannot generate test cases, unless we specify how to generate an Action, which we will do next.

Generating actions

To generate actions, we need to be able to generate wallets, guesses, and suitable values of Ada, since these appear as action parameters.

genWallet :: Gen Wallet
genWallet = elements wallets

genGuess :: Gen String
genGuess = elements ["hello", "secret", "hunter2", "*******"]

genValue :: Gen Integer
genValue = getNonNegative <$> arbitrary

We choose wallets from the three available, and we choose passwords from a small set, so that random guesses will often be correct. We choose Ada amounts to be non-negative integers, because negative amounts would be error cases that we choose not to test.

Now we can define a generator for Action, as a method of the ContractModel class:

    arbitraryAction s = oneof $
        [ Lock      <$> genWallet <*> genGuess <*> genValue              ] ++
        [ Guess     <$> genWallet <*> genGuess <*> genGuess <*> genValue ] ++
        [ GiveToken <$> genWallet                                        ]

With this method defined, we can start to generate test cases. Using sample we can see what action sequences look like:

> sample (arbitrary :: Gen (Actions GameModel))
Actions
  [Lock (Wallet 2) "hunter2" 5,
   Guess (Wallet 3) "*******" "hello" 6,
   Guess (Wallet 1) "secret" "*******" 10,
   Guess (Wallet 3) "*******" "*******" 6,
   GiveToken (Wallet 3),
   Guess (Wallet 2) "hunter2" "hunter2" 15]
.
.

We can even run ‘tests’ now, although they don’t do much yet:

> quickCheck prop_Game
+++ OK, passed 100 tests:

Actions (2263 in total):
33.94% Lock
33.89% Guess
32.17% GiveToken

The output tells us the distribution of generated actions, aggregated across all the tests. We can see that each action was generated around one third of the time, which is to be expected since our generator does not weight them at all. Keep an eye on this table as we extend our generation; if any Action disappears altogether, or is generated very rarely, then this indicates a problem in our tests.

Modelling expectations

The ultimate purpose of our tests is to check that funds are transferred correctly by each operation–for example, that after a guess, the guesser receives the requested Ada only if the guess was correct. An important part of a ContractModel defines how funds are expected to move. However, it’s clear that in order to define how we expect funds to move after a Guess, we need to know more than just where all the Ada are. We need to know:

  • what the current secret password is, so we can decide whether or not the guess is correct.

  • whether or not the guesser currently holds the game token, and so is entitled to make a guess.

  • how much Ada is currently locked in the contract, so we can determine whether the guesser is requesting funds that actually exist.

These all depend on the previous steps in the test case. To keep track of such information, we store it in a contract state, which is the type parameter of the ContractModel class. (Note that this contract state is a part of the model, it may be quite different from the contract state in the implementation). In this case the contract state is the GameModel type, so let’s complete its definition:

data GameModel = GameModel
    { _gameValue     :: Integer
    , _hasToken      :: Maybe Wallet
    , _currentSecret :: String }
  deriving (Show)

makeLenses 'GameModel

Initially the game token does not exist, so we record its current owner as a Maybe Wallet, so that we can represent the initial situation before its creation. The locked funds are always in Ada, so in the model it suffices to store an integer.

Now we can define the initial state of the model at the start of each test case, initialState, and a nextState function to model the way we expect each operation to change the state. These are both methods in the ContractModel class.

The initial state just records that the game token does not exist yet, and assigns default values to the other fields.

    initialState = GameModel
        { _gameValue     = 0
        , _hasToken      = Nothing
        , _currentSecret = ""
        }

The nextState function is defined in the Spec monad

 nextState :: Action state -> Spec state ()

and defines the expected effect of each operation.

The Lock operation creates the contract, initializing the model contract state (using $= and generated Lens operations), mints the game token (using mint), deposits it in the creator’s wallet, and withdraws the Ada locked in the contract (using deposit and withdraw):

    nextState (Lock w secret val) = do
        hasToken      $= Just w
        currentSecret $= secret
        gameValue     $= val
        mint gameTokenVal
        deposit  w gameTokenVal
        withdraw w $ Ada.lovelaceValueOf val

A ContractModel actually tracks not only the contract model state (in our case the GameModel type), but also the quantities of tokens expected to be in each wallet, which are checked at the end of each test. It is these expectations that are manipulated by mint, deposit, etc… don’t confuse them with operations that actually mint or move tokens in the implementation. The ModelState type contains all of this information.

When making a guess, we need to check parts of the contract state (which we read using viewContractState), and then we update the stored password, game value, and wallet contents appropriately. (Here $~ applies a function to modify a field of the contract state).

    nextState (Guess w old new val) = do
        correctGuess <- (old ==)    <$> viewContractState currentSecret
        holdsToken   <- (Just w ==) <$> viewContractState hasToken
        enoughAda    <- (val <=)    <$> viewContractState gameValue
        when (correctGuess && holdsToken && enoughAda) $ do
            currentSecret $= new
            gameValue     $~ subtract val
            deposit w $ Ada.lovelaceValueOf val

GiveToken just transfers the game token from one wallet to another using transfer.

    nextState (GiveToken w) = do
        w0 <- fromJust <$> viewContractState hasToken
        transfer w0 w gameTokenVal
        hasToken $= Just w

At the end of each test, the ContractModel framework checks that every wallet contains the tokens that the model says it should.

We can exercise the nextState function already by generating and ‘running’ tests, even though we have not yet connected these tests to the real contract. Doing so immediately reveals a problem:

> quickCheck prop_Game
*** Failed! (after 3 tests and 3 shrinks):
Exception:
  Maybe.fromJust: Nothing
  CallStack (from HasCallStack):
    error, called at libraries/base/Data/Maybe.hs:148:21 in base:Data.Maybe
    fromJust, called at GSM0.hs:122:15 in main:GSM0
Actions
 [GiveToken (Wallet 1)]

Looking at the last two lines, we see the generated test sequence, and the problem is evident: we generated a test that only gives the game :term:`token` to wallet 1, but this makes no sense because the game token has not yet been minted–so the fromJust in the nextState function fails. We will see how to prevent this in the next section.

Restricting test cases with preconditions

As we just saw, not every sequence of actions makes sense as a test case; we need a way to restrict test cases to be ‘sensible’. Note this is not the same as restricting tests to ‘the happy path’: we want to test unexpected sequences of actions, and indeed, this is part of the strength of property-based testing. But there are some actions–like trying to give the game token to a wallet before it has been minted–that are not even interesting to test. These are the cases that we rule out by defining preconditions for actions; the effect is to prevent such test cases ever being generated.

To introduce preconditions, we add a definition of the precondition method to our ContractModel instance.

 precondition :: ModelState state -> Action state -> Bool

The precondition is parameterised on the entire model state, which includes the contents of wallets as well as our contract state, so we will need to extract this state as well as the fields we need from it. For now, we just restrict GiveToken actions to states in which the token exists:

    precondition s (GiveToken _) = tok /= Nothing
        where
            tok = s ^. contractState . hasToken
    precondition s _             = True

Now if we try to run tests, something more interesting happens:

> quickCheck prop_Game
*** Failed! Assertion failed (after 2 tests):
Actions
 [Lock (Wallet 1) "hello" 0]
Expected funds of W1 to change by
  Value (Map [(f687...,Map [(guess,1)])])
but they did not change
Test failed.
Emulator log:
[INFO] Slot 1: TxnValidate 4feb...
[INFO] Slot 1: 00000000-0000-4000-8000-000000000000 {Contract instance for wallet 1}:
                 Contract instance started
[INFO] Slot 1: 00000000-0000-4000-8000-000000000001 {Contract instance for wallet 2}:
                 Contract instance started
[INFO] Slot 1: 00000000-0000-4000-8000-000000000002 {Contract instance for wallet 3}:
                 Contract instance started

The test has failed, of course. The generated (and simplified) test case only performs one action:

Actions
 [Lock (Wallet 1) "hello" 0]

Wallet 1 attempts to create a game contract guarding zero Ada. Inspecting the error message, we can see that wallet 1 ended up with the wrong contents:

Expected funds of W1 to change by
  Value (Map [(f687...,Map [(guess,1)])])
but they did not change

Our model predicted that wallet 1 would end up containing the game token, but in fact its contents were unchanged.

In this test, we have actually performed actions in the emulator, as the log shows us: one transaction has been validated, and we have started three contract instances (one for each wallet in the test). But we have not created a game token for wallet 1, because thus far we have not defined how actions in a test should be performed–so the Lock action in the test case behaves as a no-op, which of course does not deposit a game token in wallet 1. It is time to link actions in a test to the emulator.

Performing actions

So far we are generating actions, but we have not yet linked them to the contract they are supposed to test–so ‘running’ the tests, as we did above, did not invoke the contract at all. To do so, we must import the emulator

import           Plutus.Trace.Emulator              as Trace

Then we define the perform method of the ContractModel class:

 perform :: HandleFun state -> ModelState state -> Action state -> EmulatorTrace ()

The job of the perform method in this case is just to invoke the contract end-points, using the API defined in the code under test, and transfer the game token from one wallet to another as specified by GiveToken actions.

    perform handle s cmd = case cmd of
        Lock w new val -> do
            callEndpoint @"lock" (handle $ WalletKey w)
                         LockArgs{ lockArgsSecret = secretArg new
                                 , lockArgsValue = Ada.lovelaceValueOf val}
        Guess w old new val -> do
            callEndpoint @"guess" (handle $ WalletKey w)
                GuessArgs{ guessArgsOldSecret = old
                         , guessArgsNewSecret = secretArg new
                         , guessArgsValueTakenOut = Ada.lovelaceValueOf val}
        GiveToken w' -> do
            let w = fromJust (s ^. contractState . hasToken)
            payToWallet w w' gameTokenVal
            return ()

Every call to an end-point must be associated with one of the contract instances defined in our instanceSpec; the handle argument to perform lets us find the contract handle associated with each ContractInstanceKey.

For the most part, it is good practice to keep the perform function simple: a direct relationship between actions in a test case and calls to contract endpoints makes interpreting test failures much easier.

Note

Helping shrinking work better by choosing test case actions well

In the definition of perform above, the GiveToken action is a little surprising: when we call the emulator, we have to specify not only the wallet to give the token to, but also the wallet to take the token from. So why did we choose to define a GiveToken w action to include in test cases, rather than an action PassToken w w', which would correspond more directly to the code in perform?

The answer is that using GiveToken actions instead helps QuickCheck to shrink failing tests more effectively. QuickCheck shrinks test cases by attempting to remove actions from them–essentially replacing an action by a no-op. But consider a sequence such as

PassToken w1 w2
PassToken w2 w3

which transfers the game token in two steps from wallet 1 to wallet 3. Deleting either one of these steps means the game token will end up in the wrong place, probably causing the next steps in the test to behave very differently (and thus, preventing this shrinking step). But given the sequence

GiveToken w2
GiveToken w3

the first GiveToken can be deleted without affecting the behaviour of the second at all. Thus, by making token-passing steps independent of each other, we make it easier for QuickCheck to shrink a failing test without drastic changes to its behaviour.

Shrinking Actions

Before starting to run tests seriously, it is useful to make sure that any failing tests will shrink well to small examples. By default, the contract modelling library tries to shrink tests by removing actions, but it cannot know how to shrink the actions themselves. We can specify this shrinking by defining the shrinkAction operation in the ContractModel class:

 shrinkAction :: ModelState state -> Action state -> [Action state]

This function returns a list of ‘simpler’ actions that should be tried as replacements for the given Action, when QuickCheck is simplifying a failed test. In this case we define a shrinking function for wallets:

shrinkWallet :: Wallet -> [Wallet]
shrinkWallet w = [w' | w' <- wallets, w' < w]

and shrink actions by shrinking the wallet and Ada parameters.

    shrinkAction _s (Lock w secret val) =
        [Lock w' secret val | w' <- shrinkWallet w] ++
        [Lock w secret val' | val' <- shrink val]
    shrinkAction _s (GiveToken w) =
        [GiveToken w' | w' <- shrinkWallet w]
    shrinkAction _s (Guess w old new val) =
        [Guess w' old new val | w' <- shrinkWallet w] ++
        [Guess w old new val' | val' <- shrink val]

We choose not to shrink password/guess parameters, because they are not really significant–one password is as good as another in a failed test.

Debugging the model

At this point, the contract model is complete, and tests are runnable. However, they do not pass, and so we need to adapt either the tests or the contract to resolve the inconsistencies revealed. Testing prop_Game now results in:

> quickCheck prop_Game
*** Failed! Falsified (after 6 tests and 3 shrinks):
Actions
 [Lock (Wallet 1) "hunter2" 0]
Expected funds of W1 to change by
  Value (Map [(f687...,Map [(guess,1)])])
but they did not change
Test failed.
Emulator log:
... 49 lines of emulator log messages ...

In this test, wallet 1 attempts to lock zero Ada, and our model predicts that wallet 1 should receive a game token–but this did not happen. To understand why, we need to study the emulator log. Here are the relevant parts:

...
[INFO] Slot 1: 00000000-0000-4000-8000-000000000000 {Contract instance for wallet 1}:
                 Receive endpoint call: Object (fromList [("tag",String "lock"),...
[INFO] Slot 1: W1: Balancing an unbalanced transaction:
                     Tx:
                       Tx 2542...:
                         {inputs:
                         outputs:
                           - Value (Map []) addressed to
                             ScriptAddress: d1e1...
...
[INFO] Slot 1: W1: TxSubmit: 2542...
[INFO] Slot 2: TxnValidate 2542...
[INFO] Slot 2: W1: Balancing an unbalanced transaction:
                     Tx:
                       Tx 1eba...:
                         {inputs:
                            - 2542...!0
                              Redeemer: <>
                         outputs:
                           - Value (Map []) addressed to
                             ScriptAddress: d1e1...
                         mint: Value (Map [(f687...,Map [(guess,1)])])
...
[INFO] Slot 2: W1: TxSubmit: 2d66...

Here we see the endpoint call to lock being received during slot 1, resulting in a transaction with ID 2542..., which pays zero Ada to the contract script. The transaction is balanced (which has no effect in this case), submitted, and validated by the emulator at slot 2. Then another transaction, 1eba..., is created, which mints the game token. This transaction is in turn balanced (resulting in a new hash, 2d66...), and submitted without error–but although no errors are reported, this transaction is not validated.

Since the transaction is submitted in slot 2, we would expect it to be validated in slot 3. In fact, the problem here is just that the test stopped too early, before the blockchain had validated this second transaction. The solution is just to delay long enough for the blockchain to validate all the transactions we have submitted.

Adding delays to test cases

To give the blockchain time to validate the transactions generated by a Lock call, we need to delay by two slots. Why two? Because the Lock contract endpoint submits two transactions to the blockchain. Likewise, we delay one slot after each of the other actions. (If the delays we insert are too short, we will discover this later via failed tests).

We can cause the emulator to delay a number of slots like this:

delay :: Int -> EmulatorTrace ()
delay n = void $ waitNSlots (fromIntegral n)

We add a call to delay in each branch of perform:

    perform handle s cmd = case cmd of
        Lock w new val -> do
            callEndpoint @"lock" (handle $ WalletKey w)
                         LockArgs{ lockArgsSecret = secretArg new
                                 , lockArgsValue = Ada.lovelaceValueOf val }
            delay 2
        Guess w old new val -> do
            callEndpoint @"guess" (handle $ WalletKey w)
                GuessArgs{ guessArgsOldSecret     = old
                         , guessArgsNewSecret     = secretArg new
                         , guessArgsValueTakenOut = Ada.lovelaceValueOf val }
            delay 1
        GiveToken w' -> do
            let w = fromJust (s ^. contractState . hasToken)
            payToWallet w w' gameTokenVal
            delay 1

This makes the emulator delay one or two slots, but we also need to delay in our model, to keep the model state in sync with the emulator. We do this using corresponding calls to wait in the definition of nextState:

    nextState (Lock w secret val) = do
        hasToken      $= Just w
        currentSecret $= secret
        gameValue     $= val
        mint gameTokenVal
        deposit  w gameTokenVal
        withdraw w $ Ada.lovelaceValueOf val
        wait 2

and similarly in the other cases.

Does this change fix the problem? To find out, we should rerun the same test case, after updating the code.

Rerunning a failed test

The best way to save and rerun a QuickCheck test case is to copy-and-paste it from the QuickCheck output into your code. Since prop_Game is just a function that takes the generated test as an argument, then we can rerun a test by passing it to the property. In this case let us define

testLock :: Property
testLock = prop_Game $ Actions [Lock w1 "hunter2" 0]

testLock is itself a Property, so we can test it using quickCheck. Testing it before adding the delays in the last section generates the same output as before. Testing it after the delays are added results in

> quickCheck testLock
+++ OK, passed 100 tests.

Actions (100 in total):
100% Lock

The test passes, and the problem is fixed.

Note

Since there is no random generation in this test, there is no real need to test it 100 times. This can be avoided by adding withMaxSuccess to the definition:

 testLock :: Property
 testLock = withMaxSuccess 1 . prop_Game $ Actions [Lock w1 "hunter2" 0]

Note

We save the failing test case, not the random seed used to generate it. This is the only way to be sure that we repeat the same test that just failed. Usually, a failed test that QuickCheck reports is the result of both random generation and shrinking, not random generation alone. Reusing the same random seed would usually regenerate a much larger test, which might well fail for a different reason, leading QuickCheck to report a different shrunk failing test. It is then impossible to know for sure whether or not the change just made to the code fixed the problem it was intended to fix–it might just have changed the way failed tests shrink. By rerunning exactly the same test case we can be sure that our change did fix that problem, at least.

Controlling the log-level

When we rerun random tests, they fail for a different reason:

> quickCheck prop_Game
*** Failed! Assertion failed (after 5 tests and 7 shrinks):
Actions
 [Lock (Wallet 1) "hunter2" 0,
  Lock (Wallet 1) "hello" 0]
Outcome of Contract instance for wallet 1:
  False
Failed 'Contract instance stopped with error'
Test failed.
Emulator log:
... 73 lines of emulator log messages ...

Looking at the failing test case,

Actions
 [Lock (Wallet 1) "hunter2" 0,
  Lock (Wallet 1) "hello" 0]

we can see that it does something unexpected: wallet 1 tries to lock twice. Our model allows this, but the error message tells us that the contract instance crashed.

The emulator log output can be rather overwhelming, but we can eliminate the INFO messages by running the test sequence with appropriate options. If we define

import           Control.Monad.Freer.Extras.Log     (LogLevel (..))
propGame' :: LogLevel -> Actions GameModel -> Property
propGame' l s = propRunActionsWithOptions
                    (set minLogLevel l defaultCheckOptions)
                    instanceSpec
                    (\ _ -> pure True)
                    s

then we can re-run the test and see more succinct output:

> quickCheck $ propGame' Warning
*** Failed! Assertion failed (after 7 tests and 4 shrinks):
Actions
 [Lock (Wallet 1) "hello" 0,
  Lock (Wallet 1) "*******" 0]
Outcome of Contract instance for wallet 1:
  False
Failed 'Contract instance stopped with error'
Test failed.
Emulator log:
[WARNING] Slot 4: 00000000-0000-4000-8000-000000000000 {Contract instance for wallet 1}:
                    Contract instance stopped with error: GameSMError (ChooserError "Found 2 outputs, expected 1")

Now we see the problem: an error in the game implementation that stopped the second contract call, because two unspent transaction outputs had been created. These two outputs are the Ada amounts addressed to the contract script that are created by the first transaction of each call to the Lock endpoint. The off-chain contract is not designed to cope with more than one such UTXO; it is now in a broken state. In fact, the Ada now locked in these UTXOs cannot be recovered by the present off-chain code–the only way to recover the money is to revise the contract so that it can accept multiple UTXOs. Arguably, this is a bug in the contract: if any wallet tries to start the game for a second time, the Ada will be lost (until the bug is fixed).

Refining preconditions

We just learned that a second Lock call puts the contract into a broken state. But this is not how the game was intended to be used, so the developer might reasonably respond “you shouldn’t do that”. There could also be other problems in the code that we cannot presently find, because they are masked by the double-lock bug. Since a test case with two Lock calls is easy to generate, then QuickCheck is likely to report this particular problem in almost every subsequent run–unless we explicitly prevent it from doing so.

We can easily avoid this by strengthening the precondition of Lock, so that it can only be performed once per test case. We do so by checking whether any wallet holds the game token:

    precondition s cmd = case cmd of
            Lock _ _ _    -> tok == Nothing
            Guess _ _ _ _ -> True
            GiveToken _   -> tok /= Nothing
        where
            tok = s ^. contractState . hasToken

Now the double-lock test case can no longer be generated. If we save the test case

testDoubleLock :: Property
testDoubleLock = prop_Game $
  Actions
    [Lock w1 "*******" 0,
     Lock w1 "secret" 0]

and try to rerun it, then QuickCheck will not do so:

> quickCheck testDoubleLock
*** Gave up! Passed only 0 tests; 1000 discarded tests.

When a precondition cannot be satisfied, then QuickCheck ‘gives up’ as we see here–the faulty test case was discarded (1000 times).

Rerunning random tests finds another ‘bug’:

> quickCheck $ propGame' Warning
*** Failed! Assertion failed (after 10 tests and 6 shrinks):
Actions
 [Lock (Wallet 2) "hello" 0,
  Guess (Wallet 1) "hello" "secret" 0]
Outcome of Contract instance for wallet 1:
  False
Failed 'Contract instance stopped with error'
Test failed.
Emulator log:
[WARNING] Slot 3: W1: handleTx failed: InsufficientFunds "Total: Value (Map [(,Map [(,100000000)])]) expected: Value (Map [(f687...,Map [(guess,1)])])"
[WARNING] Slot 3: 00000000-0000-4000-8000-000000000000 {Contract instance for wallet 1}:
                    Contract instance stopped with error: GameSMError (SMCContractError (WalletError (InsufficientFunds "Total: Value (Map [(,Map [(,100000000)])]) expected: Value (Map [(f687...,Map [(guess,1)])])")))

In this case, the contract instance in wallet 1 crashes, because the wallet contains ‘insufficient funds’. Reading the last line closely, we see that although the wallet contained 100 million Ada, it lacked the game token, and so making a guess was not allowed.

Arguably, the off-chain code should not have tried to submit the guess transaction without holding the game token, and the contract instance should not have crashed. Or we might take the view that no harm is done, since the transaction is rejected anyway. But the crashing contract does cause tests to fail, which–as before–is likely to prevent us discovering other problems.

We can strengthen the precondition of Guess to prevent this from happening.

    precondition s cmd = case cmd of
            Lock _ _ _    -> tok == Nothing
            Guess w _ _ _ -> tok == Just w
            GiveToken _   -> tok /= Nothing
        where
            tok = s ^. contractState . hasToken

With this change, the tests still fail, and we must study the entire log output to understand why:

> quickCheck $ prop_Game
*** Failed! Assertion failed (after 36 tests and 35 shrinks):
Actions
 [Lock (Wallet 1) "*******" 1,
  GiveToken (Wallet 2),
  Guess (Wallet 2) "*******" "hello" 2,
  Guess (Wallet 2) "*******" "hunter2" 1]
Expected funds of W2 to change by
  Value (Map [(,Map [(,1)]),(f687...,Map [(guess,1)])])
but they changed by
  Value (Map [(f687...,Map [(guess,1)])])
Test failed.
Emulator log:
... 52 lines of log output ...
[INFO] Slot 4: 00000000-0000-4000-8000-000000000001 {Contract instance for wallet 2}:
                 Receive endpoint call: Object (fromList [("tag",String "guess"),...Number 2.0...
... 25 lines of log output ...
[INFO] Slot 5: TxnValidationFail ab0d...: NegativeValue ...
[INFO] Slot 5: 00000000-0000-4000-8000-000000000001 {Contract instance for wallet 2}:
                 Receive endpoint call: Object (fromList [("tag",String "guess"),...Number 1.0...

In this case, we lock one Ada, and then wallet 2 makes two guesses, both with the correct password. The first guess tries to withdraw more Ada than are available, which our model predicts should be a no-op. Recall we defined:

    nextState (Guess w old new val) = do
        correctGuess <- (old ==)    <$> viewContractState currentSecret
        holdsToken   <- (Just w ==) <$> viewContractState hasToken
        enoughAda    <- (val <=)    <$> viewContractState gameValue
        when (correctGuess && holdsToken && enoughAda) $ do
            currentSecret $= new
            gameValue     $~ subtract val
            deposit w $ Ada.lovelaceValueOf val
        wait 1

Our model predicts that the second guess, with the correct password and a withdrawal of only one Ada, ought to succeed. That is why we expected wallet 2 to end up with the game token, and one Ada. However, wallet 2 did not receive the Ada, only the game token. Reading the emulator log reveals why: in slot 4 we called the guess endpoint to withdraw two Ada, which would leave -1 Ada locked by the contract, but the transaction submitted to the blockchain was not validated, and we see the error message NegativeValue. We made the second endpoint call, for the second guess, but nothing more happened. This is because the validation failure did not crash the off-chain contract instance (which would have provoked a test failure after the first guess), it just left it waiting for a result from the blockchain. As a result, the contract instance is hanging, and ignores the second guess.

We can avoid this problem too, by strengthening the precondition further:

    precondition s cmd = case cmd of
            Lock _ _ v    -> tok == Nothing
            Guess w _ _ v -> tok == Just w && v <= val
            GiveToken w   -> tok /= Nothing
        where
            tok = s ^. contractState . hasToken
            val = s ^. contractState . gameValue

Now the tests pass:

> quickCheck . withMaxSuccess 10000 $ prop_Game
+++ OK, passed 10000 tests.

Actions (241234 in total):
87.1324% GiveToken
 9.0854% Guess
 3.7822% Lock

It is good practice to run far more than 100 tests, once tests are passing.

In this section we discovered ways to crash the off-line contract instances, or leave them hanging. We debugged the problems by strengthening preconditions–but of course, the problems are still there. We have just avoided provoking them with our tests, which enabled us to continue testing and find more problems. But unless these problems are corrected, enabling our preconditions to be weakened again, then all we know from our tests is that the contract behaves correctly provided callers obey the preconditions.

Measuring and tuning distributions

Running successful tests displays statistics over the test cases generated. By default, testing a ContractModel just displays the distribution of types of action. Looking at the output above, we can see that the vast majority of actions were GiveToken actions; only 9% were guesses, and fewer than 4% were Lock actions.

It is not a surprise that there were relatively few Lock actions: our precondition guarantees that there can be at most one Lock per test case, and this is intentional, so of course the other actions are much more common. However, we almost certainly don’t want to test GiveToken almost ten times as often as Guess. What is going on?

The problem is this: after a Lock as the first action of a test case, every attempt to generate a GiveToken action will succeed; that is, the precondition of the generated action will be True. But for Guess actions, many randomly generated actions will not satisfy the precondition we ended up with, either because the wallet does not contain the game token, or because the amount to be withdrawn is greater than the amount available.

To achieve a better distribution of tests, we need to redefine the action generator so that Guess actions more often satisfy their precondition. The action generator is itself parameterized on the contract state, so we could guarantee that generated guesses satisfy their preconditions by redefining it as follows:

    arbitraryAction s = oneof $
        [ Lock      <$> genWallet <*> genGuess <*> genValue ] ++
        [ Guess w   <$> genGuess  <*> genGuess <*> choose (0, val)
        | Just w <- [tok] ] ++
        [ GiveToken <$> genWallet ]
        where
            tok = s ^. contractState . hasToken
            val = s ^. contractState . gameValue

With this change, Guess and GiveToken actions become equally frequent:

> quickCheck . withMaxSuccess 1000 $ prop_Game
+++ OK, passed 1000 tests.

Actions (23917 in total):
48.271% GiveToken
47.845% Guess
 3.884% Lock

Custom generators vs preconditions

It may seem like wasted effort to encode the form of valid Guess actions twice, once in the precondition, and then again in the generator. Would it not be sufficient to write the generator to target successful guesses in the first place, and omit the precondition?

The answer is no: it would not. By writing the generator carefully, we can ensure that the generated Guess actions are valid, but as soon as a test fails, and QuickCheck begins to shrink it, then the precondition becomes essential. Without it, QuickCheck might remove a GiveToken action that makes a subsequent Guess valid, and then report that the resulting test (not surprisingly) failed. It is only preconditions that ensure that shrunk test cases make sense.

Thus, the action generator cannot ensure that actions in test cases are valid; it can only skew the distribution of actions towards valid ones. This means there is no need for the action generator to guarantee that the actions it generates are valid; they will in any case have to pass the precondition before they are included in a test case. In fact, it is a little dangerous to define a generator so that only actions satisfying the precondition are generated, because we might later choose to weaken the precondition. If we do so, and forget to change the generator too, then we might end up with less thorough testing than we expect. So rather than generate guesses as we did above, it would be better to define

    arbitraryAction s = oneof $
        [ Lock      <$> genWallet <*> genGuess <*> genValue ] ++
        [ frequency $
          [ (10, Guess w   <$> genGuess  <*> genGuess <*> choose (0, val))
          | Just w <- [tok] ] ++
          [ (1, Guess <$> genWallet <*> genGuess <*> genGuess <*> genValue) ] ] ++
        [ GiveToken <$> genWallet ]
        where
            tok = s ^. contractState . hasToken
            val = s ^. contractState . gameValue

which generates valid guesses most of the time, with the occasional possibly-invalid one. This approach results in test cases with a reasonable balance between guessing and passing the game token, while ensuring that if the preconditions are later changed, then we can still generate every test case we could before.

Instrumenting contract models to gather statistics

It is possible to gather further statistics about the tests we are generating. For example, we might wonder what proportion of Guess actions are correct guesses. We can find out by defining the monitoring method in the ContractModel class:

 monitoring :: (ModelState state, ModelState state) -> Action state -> Property -> Property

This function is called for every Action in a test case, and given the ModelState before and after the Action. Its result is a function that is applied to the property being tested, so it can use any of the QuickCheck functions for analysing test case distribution or adding output to counterexamples.

To create a table showing the proportion of guesses which were right or wrong, we can define monitoring as

    monitoring (s, _) (Guess w guess new v) =
        tabulate "Guesses" [if guess == secret then "Right" else "Wrong"]
        where secret = s ^. contractState . currentSecret
    monitoring _ _ = id

This generates output such as this:

> quickCheck . withMaxSuccess 1000 $ prop_Game
+++ OK, passed 1000 tests.

Actions (23917 in total):
48.271% GiveToken
47.845% Guess
 3.884% Lock

Guesses (11443 in total):
75.417% Wrong
24.583% Right

Around 25% of guesses were correct in this test run, which is not surprising since we chose guesses uniformly from a list of four possibilities (and the precondition for guesses does not depend on the choice). Since correct guesses are probably at least as interesting to test as incorrect ones, a sensible next step would be to modify the guess generator to guess correctly more often–perhaps half the time. We leave this as an exercise for the reader.

It is always good practice to make measurements of the distribution of test cases like this, and then improve test case generation to that the distribution looks reasonable. Otherwise there is a risk of developing a false sense of security, engendered by running many thousands of trivial tests.

Goal-directed testing with dynamic logic

The tests we have developed so far test that ‘nothing bad ever happens’-the funds in a test always end up where the model says that they should. To put it another way, funds are never stolen. But this does not really cover everything we want to test: we also want to know that ‘something good eventually happens’, or at least, ‘something good is always possible’. Concretely, this will often mean testing that the funds in a contract can always be recovered–they cannot end up locked in a contract for ever. And indeed, in the case of the game contract, we would like to check that no matter what has happened previously, the Ada locked by the contract can always be recovered by a player who knows the password.

Here we are really identifying desirable ‘goal states’, namely those in which all the Ada have been recovered from the contract, and aiming to test that a goal state is always reachable. Obviously, random tests are quite unlikely to end in a goal state, so no particular conclusion can be drawn from one that does not. It is also hard to see how QuickCheck might determine automatically whether a goal state is reachable or not. So we test this kind of property by allowing the tester to specify a strategy for reaching a goal state; QuickCheck then tests that this strategy always works.

Introducing the dynamic logic monad

We write this kind of test using ‘dynamic logic’ wrapped in a monad, which just means that we write test case generators that can mix random actions, specified actions, and assertions. These generators are little programs in the DL monad, such as this one:

 unitTest :: DL GameModel ()
 unitTest = do
     action $ Lock w1 "hello" 10
     action $ GiveToken w2
     action $ Guess w2 "hello" "new secret" 3

This DL fragment simply specifies a unit test in terms of the underlying ContractModel we have already seen, using action to include a specific Action in the test. To run such a test, we must specify a QuickCheck property such as

propDL :: DL GameModel () -> Property
propDL dl = forAllDL dl prop_Game

which uses forAllDL to generate a test sequence from the dl provided, and runs it using the same underlying property as before. The execution is checked against the model, so we do not need to add any further assertions to this unit test. This gives us a very convenient way to define unit tests for a contract specified by a ContractModel.

We can run this test as follows:

> quickCheck . withMaxSuccess 1 $ propDL unitTest
+++ OK, passed 1 test.

Actions (3 in total):
33% GiveToken
33% Guess
33% Lock

Quantifiers in dynamic logic

As well as writing unit tests in the DL monad, we can add random generation. For example, if we wanted to generalize the unit test above a little to lock a random amount of Ada in the contract, then we could instead write:

 unitTest :: DL GameModel ()
 unitTest = do
     val <- forAllQ $ chooseQ (1, 20)
     action $ Lock w1 "hello" val
     action $ GiveToken w2
     action $ Guess w2 "hello" "new secret" 3

Here forAllQ lets us generate a random value using chooseQ:

 chooseQ :: (Arbitrary a, Random a, Ord a) => (a, a) -> Quantification a

forAllQ takes a Quantification, which resembles a QuickCheck generator, but with a more limited API to support its use in dynamic logic.

When this is tested, random values in the range 1-20 are locked… and a test fails:

> quickCheck $ propDL unitTest
*** Failed! Falsified (after 3 tests):
BadPrecondition
  [Witness (1 :: Integer),
   Do $ Lock (Wallet 1) "hello" 1,
   Do $ GiveToken (Wallet 2)]
  [Action (Guess (Wallet 2) "hello" "new secret" 3)]
  (GameModel {_gameValue = 1, _hasToken = Just (Wallet 2), _currentSecret = "hello"})

Dynamic logic test cases are a little more complex than the simple action sequences we have seen so far, and they give us a little more information. Every such test contains a list of Action, tagged Do, and witnesses, tagged Witness. The witnesses record the results of random choices made by forAllQ: in this case, the Ada value to be locked was chosen to be 1. The test proceeds by locking the Ada and giving the game token to wallet 2, but the third action we specified–making the guess–cannot be run, because its precondition is False. This is what the BadPrecondition tells us, and the action that could not be performed appears as

[Action (Guess (Wallet 2) "hello" "new secret" 3)]

The last component is the model state at that point: we can see that the gameValue is only 1 Ada, so of course we cannot withdraw 3.

Note

We saw earlier that when tests are generated from a ContractModel, then QuickCheck only generates actions whose precondition is satisfied. On the other hand, when we use dynamic logic to specify an action explicitly like this, then there is no guarantee that its precondition will hold, and so a ‘bad precondition’ error becomes a possibility. The problem here is really that this generalized unit test is inconsistent with our model.

Repeating a dynamic logic test

Once again, we can copy-and-paste the failed testcase into our source code:

badUnitTest :: DLTest GameModel
badUnitTest =
    BadPrecondition
      [Witness (1 :: Integer),
       Do $ Lock w1 "hello" 1,
       Do $ GiveToken w2]
      [Action (Guess w2 "hello" "new secret" 3)]
      (GameModel {_gameValue = 1, _hasToken = Just w2, _currentSecret = "hello"})

We can rerun the test using withDLTest, supplying the DL unitTest from which the test case was generated, as well as the underlying property:

> quickCheck $ withDLTest unitTest prop_Game badUnitTest
*** Failed! Falsified (after 1 test):

(No test case is displayed by QuickCheck because nothing was generated in this case–the test case badUnitTest was supplied explicitly).

If we now correct unitTest, for example by changing the range of val from 1-20 to 3-20, then the saved bad test case passes:

> quickCheck $ withDLTest unitTest prop_Game badUnitTest
+++ OK, passed 100 tests.

as do freshly generated random tests:

> quickCheck $ forAllDL unitTest prop_Game
+++ OK, passed 100 tests.

Actions (300 in total):
33.3% GiveToken
33.3% Guess
33.3% Lock

In this case the saved test ‘passes’ because it no longer matches the modified DL test, so it is not a counterexample to the property we are testing.

Something good is always possible

We saw above how to generate random parameters to actions in dynamic logic tests; what gives them their real power is that we can also include random actions.

Suppose we want to test that no Ada remain locked in the game contract for ever. We could try to specify this with a DL test that requires that no Ada remain locked in the contract after any sequence of actions. We can include a random sequence of actions in a DL test using anyActions_, and we can make assertions about the ModelState using assertModel. Thus we can define

 noLockedFunds :: DL GameModel ()
 noLockedFunds = do
     anyActions_
     assertModel "Locked funds should be zero" $ isZero . lockedValue

to assert that, after any sequence of actions, no funds should remain locked (lockedValue extracts the total value locked in contracts from the ModelState).

Of course, this test fails:

> quickCheck $ forAllDL noLockedFunds prop_Game
*** Failed! Falsified (after 1 test and 2 shrinks):
BadPrecondition
  [Do $ Lock (Wallet 1) "*******" 1]
  [Assert "Locked funds should be zero"]
  (GameModel {_gameValue = 1, _hasToken = Just (Wallet 1), _currentSecret = "*******"})

If all we do is lock one Ada, then obviously the locked funds are not zero. The failed assertion is reported as a BadPrecondition (for the assertion).

The property we wrote above is wrong: what we really intended to say was that after a correct guess that requests all the funds, then no locked funds remain. Let us write a property that says that any wallet can recover the funds by making such a guess. To program our strategy, we will need to read the secret password, and the value remaining in the contract, from the contract model:

 noLockedFunds :: DL GameModel ()
 noLockedFunds = do
     anyActions_
     w      <- forAllQ $ elementsQ wallets
     secret <- viewContractState currentSecret
     val    <- viewContractState gameValue
     action $ Guess w "" secret val
     assertModel "Locked funds should be zero" $ isZero . lockedValue

After a random sequence of actions, we choose a random wallet and construct a correct guess that recovers all the locked Ada to this wallet. But this property also fails!

> quickCheck $ forAllDL noLockedFunds prop_Game
*** Failed! Falsified (after 1 test and 2 shrinks):
BadPrecondition
  [Witness (Wallet 1 :: Wallet)]
  [Action (Guess (Wallet 1) "" "" 0)]
  (GameModel {_gameValue = 0, _hasToken = Nothing, _currentSecret = ""})

Here QuickCheck has chosen the arbitrary sequence of actions to be empty, so the contract has not even been locked–and of course, in that case, a Guess is not possible. To pass the test, our strategy must work in every situation. However, if the contract has not been locked, then there are no locked funds, so the assertion in this property would pass without our doing anything at all. Perhaps we should only make a Guess if there are actually funds to be recovered:

 noLockedFunds :: DL GameModel ()
 noLockedFunds = do
     anyActions_
     w      <- forAllQ $ elementsQ wallets
     secret <- viewContractState currentSecret
     val    <- viewContractState gameValue
     when (val > 0) $ do
         action $ Guess w "" secret val
     assertModel "Locked funds should be zero" $ isZero . lockedValue

This is better, but testing the property still fails:

> quickCheck $ forAllDL noLockedFunds prop_Game
*** Failed! Falsified (after 1 test and 1 shrink):
BadPrecondition
  [Do $ Lock (Wallet 1) "*******" 1,
   Witness (Wallet 2 :: Wallet)]
  [Action (Guess (Wallet 2) "" "*******" 1)]
  (GameModel {_gameValue = 1, _hasToken = Just (Wallet 1), _currentSecret = "*******"})

In this case we locked 1 Ada in the contract, chose wallet 2 to recover the funds, and then tried to make a correct guess–but the precondition for Guess still failed. And this is no surprise: the wallet does not hold the game token. This test case shows that, as part of our strategy for recovering the funds, we also need to give the game token to the wallet that will make the guess.

 noLockedFunds :: DL GameModel ()
 noLockedFunds = do
     anyActions_
     w      <- forAllQ $ elementsQ wallets
     secret <- viewContractState currentSecret
     val    <- viewContractState gameValue
     when (val > 0) $ do
         action $ GiveToken w
         action $ Guess w "" secret val
     assertModel "Locked funds should be zero" $ isZero . lockedValue

Now we expect the tests to pass:

> quickCheck $ forAllDL noLockedFunds prop_Game
*** Failed! Falsified (after 1 test):
BadPrecondition
  [Do $ Lock (Wallet 1) "hello" 5,
   Witness (Wallet 3 :: Wallet),
   Do $ GiveToken (Wallet 3),
   Do $ Guess (Wallet 3) "" "hello" 5]
  [Assert "Locked funds should be zero"]
  (GameModel {_gameValue = 5, _hasToken = Just (Wallet 3), _currentSecret = "hello"})

They do not! We can see from the last line that, in the final state, our model indeed says that there are still 5 Ada locked in the contract. This is the effect of the nextState function in our model, so let us inspect the relevant part of its code:

    nextState (Guess w old new val) = do
        correctGuess <- (old ==)    <$> viewContractState currentSecret
        -- ...

Comparing carefully with the failed test, we see that our strategy is supplying the empty string as the old password (the guess), and the correct password as the new one–so the guess is wrong, and the Ada was not recovered. Swapping the two password arguments to Guess does, at last, make the tests pass.

For this simple contract, recovering the locked funds is easy–but as we have seen, writing a property that says that it is always possible forces us to be precise about our strategy, and reveals anything we might have overlooked.

Monitoring and tuning dynamic logic tests

The dynamic logic test we have developed only uses our recovery strategy if there are locked funds remaining after a random sequence of actions. How often does that happen? Given that tests contain many more guesses than Lock actions, there is a risk that the contract is usually holding no funds before we even consider using our strategy. To find out, we can monitor the contract model during our tests. As in the monitoring method of the ContractModel class, we can use any of the QuickCheck operations for analyzing test cases, but instead of applying the monitoring at every action in a test case, we can monitor at selected points.

In this case, we choose to label test cases that actually invoke our fund recovery strategy:

 noLockedFunds :: DL GameModel ()
 noLockedFunds = do
     anyActions_
     w      <- forAllQ $ elementsQ wallets
     secret <- viewContractState currentSecret
     val    <- viewContractState gameValue
     when (val > 0) $ do
         monitor $ label "Unlocking funds"
         action $ GiveToken w
         action $ Guess w secret "" val
     assertModel "Locked funds should be zero" $ isZero . lockedValue

With the addition of the monitor line, QuickCheck tells us what proportion of our tests actually leave funds to recover:

> quickCheck $ forAllDL noLockedFunds prop_Game
+++ OK, passed 100 tests (31% Unlocking funds).

Actions (5112 in total):
49.24% GiveToken
48.81% Guess
 1.96% Lock

We can see that around 30% of generated tests leave some Ada in the contract for our strategy to recover. This is a bit low–it means that two thirds of our tests do not actually test the strategy. But it is easy to address: we can simply use the dynamic logic to specify the initial Lock action explicitly, and generate larger amounts for the initial funds locked in the game (lines 3-5 below):

noLockedFunds :: DL GameModel ()
noLockedFunds = do
    (w0, funds, pass) <- forAllQ (elementsQ wallets, chooseQ (1, 10000), elementsQ guesses)
    action $ Lock w0 pass funds
    anyActions_
    w      <- forAllQ $ elementsQ wallets
    secret <- viewContractState currentSecret
    val    <- viewContractState gameValue
    when (val > 0) $ do
        monitor $ label "Unlocking funds"
        action $ GiveToken w
        action $ Guess w secret "" val
    assertModel "Locked funds should be zero" $ isZero . lockedValue

With this addition, a much higher proportion of tests actually exercise our recovery strategy:

> quickCheck $ forAllDL noLockedFunds prop_Game
+++ OK, passed 100 tests (74% Unlocking funds).

Actions (5198 in total):
49.75% GiveToken
48.33% Guess
 1.92% Lock

More dynamic logic

Dynamic logic tests are much more expressive than we have seen hitherto. The DL monad is an instance of Alternative, so we can write tests with random control flow, weight choices suitably, and so on. For example, anyActions, which generates a random sequence of actions of expected length n, is defined by

 anyActions :: Int -> DL s ()
 anyActions n = stopping
            <|> weight (1 / fromIntegral n)
            <|> (anyAction >> anyActions n)

This code makes a random choice between three alternatives, expressed using (<|>). The first two alternatives terminate (and return ()), while the last alternative performs a random action followed by another random sequence of actions. The second alternative is weighted by 1/n, so the third is chosen n times as often, resulting in an expected length of n actions. The first alternative is guarded by stopping, which means it will be chosen only if the test case is ‘getting too long’; in this case anyActions will generate an empty sequence. We can exercise fine control over the way test cases are generated, including specifying strategies for bringing a long test case to a close. See the documentation for more details.

Limitations

The tests we have developed here suffer from three main limitations:

  • We test only via the off-chain contract endpoints

  • We test under favourable assumptions on timing

  • We don’t test for information leaks

What are the implications, and how can we address them?

Testing only via contract endpoints

We tested that the guessing game contract behaves as it should, provided transactions are submitted using the off-chain contract code. But what if a malicious actor writes their own off-chain code, submitting transactions that the contract’s own off-chain code cannot generate? Is it possible, for example, to steal the Ada locked by the game, without submitting a guess? Our tests do not cover this.

One way to mitigate this problem is to add additional ‘attack’ endpoints to the contract under test, that carry out a variety of conceivable attacks. Our contract model would then model these attack actions as no-ops, to represent the fact that the attacks fail; our tests would then check that the attacks fail in all circumstances, and with all parameters.

If we think of the guessing game, not as a game, but instead as a contract that protects funds with a password, then we might consider ‘guessing’ with the correct password as a correct withdrawal, and guessing with an incorrect password as an attack. Our model does test that these attacks always fail; this approach can be used in general to test that contracts are robust against anticipated attacks.

Test assumptions on timing

Our tests wait for the transactions generated by an Action to complete before performing the next Action (because we inserted calls to delay into perform, and wait into nextState). In reality, different wallets may perform actions simultaneously. This usually results in two or more transactions that try to spend the same UTXO, leading all but the first to fail.

Endpoint calls that submit several transactions are even more problematic, because such a call may fail part way through, leaving the blockchain in an intermediate, and possibly invalid state. Arguably, contracts should be written to undo the effects of earlier transactions if later ones fail–although, as we saw, the Lock endpoint of the game contract (which is implemented as two transactions) does not do this: a second call to Lock fails after the first transaction, and leaves the blockchain in a state that the contract cannot handle.

It is possible to test this kind of behaviour in our framework, by not delaying before the next action, so that several actions can be started in the same slot. Delays must then be included as an explicit Action in test cases instead. However, modelling becomes considerably harder, because the model must predict which transaction of several simultaneous ones succeeds, and must take into account how many transactions each end-point call results in, and which slot each is expected to be commited in. It isn’t clear that the extra modelling effort is really worthwhile.

Moreover, in reality transactions might be delayed, which means that endpoint calls that generate several transactions might end up being interleaved in unexpected ways. The emulator doesn’t currently simulate this, and so these kinds of tests cannot yet be run.

Testing for information leaks

We have tested that only a guess containing the correct secret can withdraw Ada from the game contract. So to steal the money, an adversary must discover the secret. But recall that an adversary can access everything on the blockchain, and also the contents of transactions.

There are least two serious bugs that the game contract could contain, that would permit an adversary to steal all the money:

  • The secret could be stored in plain text in the contract state, instead of encrypted. The contract state is stored on the blockchain. The adversary could simply read the secret from the blockchain, and then make a correct guess to steal the money.

  • The secret might be stored in encrypted form on the blockchain, but Guess transactions might contain an encrypted guess, rather than a plain text guess (as they do in this implementation). Then the adversary could simply read the encrypted secret from the blockchain, and submit it in a guess transaction to steal the money.

Neither of these bugs would be detected by our tests, nor is it clear how they could be.

This test framework is a powerful tool for testing that contracts behave correctly, when used as intended–but users should be aware of the limitations in this section, and be careful to avoid the pitfalls they expose.

Further Examples

In addition to the model for the Game contract presented in this tutorial, there are also models for the Prism and the Auction example contracts in the plutus-use-cases project.