Programming Languages – Page 12

Five Quines video

A quine is a program that prints out its own source code. I will describe five examples:

Arguably the greatest program ever written:

More info on quines:

Using the final keyword in interface method parameters does nothing

Consider the following Java code:

class FinalInInterface
{
    private static interface WithFinal
    {
        public void run( final int x );
    }

    private static class WithoutFinal implements WithFinal
    {
        public void run( int x )
        {
            x = 4;
            System.out.println( x );
        }
    }

    public static void main( String[] args )
    {
        new WithoutFinal().run( 3 );
    }
}

This code compiles, and when it runs, it prints “4”.

So adding “final” to the x parameter of the run() method in the interface WithFinal has no effect: the implementor of this interface, WithoutFinal, is allowed to declare its own run() method without “final”, and modify x as much as it wants.

This makes more sense if you realise that in Java every method argument is passed by value. When you pass an object reference, the method and the calling code are talking about the same object, but the reference is passed by value, so if you reassign it inside the method, the original object is unaffected, and the reference to it in the calling code is also unaffected. The semantics are basically identical to passing a pointer in C or C++ – the pointer is copied, but the pointed-to object is the same.
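Since the comparison here is to pointers in C++, a small C++ sketch may make it concrete. The names Box, reassign, mutate and demo_pass_by_value are made up for illustration. The point is only that reassigning the copied pointer is invisible to the caller, while writing through it is not.

```cpp
#include <cassert>

struct Box { int value; };

// The pointer is copied into p, so reassigning p changes only the copy.
void reassign(Box* p) {
    static Box other{99};
    p = &other;      // the caller's pointer still points at its own Box
    p->value = 100;  // writes to other, not to the caller's Box
}

// Writing through the copied pointer reaches the caller's object.
void mutate(Box* p) {
    p->value = 4;
}

// Walks through both cases; every assert holds.
void demo_pass_by_value() {
    Box b{3};
    Box* ptr = &b;
    reassign(ptr);
    assert(ptr == &b && b.value == 3);  // nothing the caller sees has changed
    mutate(ptr);
    assert(b.value == 4);               // same object, new contents
}
```

Calling demo_pass_by_value() from main() runs both cases. A Java method receiving an object reference behaves like mutate and reassign here: it can change the object, but it cannot change which object the caller's variable refers to.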

If you declare method parameters final in your interface, people implementing that interface don’t need to declare them final, and can modify them in their implementations. It’s possible that by declaring them final you are trying to communicate to implementors of the interface that they should also declare them final, but as the designer of the interface, it’s really none of your business how the implementor implements it.

There is no “const” in Java, so you have no way of preventing the implementor of your interface from modifying an object that is passed in (by copying a reference to it).

If that makes you sad, join the club.

Bash arrays

Bash arrays are a lot like Bash Associative Arrays, but with numbers as keys.

Here’s a quick reference.

Basics

$ declare -a MYARR  # Create an array
$ MYARR[3]=foo      # Put a value into an array
$ echo ${MYARR[3]}  # Get a value out of an array
foo
$ echo MYARR[3]     # WRONG
MYARR[3]
$ echo $MYARR[3]    # WRONG
[3]

Creating, adding

$ declare -a MYARR    # Explicitly declare
$ MYARR[3]=foo        # Or this line implicitly makes it an array
$ MYARR[4]=bar        # Can add values one by one

$ declare -a MYARR=(a b c)   # Initialise all at once
$ echo ${MYARR[0]}
a
$ echo ${MYARR[1]}
b
$ echo ${MYARR[2]}
c

$ declare -a MYARR   # Or declare separately
$ MYARR=(a b c)      # Then initialise
$ echo ${MYARR[0]}
a
$ echo ${MYARR[1]}
b
$ echo ${MYARR[2]}
c

$ declare -a MYARR=(a b c)
$ MYARR=("${MYARR[@]}" d)  # Add an element
$ echo ${MYARR[@]}
a b c d
$ declare -a MYARR2=(e f g)
$ MYARR=("${MYARR[@]}" "${MYARR2[@]}")  # Concatenate arrays
$ echo ${MYARR[@]}
a b c d e f g

Keys/Indices

$ declare -a MYARR
$ MYARR[3]=foo
$ echo ${MYARR[0]}  # Unassigned values are empty

$ echo ${MYARR[4]}  # Unassigned values are empty

$ MYARR[seven]=bar     # A text index is evaluated as a variable name; unset means 0
$ echo ${MYARR[0]}
bar
$ echo ${MYARR[seven]} # Same again: "seven" is unset, so this is index 0
bar

$ K=3
$ MYARR[$K]=baz      # Variables containing numbers work like numbers
$ echo ${MYARR[$K]}
baz
$ echo ${MYARR[3]}   # Obviously the value is accessible via the actual index
baz
$ K=foo
$ MYARR[$K]=bash     # Variables containing text are treated as 0
$ echo ${MYARR[0]}
bash

Length

$ declare -a MYARR=(a b c)
$ echo ${#MYARR[@]}  # Length of an array
3
$ echo $#MYARR[@]  # WRONG
0MYARR[@]
$ echo ${#MYARR}   # WRONG
1

$ MYARR[7]=x
$ echo ${#MYARR[@]}  # Only existing indices count in the length
4

$ declare -a MYARR=(a bb ccc)
$ echo ${#MYARR[0]}   # Length of an individual element
1
$ echo ${#MYARR[1]}
2
$ echo ${#MYARR[2]}
3

Looping

$ declare -a MYARR=("a 1" b c)
$ # Loop through array values
$ for V in "${MYARR[@]}"; do echo $V; done
a 1
b
c
$ for V in ${MYARR[@]}; do echo $V; done  #WRONG
a
1
b
c

$ echo "${!MYARR[@]}"  # Print all indices - quoted, but the shell strips the quotes before echo runs
0 1 2
$ echo "${MYARR[@]}"   # Print all values - quoted, but the shell strips the quotes before echo runs
a 1 b c

Clearing

$ declare -a MYARR
$ MYARR[3]=x

$ echo ${MYARR[3]}
x
$ unset MYARR
$ declare -a MYARR
$ echo ${MYARR[3]}

Deleting

$ MYARR[2]=foo
$ echo ${MYARR[2]}
foo
$ unset ${MYARR[2]} # WRONG
$ echo ${MYARR[2]}
foo
$ unset MYARR[2]    # To delete from an array, use "unset" with similar syntax to assigning
$ echo ${MYARR[2]}

$ MYARR[3]=quux
$ echo ${MYARR[3]}
quux
$ K=3
$ unset MYARR[$K]   # Can unset using a variable for the key too
$ echo ${MYARR[3]}

$ declare -a MYARR=(a b c d e f)
$ MYARR=("${MYARR[@]:0:3}" "${MYARR[@]:4}")  # Remove element 3, leaving no gap
$ echo ${MYARR[@]}
a b c e f

Cool stuff

$ declare -a MYARR=(a b c d e f g)
$ echo ${MYARR[@]:2:3}              # Extract a sub-array
c d e

$ declare -a MYARR=(a b c d e f g)
$ echo ${MYARR[@]/d/FOO}            # Replace the first match of a pattern in each element
a b c FOO e f g

Scope

$ unset MYARR
$ function createmap() { MYARR[5]=bar; }  # Implicit creation puts it in the global scope
$ echo ${MYARR[5]}

$ createmap
$ echo ${MYARR[5]}
bar

$ unset MYARR
$ function createmaplocal() { declare -a MYARR; MYARR[3]=bar; }  # Explicit creation puts it in the local scope
$ echo ${MYARR[3]}

$ createmaplocal
$ echo ${MYARR[3]}

Goodness in programming languages, part 4 – Ownership & Memory

Posts in this series: Syntax, Deployment, Metaprogramming, Ownership

There is often a trade-off between programming language features and how fast (and predictably) the programs run. From web sites that serve millions of visitors to programs running on small devices we need to be able to make our programs run quickly.

One trade-off that is made in many modern programming languages (including Python, Ruby, C#, Java and JVM-based languages) is that the system owns all the memory. This avoids the need for the programmer to think about how long pieces of memory need to live, but it means a lot of memory can hang around a lot longer than it really needs to. In addition, it can mean the CPU has to jump around to lots of different memory locations to find pieces of dynamically-allocated memory. When this jumping around causes cache misses, it can really slow things down.

While these garbage collection-based languages have been evolving, C++ has been developing along a different track. C++ allows the programmer to allocate and free up memory manually (as in C), but over time the community of C++ programmers has been developing a new way of thinking about memory, and developing tools in the C++ language to make it easier to work in this way.

Modern C++ code rarely or never uses “delete” or “free” to deallocate memory, but instead defines clearly which object owns each other object. When the owning object is no longer needed, everything it owns can be deleted, immediately freeing that memory. The top-level objects are owned by the current scope, so when the function or block of code we are in ends, the system knows these objects and the ones they own can be deleted. Objects that last for the whole life of the program are owned by the scope of the main function or equivalent.

One advantage of explicit ownership is that the right thing happens automatically when something unexpected happens (e.g. an exception is thrown, or we return early from a function). Because the objects are owned by a scope, as soon as we exit that scope they are automatically deleted, and no memory is “leaked”.
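Here is a minimal sketch of that behaviour, using std::unique_ptr as the owner. Tracked, work, live_after and demo_scope_ownership are hypothetical names. Tracked just counts how many instances are alive so we can see that nothing leaks on either exit path.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical type that counts live instances so cleanup is observable.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// The unique_ptr t is owned by this function's scope.
void work(bool fail) {
    std::unique_ptr<Tracked> t(new Tracked());
    if (fail) {
        throw std::runtime_error("unexpected");  // t is destroyed during unwinding
    }
    // Normal return: t is destroyed at the closing brace.
}

// Returns how many Tracked objects are still alive after work() exits.
int live_after(bool fail) {
    try {
        work(fail);
    } catch (const std::runtime_error&) {
        // Swallow the error; we only care about what was cleaned up.
    }
    return Tracked::live;
}

// Both exit paths leave nothing behind.
void demo_scope_ownership() {
    assert(live_after(false) == 0);
    assert(live_after(true) == 0);
}
```

There is no delete anywhere in work(), and no try/finally either. The scope owns t, so leaving the scope by any route destroys it.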

Because ownership is explicit, we can often group owned objects in memory immediately next to the objects that own them. This means we jump around to different memory locations less often, and we have to do less work to find and delete regions of memory. This makes our programs faster.
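As a rough illustration of what “next to the objects that own them” can look like, here is a sketch with made-up types Car and Wheel. It assumes the simplest form of ownership, holding the owned objects by value, and checks that they really do sit inside the owner's storage, and that a std::vector keeps the objects it owns side by side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Wheel { int pressure; };

// Car owns its wheels by value, so they live inside the Car object itself.
struct Car {
    Wheel front;
    Wheel rear;
};

// True if part lies within the bytes occupied by whole.
template<typename Whole, typename Part>
bool stored_inside(const Whole& whole, const Part& part) {
    auto begin = reinterpret_cast<std::uintptr_t>(&whole);
    auto addr = reinterpret_cast<std::uintptr_t>(&part);
    return addr >= begin && addr + sizeof(Part) <= begin + sizeof(Whole);
}

// True if every element sits directly after the previous one in memory.
bool contiguous(const std::vector<Car>& cars) {
    for (std::size_t i = 1; i < cars.size(); ++i) {
        if (&cars[i] != &cars[0] + i) return false;
    }
    return true;
}

// No separate allocation per Wheel, and one allocation for all the Cars.
void demo_grouped_memory() {
    Car c{{30}, {32}};
    assert(stored_inside(c, c.front));
    assert(stored_inside(c, c.rear));
    std::vector<Car> cars(4);
    assert(contiguous(cars));
}
```

Walking over cars touches one block of memory from start to finish, which is the cache-friendly pattern described above. A garbage-collected language that stores a list of references gives no such guarantee about where the referenced objects live.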

Here are some things I like:

Modern C++’s clarity about who owns what. By expressing ownership explicitly we make clear our intentions, and avoid memory leaks.
Modern C++’s fast and cache-friendly memory handling. Allocating memory for several objects together reduces time spent looking for space, and means caches are more likely to be used.

In my experience, the most frequent performance problems I have had to solve have really been memory problems. Explicit ownership can reduce unnecessary memory management overhead by taking back the work from the system (the garbage collector) and allowing programmers to be explicit about who owns what.

C++14 “Terse” Templates – an argument against the proposed syntax

Today I attended two excellent talks by Bjarne Stroustrup at the ACCU Conference 2013. The first was an inspiring explanation of the recent C++11 standard, and the second, “C++14 Early thoughts”, was an exciting description of some of the features that might go into the next standard.

One of those features, which Bjarne called “Terse” Templates, might be a good idea, but the syntax Bjarne proposed seems like a bad idea to me, because it leaks unwanted names into the namespace containing the function you are writing.

Allow me to explain.

Background – Concepts Lite

I attended another excellent talk before Bjarne’s, called “Concepts Lite-Constraining Templates with Predicates” by Andrew Sutton, introducing “Concepts Lite”, which is an attempt to salvage a manageable language feature from the very large “Concepts” feature that failed to make it into C++11.

My (so far very basic) understanding of Concepts Lite is that it is a way of defining conditions that state whether a template will be expanded for a given type.

So, in C++11 (and C++98), we can declare a (stupid) template function like so:

template<typename ListOfInt>
int first( ListOfInt& list ) { return list.size() > 0 ? list[0] : 0; }

The code in this function template assumes that list has a size method, and an operator[] method. We tried to “suggest” this, by naming our template parameter ListOfInt, but the poor programmer may not realise exactly what we meant.

If we do the wrong thing, and try to use the first function with an int argument:

int i = 3;
first( i );

It goes wrong, because ints don’t have a size method:

In function 'int first(ListOfInt&) [with ListOfInt = int]':
error: subscripted value is neither array nor pointer
error: request for member 'size' in 'list', which is of non-class type 'int'

This error is not too obscure, but in complex cases the errors can be extremely long, and point to problems that appear to be unrelated to the code we are writing.

Really what we want to know is that int is not a ListOfInt.

Concepts Lite gives us the ability to define what a ListOfInt means, and only expand the template for types that match that definition.

In our example we would do something like this:

template<typename ListOfInt> requires SizeAndIndex<ListOfInt>()
int first( ListOfInt& list ) { return list.size() > 0 ? list[0] : 0; }

(There is actually a neater syntax, but we’ll do it like this for now because we need the more verbose form later.)

What this means is that this template function will only be expanded for types that satisfy the constraint.

The definition of SizeAndIndex is outside the scope of this article – it allows us to check whether types satisfy some conditions. In this case we assume it checks that the type contains the methods we use.
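For the curious, here is one way to approximate such a check in plain C++11. This is not Concepts Lite syntax and not how the proposal defines constraints. It is a sketch that uses a type trait plus std::enable_if, and it assumes that “has a size method and an operator[]” is all SizeAndIndex needs to mean. The function demo_size_and_index is only there to exercise it.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Primary template: assume the type does not qualify.
template<typename T, typename = void>
struct SizeAndIndex : std::false_type {};

// Chosen only when both t.size() and t[0] are valid expressions.
template<typename T>
struct SizeAndIndex<T, decltype(void(std::declval<T&>().size()),
                                void(std::declval<T&>()[0]))>
    : std::true_type {};

// first() only takes part in overload resolution for qualifying types.
template<typename ListOfInt>
typename std::enable_if<SizeAndIndex<ListOfInt>::value, int>::type
first(ListOfInt& list) {
    return list.size() > 0 ? list[0] : 0;
}

static_assert(SizeAndIndex<std::vector<int>>::value, "vector has size() and []");
static_assert(!SizeAndIndex<int>::value, "int has neither");

// Exercises first() on a non-empty and an empty list.
void demo_size_and_index() {
    std::vector<int> v{7, 8, 9};
    assert(first(v) == 7);
    std::vector<int> empty;
    assert(first(empty) == 0);
}
```

With this in place, first( i ) for an int fails with a “no matching function” error at the call site rather than an error from inside the function body, although the message is nowhere near as friendly as the ones Concepts Lite promises.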

Now when we do the wrong thing:

int i = 3;
first( i );

We get a simple error message, that properly tells us what’s wrong:

error: no matching call to ‘first(int list)’
note: candidate is ‘first(ListOfInt& list)’
note: where ListOfInt = int
note: template constraints not satisfied
note: ‘ListOfInt’ is not a/an ‘SizeAndIndex’ type since
note: ‘list.size()’ is not valid syntax

(The above is fiction, but Andrew assures us he gets real errors like this with his prototype.)

So Concepts Lite gives us the optional ability to check that our template parameters are what we expected them to be, giving a decent error message, instead of waiting for something to fail much later when we compile the instantiated template.

So far so utterly cool. (And, in my ill-informed opinion, the only bit of Concepts I really wanted anyway.)

There’s more information on this feature here: Concepts Lite: Constraining Templates with Predicates and here: Concepts-Lite.

Constraints on multiple types

The Concepts Lite feature as proposed allows us to specify constraints that describe how multiple types relate to each other, by doing something like this:

template<typename Victim1, typename Victim2> requires Lakosable<Victim1, Victim2>
void lakos( Victim1 a, Victim2 b );

Here the Lakosable constraint can specify conditions that describe how the two types relate to each other, for example that Victim1::value_type is equal to the type of Victim2.
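To make that example condition concrete, here is a C++11 sketch of a two-type check. Again this is a plain type trait standing in for the real thing, not the proposed Concepts Lite syntax, and it assumes the only condition is the one just mentioned: Victim1::value_type must be the same type as Victim2. The empty lakos body and the demo_lakosable function are placeholders.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Primary template: the pair of types does not qualify.
template<typename Victim1, typename Victim2, typename = void>
struct Lakosable : std::false_type {};

// Chosen only when Victim1::value_type exists and is exactly Victim2.
template<typename Victim1, typename Victim2>
struct Lakosable<Victim1, Victim2,
                 typename std::enable_if<std::is_same<
                     typename Victim1::value_type, Victim2>::value>::type>
    : std::true_type {};

// lakos() is only available when the two types relate as required.
template<typename Victim1, typename Victim2>
typename std::enable_if<Lakosable<Victim1, Victim2>::value, void>::type
lakos(Victim1 a, Victim2 b) {
    (void)a;  // placeholder body: the constraint is the point here
    (void)b;
}

static_assert(Lakosable<std::vector<int>, int>::value, "value_type matches");
static_assert(!Lakosable<std::vector<int>, long>::value, "value_type differs");
static_assert(!Lakosable<int, int>::value, "int has no value_type");

// Compiles only because vector<int>::value_type is int.
void demo_lakosable() {
    lakos(std::vector<int>{1, 2}, 3);
    assert((Lakosable<std::vector<int>, int>::value));
}
```

The condition is about the pair of types, not about either one alone, which is what a single-type constraint cannot express.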

This is very good.

Now, the bit I want to argue against.

“Terse” Templates – the syntax I don’t like

Bjarne gave us an example of the std::merge function, which has lots of arguments, and very complex constraints on them. He showed us that these could all be nicely wrapped into a single Mergeable constraint (similar to the Lakosable constraint above) but he argued that there was still too much repetition. The repetition comes from the fact that several functions in the standard library have the exact same template parameters, with the exact same constraints on them, and that you have to mention the whole list of template parameters twice: once after the template keyword, and once in the requires condition.

This led him to look for a terser syntax.

So, he proposed a modest new construct that looks like this:

using Lakosable{Victim1,Victim2}; // (1)

that allows a radical departure from everything that has gone before in terms of declaring templates. After we’ve made the declaration (1), we can declare the exact function we declared above with this little line:

void lakos( Victim1 a, Victim2 b ); // (2)

The using declaration in (1) makes the names Victim1 and Victim2 available in the current namespace, and gives them special powers that mean functions taking parameters of type Victim1 or Victim2 are automatically function templates, even though the template keyword is nowhere to be seen.

There was some resistance in the room to this proposal. Most of it focussed on (2), and the fact that templates were being declared without it being visible because of the lack of the template keyword.

I’m actually ok with (2). In fact, my fictitious programming language Pepper (which represents everything I think is Right in programming languages) provides a feature very much like this – all non-definite parameter types act as “implicit” templates in Pepper (see “implicit_templates.pepper” on the Examples page).

Bjarne made a reasonable defence of (2), arguing that we often want new features to be “signposted” by new keywords (he cited user-defined types as an example – apparently some people wanted to require “class MyClass” instead of just “MyClass” every time we referred to a user-defined type) but later when they are familiar we want less verbose syntax. (Presumably the “new” feature he was talking about here is templates.)

My problem is with (1).

As my neighbour in the talk (whose name I missed, sorry) pointed out, what (1) does is dump 2 new names Victim1 and Victim2 in the namespace containing the lakos function template.

No-one wants these names.

In fact, why are we doing any of this?

The sole purpose of the exercise is to constrain the lakos function template. Why is the result putting 2 names into the namespace?

More seriously, in the case of the standard library, these names will go into the std:: namespace, and there could easily be clashes. If the std::merge function uses the name For for one of its template parameters (a Forward_iterator), and std::copy wants to use one with the same name, but with different constraints, it will override the definition of For.

I.e. If we do this:

namespace std {
using Mergeable{For,For2,Out};
// define std::merge
}

// and somewhere else:

namespace std {
using Copyable{For,Out};
// define std::copy
}

then the (useless) value of std::For will be different depending on the order in which we import the header files.

I Think

I think.

Please correct me if I’m wrong.

Conclusion

If I’m right, this all seems bad and Wrong.

What was wrong with:

template<typename Victim1, typename Victim2> requires Lakosable<Victim1, Victim2>
void lakos( Victim1 a, Victim2 b );

anyway?
