Archive for the ‘Tech’ Category

What is a string?

Saturday, August 23rd, 2014

Most programming languages have some wrinkles around Unicode and strings*. In my fictitious language Pepper, there are no wrinkles of any kind, and everything is perfect.

*E.g. JavaScript, Java, Haskell, Ruby, Python.

There are several key concepts. The most important are the interface AnyString and the variable** String, which is what you should use when you are writing code that works with strings.

**String is a variable that refers to a type, so you just use it like a type and don’t worry about it.

interface AnyString
{
    def indexable(CodePoint) code_points( implements(AnyString) string )
}

In Pepper, an interface can describe what free functions exist as well as what member functions a class must have. Here we just require that a code_points function exists that gives us a collection of CodePoint objects that may be indexed (i.e. is random-access).

When your Pepper program starts, the String variable will refer to something that implements this interface, and probably some other interfaces too. Most Pepper programs will use a String that is implemented as an array of bytes representing a string in UTF-8, but the programmer doesn’t need to be aware of that. In a situation where something different is needed (e.g. where we know lots of non-Latin characters will be used and UTF-16 will be more efficient), String can be set to something different in the configuration settings used by the compiler.
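Here is a rough C++ sketch of how that configuration choice might look in the emitted code. Pepper is fictitious, so everything here is an assumption made for illustration: the NativeUtf16String class, the PEPPER_STRING_UTF16 macro standing in for a compiler configuration setting, and the storage_bytes helper.

```cpp
#include <cstddef>
#include <string>

// Hypothetical native string classes: one stores UTF-8 bytes,
// the other stores UTF-16 code units.
struct NativeUtf8String  { std::string bytes; };
struct NativeUtf16String { std::u16string units; };

// Hypothetical configuration switch: the build decides what String means,
// and code written against String never names the concrete class.
#ifdef PEPPER_STRING_UTF16
using String = NativeUtf16String;
#else
using String = NativeUtf8String;
#endif

// Number of bytes each representation needs to hold its contents.
inline std::size_t storage_bytes(const NativeUtf8String& s) {
    return s.bytes.size();
}
inline std::size_t storage_bytes(const NativeUtf16String& s) {
    return s.units.size() * sizeof(char16_t);
}
```

For example, the two-character Japanese string U+65E5 U+672C takes 6 bytes as UTF-8 but only 2 code units as UTF-16 (4 bytes where char16_t is 2 bytes wide), which is the kind of case where switching String might pay off.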

When you want to do something with a string, there will be functions that only rely on the AnyString interface and deal with CodePoints internally, but there will be other overloads that are potentially more efficient. For example, there are two versions of the standard print function:

def void print( implements(AnyString) string )
def void print( NativeUtf8String string )

The NativeUtf8String class is implemented as a std::string in the C++ code emitted by the Pepper compiler, and as whatever is the most efficient way to represent an array of bytes when compiling for other platforms, so the version of print that uses it can be quite efficient.

Because all these types are known at compile time, the C++ code generated by the Pepper compiler can use the native types directly (and be efficient), even though the programmer is writing code using just the AnyString and String types, meaning their code can be adapted to other platforms by using a different configuration.

The Pepper environment exposes standard-out and standard-in as UTF-8 streams, and takes care of converting to the platform encoding for you (at runtime).

Absolute Truth in programming languages

Friday, August 22nd, 2014

Is enforcing truthfulness the opposite of beauty?

Can 2 + 2 = 5?

Improvements, corrections, further contributions are welcome.

$ cat five.cpp 
#include <iostream>
int operator+( int x, int y ) { return 5; }
int main() {
    std::cout << 2 + 2 << std::endl;
}
$ g++ five.cpp 
five.cpp:2:29: error: ‘int operator+(int, int)’ must have an argument of class or enumerated type
$ python
>>> int.__add__ = lambda x, y: 5
TypeError: can't set attributes of built-in/extension type 'int'
$ cat five.hs
import Prelude hiding ((+))
x + y = 5
main = print ( 2 + 2 )
$ ghc five.hs && ./five
5
$ cat five.rb
class Fixnum
    def +(y)
        5
    end
end
print 2 + 2
$ ruby five.rb
5
$ mzscheme 
> (define (+ x y) 5)
> (+ 2 2)
5

Best GCC warning flags for compiling C++

Friday, July 18th, 2014

A recent discussion on ACCU-general gave people an opportunity to share the warning flags they like to use with g++.

I thought I’d write down the consensus as I understood it, mainly for my own reference:

-Wredundant-decls
-Wcast-align
-Wmissing-declarations
-Wmissing-include-dirs
-Wswitch-enum
-Wswitch-default
-Wextra
-Wall
-Werror
-Winvalid-pch
-Wmissing-prototypes
-Wformat=2
-Wmissing-format-attribute
-Wformat-nonliteral

Jonathan Wakely advised us that -Weffc++ is not very useful: it is mostly based on the first edition of the book Effective C++, many of whose recommendations were improved in the second edition, and apparently GCC doesn’t do a great job of warning about them either.
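As a small illustration of why some of these flags are worth adding on top of -Wall, here is a sketch of code that -Wswitch-enum complains about. The Colour enum and the name function are made up for the example.

```cpp
#include <string>

enum class Colour { Red, Green, Blue };

// The -Wswitch warning that comes with -Wall stays quiet here, because
// the default label covers the missing enumerator. -Wswitch-enum warns
// that Colour::Blue has no case of its own, default or not.
inline std::string name(Colour c) {
    switch (c) {
        case Colour::Red:
            return "red";
        case Colour::Green:
            return "green";
        default:
            return "unknown";
    }
}
```

The code compiles and runs without the flag, and name(Colour::Blue) quietly returns "unknown". With -Wswitch-enum and -Werror together, the build fails until Blue gets its own case.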

What git server should I use?

Tuesday, July 15th, 2014

At work we are considering whether we can use Git for our source control. I am a big fan of Git, so I’d like to see this happen.

We only need to work against a central repository most of the time, so I’m looking at what servers might work for us.

Update: This StackExchange question may help: Self-hosted replacement for Github.

Update: Added software from the StackExchange answers to the list.

Features we will need:

  • User management
  • Repository management
  • Browsing code and diffs via the web
  • Hosted in-house

Features we might want:

  • External user authentication e.g. via LDAP
  • Code review
  • Integrating with an issue tracker

Most of my use of Git so far has been against large servers like GitHub (which I really like) and SourceForge, but recently I set up a test Git server using gitolite and gitweb, which gives me my 4 “needs” above but not my 3 “wants”. It also requires command-line use of git to administer and SSH keys from users, so might not suit our system administrators or all our developers.

So, lazyweb, what server should I recommend?

Here is my research so far:

Free git server software

GitLab – looks a lot like GitHub, and appears to satisfy all 4 of my needs and all 3 of my wants. Might be a bit decentralised (ironically) for our usage: e.g. the docs talk about using merge requests for code review, whereas I’d expect we’d want a commit-gating style, which is what I believe Gerrit provides.

SCM-Manager – looks very corporate. Likely it could satisfy my needs and my wants.

Gerrit + Gitblit – lots of code review features, used by major projects including the Android open source project. Weird that Gerrit doesn’t include a code browser and you have to add something like Gitblit. I think this will give me all 4 of my needs and all 3 of my wants.

Gitolite + gitweb – this is what I am using at the moment, and it works well, satisfying the needs above, but not the wants. Gitolite configuration is done by editing config files and pushing them into a special git repository on the server. Adding users means adding a user’s SSH key to the config repository, so requires tech-savvy users and admins. gitweb is fast and clear. My only complaint is that you don’t seem to be able to control the amount of context you see in a diff (often I want to see the full files).

Gitorious – the software behind a mature public site that some people really like. Doesn’t appear to do LDAP authentication, and may not integrate with issue tracking.

Tuleap – project planning, chat, issue tracking, builds, document management, discussion board and news, all in one product. Includes Gerrit for code reviews and Jenkins for build management. Supports LDAP and OpenID authentication.

Phabricator – code hosting and review, issue tracking, wiki, alerts, message boards, blogs, Q&A, polls all in one product. Supports LDAP and OAuth authentication.

GitPrep – explicitly a clone of GitHub. Seems to look nice, but a young project and not talked about much on the Internet yet.

GitBucket – explicitly a clone of GitHub. Couldn’t find much information beyond that.

GitList – only a repository viewer, but could possibly be used with gitolite instead of plain gitweb. It may have more features such as full-file diffs, but I’m not sure. Looks pretty, but doesn’t have much documentation.

CGit – only a repository viewer, but definitely allows specifying the amount of context in diffs (and if you edit the URL directly you can ask for as much as you want). Very plain interface, and minimal documentation. Claims to be fast.

Cydra – may turn out to be good but no web site at the moment, so probably not mature enough to consider.

Gitosis appears to be a dead project.

Paid (in-house) git server software

Very unlikely that we will pay for anything, but here are the options I have found so far:

Atlassian Stash – one-time payment e.g. $6,000 for 100 users.

GitHub Enterprise – apparently they do in-house installations but I couldn’t find any information. GitHub has an excellent interface and features.

Microsoft Team Foundation Server – a larger system that offers Git integration as a feature.

RhodeCode – 50 users for $199/month, supports LDAP and Active Directory authentication, and code review. Claims to be highly secure.

Renewing self-signed certificate for ejabberd

Tuesday, July 15th, 2014

I run an ejabberd server on an Ubuntu 12.10 box and this week I started getting notified by my IM client that the server’s certificate had expired.

Here’s how I managed to generate a new certificate.

WARNING: this process backs up, deletes and then restores your ejabberd database, so it is probably fairly risky.

# Move any previous backups out of the way
sudo mv /var/backups/ejabberd-* ~/Desktop/

# Move the expired certificate out of the way
sudo mv /etc/ejabberd/ejabberd.pem /etc/ejabberd/ejabberd.pem.old

# Reconfigure the ejabberd package (WARNING: backs up and deletes your database!)
sudo dpkg-reconfigure ejabberd

# Make the database backup file readable
sudo chmod a+rx /var/backups/ejabberd-*/
sudo chmod a+r /var/backups/ejabberd-*/*

# Restore the backup
sudo ejabberdctl restore /var/backups/ejabberd-*/ejabberd-database

If you’re lucky, your server will now be back up with a new self-signed certificate.

In general, the policy of using dpkg-reconfigure to handle creating a new self-signed certificate seems to work nicely.