armedguy@web:/# Johan Jatko | xr.gs

A realm of thoughts, solutions, and breaking things.

Making Debian CD image bootable via PXE

July 27, 2017      

Okay, this is somewhat of a hack, and certainly not recommended if your Debian CD is very large. Our scenario was a 150MB ISO in our build pipeline that we needed to PXE-boot and automatically install onto servers for automated testing. The ISO is already configured to autoinstall via a preseed configuration, so I will skip that part of the tutorial.

For PXE, we boot iPXE and chainload a kernel+initrd, mostly because we have a web application that dynamically writes iPXE boot scripts depending on the state of the hardware. More on this system might come later.

 

Anyways, to start off we have an image, let's call it debian.iso, fresh from our build pipeline. It contains the necessary files to install a Debian system with your custom software as a package.

Now, if you try to PXE boot the ISO via a memdisk or similar, you will run into the “No CD-ROM found” issue, because the installer does not actually look for memdisks. With our image being 150MB in total (and no way to enable network access in the installer because of missing drivers etc.), we looked into embedding the whole ISO inside the installer initramfs.

Hack away

Start by unpacking the ISO. We don't like loopback devices, so losetup is out of the question. 7z works fine for extraction in our case.

7z x debian.iso

Obviously run that in a folder inside /tmp/. The installer kernel/initrd was found in iso/install.386/ on our CD. Location may vary but should be one of the install folders.

We continue with unpacking the initrd file (if not run as root, it will complain about dev/console and dev/null).

zcat initrd.gz | cpio -i
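Putting the two unpack steps together, the whole thing might look roughly like this (the iso/ and initrd/ directory names are just our layout, and install.386/ is where the files happened to live on our i386 CD):

mkdir iso initrd
( cd iso && 7z x ../debian.iso )                              # extract the ISO contents
( cd initrd && zcat ../iso/install.386/initrd.gz | cpio -i )  # unpack the installer initramfs (run as root)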

To figure out how to fool the installer, we dug through the depths of the debian-installer. A lot of the code can be found in the initrd/var/lib/dpkg/info/*.postinst files, as well as in initrd/usr/lib/base-installer*.
We found the code for CD-ROM detection/mounting, and it turned out to be pretty simple to spoof the CD-ROM being mounted.

In our version (Wheezy based), the install CD-ROM would be mounted at /cdrom, and a check was made beforehand to see if /cdrom/.disk/info existed, in which case the installer would consider the CD-ROM already mounted and proceed.
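In pseudo-shell, the spirit of that check is something like this (a simplified sketch, not the actual debian-installer code):

if [ -e /cdrom/.disk/info ]; then
    echo "CD-ROM already mounted, carry on with the installation"
else
    echo "no CD-ROM mounted, go probe for one"
fi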

Start the abuse

So, our first step was to copy the .disk directory from the extracted ISO to initrd/cdrom/.disk.
We also copied the dists/ and pool/ folders, as they are obviously needed for package installation.
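The copying itself is nothing fancier than something along these lines (run from the scratch directory, with the iso/ and initrd/ names from before):

mkdir -p initrd/cdrom
cp -r iso/.disk iso/dists iso/pool initrd/cdrom/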

We then packed the initrd with the following command:

find . | cpio -o -H newc | gzip > ../initrd-iso.gz

We fired it off in our test environment and BOOM, we got through almost the whole installation. But when the installer chrooted into the installation target to install the kernel and other packages, it was unable to find any kernels.

Sifting through /var/log/syslog in the installer (accessible on tty4, or via the shells on tty2/tty3 on normal Debian installs) gave us the error that a bind mount from /cdrom to /target/media/cdrom had failed. We still don't know why this fails (the exact command is mount -o bind /cdrom /target/media/cdrom), but it might have something to do with BusyBox's implementation of mount.
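tty2 gives you a BusyBox shell inside the installer environment, so you can poke at it yourself; something along these lines is how we looked at it (paths taken from the log message):

grep -i cdrom /var/log/syslog | tail        # spot the failing mount in the log
mount -o bind /cdrom /target/media/cdrom    # reproduce the failing command by hand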

More abuse

Anyways, to get past this we hacked up a quick script that just copies the contents of /cdrom to /target/media/cdrom, and placed it as a base-installer routine.

Contents of initrd/usr/lib/base-installer.d/99copy-cdrom:

#!/bin/sh
set -e
. /usr/share/debconf/confmodule

# stand-in for the failing bind mount: copy the CD contents into the target
cp -r /cdrom /target/media/cdrom

At the same time we also added a script to remove the CD contents when we are done; keeping 150MB of already-installed packages around is somewhat of a waste.

Contents of initrd/usr/lib/finish-install.d/11delete-cdrom (apparently, a number higher than 20 will cause it not to be run):

#!/bin/sh
set -e
. /usr/share/debconf/confmodule

# clean out the copied CD contents from the installed system
rm -rf /target/media/cdrom/*

chmod +x both files and pack the initrd again with

find . | cpio -o -H newc | gzip > ../initrd-iso.gz

and PXE boot the new kernel/initrd.

Behold, the installation runs the whole way and reboots when finished (as long as your preseed is correctly set).

 

Hope you enjoyed this quick hack and that it may help in some way.

Dynamically creating Pytest test fixtures in runtime

July 7, 2017    

(In a hurry? Skip ahead to “The final solution” below.)
At work we have started to redesign our test environment, which we run with Python 3 and pytest. The test environment is built for large-scale system testing in simulated real-life environments.

We have a yaml configuration file that describes resources that will be used in the test environment (they can for example describe 2 server providers, OpenStack and baremetal).

In our current environment, whenever we want to use one of the resources, we have a pytest fixture called env that we include in a test case. In env you can find all necessary resources tucked away in lists and dicts. For example:

def test_case(env):
    print(env.providers['openstack_provider'])

As our tests get more complex, it gets harder to understand which resources a test case requires, and because a test case and its corresponding yaml file are separate, you need good knowledge of which resources are used in a specific test case.

We also have an issue with some resources needing to be set up and torn down more often than once per test session. Our current test environment doesn't allow for that.

 

So we set out to modify our test environment to provide us with better control over our dynamic resources. The idea was to specify test cases the following way:

def test_case(openstack_provider, ubuntu_server1, coreos1):
    pass

Maximum dependency injection, with the added benefit that you can easily see which resources need to be defined in the environment yaml file.
How to achieve this with pytest then?

 

Pytest fixtures and Python decorators

The way pytest handles dependency injection is, as mentioned earlier, through test fixtures.
A test fixture is, as described on Wikipedia, something used to consistently test some item, device, or piece of software.
In pytest, fixtures are most often defined using Python decorators in this form:

@pytest.fixture(scope='session')
def env(request):
    pass

This is unfortunately a very static way of declaring a fixture, and seeing as the yaml file could contain any number of resources, we need to create fixtures dynamically. To do so we need to look at how a Python decorator works.
The above Python decorator, applied to the function env, will be rewritten by Python into the following code (more info on this can be found here):

def env(request):
    pass
env = pytest.fixture(scope='session')(env)

This allows us to dynamically create new fixtures! Let's try it then!

servers = ['server1', 'server2']
def create_server(env, request):
    s = env.servers[request.param]
    s.setup()
    yield s
    s.destroy()

for srv in servers:
    pytest.fixture(scope='session', params=[srv], name=srv)(create_server)

Looks easy enough, right? Well, we are almost there, but not really. When we ran this we realised that only the last fixture (server2 in this case) was available to our test cases.

After some digging in the pytest source code, we found out that pytest stores fixture metadata in a special attribute on the decorated function. By executing pytest.fixture on the same function twice, we were overwriting the old metadata, which made the previous fixture disappear.

The solution we came up with resembles the decorator pattern described in the Stack Overflow question linked earlier in this post. We call them function factories (possibly not the right name), and they are a handy feature of Python. The pattern also allowed us to inject the resource identifier without passing it via pytest parameters.

servers = ['server1', 'server2']
def create_server_factory(server):
    def create_server(env, request):
        s = env.servers[server]
        s.setup()
        yield s
        s.destroy()
    return create_server

for srv in servers:
    pytest.fixture(scope='session', name=srv)(create_server_factory(srv))

This works better, but the issue now is that pytest can't find any of the fixtures. Why? Because of how pytest scans for fixtures: they need to be part of certain modules that pytest scans, such as conftest.py files or test case files.
We solve this by injecting the function into the current module's (conftest.py in our case) internal variable dict. This gives us the final solution below.

The final solution

import sys

import pytest

servers = ['server1', 'server2']

def create_server_factory(server):
    def create_server(env, request):
        s = env.servers[server]
        s.setup()
        yield s
        s.destroy()
    return create_server

for srv in servers:
    fn = pytest.fixture(scope='session', name=srv)(create_server_factory(srv))
    # expose the fixture in this module so pytest's fixture scan can find it
    setattr(sys.modules[__name__], "{}_func".format(srv), fn)

def test_case(server1, server2):
    print(server1, server2)

Some other things to consider

In our old solution, we constructed our environment (reading the yaml and constructing objects) in a fixture called env. That is no longer possible, as it is too late to introduce new fixtures once a fixture has already started running. We solved this by moving all setup into the pytest hook pytest_configure, which is called much earlier in the process.

Why you shouldn’t trust crawlers

March 8, 2015       

Paywalls are a common concept on digital newspapers and content-rich sites. Often implemented to provide extra revenue where ads can't provide enough, they divide a user base into paying and non-paying visitors. But how do you allow “rich content” linking and search-engine crawling when your content is behind a paywall?

Paywall @ The New York Times

A large number of sites selectively allow certain services (the Google crawler, the Facebook Open Graph crawler, etc.) to simply bypass the paywall, leaving their content open and readable. This may not seem like much of an issue at first, as allowing Googlebot and others to visit your premium content is necessary for SEO and social interactions, but the problem comes with developer tools.
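An easy way to see what you are actually handing over is to request one of your premium articles with a crawler's user-agent string; facebookexternalhit and Googlebot are the real crawler names, but the URL is just a placeholder, and note that some sites also verify crawler IP ranges, in which case this tells you nothing:

curl -A "facebookexternalhit/1.1" https://example.com/premium-article
curl -A "Googlebot/2.1 (+http://www.google.com/bot.html)" https://example.com/premium-article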

Facebook Open Graph Debug Tool

Facebook provides a debugger platform for its Open Graph metadata system, available to anyone with a Facebook account to debug any site. It crawls your site using the same bots that are used in the live version (the one that provides pretty embeds in Facebook posts etc.), and shows how it uses and compiles the metadata it finds. It also shows the page source for the version of the site that the crawler got. All in all, a very handy tool for optimizing your content for the social media giant.

Problems arise when people decide to expose all their premium content to the crawlers, instead of just the necessary metadata. Facebook has a small line in their documentation stating:

Additionally, you also do not need to include all the URL’s regular content to our crawler, just a valid HTML document with the appropriate meta tags.

However, during my investigation the majority of the tested sites had decided to expose all the data. This could be due to confusion caused by the paragraph above the one quoted, which states:

If your content requires someone to login or if you restrict access after some amount of free content has been consumed, you will need to enable access for the Facebook Crawler. This access is only used to generate previews, and Facebook will not publicly expose your private content.

Because of this, I decided to reach out to Facebook Security to get the documentation clarified, or possibly a redesign of the debug tools. Their response was that they were going to look over the documentation and clarify it, something that as of today (2015-03-08) hasn’t been done yet.

Google PageSpeed Insights

The PageSpeed Insights tool provides a small snapshot of the website rendered as desktop and mobile. While it isn't as critical as the Open Graph debug tools, a few sites still decided to allow the Insights crawler through their filter (not in all cases, which seems to suggest that Google uses Googlebot for Insights in some cases?) and spill all of their data. This can cause the content to be rendered and readable in the preview.

However I decided to take no action here, as private sites should be tested using their browser plugin instead.

Conclusion

Developers shouldn't trust crawlers blindly with their premium content, as there seems to be no guarantee about who can access what. I am actively trying to contact those that I have found exposing data, but hopefully people will review how they handle crawlers and stop giving them more than what they need.

This is not necessarily an issue on the part of the crawler developers, but they should also help by telling content owners how much data they actually need to provide.

PS, don’t forget noarchive!

 

Fixing WordPress to work with CloudFlare’s free SSL

October 13, 2014      

UPDATE 2017-07-08: With WordPress introducing secure cookies, it is no longer possible to put the code in functions.php. Read below.

CloudFlare recently rolled out Universal SSL to all customers, including free ones. This allows all CloudFlare customers to have a secure connection between their websites and their visitors (well, not entirely, but let's not go into that now). I just installed Universal SSL on my blog, and everything was fine until I cleared my cache.

No stylesheets were loading.

By default, all web browsers block attempts to load HTTP resources on a page served over HTTPS. The easiest solution for this is protocol-agnostic URLs (//example.com/style.css instead of http://example.com/style.css), and they work really well.

BUT

WordPress by default does NOT apply protocol-agnostic URLs to its generated links (such as those from wp_head()), which causes issues when using a reverse proxy with SSL such as CloudFlare (this also applies to HAProxy, nginx and others). Because SSL terminates at the reverse proxy, the actual webserver receives a normal HTTP request. WordPress constructs its generated URLs based on the type of request that reaches the webserver itself, and therefore creates normal http://-prefixed URLs that get blocked when they are loaded by the client that connected via HTTPS.

The internal request between the reverse proxy and the webserver also causes issues if you wish to set the “Site URL” in your WordPress settings to the https://-prefixed link. Because EVERY request between these two is plain HTTP, WordPress will keep trying to redirect the user to https://, causing a redirect loop.

The solution

To solve this, we use information provided (hopefully!) by the reverse proxy that tells us which protocol the client used. If it was HTTPS, we fool WordPress into thinking the current connection is HTTPS.

The magic lines to do this are as follows:

// The proxy tells us the original protocol via X-Forwarded-Proto; mark the request as HTTPS for WordPress
if(isset($_SERVER["HTTP_X_FORWARDED_PROTO"]) && $_SERVER["HTTP_X_FORWARDED_PROTO"] == "https") {
    $_SERVER["HTTPS"] = "on";
}

Originally you could simply put it in your theme's functions.php file, or in any other file that is executed before any URLs are printed. However, because WordPress has introduced secure cookies that have a different name than regular cookies, it is no longer possible to put the code in your theme's functions.php; it must be declared in your wp-config.php instead. After that, WordPress should generate https://-prefixed URLs!
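If you want to check that the header actually reaches your webserver and that the snippet does its job, you can mimic the proxy and hit the backend directly (localhost is an assumption here; use whatever address your webserver listens on behind CloudFlare):

# a page generated with the header set should come back full of https://-prefixed links
curl -s -H "X-Forwarded-Proto: https" http://localhost/ | grep -c "https://"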

Webradio via Spotify and SHOUTcast

October 10, 2014      

SHOUTcast is one of the most popular software suites for web radios nowadays. But by default (read: without much hassle), the only way to broadcast sound is via Winamp and its SHOUTcast plugin. However, when ludde from Spotify created his tribute Spotiamp, he included an embedded SHOUTcast server so you could stream Spotify via SHOUTcast to Sonos devices and the like.

But this also makes it possible to set up a web radio that streams via Spotify, by using the stream relay function in the SHOUTcast DNAS server.
(The reason Spotiamp alone can't act as a SHOUTcast server is that it doesn't support more than one connected client.)

Step 1

Obviously, install Spotiamp. Then enable its SHOUTcast server on the default address.

Step 2

Install SHOUTcast DNAS from their website. Configure DNAS to your liking, but in your stream config, specify the streamrelayurl/relayurl as your Spotiamp SHOUTcast URL.

(Look at your specific DNAS version for the correct way to do this)

 

After this Spotiamp should be feeding the SHOUTcast DNAS server live music and a title.

 

Late Night Ventrilo – Hi Mom

August 7, 2014   

 

Security patches for CoD4 servers

August 5, 2014    

Call of Duty 4 has been a popular game for many years, and its population has remained even though newer Call of Duty titles have been released.

A long time ago, the CoD4 servers I was maintaining were being targeted by hackers who had found a new method to become unbannable on Call of Duty 4 servers.
The exploit was based on the fact that cracked Call of Duty 4 servers never verified user GUIDs and therefore allowed all players, even cracked clients, to connect. Since the server skipped verifying GUIDs against a master server, anything could be sent to it, while the server itself was only prepared to receive a 32-character hash built from the characters 0123456789abcdef.

This caused issues with external admin tools and with banning players (which is done by GUID), because they in turn also only expected 0123456789abcdef, while hackers were sending all kinds of Russian/Hebrew/random characters.

Solution

To solve this, the servers had to be patched with a custom routine that validated player GUIDs against their normal format, [0-9a-z]{32}, and kicked any connecting player not matching it.
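Expressed as a quick shell check purely for illustration (the real patch is hand-written assembly inside the server binary), the rule is simply:

guid="0123456789abcdef0123456789abcdef"        # a well-formed GUID
if echo "$guid" | grep -Eq '^[0-9a-z]{32}$'; then
    echo "GUID ok"
else
    echo "malformed GUID, drop the client"
fi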

Luckily, CoD4 is based on Quake 3 Arena, whose engine is open source nowadays, so I found a fairly useless function that normally checks whether an IP is local or external, and overwrote it with my custom assembly. =)

In short, the patch:

  • Validates player GUIDs
  • Uses a special exception to allow CoD4 master server listing, which is great when you are running cracked servers
  • Includes Aluigi's buffer overflow fix for va()

 

Attachment: iw3mp.exe (3.18 MB)

 

Finally, my new website

August 3, 2014   

That took a while, but now I have a theme that I am satisfied with. yay

I will try to update it as often as I find something interesting.

RCon module for Battlefield 2

  

A small Node.js program that can be used to contact Battlefield 2 servers.

 

PS. I am sorry that I used a Battlefield 3 image, but the Battlefield 2 ones were crap.