How to Use Nutch From Java, Not From the Command Line

  • 468 words
  • two minutes to read
  • comments

Apache Nutch is an open source framework written in Java. Its purpose is to help us crawl a set of websites (or the entire Internet), fetch the content, and prepare it for indexing by, say, Solr. A pretty useful framework if you ask me; however, it is designed to be used mostly from the command line. You download the archive, unzip it, and run the binary file. It crawls and you get the data. However, I’ve got a project where this crawling had to be embedded into my own Java app, and I found no documentation for that at all. Hence this blog post. It explains how you can use Nutch from Java, not from the command line.

Stop Pitching, Beg Them!

  • 1066 words
  • four minutes to read
  • comments

You want your startup to be visible on TechCrunch, right? But you don’t have $15-20K per month to bribe a reputable PR firm to get you there? No worries. This blog post will give you a set of simple instructions on how you can get the attention of those tech journalists who are currently busy writing about Musk’s and Zuckerberg’s innovative ideas. They will definitely write about your baby, I promise you. Just do what I say.

Software Project Review Checklist

  • 488 words
  • two minutes to read
  • comments

A few years ago I wrote about the independent technical reviews any software project must regularly go through in order to make sure everything is under control. I even said recently that there is no excuse for not doing them. Moreover, the more we trust programmers, the higher the necessity to review their projects regularly. Here is a short summary of what a report from a reviewer must include.

How to Create a Java Web Framework from Scratch, the Right Object-Oriented Way

  • 957 words
  • four minutes to read
  • comments

How do you design a web application in Java? You install Spring, read the manual, create controllers, create some views, add some annotations, and it works. What would you do if there were no Spring (and no Ruby on Rails in Ruby, and no Symfony in PHP, and no … etc.)? Let’s try to create a web application from scratch, starting from a pure Java SDK and ending with a fully functional web app, covered by unit tests. I recorded webinar no. 42 about it just a few weeks ago, but this article should explain it all in even more detail.

Logging Without a Static Logger

  • 632 words
  • three minutes to read
  • comments

How do you organize logging in your applications? I mean web applications or command line apps, or even mobile apps. I bet you have some global variable or a singleton, known as Logger, which has a few methods like info(), error(), and debug(). You configure it when the app starts, or it configures itself via something like log4j.properties, and logs everything to the console or a file, or even a database. I was doing exactly that, or something very similar, for many years, until I finally realized how wrong this approach was. In one of my recent Ruby applications I did it all differently, and since then I’m much happier than I was before.

How Data Visibility Hurts Maintainability

  • 1421 words
  • 6 minutes to read
  • comments

I’ve been writing so much about object-oriented programming and its pitfalls, claiming that most of the design patterns and “good practices” which we are accustomed to are actually wrong and hurtful, that I totally forgot to explain the bigger picture problem. Someone asked me some time ago in the blog post about “naked” data: What is the problem we are solving and why exactly does maintainability suffer if we don’t encapsulate our data enough? Here is the answer.

Why I Want to Live in Silicon Valley

  • 1768 words
  • 7 minutes to read
  • comments

You remember my blog post about Why I Don’t Want to Live in Silicon Valley, don’t you? Read it first if you haven’t already. The gist of it is that Silicon Valley is a place with a lot of troubles. No one should want to live there, according to that previous post, right? That is what many of my readers concluded, but they were wrong. Despite the problems, the place is definitely unique and there are a lot of reasons why you may want to consider it as a great place to live, for a few years at least, especially if you are in the tech business.

Zache: A Simple Ruby In-Memory Cache

  • 200 words
  • a minute to read
  • comments

A month ago I stumbled upon a problem: I wasn’t able to find a Ruby gem which would do in-memory caching with the capability to expire on timeout. After some quick research I decided to implement my own and called it Zache (as in “zero cache,” since there is no back end). Here is how it works:
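To make the idea concrete, here is a minimal sketch of an in-memory cache whose entries expire after a timeout. It is only an illustration of the technique: the class name `TinyCache`, its `get` method, and the `clock:` and `lifetime:` parameters are hypothetical names chosen for this sketch, and they should not be read as Zache's actual code or API.

```ruby
# A hypothetical stand-in for illustration; not Zache's implementation.
class TinyCache
  # The clock is injectable (an assumption of this sketch) so that
  # expiry can be exercised without sleeping.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @entries = {}
    @mutex = Mutex.new
  end

  # Returns the cached value for +key+. If the key is missing or its
  # entry has expired, the block is called and its result is stored
  # for +lifetime+ seconds.
  def get(key, lifetime:)
    @mutex.synchronize do
      now = @clock.call
      entry = @entries[key]
      if entry.nil? || entry[:expires_at] <= now
        entry = { value: yield, expires_at: now + lifetime }
        @entries[key] = entry
      end
      entry[:value]
    end
  end
end

cache = TinyCache.new
calls = 0
first = cache.get(:answer, lifetime: 60) { calls += 1; 42 }
second = cache.get(:answer, lifetime: 60) { calls += 1; 43 }
# The second call is served from memory, so its block never runs.
```

The whole state is a plain hash guarded by a mutex, which is what "no back end" amounts to here: nothing survives the process, and an expired entry is simply recomputed on the next read.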

How to Deploy Maven Artifacts to CloudRepo via Rultor

  • 653 words
  • three minutes to read
  • comments

In my previous article, I described how to set up a private Maven repository in Amazon S3 and deploy there via Rultor. This is a great solution if you’re familiar with managing Amazon Web Services (AWS), S3, and AWS Identity and Access Management (IAM). However, if you’re not comfortable administering an AWS account and all the related permissions, you may want to store your Apache Maven artifacts in a cloud-based repository manager instead. Here is how you make Rultor deploy your Maven artifacts to CloudRepo. I wrote this blog post together with Chris Shellenbarger, their founder.

My Recipe Against Dependency Hell

  • 651 words
  • three minutes to read
  • comments

Do you specify exact versions of your dependencies? I mean, when your software package depends on another one, do you write down, in your pom.xml, Gruntfile, Gemfile, or what have you, its version as 1.13.5 or just 1.+? I always thought that it was better to use exact version numbers, to avoid the so-called dependency hell, and I was not alone. However, very soon I realized that dynamic versions, like 1.+, give more flexibility. Just a few weeks ago I realized that neither approach is right and found myself a hybrid formula. No surprise, I again saw that I wasn’t alone.

10x Paychecks for 10x Programmers

  • 801 words
  • three minutes to read
  • comments

You most definitely have heard about 10x programmers. The gist of this folklore is that some of us coders are very effective (10 or even 100 times more so than others), while the rest are just “normal.” It is definitely not a myth though.

What if the Architect is Wrong?

  • 1063 words
  • four minutes to read
  • comments

You most probably know what I think about the architect role on a software project—it’s that of a dictator who makes all technical decisions and who bears the entire responsibility for the final result. I wrote about it and even gave a talk Who is a Software Architect? at BuildStuff in 2016. However, the obvious question you may ask is: What happens if the architect is wrong? Does it mean the entire project is at risk of failure? And isn’t it better to make the whole team responsible for the result, instead of having one single point of failure?

Hazardous Enthusiasm

  • 710 words
  • three minutes to read
  • comments

On a daily basis I deal with many programmers who contribute to my open source projects, either as volunteers or for money via Zerocracy (and my software projects are all open source). Over the years I have realized that there is a pattern in their behavior, which I need to be aware of. I call it “hazardous enthusiasm.” Here are the symptoms.

Speaker Cheat Sheet

  • 2085 words
  • 8 minutes to read
  • comments

I speak at software conferences regularly. Over the last three years I have spoken in 30 cities and 10 countries. I recorded almost all of those talks; you can see them here and on my YouTube channel. My principal rule is that I never give the same speech more than once. Every time it’s a new deck of slides and a new flow of thoughts. Of course, they all dance around the ideas I preach about, like Elegant Objects or rebellion against office slavery. I guess it’s time to share some of my secrets, mostly learned the hard way.

Why I Don't Want to Live in Silicon Valley

  • 1887 words
  • 7 minutes to read
  • comments

Silicon Valley is a great place to be … or maybe not. I spent five years there, from 2011 till 2016. I did enjoy some parts of it, but others were not enjoyable at all. Here is a quick summary of what’s wrong with this territory. I can’t speak about the rest of the United States since, even though I’ve seen some other places, I’ve never lived there for more than a month. Long story short, the territory between San Francisco and San Jose, also known as Silicon Valley, is not the thing you see in the famous TV series. It is absolutely different…

Unit Testing Anti-Patterns, Full List

  • 1068 words
  • four minutes to read
  • comments

I wrote some time ago about anti-patterns in OOP. Now it’s time to write about unit testing anti-patterns—because they also exist, and there are many. I will try to include every example I know in this list. If you know any others, please add them via a pull request or post a comment below. For each anti-pattern I will try to mention where it was found, if it’s not mine. Keep in mind that if I found it somewhere, that doesn’t necessarily mean it was invented there. If you spot an error, please comment.
